EcoCloud DEVL #1 - EcoScience Research Data Cloud & Data Enhanced Virtual Laboratory (RDC & DEVL) - June 18
Formal metadata
Number of parts | 19
License | CC Attribution 3.0 Unported: You may use and modify the work or content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/42927 (DOI)
Tech Talk 12 / 19
Transcript: English (automatically generated)
00:01
Hello, good afternoon, everyone. As Andrew already introduced, this is about the EcoCloud, or EcoScience Research Data Cloud. It's a big project, not just technical development, so we work with the Department of the Environment and
00:23
Energy, which is developing the essential environmental measures that we want to make available. There is a whole lot of work streams around species trait data: making those available, and getting access to the daily spatial weather grids and other existing data streams.
00:42
Then there's a whole work bundle around standardized modeling and analysis capability, which came out of BCCVL, basically to have a robust set of well-defined and reliable models. The most technical side for us is the cloud platform, which gives you access to compute and
01:06
research and makes it easy to access your data: log in and start working without caring about where the data comes from or how everything else is set up, as far as possible.
01:23
Another big part is about training and skills development. This will be part of the EcoEd and EcoPathways programs. That bit will develop a whole lot of training and course material, run workshops and similar things, and also engage industry. And the last stream is about trusted data.
01:44
I'm not too familiar with that bit; I'm more on the technical side. But it's about knowledge of reliable data: where does it come from, who is using it, and which data can I use for proper decision making.
02:06
A bit of history. We released BCCVL about three to four years ago. It's been very well received. It gave users easy access to data and modeling and reduced the technical barrier to accessing and using data.
02:23
What we have found, though, is that there's an increased demand for customizing the work and having an interactive workflow to work with models and data. Another big use case is that users want to use sensitive data, which can't be released publicly anywhere.
02:44
Many more users are coming on board, and they want to use their own data, which they have compiled or retrieved from somewhere and which is not publicly accessible. Another big thing is that we'll try to increase the interoperability with our systems.
03:03
We took all of these together with all the lessons we have learned through BCCVL, and we're trying to put them into the EcoCloud platform. The biggest challenge with the EcoSciences is that it's a very diverse community
03:21
and it's a very diverse set of data that we're going to integrate or that our users want to use. It's species data, climate data, marine data, all described in different ways, which often doesn't make sense to an early career researcher who comes from an
03:40
ecology background. Another big thing is portability. If you develop something and share your code, it should be able to run anywhere, not just within the EcoCloud platform. As I said, we want to provide the ability to work with sensitive data, so security is a big thing here.
04:05
Data discovery, data access and data usage are still a very big challenge. While the work on data description and data portals has moved forward very well over the last couple of years, actually accessing the data and using it is still a bit of a challenge.
04:28
Then we want to provide some way to work with large data sets, which you can't possibly download or which require some HPC environment, and have some way to get easy access and integrate projects.
04:46
And as our users often use their own private data, we need to integrate storage services such as AARNet CloudStor, Nextcloud, Dropbox and similar things.
05:04
The users we are targeting are researchers, academics, undergraduates and postgraduates, and definitely high-end users who know how to code and use HPC environments but still want to share their work. Big data users
05:21
will be included as well. And of course, to help with collaboration, there should be opportunities to work with proper software engineers to optimize your code, help with coding or even produce new third-party products on top of EcoCloud. The opportunities there are, of course, research, science,
05:52
the ability to publish the code or models you're developing along with the data you've used, various workshops such as Software Carpentry and Data Carpentry, and resources to run curricula
06:06
all over the research community. On the technical side, three components are being developed. The EcoCloud Drive is your
06:22
easy, cloud-managed online storage for code, scripts and small data sets that will always be there. So even if everything crashes, the next time you come in your last piece of work is still available. The EcoCloud Explorer, which we hope will increase
06:41
or help with data discovery and access, will mostly be powered by the CSIRO Knowledge Network, and our work there will be to help with using the actual data, for instance by providing code snippets. And of course there will be the compute side of it.
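The idea of handing users a code snippet for a discovered dataset can be sketched roughly as follows. This is only an illustration, not the EcoCloud Explorer implementation: the template table, the `render_snippet` function and the example URL are all hypothetical, and the generated code assumes the user has pandas or xarray installed.

```python
from string import Template

# Hypothetical templates keyed by file format; not taken from the EcoCloud code base.
SNIPPET_TEMPLATES = {
    "csv": Template(
        "import pandas as pd\n"
        "\n"
        "# Load '$title' straight from its download URL\n"
        "df = pd.read_csv('$url')\n"
        "print(df.head())\n"
    ),
    "netcdf": Template(
        "import xarray as xr\n"
        "\n"
        "# Open '$title'; values are read only when they are used\n"
        "ds = xr.open_dataset('$url')\n"
        "print(ds)\n"
    ),
}


def render_snippet(title: str, url: str, fmt: str) -> str:
    """Return ready-to-paste Python code for loading one dataset."""
    try:
        template = SNIPPET_TEMPLATES[fmt.lower()]
    except KeyError:
        raise ValueError(f"no snippet template for format {fmt!r}") from None
    return template.substitute(title=title, url=url)


if __name__ == "__main__":
    # Example metadata record (made up for illustration).
    print(render_snippet("Daily rainfall grid", "https://example.org/rain.csv", "csv"))
```

Keeping the templates in a plain dictionary means a new data format only needs a new entry, without touching the rendering logic.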
07:01
So at the moment we have opted for providing Jupyter notebooks with R, Python and RStudio already available, and also access to virtual desktop tools, at the moment provided by CoESRA, which is a TERN project. Three minutes, Gerhard. Sorry, yeah, three.
07:22
So our overall architecture is roughly this: a user comes in, and there is a dashboard from which you can access everything. There's a separate, centralized user management. All services are independent and don't know about the users, but they use this user management service to identify users and get access to various things.
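One common way to let independent services rely on a central user management service is a signed token that any service can check without keeping its own user database. The sketch below is a deliberately simplified stand-in for that pattern, not the EcoCloud implementation: the shared secret, the claim names and both functions are assumptions, and a real deployment would normally use a standard token format with asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Sequence

# Assumption: a shared secret stands in for the key material a real identity
# provider would manage; production systems typically use asymmetric keys.
SECRET = b"demo-secret-not-for-production"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user_id: str, scopes: Sequence[str], ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Run by the central user management service after a successful login."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "scopes": list(scopes), "exp": issued + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(token: str, required_scope: str, now: Optional[float] = None) -> str:
    """Run by any independent service; returns the user id or raises PermissionError."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError as exc:  # wrong number of parts or invalid base64
        raise PermissionError("malformed token") from exc
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] < current:
        raise PermissionError("token expired")
    if required_scope not in claims["scopes"]:
        raise PermissionError(f"missing scope {required_scope!r}")
    return claims["sub"]


if __name__ == "__main__":
    token = issue_token("alice", ["drive:read"])
    print(verify_token(token, "drive:read"))
```

The service calling `verify_token` never stores user records; it only trusts the signature and the claims, which is what keeps the services independent of each other.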
07:44
From there on you can publish scripts, clone scripts from other users, find scripts, and browse through examples of how to use various data stores or certain processing. You can explore data and try to get the data in in an easy way.
08:03
Start a new project, which essentially gives you your compute environment: pick whatever you want, R, Python or virtual desktop, and use the tools that are available there. These environments are usually also customizable, so you can install your own software as well.
08:25
As for our tech stack, it's a microservice architecture, fully OpenID Connect and OAuth2 enabled, so all APIs that are or will be offered will also be accessible to third-party systems.
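For a third-party system, calling an OAuth2-protected API usually comes down to sending an access token in the `Authorization` header. A minimal sketch using only the Python standard library is shown below; the base URL, the path and the token value are placeholders, not real EcoCloud endpoints, and obtaining the token from the identity provider is out of scope here.

```python
import json
import urllib.request


def build_api_request(base_url: str, path: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            # Bearer token usage as described in RFC 6750
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the JSON body; needs network access."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder endpoint and token, for illustration only.
    req = build_api_request("https://api.example.org/", "/v1/projects", "ACCESS_TOKEN")
    print(req.full_url, req.get_header("Authorization"))
```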
08:42
Our development itself is mostly in Python and JavaScript. It's been designed to be horizontally scalable because that's just easier. Vertical scalability is still there, but there are sometimes challenges with resource allocation. And the whole system also allows going multi-node, multi-cloud
09:06
with the work. The way we have built it up, at the bottom there's of course OpenStack, provided by the Nectar Cloud, with all the services we are using. The grey boxes, Sahara, Gnocchi and Trove, we are currently not using, but we are certainly looking into how we can make them useful to our users.
09:26
On top of that, the whole OpenStack infrastructure is managed by Kubernetes, which takes care of orchestration, security and all sorts of monitoring, and because of that we can scale everything up via Kubernetes.
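As an illustration of how scaling up through Kubernetes can work, the Horizontal Pod Autoscaler documents a simple rule: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The talk does not say whether EcoCloud uses this autoscaler, so the sketch below only shows the arithmetic; the function name and the replica bounds are assumptions, and the autoscaler's tolerance and stabilization behaviour are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count from the rule ceil(current_replicas * current_metric / target_metric)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds, as an autoscaler configuration would.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four notebook servers at 90% average CPU with a 60% target
    print(desired_replicas(4, 90, 60))
```

With four replicas at 90% average utilization and a 60% target, the rule asks for six replicas; adding replicas rather than enlarging a single machine is the horizontal scaling mentioned earlier.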
09:44
Kubernetes itself orchestrates everything that runs within EcoCloud. So there are the Jupyter notebooks, there will be web processing services, positive web UI, the various tools we developed that provide APIs,
10:02
and the CoESRA virtual desktop sits a bit outside of it, but will integrate seamlessly with EcoCloud. Everything we deploy here that runs within EcoCloud also allows you to access external services. At the moment we are
10:23
integrating, we're providing tight integration with the Knowledge Network, various data services like data.gov.au and other things, CloudStor as mentioned before, your Google Drive, your Dropbox, and external web processing services, which would be of high interest for us just to offload data transfer and compute services.
10:46
And that's pretty much it for me. Thank you very much. Okay. Thank you.