EcoCloud DEVL #1 - EcoScience Research Data Cloud & Data Enhanced Virtual Laboratory (RDC & DEVL) - June 18

Formal Metadata

Title

EcoCloud DEVL #1 - EcoScience Research Data Cloud & Data Enhanced Virtual Laboratory (RDC & DEVL) - June 18

Series title

Number of parts

19

Author

License

CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and copy, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.

Identifiers

10.5446/42927 (DOI)

Publisher

Australian Research Data Commons (ARDC)

Release year

Language

Content Metadata

Subject area

Genre

Webinar/Tutorial

Abstract

Gerhard Weis from Eco DEVL. (1.6.18) In June & July's TechTalk events, representatives from each of the DEVLs will introduce their projects from a developers' perspective: the problem the project is trying to solve, the tech stacks deployed and to be developed, and their approaches to software development and community engagement while developing tools and applications. The demonstrations/discussions of 8 projects are scheduled as follows: DEVL #1 June (1st): Astronomy (Robert Shen), Marine Science (Roger Proctor), Terrestrial Ecology (Gerhard Weis), and Climate (Clare Richards).

Tech Talk series, part 12 of 19



Transcript: English (auto-generated)

00:01

Hello, good afternoon, everyone. As Andrew already introduced, this is about the EcoCloud, or EcoScience Research Data Cloud. It's quite a big project, not just technical development. We work with the Department of the Environment and

00:23

Energy, who are developing the essential environmental measures, which we want to make available. Then there are a whole lot of work streams around species trait data, making those available, and getting access to the daily spatial weather grids and other existing data streams.

00:42

Then there's a whole work bundle around standardized modeling and analysis capability, which basically came out of BCCVL, to have a robust set of well-defined and reliable models. The most technical side for us is the cloud platform, which gives you access to compute and

01:06

resources and makes it easy to access your data: log in and start working, without caring about where the data comes from or how everything else is set up, as far as possible.

01:23

Another big part is about training and skills development. This will be part of the EcoEd and EcoPathways programs. That stream will develop a whole lot of training material and course material, run workshops and similar things, and also engage industry. And the last stream is about trusted data.

01:44

I'm not too familiar with that bit, as I'm more on the technical side, but it's about knowledge of reliable data: where does it come from, who is using it, and which data can I use for proper decision making.

02:06

A bit of history. We released BCCVL about three to four years ago, and it's been very well received. It gave users easy access to data and modeling and reduced the technical barrier to accessing data and using it.

02:23

What we have found, though, is that there's an increased demand for customizing the work and having an interactive workflow to work with models and data. Another big use case is that users want to use sensitive data, which can't be released publicly anywhere.

02:44

Many more users are coming on board, and they want to use their own data, which they have compiled or retrieved from somewhere and which is not publicly accessible. Another big thing is that we'll try to increase the interoperability with other systems.

03:03

We took all of these together with all the lessons we have learned through BCCVL, and we're trying to put them into the EcoCloud platform. The biggest challenge with the EcoSciences is that it's a very diverse community

03:21

and it's a very diverse set of data that we're going to integrate or that our users want to use. It's about species data, climate data, marine data, all described in different ways, which often doesn't make sense to an early career researcher who comes from an

03:40

ecology background. Another big thing is portability. If you develop something and share your code, it should be able to run anywhere, not just within the EcoCloud platform. As I said, we want to provide the ability to work with sensitive data, so security is a big thing here.

04:05

Data discovery, data access and data usage are still a very big challenge. While the work on data description and data portals has moved forward very well over the last couple of years, actually accessing the data and using it is still a bit of a challenge.

04:28

Then we want to provide some way to work with large data sets, which I can't possibly download or which require some HPC environment, and some way to get easy access and integrate projects.

04:46

And as our users often use their own private data, we need to integrate cloud storage such as Nextcloud, AARNet CloudStor, Dropbox and similar things.

05:04

The users we are targeting are researchers, academics, undergraduates and postgraduates, and definitely high-end users who know how to code and use HPC environments but still want to share their work. Big data users

05:21

will be included as well. And of course, to help with collaboration, there should be opportunities to collaborate with proper software engineers as well, to optimize your code, help with coding or even produce new third-party products on top of EcoCloud. The opportunities there are of course research, science,

05:52

the ability to publish the code and models you're developing along with the data you've used, various workshops, Software Carpentry, Data Carpentry, and providing resources to run curricula

06:06

across the research community. On the technical side, there are three components being developed. First, the EcoCloud Drive: that's your

06:22

easy, cloud-managed online storage for code, scripts and small data sets that will always be there. So even if everything crashes, next time you come in your last piece of work is still available. Then the EcoCloud Explorer, which we hope will increase

06:41

or help with data discovery and access, will mostly be powered by the CSIRO Knowledge Network, and our work there will be to help with using the actual data, for instance by providing code snippets. And of course there will be the compute side of it.
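
As a rough idea of the kind of code snippet mentioned here, the following Python sketch counts records per species in a small occurrence table. The sample data, the column names and the function name are all hypothetical and only stand in for whatever a real data set would provide; this is not an actual EcoCloud snippet.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded occurrence file.
# The column names are assumptions, not a real EcoCloud schema.
SAMPLE_CSV = """species,latitude,longitude,year
Litoria fallax,-27.47,153.02,2015
Litoria fallax,-28.00,153.40,2016
Petaurus breviceps,-33.86,151.21,2016
"""


def count_records_per_species(csv_text):
    """Return a Counter mapping species name to number of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["species"] for row in reader)


counts = count_records_per_species(SAMPLE_CSV)
for species, n in counts.most_common():
    print(f"{species}: {n}")
```

A snippet like this is meant to be copied into a notebook and pointed at the user's own file instead of the inline sample.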

07:01

At the moment we have opted for providing Jupyter notebooks, with R, Python and RStudio already available, and also access to virtual desktop tools, at the moment provided by CoESRA, which is a TERN project. Three minutes, Gerhard. Sorry, yeah, three.

07:22

Our big-picture architecture is this: a user comes in, and there will be a dashboard from which you can access everything. There's a separate, centralized user management. All services are independent and don't know about the users themselves, but they use this user management service to identify users and grant access to various things.

07:44

From there on you can publish scripts, clone scripts from other users, find scripts, and browse through examples of how to use various data stores or certain processing. You can explore data and try to get the data in in an easy way.

08:03

Starting a new project essentially gives you your compute environment. Pick whatever you want, R, Python or virtual desktop, and use the tools that are available there. These environments are usually also customizable, so you can install your own software as well.

08:25

As for our tech stack, it's a microservice architecture, fully OpenID Connect and OAuth2 enabled, so all APIs that are offered will also be accessible to third-party systems.
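
To illustrate what OAuth2 access from a third-party system can involve, here is a minimal Python sketch that builds an authorization request URL for the OAuth2 authorization code flow with PKCE (RFC 7636). The talk does not say which flow EcoCloud uses, so the flow itself is an assumption, and the endpoint, client ID and redirect URI below are hypothetical placeholders.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint and client values, for illustration only.
AUTHORIZE_URL = "https://auth.example.org/authorize"
CLIENT_ID = "my-third-party-app"
REDIRECT_URI = "https://app.example.org/callback"


def pkce_challenge(verifier):
    """Derive the S256 code challenge for a PKCE code verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url encoding without the trailing '=' padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorization_url(verifier, state):
    """Build the URL the user's browser is sent to in order to log in."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "openid profile",
        "state": state,
        "code_challenge": pkce_challenge(verifier),
        "code_challenge_method": "S256",
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


verifier = secrets.token_urlsafe(32)  # 43 characters, within the allowed range
state = secrets.token_urlsafe(16)     # protects the callback against CSRF
url = build_authorization_url(verifier, state)
query = parse_qs(urlparse(url).query)
print(url)
print(query["scope"])
```

After the user logs in, the third-party system would exchange the returned code, together with the original verifier, for tokens at the provider's token endpoint and then call the APIs with the access token.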

08:42

Our development itself is mostly in Python and JavaScript. It's been designed to be horizontally scalable because that's just easier. Vertical scalability is still there, but there are sometimes challenges with resource allocation. And the whole system also allows going multi-node, multi-cloud

09:06

with the work. The way we have built it up, at the bottom there's of course OpenStack, provided by the Nectar Cloud, with all the services we are using. The grey boxes, Sahara, Nokia and Trove, we are currently not using, but we are certainly looking into how we can make them useful to our users.

09:26

On top of that, the whole OpenStack infrastructure is managed by Kubernetes, which takes care of orchestration, security and all sorts of monitoring, and, via Kubernetes, we can scale everything up.
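
One common way to scale a service horizontally with Kubernetes is a HorizontalPodAutoscaler. The Python sketch below only builds such a manifest as JSON, which `kubectl apply -f` accepts. The deployment name and the thresholds are hypothetical, the `autoscaling/v2` API version is the one in current Kubernetes releases and not necessarily what was available at the time of the talk, and nothing here is confirmed as EcoCloud's actual configuration.

```python
import json


def autoscaler_manifest(deployment, min_replicas, max_replicas, cpu_percent):
    """Build a HorizontalPodAutoscaler manifest for a named Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            # The workload whose replica count is adjusted.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Add replicas when average CPU utilization exceeds the target.
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


# "notebook-proxy" is a made-up deployment name used for illustration.
manifest = autoscaler_manifest("notebook-proxy", 2, 10, 70)
print(json.dumps(manifest, indent=2))
```

Scaling out by adding replicas in this way is the horizontal approach; it avoids having to resize individual nodes.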

09:44

Kubernetes itself orchestrates everything that runs within EcoCloud. So there are the Jupyter notebooks, there will be web processing services, possibly a web UI, the various tools we develop that provide APIs,

10:02

and the CoESRA virtual desktop, which sits a bit outside of it but will seamlessly integrate with EcoCloud. And all the work we are deploying here, and that's running within EcoCloud, also allows you to access external services. At the moment we are

10:23

providing tight integration with the Knowledge Network, various data services like data.gov.au and other things, CloudStor as mentioned before, your Google Drive, your Dropbox, and external web processing services, which would be of high interest for us just to offload data transfer and compute services.

10:46

And that's pretty much it for me. Thank you very much. Okay. Thank you.