Open Science, Knowledge Sharing and Reproducibility as Drivers for the Adoption of FOSS4G in Environmental Research
Formal Metadata
Title: Open Science, Knowledge Sharing and Reproducibility as Drivers for the Adoption of FOSS4G in Environmental Research
Series Title: FOSS4G Bucharest 2019
Number of Parts: 295
License: CC Attribution 3.0 Germany: You may use, modify, copy, distribute and make the work or its content publicly available for any legal purpose, in unchanged or modified form, provided you credit the author/rights holder in the manner specified.
Identifiers: 10.5446/43572 (DOI)
Language: English
FOSS4G Bucharest 2019, Talk 12 of 295
Transcript: English (automatically generated)
00:07
So, welcome. I will try to, let's say, bring you closer to some drivers that help the adoption of FOSS4G in environmental research. So, again, the topic is environmental research.
00:23
And my name is Ionuț Iosifescu, and I'm the technical coordinator of EnviDat. You'll see what EnviDat is in a minute. And I just want to mention that this is a team effort: a lot of people working at the Swiss Federal Institute for Forest, Snow and Landscape Research WSL in Switzerland have contributed to this work, and they are still contributing.
00:46
So this work is ongoing, and we hope to continue it further on. As I mentioned, we are talking about environmental research, so that's why I will start with a little bit of background.
01:01
And just say, very briefly, that we are doing research for people and the environment, that is, the interaction between people and the environment. We have five big research areas: forest, landscape, biodiversity, natural hazards, and snow and ice.
01:22
Therefore, we have a lot of research about climate change, about how a tree will adapt in the future to new climatic conditions. We have a lot of snow research, and of course drought and all the climatic changes that are coming upon us, even in Switzerland.
01:40
In this context, we have been doing research for more than 130 years. That means we have a lot of data. And this data, covering more than 100 years, we want to make public. We actually consider it to be a treasure, an environmental data treasure.
02:04
These are some of the longest-running long-term measurement series in Switzerland. And our mission with EnviDat is to make this data accessible,
02:22
to provide access to this data to international researchers, to you, actually. Because WSL acknowledges the responsibility to make this research data accessible and is committed to ensuring the long-term availability of these data sets.
02:45
In this context, we are working to make data plus methods plus code available. And you see the star: within the limits of non-sensitive data.
03:01
Because, for example, some of the data sets will never be open, because they are sensitive data protected by law, like the locations of certain endangered species of flowers, insects or mushrooms. This cannot just be published on the internet. Or maybe sample locations for different inventories.
03:20
So that will always stay closed. But in general, we want to provide access to this data because we are supporting the ongoing cultural revolution in research towards openness, shared data and knowledge sharing: international, global knowledge sharing and opportunities. And this is very important for distant collaboration.
03:42
All this sounds nice, but why are we actually doing this? Yes, we have a big heart, you might say. But also because we need to publish. And because this is the academic track, I think each of you needs to publish. That's why I said opportunities for distant collaboration.
04:00
Because if you share data, new opportunities arise to write papers. And also because recently, in mid-2018, this FAIR data commitment entered into effect. Many publishers signed it; you have a big list here. I will mention AGU, all the AGU journals, Ubiquity Press,
04:24
Wiley, of course, Science, Elsevier, Nature, Taylor and Francis, and so on. You have the complete list on the website. And they are actually requesting all researchers to specify or
04:40
to upload the data used for that research to an appropriate repository, ensuring findable, accessible, interoperable and reusable (FAIR) data sets. This is a big change. And luckily, at WSL in Switzerland, we have our own portal. Yes, we have our own portal, responsive, of course.
05:02
It's a repository for environmental data. And because we have this in-house, we can tweak it, develop it further and provide new tools. This is what I'm going to talk about next. But the most important point, the
05:20
main service that we provide to our researchers with this portal, is data publication with appropriate metadata and digital object identifiers. Similar to publications, because we consider data, too, to be a major part of research, yeah?
05:42
So we need to bring this on par with publications. As you see, you have the title metadata, the authors, a contact point, the license, a description of the data set, the data and resources (the actual files being reused), and a citation, which is important in science.
06:03
And of course a DOI; you can see the DOI in several places, with a prefix and a suffix, in a similar way to a publication. So you can consider us to be like a journal for data. But we are not a data journal; you just need to really document the curated metadata here.
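To make the parallel with publications concrete: a DOI minted for a data set can be resolved programmatically just like one for a paper. Here is a minimal sketch in Python using the standard DOI content-negotiation service at doi.org; the DOI below is the one for this talk, and any DataCite-registered data set DOI would work the same way.

```python
import requests

# Resolve a DOI to machine-readable citation metadata via content negotiation.
# 10.5446/43572 is the DOI of this talk; a data set DOI works identically.
doi = "10.5446/43572"
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
resp.raise_for_status()
meta = resp.json()
print(meta.get("title"))      # title registered for this DOI
print(meta.get("publisher"))  # registering publisher
```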
06:22
In this context, data is important, but so are technical publications. Remember, I said: consider the data, then consider the methodology and the code. And part of the methodology, of course, is also the technical publication.
06:42
This is one example I've chosen: documentation on how to run a high-resolution model. And this is important. It is not a journal paper in that sense, but technical documentation. It can also be published here, because it helps understanding, together with the data set. And of course, in collaboration with our library,
07:04
we allow linking items from our library, like published papers and published books. So you can have everything in one place; the idea is to document the research as completely as possible. And this brings me to another point, which is authorship.
07:24
Authorship is important for us. That's why we allow our researchers to define their roles in producing those data sets. Because remember, the authorship of a data set is different from that of the paper, right? The authorship of the paper
07:42
is independent of the authorship of the data set being used. So it's important to specify the roles of each author: whether they contributed to the collection or curation of the data set, went into the field and collected new data, or validated the data.
08:02
They published it, of course, in EnviDat. Maybe they even wrote software; maybe they are not domain scientists but computer scientists, writing software that is used for processing this data. This can also be recognized. And of course the entire supervision: who had the idea, who coordinated, and so on and so forth.
08:22
If you want more details, you can go and read the definitions here, in case you also want to use them. I think it's a very important way to specify authorship for data, which is different from the authorship of the paper. And the important point of this authorship is that it creates more involvement with the portal.
08:45
Our researchers take care with what they are uploading there. So we automatically get curated data, because their names are at stake. And because we have curated data, we are also contributing and disseminating it all over the world.
09:00
Like on GEOSS, geoportal.org, I hope you know it, supported by ESA. It's available there because WSL is a GEOSS Data-CORE provider. In North America, we have the Earthdata portal from NASA, and the Global Change Master Directory.
09:21
This is updated every 15 days, so twice a month. And if we think about the European science cloud, maybe some of you are aware of it: we are also being harvested by EUDAT and B2FIND, which will probably be part of EOSC-hub and the European Open Science Cloud.
09:44
We are trying to make this data available so people can use it and, of course, publish papers together with our researchers. That's the main idea. Now let's go to the open science part: open science, knowledge sharing, reproducibility.
10:01
They are very much interrelated. And if you are waiting to get a talk on open science, sorry, I can't do that; I'm not an expert. Science is complex anyway, and open science is even more so. This is one of the taxonomies; you always have the source, so
10:21
you'll have the presentation and you can always check the sources. But there are many things here to consider: open access, open data, reproducibility (where even the definition is not clear; there are different definitions), evaluation, policies, tools, repositories, and so on and so forth.
10:41
If you'd like to describe open science fully, I think you would probably need a whole conference only about open science. So I'm just telling you it's a complex matter, and I'll try to really reduce the complexity and simplify it so we can talk about it. For this reason, I have chosen this definition.
11:01
It's a very simple definition from Open Science ASAP, which says: open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process. This is a very good point.
11:20
Because if you discover something and share it early, then there are more opportunities for collaboration. Related to this is the crisis of irreproducibility, or reproducibility crisis.
11:41
Because we are in the academic track, I assume all of you are scientists. You found a new method; you either tried it yourself or gave it to your student: implement this and try it on our data. And they couldn't do it, because not all the information is available. Even if you have the paper and the method described, you don't have the data.
12:05
Maybe there are some things hidden that are not explicitly described. And the code; I will come back to the code, because the code is the most important part, and I will tell you why. So you need to reproduce, and this need does not come up immediately, right?
12:21
This kind of reproducibility is needed after five years, or maybe even ten. You published, you forgot, you changed your job. You finished your PhD, you went to work elsewhere, and suddenly your professor knocks on your door: do you still have the code? I need to reproduce this. And this is the problem.
12:40
We are also trying to work on this and to support it for the future. Again, that's the reason: science output is measured in publications. So here I have tried to simplify open science into a few bigger areas without getting into details.
13:04
Open educational resources like trainings are handled by universities and by massive open online courses. The entire open access and publication area is handled by open access journals and libraries, so I won't get into detail there either. What I'm going to focus on is open source and open data.
13:24
Because this is something we can support our researchers with, through the EnviDat portal, which I mentioned before, and our WSL IT. Again, in support of open science. And to start, I will just mention some useful open science tools.
13:42
Because these actually help implement the vision of open science. There are, of course, many; I'm not trying to be complete. It's just our selection and our support for our researchers. First, versioning. Code is important, right? It's part of the methodology.
14:02
So you should be versioning it, correct, so that you have a version matching the publication: you know which version of the code was available at the time of publication.
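As a minimal illustration of what to pin down alongside a publication, you can at least record the exact library versions behind a published result. A hedged sketch in Python; the package names are just examples:

```python
from importlib import metadata

# Record the exact versions of the geospatial stack used for an analysis,
# so the software state at publication time can be reconstructed later.
for pkg in ("geopandas", "GDAL", "pyproj"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg} is not installed")
```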
14:20
For example, you can also encapsulate your software in a container, to be sure that it still runs, hopefully, after some praying of course, maybe in ten years; you never know. But if you containerize it, you have a good chance. A container, for those of you who don't know, is like a virtual machine: you have everything in there, it's running, it's tested. You run your code there and get your results in this virtual machine, this container.
14:41
And then you can keep it and save it together with your publication and source code and everything, and you have a good chance of reproducing the same research years later. Most important is how you document. How to document your methods in a paper everyone knows; that is being done.
15:00
But how you document your code is also important. That's why we are trying to get our researchers to use Jupyter notebooks, and they have started to use them. Because notebooks allow, first of all, better documentation of the code, better than just commenting it. They allow running it interactively, which is important.
15:22
And also to put data into code-driven narratives, because of this interactive cell-by-cell execution. So it's a much better way of documenting the code and ensuring common understanding and knowledge sharing of the methodology, or at least the part of the methodology covered by the code.
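To illustrate what such a code-driven narrative looks like, here is a hypothetical pair of notebook cells in Python; the file name and columns are invented for the example. A markdown cell would carry the explanation, and small code cells load and inspect one step of the analysis at a time:

```python
import pandas as pd

# Notebook cell: load a long-term measurement series and inspect it.
# "snow_depth.csv" and its columns are hypothetical placeholders.
snow = pd.read_csv("snow_depth.csv", parse_dates=["date"])
snow.describe()  # run cell by cell, inspecting intermediate output

# Next cell: a quick plot keeps the narrative tied to the data.
snow.plot(x="date", y="depth_cm", title="Snow depth, long-term series")
```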
15:41
To give you more examples, we have our open science tools offering listed here. Very little at the moment, but we are working on it. First of all, an internal GitLab, so Git versioning; I think you have heard about Git. Not all researchers want to publish immediately; some researchers are shy about sharing their code too early.
16:02
And until this year, for example, GitHub didn't have a private mode for repositories on free accounts; they were always public. So our WSL IT had this offering to save and version their code on premises without publishing it too early.
16:21
Then, to containerize complex software, again to increase reproducibility, putting it in a virtual machine of some form, a container. To install Jupyter and JupyterHub, the multi-user way of using Jupyter notebooks, which I mentioned on the previous slide. And of course EnviDat is a big part of the process, because in the end
16:42
we encourage researchers to upload all the data and software to our repository, to EnviDat. This is a service that WSL provides for our researchers, a portal. And as a bonus, our researchers get user-friendly access to a complex high-performance computer, a big machine with a lot of nodes and cores,
17:01
through these Jupyter notebooks. And I'm showing you all this, our GitLab installation, because of this word, collaborate: it also allows researchers to collaborate. This is important, another point in favour of using versioning and GitLab or GitHub.
17:21
This is one simple slide to illustrate how this works with Jupyter notebooks on WSL's high-performance computer, which is called Hyperion. It's very easy. Usually, writing code for an HPC is complex, and for certain cases you still have to do it the traditional way.
17:42
So this is for smaller kinds of software. But a normal PhD student starting out who wants to use the HPC can go to this address (don't try it; it is intranet-only, so you won't get anything). They can log in with their username and password and spawn their own Jupyter notebook instance.
18:03
One node for small software, up to four nodes at the moment, each with 10 cores. So up to 40 cores and a total of 128 gigabytes of RAM, for a maximum of eight hours, they can spawn that machine. And they start coding interactively on the HPC,
18:22
trying the code interactively in Python, Octave and R at the moment; but more kernels can be installed. In this way, they have easy access to the complex HPC. These are more technical details, but I think I will spare you. This is how it works: you are actually communicating with a JupyterHub server.
18:44
If you want to install it, you need the JupyterHub server, and you need to install it on the master node of the HPC. Then you can use the compute nodes: with the batch jobs that you usually use, submitting jobs with sbatch, you can reserve the appropriate number of nodes.
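The wiring between JupyterHub and the batch scheduler is typically done through a spawner. A minimal sketch of a jupyterhub_config.py, assuming the batchspawner package and a SLURM scheduler; the resource values are illustrative and merely mirror the limits mentioned in the talk:

```python
# jupyterhub_config.py -- minimal sketch, assuming batchspawner + SLURM.
# The `c` object is injected by JupyterHub's config loader.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"

c.SlurmSpawner.req_nprocs = "10"        # cores per node (illustrative)
c.SlurmSpawner.req_memory = "32G"       # memory per spawned job (illustrative)
c.SlurmSpawner.req_runtime = "8:00:00"  # maximum of eight hours
```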
19:03
Usually one node, or several for software that is parallelized, so you can let it run on several nodes. And if you want to containerize, you can also do that; but again, not for Windows software. Linux software you can containerize with Singularity.
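As a hedged sketch of that workflow: once an image has been built (for example with `singularity build`), the analysis can be launched inside it; the image and script names here are hypothetical placeholders.

```python
import subprocess

# Run the analysis inside a Singularity container, so the exact software
# environment travels with the publication. Names are placeholders.
subprocess.run(
    ["singularity", "exec", "analysis.sif", "python", "run_analysis.py"],
    check=True,  # raise if the containerized run fails
)
```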
19:22
And you can also test whether it runs on the HPC, just to make sure that what you did on your local computer can also be run on an HPC, or later, in whatever computing system you'll have in five years. This is also support that WSL IT provides for researchers, containerizing this software.
19:41
You have the information in the Singularity paper, if you are interested. By the way, Singularity is a form of container like Docker, but better suited for research. You can read more in the paper in the source there if you are interested in Singularity. Now we come back to the core topic.
20:00
It was a lot of information, I know. But now I'm coming to the point, which is FOSS4G, and then I will give you an example. So where does FOSS4G come in? We've discussed Jupyter notebooks, writing code, putting data together with the code and so on. How does this work?
20:21
In the end, the code that people are writing now is open, right? And the Jupyter notebook too. But is this enough? Because one of the first examples that happened, with this GIS Python script here, is that it was done on Windows using some proprietary software.
20:44
By the way, nothing against ArcGIS here; it's great software. The problem is: how do you ensure reproducibility in five years, when the license server is gone and you don't have access to the license? As a solution for helping reproducibility,
21:02
you need software without a license server, and preferably also with the source code openly available. Just in case, because even with open source software, binaries might not be compatible in the future, especially old versions of the binaries. That's why it's a good idea to also have the code.
21:22
So this is how it all comes into place: if you want to run the same calculation in the future, to get the same or at least similar, comparable results, then it's a good bet to use open source software. In our case, researchers have been using, of course,
21:41
the PROJ4 library, GDAL and OGR libraries, GeoPandas with Python; they also translated some things from Matlab into Octave, for example; and of course, for visualization and other tasks, QGIS. It's very popular.
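A tiny hedged example of this stack in action, with an invented file name: GeoPandas reads any OGR-supported format, and reprojection goes through PROJ under the hood.

```python
import geopandas as gpd

# Read a vector data set (any OGR-supported format; the file name is invented).
plots = gpd.read_file("forest_plots.gpkg")

# Reproject from Swiss coordinates (EPSG:2056, LV95) to WGS84 via PROJ.
plots_wgs84 = plots.to_crs(epsg=4326)
print(plots_wgs84.head())
```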
22:01
Now, in the end, if you are considering this, it's open source, and you can also publish this software as a plugin. And if you want to keep it running, you can even install it on a compatible version of QGIS and let it run in a Singularity container, for example, in the future. And of course, you give it a DOI and so on.
22:24
But now let me give you an example of how this goes. The main point: publish data, code and documentation together. Together with my colleagues, we have prepared an example that you can try. Just go search for it; you have the link in the paper and in the presentation.
22:43
You can try it yourself; it is online. I'm not going into the details of this example; you can read more about it and what it's for. It's a real example, about pre-processing for forest accessibility and wood harvesting.
23:00
It's a real-world example. In this case you have the data, the documentation and the code together, as Jupyter notebooks, using Python, which is, let's say, a standard programming language for the geospatial domain, and using, as I mentioned, GeoPandas, GDAL and so on from the OSGeo stack.
23:23
Standard libraries. You can run this interactively, cell by cell; you see the cells, you see the descriptions, and so on. And the most important point is that the data is also there, meaning you don't need to manually fetch the data
23:41
into a folder or whatever; it is fetched by the script itself. The data is already in EnviDat, openly available. What you need to specify is your temp folder where it has to be unpacked. This is a very small change depending on the operating system; it works on Windows and everything else, Linux, macOS, and you can run it.
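The fetching pattern the notebooks use could look roughly like this; a hedged sketch, with a placeholder URL, since the real resource link lives in the published example:

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Placeholder URL; the real resource link is in the published example.
DATA_URL = "https://www.envidat.ch/dataset/example/resource/data.zip"

tmp_dir = Path(tempfile.mkdtemp())   # the only machine-specific setting
archive = tmp_dir / "data.zip"
urllib.request.urlretrieve(DATA_URL, archive)

with zipfile.ZipFile(archive) as zf:  # unpack into the chosen temp folder
    zf.extractall(tmp_dir)
print("data unpacked to", tmp_dir)
```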
24:05
Coming to conclusions now. We are thinking about these kinds of best practices, which are actually not yet finalized; it's just the beginning, I would say. But the major one is that we ask our WSL researchers, and I should say this is only for the WSL researchers
24:22
who come into contact with us, with EnviDat, for example. We ask them to think about reproducibility and appropriate knowledge sharing from the moment they start working at WSL on a research project, because otherwise things get forgotten, which is normal;
24:41
to version and document their code from the beginning. It's also easier for them to keep track of their data sources and all their changes. A lot of people mix existing data with newly collected data, and this information also gets lost, information which is actually important if we consider reproducibility. And to publish data, not just the data the paper requires,
25:03
but also the additional methods, like how they install or use something, and the code, together in, of course, EnviDat. We tell them that this doesn't cover all their needs; there are many more things they should keep in mind. There is no such thing as one size fits all,
25:22
but we let them use the available tools, to improve this part with open methodology, open source and open data, with all the logos I've mentioned before. And, coming to the crux of it, using where possible, as much as possible, FOSS4G.
25:44
And you have it here. So thank you so much, and I'm open for questions.
26:09
What was your strategy for getting researchers to start using Jupyter notebooks? Did you have, like, people who could help? I would like to say that we have a strategy, but we don't.
26:22
We are doing our best. A lot of people voluntarily give trainings; for example a colleague of mine, Matias Heni, a great guy, gives trainings on Git and Jupyter notebooks for new PhD students and new people coming to WSL,
26:41
and there he also describes or mentions all the tools that are offered. In the end, researchers need to have freedom, yes? So we are offering services to researchers,
27:00
but we don't even attempt to force them to use something. Forcing researchers is always a bad idea, so we provide services, and in the end, if they think it makes sense, they will use them; if not, they will not, right? But it's not that simple, because this sounds nice on paper,
27:20
but there are problems, because sometimes a new PhD student comes in and actually inherits an existing model, which was maybe written in Fortran or whatever; there is always an inheritance problem, right? He is new, he would like to work with Jupyter notebooks and the like, but he has a model where
27:42
maybe the code is available, but it's so complex it's like a black box, right? So I'm saying it sounds nice, and it's very appropriate for new work, and we hope that with time people will start using it more and more, but I wouldn't say we have a strategy. We are doing our best to provide
28:00
open science support tools to researchers. So thank you, it's very interesting, and the only concern that I have is that we saw what happened with, for instance, metadata,
28:20
which is much easier than this. Convincing people to enter metadata, which is definitely easier than doing all that, was very, very difficult. So let us see if you, or we, are able to convince people to document
28:41
and to do this. It is important in my opinion, really important. And I see also something more, probably I'm completely crazy, but putting together the documentation, meaning, in narrative form,
29:02
what you will do, the code and the data; if we are able to collect enough, probably in the end we will also have some artificial intelligence (we were speaking before about artificial intelligence)
29:21
writing code for us, simply from a narrative-form description. This would be nice, this would actually be nice. At the moment we still need researchers, so artificial intelligence cannot do research for us, luckily for us, I would say. But yes, leaving aside for a bit the faraway thoughts
29:43
like artificial intelligence (I'm pretty sure this will come; I think big software companies like Google and Microsoft are working in this direction), just imagine also having a library of code. You have a library of publications at the moment,
30:01
right, that is already there. Now a library or collection of data is coming; slowly, slowly, more researchers are publishing their data. And if this continues, we will have a library of code, and you have a lot of code libraries already, SourceForge, GitHub and so on.
30:20
So yes, I think it's possible to do it with artificial intelligence, but not in the near future, at least I don't think so. But for new researchers who have no idea how to, for example, statistically process their data properly, taking a look at existing code and learning from it, and this is really scientific code:
30:40
it's not uncurated like SourceForge or GitHub, where you have everything; this is linked with publications and methodologies, you know what it's for, the code is open, it's narratively documented and explained. This is a huge opportunity for new PhD students to learn, right,
31:00
and it will save their time immensely. Instead of "oh, I'm doing a tutorial on the internet; what do I do now; I'll wait to ask my supervisor; I get another piece of code", like it is now, they can just take a look and learn from the example. I think this will speed up research. It's actually a valid point: how do we do this at the moment, with the tools that we have right now,
31:23
to bring this about, to make research faster, because in the environmental domain we need it. Okay, thank you. I think time's up, so I will stop the questions.