
Open Science and Reproducibility


Formal Metadata

Title
Open Science and Reproducibility
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Open Science comprises a range of scientific work practices that make research transparent and comprehensible. It is thus closely related to reproducibility as a cornerstone for the trustworthiness of research. The publication of and access to research data and software is indispensable in this context in order to be able to understand the results. The FAIR principles developed for research data have now also been transferred and adapted to research software. Many existing local and topic-specific initiatives want to promote cultural change in science towards open science and thus increase the quality of scientific work and the robustness of results. In the lecture, the role of the German Reproducibility Network GRN as a platform for networking such groups will be presented, so that different scientific communities can learn from each other and thus go step by step on the way to more reproducibility.
Transcript: English (auto-generated)
And we're very happy to have Bernadette Fritz with us here today. She is a climate researcher and physicist. And currently she is responsible for user support in the area of high performance computing for Earth system modeling and research data at the data center of the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, in Bremerhaven.
She's also a founding member of de-RSE and of the German Reproducibility Network, which Antonia just mentioned. And she's a member of the Helmholtz Working Group Open Science. So a lot of great activities towards openness and reproducibility. Bernadette will now show us in more detail how closely open science and
reproducible methods are linked to the cultural change of scientific communication and practices. We're very happy to have you here. Welcome, Bernadette, the stage is yours. Thank you, Alia, for this nice introduction. Good morning from my side to everyone. It's amazing to see that so many people are attending this meeting.
And I think this reflects that reproducibility is an important topic in our work as researchers. It's a pleasure for me to start this meeting with some thoughts about reproducibility and open science.
During the Christmas vacation, I, like I think millions of other people around the world, was playing with the new ChatGPT. I had some fun and also long, serious discussions with friends about the new possibilities and the limitations of the technology.
At that time, I also started thinking about today's talk. And so at some point I also had a chat about open science and reproducibility. Don't worry, I won't be giving a ChatGPT-generated talk, but I will show you only one detail of this.
Even if one should keep in mind issues like the openness and reproducibility of ChatGPT itself and the correctness of its answers, I just want to take this one sentence as a starting point for my presentation.
By making research more open and accessible, open science helps to increase reproducibility and build trust in scientific findings. I think the sentence roughly expresses what is also represented, for example, in The Turing Way with this figure: make your data, the tools, and the code used open in order to make the resulting findings comprehensible and reproducible. Although in this picture reproducibility is shown as a high mountain,
we should not be discouraged. Personally, I don't see reproducibility as a single point, but rather as a spectrum where even small steps can make a difference. But what does reproducible mean?
Anyone who has ever assembled furniture from a well-known Swedish furniture maker knows the problem that even though you have all components and documentation, you may not get the desired result, or maybe only after the second or third attempt.
So if even such a simple reproducibility experiment can cause problems, it is even more so with complex workflows in the lab or on the computer. So I will show you a similar slide as Antonia did.
I think we have to start with the terminology first. Reproducibility and related terms like replicability and repeatability are understood quite differently in different contexts. Numerous articles have examined how well these definitions work together.
Here I picked out a few examples. A particularly straightforward definition is provided by Schwab et al. In this view, reproducibility refers to the ability to duplicate an article's figures using the data and software given, so I can trust the paper's charts and other content. In a broader sense, Gundersen defines reproducibility as the ability of a scientist to reach the same conclusions from the data as the authors did. Thus it goes beyond just reproducing pictures and instead focuses more on
the knowledge coming from the data. I also want to provide the definition of the Association for Computing Machinery, ACM, here because I'm coming from the computational science sector.
The concept introduces a graded hierarchy from repeatability to reproducibility and then to replicability. Interestingly, in the previous definition of the ACM, provided also on the slide from Antonia, the meanings of the terms reproducibility and replicability were interchanged. This shows clearly that you always have to look closely at what exactly is meant by the respective terms. When discussing reproducibility or replicability, you have to clarify a number of questions.
Which criteria are applied? For example, in computer simulations, should the result be bit-accurate, or which deviations are tolerable? Which standards are applied? What data is required? How exactly must the protocol be followed?
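To make the question of bit-accurate results versus tolerable deviations concrete, here is a minimal Python sketch; the numbers are invented for illustration, and which of the two criteria is appropriate has to be declared before results are compared:

```python
import numpy as np

# Two hypothetical result arrays, e.g. temperatures from two runs of the same simulation.
run_a = np.array([14.2031, 15.1177, 16.0042])
run_b = run_a + 1e-9  # a tiny numerical deviation, e.g. from a different compiler

# Strict criterion: results must be bit-identical.
bit_identical = np.array_equal(run_a, run_b)              # False

# Relaxed criterion: deviations below a declared tolerance are accepted.
within_tolerance = np.allclose(run_a, run_b, atol=1e-6)   # True

print(f"bit identical: {bit_identical}, within tolerance: {within_tolerance}")
```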
Therefore, right from the start of the work, there must be an awareness of what could have an impact on the result and what needs to be logged in the protocol accordingly. An illustration from a comment in Nature by Nosek and Errington from 2020 sums it up
in a humorous way. There are also slightly different versions of the definition of open science. Probably the most comprehensive is provided by UNESCO and includes a number of other factors in addition to the classic components open access, open
source. They define open science as an inclusive construct that combines various movements and practices. And open science aims to make scientific knowledge openly available, accessible, and reusable for everyone, but also tries to increase scientific
collaborations and sharing of information for the benefit of science and society. And I think sharing information is one of the crucial points with respect to reproducibility.
So let's start with data, because we are in the Love Data Week. If I want to understand the results from a publication, I need the underlying data. For some time, more and more journals have been asking for the relevant data
when submitting a paper. In this context, data publication, which makes a data set a citable unit, is becoming established. This process then identifies the data set and ensures that I can access it even after 5, 7, 10 years or so.
When we talk about data today, the keyword FAIR immediately comes up: findable, accessible, interoperable, and reusable. However, keep in mind that FAIR does not automatically mean open. The exact wording in the FAIR definition is 'accessible under well-defined conditions'. There may be legitimate reasons to shield data from public access. These include, for instance, personal privacy and competitiveness. Although often associated with open science, the FAIR principles deliberately do not explicitly address issues related to the openness of data.
Depending on the community, different data repositories have been established that meet the specific requirements of different data types and formats. You can easily find a repository matching your specific requirements.
For example, the AWI has been hosting the PANGAEA repository together with MARUM Bremen for many years. It aims at archiving, publishing, and distributing georeferenced data from Earth system research.
Each data set can be identified, shared, published, and cited using a digital object identifier, DOI. Data sets can also be linked to articles. Such data collections exist in many institutions and scientific disciplines in order to permanently secure their data and to link them for interdisciplinary research. Various consortia are currently working in the NFDI, the Nationale Forschungsdateninfrastruktur, to set up a sustainable research data infrastructure in Germany.
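As a small illustration of what such a DOI enables in practice, the following sketch asks the DOI resolver for machine-readable citation metadata via content negotiation; the DOI shown is only a placeholder, not a real data set:

```python
import requests

# Placeholder DOI; replace it with the DOI of an actual data set.
doi = "10.1594/PANGAEA.XXXXXX"

# Content negotiation on doi.org returns citation metadata instead of the landing page.
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()
metadata = response.json()

# Title and publisher can then be reused, e.g. in a reference manager.
print(metadata.get("title"))
print(metadata.get("publisher"))
```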
If data is so valuable as to be called the new oil, then in this picture, the software is the refinery. Software plays an important role in almost all areas of research. And the quality and robustness of research results often crucially depend on the quality of the research software used, which is expressed in the slogan 'better software, better research'. The FAIR principles can also be applied to software, albeit in a slightly modified form.
As far as possible, software should also be made available as open source. In Helmholtz, there is a model guideline for sustainable research software, which has already been adapted and approved in some centers. It contains, among other things, information on development practices.
But not all scientists who develop research software for their work have the necessary knowledge. In this respect, training and education are of great importance.
We have a great platform with the Helmholtz Federated IT Services, HIFIS, where courses and workshops are offered regularly in cooperation with the Helmholtz Information and Data Science Academy, HIDA. Also in today's workshop, there will be the possibility to learn about software publication and citation, which has become important, also with respect to reproducibility. Publishing your software can help others to reproduce the results more easily.
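One small, concrete step towards citable software is a CITATION.cff file in the repository. The sketch below writes a minimal one; the title, version, DOI, and author are invented placeholders:

```python
from pathlib import Path

# Minimal CITATION.cff; all project details below are placeholders.
citation_cff = """\
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Example Analysis Toolkit"
version: "0.1.0"
doi: "10.5281/zenodo.0000000"
date-released: 2023-01-01
authors:
  - family-names: "Doe"
    given-names: "Jane"
"""

Path("CITATION.cff").write_text(citation_cff, encoding="utf-8")
print("Wrote CITATION.cff")
```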
I come from scientific computing and have done a lot in the field of high performance computing. So here is an example from Earth System Modeling to show what we are already doing with respect to reproducibility. Earth System Modeling aims to better understand the Earth system and to be able to estimate its further development.
In order to reflect the complexity of the system, the models comprise a number of different components, like ocean, atmosphere, and so on. Some are developed by us at AWI, for instance the finite-element/finite-volume sea ice-ocean model FESOM and the biogeochemistry model REcoM.
Others come from other developer teams. The interaction between the components can be challenging. So in order to facilitate Earth System Modeling, a standardized framework to download, configure, compile, and run a variety of Earth System Models on various HPC platforms was developed. These ESM tools also provide a lot of information about the workflow. This information can be stored in a database. This encourages reuse of simulation results, because the needed metadata about the simulation is available. On the other hand, it also helps in cases when problems with repeatability or reproducibility occur.
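The actual interface and database schema of the ESM tools are not reproduced here, but the idea of capturing such workflow metadata can be sketched roughly as follows; the paths and file names are assumptions for illustration only:

```python
import json
import platform
import subprocess
from datetime import datetime, timezone

def capture_run_metadata(source_dir: str, config_file: str) -> dict:
    """Collect basic provenance information about a model run."""
    git_commit = subprocess.run(
        ["git", "-C", source_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Local modifications that are not checked in would make the commit hash insufficient.
    uncommitted = bool(subprocess.run(
        ["git", "-C", source_dir, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout.strip())
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit,
        "uncommitted_changes": uncommitted,
        "config_file": config_file,
        "host": platform.node(),
    }

if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    metadata = capture_run_metadata(source_dir=".", config_file="run_config.yaml")
    with open("run_metadata.json", "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2)
```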
The example concerns a simulation with a coupled model with ocean, sea ice, atmosphere, and biogeochemistry to study global carbon fluxes. Unfortunately, here we have the case that two executables, one created in July, the other in December last year, deliver different results. Even if the globally averaged sea surface temperature differs only very slightly, a look at the spatial distribution shows that the regional differences can be quite large. The figure shows the difference in the sea surface temperature of the ocean between the two versions. We checked the used git versions of the code of all components and possible private code modifications, which are not checked in but were logged as additional information. We checked the versions of the used compilers and libraries. So far, we have not been able to identify any reason for the deviations and must continue to search.
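A comparison like the one just described, global average versus spatial distribution of the sea surface temperature difference, could be scripted roughly like this; the file names and the variable name 'sst' are placeholders, and xarray is assumed to be available:

```python
import xarray as xr

# Placeholder file names for the outputs of the two executables.
run_july = xr.open_dataset("sst_run_july.nc")
run_december = xr.open_dataset("sst_run_december.nc")

diff = run_december["sst"] - run_july["sst"]

# The globally averaged difference may look harmless (unweighted mean, for illustration) ...
print("global mean difference:", float(diff.mean()))

# ... while regional deviations can still be large.
print("maximum absolute difference:", float(abs(diff).max()))
```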
This example shows why I initially spoke of a spectrum with respect to open science and reproducibility and that we are in a process of going step by step. The current documentation and openness of the code and its dependencies, compilers and libraries, are apparently not enough here and have to include further information in the future. An important step on this path to more openness in general was taken last year, when in September 2022 the Helmholtz Association adopted a new open science
policy by the Assembly of Members. The policy stipulates that scholarly publications, research data, and research software are published openly in Helmholtz. In this way, Helmholtz shows a clear commitment to open science in accordance with the principle 'as open as possible and as closed as necessary'. The policy also highlights the role of data and software publications. In this context, the work done for the provision of open data and open software has also come to be recognized by introducing new indicators,
which are established as incentives within the framework of the program-oriented research. In order to promote the topic of reproducibility, there are now a number of initiatives at different levels.
One of them is the German Reproducibility Network. Its mission is to increase credibility and transparency in various scientific disciplines. The GRN wants to provide a platform for the many existing grassroots movements
in reproducibility and open science, to better network and learn from each other across disciplines, and to disseminate best practices. GRN activities span multiple levels, including researchers, institutions, and other stakeholders, for instance funders, publishers, and academic societies.
Local or topic-specific initiatives on the subject of reproducibility and open science can become members of this network, but also entire institutions that want to work towards this goal. Currently, GRN has 33 members.
GRN is also working on position papers and has already published one about open data and one about the connection between research quality and good working conditions. Furthermore, a discussion paper about research communication systems that support computational reproducibility in research will be published soon. We are organizing a symposium for members this year, and GRN is part of an international network of similar reproducibility networks in other
countries. Further information on GRN can be found at the address given above, and if you are interested, please contact us via mail or subscribe to our mailing list. I would like to close my talk again with a picture from the beginning.
More openness can help us to make our research comprehensible and reproducible. Like climbing a mountain, many steps are necessary. In the following workshop, some of these steps will be discussed in more detail.
Events like this help raise awareness on the topic and spread best practices. Thank you for your attention and I look forward to your questions. Thank you.