Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives

Video in TIB AV-Portal: Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives

Formal Metadata

Title
Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
2019
Language
English

Content Metadata

Abstract
Many journals today encourage authors to share the code and data used in their studies, but only a minority enforce it, and journals rarely give guidance on how these materials should be shared. Since computational reproducibility builds on and naturally follows from sharing the materials used, its adoption by journals is even less developed. The project "Opening Reproducible Research (o2r)" designed and implemented solutions to support the publication of reproducible research, considering different stakeholders such as authors, readers, reviewers, and publishers. On top of that, we identified several incentives for authors to make their research accessible, for example, by easily creating interactive (geo)scientific publications. In my talk, I will give a brief overview of the concepts and solutions developed in o2r. In addition, I will provide insights into our follow-up project o2r2, including concrete cooperations with publishers.
I'm Markus from the Institute for Geoinformatics at the University of Münster, and today I would like to talk a bit about our project o2r, which stands for Opening Reproducible Research. I think it's a good idea to start with a kind of agenda. At the beginning of my talk, I would like to talk about reproducibility in general: is there a problem? Spoiler: yes, there is a problem; that's why I'm here and why I'm working on this project. Then I will introduce the o2r project. Afterwards, I would like to talk a bit about our core concept, the Executable Research Compendium, and what you can get out of it if you use it. Finally, I will give a brief overview of our follow-up project, o2r2. I will not be able to go into detail on all these topics, but feel free to ask any questions at the end of this talk, afterwards, or tonight at the conference dinner.
Let's start with a quick definition of what we mean by reproducibility. By reproducibility we mean computational reproducibility: you can achieve the same results as reported in a paper using the same source code files and the same data files. There is usually some confusion with related terms, for example replicability, which means that you come to similar conclusions with independent experiments: you collect new data, implement a new analysis, and then reach similar conclusions. One could debate which of the two matters more, and some would say that replicability is actually more important than reproducibility, but I hope I can convince you that reproducibility is also very important for scientific work. In our project, we focus on the geosciences, for example landscape ecology, geography, planetology, geochemistry, and disciplines like that, but if you work with data and code, I hope you can also take away some of the messages I have for you.
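To make the distinction between reproducibility and replicability concrete, here is a minimal Python sketch (the analyses discussed in the talk are mostly R-based; Python is used here only for illustration). The `analysis` function, the simulated `collect_data` step, and the "mean above 4.5" conclusion are all hypothetical. Reproduction reruns the same code on the same data and expects an identical number; replication collects new, independent data and only expects a similar conclusion.

```python
import random
import statistics


def analysis(sample):
    """Hypothetical analysis step: the mean of the measurements."""
    return statistics.mean(sample)


def collect_data(seed, n=100):
    """Hypothetical stand-in for fieldwork: simulated measurements around 5.0."""
    rng = random.Random(seed)
    return [rng.gauss(5.0, 1.0) for _ in range(n)]


original_data = collect_data(seed=1)
reported = analysis(original_data)

# Reproduction: same code + same data -> exactly the same number is expected.
reproduced = analysis(original_data)
is_reproducible = reproduced == reported

# Replication: same question, new independent data -> only a similar conclusion
# is expected (here: "the mean lies above 4.5", an invented criterion).
replicated = analysis(collect_data(seed=2))
same_conclusion = (reported > 4.5) == (replicated > 4.5)

print("reproducible:", is_reproducible)
print("same conclusion on new data:", same_conclusion)
```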
At the beginning of our project, we ran a small reproducibility study. I tried to find papers that had code and data attached, and this is already the first issue: most of the articles I looked at didn't have any code or data, nothing at all. So the first issue is a lack of materials. In the end, I found some papers that had R code and data attached, and I tried to execute the analyses. Most of the papers with code attached were not executable: I opened the files in my local programming environment, tried to run the code, and what happened? I got a lot of errors and technical issues; for example, a library was deprecated, or the file directories were different because they were set up for the computer of the original author, and so on. So this is the second issue: most papers are not executable. I did get some papers running. I looked at the output, mainly the figures, and compared them to the figures in the original publication, and this is the third issue: most of the outputs were different from what was reported in the papers. I think this is a good time to emphasize that I don't want to blame the original authors; I just want to encourage them to publish code and data in an executable and reproducible way.
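The three issues from the study behave like a funnel: a paper can only fail at a later stage if it passed the earlier ones. A minimal Python sketch of that classification, with hypothetical paper records (these are not the study's data):

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    NO_MATERIALS = "no code or data attached"
    NOT_EXECUTABLE = "code attached, but it does not run"
    DIFFERENT_OUTPUT = "code runs, but the output differs from the paper"
    REPRODUCED = "code runs and the output matches the paper"


def classify(has_materials, runs, matches_paper):
    """Assign the first issue a paper hits, in the order described in the talk."""
    if not has_materials:
        return Outcome.NO_MATERIALS
    if not runs:
        return Outcome.NOT_EXECUTABLE
    if not matches_paper:
        return Outcome.DIFFERENT_OUTPUT
    return Outcome.REPRODUCED


# Hypothetical records: (has code and data, executes, output matches).
papers = [
    (False, False, False),
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (True, True, True),
]
tally = Counter(classify(*p) for p in papers)
for outcome in Outcome:
    print(f"{tally[outcome]} x {outcome.value}")
```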
So, I have two examples from a geoscientific paper. At the top, you see the figure from the paper, the original one, and here you can see what I got out of the source code when I executed the analysis. There are a couple of differences: for example, the background maps are different, the colors are a bit different, and the style is a bit different. These are mainly design-related issues, and one could argue that they are not so dramatic. But let's have a look at this green square: there are two dots, and these two dots are not in the original figure. So the actual results, the numbers, are also different, and this is a rather serious issue. So, coming back to the question of whether reproducibility is a big problem: yes, it is.
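Comparing a reproduced figure by eye catches the colors, but the two extra dots are a difference in the numbers themselves. A minimal Python sketch of a numeric check, assuming the reported and reproduced results are available as named values (the keys and numbers below are hypothetical). It uses a relative tolerance rather than exact equality, since floating-point results can differ in the last digits across platforms without the result actually changing.

```python
import math


def compare_results(reported, reproduced, rel_tol=1e-6):
    """Return {name: (reported, reproduced)} for every value that does not match."""
    mismatches = {}
    for name, value in reported.items():
        new_value = reproduced.get(name)
        if new_value is None or not math.isclose(value, new_value, rel_tol=rel_tol):
            mismatches[name] = (value, new_value)
    # Values that only show up in the reproduction (like extra points in a figure).
    for name in sorted(reproduced.keys() - reported.keys()):
        mismatches[name] = (None, reproduced[name])
    return mismatches


# Hypothetical numbers behind a figure.
reported = {"points_in_square": 12, "mean_value": 3.41}
reproduced = {"points_in_square": 14, "mean_value": 3.41}
print(compare_results(reported, reproduced))  # {'points_in_square': (12, 14)}
```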
But there is a lot of research going on on this topic of reproducible research, and one of these projects is o2r. We had a first phase, a two-year project starting in 2016, with three researchers: one from the University Library of Münster and two from the Institute for Geoinformatics. The first phase recently came to an end, but we applied for a follow-up grant and it was accepted, so we have two and a half more years to do some additional work. I will come to that at the end of this talk.
As already mentioned, the project is a collaboration between the University Library of Münster and the Institute for Geoinformatics, and we also work with publishers. We have a couple of goals. First of all, we would like to identify the barriers that prevent geoscientists from publishing open reproducible research. I already mentioned some technical issues, but there are also cultural issues: for example, there is a lack of time and a lack of incentives for preparing code and data for publication. Another problem is the lack of knowledge: most people don't actually know when they have achieved reproducible research, and when they upload supplemental materials, there is no mechanism to check whether the supplements are executable and whether they are reproducible. Finally, there is also something called the fear of being scooped, which means that other researchers take the data and the code and publish a paper that the original author actually wanted to publish. I would say it doesn't happen very often, but the fear is still very common.
In our project, we would like to design and evaluate ways to overcome these barriers, and we would like to develop an approach to reap the benefits of reproducible research. Let's be honest: just being able to execute the analysis and achieve the same results is nice, but there are a lot of further opportunities if the data and the source code are available in an executable way. Finally, we want to implement a platform that realizes the approach and test it with real users, so we have a rather user-centered approach.
These are the basic ideas. First of all, we want to provide an easy way to publish the paper, the data, and the analysis together. Usually these three components are completely separated and disconnected, and we wanted to develop an approach that allows authors to easily publish them together. Then we want to integrate this with existing publication procedures. We have to be realistic: publishers have their own publication processes, and we have to be able to integrate our solutions into their infrastructure; they will not change their entire infrastructure for our technology. Finally, we want to investigate potential incentives that might motivate authors and readers to publish open reproducible research and to use it.
The core concept in our research is the Executable Research Compendium (ERC). There are two ways for executable research to be published. One way is to replace the PDF file entirely, so researchers would no longer submit PDF files but executable research compendia. This is rather disruptive, but there is another option: to use executable research compendia as supplemental material. You still have the PDF, you go to the supplemental material section, and there you have a link that brings you to an executable research compendium, not just to a folder containing the data and the code, where you have to figure out by yourself how to run it and how to modify things. Having that paves the way for new opportunities and possibilities, but first, let's look at what it offers. This is what it looks like: this is the Executable Research Compendium.
Some of you saw it yesterday at the poster, but here is a quick recap. The Executable Research Compendium has four components. The first is the data, ideally submitted as raw data. What we usually suggest to researchers is to stop doing manual data processing: you shouldn't open an Excel file and delete the outliers by hand, or manipulate the data with ad hoc transformations. We rather suggest doing that in the script, because if you look at your data in six months, you might not remember what you did to it; you might not know why you deleted something or why you changed something. It is also difficult for others to follow what you did and how you did the data processing.
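A minimal Python sketch of the advice to do data processing in the script instead of by hand: the raw file stays untouched, and the outlier rule is written down where others (and you, six months later) can read it. The CSV content and the median/MAD rule with a factor of 3 are assumptions for illustration; the talk does not prescribe any particular outlier criterion.

```python
import csv
import io
import statistics

# Stand-in for a raw CSV file shipped unchanged with the compendium (hypothetical values).
RAW_CSV = """station,value
A,4.1
B,3.9
C,4.3
D,42.0
E,4.0
"""


def load_raw(text):
    """Parse the raw CSV into (station, value) pairs without modifying anything."""
    return [(row["station"], float(row["value"])) for row in csv.DictReader(io.StringIO(text))]


def remove_outliers(rows, factor=3.0):
    """Drop values further than factor * MAD from the median; report what was dropped."""
    values = [value for _, value in rows]
    median = statistics.median(values)
    mad = statistics.median(abs(value - median) for value in values)
    if mad == 0:  # no spread to judge against: keep everything
        return list(rows), []
    kept = [(s, v) for s, v in rows if abs(v - median) <= factor * mad]
    dropped = [s for s, v in rows if abs(v - median) > factor * mad]
    return kept, dropped


rows = load_raw(RAW_CSV)
kept, dropped = remove_outliers(rows)
print("kept:", [station for station, _ in kept])
print("dropped:", dropped)
```

Because the rule lives in code, changing the factor or swapping the criterion is a visible, reviewable edit rather than an undocumented change to the data file.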
The data should be included as a file, and as open access, of course, because we focus on open technologies; we suggest submitting CSV files. Then we have the software. The software comprises the source code and scripts, which should also be open source, and we encapsulate all of that in a Docker container to make it very easy to run. If we have that in an executable form, we can provide a button that reruns the entire analysis with just a single click. You don't have to download the files, you don't have to open them in your local environment, and you don't have to find out how to start and run it; you just click on a single button, and that's it.
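The one-click execution essentially means building the compendium's container image and running it. A minimal Python sketch of what such a button could trigger, assuming the compendium folder contains a Dockerfile whose default command runs the whole analysis; the folder name `my-compendium` and the image tag `erc-example` are hypothetical. By default it only prints the commands (a dry run), so it does not need Docker installed.

```python
import shlex
import subprocess
from pathlib import Path


def one_click_commands(compendium_dir, tag="erc-example"):
    """Docker CLI calls a hypothetical 'Run analysis' button could trigger."""
    build = ["docker", "build", "--tag", tag, str(Path(compendium_dir))]
    run = ["docker", "run", "--rm", tag]  # --rm removes the container afterwards
    return [build, run]


def run_compendium(compendium_dir, tag="erc-example", dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = one_click_commands(compendium_dir, tag)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


run_compendium("my-compendium")
```

With `dry_run=False`, `subprocess.run(..., check=True)` raises `CalledProcessError` if the build or the analysis fails, which is the signal a platform would need to report a non-executable compendium.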
Then we have the documentation. The documentation includes the actual article, that is, the paper, some instructions on how to run the code, and metadata. The metadata are very interesting with respect to software and data, because these two components contain a lot of useful information: we can search, for example, for libraries, functionalities, and all that stuff.
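The metadata part is what makes search possible: libraries and functions can be read directly from the code. A minimal sketch for Python scripts using the standard `ast` module; the compendia in the talk mostly contain R code, and o2r's actual metadata extraction is not described here, so this only illustrates the idea. The example script is hypothetical and is only parsed, never executed, so the imported packages need not be installed.

```python
import ast


def extract_metadata(source):
    """Return the top-level modules imported and the function names called in a script."""
    tree = ast.parse(source)
    modules, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                calls.add(node.func.id)        # e.g. griddata(...)
            elif isinstance(node.func, ast.Attribute):
                calls.add(node.func.attr)      # e.g. np.mean(...)
    return {"libraries": sorted(modules), "functions": sorted(calls)}


script = """
import numpy as np
from scipy.interpolate import griddata

values = griddata(points, z, grid)
print(np.mean(values))
"""
print(extract_metadata(script))
```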
Finally, we have something that's called UI bindings. UI bindings link the three components, data, software, and documentation: we provide a way to link the text in the paper to the source code and the data, so that these three components are no longer completely disconnected or separated. Based on that, we have UI widgets that help users to interact with papers. I will come to that in a moment.
So, if we have code and data available in an accessible and executable way, we get a lot of additional opportunities, summarized in this extended workflow for readers; it is interesting not only for readers but also for authors. Let's start with the first step, discovery. If data and code are accessible and executable, we have additional opportunities to search for papers, for example papers that use a certain functionality from a certain library implemented in a certain programming language. This is also interesting for authors, because it makes their work much easier to find and can help them get more citations.
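With such metadata stored per compendium, the discovery step becomes a filter over an index. A minimal Python sketch; the record fields and the three indexed papers are hypothetical (the R package and function names `gstat`/`krige`/`variogram`, `sp`, and `raster`/`focal` are real, but their pairing with these papers is invented).

```python
from dataclasses import dataclass, field


@dataclass
class CompendiumRecord:
    """Hypothetical search-index entry for one published compendium."""
    title: str
    language: str
    libraries: set = field(default_factory=set)
    functions: set = field(default_factory=set)


def search(records, language=None, library=None, function=None):
    """Return the titles of all records that match every given criterion."""
    hits = []
    for record in records:
        if language is not None and record.language != language:
            continue
        if library is not None and library not in record.libraries:
            continue
        if function is not None and function not in record.functions:
            continue
        hits.append(record.title)
    return hits


index = [
    CompendiumRecord("Paper A", "R", {"sp", "gstat"}, {"variogram", "krige"}),
    CompendiumRecord("Paper B", "Python", {"scipy"}, {"griddata"}),
    CompendiumRecord("Paper C", "R", {"raster"}, {"focal"}),
]
print(search(index, language="R", library="gstat", function="krige"))  # ['Paper A']
```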
The next step is inspection: readers and reviewers can look at the materials that were used to produce the results in the paper, and authors can easily convey how they achieved those results. Sometimes it is difficult for authors to explain how they did something; instead, they can simply refer to the source code and the data to show how they got there. The next step is manipulation. Many papers have some parameters in the analysis, and it is interesting to know how these parameters affect the final output: if you change the parameters, what happens? This is interesting for readers and reviewers, but also for authors, to show that the results are robust rather than fragile. Finally, substitution: substitution means that you can substitute materials, for example, replace the dataset with your own dataset if you have one, or substitute the analysis if you want.
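Manipulation and substitution can both be read as "rerun the same analysis with something changed". A minimal Python sketch that sweeps a parameter and reruns the analysis on a substituted dataset; the analysis, the data, and the rule that a spread below 0.3 counts as robust are all invented for illustration.

```python
def analysis(data, threshold):
    """Hypothetical analysis: share of measurements above a threshold parameter."""
    return sum(1 for v in data if v > threshold) / len(data)


def sweep(data, thresholds):
    """Manipulation: rerun the analysis for several parameter values."""
    return {t: analysis(data, t) for t in thresholds}


def robustness(results, max_spread=0.3):
    """Invented criterion: call the result robust if outputs vary by less than max_spread."""
    spread = max(results.values()) - min(results.values())
    return "robust" if spread < max_spread else "fragile"


original = [0.4, 1.2, 1.9, 2.3, 2.8, 3.1, 0.7, 1.5]
results = sweep(original, [1.0, 1.5, 2.0])
print(results, robustness(results))  # {1.0: 0.75, 1.5: 0.5, 2.0: 0.375} fragile

# Substitution: the same analysis on a reader's own (here: made-up) dataset.
own_data = [2.2, 2.4, 2.6, 2.5, 0.9, 2.7]
print(robustness(sweep(own_data, [1.0, 1.5, 2.0])))  # robust
```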
I have already heard the FAIR principles mentioned a couple of times, and they can also be found in this workflow: discovery refers to findability, inspection refers to accessibility, reusability refers to manipulation and substitution, and interoperability is in part covered by substitution. Of course, you could also argue that each of these steps contributes to the reader's understanding when reading a scientific paper.
Coming back to the UI bindings: as you may remember, they are a component of the Executable Research Compendium. UI bindings connect those parts of the script and those data subsets that were used to compute a specific computational result, such as Figure 1 in a paper; this is also what I presented yesterday at the poster. So let's say we want to create an interactive version of Figure 1 in a paper. The first step is to specify the result, which is Figure 1. The second step is to select all the source code lines that are needed to produce Figure 1. In the third step, we specify the parameter that should be made interactive. The author says: OK, this parameter is set to 2.0, and I would like to make it interactive to allow some other values. This can be done by configuring a user interface widget, in this case a slider with a range from 0.1 to 3.5 and a step size of 0.1, plus a short explanation of what the parameter actually does. And that's it; this is all we need to create an interactive figure.
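The three configuration steps (result, code lines, parameter with its widget) map naturally onto a small record. A minimal Python sketch of such a binding, using the slider values from the example (0.1 to 3.5, step 0.1, current value 2.0); the field names, the parameter name `smoothing`, and the line range are hypothetical and not o2r's actual format.

```python
from dataclasses import dataclass


@dataclass
class SliderBinding:
    """Hypothetical UI binding: result, code lines, parameter, and slider widget."""
    result: str          # the computational result to make interactive
    code_lines: range    # source lines needed to produce that result
    parameter: str       # variable in those lines that the slider controls
    value: float         # value used in the published paper
    minimum: float
    maximum: float
    step: float
    description: str     # short explanation shown next to the slider

    def positions(self):
        """All slider positions, built from integer step counts to avoid float drift."""
        n_steps = round((self.maximum - self.minimum) / self.step)
        return [round(self.minimum + i * self.step, 10) for i in range(n_steps + 1)]

    def accepts(self, value):
        """True if value lies on the slider grid."""
        return any(abs(value - p) < 1e-9 for p in self.positions())


binding = SliderBinding(
    result="Figure 1",
    code_lines=range(12, 40),
    parameter="smoothing",
    value=2.0,
    minimum=0.1,
    maximum=3.5,
    step=0.1,
    description="Controls how strongly the curve is smoothed.",
)
positions = binding.positions()
print(len(positions), positions[0], positions[-1])  # 35 0.1 3.5
print(binding.accepts(binding.value), binding.accepts(4.0))  # True False
```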
This is how it looks: on the left side, we still have the static representation of the paper with the static figure, and on the right side we can manipulate the figure. This is the slider we configured in the previous step, and if we change the value of the slider, we immediately see the new output on the right side. "Immediately" is true for this example, but in other examples it can of course take a bit of time, maybe a few minutes, to recompute a figure with the new value. These are special cases where we would need some further development, for example pre-rendering the outputs.
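Pre-rendering can be sketched in Python as follows: since a slider only offers a finite set of positions, every output can be computed once in advance and served from a lookup table. `recompute_figure` is a hypothetical stand-in whose short sleep replaces what could be minutes of real computation.

```python
import time

SLIDER_MIN, SLIDER_MAX, SLIDER_STEP = 0.1, 3.5, 0.1


def recompute_figure(value):
    """Hypothetical stand-in for an expensive re-computation of a figure."""
    time.sleep(0.01)  # placeholder for what could be minutes of real work
    return f"figure rendered with parameter={value:.1f}"


def slider_positions():
    n_steps = round((SLIDER_MAX - SLIDER_MIN) / SLIDER_STEP)
    return [round(SLIDER_MIN + i * SLIDER_STEP, 1) for i in range(n_steps + 1)]


# Pre-render once, ahead of time (e.g. when the compendium is published).
cache = {v: recompute_figure(v) for v in slider_positions()}


def on_slider_change(value):
    """Serve the pre-rendered output; fall back to computing on demand."""
    key = round(value, 1)
    if key in cache:
        return cache[key]
    return recompute_figure(value)


print(len(cache), on_slider_change(2.0))  # 35 figure rendered with parameter=2.0
```

The trade-off is storage and up-front compute in exchange for instant feedback, and it only works for discrete parameter ranges like a slider.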
To summarize, compared with common practice, which is the PDF file, executable research compendia offer, for example, one-click reproduction: it is no longer necessary to download all the material, run it in your local environment, find out how to run it, and so on; you just click on a button and the analysis is reproduced. The compendium also opens everything up, so you can look at all the data and the code; it is completely transparent, and nothing is hidden. And you have additional opportunities for reuse, with new interaction possibilities: you can manipulate things, look at the materials, and so on. But it is also important to integrate ERCs into the existing scientific process. This is what I mentioned at the beginning when I talked about the publishers: we have to think further than just defining and implementing ERCs, and this is what we would like to do in the second phase of o2r.
In o2r2, we have another two and a half years to work on the problem. We are now two researchers, not three anymore. The project is again a collaboration between the University Library of Münster and the Institute for Geoinformatics, this time together with publishers such as Copernicus Publications. We have a couple of new goals. First, we want to implement some pilot applications: we want to integrate our concepts, technologies, tools, and features into the infrastructure of the publishers. I think this is an interesting step. We can be sure that publishers don't want to change their whole infrastructure just to include our services, but they are interested in these things and in the topic of reproducibility, and they are taking it up. So we have to find out how we can integrate our solutions into their submission systems.
There are some barriers that we have to eliminate for authors, readers, reviewers, and publishers. Authors need easy ways to create ERCs, and particularly interactive figures. We have to provide low-effort features that let readers use executable research compendia, and we want to find out how the reading process affects the reader's understanding. Then we have reviewers: previously, reviewers looked at the PDF file and maybe a supplement. With an ERC they have more possibilities, but it might also take a bit more time, so we have to investigate these issues as well. And finally the publishers: how can we integrate our technologies into their infrastructure?
Finally, evaluation: we want to evaluate a couple of aspects. First of all, technical aspects, such as stress tests: for example, what happens if a hundred readers try to reproduce the paper at the same time? Then we have some user aspects, for example logging: we want to log how users use the platform and the interactive tools, in order to provide better support for authors creating executable research compendia and for readers using them. And we want to run some user studies to evaluate usability, user experience, and so on.
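The stress-test question can be approached with a load generator that fires many reproduction requests at once. A minimal Python sketch with a hypothetical `reproduce` stand-in (a short sleep instead of a real analysis) and an invented limit of eight parallel runs on the execution backend; a real test would send requests to the platform instead. It checks that 100 simultaneous readers are all served and that the backend limit is never exceeded.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_RUNS = 8  # hypothetical capacity of the execution backend
N_READERS = 100

slots = threading.BoundedSemaphore(MAX_PARALLEL_RUNS)
lock = threading.Lock()
active = 0  # runs currently executing
peak = 0    # highest number of simultaneous runs observed


def reproduce(reader_id):
    """Hypothetical stand-in for one reader pressing 'reproduce'."""
    global active, peak
    with slots:  # wait for a free execution slot
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # placeholder for the actual analysis run
        with lock:
            active -= 1
    return reader_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N_READERS) as pool:
    finished = list(pool.map(reproduce, range(N_READERS)))
elapsed = time.perf_counter() - start
print(f"{len(finished)} runs served, peak concurrency {peak}, {elapsed:.2f} s")
```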
Do I still have some time? (Four minutes, including discussion.) OK, then just a quick run-through. What I mentioned here is that we want to integrate ERCs into the infrastructure of the publishers. This is the first idea we have for the publication process. It starts with the author, who locally creates an unvalidated research compendium, this one here, which is basically the workspace including the scripts and the data. On our platform, it then becomes an executable research compendium. When it is reviewed, it becomes a reviewed executable research compendium, and if it gets accepted, it becomes a published executable research compendium, which other readers can then use to create new executable research compendia.
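The publication process described above is a sequence of states. A minimal Python sketch of it as a state machine; the stage names follow the talk, while gating each step on a boolean check (and staying in place when the check fails) is an assumption, since the talk does not say what happens on rejection.

```python
from enum import Enum


class Stage(Enum):
    UNVALIDATED = "unvalidated research compendium (author's local workspace)"
    EXECUTABLE = "executable research compendium (on the platform)"
    REVIEWED = "reviewed executable research compendium"
    PUBLISHED = "published executable research compendium"


# Forward transitions in the order described in the talk.
NEXT = {
    Stage.UNVALIDATED: Stage.EXECUTABLE,
    Stage.EXECUTABLE: Stage.REVIEWED,
    Stage.REVIEWED: Stage.PUBLISHED,
}


def advance(stage, check_passed=True):
    """Move to the next stage if the gate passed; otherwise stay (assumed behavior)."""
    if stage not in NEXT:
        raise ValueError(f"{stage.name} is already the final stage")
    return NEXT[stage] if check_passed else stage


stage = Stage.UNVALIDATED
for check in (True, True, True):  # executes, passes review, gets accepted
    stage = advance(stage, check)
print(stage.name)  # PUBLISHED
```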
OK, so I am open for any questions or comments. Thank you very much.
Thank you.