Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives
Video in TIB AV-Portal
Formal Metadata
Title 
Open Reproducible Research in the Geosciences: Obstacles, Solutions, and Incentives

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 

Release Date 
2019

Language 
English

Content Metadata
Abstract 
Many journals today encourage authors to share the code and data used in their studies, but only a minority enforce it, and journals rarely give guidance on how these materials should be shared. Since computational reproducibility encompasses and naturally follows the sharing of the materials used, its adoption by journals is even less developed. The project "Opening reproducible research (o2r)" designed and implemented solutions to support the publication of reproducible research, considering different stakeholders such as authors, readers, reviewers, and publishers. On top of that, we identified a couple of incentives for authors to make their research accessible, for example by easily creating interactive (geo)scientific publications. In my talk, I will give a brief overview of the concepts and solutions developed in o2r. In addition, I will provide insights into our follow-up project o2r2, including concrete cooperations with publishers.

00:00
My name is Markus, and I'm from the Institute for Geoinformatics at the University of Münster. Today I would like to talk a bit about our project o2r, which stands
00:08
for Opening Reproducible Research. I think it's a good idea to start at the beginning with some
00:14
kind of an agenda. At the beginning of
00:17
my talk I would like to talk about reproducibility in general: is there a problem? Just a little spoiler: yes, there is a problem. That's why I'm here and why I'm working on this project. Then comes the o2r project, and
00:32
afterwards I would like to talk a bit about our core concept, which is the executable research compendium, and what you can get out of it if you use it, and finally give a brief overview of our follow-up project, o2r2. I will
00:47
not be able to go into detail on all these topics, but feel free to ask any questions at the end of this talk, after this talk, tonight at the conference dinner, or via [unclear], GitHub, or email. So,
01:02
let's start with a quick definition of what we understand reproducibility to mean. By reproducibility we mean computational reproducibility, meaning that you can achieve the same results as reported in a paper using the same source code files and the same data files. There's usually some confusion with
01:21
related terms, for example replicability, which means that you can come to similar conclusions with independent experiments: you collect new data, you implement a new analysis, and then you achieve similar conclusions. I would say that
01:36
reproducibility is more important than replicability. Some might argue that replicability is actually more important than reproducibility, but I hope that I can convince you a bit that reproducibility is also very important for scientific work. In our
01:56
project we focus on the geosciences, for example landscape ecology, geography, planetology, geochemistry, so disciplines like that. But I hope that, if you work with data and code, you can also take home some of the messages that I have for you.
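The definition of computational reproducibility given at 01:02 (same results from the same code and the same data) can be turned into a simple check. The following is only a minimal sketch: the function name `is_reproduced`, the result names, and all numbers are hypothetical and are not taken from the talk or from any paper.

```python
import math


def is_reproduced(reported, recomputed, rel_tol=1e-9):
    """Return True if both result sets have the same names and matching values.

    `reported` holds the numbers stated in a paper, `recomputed` the numbers
    obtained by re-running the same code on the same data.
    """
    if reported.keys() != recomputed.keys():
        return False
    return all(
        math.isclose(reported[name], recomputed[name], rel_tol=rel_tol)
        for name in reported
    )


# Hypothetical values for illustration only.
reported = {"mean_slope": 0.42, "n_sites": 17}
recomputed_same = {"mean_slope": 0.42, "n_sites": 17}
recomputed_diff = {"mean_slope": 0.45, "n_sites": 19}

print(is_reproduced(reported, recomputed_same))
print(is_reproduced(reported, recomputed_diff))
```

A tolerance is used instead of exact equality because floating-point results can differ in the last digits between machines; how strict the comparison should be is a choice of this sketch, not something stated in the talk.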
02:13
At the beginning of our project we ran a little reproducibility study. I tried to find papers that had code and data attached, and this is already the first issue: most of the articles that I looked at didn't have any code or any data, nothing. OK, the
02:29
first issue is a lack of materials. But in the end I found some papers that had R code and data attached, and I tried to execute the analysis, and this is already the second issue: most of the papers that had code attached are not executable. I tried to open the files in my local programming environment, I tried to run the code, and what happened? I had a lot of errors, a lot of technical issues: for example, a library was deprecated, or the file
02:56
directories were different, so they were set for the computer of the original author and
03:03
not in general, I would say. So this is the second issue, that most papers are not executable. But I got some papers running. I looked at the output, mainly at the figures, and I compared these figures to those in the original publication, and this is the third issue: most of the outputs differed from what was reported in the papers. I think this is a good
03:29
time to emphasize that I don't want to blame the original authors; I just want to encourage them to publish code and data in an
03:37
executable and reproducible way. Here I have two examples from a geoscientific paper. At the top you see the figure from the paper, this is the original one, and then you can see what I got out of the source code when I reproduced, or executed, the analysis. There are a couple of
03:56
differences: for example, the background map is slightly
03:59
different, the colors are a bit different, the [unclear] is a bit different. These are rather design-related issues, and you could argue that they are not so dramatic. But let's have a look at this green square: there are two dots, and these two dots are not there in the original, so the
04:16
actual results, the numbers, are also
04:18
different, and this is a rather serious issue. So if we come back to the question about reproducibility, is it a big problem? Yes, it is. But
04:33
there are a couple of research projects going on on this topic, on reproducible research. One of these projects is the o2r project. We had a first phase, which was a two-year project starting in 2016. We were three researchers: besides me, Marc Schutzeichel from the University Library of Münster and Daniel Nüst, who is also from the Institute for Geoinformatics. The first phase recently came to an end, but we asked for a follow-up grant and got it accepted, so we have two and a half more years to do some additional work, but I will come to that
05:07
at the end of this talk. As already mentioned, it is a collaboration between the University Library of Münster, the
05:16
Institute for Geoinformatics, and publishers; we work, for example, with Copernicus [unclear]. We have a couple of goals. First of all, we would like to identify the barriers that prevent geoscientists from publishing open reproducible research. I already
05:33
mentioned some technical issues, but there are also some cultural issues. For example, there's a lack of time and a lack of incentives for preparing code and data for publication. Another problem is the
05:48
lack of knowledge: most people don't actually know when they have achieved reproducible research. When they upload the supplemental
05:56
materials, there is no mechanism to check whether the supplements are executable and whether they are reproducible. And finally there is also
06:05
something that is called the fear of being scooped, which means that other researchers take the data, take the code, and publish a paper that the original author actually wanted to publish. It doesn't happen very often, so I would say this fear is a bit abstract, but it is still very common. In our project we would like to design and evaluate ways to overcome these barriers, and we would like to develop an approach to reap the benefits of reproducible research. Let's be honest: just being able to
06:39
execute the analysis and achieve the same results is nice, but there are a lot of further opportunities that we get if the data is available and if the source code is available in an executable way. And finally we want to implement a platform that realizes the approach
06:55
and to test it with real users, so we have a rather user-centered approach. These are the basic
07:02
ideas. First of all, we want to provide an easy way to publish the paper, the data, and the analysis together. Usually these three components are completely separated and disconnected, and we wanted to develop an approach that allows authors to easily publish these things together. Then we want to integrate this with existing publication procedures. So we
07:25
have to be realistic: there are publishers who have their own publication processes, and we have to be able to integrate our things into their infrastructure. They will not change their entire infrastructure for our technology. And finally we want to investigate potential
07:43
incentives which might motivate authors and readers to publish open reproducible research and to use it. The core concept in our research is the executable research compendium. There are two ways for executable research compendia to be published. One way is to replace the
08:01
PDF file entirely, so researchers would not publish or submit PDF files but executable research compendia. This is rather disruptive, but there's another opportunity, for
08:13
example, to use executable research compendia as supplemental material. So if you have a PDF, you scroll down to the supplemental material section, and then you have a link which brings you to an executable research compendium, and not to a folder which contains the data and the code and where you have to find out by yourself how to run
08:32
it and to modify things. And if we have that, this paves the way for new opportunities and possibilities for readers, but also for authors who publish. And this is what it looks like: this is the executable
08:45
research compendium. Some of you have seen that yesterday at the poster, but just a quick repetition: the executable research compendium has four components. The first is the data, ideally submitted as raw data. What we usually suggest to researchers is to stop doing manual data processing, so you should not go
09:06
into the Excel file and delete the outliers by yourself, or manipulate the data by hand, doing data transformations. We rather suggest doing that in the script, because in six months, if you look at your data, you might not remember what you did with the data; you might not know why you deleted something or why you changed something. It is also difficult for others to follow how you did the data processing.
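The advice above, to remove outliers in a script rather than by hand in a spreadsheet, could look roughly like the following. This is a minimal sketch under stated assumptions: the function `drop_implausible`, the plausible range, and the readings are all hypothetical, and a fixed range is only one of many possible outlier rules.

```python
def drop_implausible(readings, lowest, highest):
    """Split readings into kept values and dropped (index, value) pairs.

    The raw list is never modified, so the raw data stays intact and the
    rule that removed each value remains visible in the script.
    """
    kept, dropped = [], []
    for index, value in enumerate(readings):
        if lowest <= value <= highest:
            kept.append(value)
        else:
            dropped.append((index, value))
    return kept, dropped


# Hypothetical raw temperature readings; 999.0 and -999.0 stand for sensor errors.
raw = [12.1, 11.8, 12.4, 999.0, 12.0, -999.0]
clean, removed = drop_implausible(raw, lowest=-50.0, highest=60.0)

for index, value in removed:
    print(f"dropped reading {index}: {value} outside [-50.0, 60.0]")
```

Because the rule lives in code, anyone who reruns the analysis months later can see which values were removed and why.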
09:31
And it should be included as a file, and as open access of course, because we are rather focused on open technologies and suggest using [unclear]. Then we have the software. The software comprises the source code scripts, which should also be open source, and we encapsulate all that stuff in a Docker container to make it very easy to run. And if we have that in an executable way, we can provide a button: basically you can click and rerun the entire analysis with just a single click. You don't have to download the files, you don't have to
10:10
open them in your local environment, you don't have to find out what to install or how to start and run it; you can just click on a single button and that's it. Then we have the
10:18
documentation. The documentation includes the actual article, so the paper, some instructions on how to run the code, and metadata. The metadata are very interesting with respect to software and data, because these two components include a lot of interesting information: we can search, for example, for libraries, functionalities, and all that stuff. And finally we have
10:41
something that's called UI bindings. UI
10:44
bindings link the three components, so data, software, and documentation. We provide a way to link the text in the paper to the source code and the data, to avoid that these three components are completely disconnected or separated, and therefore we have UI widgets which help users to interact with papers, but without additional programming. I will come to that in a moment. So now if we have code and data
11:12
available in an accessible and executable way, we get a lot of additional opportunities, and these opportunities are summarized in this extended workflow for readers. This is not only for readers but also interesting for authors. So let's start with the first step, discovery. If we have data and code accessible and executable, we have additional opportunities to search for papers, for example for papers that use a certain
11:37
functionality from a certain library implemented in a certain programming language. This is also interesting for authors: they can make their work much easier to find, to get citations. The next step is inspection:
11:53
then readers and reviewers can look at the materials that were used to produce the results from the paper, and authors
12:02
can easily convey how they achieved the results from the paper. It is sometimes difficult for authors to explain how they did something; they can just refer to the source code and the data, and so explain how they
12:16
did it. The next step is manipulation. Many papers have some parameters in the analysis, and it would be interesting to know how these parameters affect the final output. So if
12:28
the author changes, or if you change, the parameters, what happens then? This is interesting for readers and reviewers, but also for authors, to show whether the results are
12:39
robust or fragile. Finally, substitution. Substitution means that you can substitute materials, for example the data set by your own data set if you have another one, or you can substitute the analysis if you want. I already heard a couple of times here about things like the FAIR principles, and this is also what you
13:00
can find here in this workflow. So we have
13:03
discovery, which refers to findability; inspection, which refers to accessibility; reusability, which refers to manipulation and substitution; and interoperability, which is in part covered by substitution. And of course you could also argue that each of these steps also contributes to the reader's understanding when
13:26
reading a scientific paper. Coming back to the UI
13:30
bindings: maybe you remember that one component of the ERC, of the executable research compendium, is the bindings. Bindings connect those parts of the script and the data subsets that were used to compute a specific computational result, let's say Figure 1 in a paper. This is also what I presented yesterday on the poster. So let's say we want to create an interactive figure for Figure 1 in a
13:53
paper. The first step would be to specify the result, which is Figure 1. The second step would be to select all the source code lines that are needed to produce Figure 1. In the third
14:07
step it is required to specify the parameter that should be interactive. Let's say the author says: OK, this parameter is set to a fixed value [unclear], and now I would like to make it a bit more interactive, to allow some other values. This can be done by configuring a user interface widget, which in this case is a slider with a range from 0.1 to 3.5, and it can be changed with a step size of 0.1. Then there is a small explanation that explains what the UI widget actually does, and that's it. This is all we need to create an interactive figure, and this is how it looks
14:48
like. On the left side we still have the static representation of the paper with the static figure, and on the right side we can manipulate the figure. This is the slider that we configured in the previous step, and if we change the value of the slider, then we can see the output immediately on the right side. "Immediately" is
15:12
true for this example, but in other examples of course it takes a bit of time, maybe a few minutes, to recompute a figure with the new
15:20
value. So there are some special cases where we would need some further development, for example some pre-rendering. To sum up, comparing
15:33
executable research compendia with common practice, which is the PDF file: the executable research compendium offers, for example, one-click reproduction, so it's not any
15:45
more necessary to download all that stuff, to run it
15:48
in the local environment, and so on, or to find out how to run it; you can just click on a button and reproduce the analysis. It opens up everything, so you can look at all the data or
16:00
the code; it's completely transparent, nothing is hidden. And you have
16:06
additional opportunities for reuse: you have some new interaction possibilities, you can manipulate things, you can look at the materials, and so on. But it's also important to integrate ERCs into the existing scientific process. This is what I mentioned at the beginning when I talked about the publishers, so we have to think further than just defining and implementing ERCs, and this is what we would like to do in
16:38
the second phase of o2r. So in our
16:42
o2r2 project we have another two and a half
16:46
years to work on the problem. We now have two researchers, not three anymore. It is still a collaboration between the University Library of Münster, the Institute for Geoinformatics, and the publishers, Copernicus Publications and [unclear]. We have
17:04
a couple of new goals. We want to implement some pilot applications, so we want to integrate our concepts and technologies and tools and features into the infrastructure of the publishers, and this is an
17:18
interesting step, I think. We can be sure that publishers don't want to change their whole infrastructure just to include our services, but they're interested in these things, in this reproducibility thing and the interactivity thing, so we have to find out how we can integrate our things into their submission system. Then there are some
17:45
barriers that we have to eliminate for authors, readers, reviewers, and publishers. Authors need easy ways to create these ERCs, and particularly the interactive figures. We have to provide low-effort features for readers to use executable research compendia, and we want to find
18:05
out how the reading process affects the understanding of the reader. Then we have reviewers. Reviewers previously looked at the PDF file and maybe a supplement. Now they have more possibilities, but it might be that
18:20
it also takes a bit more time, so we have to investigate these issues as well. And finally the publishers: how can we integrate our technologies into the infrastructure of the publisher? Finally, evaluation: we want to evaluate a couple of
18:35
aspects. First of all the technical aspects, stress tests for example: what
18:40
happens if one hundred readers try to reproduce the paper at the same time, for example? Then we have some user aspects, for example logging: we want to log how users use the platform and the interactive tools. We also want to provide some
18:57
sort of support for authors to create executable research compendia and for readers to use them, and we want to run some user studies to evaluate usability, user experience, and stuff like that. Do I still
19:12
have some time? [unclear]
19:18
Four minutes, including discussion? OK, then just a quick run-through. What I mentioned here is that we want to
19:26
integrate our ERCs into the infrastructure of the publishers. This is the first
19:32
idea that we have for the publication process. It starts with the author, who creates an unvalidated research compendium, which is this one here; it is basically the workspace including the R scripts and the data. This then becomes an executable research compendium on our platform. When it is reviewed, it becomes a reviewed executable research compendium, and if it gets
19:56
accepted, it becomes a published executable research compendium, which can then be used by other readers to create new executable research compendia. OK, so I am open for any
20:09
questions or comments. Thank you very much.
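The publication stages described at 19:32 (unvalidated, executable, reviewed, published) can be read as a small state machine. The sketch below is only an illustration of that reading: the names `Stage`, `NEXT_STAGE`, and `advance` are hypothetical and do not describe how the o2r platform is implemented.

```python
from enum import Enum


class Stage(Enum):
    """Stages of a compendium as named in the talk."""
    UNVALIDATED = "unvalidated research compendium"
    EXECUTABLE = "executable research compendium"
    REVIEWED = "reviewed executable research compendium"
    PUBLISHED = "published executable research compendium"


# Each stage has exactly one successor; PUBLISHED is the final stage.
NEXT_STAGE = {
    Stage.UNVALIDATED: Stage.EXECUTABLE,  # workspace uploaded to the platform
    Stage.EXECUTABLE: Stage.REVIEWED,     # review completed
    Stage.REVIEWED: Stage.PUBLISHED,      # submission accepted
}


def advance(stage):
    """Return the stage that follows `stage`, or raise if it is final."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} is the final stage")
    return NEXT_STAGE[stage]


stage = Stage.UNVALIDATED
while stage in NEXT_STAGE:
    stage = advance(stage)
    print(stage.value)
```

A published compendium serving as the starting point for a new one, as mentioned in the talk, would correspond to creating a fresh `Stage.UNVALIDATED` object rather than a further transition.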
20:13
Thank you. [unclear]
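The three binding steps described from 13:52 onward (specify the result, select the code lines, configure a slider for one parameter) could be captured in a small data structure. This is a hedged sketch only: the class `SliderBinding`, its field names, the line numbers, the parameter name, and the caption are hypothetical and do not reflect the actual o2r binding format. Only the slider range (0.1 to 3.5) and the step size (0.1) come from the talk.

```python
from dataclasses import dataclass


@dataclass
class SliderBinding:
    """Links one computational result to code lines and one adjustable parameter."""
    result: str        # step 1: the result to make interactive
    code_lines: list   # step 2: source lines needed to produce it
    parameter: str     # step 3: the parameter exposed to the reader
    minimum: float
    maximum: float
    step: float
    caption: str       # short explanation shown next to the widget

    def allowed_values(self):
        """Return every slider position from minimum to maximum."""
        count = round((self.maximum - self.minimum) / self.step)
        # Rounding removes floating-point noise such as 0.30000000000000004.
        return [round(self.minimum + i * self.step, 10) for i in range(count + 1)]


binding = SliderBinding(
    result="Figure 1",
    code_lines=[12, 13, 14, 20],   # hypothetical line numbers
    parameter="threshold",         # hypothetical parameter name
    minimum=0.1,
    maximum=3.5,
    step=0.1,
    caption="Changes the threshold used to compute Figure 1.",
)

values = binding.allowed_values()
print(len(values), values[0], values[-1])
```

Keeping the binding as plain data, separate from the analysis script, is one way to let a platform generate the slider without the author writing any user-interface code.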