What is good scientific practice for research software?

Video in TIB AV-Portal: What is good scientific practice for research software?

Formal Metadata

What is good scientific practice for research software?
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Software has become an essential component in basically every part of the research cycle. Still, there are no comprehensive guidelines on how core principles of good scientific practice like transparency and reproducibility can and should be applied to research software. Further open questions are how to guarantee high quality of such software, and how current and future researchers need to be trained and incentivized to generate sustainable software. To ensure the frictionless reuse and long-term availability of software, dedicated framework conditions and infrastructure for research software need to be established. At the same time, knowledge about open/libre software licenses needs to be disseminated. These and numerous other issues have so far been addressed only partially and inconsistently by German universities and funding/research institutions. The Alliance of Science Organisations, which consists of all large German research organisations (DFG, Fraunhofer Society, Helmholtz Association, Leibniz Association, Max Planck Society) and the universities, has launched the ad-hoc working group "Research Software" as part of its initiative "Digital Information" to propose solutions for these issues. This working group aims to address the general questions regarding research software and to compile a set of guidelines and recommendations for the German and international research community. In this talk those questions and their potential solutions will be presented and discussed.

Thank you very much for the kind introduction. It's a real pleasure being here. Today I would like to talk about good scientific practice, and why and how we can apply it to research software. Actually, I'm not offering many solutions here; I'm here more to put some questions out there and discuss them with you. I think this is a perfect place for that, and I'm very glad to have you here and to talk to you later about all these things. I'm mainly representing a group here, and I will get into this a little later. Let's start with something more fundamental.
Science and technology have always formed a kind of virtuous circle: whenever science found something new, it was implemented as technology, which in turn was used to improve science again. One example is the telescope, which helped humankind find its place in the universe: it was a result of research that itself became a tool of research. Or, going in the completely other direction, the microscope: it was the result of scientific research and then became a new scientific tool that helps us to see the unseeable. These are two examples of the interplay between science and technology. And still today, if you want to do science, you need tools.
I'm pretty sure everybody who is doing science is using software. These are tools that are found everywhere; nobody can do science without software, and we should be aware of that. It can be a small dedicated tool for looking at some microscope images, it can be a nice R script that you wrote, it can be a Python script, it can be a Makefile that is doing something for you, and it can even be the dirty Excel sheet with some formula in it that is calculating something for you. This is scientific research software. Also well known is software for high-performance clusters, which needs to run on a high-performance cluster for three months before you get any result, and hopefully it doesn't crash. All this is part of science today.
So it's unquestionable that science has a strong and even growing dependency on software. Think about the big data that is hitting science, for example large volumes of images: you need software to crunch this data, right? You need these tools. There is no science without software today. Maybe apart from catching butterflies, but even then at some point you would use software to generate a database of the butterflies, and researchers may need it to classify all of this. So software is essential for science today.
Like a microscope or a telescope, software can also be the result of research. There are large numbers of scientists out there working on generating better software for solving a problem. For this we have to keep in mind that, as with any other scientific output, there has to be a certain control of quality. It has to stay accessible. We had this in the previous talk: things sometimes disappear. Software should not disappear, so it needs to be accessible. It should be citable; we need to be able to refer to the tools that we use. All these things need to be ensured in order to ensure proper scientific progress, as with any other tool in science.
The big issue is that this is widely ignored. The first thing everybody does when they come into the office is turn on the computer, but how relevant software is in research is widely ignored. I hope, and I'm pretty sure, that you are in the same boat: we want to change that. All of us have an interest in software in research, and this is why it is great to have you all here in this room.
I'm a bioinformatician, so I'm working in research: I'm doing research and I'm writing software. It's not uncommon that you open up a paper, go to the methods section, want to read something, and then bump into things like "in-house scripts". Basically, the code was not given as part of the publication, although it was maybe an essential step in the data analysis. It's like not mentioning what kind of chemicals I used to perform a certain reaction.
And this is very bad. There is another example; you may have read about this. This is a paper recently published in PNAS. They checked commonly used tools in fMRI analysis, that is, the analysis of brain scans, and they saw a rather high false-positive rate: a lot of things were detected as real findings that were actually not. So this may have blown maybe 15 years of research, which is not very good. This means, again, that software is an essential part, and flaws in this part of the process have an impact on research, on science, and on our understanding of the world.
There are several issues, and some of them were mentioned already before. One thing, which I also showed you before, is that the source code is simply not published or not available, or, even worse, it is proprietary. I still have a chance of getting the source if somebody does not publish the dirty script with the paper: I might send the author of the paper an e-mail and get the source code. But if it belongs to a proprietary company, I have a very low chance of ever seeing the source code, so I cannot dive into it and see what is happening under the hood.
There is no quality control happening in the review phase. I myself often review papers that are about software, but it's not that the journal asks me to look into the source code and check whether this stuff is working properly, whether the style is good, whether it's well documented, and such things. I often download and test it, but that's nothing that is required. Good practices from software development that are standard in industry are not in widespread use in science. It's getting better, but it's not where it should be.
Missing documentation is one part of that, because writing it is an effort, and if I know how my tool works, why do so? There is also the issue of discontinued development. For example, a PhD student is writing a tool and works on it for three or four years; then he or she moves on to do a postdoc, and the software stays as it is, and that's that. Somebody may want to use it later, and the libraries might change; we saw before that dependencies might change, and due to this the software will not be usable anymore. So the long-term availability is not guaranteed. Or the lab moves to another place, and the URL that is written in the paper is broken; nobody can access the stuff anymore. There is also the missing citability: we need to build a kind of reputation for this, and we need to cite software. This is still an issue. And why is this the case?
First, there is a lack of awareness and appreciation. Many people are just not aware: as I said, despite the fact that the first thing they do is turn on the computer and do some data crunching there, people are not necessarily aware of the relevance of software in research. And if they are aware, they sometimes do not have the skills; they would need a basic education for that. It's easy to click some stuff together in Excel, but making it all reproducible needs some skills, and not everybody has them. There is also a lack of time: you need to invest resources, time and money, to get this done. This is linked to the next point: there is a lack of incentives. If I apply for a job, nobody will look at my source code and say, "You wrote very nice code." They ask, "OK, where are your Nature and Science papers?" When it comes to promotion, it's not that writing good code is really appreciated.
Another thing is the lack of funding, especially long-term funding. Software is often a by-product of a project that deals with a scientific question, which is fine, but once the project is done, the PhD student has to leave and the software is discontinued because there is no money for it. Also, as already mentioned before, there is no real quality control, and people do not have to do this: if they want to publish a paper, they can go to a journal and publish without a proper review of the code and these kinds of things. And a mean thing is that you might even choose not to publish your source code because you do not want others to make use of it: it is a valuable resource, you gain your reputation by using it, and if you spread it, others can do the same thing. This also goes back to the incentives.
Luckily, there are numerous initiatives starting to address this. Some of you are working in this direction, and I'm very pleased to see you here.
Just to mention some of them; there are many more, and this is really a small selection. One, dating back to 1998, is Software Carpentry, which is a nonprofit organization that aims to educate researchers in the use of certain software tools, how to write code, and how to think about code. Are there any Software Carpentry instructors here? Anybody? Exactly. So we are basically trying to bring these computational skills to the people who need them most: scientists. Working in the same direction, the Software Sustainability Institute is a great example of an initiative that is trying to push the borders of software development in science. There are other initiatives like WSSSPE. The Free Software Foundation recently published a position paper arguing that software developed as part of publicly funded projects, for example projects funded by Horizon 2020, should be open source and free software. [unclear] There is SciForge, which is DFG-funded and pushing in this direction. And just recently closed was the DFG programme "Research Software Sustainability", which should promote software that is already in a certain state, either as a prototype or as a demonstrator. This was in April, I think. The budget is roughly 7 million euros and they had 130 applications, so there is quite a big interest in this. For the DFG this is a test for this type of project. The Helmholtz Association also has a task group which deals with access to and re-use of research software, and they organized a brilliant workshop last year where a lot of the German community came together.
There was a great exchange; this was really brilliant, and I liked it a lot. There is also something for research software engineers: there is now a new group, de-RSE. Stefan [surname unclear] is here, so if anybody is interested in this, talk to him; he is very pleased to share some info about it. And there are several more. If you're sitting in the audience and you're one of those, I'm sorry, but the space is limited.
But this is not what I would like to talk about. What I would like to talk about is the following. In 2008 the Alliance of Science Organisations in Germany started the so-called priority initiative "Digital Information". In this alliance all the big players are included: Helmholtz, Fraunhofer, Leibniz and so on, and the HRK (Hochschulrektorenkonferenz), which is basically the representation of the German universities. They are trying to bring fresh air regarding digital information into the scientific community.
Just to give you some background: the priority areas of the initiative are research data, virtual research environments, national licensing, national hosting strategies, legal frameworks, and open access. You might be aware of this initiative because of DEAL: they are doing the negotiations with publishers in order to come to an agreement. We will see. This is one of the activities they are rather famous for right now.
Since 2016, actually since the workshop mentioned before, there is now this ad-hoc working group "Research Software", which I am part of, and so are some people in the audience who are also involved in this. This is what I would like to talk about a little bit. As said, we represent the German research community, and we would like to bring awareness of this issue, software in science, into the institutions. Here is a complete list of them. All these organizations have two representatives in there, and we have meetings and try to compile recommendations.
We have rather diverse backgrounds: we are scientists, and we are also scientific infrastructure providers. With this we try to bring all the different perspectives into one room and try to find solutions that suit everybody. It's not easy; it's quite an endeavor, but it's a very good starting point, and I personally have learned a lot in there as well. Our modus operandi is rather boring: we are basically just a committee. The main idea is to bring ideas together, compile recommendations, and bring these back into the organizations, ideally generating something that they can read when they decide where to put the money later on, among other things that I will talk about in a second.
Our guiding principle as a working group is that good scientific practice also has to be applied to research software. Everything else follows from this assumption. The question now is: what does this mean? As I said in the beginning, this is not solved. This is something we need to discuss, and this is why I'm more than happy to be here to talk with you about these kinds of issues.
Some things were mentioned already before: it's reproducibility, it's controllability, it's transparency and so forth. We also need quality in software, and we should provide and ensure reusability of software. But first of all, what actually is software? We have a kind of working model, and in this working model we distinguish between three different types of software. This can be debated, and we had debates about it, but this is our current working model.
First, there are the small tools: the dirty R script that you quickly hack together, or maybe even, as said, the Excel sheet. Those are things that can be generated rather quickly and are then part of a dedicated data analysis process. Second, there can be software applications that were developed for a more general purpose, and these can also be seen as research output. Third, there are online services and infrastructure, where you do stuff online by clicking through a web interface and doing your analysis there.
It can also be that software started, let's say, as a small Python script, was then developed into a proper application, and at some point somebody says, "OK, I would like to make a web service out of this." All this is possible: there can be an evolution of a piece of software from type one to type three. One can even extend this to include virtual research environments or tools for exchanging data and so on, but that is optional.
It's very important that all these levels are relevant and need to be addressed, each in its own dedicated way. The exact needs and possibilities depend on the community you are working in, and this is something that must be discussed in each of these communities. We can only formulate rather general recommendations, and things might be very different: in my field, the life sciences, we have different ideas and different needs than somebody in climate research, for example. There are common questions, for example: what does reproducibility mean in my community or in your community? Do I need a bit-identical result, or is it sufficient to have something that gives roughly the same results? This needs to be discussed. Also, how long do you need to provide access to the stuff? We had this before: the DFG is asking for 10 years. But why only that? Doesn't it have to be longer? With every year that you put on top, the effort to keep this accessible grows, and so does your dependency on hardware. So it's not a trivial thing to think about, but it has to be addressed and discussed in each of the different communities.
So what do we actually want? What are our aims? [unclear] First, our working group, the ad-hoc working group "Research Software", would like to raise awareness for this issue. This is very basic, but it's essential: telling people that, yes, research depends heavily on software, and this is why it needs to be addressed. These are important issues.
We would like to have standards for software implemented in good scientific practice recommendations. If you are a scientist in Germany, you might be aware that the DFG has a nice booklet with good scientific practice recommendations. We would also like to introduce standards and mechanisms for quality control and make this part of the review process. We would also like to create institutional platforms, or motivate their creation, to make software, workflows and code publishable, and by doing so make it all citable, so that you can clearly tell what kind of software was used. This is then also linked to the reputation system: as soon as something is citable, it can help to build your reputation inside the community.
We also need to foster education in these computational skills. As said before, getting started, say in R, may be an easy transition, but for computational skills you might need proper teaching before you really get from zero to something productive. So we need teaching, and here again I refer to Software Carpentry. We might also need new career paths. We mentioned the research software engineers already, and even there it's not clear where to draw the distinction; we just recently had a discussion about that. We may need software librarians, people who come from a rather traditional field but are able to handle this. Data scientist might also be a new career path.
We also need to raise awareness about legal aspects: what kind of licenses can I use and should I use? Ideally, we make open source the default; it is good scientific practice that we have access to the source. We should also try to facilitate the transition from, let's say, small scripts to real services or applications that can be used by others. And, very importantly, long-term funding: we need new funding schemes to make software development sustainable over long time periods.
So there is a lot to do and a lot of open questions, and as I said, I'm glad to discuss them with you. We as a working group represent the German research community, but this should be solved not only on a national but on an international level. So I'm also pleased to see that there are people here not only from Germany but also from other countries. We need to address this together. So let's do this together. Thank you so much.