Keynote: Incentives and rewards in scientific software communities

Video in TIB AV-Portal: Keynote: Incentives and rewards in scientific software communities

Formal Metadata

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Why do scientists write and share scientific software? Writing and sharing scientific software is a means of communicating scientific ideas in order to find scientific consensus, no more and no less than writing and sharing scientific papers is. Important factors for successful communication are:

- agreeing on an open source license;
- technological solutions for sharing development;
- using a language that is understood by many;
- a system of peer review that fosters evolution and diversity;
- sufficient opportunity for contributors to be recognized as individuals, and rewarded.

This talk discusses all these items, mostly in the light of the ecosystem of R, a free software environment for statistical computing and graphics that is largely driven by academics.
So, thank you for this nice introduction. My name is Edzer Pebesma, and I am a professor at the Institute for Geoinformatics, University of Münster. I am very honored to be invited to give this talk; I didn't have to think long before saying yes. I have to say that this is the first time I give a talk about this topic. I am gradually moving from geoinformatics somewhat into data science topics, such as reproducibility, so some of the things I say might be strongly opinionated, or just my observations, or slightly historically incorrect.
I also have to apologize for my slides. I always tell my students to make visually appealing slides, but for this topic I had a hard time finding something interesting to show, so these are really mental notes of the things I want to go through. I apologize for that.
So, briefly, why am I standing here? I am an active contributor of open source software: I started contributing, publishing things under the GPL license, in 1997. I am an active developer and a member of the R community, although there are no official terms for when you are an active member of R or anything like that. I have been a professor at the Institute for Geoinformatics in Münster for about 10 years. I also hold a couple of editorships. One is for Computers & Geosciences, a geosciences journal that started in 1977, which may be why it has this funny name. The first paper it published was a 15-page paper, 5 pages of which were printouts of Fortran code. So the journal published a Fortran program as printed, scanned line-printer output: it was not text you could copy, so you had to type it into your computer to use it. I am also editor-in-chief of the Journal of Statistical Software, which is interesting in the context of this talk. It is an open access journal that has existed for 20 years and is completely independent of publishers; it has always been in the hands of its editors. It is open access and free of charge. I am trying to convince the other editors that we should allow authors to actually pay for publishing, but there is some resistance there. I am also PI of the funded project Opening Reproducible Research, and I'll say a few words about that. I also recently started doing some blogging and using Twitter, for the simple reason that data science does a lot of that these days.
But in general, why do we do these things? Scientists try, in a rough way, to separate facts from false claims and to find consensus about them, and they do this through public discourse. We go to conferences, we exchange ideas, we submit manuscripts and try to get reviewers to accept them, and in that way we share our views, hopefully in the most open sense. We also try to use and develop methods about which a shared understanding exists. So if I develop a method, I should really explain what it does. This is a slightly different concept from distributing closed-source software, where you sell things but the fine details are not revealed. That is why I always advocate open source software for science. It is basically a better principle; not necessarily better software, but a better principle. We strive for communication and for ultimate transparency, and communication is a key activity for scientists. Inventing new things is of course very good, but if we invent something and are not good at communicating it, it is hard to be successful as a scientist.
And success is something scientists like to have, like all people in life, I think. In some sense (and this is of course very opinionated) you could say that successful scientists are those whom other people listen to: scientists who obtain attention in some way. They obtain attention through publications, they obtain citations, they obtain research grants and so on. In this way they build some kind of reputation, which you could argue is a kind of cumulative attention over the span of a career. How do we measure these things? In job selection procedures we look at things like the number of citations somebody obtained, the h-index and so on. We do acknowledge that this is very partial and very limited, but it is something we take into account, along with everything else. So these are important things.
But look at the drawbacks of these measures: which papers are the most cited in general? There was a very nice study by Van Noorden et al. in Nature, I think a year ago or so, that looked at the most cited papers and at what they have in common. The top one, I think, is "Protein measurement with the Folin phenol reagent", which seems to be the standard way to measure proteins; I'm not familiar with it. So the papers that obtain a lot of attention and a lot of citations tend not to be those describing primary discoveries or scientific breakthroughs. Rather, they are often about tools that everyone uses. It's not a rule, but it happens a lot. It is a particular method, or a particular tool used very widely: something that may be relatively trivial but that everyone has to do first, that was made convenient for everyone, and that everyone uses. So software tools, and software papers, do well in citations.
So software makes things possible and helps things be understood. Interestingly, the Journal of Statistical Software publishes only papers about software. A method may have been published somewhere else; you implemented it in software, and the software is the subject of the paper. Two years ago or so, this journal ended up with an impact factor of 5, higher than all other statistics journals. So statisticians may think: there we are with all our fancy methods and theorems, and a funny software journal ends up at the top. That is interesting, although it doesn't mean much. As always, the distribution is very skewed: it is one or two papers that really do this.
So what is R? Who here has ever used R? Oh, quite a few people. Would you consider yourselves R developers, having written R packages? Good, some of you, so I'm not just talking for nothing.
First, some history. The ideas behind R came out of the language S, which was developed at AT&T. That is the place where, I think, UNIX, C and C++ were also developed, in the same period. S was an environment, and it is now considered a language. It is a domain-specific programming language developed for interactive data analysis. By contrast, C and C++ are considered general-purpose languages, with which you can do everything, including writing operating systems, which you are not going to do with S. AT&T was sort of a research institute, but S then became more commercial: they started selling it and called it S-PLUS. Then AT&T was broken up, and S-PLUS went to Lucent Technologies, later to Insightful, and later on to TIBCO, an analytics company that bought Insightful and has marketed it since, at least for some time. So they kept developing it; I think that started in 1987. At that time S was closed source, and S-PLUS was something you had to buy.
Before that, in the seventies or so, I think you basically bought a computer and the software came with it; this whole business of marketing software came later. In the early days you would buy, say, a PDP-11 or something like that. A whole lot of software would come with it as source code, you would get a C compiler with it, and you could run it. So marketing software was kind of a new thing, and it developed along with the rise of Microsoft, which did it very successfully.
Then came R. The name R is a kind of wordplay. It may be a play on the letter S, or it may be because the two authors who started R, Ross Ihaka and Robert Gentleman, both have first names starting with R. They said: let's see if we can create something that does the same thing as S, but is open source. People hopped onto it, and, interestingly enough, in the early 2000s it started to gain traction and became a really stable and useful system. The majority of users, definitely those in academia, started saying: we don't need S-PLUS anymore. At some stage S-PLUS only ran on Windows, and this made people switch to R. Now, interestingly, TIBCO, the inheritor of S-PLUS (I'm not sure it is still marketed), has developed its own closed-source code base. They call it TERR, TIBCO Enterprise Runtime for R, or something like that. It is an R implementation that is completely independent of the open source code base. Commercial companies, for instance, can use it without the constraints of open source licenses.
In parallel with this history, there was the history of sites like netlib and StatLib. Is anyone old enough to have heard of those? They were FTP sites where you could find all kinds of things that other people were happy to share: source code, open source. The term "open source" had not been invented by then, though, and it was not clear what the licenses were, or whether there were any at all. I remember that in those days FTP was so slow that you could connect to the machine and get a directory listing and a description of how things work, but you wouldn't download files. A file would be 50 kilobytes or something like that, which was way too large. Instead, you would follow instructions: send an e-mail to a particular address at the repository, asking, in a form the server could handle, "send me this file by e-mail in chunks". The server would then send the pieces, each of 5 kilobytes or so. You would collect all these pieces, put them together and run the shell script, which unpacked the source in a new location. That worked very well. You did it in the evening, and the next day you would have all these e-mails, like 20 of them, and put things together. This is how software was distributed.
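At its core, the e-mail scheme described above splits a file into numbered pieces and joins them again in order. The Python sketch below illustrates only that idea. It assumes the roughly 5-kilobyte piece size mentioned in the talk, and the function names are hypothetical. The mail servers of the time used their own tools and formats, which are not modeled here.

```python
from __future__ import annotations


def split_into_pieces(data: bytes, piece_size: int = 5 * 1024) -> list[tuple[int, bytes]]:
    """Cut a file's contents into numbered pieces, one per (hypothetical) e-mail."""
    return [
        (number, data[start:start + piece_size])
        for number, start in enumerate(range(0, len(data), piece_size))
    ]


def reassemble(pieces) -> bytes:
    """Join the pieces by their sequence number, whatever order they arrived in."""
    return b"".join(chunk for _, chunk in sorted(pieces))


# A 50 KB "source archive", roughly the size mentioned in the talk.
archive = bytes(range(256)) * 200
pieces = split_into_pieces(archive)
print(len(pieces))  # 10 pieces of 5 KB each

# E-mails may arrive out of order; sorting on the piece number restores the file.
restored = reassemble(reversed(pieces))
print(restored == archive)  # True
```

The sequence number carries the ordering, so the receiver does not depend on the order in which messages arrive.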
Many of the things that were around then turned out to be useful and are still available in R. Think of the basic linear algebra libraries, like BLAS or LAPACK, or code for one of the many regression methods. Later on came R. R advertises itself on its website as "a free software environment for statistical computing and graphics", full stop, and says: "It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS." "Free" refers to the license. It is not just that you don't have to pay for it; it is free software, open source software. It is for statistical computing: dedicated software for that, including graphics. And it is cross-platform, which is useful. Being cross-platform takes a lot of work: in an environment for scientific computation, it really comes down to what you do with numbers. Making sure that works exactly the same way on Windows, on Mac OS X and on different Unix flavors requires a lot of work.
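The point that cross-platform work "comes down to what you do with numbers" can be illustrated with a standard example (added here; the talk gives none). Floating-point addition is not associative. So two builds that sum the same values in a different order, for instance because they use a different compiler or linear algebra library, can produce results that differ in the last digits. A minimal Python sketch, assuming IEEE 754 double precision:

```python
# IEEE 754 double-precision addition is not associative:
# the grouping (i.e. the order of summation) changes the rounded result.
a, b, c = 0.1, 0.2, 0.3

grouped_left = (a + b) + c   # sums 0.1 and 0.2 first
grouped_right = a + (b + c)  # sums 0.2 and 0.3 first

print(grouped_left)                   # 0.6000000000000001
print(grouped_right)                  # 0.6
print(grouped_left == grouped_right)  # False
```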
So what is R, and what is the R community? It started with two people and grew over 15 years, first with statisticians, I think, but then also computer scientists. These were people who like to program and enjoy it, and who were interested in the question "can we do this?". We have to think about the original goal of S, which was basically to interact with data programmatically. Statistical computing in the seventies and eighties was mostly done in Fortran: you had to compile your program and then run it. S tried to change that. If I am a scientist and have data, I want to interact with it and try out different models, without being in programming-and-compilation mode. That was the original goal, and R inherited some of it.
So I think it evolved from a group of people using, or at least familiar with, S-PLUS and extending it (distributing S-PLUS extensions) into a group of people who believed they didn't need S-PLUS anymore. It is statistics-oriented, data-analysis-oriented and domain-agnostic, so I think it is used across all empirical sciences as a way to do data analysis. As I said, it started as a research project, as many things did. Linux, for instance, also started like that: can we do this? You try.
A strong factor in the success, first of S-PLUS and later of R, is that it is extensible. You can very easily extend the R system. Once you have programmed a few lines, you can easily wrap up what you developed and distribute it as an R package. That is the key to its success and to where it is now; I would say it has become a dominant environment in statistical computing. R packages are basically software libraries: they contain software, documentation, complete examples and tutorials, for all kinds of purposes. We can add methods, we can augment R with classes for particular purposes, and we can make interfaces. These can be interfaces to whatever web services exist today, or to other programming languages, so that we can connect to the complex world we have right now.
Then there is CRAN, the Comprehensive R Archive Network. It is a set of servers that distributes the R source code, binary distributions of R, and all the packages that people have submitted. If you have written a package, you have seen this for yourself, because you also submitted it there. A couple of checks are done, such as: is the documentation complete? If you have a function that you expose to users, you have to document it, and so on. If the package passes, it is accepted. Nobody checks whether the software is correct, or whether the method is of value to the community. Nowadays there are over 10 thousand R packages on CRAN, made by something like a thousand authors. So a lot of people are doing things: this is a really large community of developers doing all kinds of things that are potentially useful for scientists. There is another R package network, used by the biostatisticians, called Bioconductor. So CRAN is not the only one, but it is the one I want to focus on.
CRAN only accepts R packages in source form; nothing binary goes in. If you have a C++ binary that you would like to bolt on in some kind of binary form, that is not going to work; it is purely source code. A package should clearly state which license it uses. CRAN also keeps an archive of all versions. People develop things further, or have to adapt to developments elsewhere, and all accepted versions are archived. So if I did something 5 years ago with a package, I can go back and find the package in the state it was in back then; that version is still there. CRAN also does a job for developers: it compiles binary packages.
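To go back to the version of a package that was current at some date, you have to compare version numbers. According to the Writing R Extensions manual, an R package version is a sequence of at least two non-negative integers separated by "." or "-", so versions must be compared numerically rather than as text. The Python sketch below works on a hypothetical list of (version, release date) pairs. It does not read CRAN's actual archive, and the versions and dates are made up.

```python
from __future__ import annotations

import re
from datetime import date


def parse_r_version(version: str) -> tuple[int, ...]:
    """Split an R-style version such as '1.0-26' into integers: (1, 0, 26)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def version_as_of(archive: list[tuple[str, date]], when: date) -> str | None:
    """Return the highest version released on or before `when`, if any."""
    candidates = [
        (parse_r_version(version), version)
        for version, released in archive
        if released <= when
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical archive entries for one package (not real CRAN data).
archive = [
    ("1.0-9", date(2015, 1, 10)),
    ("1.0-26", date(2016, 1, 5)),
    ("1.1-0", date(2016, 6, 1)),
]

# Compared as plain strings, "1.0-26" would wrongly sort before "1.0-9".
print("1.0-26" < "1.0-9")                        # True (wrong for versions)
print(version_as_of(archive, date(2016, 3, 1)))  # 1.0-26
print(version_as_of(archive, date(2014, 1, 1)))  # None
```

Turning each version into a tuple of integers lets Python's ordinary tuple comparison stand in for numeric version comparison in this simplified setting.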
if you're developer and do things with 4 Clausius steepest persons like that uh you know that this is a nightmare going from that going from Windows Linux and so on that all this kind of things but it's also taking care of you baby do
things on your class form and then you submitted them and context where dissenting also works your platforms and this gets binary and so I don't have to run Windows my machine which is which is a good thing
except if they run into trouble then I have to solve right it's not like if there can bring about new problems that they are active notice also that the people are people are helpful they're on your results vary volunteers you also know that we have touch um binary packages for Windows and MacOS X contain our completely
self-contained so if I want to sort of links to an over the last library or something like that that is on my computer I can't tell them you know in salivary and see if it works you know it has to be something so it will not the the bit binary distribution of those not make assumptions about all kind of things that you have or have not your system it only assumes that you run the predictive version
of Windows or Mac OS and so if there's a big library involved then is basically and that is a little bit of you know against system administration principles of of dynamic linking but it's very convenient because you really know what you get and you noted everyone gets the same thing so you can't get into troubles that somebody has an older version of a library with which some things will not work no because it's all in there so this seems like a small detail but it sort of really helps reproducibility across different platforms to finding out problems um of course then there's this is software so is all they now making sort of a contrast to scientific publications we just basically once it's published a static uh is about software development so there are changes in the subtext package might have may have to update the packets and packages may change and if then things don't work anymore then basically give put aside right so they're not
like completely remove that have put aside and and there are no longer feasible often wonderful but then also all and
responsive also the prospect has to become more often and may be adopted to other experiment so let's see if we can see
Here is CRAN. It's basically a website, and you can see that these people are more concerned with content than with looks; they are not really worried about design. On CRAN I can download R itself, in different versions, and I can look at the contributed extension packages: you can see there are 10,581 available packages. I can look at the list, but a list of ten thousand packages is not very easy to use, so there are also human-written indexes to packages, overviews, and so on. If I go to a single package, say the one I just had here, you basically get the landing page for this package, and you see that there's a lot of metadata that is part of the package submission. There's a description; a package has a version; it might depend on certain things; it might import other CRAN packages, and we can see these, and they have links, so I can go to their landing pages. Then it may suggest other packages that are needed, for instance for the examples, but are not strictly necessary. Then there are authors, and authors have an e-mail address; they have to be persons, this is required.
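The fields on such a landing page (version, dependencies, imports, suggestions, authors) come from the package's DESCRIPTION file, which uses the Debian Control File (DCF) layout of "Field: value" lines, with indented continuation lines. The Python sketch below only illustrates reading such metadata and is not CRAN's own tooling; the package `examplepkg` and all of its field values are hypothetical, and version constraints such as `(>= 3.0.0)` are simply discarded.

```python
import re

# Hypothetical DESCRIPTION content: the field names follow R's conventions,
# but the package and its values are invented for illustration.
DESCRIPTION = """\
Package: examplepkg
Version: 1.2-3
Title: An Example Package
Depends: R (>= 3.0.0), methods
Imports: stats, utils,
    graphics
Suggests: knitr
"""


def parse_dcf(text):
    """Parse DCF text into a dict; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()  # continuation line
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def package_list(field_value):
    """Split a dependency field into names, dropping '(>= x.y)' constraints."""
    names = []
    for item in field_value.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name:
            names.append(name)
    return names


meta = parse_dcf(DESCRIPTION)
for field in ("Depends", "Imports", "Suggests"):
    print(field, package_list(meta.get(field, "")))
```

Depends and Imports are the fields whose packages must be installed for the package to work, while Suggests lists optional ones, which is why the sketch keeps the three apart.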
And there might be links to the places where you can submit bug reports and where the source code is maintained; this one happens to be on GitHub. What you also see is the citation field: this package has a citation file, and if I go there, you can see that there are two publications, one from 2004 and one from 2016, that describe functionality of this package and that people are invited to refer to. Then there is a task view; task views are these human-written indexes that give you an overview. Here is one about the analysis of spatial data, and this package is mentioned in it. Then there is a reference manual, which contains all the commands, and there are vignettes, which are basically tutorials that explain things. Then there's the source code, the Windows binaries and the Mac binaries, and the old sources: the archive has all the old source code versions, you can look at the dates when they were published, and you can basically go through all of them. Of course the code is also kept in some kind of source code versioning system, but this is basically a frozen archive of all CRAN publications, so all of that is there.
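Because the archive keeps every released version, ordering those versions correctly matters. R package versions are sequences of non-negative integers separated by "." or "-", so plain string sorting gets them wrong. A minimal Python sketch, assuming a hypothetical list of archived version strings for one package:

```python
import re


def version_key(version):
    """Split an R-style version such as '1.0-15' into integers for comparison."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


# Hypothetical archive listing of one package's source versions.
archived = ["1.0-9", "1.0-15", "0.9-8", "1.1-0"]

print(sorted(archived))                   # ['0.9-8', '1.0-15', '1.0-9', '1.1-0'] (string order)
print(sorted(archived, key=version_key))  # ['0.9-8', '1.0-9', '1.0-15', '1.1-0']
```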
And the metadata is machine-readable. Packages depend on one another, and this is an interesting phenomenon. There are packages, for instance, that provide basic infrastructure, like doing time series analysis, dealing with spatial data, dealing with omics data, making pretty plots, providing modelling frameworks, or accessing web services; there are a lot of packages that give access to a popular analysis method, or that read and write from databases. You don't want to reinvent those wheels; you want reuse, and so there's an enormous amount of software reuse. Interestingly, if I write a package A today that depends on your package B, then of course I make myself vulnerable to the extent that B has to work. The same holds for R itself: if R changes dramatically, I have a problem. But if I look at history, it's quite unlikely that R will do the same thing that Python did, for instance going from Python 2.7 to Python 3, where small changes essentially created a new language. R has never done that, and it's very unlikely that it will. But if I make my package depend on things from another package, then of course my fate is in the hands of the author of package B: B might change, B might fail in some states, the author might stop maintaining it, and all these kinds of things. CRAN lists reverse dependencies, so, as you just saw, my package is reused by other packages, and there is also programmatic access to the dependency graph.
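Reverse dependencies can be sketched as inverting the dependency graph. In R itself, `tools::package_dependencies()` has a `reverse` argument for this; the Python below is an independent illustration over a small hypothetical graph (the package names `pkgA` to `pkgD` are made up), showing both the direct users of a package and everything that would be affected, transitively, if it changed.

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: each package maps to the packages it depends on.
DEPENDS = {
    "pkgA": ["pkgB"],
    "pkgB": ["pkgC"],
    "pkgD": ["pkgB", "pkgC"],
    "pkgC": [],
}


def reverse_graph(depends):
    """Invert 'X depends on Y' into 'Y is used by X'."""
    rev = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            rev[dep].add(pkg)
    return rev


def reverse_dependencies(depends, target, recursive=False):
    """Packages depending on `target`, directly or (if recursive) transitively."""
    rev = reverse_graph(depends)
    if not recursive:
        return sorted(rev[target])
    seen = set()
    queue = deque([target])
    while queue:  # breadth-first walk over "is used by" edges
        for user in rev[queue.popleft()]:
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return sorted(seen)


print(reverse_dependencies(DEPENDS, "pkgB"))                  # ['pkgA', 'pkgD']
print(reverse_dependencies(DEPENDS, "pkgC", recursive=True))  # ['pkgA', 'pkgB', 'pkgD']
```

The recursive variant is what matters for the vulnerability described above: a change in `pkgC` reaches `pkgA` only through `pkgB`.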
This is not my own graph; it is one I found, published on Twitter by someone who did a network analysis of the ten thousand CRAN packages. The bioinformatics packages are not here, because they are not on CRAN. You see several centres: the packages that work with C and C++, the language interface; this is Modern Applied Statistics with S, the body of packages around the book that appeared very early in R's history; this one is about web services; and this is the spatial one, where the package is that I'm responsible for. It is being reused a lot and does a lot in the spatial domain; not everyone uses it, but it tends to solve a lot of problems, and it came out about 10 years ago. So this is one way to look at how scientists, or scientific software developers, work together and reuse each other's work.
Reproducibility is something I have always had in mind when thinking about scientific computation. It's an important aspect of scientific research, because the credibility of science is at stake if research is not reproducible. So if you want to do reproducible research, you need to use open source software, so that you can understand exactly what is going on. But the scientific community is quite aware of reproducibility, as they are kind of sceptical: they don't believe it easily if somebody has a new method; they want to see details. One can say that methods have arguments, and arguments involve decisions, so we need to get to the gory details of the methods and how they work. One can also argue that this is about science, not about engineering: in engineering you can get away with closed source software. In principle, using R packages together with data files is a way to secure this. The question is, of course: is that enough? Is providing data files and scripts that do the analysis, and maybe saying which versions of R and packages I used, enough or not?
A question is then, of course: will the script still run 10 years from now? The DFG basically requires this of German scientists, and there are other funding schemes that do the same thing: if you finish research that is funded by them, then 10 years after that you should still be able to make sense of it. How we can do these things is what we are actually working on in practice, in a project called Opening Reproducible Research, funded by the DFG. We look at the question: can papers be made executable? Of course you can do that, but you need to make some assumptions about how you do it, and it's difficult to do for a whole scientific field, so we focus first on R, because that is relatively straightforward. The question is then: how can you encapsulate the software and procedures into environments? We do that with Docker containers. How can you integrate this into the publication cycle? One thing is that you tell authors: you can upload a PDF, but where do you put the data and the scripts? We look at that, and at later users as well. Also: how can a library offer services for validating and archiving executable research compendia, and which interactions can scientists have with executable research papers? If you do this, you can of course offer much more than just reproducing the paper: you could also say, here's a graph; let me change a parameter and see how the analysis would change. And of course scientists should primarily say which kinds of interaction make sense with their work, and for what. So we are actively developing the prototype.
With an eye on the clock: I have 10 minutes? The talk ends at 12, right? OK, so I'll try to finish in about 3 minutes. I'll now go through a number of the items that you mentioned when organizing this workshop. One of them is sustainability of software: the question is, will R and its packages still be around 10 or 20 years from now? Well, you can only look at how things have developed so far and at the people who are involved. It is for the largest part academic people who are involved, not people with commercial stakes who would be able to sell the whole show.
There are thoughts about who owns the copyrights, and other legal aspects: how do we manage the licenses across the whole procedure? There are conferences; there are increasing numbers of submissions to dedicated journals on software, such as the R Journal; there's an increasing number of books published; and there is now the R Consortium, which acts somewhat like the Linux Foundation: it brings together companies that are interested in R and want to fund and support local R user groups and community infrastructure projects. And we see the rise of data science, some kind of new thing where we still need to find out what exactly it means, and software environments like R seem to play a role in it, or seem to fit relatively well.
As I mentioned, there is a citation process in R, and this is, I think, one of the reasons why a lot of scientists are interested in contributing: you can advertise your papers, but there is also the full citation. If I have a package that does not have a paper, for instance a user package, then it doesn't have a citation file, and R generates a default citation: it lists the authors, the title, the version, and the URL of the landing page. So a citation is never missing; any package you use, you can cite. Interestingly, people do that. I think we should do it; it's a good thing.
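The fallback citation can be illustrated as building a reference purely from package metadata. The Python sketch below assumes a hypothetical package (`examplepkg`, its authors, year, and URL are invented) and a simplified output format; R's own `citation()` formats its entries differently, so treat this as an approximation of the idea, not a reproduction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackageMeta:
    """Hypothetical subset of DESCRIPTION metadata used for a fallback citation."""
    name: str
    title: str
    version: str
    authors: List[str]
    year: int
    url: str


def default_citation(meta: PackageMeta) -> str:
    """Build a plain-text citation from package metadata alone (approximate format)."""
    if len(meta.authors) > 1:
        authors = ", ".join(meta.authors[:-1]) + " and " + meta.authors[-1]
    else:
        authors = meta.authors[0]
    return (f"{authors} ({meta.year}). {meta.name}: {meta.title}. "
            f"R package version {meta.version}. {meta.url}")


pkg = PackageMeta(
    name="examplepkg",
    title="An Example Package",
    version="1.2-3",
    authors=["Jane Doe", "John Roe"],
    year=2017,
    url="https://example.org/examplepkg",  # on CRAN this would be the landing page
)
print(default_citation(pkg))
```

Because every field comes from metadata that submission already requires, such a citation can be produced for any package, with or without an accompanying paper.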
And the citations are actually being picked up by Google Scholar, and they end up in your track record. So although they are not formally journal papers, they are considered as published software, and Google Scholar picks them up, so that's good. Software papers are still a question mark: how do you describe and publish software? There are no strict guidelines. There are different journals, like the Journal of Open Research Software, the Journal of Open Source Software, and the Journal of Statistical Software; they all do it differently and have different assumptions. There are also trends in R programming practice that I will not really go into; things are dynamic, people are moving from Subversion to GitHub, using Twitter, and so on. Then there are software sharing and legal aspects: there is a whole set of legal guidelines that you have to comply with when you submit packages to CRAN. There's a lot we can look at there, but ownership and copyright are sort of taken care of. So, concluding:
the R community is, at least from my perspective, a healthy and growing community that meets a lot of the demands that scientists have. It stimulates working reproducibly by offering a sustainable infrastructure. There are, like everywhere, tensions between progressives and conservatives, but these tend to work out pretty naturally. There's still a lot to do to make scientists share data, scripts, and workflows along with publications, and to make them properly cite the software they use; this is an important thing, and editors and reviewers have to take care of it. We also hope that scientists will increasingly write better software, because a lot of scientists are self-made programmers, with all the consequences of that. We now try to address a lot of these challenges at the educational level, with my bachelor and master students: we ask them to hand in their assignments reproducibly, as data, a script, and a document that puts them together, like IPython notebooks, and we use standard tools such as R Markdown to develop things.

