Workflows for assigning and tracking DOIs for scientific software

Video in TIB AV-Portal: Workflows for assigning and tracking DOIs for scientific software

Formal Metadata

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Thank you for the introduction, thank you for the invitation, and a special thanks to TIB, because obviously DataCite would not exist without TIB, one of the founding members that got this going, first as a German DFG-funded project and then internationally when DataCite started in 2009.
I have the luxury of speaking toward the end of the day, so a lot of what was said today has laid the groundwork and I can be quick on a few slides, in particular after Dan's presentation.
What I will try to do is talk about how you would actually do all this if we agree on the principles, and usually that is the difficult part. I will talk about a few issues and a few solutions without going into much detail. It is very much work in progress, because people are going in different directions and there is no community agreement yet on how we do this; probably there will be multiple ways to cite software. My focus, of course, is on those that use DOIs as an integral part.
The dog on this slide is a Lagotto, an Italian breed used to find truffles, and I will get back to it. I got this picture from a colleague some of you might know, who was a data scientist.
I just want to start with an xkcd comic, that is always good. This one is about good code, and you see that in the lower right corner you sort of start again: what is good code?
More specifically for scientific software, this is actually already from 2010, from Greg Wilson, who started Software Carpentry: there is common agreement that scientific software is important, and we have heard this many times today, and there is also common agreement that maybe not all of the best practices for writing software are followed for scientific software, including things like formal training and testing, just as examples.
If you go to a Software Carpentry class, and I am not involved in this, one of the first things you learn is that if you write code, you use version control, independent of what you do.
This is my personal history of version control systems. It is just to remind everyone that things change: every five or ten years we use something else again.
Once you have done the right thing and put your code under version control, the next step is to make it publicly available, and not just keep it on your hard drive or in some system that is closed. I am very much assuming open source software here; that is my focus.
Most of this applies to closed source as well, but there are additional things to consider. This is obviously the GitHub Octocat as a scientist. I heard several times today that GitHub is commercial and can go away at any moment, so we should run GitLab at our institutions, and I very much disagree with that. As far as public source code is concerned, GitHub is the most popular place right now, and there are a lot of advantages in having that. If in five years 50% of the code is in a hundred thousand places, findability is bad, and sustainability is probably also much harder to achieve in a thousand places than in one place. GitHub is private infrastructure, and we all use private infrastructure as well, so it is not perfect, but I think it is a very good solution.
Before we do something else, we have to be very clear about what exactly is missing, and I will talk about a few things that are obviously missing in a moment.
Of course, once you make your source code publicly available, there is other stuff you can build on top of that: testing, continuous integration, and so on, as in this example. All these things can be built on top of publicly available source code. This is of course happening, and it improves the quality of the software. Here is one example, Lagotto again, which is a piece of software I started working on a while ago.
There are two big things that are missing in the GitHub world, and I think also in the Bitbucket world and similar commercial source code places. What I am not going to talk about is software registries; we heard about one example this morning. Those are about a specific language or a specific community, and they are more about publishing, compiling, etc. That is a very important part, but there is just not enough time to go into this as well. Obviously all these places also start from source code repositories, whether it is CRAN, PyPI, RubyGems, and whatnot.
So the first piece that is missing is metadata. I do not mean metadata in general, because source code repositories have tons of metadata, but the metadata we care about. That already starts with simple things like authors: very simple things like real names instead of user names, and also the mapping of who is an author, which is not simply the list of people who have made at least one commit to the source code repository. And then it goes deeper, in particular with linking to other scholarly things, as we heard earlier.
A perfect example of something that is important to us, but that nobody cares about in a general source code repository, is funding information: I wrote this software with this funding. There is no metadata for that in a general source code repository.
The other part obviously is archiving, because we are all aware that source code repositories are made for working with software; that is their focus. Not only is it possible to delete everything whenever you want, we also do not know whether they will still be around five years later.
I was talking about metadata, and in particular when we talk about citations, it is about linking things together. This is the Research Graph project, from the Australian National Data Service. You have the researcher in the middle, you have publications, grants and other records in there, and you have relations between them. It tries to model this in metadata, so that you know, for example, what software was produced by the researchers at your institution in the last year, with what funding, in collaboration with which other institutions, etc. That is something we all want for papers and data, and obviously also for other research products.
Take this date. I do not expect you to know what happened on this date; it is a friendly reminder that this is when GitHub started. It is everywhere now, but it is actually only eight years old, so if you think more long term, we do not know what we will be doing with source code in eight years.
And this is just a reminder that this is a real issue: Google Code decided to close down in 2016. If you read the fine print, your source code is still available somewhere, but it is not a place where you can work. If that happened to GitHub, which is probably more popular now than Google Code ever was, if for whatever reason they closed down, that would create a lot of problems way beyond our own community. And that is another reminder: if GitHub went away one day, it would basically be as if the Internet went down, because everything depends on it; it is infrastructure. For GitHub, unfortunately, science is only a tiny piece.
That is actually my main concern with GitHub: they do not care about science enough. The person who made this possible was Arfon Smith, the first author of the principles paper, and he has left GitHub, so this effort has gotten smaller. Dan talked about this already.
So since the source code repositories do not give you long-term archiving and do not give you the metadata you need, you have to go somewhere else, and the solution that is most widely used, from our perspective, is using Zenodo, with Zenodo minting the DOIs.
These are the numbers. Because Zenodo gives you DOIs for releases, these are not numbers of source code repositories but actually numbers of DOIs. The figure for 2017 is a projection; we are now at a little over 4,000, so you see nice growth. What you can also say is that these are still small numbers.
Zenodo has 75% of the roughly 26,000 DOIs for software. The rest is mostly Purdue, which has been doing this longer than anybody else, and then there are very small numbers for everybody else.
That is interesting, because this has been around for a while and there is nothing Zenodo-specific about it: every repository could do exactly the same, but it has not really happened.
This is an example from yesterday: something was released on GitHub as version 2.37, pushed to Zenodo, archived, and given a DOI. This type of software is used for crowdsourcing projects, for example for ecological data, so it is an interesting piece of software, and it is quite popular.
Another example is again Lagotto, the software I bring up all the time as an example because I have control over it.
This is actually from my own ORCID profile. The Lagotto software has a DOI because it is part of this workflow, and linking it to ORCID is very easy with the tools DataCite provides: you just push it to your ORCID profile. That is one place where I control the list of things I have done, and of course this works for everything that is in Zenodo or in other places.
Which brings up the interesting question: what is scientific software? I was thinking about this while preparing the presentation. Nobody really touched on this today, probably because it seems obvious, but it is not so easy. It is not that all scientific software uses a particular language or a particular way of doing things. But for the things I talked about in the last five minutes it is important: should you give DOIs to random software? Do you care whether anybody is using it in six months? Do you care about archiving it? So I think scientific software is a subset of all software.
It is also an interesting question if you want to do research on, I do not know, quite simple things: which languages are the most popular today versus five years ago for writing scientific software in astronomy? That is a simple question, and it is not so easy to answer. Maybe it is for astronomers, because they have such a centralized infrastructure, so I should rather take ecology, but you get the idea.
One way of defining scientific software uses something that is familiar to people who program, the concept of duck typing. Instead of saying this thing is of whatever class, you say this thing can do that thing. The idea has a long history that goes back before programming: if it swims like a duck, looks like a duck and quacks like a duck, then it is a duck. For software this means: if somebody cites it in a paper, or somebody assigns it a DOI, or somebody uses it in some other scholarly way, then it is scientific software. That is much easier than, for example, what one very nice service does that tries to track the impact of scientific software: what they have done instead is mine documentation and source code for keywords that are scientific.
So there are other approaches you can take, but I like this one because it is very easy.
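
The duck-typing idea can be illustrated with a minimal Python sketch. The first part shows duck typing itself; the second applies the same reasoning to software records. The record fields `cited_in_papers` and `doi` and the DOI value are hypothetical names chosen for illustration, not part of any real schema.

```python
class Duck:
    def quack(self) -> str:
        return "Quack"


class Robot:
    # Not a Duck subclass, but it offers the same behaviour.
    def quack(self) -> str:
        return "Beep"


def make_it_quack(thing) -> str:
    # No isinstance() check: anything that can quack is accepted.
    return thing.quack()


def is_scientific_software(record: dict) -> bool:
    """Judge software by how it is used, not by what it is.

    Assumption for this sketch: a record counts as scientific software
    if it has been cited in at least one paper or has a DOI.
    """
    return bool(record.get("cited_in_papers")) or bool(record.get("doi"))


if __name__ == "__main__":
    print(make_it_quack(Duck()), make_it_quack(Robot()))
    print(is_scientific_software({"name": "example-tool", "doi": "10.1234/example"}))
```

The point of the design is that no central list of languages, keywords or categories is needed; the classification follows from observable scholarly use.
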
Easy is something I want to get to in the last few minutes. I do not know who knows Rich Hickey; he is the author of the Clojure language. Even if you are not writing Clojure code, he is worth listening to.
He is a great speaker and talks about things that go beyond the specific language. Something that is important to him, and that is the topic of one of his well-known talks, is simplicity.
The workflow that we have with GitHub, Zenodo and then the downstream things is, when I talk with people about it, still considered too complicated by some, but I think we have a nice workflow.
It can be improved. One example: if functionality is missing in GitHub, you have integrations, and you can do all kinds of crazy things with them. There is a place where you can find all of these. If you search there for archive, there is nothing. But what stops other organizations from providing archiving services for GitHub repositories, which is what Zenodo is doing?
There are a lot of other things that could be done this way; it does not all have to be in Zenodo.
As was mentioned in a previous presentation, you can of course also take a different approach to archiving software, the one taken by the Software Heritage project, which is somewhat similar to what the Internet Archive does for web pages. Somebody else is doing the archiving and I do not have to worry: if my code goes away, at least there is a copy.
Another issue, which is actually not quite so simple: if you think about it, you have software and in five years you want to execute it. This goes beyond having the source code; you need the environment. Code Ocean is an organization that focuses on that. This is an example of R code that is relatively straightforward, but the general idea is that they use Docker containers and keep them around, so that if you want to re-run exactly the same thing in two years, you can do this with the service they provide. They use DataCite DOIs. In this example here, the work actually led to a publication, so there is the DOI for the source code and a different DOI for the publication, written up as a research note.
Now I want to get into what using DOIs enables. This is just a simple example. If you search DataCite for everything that is software with the keywords variation and cancer, which is what this publication was about, you find this Code Ocean DOI, and there are links, metadata, etc. So with the DOI model, if you give something a DOI, then you can search everything that has a DOI for keywords in the title, in abstracts, for authors, etc. This is not perfect, because you see that the author here is not a person but the service, so the actual authors get lost in this case. But I think that is just a learning curve.
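
A search like the one described above can be sketched as a URL for a metadata search API. This is only a sketch under assumptions: the endpoint `https://api.datacite.org/dois` and the parameter names `query`, `resource-type-id` and `page[size]` are taken as given here and should be checked against the current DataCite API documentation. The code only builds the URL and makes no network call.

```python
from typing import Sequence
from urllib.parse import urlencode

# Assumed endpoint of the DataCite REST API (verify before relying on it).
DATACITE_API = "https://api.datacite.org/dois"


def build_software_search_url(keywords: Sequence[str], page_size: int = 25) -> str:
    """Build a search URL for software records matching all keywords."""
    params = {
        "query": " AND ".join(keywords),   # all keywords must match
        "resource-type-id": "software",    # restrict results to software
        "page[size]": page_size,
    }
    return f"{DATACITE_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_software_search_url(["variation", "cancer"]))
```

Fetching the URL (for example with `urllib.request.urlopen`) would return JSON in which titles, creators and related identifiers can be inspected, which is where a missing personal author of the kind mentioned in the talk would become visible.
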
This is Bolognese. Naming software is really hard, so at DataCite we are just using Italian dog names.
I think if we continue we will have to switch to other animals, because there are not so many of them, only about 30 or so.
Bolognese is a library that I wrote for the conversion of metadata. We actually launched our new content negotiation yesterday: you have a DOI, but you want the metadata behind the DOI in different formats, for example in RDF or other things. That is what is going on here, and this is the library that we use for that.
You can see that the library itself has a DOI. Because we are DataCite and we like DOIs, we have to eat our own dog food and think about how we can move this forward. Of course we could have used Zenodo, but maybe we can do something else.
If you look here, you see a JSON file called codemeta.json. That is a JSON file that has all the metadata we need to mint a DOI, and that is a workflow we can automate. That is how we did this: we used the codemeta.json file with this tool to generate the tool's own DOI. CodeMeta is an NSF-funded project which has basically ended and is in the phase of writing everything up and sorting everything out.
It was led by Carl Boettiger from rOpenSci and Matt Jones from DataONE. It is the usual story, and I did not put the comic about it here: there are so many metadata standards for software, so let's just create a new one.
You could say CodeMeta is doing the same, but of course they do not. I think this is one of the topics for discussion for the software citation group: whether we can agree on this as a community. It is a JSON file, and we do this for other things as well, such as metadata for software packages. Maybe in the future we can automate the metadata generation, and you can take this and have a very straightforward process for minting DOIs.
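
The step from a codemeta.json file to the metadata needed for a DOI can be sketched as follows. This is a simplified illustration, not the tool mentioned in the talk: the sample file content is invented, the input keys (`name`, `version`, `codeRepository`, `author` with `givenName` and `familyName`) follow common CodeMeta usage, and the output keys are only loosely modelled on DataCite's required properties.

```python
import json

# Invented example content for a codemeta.json file.
EXAMPLE_CODEMETA = """
{
  "name": "example-tool",
  "version": "1.2.0",
  "codeRepository": "https://example.org/lab/example-tool",
  "author": [
    {"givenName": "Ada", "familyName": "Example"},
    {"givenName": "Bo", "familyName": "Sample"}
  ]
}
"""


def codemeta_to_doi_metadata(codemeta_text: str, publisher: str, year: int) -> dict:
    """Map a codemeta.json document to a minimal DOI metadata record."""
    meta = json.loads(codemeta_text)
    # Format each author as "Family, Given"; tolerate a missing name part.
    creators = [
        f'{a.get("familyName", "")}, {a.get("givenName", "")}'.strip(", ")
        for a in meta.get("author", [])
    ]
    return {
        "titles": [meta["name"]],
        "creators": creators,
        "publisher": publisher,          # not in the file, supplied by caller
        "publicationYear": year,         # not in the file, supplied by caller
        "resourceTypeGeneral": "Software",
        "version": meta.get("version"),
        "url": meta.get("codeRepository"),
    }


if __name__ == "__main__":
    record = codemeta_to_doi_metadata(EXAMPLE_CODEMETA, "Example Lab", 2017)
    print(json.dumps(record, indent=2))
```

Because the file lives in the repository next to the code, a mapping like this can run without manual input, which is what makes the workflow automatable.
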
You have to think about things like the discussion we had after the previous presentation: maybe you only do this when you tag a major version, or for every version, whatever policy you have. And because this lives in the repository, you can run code, as with the continuous integration tools, which makes it much easier than the already easy workflows in Zenodo.
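
A version policy of this kind is easy to express in code that a continuous integration job could run on each tagged release. The sketch below is an assumption-laden illustration: the policy names `major` and `every`, the function names and the `vMAJOR.MINOR.PATCH` tag format are hypothetical choices, not something prescribed in the talk.

```python
def parse_version(tag: str) -> tuple:
    """Turn a release tag such as 'v2.3.7' into the tuple (2, 3, 7)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def should_mint_doi(previous_tag: str, new_tag: str, policy: str = "major") -> bool:
    """Decide whether a new release should get its own DOI.

    policy="major": only when the major version number increases.
    policy="every": for every release newer than the previous one.
    """
    old, new = parse_version(previous_tag), parse_version(new_tag)
    if new <= old:
        return False  # not a newer release, never mint
    if policy == "every":
        return True
    if policy == "major":
        return new[0] > old[0]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    print(should_mint_doi("v1.9.0", "v2.0.0"))           # major bump
    print(should_mint_doi("v2.0.0", "v2.1.0"))           # minor bump
    print(should_mint_doi("v2.0.0", "v2.1.0", "every"))  # every release
```

Keeping the policy in one small function means it can be changed without touching the rest of the release workflow.
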
With our content negotiation we can then create a citation, and the same is possible via Crossref. What you see here is that there is no mention that this is actually software. The reason is that the citation styles, and CSL, the citeproc citation style system for generating citations, do not understand software. It is on the list, and it is another to-do. I think there is a good reason to say in the citation that this is a dataset or this is software, because when people go through a long reference list they might want to know. The information is in the metadata, but CSL does not understand it yet.
The CSL people are aware of this, and this will happen soon.
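
Content negotiation for a DOI works by sending an `Accept` header with the desired format to the DOI resolver. The sketch below only builds the request objects and makes no network call. The media types listed are the ones commonly documented for DOI content negotiation, but treat them, the `FORMATS` table and the placeholder DOI as assumptions to verify.

```python
from urllib.request import Request

# Assumed media types for DOI content negotiation (verify against the
# resolver's documentation before use).
FORMATS = {
    "citation": "text/x-bibliography; style=apa",          # formatted citation
    "csl-json": "application/vnd.citationstyles.csl+json", # CSL metadata
    "rdf-xml": "application/rdf+xml",                       # RDF serialisation
}


def build_request(doi: str, fmt: str) -> Request:
    """Build an HTTP request asking the DOI resolver for one format."""
    return Request(f"https://doi.org/{doi}", headers={"Accept": FORMATS[fmt]})


if __name__ == "__main__":
    for name in FORMATS:
        req = build_request("10.1234/example", name)  # placeholder DOI
        print(req.full_url, "->", req.get_header("Accept"))
```

Passing one of these requests to `urllib.request.urlopen` would return the same DOI's metadata in the requested representation; whether the formatted citation labels the item as software depends on the citation style processor, as discussed above.
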
This is the last slide: a brief continuation of the discussion after Dan's presentation, about versioning. GitHub and Zenodo sort of follow the paper model: you give every version a DOI. That works well until you have 100 versions or so. Both software and dynamic data, which is another example, can have a lot of versions. That is one issue, and in addition you sometimes want something that does not have a version. The simplest way to cite software is to cite the code repository; that is actually how scientists often cite software. When people say they used R, they often do not care what specific version. So there is a use case where you do not have versions, and if you have versions, because you need them for specificity, you have to link them together to this one canonical record. That is the discussion Dan was referring to. I think this is very much work in progress, but what will happen with the DataCite metadata is that the next version, 4.1, at the end of the year, will support this relation, which already exists in Dublin Core. The other big change in this release is a real focus on software, as part of the documentation on how to use DataCite metadata for software, because the same concepts often have different terms. This has already been mentioned by Dan. It is starting now, and we are happy for input. What exactly we will do and how we go about it is being decided now.
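
Linking version-specific records to one canonical record can be sketched as a small data structure. This is an illustration under assumptions: the relation names `HasVersion` and `IsVersionOf` mirror the Dublin Core terms mentioned above, and the record layout, function name and DOIs are hypothetical.

```python
def link_versions(concept_doi: str, version_dois: dict) -> list:
    """Return metadata records linking each version to a canonical DOI.

    concept_doi: DOI standing for the software as a whole (no version).
    version_dois: mapping of version string to that version's DOI.
    """
    # The canonical record points to every version.
    records = [{
        "doi": concept_doi,
        "relatedIdentifiers": [
            {"relationType": "HasVersion", "relatedIdentifier": doi}
            for doi in version_dois.values()
        ],
    }]
    # Each version record points back to the canonical record.
    for version, doi in version_dois.items():
        records.append({
            "doi": doi,
            "version": version,
            "relatedIdentifiers": [
                {"relationType": "IsVersionOf", "relatedIdentifier": concept_doi}
            ],
        })
    return records


if __name__ == "__main__":
    for record in link_versions(
        "10.1234/tool",
        {"1.0.0": "10.1234/tool.v1", "2.0.0": "10.1234/tool.v2"},
    ):
        print(record)
```

With links in both directions, a citation without a version can use the canonical DOI, while a citation that needs specificity can use a version DOI and still be counted toward the same software.
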
I will finish with two conferences. The first one is the FORCE11 conference.
It will happen in Berlin in October, so it is a great place to have a software citation meeting in person, and just to go to in general. I would like you to go there, ideally with some interesting proposals; the call is open until the end of the month. And if you are really crazy about persistent identifiers, we have the PIDapalooza conference, which last year was in Reykjavik, and that also includes, of course, thinking about persistent identifiers for software. We thought that maybe Reykjavik was too cold, so this time it will most likely be in Spain. Thank you.