Solid scenarios for sustainable software
Formal Metadata
Part Number | 4
Number of Parts | 13
License | CC Attribution - NonCommercial - NoDerivatives 3.0 Germany: You are free to use, copy, distribute and transmit the work or content in unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/31030 (DOI)
Transcript: English (auto-generated)
00:04
So it just says, conference on non-textual information, I thought, well, I'll put in lots of graphics this summer, so that perhaps some information will come across
00:21
even without the text. Furthermore, I borrowed most of the graphics from the website of the eScience Center where I work as well, and I'm allowed to do that, which does not necessarily mean, as you have learned today, that you can do that as well. But all the references to who owns which pictures
00:41
are on that website. Can you understand me well enough? Yeah, okay, well, I don't know if that's going to help much. I'm going to talk about lots of things, but I certainly want to emphasize two things which I think are important in this.
01:00
First of all, if you want to involve lots of communities, well, try to find out what the stakes are in that for them. And the second thing is, make a distinction between everything that has been, everything that we can do today, and everything that you hope for in the future.
01:21
But before I do that, I'll start with three take-home messages. The first one is do treat software sustainability and data stewardship on equal footing, at least policy-wise. We all know in practice you have to do something different for software than for data, but at the policy level, they should be treated equally.
01:42
The second thing is, consider software, and data as well, as value objects. Just as when you own a patent, you don't know what the actual value is until you sell it or do something with it, but by treating it and considering it as a value object,
02:00
then it starts making sense for others to put some money on top of it in order to make more money out of it or add value to it, and so on and so on. And of course, the third thing is: make the stakeholder positions explicit, define their roles, and try to involve them all.
02:23
So there may be very many stakeholders, but in order to make life simple, I reduced them to three groups. The first group are the governments, the funding organizations, those that, between quotes, think that they are in charge of it all.
02:41
The last group are the executive organizations, like in the previous presentation, the people that own the computing centers and own the data centers, provide you the services, want to tell you how you should do it and how you should use their software to do the things on their side, because in the end, they are making a living
03:04
out of doing things for you. And of course, then in the middle are the scientists themselves and of course society at large. So governments and funding organizations are mostly into data stewardship and software sustainability
03:22
because basically they need to be accountable. They must be able to say how they spend their money. They must not be liable for, let's say, fraud with data, you know, you make up your own data and you say, okay, that was the result of research. They can't handle that, so they want to be accountable. And governments also are responsible
03:42
a bit for cultural heritage aspects, which is also, I think, at a general level. And in the end, they should, and I'll come back to that in one of the next slides, set something like a general framework, something like minimum requirements for data stewardship plans, minimum requirements for software sustainability plans.
04:03
The executive organizations, the last category there, they should be cost effective and proactive, be goal focused and try to help you in the best way they can. They provide the infrastructures, they provide portals, they provide services. And in the middle, basically, that's us, the scientists.
04:23
We are in this because we want to accelerate discovery. We want to, let's say, design and publish and execute what we are doing. And in the end, and I'll come back to that as well, that is the group that is supposed to write what I will later call protocols,
04:41
research data management plans, software sustainability plans. It's good to know and to realize, I think, that beyond the science domain, there are other domains that basically have the same kind of issues, like the archives and the libraries,
05:01
the national libraries, sound and vision, motion pictures, the arts, the gaming industry. Well, I give you the task to do something about the software sustainability of games. Well, it is very complicated. Even to label the game with a version number,
05:21
something that works on this Nintendo device but doesn't work on that one, and you need that kind of device to work with it. Say, there are two processors involved. You know, it is a very complicated thing. We can, as a science domain, learn a lot from the experiences that people in these groups already have, and they can learn from us.
05:45
The cross-domain issues on software sustainability are, first of all, legislation. There is lots of contradictory law. There is lots of copyright violation that basically you have to do in order to do your job in the first place.
06:02
So the national library in the Netherlands has, I think, something like 15,000 CD-ROMs. Well, they are obliged to keep the information on them because, well, the law says you should do that. But, okay, what they do is they make an image
06:20
of the CD-ROM because later there may not be any device to use it on, and then they want to run it, so they need an environment with some old version of Windows, 3.1 or 95. Well, those are two violations of copyright. First of all, it is not legal to make a copy of the CD-ROM.
06:44
It says on it, you should not copy this. And the second is, okay, if you want to use Windows 95, you may not have a license. So the only thing that they can do is to run these things in-house. They can't make a service out of it to put it on the web. And that's a common issue across many of those domains
07:03
that I just mentioned. Of course, there is the issue of obsolete and unavailable hardware, which basically then makes data unavailable. There is unrecoverable software: okay, something is lost, or it is not runnable on any modern machine anymore.
07:22
And of course, the last one also, bit rot, link rot, and reference rot, you may be just too late. So splitting the problem may help, at least in terms of thinking. Most of what we are discussing today and that we are confronted with is basically legacy code,
07:44
everything that is already around. There is a lot stored already. There is much put into heritage archives. The problem is, of course, what to keep and what in the end you will be able to throw away.
08:00
And sometimes restoration is worthwhile and sometimes it's not. But what should we do today to prevent us from still having the same legacy problems in the future? Because that is what we can do now. And I think the first thing and the second thing and the third thing is education.
08:22
So just to be very blunt, if you're a student and the first time you get some training or education in writing software, first say, version control, then do documentation and all that kind of administrative stuff. And only then you can think of the name
08:42
of the program that you're going to write and the language that you're going to choose. That should be between your ears, because if it's in there, then it is not an additional effort once you're going to write code in real life. And of course we can help. The SSI in the UK is, I think, one of the first
09:02
that did that really very well: providing advice on how to do these things and trying to improve the coding of the people that are working today. Of course the future is, we hope, much longer
09:21
than our past, certainly with respect to software. Most code has yet to be written. So it is then a matter of keeping the rules that we set ourselves, but also of thinking of rules for when you can really discard software, because keeping everything is impossible, but there is no good reason to discard software.
09:43
Of course we can also try, for the future, to implement FAIR rules, not only for data but also for software. Make software in such a way that it is modular and so on and so on, so that it is easy to maintain. Hopefully the problem may solve itself in due course.
10:03
Well, back to the solid scenario. It consists of three parts, and for me at this moment the first part is the most important, but the other ones are important as well. I'll sketch a solid framework model for a general set of minimum conditions followed by protocols.
10:22
That will be on the next slide. The second thing is, in France they started setting up a software heritage archive. DANS in the Netherlands joined that initiative, and a few others as well, and some commercial companies. At least that is somewhere you can put your software;
10:41
you cannot reuse it from there at this moment, that's something that we should work on. But they took over all the software that Google had already collected in earlier times, and Google stopped that service, and I think at least that should become a very solid new kind of library for all software.
11:04
And the third thing is something that is growing from today, and that is a European software sustainability infrastructure. I'll have a slide on that later as well. So back to this framework with protocols.
11:21
Well, basically the question is like this. What makes sense and what doesn't? Does it make sense to have to write a different research data management plan for your university to get your data established there
11:41
and then for your research council to do your application, and then for the European Horizon 2020? Is there any reason per se why they should set different requirements for your research data management plan or for your software sustainability plan? I don't think so. However, it does make sense to have different
12:04
research data management requirements for different disciplines. So in archeology, they need another type of research data management plan than in musicology or in astronomy. So why not focus more on that?
12:21
And that is why I did this stakeholder analysis earlier. So what you do need is at least a set of minimum conditions. You should apply laws and regulations. There should be something like templates and examples available that you can start to work from. There should be a reference to standards
12:41
and use of standards, and some requirement to use standards unless you can prove you can't. And hopefully there will be some support resources as well. And we call that the general framework. Those are the minimum conditions that funders should set. But after that, within the set of minimum conditions,
13:02
all subdisciplines, one by one, should try to define their own research data management plan, their own software sustainability plan. They're called protocols in this document. The goal is, of course, a full engagement
13:22
of all scientific communities. I have witnessed universities, more than one, saying: we have a session today or next week on research data management. And what do they do? Well, we bought this research data management system and it runs on that computer
13:41
and there we have the website and if you can't handle it, then you call so and so and they'll help you. And next thing we want, you have to save all your data there. And then, of course, at the back of the room, they say, well, what's in it for us? You only tell us to do that because you need to be accountable, not because you're interested in our data.
14:02
So you don't get the engagement that you want. But if you let users and scientists make their own protocols that fulfill the requirements of their own discipline for their own purposes, then that may be different. So by the model that I've just presented,
14:21
you basically engage the top-down requirements from funding organizations and the bottom-up knowledge of all the scientists that basically are then doing the work for themselves. The use of protocols is already, well,
14:41
a common way of working in the medical domain and also in archeology, so it's not really new from that perspective. People are working now on things like that for research data management, but less so for software, but we can work on that and we should.
15:01
Rules and best practices and guidelines are being invented all of the time, but it would be very nice, from the point of view of a scientist, not to have to write different protocols for different organizations for basically the same purpose and the same boundary conditions.
15:22
So the process that we started a few months ago is, at least in the Netherlands, to engage with universities and institutes, with our funding organization, with people around Horizon 2020 and NS3 projects
15:42
to discuss this model and then start setting up, physically, small groups, let's say two experts from external fields, like from a computing center or a data hosting center at Sherman, and three people from the discipline to actually compile these plans.
16:02
And then, and it is also an important step in the whole process, to publish these plans as, let's say, public documents, like a scientific review paper. And if you can send it to a magazine that accepts that kind of thing from your discipline, then it is a published document,
16:21
it is also a review document, and hopefully a broadly accepted document across your own discipline. And then you have this set of pre-approved protocols, and rather than writing a protocol in the process of writing your grant application, you just say, no, I intend to use protocol A, B, and C
16:42
for data, and one, two, and three for software, and it's taken as part of the reviewing model. Well, there happens to be a group called Science Europe, where all the research councils in Europe are gathered. They had a subgroup on research data management. They have basically adopted this model already,
17:03
so at the level of the European research councils, we are taking steps. There's another entity called PLAN-E, the Platform of National eScience Centers in Europe. They also kind of adopted this model, and are advocating it at this moment.
17:20
Well, a few words about the Software Heritage project. It was invented in France, basically by Inria, and they are, well, trying to set that up as something like the international software heritage, well, call it a library. The only issue is that it's always easier
17:43
to put in software than to get it out again, so the services to get it out still have to be created, but that is not the primary vision. The primary goal was to have a complete set of all the software around. And then about this European Software Sustainability Initiative, how am I doing with time?
18:03
Well, okay, I think I'm fine. Well, actually, there's an application sent out called EUSSI, basically after the model of SSI in the UK, who set the example of basically how to do things in a very practical way.
18:21
The idea is, in the end, to have a contact point in each country in Europe with which you can exchange your experience, and also a little bit, as far as I'm concerned, to involve the other communities, like the gaming communities, libraries, national archives, and so on and so on,
18:41
because, on specific elements of the whole problem, they sometimes have more experience than we may have. So we can exchange best practices, specialized support, earlier solutions, existing and still functioning hardware, because that's something that's sometimes very helpful
19:02
if there's some equipment around that still can read your cartridges and so on, or run your software. And last but not least, criteria to keep or discard software or discard data, for that matter. Rewarding scientific research output.
19:22
Well, we all know that these days, there is much more scientific output coming out of research projects than just reviewed scientific papers. It has been said this morning and this afternoon already, more than once, but the problem is
19:41
how to give tools to funding organizations to actually, let's say, give credits to you, because it's very easy to just count your scientific papers and do an impact analysis on your papers, because that is also something that is very important, but we don't have any impact analysis tools for software,
20:03
we don't have any impact analysis tool yet for data. So, in order to get this circle going, we have to start somewhere. Well, FAIR may be one of the helping elements, you know, if you can show your data are FAIR,
20:20
then that may help in getting credits for that part of data. You can also try to apply FAIR to software. It's being done at this moment, so how to make it findable, accessible, but specifically, of course, interoperable and reusable. And if your software is FAIR, maybe that may be a reason for your funding organization to say,
20:42
okay, well, this is something that adds to your credit. Well, one other thing is that we have been discussing to set up something like a software seal of approval. A software seal of approval should be
21:00
kind of a lightweight methodology to at least approve software on the grounds of the way in which it was produced, in which it was created. You cannot go into the details of whether, scientifically, your code is more sound and produces better molecular dynamics results
21:22
than someone else's code, because, well, you will never get anywhere, because you have schools, you have traditions, you have all that kind of stuff. But you can still follow and trace back the way that software was created. Is there version control from the very beginning? Are there, what is it, comments,
21:43
so that you can know what is happening in the software? Is there a process defined in the group that created the software? All those kinds of things you can kind of objectively follow and give credit to, and then end up, in a semi-automatic way,
22:01
with a software seal of approval. The model is, of course, modeled after the Data Seal of Approval that already exists. And this only works if you set it up internationally. We are working on that. And if your software gets a seal of approval, then a funding organization might start counting: okay, this guy has made so many pieces of software
22:22
and it received a seal of approval. Well, as already said, you can also try to apply the FAIR principles to software. We all know that it's already very difficult, except in the genomics domain,
22:41
to define all the letters in FAIR, mainly the interoperability part, what does it mean in different disciplines. But the idea that is behind FAIR, namely about sharing and reusability and that kind of stuff, you can still apply to software. And a one-to-one link between the letters in FAIR
23:03
and what we do in the software world is being worked on. Well, one slide about DANS, well, because they sent me here and it's there, they should have been here, but I'm here for them.
23:20
The mission of DANS in the Netherlands, Data Archiving and Networked Services, is to promote and provide permanent access to digital research resources. They are funded by NWO, the research council, and the Royal Academy of Sciences, and their first predecessor already dates back, they are very proud of that,
23:41
to 1964, the first time they started collecting data there. I'll skip this one. This is an attempt that will apply both to data and to software later. This is an attempt to seriously implement FAIR in the real world
24:01
by giving stars to different aspects of the F, the A, the I and the R and well, have a resultant from that. If you want to know more about that, please contact me outside of this. And well, I will thank you for your attention and I hope you still remember the take home messages
24:21
that I repeated here for your convenience. Thank you very much.
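As a minimal sketch of the semi-automatic seal-of-approval check described in the talk (version control from the very beginning, comments, a defined process in the group), the criteria could be recorded as a simple checklist. The class and function names below are hypothetical, and the rule that all three criteria must be met is an assumption; the talk does not say how the criteria are combined.

```python
from dataclasses import dataclass


@dataclass
class SoftwareRecord:
    """How a piece of software was produced (all field names are hypothetical)."""
    name: str
    version_control_from_start: bool
    has_comments: bool
    has_defined_process: bool


def seal_checklist(record: SoftwareRecord) -> dict[str, bool]:
    """Map each production criterion named in the talk to whether it is met."""
    return {
        "version control from the very beginning": record.version_control_from_start,
        "comments explaining what happens in the software": record.has_comments,
        "process defined in the group that created it": record.has_defined_process,
    }


def qualifies_for_seal(record: SoftwareRecord) -> bool:
    """Assumption: the seal requires every criterion; the talk states no rule."""
    return all(seal_checklist(record).values())
```

Such a check looks only at how the software was produced, not at whether its scientific results are better than those of someone else's code, which is the distinction drawn in the talk.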