FAIR Data and Software - Summary

Formal Metadata

Title
FAIR Data and Software - Summary
Title of Series
Part Number
9
Number of Parts
9
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
So on the FAIR principles, just really briefly, I would like again to highlight this open consultation on the FAIR Data Action Plan. The link to the GitHub page is here, and not much has happened there since it opened in June.
The consultation phase opened in June, so if you want your voice to be heard at a higher level, please feel free to voice your opinion on how to put the FAIR data principles into practice.
Another important point: FAIR does not mean open, and it does not mean open data. We covered a lot about this aspect in the last couple of days, but it is also one of the messages to take home.
Data and software can also be FAIR if they are protected or if sensitive information is included. So please do not hesitate, but also follow the credo: as open as possible, as closed as necessary. We learned in the last couple of days that there are different degrees of FAIRness,
also regarding research disciplines, many of you also mentioned examples from your very own experience dealing with data and research software. So again, thank you all also for this. I think Kathleen and I, we will also take a lot of your experience back home
and we will try to improve this workshop also according to the examples you mentioned. Back to the office mostly. Back to the office, yeah. So yeah, maybe not back home at least this weekend, we will need some rest.
Again, if there is something you still want to mention, maybe an example from your own research discipline that you would like to have included in the upcoming workshops, please feel free to add it in the LimeSurvey form. There is a free text field for any further feedback.
So if you did not have the chance in this workshop to mention it yet, or it has only just popped up, then please feel free to use this feedback form.
Of course FAIR, as we have learned, is a new development. The acronym came up in 2016, but the points surrounding it, like persistent identifier systems and so on,
have been around much longer. So the FAIR principles were, again, meant to highlight this focus on machine-to-machine interactions.
We heard a little bit about data management plans and, maybe more on the new side, software management plans, and of course about the more complicated and complex issues around licensing, as we heard today, all of which help keep data and software FAIR.
But as we have heard, a lot of it is still in transition. And maybe the most important message here is: you as researchers, you as scientists, you are the key messengers in your institutions.
You can communicate what you have heard in the last couple of days within your institution and with your fellow researchers. And we strongly encourage you, please do that. Also, maybe point out some critical aspects you have heard, point out some personal views and opinions and some things you want maybe to improve.
So please take this home. Please open up the discussions, go to the GitHub pages and make comments. Please comment and voice criticism as much as you like, because only in this way will the repository providers and politics start to change, and change according to your demands
and according to the things you want to have improved. And yeah, again, depending on your discipline, of course, FAIR can be maybe easier
to implement in your daily work as a researcher or it can be harder. But nevertheless, you should maybe just try to start doing it. Yes. And with that, I would like to hand over to Catherine.
Exactly. So, some summaries from today. These slides are mostly copied verbatim. We learned about the Carpentries as well. And again, if you're interested in setting up a local group here, please contact us. We can discuss that later as well. This one. There we go.
So we heard about Git and I hope that I convinced you that version controlling your project leads to a self-explanatory history of your project, which is helpful to you in the future, to students, to your colleagues, to everybody else who interacts with it. And in particular, there is this idea of having a single source of truth for everything that you do,
for the documents that you create, for the code that you write, for the issues that you collect. So these platforms have worked for software companies to solve these problems, to work effectively and efficiently at high speeds even in some cases, in many cases.
So I think they are really a good choice for scientists as well. Lastly, we talked about how you can improve your code. Most of you stated that you have written scripts so far. And we are very happy that already in the time between filling out the initial survey and coming here,
many of you have already started using functions on your own. And we hope that our lessons, both in Python and R, help to improve this skill even further. We also talked about packaging and building modules in both of these languages.
And again, all three of these steps, scripts, functions, packages, they solve specific problems. They are not inherently a quality level. So don't feel bad if your problem is already solved by a nice, short, concise script. And some general remarks, because yesterday we briefly touched on this refactoring and software hygiene.
And I didn't have anything prepared then, but let's just generally say that this type of activity is accessible to you as soon as you have written tests for your functions, because then it is basically safe, riskless. You have a safety net of tests or even a safety net additionally of version control.
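As a minimal sketch of the safety net described here: the function names `mean_v1`, `mean_v2` and `check` are hypothetical and not taken from the workshop material. The same checks are run against the original and the refactored implementation, so a refactoring that changes behaviour would fail immediately.

```python
def mean_v1(values):
    # Original version: works, but is more verbose than necessary.
    total = 0
    count = 0
    for value in values:
        total = total + value
        count = count + 1
    if count == 0:
        raise ValueError("mean of empty sequence")
    return total / count


def mean_v2(values):
    # Refactored version: intended to behave exactly like mean_v1.
    values = list(values)
    if not values:
        raise ValueError("mean of empty sequence")
    return sum(values) / len(values)


def check(mean):
    # The safety net: identical checks for any implementation passed in.
    assert mean([1, 2, 3]) == 2
    assert mean([10]) == 10
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
    return True


# Both versions must pass before the old one is replaced.
check(mean_v1)
check(mean_v2)
```

In a real project the checks would more likely live in a test file run by a test framework, and version control would provide the second safety net mentioned above.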
And any kind of code change that you think makes your code more readable, more understandable, maybe faster, can be subsumed here. And one resource I want to particularly highlight, because just this week they have relaunched their site, is Exercism.io.
It's a community for solving programming problems. I just linked to the R and the Python examples here, but there's many other programming languages, and the problems are all the same in each language. So if you know one language already, you can maybe work through the problems rather quickly
and then try to solve the same problems in maybe a new programming language that you would want to learn. And in particular this aspect of testing, of iterating quickly, of finding one solution, then improving the solution is covered really well here,
because the tests that tell you whether you have already completed the problem or solved the problem or not, they are already provided. So you basically do only half of test-driven development here, but really useful. And in particular about software refactoring, I would like to highlight here the lower one.
I didn't order this properly. The lower statement here that this should be the second step right after you've gotten your code to work. And the third step, maybe if it's really necessary, because as you know computers are getting faster and faster, performance optimization should not be in the forefront.
You will probably waste a lot of time if you start thinking about performance too early. First it should work, then you should make the code better, and then maybe make it faster. But generally, iterating quickly does not mean that you rush things. There's also a nice saying here that if you want to go fast, you have to take your time,
you have to study the problem, you have to understand the problem. And that includes, in particular, cleaning up your code or whatever type of research artifact you produce. If it is easily understandable, you can also iterate faster and improve faster.
Just some resources for performance optimization. If you really need it, you should always know what exactly you can improve. So there's profiling or benchmarking tools in most programming languages. So please measure first, identify what the actual slowest part is, and try to optimize this.
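To illustrate "measure first", here is a small sketch using Python's standard-library profiler `cProfile`. The functions `slow_step`, `fast_step` and `pipeline` are made-up stand-ins, not code from the workshop. The sleep imitates a part that is slow for reasons outside your own code, such as waiting for a server.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    # Stand-in for an expensive part, e.g. waiting on a remote server.
    time.sleep(0.05)


def fast_step():
    # Stand-in for cheap local computation.
    return sum(range(1000))


def pipeline():
    slow_step()
    fast_step()


# Collect timing data while the pipeline runs.
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report should list `slow_step` near the top, which points to the part worth looking at before any other optimization is attempted.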
And in some cases maybe the slowest part will not be accessible to you. So in the Bug Diver example we talked about, I did this just last week to check my advice here, and the slowest part is the server to which I am communicating. It's not under my control, so I can't really optimize anything,
because my own code is contributing so little time to, for example, the test runs, that it makes no sense to invest much time in it. But here are some resources in case you need them. And then you go through the cycle again. So I mentioned iteration. After you have identified one of the bottlenecks, then you can check for the next one.
But definitely don't do it just based on what you think could be the slowest thing. Generally, about speed, and when you think about how much time to invest in some problem, there are already pre-computed tables from the XKCD comic series.
Maybe you've seen it. One axis shows how often you do a task and the other how much time you would save by automating it. And then you can easily check: for example, you can invest two hours in optimizing a task that you do every day if you can speed it up by five seconds. It's very simple.
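The arithmetic behind such a table can be sketched as follows. The five-year horizon is an assumption that the talk does not state; with it, the example from the talk comes out at roughly the two hours mentioned. The function name `break_even_seconds` is hypothetical.

```python
def break_even_seconds(seconds_saved, times_per_day, horizon_years=5):
    # Total time saved over the horizon. Spending more than this on the
    # optimization costs more time than it gives back.
    # Assumption: 365 days per year and a five-year horizon.
    return seconds_saved * times_per_day * 365 * horizon_years


# A daily task that gets five seconds faster.
budget = break_even_seconds(seconds_saved=5, times_per_day=1)
print(f"{budget / 3600:.1f} hours")  # prints "2.5 hours"
```

Changing `horizon_years` shows how strongly the result depends on how long you expect to keep doing the task.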
So it's of course also a bit of comedy in there, but this is mathematically correct probably, because as you can see this development of some kind of optimization or automation that can also overtake your actual task. And generally you have been hearing about best practices a lot in the last few days,
and I want to leave you maybe with one impression to think about. Anything that is called a best practice has of course been in use widely and has to be recognized. So by that time the actual objectively best stuff is already way ahead of the curve here.
So whatever you hear about best practices, it's probably more like the average. So it's definitely something that you should do. It's not something to be ignored, but it is not the laurels that you can rest on. It's a base case. Unfortunately, we have the case that many, many best practices exist,
so how should you know which one is the best for you? Sorry, you can't really. The most important thing, again, as we've heard with PIDs, for example, is just start doing it. It doesn't matter so much which of the advice exactly you follow.
It matters that you follow one piece of advice and maybe evaluate it regularly. I would recommend doing that probably at the start of each next bigger project, because through a project you learn what works and what doesn't work for you, so definitely at the latest when you start a new project
it is a good time to implement this learning. I think that's more effective than switching in between and trying to improve something in between because you may get lost in trying to improve the things over continuing the actual work. Here comes just a quick overview of the resources
that we compiled to help you get through this maze in many cases. So because we just had the licensing topic before lunch, this is basically a flipped overview, so I thought that maybe it would be helpful to write down some situations that you may be in.
So if you're looking for a certain type of license in Germany, for example, you can look at ifross.org. They also have news about court cases, because not all the licenses, for example, have been tested in court. Some have, although we hope that this actual question never becomes relevant to you.
If you want it really, really quick and short, there's choosealicense.com. There are several different cases. You can sort yourself into one of the situations and then you will get some quick and usable advice. If you want to have a summary of the licenses that exist, for example, because you notice that a software you are using
a particular license is required, then you can get a very quick summary at tldrlegal.com. Bad line break there. And generally, if you want to read more, there's papers and papers about this as well, specifically for Germans. So for example, Victoria Stodden in the US has worked on this
and I'm not sure which author this is, but again here, one of the most important pieces of advice is: pick one, because if you don't, then nobody can reuse your code. And much more stuff.
I want to highlight what has helped me a lot, which were conference videos. So for each programming language, you have conferences. Even in Python, for example, you have general Python conferences and scientific Python conferences. Many of them publish their slides and their videos and recordings.
There are also webinars in particular. RStudio, I think, is a good resource here. This is how I learned a lot, probably almost everything I know. There's really a long list of blog posts. There's Exercism again, as I mentioned before.
That's highly recommended. And because I'm also from the bio background a little bit: the Rosalind (rosalind.info) project also has a lot of problems that you can solve in Python. So if you want to train in this topic, that's a good resource.
And there are textbooks. Some of them Luke mentioned yesterday. Some of them I can also recommend. We have an Open Science MOOC, which has partially also been developed here at TIB. So this is recommended. There's so much more. So the point is we will give you the slides in the end.
And of course, we invite you to have a look at this. And before everybody leaves, we definitely want to, from the bottom of our heart, we want to thank our instructors. Matthias is still here. Martin Hammich will be here in a minute again, I presume. And Konrad and Luke have already left.
So it was really a pleasure for us to work with these people, and to work with you as well. And yeah, we had a great week. And therefore, we also want to thank all of you who participated with your questions, your suggestions, your own experiences. Thanks a lot.