Introduction, FAIR Principles and Management Plans

Video in TIB AV-Portal: Introduction, FAIR Principles and Management Plans

Formal Metadata

Introduction, FAIR Principles and Management Plans
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Well, good morning. [inaudible] We both work at the
TIB. Well, a warm welcome to all of you to Hannover. We hope you had a good weekend. We are getting together with a fun new workshop format, so, as was said beforehand, it's also going to be a test for us. It's a new format, [inaudible] twice a year. We all hope that you will get some new information and some interesting facts about topics you may not know yet. This Monday will mostly be lessons, with only some short hands-on sessions. For the next couple of days we will have more lessons in the morning, and in the afternoon there will be hands-on live coding sessions. So the agenda
for today: as you see here, we'll start with some short facts about TIB, then continue with a short description of the Carpentries, the idea behind this workshop and its instructors. Then we will invite you to give a short introduction about yourselves, followed by an introduction to the FAIR principles, which will provide the background for this whole workshop. So let's start with TIB. TIB is the German National Library of Science and Technology, so we are a research
library with a national mandate, and that's very important. We are also the local library here for the university, but we provide services nationally and internationally in our focus areas, which are science and technology, architecture, chemistry, computer science, mathematics and physics. So that's the
focus. We are a member of the Leibniz Association, with about 500 members of staff in total, so it's quite a big organization. It's a global supplier for scientific and technical
literature, and also for new formats, including digital objects and so on. So it's not only the pure academic texts but all the technologies and research objects that come with them. We are also a founding member of DataCite, and I'll give you some information about this connection later on. In total we have about 55 thousand journal subscriptions, about 15 thousand of them in print and about 40 thousand in digital format. In total we provide about 9.1 million items, plus about 17 million patents and also standards from the natural sciences. Within the past two decades, I would say, managing digital resources became very important, and it is also an important aspect of libraries as we know them today. We still have the classical literature part and the tasks that come with it, but we also tried to focus very early on the discoverability of non-textual material. [inaudible] Our discovery portal now includes objects like research data, 3D objects and audiovisual media, and you can discover them all using our portal. Early on, also as a member of DataCite, we started developing tools for research data management, starting in 2003. So early on, way before the FAIR principles came into place, we had research data management guidelines and a DOI (digital object identifier) service. We are currently developing research data management systems such as the Leibniz Data Manager, and we also help with setting up generic and discipline-specific repositories for research data. Among all these objects, we also run a portal for audiovisual media, so today there will be video recordings: all the lessons will be recorded, and you can look them up in this portal later on. They will be available and
searchable. [inaudible] As the library, it is of course important that we provide services for our researchers and this community, and essentially we rely on trust. Usually you can trust the library to keep your electronic and print publications and to provide this information for a long, well, long time, and in the digital age it is essentially the same agenda: we also try to provide this digital information for a long time. So we have to manage, for example, new media formats and new research formats, and the really essential part here is trust. We need a community, we need researchers to trust us to do our job well. So there are detailed points, like supporting other institutions as well, upgrading established workflows, and developing new formats to link non-textual materials like research data to other research information. That's our main focus point, and in this way we try to keep this trust, as pointed out, also in the future. In a new age we will see what the world of libraries will be in the future, but we try to contribute our part here, [inaudible] complementing other things. So you've
probably heard of Software Carpentry and Data Carpentry. These are sister organizations which merged about a year ago into The Carpentries, and they were a wonderful inspiration for this workshop. They are mostly volunteer-run organizations that teach computing and data skills to scientists: the skills that, I would say, fall in between the more advanced, more professional stuff that most of you have probably been taught in specific courses at university. There are a lot of things in coding that, as the founder of the Carpentries put it, are considered too well known to be worth
teaching any longer, or to be worth publishing about any longer: the glue skills, basically. There's a nice introduction from a few years ago that you can look up on YouTube. Let me just summarize here that their lessons are
collaboratively developed by the community of learners, instructors and helpers. There are a few different topics, from the social sciences to biology, ecology and geography, and two more are currently being developed. The practices that they try to incorporate into their lessons are evidence-based, drawn from pedagogical research; there's a long reading list if you're interested, also in the slides later, and in the summary on Friday we will also have a reading list. There's an instructor training, for example, which I completed about half a year ago, and there's mentoring among the instructors and among the learners as well, of course. Generally they try to incorporate as much learning by doing as possible into the lessons, and live coding, which we will conduct, for example, tomorrow, Wednesday and Thursday, when we teach you about the FAIR principles for R and Python. Especially to the students who are here from the region: we are very interested in coming together as a group to establish, for example, a local instructor group, so that we can have workshops like this one on a more regular basis. So come talk to us if you are interested in this, in whichever capacity, as instructor, helper or learner; we will be really happy to find a group of people we can do this with a bit more often. Besides the software and data aspect, there are also some more specialized fields: the library domain, for example, has its own Carpentry sister organization, I guess you can say, and for authoring material and for high-performance computing there are also lessons and groups being established. So it's all under The Carpentries, if you're interested. To summarize this part: they teach the glue skills of scientific computing that you can use really everywhere. With that, let's continue introducing this FAIR Data and Software workshop. As was already said, we're trying many new things at once here, but
it is rooted in, and founded on, some earlier work that TIB has been doing, for example a conference on services for science from just over a year ago, I think May 2017. Then there is our interest in the Carpentries movement. And last autumn the German Federal Ministry of Education and Research conducted an ideas competition in which we were among the lucky ones, so that's why you can be here for free and why we can host this workshop. So those are the three sources, I would say, for
this week. The Carpentries-based part comes from the fact that they already have so much useful material, freely licensed under CC BY, so of course this is useful for us, and we will be reusing some of it in the next few days. As my colleague already mentioned, we definitely want to have lots of time for questions and discussions after each session, so that the datasets and maybe the software that you brought here can be discussed then,
off the record, of course. So if you have questions that we can help you solve, or at least find ways to start solving, please let us know. We will not be teaching you all the time; hopefully we will discuss a lot, and
the goal is basically to apply the FAIR principles to both of these important research artifacts, datasets and software, all the way from your small analysis script or data visualization script up to the infrastructure software that you actually publish. In particular, we want to focus on solution ideas for common problems. Of course a lot is specific to the field, so different subjects will need different solutions, but in some cases there are common problems. What we can already tell you is that there is no silver bullet: there is no one-size-fits-all solution and no 100 percent sure way to get it perfect. It is a step-by-step process, though, and we will basically try to climb the hill and raise the FAIR level one step at a time, for whichever kind of objects you want to discuss with us: datasets, software, papers. But we do want to have a little bit of fun. This is Francis Bacon, who wrote a satirical book, I think, centuries ago, about the wisdom of the ancients. You probably know the saying that two years in the lab can save you two hours in the library, which is a really bad deal, because this library is really, really nice, so you should have a look at it, maybe in one of the breaks. From my personal experience as a PhD student I can say this is totally true, and one very concrete solution idea is to just read more old papers. It helps a lot: many of the problems that we're facing, especially in the digital world, have at least been discussed before. Not all of them have been solved already, but it's really interesting to dig into the old material. Also talk to your colleagues and learn from each other's experience. We are far too often repeating each other's mistakes and only then learning something that we could have just talked about for two hours a year ago, and then you would not have had to go through that experience. That's what we encourage you to do, and also learn from
the experience of other subject domains. There are different strategies, of course, and different playing fields, but I am certain that in some cases, or even many cases, you can pick a strategy from another field and apply it to your own problems, or you can lift your own problem onto a more level, and maybe more fun, playing field. We will use this guy's face every now and then in the slides to remind you of these wisdom-of-the-ancients kind of key points in the lessons. In summary, few lessons are new, but some have to be taught again and again, sometimes in a different context, because time changes things, of course. As already mentioned, we want to produce open educational resources, so all the slides will be available later on our Carpentries-style lesson and workshop page, and the instructor presentations will be recorded. As you know, this is the portal, which is set up broadly for different topics, so you will find them there after the workshop. We will email you when it's finished; we're not sure exactly how quickly we can do it, but it will be there. As for questions and answers: this is being recorded, so if you ask something in between, you may be recorded as well. It's maybe best to write your questions down for the later discussion sessions. I'm pretty sure the microphone doesn't completely extend into the room, but there is a little possibility of it. Then, about the sticky notes: some of you
have these red and green ones on the table; please share them among the table rows. You will probably use 4, 6 or 8 of them per morning and afternoon. After each session, so for example today before lunch, we would like you to write down one thing that you learned or liked on the green (well, yellowish-green) ones, and on the red ones a point of criticism for us: something that we could improve, or something in the topics that you are still confused about. Then we will collect them and hopefully already have a look at them in the evenings, so that on the next morning we can discuss the important points that came up several times. And
then on Tuesday, Wednesday and Thursday, when you have live coding, please use them to indicate that everything is going fine with following the live coding, or that you have a problem or something is not working. Then the instructor can see that they need to slow down a little bit, and the helpers, who will also be here then, will come to you and help you. So that's the idea of these sticky notes. Then we'll introduce ourselves, but before we do that, we want to talk a little bit about this effect: especially when you're together with different people who all have their specialties, you may feel like "my god, everybody knows so much more than I do", but this is really usually not the case. The reality is mostly like this: everybody has their special fields of knowledge, there is of course some overlap, and what we will try to do this week is to broaden this overlap area. So really, just don't be intimidated by anything that you hear today, and just talk to people. Now, introducing myself: I'm a life scientist. I studied in Konstanz for most of the time, but then also went to the north of Norway to study some Arctic ecology and geology, and I worked in customer support for [inaudible] for a little while. Then I did a PhD in what was supposed to be mostly biochemistry, but it turned into a method development project for reproducible bioassays. For example, I developed a method to couple light and electron microscopy, so that you could view the same samples, exactly the same cells and cell clusters, with both of these microscopy methods. I also robotised a biofilm assay which was previously really manual work, where the result depended even on the strength of your thumb, and the robot then did this in a much nicer, much more reproducible way. There was a lot of integrating different systems, and that's about the time when I started learning some programming, and also Git and LaTeX, and of course Python.
It was extremely exciting and interesting to get into these topics. Then I worked in the pharmaceutical industry for a little while, in a laboratory information management systems project, and I have been here at TIB since November, supporting all kinds of topics around scientific
software projects. Then, handing over: I think you may have guessed that we share the same background; we both did Arctic biology. I studied biology and oceanography, and then I went on some research cruises where, for example, we had to collect
many, many samples, and also loaded and managed the database together with four other data curators. So I had early experience with data sharing and research data management, for my own PhD but
also for many other colleagues. You learn very early on, even as a young master's or PhD student, that in some disciplines, like the climate sciences, it's essential to reuse all the data, because you cannot do your PhD on your own; it's not possible at all, at least in this field. So data sharing, data management, handling management plans and organizing huge databases were very early experiences for me, and I can totally recommend that to young PhD students. If you have the chance to get your hands on databases or digital repositories, even just to look up some data, and you don't understand something, and there's a contact, an email address or something, then write an email or pick up your phone. You get to know the data of others much better, you get a feeling for what you can actually do, and you can also improve your own research. In September 2013 I started in a project position at TIB, and now, after a couple of years, I have moved to be head of the team for research data and scientific software. At the moment we are a team of two, and we hope to extend that in the future. And of course, in the library, research data management is a topic that is anchored in various departments, so there are also other colleagues at TIB dealing with research data and scientific software management. So we have
a little bit of an overview to give you as well. You all filled out the pre-workshop survey, 25 of you, so we have a few statistics, an overview of your answers. It was all multiple choice, so don't worry if the numbers don't add up. About research data management: 10 of 25 people know about research data management plans, 14 have produced their own data, a few more have processed their own data, and a few fewer have processed data with their own self-written software. So if you have brought a dataset, as I mentioned before, we're looking forward to having a look at it with you. The sociologists among you will probably cringe a little at this way of evaluating a survey; please bear with us, it was just done in Excel. Very quickly on Git and version control: we were very happy that about two-thirds of you already use it, at least on your own. So tomorrow our focus will be a little more on the advanced parts that go into working collaboratively on a project: using branches and remotes, going through the whole workflow, for example on GitHub, and also some advanced topics. As for both of the programming languages, most of you are scripting, which is great, some of you already know how to build functions from your scripts, and some have even built a module or package, so we have a few expert users. Great. On Wednesday and Thursday we're going to dive a little deeper into this. And now we continue with basically the main core topic, the FAIR principles themselves. Yes, thank you. So, diving into FAIR, we go back a little bit. Basically, the FAIR principles came up in 2016, when the first FAIR paper was published, but they actually go way back, I would say to the beginning of the 2000s or so. We already had many, many
research disciplines, I would say, that focused on data management. They already wrote data management plans, collected their data, did quality control on it, analysed it and sometimes even published it on institutional
websites, sometimes even setting up their own, and they tested these often discipline-specific [inaudible] early on, and other researchers were already accessing the data. So it started very early. If you go way back, I would even say 60 years, to the 1950s, when the first satellites were launched and
collected all the data about the surface of the earth, that was the first time that this reuse of big amounts of data started. So the principles
that came out here actually build on basics that have been in place for the last decades, I would say, or even, if you want, for centuries. Then there is this whole data lifecycle. There are several ways to show this lifecycle, and we decided on a simple one. As the library, we have had this in place for many years already, and I will try to get that point across: early on we focused on services, information and training for this whole lifecycle, because we came into contact with many disciplines. Because of our focus, these were of course mostly from the natural sciences, but we also exchanged a lot with other institutions. For example, as a registration agency for PIDs, for DOIs, we got into contact early on with other disciplines as well, tried to get that message across, and developed tools and services along this whole research data lifecycle. What we discovered in the last 10 years is that there are some key factors which drive data exchange, preservation and sharing among institutes and also among researchers, and this is all starting off from a more institutional perspective. This list is open to be continued, and there are points like trustworthy data repositories, data policies in place, standards for data citation, standards for metadata and standards for licensing. We will dig deeper into all of those points this week. Now, when we talk about data processing, and also about software, we always talk about some kind of rights management, so national and international law come into play and we have to deal with that. We have specific methods and tools adapted in the various fields, where we are
dealing with many different working groups across many different fields. We are, of course, dealing with scientific instruments provided by many different companies, and they use their own formats for data processing and analysis; the same can sometimes be true for software as well, so we have to deal with that too. And of course we have to deal with the costs of potential data loss, and with what happens when data is not actually reused, so that you have to repeat an experiment. As we heard in the beginning, and as was already said, that is the whole motivation to change something: in the last couple of years this whole data management and software discussion has been the driving motivation for change, and that is maybe also part of why we are here today. So let's go through some barriers very quickly; I guess most of you are already familiar with the barriers to publishing data. First, most research data is still not accessible. Estimations vary, but depending on which study you look at, about 70, 80 or even 90 percent of the research data that is produced is still not available: not available in the sense of FAIR, not available in the sense of data sharing among scientists or among research groups, and not available on the World Wide Web. So this is where we have to focus, with the goal of sensitizing people to data sharing and data publication. The second point is that we're missing standards, and we're missing technical documentation in general; those are often not available. That's also why, in some areas where it comes to generic metadata standards, organizations formed, like DataCite, which started with mandatory metadata that are required for registration, for example.
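To make "mandatory metadata" concrete, here is a minimal sketch that checks a record for the six mandatory properties of the DataCite Metadata Schema (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The dictionary keys are simplified labels, not the exact DataCite XML or JSON serialization, and the DOI and all values are hypothetical.

```python
# Mandatory properties of the DataCite Metadata Schema (4.x), written as
# simplified dictionary keys; the real XML/JSON serializations differ.
MANDATORY = (
    "identifier",       # the DOI
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",     # e.g. resourceTypeGeneral "Dataset"
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in a record."""
    return [key for key in MANDATORY if not record.get(key)]


# Hypothetical record; the DOI prefix 10.1234 is a placeholder.
record = {
    "identifier": {"identifier": "10.1234/example-dataset", "identifierType": "DOI"},
    "creators": [{"name": "Doe, Jane"}],
    "titles": [{"title": "Borehole temperature measurements"}],
    "publisher": "Example Data Repository",
    "publicationYear": "2018",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
}

print(missing_mandatory(record))                            # []
print(missing_mandatory({"titles": [{"title": "Draft"}]}))  # everything except "titles"
```

A repository would typically refuse to register a DOI until this list is empty, which is what makes the metadata "mandatory" in practice.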
The next point is that we have inconsistent practice in the citation and the licensing of data. One example addressing this is the DataCite metadata schema, which is continuously being developed further. It helps you in a generic way, no matter which discipline you come from, to make your data citable, with the idea of providing it with a persistent identifier. There are more PID services, of course, also depending on the discipline you're coming from, and we will provide some examples this afternoon. Then of course there is also the point that researchers don't know how to share, are unable to share because of some restrictions, or are unwilling to share data because of competition or other factors. What we think is needed here is also time, and maybe a change of generations, so that we move towards a whole new culture of data and software sharing that highlights the benefits and possibilities that come with sharing. The FAIR principles then build on this whole background. The first publication is from 2016, by the group around Mark Wilkinson, and that paper came out of this whole discussion on data management. Many different institutions and groups worldwide, for example the Research Data Alliance, were already focusing on data management, and there were many other groups as well, like FORCE11. They sat together and produced this paper on the FAIR Guiding Principles for scientific data
management and stewardship. It was published in this form, and you will also find it online in many other formats. What I would like to highlight here is that the authors all have a strong computational background, which means they approached data management in a very computational way. That's also why they said that in the digital age, where most scientists, if not all, use computational methods to analyze and process their data, we have to enable machines to communicate. So the basic idea behind it, and this is so important for this whole topic, is that FAIR means machine-to-machine communication, and not so much human-to-machine or human-to-human "fairness" in the sense we might read into the word. FAIR means FAIR for machines first, for example machine-readable metadata, and only secondarily for humans, and that is often mixed up. In many organizations and workshops you hear it as interpreted by the European Union and the European Commission: the whole European Open Science Cloud has to be FAIR, for example, and at the same time they talk about workshops, teaching materials and many other things on a human level. But keep in mind that essentially the idea behind FAIR is this machine-actionable aspect. So, the principles: I hope you had a look at them beforehand, or are familiar with them, or have read them in detail. This is only the short version, and you will see that sometimes they are very detailed and leave little room for interpretation, and sometimes they leave quite a lot of room. The first one is Findable, and let's start with the persistent identifier. The concept itself was actually developed around 2000 or something like
that, and with the FAIR concept for research data they picked it up and said that a persistent identifier is actually the basis for making data FAIR. For those persistent identifiers they also decided that the metadata and the data themselves have to be searchable as a resource, and the identifier and the metadata are closely linked together. That's about the Findable part, and we will hear more about it this afternoon. For Accessible, I have highlighted some points here, and again we see this focus on the machine-readable part: data and metadata have to be retrievable using standardized communication protocols. For a human, a standardized communication protocol would be a language, English, German, French, Spanish or whatever, even though you have lots of variations and dialects; here they tried to standardize machine-to-machine communication. What's interesting here is also the commitment that metadata remain accessible even if the data disappear: if the data are no longer available, if there has been a takedown notice or something like that, as we know from the literature, where it happens quite often, the metadata still have to remain accessible. That means we know what has been there, who was responsible, and what the origin of the information was, also in the future. This is sometimes done with tombstone pages, for example in digital repositories, but especially here we still have a long way to go, because it's not common practice yet at all. Next is Interoperable, and again we're talking about machine-readable standards: a formal, accessible, shared and broadly applicable language for knowledge representation. So this part is strongly computational, and it is maybe even the hardest one of the FAIR principles. Even the vocabularies used have to be FAIR themselves.
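The Findable and Accessible points above can be sketched in a few lines. This minimal, hypothetical resolver checks that an identifier looks like a DOI (using the pattern Crossref recommends for modern DOIs, which checks the form only, not whether the DOI is registered), keeps the identifier inside the metadata (principle F3), and still returns the metadata as a tombstone when the data itself has been withdrawn (principle A2). The in-memory registry and the 10.1234 identifiers are invented for illustration; a real system would resolve through a PID service over HTTP(S).

```python
import re

# Crossref's recommended pattern for modern DOIs: a syntax check only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

# Hypothetical registry: each PID maps to metadata (which repeats the
# identifier, F3) and to a flag saying whether the data is still available.
REGISTRY = {
    "10.1234/abc-001": {
        "metadata": {"identifier": "10.1234/abc-001", "title": "Core samples"},
        "data_available": True,
    },
    "10.1234/abc-002": {
        "metadata": {"identifier": "10.1234/abc-002", "title": "Withdrawn survey"},
        "data_available": False,  # e.g. after a takedown notice
    },
}


def resolve(pid: str) -> dict:
    """Resolve a PID to its metadata; withdrawn data yields a tombstone."""
    if not DOI_PATTERN.match(pid):
        raise ValueError(f"not a syntactically valid DOI: {pid}")
    entry = REGISTRY.get(pid)
    if entry is None:
        raise KeyError(f"unknown identifier: {pid}")
    if entry["data_available"]:
        return {"status": "ok", "metadata": entry["metadata"]}
    # The data is gone, but the metadata stays accessible (A2).
    return {"status": "tombstone", "metadata": entry["metadata"]}


print(resolve("10.1234/abc-002")["status"])  # tombstone
```

The design choice worth noting is that availability of the data and availability of the metadata are stored separately, so withdrawing one never removes the other.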
That, again, depends on the discipline: there are some more-or-less generic vocabularies that can represent many scientific disciplines, but even the vocabularies themselves have to be taken care of. There has to be a community that adapts and updates them, and that is a very essential point here. So not only the data, the software and the objects we produce in scientific research have to be FAIR, but also the methods behind them. FAIR is much more than only the result of the research; it covers the whole process, the whole research data life cycle, if you want to interpret it that way, and always with this focus on the machine-actionable part. They also include, for example, qualified references to other data, which can be covered by automatic citations, and that is done quite a lot already, but also, for example, references to the software you use to process your data. For the Reusable part, a lot of attributes could be included, and provenance is a big factor here: provenance that is automatically generated throughout the research process. So maybe in the future we will all be working in a completely networked lab, with smart glasses or some other glasses on, and everything we do will be recorded and fed into an electronic lab notebook. Then our supervisors can look at our notes and at what we have done in the lab, and the machine will also tell you that this substance does not belong in there, or something like that. Maybe this is the future we are facing; we don't know yet, but it could be one of the ways things go. And of course we are
also talking about data protection here: we're talking about the life sciences, for example, and if you do mission-oriented research together with companies, you may be restricted in what you can tell the public and what not, so licensing will be a huge issue in the future. Interestingly, about one and a half years later the group came together again and produced a second paper, because people liked the acronym very much and gave it many, many interpretations. That means a lot of people are using FAIR but never actually had a look at what those principles mean and what is written down in detail; they just adopted it widely. So the authors said: OK, one step back, we need to be clear about what we meant in the first paper. This second paper is not so well known: when you talk to many people who say they adopt the FAIR principles, they don't know about it at all. With a particular focus on the European Commission, they tried to clear up what they actually meant with the FAIR principles. These are the statements of the paper: FAIR is not a standard; these are guidelines, principles that are open to interpretation depending on the discipline. That's very important. Also, FAIR does not necessarily mean open: it can include open data, but it's not exclusively about open data. That matters especially when you work together with companies, where you can still do your best to make data as FAIR as possible. And they also said, very importantly, that
being partly FAIR, covering only some of those aspects, may be FAIR enough, depending on the discipline, on your individual research, and on the institute and working group you are in. It's very important to know that you cannot cover all aspects of the FAIR principles, and that's OK; it's a first step on the way. Of course the world will not change from yesterday to today or from today to tomorrow; it can be a long process, and that's totally fine. They also tried to categorize FAIRness a bit more, into six categories, and I'll just mention them here. They are also open to interpretation, but you can try to place your data in one of these categories. The first is re-useless data: data that has no metadata, that is not on the internet, that sits on a local USB drive or something like that, and that is not handed over to others. It's re-useless in the sense that it's not FAIR; it's not machine-actionable in any way, other than that the local computer can read it, and that's about it. It's not findable, it has no metadata at all, and such a dataset would be re-useless. In the second category, a PID has been put on the dataset, so at least it can be found on the internet; it has a persistent identifier, at least in some way. It can also be invisible to the public, visible only to a certain research group or within an institute that uses a local persistent identifier, but it can be shared and it can be found. That would be category two. Then you have FAIR metadata: that does not mean the dataset itself is machine-readable, but via the PID, the metadata that describe the dataset can be understood and read by another system in a FAIR
way, so that would be the third category. Then we have completely FAIR data: not only the metadata, but also the data are described, for example in a semantic way using controlled vocabularies or ontologies, but still with restricted access, so they are only shared among some groups. The next one is basically the same, but with open access, so it's accessible to the whole world. And the sixth category, which we have not reached yet (there are very few such datasets worldwide, I would say), is fully open access and functionally linked with other data. For example, you have anonymized medical records of a person, and you search for a certain sickness or for some of its symptoms; that person's record links you to other patient data that are also anonymized, where you can see maybe different symptoms with the same sickness, and it's all interconnected. So if you search, for example, for what causes diabetes, you will not only get the one, five or six symptoms that are known in the literature, but, let's say, an array of 500,000 symptoms, all connected to each other, and you can develop new research questions along that. People are still working on this aspect, there are many research projects going on, and there are also some portals available that actually allow for this kind of cross-discipline, interoperable semantic data search; some also call it a knowledge graph, or a scientific knowledge graph. Many people think that this kind of connected, interlinked data and software will be the future, and that is the whole goal of the FAIR principles as we understand them: fully machine-actionable data and metadata that can be interlinked and discovered across the different disciplines.
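Because the six categories build on each other, they can be written as a simple decision ladder. This is a sketch under the simplifying assumption that each step is a yes/no property; the flag names are hypothetical, and real assessments are far less clear-cut, since the categories themselves are open to interpretation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per step up the ladder.
    has_pid: bool = False              # category 2: persistent identifier assigned
    has_fair_metadata: bool = False    # category 3: machine-readable metadata via the PID
    data_is_fair: bool = False         # category 4: data itself semantically described
    open_access: bool = False          # category 5: openly accessible
    functionally_linked: bool = False  # category 6: linked to other FAIR data


def fair_category(ds: Dataset) -> int:
    """Return 1 (re-useless) up to 6 (open and functionally linked)."""
    if not ds.has_pid:
        return 1
    if not ds.has_fair_metadata:
        return 2
    if not ds.data_is_fair:
        return 3
    if not ds.open_access:
        return 4
    if not ds.functionally_linked:
        return 5
    return 6


print(fair_category(Dataset()))                                   # 1, e.g. a file on a USB stick
print(fair_category(Dataset(has_pid=True, has_fair_metadata=True)))  # 3
```

Modeling the categories as cumulative is itself an assumption: a dataset could in principle be open without semantic description, which this ladder would still place in a lower category.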
OK, and for software: as we've just heard, that part was always in brackets. The FAIR data principles are meant for data, and here we're trying to interpret them a little bit for what they mean for software. It seems to me that software is often thought of as the other side of the coin, as the other research object that is also quite important to the movement besides datasets. It was mentioned very early, also in the Horizon 2020 data management guidelines. At the same time, there is a really huge sea of software development guidelines, and also project management guidelines, from both the military and industry, with new ones continuously coming along, but also the huge universe of the free and open source software world. So a lot of things exist already, just to mention a few here:
the ISO 9000 family, the GNU coding standards from the GNU project, and,
in the humanities, for example, digital humanities projects have already published some guidelines. All of these are apparently now agglomerating under the FAIR label because, as we already heard, it is very popular right now: its direction is rather simple to understand, at least if not all the details, and politics is also on board. So that's the umbrella, but it's a compromise. What we will try to do in all the software topics this week is to follow the spirit of the FAIR principles for software, because they're not 100 percent congruent with what is necessary for software, and therefore we will deviate where necessary: going by the book when possible, deviating from the textbook a little bit when necessary. And back to data management: before lunch, I'll talk a bit about data tools and services offered at TIB and a bit more about data management plans. From the life cycle we have these key points: to store, to preserve, to organize and possibly to share data and software. Importantly, as was also said in the other talks, one has to start from the very beginning: at the start of the project planning, as you plan your project, your thesis, your master's or whatever, you also plan your data and software management. That requires, of course, people, tools and services, keeping the FAIR principles in mind. Data management plans came along in the last five or six years, maybe ten, and were called differently before. Here in Hannover, Leibniz University also offers a research data management service as support for local researchers, and I'm sure that when you look across your universities,
across your institutions, you'll find similar services already offered by your institutions; or, when you ask them, they'll tell you that they are being set up at the moment and that they're looking to provide them as soon as possible. Those are usually the two answers you get. Here it's a cooperation between the research department, the IT service center and TIB, because, as I told you before, we also have this local role as the university library. So here we also provide infrastructure for research data management, with a focus on publication, because as a library we like to focus on publication, which makes our life easier, I would say. But of course we also have to keep in mind the whole data life cycle. Quite recently, new professors have come to TIB who focus on data reuse, visual analytics and semantics, so we started doing a lot of data science projects here. Since 2005 we have provided DOI services, and we also focus on the legal aspects of data reuse and the licenses connected to this topic. We offer basic training and specialist courses as well, and we have a close cooperation with China, for example with the Chinese Academy of Sciences, where we do one- or two-week workshops: they come here, and we travel to China and hold workshops there. That's only one example of many. So, to sum up: the FAIR principles, as you have seen, came out only a short while ago, in 2016, but if you consider all the processes that led to them, they have come a long way. Now, for the time before lunch, we have some more basic introductions to data management plans and software
management plans, because some of you are interested in them. What we try to do today is just give you a short overview of the tools that are available, not all of them, because the landscape is already very wide, but some ideas of where you can get the tools and what might be best for you to use. What is a data management plan? Roughly, it addresses the issues related to data management, which can be a huge selection of aspects, and it might be required by funding bodies: it's required by the National Science Foundation in the U.S., for example, and by some Horizon 2020 calls, though not all; sometimes it's optional. It can be a formal document, but it does not have to be. It should be developed at the start of the project and outline all aspects of the data created and the data being used. What I would like to highlight, and this is our understanding, which also comes across now when different tools are presented at conferences, is that it's a living document, and it should be updated frequently during your project or whatever you're doing with the data. So it should not be a static one. At the beginning it is, of course, for the funding bodies, but then maybe it can even be connected to an electronic lab notebook or something like that. There are ideas to develop it more in that direction, so you can actually get some of the metadata you later need for publication out of the data management plan, and you do not have to do the work two or three times. You record metadata and provenance only once, and you can reuse them later with different tools. So interoperability between data management plans as digital tools
is also one central aspect of the current developments going on in this field. A short checklist, which is in fact included in all of the tools, looks, very briefly, like this.
Basically, you always have some administrative information, and you have information about the data: about data collection, which is very important, so all of the provenance information behind it, and the documentation, technical reports and so on, if that is important for your domain. Then, of course, there are ethics and legal compliance as well. For example, in the climate sciences legal compliance is usually not so central, because there is not much mission-oriented research with companies going on, at least in some domains, and it's often good practice to make the data public because it's financed by public money from the beginning. So it depends on the discipline you are working in. Then, of course, there is information about the storage, archiving and backup you plan for your data, and about what is available. Maybe you should reach out to your university or your computing center, or ask your colleagues how they do it, and please keep in mind all the newer options offered that go beyond a local hard drive and a USB drive. Data sharing is the next aspect: what repositories are available, what possibilities are maybe offered by the journal you plan to publish in, what could be a digital representation of your data, and so on. And of course there are responsibilities and resources. Data management takes time, and it also means time management within your master's, PhD or whatever, so it will be a central aspect. Often it can save you time in the end: if you have a structure, if you have a data management plan, even if you have to adjust it, you can actually spend more time in the lab doing your project, not less, if you have planned a bit more beforehand.
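If the plan is kept as structured data rather than free text, checking which parts of the checklist are still open becomes trivial, which fits the idea of a living document. A minimal sketch: the section names follow the checklist just described, while real templates in DMP tools may name or split them differently, and the draft answers are hypothetical.

```python
# Checklist sections as described above; funder templates may differ.
SECTIONS = [
    "administrative information",
    "data collection",
    "documentation and metadata",
    "ethics and legal compliance",
    "storage, backup and archiving",
    "data sharing",
    "responsibilities and resources",
]


def open_sections(plan: dict) -> list:
    """List the checklist sections that are still empty in a draft plan."""
    return [s for s in SECTIONS if not plan.get(s, "").strip()]


# Hypothetical early draft of a living plan.
draft = {
    "administrative information": "Project X, PhD thesis",
    "data collection": "Borehole temperature logs exported as CSV",
    "data sharing": "  ",  # whitespace only, so still counted as open
}

for section in open_sections(draft):
    print("still to do:", section)
```

Re-running such a check whenever the plan is updated is one cheap way to keep it from silently going stale.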
A common approach, which is also a background for the data management concepts, is "as open as possible, as closed as necessary". That phrase is now widely adopted by the European Commission, and I guess also by big research organizations like the Research Data Alliance. They say we are not talking about open data anymore; FAIR means as open as possible, as closed as necessary, because of course a lot of scientists are very concerned about competition in their field and about the ethical and legal aspects of data sharing, which depend on the country or even on the federal state you live in, and which are not settled yet. Many decisions on the legal side are still open or missing. If you ask one legal counsel, maybe he will tell you one thing, and then you make another appointment with another one and he or she will tell you something completely different, because there has been no case law yet, I would say, on research data management. So we're still missing that, and that's maybe also a message to the political side: to clear up the law here, at least in Germany, I would say, but maybe that also holds for other countries. As for tools for data management plans, I've just put up a few well-known ones: there is DMPonline, which was one of the first, the DMPTool, and, becoming more and more popular in Germany, the Research Data Management Organiser (RDMO). There are many others and many variations, and we have a whole wiki page that we try to keep up to date
with all the new tools popping up, so maybe you want to look at it. We are now dealing with the move from static to dynamic and even machine-actionable DMPs, including PID systems, so the tools now include things like ORCID iDs. So if you look up the tool you are using now, it may be outdated half a year from now, which is basically the same as with a lot of software. You have to keep an eye on the landscape and on what would be best for you. And of course you have different stakeholders, maybe also with different kinds of data management plans: the researcher, the institute, the data repositories, the funders and, of course, the publishers, and they may or may not have their own regulations, depending on the DMP. That does not make it easier; it can make it more complicated. I know of many scientists who use DMPonline, but we will see what the future brings, and maybe it goes the way that instruments and electronic lab notebooks are connected to a DMP tool, so that you do not have to provide this information manually in the future. But we have to be honest: we are not there yet, and there's still a way to go. Here is a picture of the different stakeholders. In mission-oriented research, for example, we also have industrial or commercial partners, and a good DMP should include all the possible stakeholders. As a researcher you basically provide the main input, and the idea is that the information from the DMP will then feed back to, for example, publishers, research funders and institutions, and the DMP itself will be a machine-actionable and dynamic document.
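What "machine-actionable" means in practice is that the plan is exchanged as structured data that other systems can read. The sketch below is loosely modeled on the RDA DMP Common Standard JSON serialization; it is not a complete or validated record, so check the standard before relying on any field name. The ORCID iD and name are the fictional example used in ORCID's own documentation, and the DMP URL and dataset DOI are hypothetical.

```python
import json

# Loosely modeled on the RDA DMP Common Standard; not a complete, validated record.
plan = {
    "dmp": {
        "title": "DMP for a hypothetical borehole-temperature project",
        "language": "eng",
        "created": "2019-03-01T10:00:00",
        "modified": "2019-09-15T16:30:00",
        "ethical_issues_exist": "no",
        "dmp_id": {"identifier": "https://example.org/dmp/42", "type": "url"},
        "contact": {
            "name": "Josiah Carberry",
            "mbox": "contact@example.org",
            "contact_id": {
                "identifier": "https://orcid.org/0000-0002-1825-0097",
                "type": "orcid",
            },
        },
        "dataset": [
            {
                "title": "Raw logger exports",
                "dataset_id": {"identifier": "10.1234/abc-001", "type": "doi"},
                "personal_data": "no",
                "sensitive_data": "no",
            }
        ],
    }
}


def dataset_pids(document: str) -> list:
    """Extract dataset identifiers, as a repository or funder system might."""
    dmp = json.loads(document)["dmp"]
    return [d["dataset_id"]["identifier"] for d in dmp.get("dataset", [])]


serialized = json.dumps(plan, indent=2)  # what would be exchanged between systems
print(dataset_pids(serialized))  # ['10.1234/abc-001']
```

The point of the structure is that a funder or repository can read the identifiers directly instead of someone retyping them from a PDF.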
This is the user interface of DMPonline, which you can use freely online. You can choose between different templates; this one is an example of a Horizon 2020 data management plan as it is required. You have different sections with questions aligned with the Horizon 2020 mandates, and you can fill in your information here, create a plan and attach it to your Horizon 2020 proposal, and that's basically it. But you can also update it online as often as you like, and that's the nice feature here. Over time, many countries and institutes even decided to put up a national or local data management plan tool supporting your language and your institutional data policies, and many of them are actually based on DMPonline because it's an open-source tool. This one here is, I guess, Italian, but many more languages are covered, again often based on DMPonline. So our message would be to look at what is offered by your local institute, and if nothing is offered, go back and use, for example, DMPonline, or in Germany the tools that are most commonly used. One of the developments is DMPRoadmap, which combines the DMPTool and DMPonline, and its code is available on GitHub. It started, I guess, one year ago, and its goal is to have a single code base on which most DMP tools are based, and which can be developed further and reused by local institutions. Because of course we are facing the situation that your local library, your local university or institute, wants to be the main one, of course,
offering you DMP tools and services, so it may be a good idea to ask them whether they already have one in mind. Some of the funding bodies which require them I've already mentioned, the European Union and the NSF, but depending on the country there are many more funding bodies that now demand a data management plan, at least at the start of the project. Some even say that they want an updated one during the project and another at the end, reporting how the data management went during the project and, of course, in the case of data publications, which identifiers were used for the data, which repositories were used and maybe why, and a note on the reusability of the data. With that in mind, a new topic now: software management plans. Generally you can say that many aspects of data management plans also apply to software management plans, more or less. But there is one really nice resource from the Software Sustainability Institute: they have put up a list of questions and considerations to think about when you write a software management plan. For example, there is something about effort: how much effort will you have available? How much do you want to involve users, and should they contribute back? Sometimes in software projects you will really be the only user, or maybe your colleagues, but sometimes the general public could also become a user. Licensing is another aspect; we'll talk about this on Friday. Generally, it's a long list of questions that I recommend you have a look at, and with the tools you can also generate an actual document from it. Software is not only about code all the time. Generally, though, I would argue that for software this overlaps with the general advice from the free and open source software community.
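Since such a list of questions can be turned into a document, here is a minimal sketch of that step. The questions are paraphrased from the topics just mentioned (effort, users, contributions, licensing); the Software Sustainability Institute's actual checklist is longer and worded differently, and the answers are hypothetical.

```python
# Questions paraphrased from the topics above; not the SSI's actual wording.
QUESTIONS = [
    ("effort", "How much effort is available to develop and maintain the software?"),
    ("users", "Who will use it: only you, your colleagues, or the general public?"),
    ("contributions", "Should users be able to contribute back, and how?"),
    ("license", "Under which license will the code be released?"),
]


def render_plan(answers: dict) -> str:
    """Turn answers into a plain-text plan, flagging unanswered questions."""
    lines = ["Software management plan", ""]
    for key, question in QUESTIONS:
        lines.append(question)
        lines.append("  " + answers.get(key, "TODO: not answered yet"))
    return "\n".join(lines)


print(render_plan({"effort": "About two person-months per year", "license": "MIT"}))
```

Unanswered questions stay visible as TODO lines rather than disappearing, which keeps the generated document honest about what is still undecided.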
build your own little projects, how to publish your code and how to build a community of users around your code, so as to get a sustainable project off the ground. As was also mentioned earlier, I would argue: just be pragmatic about, for example, what your funder provides in terms of templates and what they request from you. There are tools available, either general-purpose tools or ones provided by the funder specifically, so there's little room for discussion about what you should use; just use what is provided. The DMPonline tool in particular can include the aspects of software management as well.

This last point is still a rough idea, but if your funding institution or your home university does not specifically require anything, then maybe you could frame your GitHub or GitLab issue tracker as your plan. As we also heard just now, a data management plan should be a living document anyway, updated as your project evolves, and in software development the solution for that has basically been the issue tracker for many decades. It can be published or kept private, so you could try to present it to the funding body you are proposing a project to: this is my public plan, this is my living plan, you can look into it, and users can look into it too. As you may know, software development in the agile community is organized with these little to-do lists: moving cards around, discussing what the next thing to do is, focusing and restricting the work you are currently doing, and always having an overview of what you have already achieved. We will talk about this a bit more tomorrow, but in my opinion one option could be to just use what is already there: issue tracking in software development.

To sum up this topic: planning in general helps you to step back from the day-to-day work, which can sometimes lead to tunnel vision, and to plan
backwards from the goals you want to achieve with your project: plan which actions will be necessary or helpful in order to reach that goal.

Because the topic of the European Open Science Cloud came up a couple of times, we just wanted to show you one more slide here, about the so-called FAIR Data Action Plan. This is actually a repository they have set up on GitHub, and the link is here. It was initiated by the High-Level Expert Group of the European Open Science Cloud, and they actually invite you to contribute and comment on the documents that are included here. So for the European Open Science Cloud there is this action plan in place, concerning all of the individual FAIR data principles, what repository providers should basically consider, how the funding schemes are going to be addressed within the European Open Science Cloud, and which disciplines they want to start with. No surprise, they focused on the data-intensive disciplines first, because they feel that these have a solid basis, let's say, to start from. But we think, and also on a personal note I think, that the whole long tail of data, as it's called, so the specialized disciplines and so on, should be considered too. It's open for comments until August, so there is still some time left. If you have any comments, or if you can pass this on to your institutions or to fellow scientists, ask them to actually have a look here; maybe they can even suggest some modifications and improvements to this action plan for the next couple of years. We think it will also have a huge impact on the upcoming Horizon 2020 funding schemes and maybe also, of course, on the national side. So please go ahead and have a look.

And this would be the end of the introduction session; now we move on.