ORDS-MV: Connecting researchers for Open Reproducible Data Science and Statistics in Mecklenburg-Vorpommern
Formal Metadata
Number of Parts | 17
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/67777 (DOI)
Production Place | Kaiserslautern
Leibniz MMS Days 2024, 7 / 17
Transcript: English (auto-generated)
00:05
Good afternoon and thank you very much. My name is Anja Eggert, and officially I'm a statistical consultant at the Research Institute for Farm Animal Biology, formerly known as the Leibniz Institute for Farm Animal Biology. We had a bad evaluation, so we are kind of in an in-between situation
00:25
now, and we will try to re-enter the Leibniz Association in two years. But today I will not talk about statistical models, such as how we analyze the social behavior of pigs. Instead, I would like to introduce
00:41
you to our grassroots initiative ORDS and the activities we do, to make them better known, and at the end to give a strategic outlook on what might be good options to merge activities in Mecklenburg-Vorpommern,
01:02
but of course also for the entire research community. I don't really have to tell this audience, but we know that even though there have been activities trying to
01:23
implement open science strategies in the research community for maybe 20 years, we are still quite far from these practices being the norm in the research community. This is a diagram from a
01:42
publication that came out of an unconference of the German Reproducibility Network in March 2022. It highlights that various tasks are necessary to finally reach the aim of making reproducible research and open
02:05
science the norm in research. So of course we all know we have to
02:23
adapt research assessment criteria and some program requirements; this is highlighted here in green. Half of the circle is dedicated to the need to offer training at different levels, and of course we also know that we have to build communities. The blue circles
02:44
around it are the tasks for the research institutions: we have to monitor the impact and we have to allocate resources. ORDS is a grassroots initiative in Mecklenburg-Vorpommern.
03:01
ORDS stands for Open Reproducible Data Science and Statistics. We don't have many resources, apart from quite a few people who are interested in this topic and doing their best to promote it. We had a
03:21
kickoff meeting in December 2020. We are an interdisciplinary scientific network, and we invite everybody, independent of career level, from PhD students to professors, to participate in our activities, and this is also what actually happens. So far we have monthly
03:43
online meetings, and we offer workshops and training in open and reproducible science practices. This is not only meant to show you the geography of Mecklenburg-Vorpommern and the locations of the different
04:00
cities; it should also make the point that the research community is very diverse. We have the universities, the universities of applied sciences, several Leibniz institutes in Mecklenburg-Vorpommern, Max Planck, Fraunhofer, and also two federal institutions. The point I would like to make here is that the
04:26
various types of research institutions have very different organizational structures, and in real life this sometimes makes it a little difficult to offer activities, or even structures, for everybody. But
04:44
maybe this is one of the advantages of a grassroots, bottom-up initiative: we don't depend that much on these structures. We try to speak to everybody and to invite everybody, and I can say this has been very successful
05:02
in the last couple of years. So far we have active members from all types of research institutes. The topics covered by ORDS are listed here. Many of us have a computational data science background, so
05:22
what we typically offer is hands-on training on these topics. By the way, I'm a trained biologist, so I'm one of those people mentioned in yesterday's presentations who try their best to do some programming
05:40
and to do it well, but I was never formally trained in it. The topics here are the typical ones: reproducible data analysis, literate programming, management and versioning of software code, publication of data, code, and software, the whole topic of
06:02
licensing, and research data management, which is of course a very large part of it. It also turned out that electronic lab notebooks are of interest to many of our active members. The learning curve is
06:20
really steep for most researchers, especially because true computer scientists are the minority in the group. Our task is to offer training to everybody, ideally at different experience levels, but of course we all know this
06:45
is not always easy. For instance, training on versioning of software code in one day can be very difficult: you basically
07:00
only get as far as an introduction to the topic, and you know you would actually need a lecture course over an entire semester, which is something we are not able to offer. Now a little advertising block for those of you who don't know The Turing Way yet: this is online material, so
07:25
the link to The Turing Way is also shown here. This material is developed by a very large open source community, and I like it very much. Of course, we all know we don't have to reinvent the wheel all the time, so I
07:43
use the material prepared and summarized on the web page a lot myself. For those of you who don't know The Turing Way yet, it's really worth having a look. Now a few of the things we do in the name of ORDS. One of our
08:04
favorites is reproducibility hackathons, which one person already mentioned yesterday; this is really hands-on training. We didn't invent this type of event ourselves. It was a group of, I think in the beginning,
08:23
mainly Dutch and British people who developed this type of event. You can go to the web page to find their material and information, and you can contact the people; it's a very open, helpful community. During an event like this, our participants attempt to reproduce
08:46
published research. Typically they can choose from a list of proposed papers; you can also find this list on the ReproHack web page, so it's publicly available, and of course each paper comes with published code and data,
09:03
and the participants try to reproduce the published research. This way they get practical experience in reproducibility, as they really deal with real data, and the idea is that they get inspired by
09:22
the work of others, by the code and the data and how it is all done. Of course, it can happen that some information is missing and people are not able to reproduce the published results, and
09:41
from this, too, the idea is that you can learn: okay, I should also have given this or that information. A very important point is that the aim of a ReproHack is not to discredit the researchers. They all agreed that other people can try to reproduce
10:02
their results, and they also want to learn from this interaction, so it's really important not to give it a negative touch. Altogether, it should show the importance of careful documentation of the entire data analysis. In our experience, those ReproHacks work
10:25
really well. Sometimes we follow this classical approach, and sometimes we choose, for instance, three different papers and form three subgroups at beginner, intermediate, and expert level. Maybe
10:41
the beginners try to reproduce it with the same software that was used in the publication, the intermediate group might choose different software to reproduce it, and the third group wants to build a Docker container around it or something like this. This is also a structure
11:04
we have offered during the last couple of years. What I have also done repeatedly is teach people, in a one-day course, how they can set up a reproducible
11:21
workflow. As I work with R, this is done in the R universe. It basically has four levels: writing literate code in R and RStudio, how to use Git version control in RStudio, how to push this
11:43
local repository to GitHub, and finally how to connect this repository with Zenodo and get a permanent DOI. Another, slightly different example I wanted to present is our cooperation with the digital humanities, also to show
12:05
that we really wish to set up a truly interdisciplinary network. The University Library of Rostock was able to get funding for a so-called culture hackathon. Maybe some of you
12:25
have heard about the Coding da Vinci hackathons. They are much bigger; I think each one takes
12:41
place over about half a year, whereas we had a weekend. The digital humanities are quite active here: we have a junior professorship, and the libraries, also in Greifswald, are really active in setting up digital projects. One of the main aims of a culture hackathon
13:05
or of Coding da Vinci is to show the potential of publicly available cultural data to others. This was the background, and ORDS participated there as well. We had a weekend at the end of September
13:28
one and a half years ago. It started with several people presenting open cultural data in pitches, and then it was like
13:43
speed dating, very quick, all within about one hour: you would form groups, pick a topic, and then start brainstorming ideas and creating digital solutions
14:00
of whatever kind, really from scratch. We decided to pick a historic newspaper. It was digitized at the University of Greifswald and is available to everybody in the Digital
14:21
Library Mecklenburg-Vorpommern. It was published between 1910 and 1932, and 4,227 scans of this newspaper are available. It appeared two to six times per week, always during the beach season, and the very
14:47
interesting information given on the first pages of these newspapers is a list of guests. You are probably not able to read this, but here the name is given, mostly
15:03
of the husband, with the family, maybe the number of kids, then the hometown they come from, and also the place where they would stay in Swinemünde. What we did during the first night was run optical character recognition software to make
15:29
this information machine-readable. Unfortunately, but to be honest as expected, the quality of the result wasn't very good. I mean, the
15:44
layout is not uniform, so quite some post-processing was necessary. During the weekend we did it for a few data sets. Then we set up a pipeline, getting the geodata of the locations from
16:02
OpenStreetMap. We linked information from Wikidata to individual persons from the list, used weather data, and came up with a small product. We actually liked it a lot, and then it happened that, together with the
16:20
Rostocker Arbeitskreis Digital Humanities, we could apply for some money at Text+, an NFDI project. The idea, and it has just started, is to develop a real pipeline for the automatic extraction of the tabular data. It was clear that we should use more specialized OCR software, and we
16:41
have a list of possible software that is dedicated to non-uniform tabular data with changing layout and this Fraktur typeface. So, we are now three years old, and we celebrated
17:01
our anniversary with a workshop day. We were able to get Tobias Schlauch from the Helmholtz German Aerospace Center for a workshop, "Improve collaboration using Git and GitLab", and we also had two very nice keynotes: one on "Quality research needs good working conditions" by Rima-Maria
17:26
and one given by Esther Plomp from Delft University, where she is a data steward: "Five selfish reasons to join an open science initiative", like ORDS, for instance. So we had a very nice day, and this also shows you that
17:43
we are already part of a Germany-wide network. This could happen because we are a member of the German Reproducibility Network. This brings us to the next level, where we can start talking about strategies. Why are we a member of the German Reproducibility
18:03
Network? Of course, in this way we can find synergies with others, and they try to connect us, the individual networks. But they also go one level higher: they really aim to advise
18:22
institutions, for instance on how to embed open science, and this is not really something we can do as a small network in Mecklenburg-Vorpommern. What else is happening in Mecklenburg-Vorpommern at the moment? Since last fall, a research data management network has been established.
18:44
It is called Datenkompass M-V. It's financed by the state for one year, and this is more of a top-down approach. Those people really try to connect the different research institutions and have
19:01
quite some focus on how to embed training on open and reproducible science in the curricula of the universities. This is of course something different, with the focus on research data and research data management. The main aim at the moment of
19:24
the Datenkompass network is to develop a strategy for a Landesinitiative für Forschungsdatenmanagement, that is, a state initiative for research data management. As you can see here on the map of Germany, along with Saarland and Saxony-Anhalt,
19:42
Mecklenburg-Vorpommern is one of the states that does not have a Landesinitiative für Forschungsdatenmanagement, so the goal now is really to get one. We have already written a white paper with many participants from the different research institutions, and of course we hope that we are successful in the end.
20:02
So, my last slide. This is a quote from the publication listed there on Open Science 2.0, which is where I started my presentation: conversations about open science have reached the mainstream, yet many practices still remain uncommon.
20:24
So how can we take a step forward? Our approach, which I presented here today, is to set up a new initiative, maybe just another one, I don't know, but at least in Mecklenburg-Vorpommern we are the only ones doing this,
20:43
and of course by offering hands-on training in open and reproducible science practices we really try to take a step forward. But what we also see, and this is why I also talked about the Landesinitiative, is that we really need to efficiently merge the different efforts to really
21:04
reach a coordinated change in research culture, where open and reproducible science practices are hopefully the default in the end. Thank you very much.
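
As an illustration of the pipeline step described around 15:44 in the talk (post-processing OCR output and looking up locations via OpenStreetMap), here is a minimal Python sketch. It is not the team's actual pipeline. It assumes that Nominatim, the OpenStreetMap geocoding service, is used for the lookup, and that post-processing includes replacing the long s found in Fraktur print; the talk states neither. The function names clean_ocr_text and geocode_url are hypothetical, and the code only builds the query URL without sending a request.

```python
import re
import unicodedata
from urllib.parse import urlencode

# Assumption: Nominatim is the OpenStreetMap geocoder used for the lookup;
# the talk only says the geodata came "from OpenStreetMap".
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def clean_ocr_text(raw: str) -> str:
    """Normalize one OCR'ed text fragment (hypothetical post-processing step)."""
    text = unicodedata.normalize("NFC", raw)
    text = text.replace("\u017f", "s")  # long s, common in Fraktur typefaces
    text = re.sub(r"\s+", " ", text)    # collapse runs of whitespace
    return text.strip()


def geocode_url(place: str) -> str:
    """Build a Nominatim search URL for a place name; no request is sent."""
    query = {"q": clean_ocr_text(place), "format": "jsonv2", "limit": 1}
    return f"{NOMINATIM_SEARCH}?{urlencode(query)}"


if __name__ == "__main__":
    # Example: a hometown as it might come out of OCR on a Fraktur page.
    print(geocode_url("  Ro\u017ftock \n"))
```

When actually sending such requests, the public Nominatim service expects an identifying User-Agent and a low request rate, so a real pipeline would likely cache results per distinct place name.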