
Joining Forces: the Databib / re3data.org Collaboration


Formal Metadata

Title
Joining Forces: the Databib / re3data.org Collaboration
Title of Series
Number of Parts
24
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2014
Production Place
Nancy, France

Content Metadata

Subject Area
Genre
Transcript: English (auto-generated)
Hello everybody. We are very happy, and thank you for staying with us for this last presentation before we go to dinner. Michael and I promise that we are going to hurry up, so don't worry. Yesterday evening Michael and I had a collaboration beer, and we remembered that we first met at the Open Repositories conference in Edinburgh in 2012. There we had the first discussions about our collaboration. Today we can happily present an advanced stage of this collaboration to you. The plan is that I first give you some information on the re3data.org project, and after me Michael will give you a short introduction to Databib, to our collaboration, and to our future plans for it. The status quo of re3data.org can be seen on this slide.
Currently we have almost 800 repositories indexed in our registry. re3data.org is an abbreviation for Registry of Research Data Repositories. We know that this name is a bit strange and that you have to get trained to pronounce it correctly.
The registry was launched in December 2013, so not a long time ago, and we are already referenced in numerous guidelines and policies, including the European Commission's guidelines on how to handle research data in Horizon 2020 projects. It is also important to mention that re3data.org is a German project funded by the German Research Foundation, currently in its second project period. The first period started in 2012, and the current period will last until the end of 2015. The project is also supported by DINI.
DINI is the German Initiative for Network Information. The project team members come from three partner institutions: Humboldt University Berlin, where I am from and where the Computer and Media Service and the Berlin School of Library and Information Science deal with research data topics; the German Research Centre for Geosciences in the Helmholtz Association; and the Karlsruhe Institute of Technology. One of the most important aspects of re3data.org is the vocabulary that is used to describe research data repositories.
In almost three years of the project we developed a comprehensive vocabulary that is now called a schema. It is based on XML, and with it we are able to describe research data repositories using 37 properties. The schema was not developed in the project context alone but also with the help of the community, experts, and colleagues like you. Thanks to the feedback those colleagues gave us, we could publish version 2.1 of the schema at the end of 2013. What you can hopefully see here is a screenshot of one of the search results from re3data.org. You get a comprehensive set of information on a research data repository.
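To make the idea of an XML-based repository description concrete, here is a minimal sketch in Python. The element names (`repository`, `additionalName`, `contentType`, and so on) and the example values are assumptions chosen for illustration; they are not taken from the re3data.org schema, which defines its own 37 properties.

```python
import xml.etree.ElementTree as ET

# NOTE: all element names below are hypothetical, not the real schema.
def describe_repository(name, url, subjects, content_types, additional_names=()):
    """Build a small XML description of one research data repository."""
    repo = ET.Element("repository")
    ET.SubElement(repo, "name").text = name
    for extra in additional_names:
        ET.SubElement(repo, "additionalName").text = extra
    ET.SubElement(repo, "url").text = url
    for subject in subjects:
        ET.SubElement(repo, "subject").text = subject
    for content_type in content_types:
        ET.SubElement(repo, "contentType").text = content_type
    return repo

# Made-up repository used only to exercise the function.
example = describe_repository(
    name="Example Earth Data Archive",
    url="https://repository.example.org",
    subjects=["Geosciences"],
    content_types=["Raw data", "Images"],
    additional_names=["EEDA"],
)
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Repeatable properties such as subjects and content types are written as repeated elements, which is one common way to model multi-valued fields in XML.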
You have additional names of a repository, the URL, the subjects the research data comes from, the content types of the research data, and much more. I will show this in more detail on the next slides. What you can see in the right-hand corner is an icon set. I know it is really small, so I will show it bigger. This icon set is based on the categories that we use to describe research data repositories, and here is an overview of those categories. We have information on policies, that is, whether the research data repository provides a research data policy. We have information on legal aspects, meaning the terms of use and the information on how the research data in the repository may be used. We have information on technical standards, for instance which persistent identifier system the repository uses. We also have information on quality standards, meaning the certificates that research data repositories may have. And the green checkbox at the top of this graphic means that the entry for the research data repository has been reviewed. I will come back to this point. Here is another view of the icon set, showing all the information you can gain from it. We also have requirements for research data repositories that are indexed in the registry. The repository must be run by a legal entity, which can be a library or a university institution. It must clarify the access conditions to the data and to the repository as well as the terms of use. The repository must have an English GUI, and it must have a focus on research data. We know that this last point is sometimes a bit difficult to judge, and this is why we have also developed a workflow for indexing, describing, and reviewing our data records. Either we do our own research and discuss with colleagues, or research data repositories are proposed to us by the community.
The repository is then indexed in re3data.org, but the data record is not yet publicly available. An editorial team enriches the data record using the schema that I mentioned. If any information is missing or there are questions, we discuss them with the operators of the research data repository. When the data record is finished, it goes through a review process by our editorial board. At the moment the project has five editors on this board, and they need one to two hours to review one research data repository. After this whole process the description of the repository is publicly available in the re3data.org database and gets the green checkbox that you saw in the icon set. With this I am finished with re3data.org. I am here on behalf of the whole re3data.org team, and we would be really happy to have feedback from you. If you have any suggestions for research data repositories that need to be included in the registry, please let us know.
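The inclusion requirements and the indexing workflow described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the registry's actual software: the status names, the dictionary keys, and the `Record` class are all hypothetical.

```python
from enum import Enum

class Status(Enum):
    SUGGESTED = "suggested"
    INDEXED = "indexed, not yet public"
    DESCRIBED = "described by the editorial team"
    PUBLISHED = "reviewed and public"

# Forward steps in the order given in the talk.
NEXT_STEP = {
    Status.SUGGESTED: Status.INDEXED,
    Status.INDEXED: Status.DESCRIBED,
    Status.DESCRIBED: Status.PUBLISHED,
}

def meets_requirements(repo):
    """Check the inclusion requirements; the key names are hypothetical."""
    return all([
        repo.get("run_by_legal_entity", False),
        repo.get("access_and_terms_stated", False),
        repo.get("english_interface", False),
        repo.get("focus_on_research_data", False),
    ])

class Record:
    """One data record moving through the workflow."""

    def __init__(self, name):
        self.name = name
        self.status = Status.SUGGESTED

    @property
    def is_public(self):
        # Only reviewed records are publicly visible (the green checkbox).
        return self.status is Status.PUBLISHED

    def advance(self):
        """Move the record one step forward; fail if it is already public."""
        if self.status not in NEXT_STEP:
            raise ValueError(f"{self.name} is already published")
        self.status = NEXT_STEP[self.status]
        return self.status
```

Keeping the allowed transitions in one table makes it easy to see that a record cannot become public without passing through the description and review steps.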
Great. So Maxi just gave you an overview of re3data.org. What I'll do is give you a very quick overview of Databib and then conclude by talking about how we're merging our efforts. As I give you the overview of Databib, it should be immediately apparent why we're merging these two resources. If you go to databib.org, you'll see a page like this, with a pretty typical search and browse interface. You can search data repositories, and there's an advanced search. We break out repositories by subject, we list the most recent additions, and we feature a random repository of the day with a thumbnail image. But especially as we get into the records and I show you what a repository record looks like, you can see the similarities. It's a very typical record with a title, a URL, the authority, in other words whoever maintains the repository, and a subject classification. The subjects are controlled using Library of Congress Subject Headings, following the FAST implementation of LCSH, if you're familiar with that. In addition, we've got an abstract, which is the narrative description of the data repository, as well as the thumbnail image of the repository. We use the GeoNames controlled vocabulary for countries so that we can geolocate the repository and put it on a map. Looking down at the bottom of the record, you can see that we have different policies associated with the repository.
Who can access the data in the repository? When you download the data, what can you do with it, that is, what is the use policy for the data? And who can deposit data into the repository? If we can ascertain what year the repository started, we include that information, as well as the type of repository.
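The record just described can be pictured as a simple data structure. The sketch below is an assumption-laden illustration: the class name, the field names, and the `missing_fields` helper are hypothetical and only mirror the fields named in the talk, not Databib's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RepositoryRecord:
    # Field names are hypothetical; they follow the fields named in the talk.
    title: str
    url: str
    authority: str                      # whoever maintains the repository
    description: str                    # the narrative abstract
    subjects: List[str] = field(default_factory=list)  # controlled subject headings
    country: Optional[str] = None       # controlled country value
    access_policy: Optional[str] = None   # who can access the data
    reuse_policy: Optional[str] = None    # what you can do with the data
    deposit_policy: Optional[str] = None  # who can deposit data
    start_year: Optional[int] = None
    repository_type: Optional[str] = None

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items()
                if value in (None, "", [])]

# Made-up record with only the required fields filled in.
record = RepositoryRecord(
    title="Example Earth Data Archive",
    url="https://repository.example.org",
    authority="Example University Library",
    description="A made-up repository used for illustration.",
)
print(record.missing_fields())
```

A helper like `missing_fields` is the kind of check an editor could use to see at a glance which optional fields still need attention.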
So you can see how these things line up between re3data.org and Databib. Anybody can submit a new record to Databib just by filling out this form. It's a pretty standard web form with only the minimal required fields and controlled vocabularies for just a few of them: GeoNames for the country and LCSH for the subject classification. The submission goes into an editorial queue. The administrator, that's me, logs on and sees a screen similar to this, and what the admin does is assign the record to up to three editors.
I'll show you a list of our board of editors. We have editors who represent different subject expertise, different geographic locations, different languages, et cetera, and so we try to assign the record to the most appropriate reviewer, if you will, to vet it. Those editors all get an email from Databib with a link saying you've got a new record to review. They click on the link, log into Databib, and get an interface similar to this one where they can claim the record. The first editor to claim it owns the record and is the editor of record for it, and he or she can accept it, modify and accept it, or reject it. We've also got a little bit of a dashboard so that things don't get too old. It's a pretty standard stoplight: red, yellow, green. When records have been sitting in the queue for a certain time they turn yellow, and then they turn red. Everybody can see each other's records and how long they've been sitting there, so we hold ourselves accountable as a peer group. The original advisory board for Databib is displayed here, again trying to represent the global nature of the endeavor. The original funding for Databib came from the Institute of Museum and Library Services in the U.S., the IMLS, which gave us a Sparks! Innovation National Leadership Grant. These folks helped us and gave us input, especially early in the process of designing the resource. This is our editorial board, again representing a global perspective and multiple disciplines, and trying to identify data repositories in many countries, in particular underrepresented countries, and in languages other than English. All of the catalog records in Databib are in English, though. So I showed you the website, where a user can interact with Databib, do searches, and that kind of thing.
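The editorial queue described a little earlier (assignment to up to three editors, first claim wins, and a stoplight for ageing records) can be sketched as follows. The thresholds of 7 and 21 days are assumptions, since the talk does not say how long a record waits before turning yellow or red, and the `Submission` class is hypothetical.

```python
from datetime import date

MAX_EDITORS = 3          # "up to three editors", as stated in the talk
YELLOW_AFTER_DAYS = 7    # hypothetical threshold
RED_AFTER_DAYS = 21      # hypothetical threshold

class Submission:
    """A submitted record waiting in the editorial queue."""

    def __init__(self, title, submitted_on):
        self.title = title
        self.submitted_on = submitted_on
        self.assigned = []
        self.claimed_by = None

    def assign(self, editors):
        """The administrator assigns the record to at most three editors."""
        if len(editors) > MAX_EDITORS:
            raise ValueError("assign at most three editors")
        self.assigned = list(editors)

    def claim(self, editor):
        """The first assigned editor to claim becomes the editor of record."""
        if editor not in self.assigned:
            raise ValueError(f"{editor} was not assigned to this record")
        if self.claimed_by is None:
            self.claimed_by = editor
        return self.claimed_by

    def stoplight(self, today):
        """Colour the record by how long it has been sitting in the queue."""
        age = (today - self.submitted_on).days
        if age >= RED_AFTER_DAYS:
            return "red"
        if age >= YELLOW_AFTER_DAYS:
            return "yellow"
        return "green"
```

Because `claim` never overwrites an existing claim, a second editor who clicks the link later simply learns who already owns the record.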
We also support a ton of different interfaces. Every time a new record is added to Databib, Databib tweets it on Twitter. This really only took about 20 minutes to code, and it turned out to be a great thing, because people who use that repository, or the repository managers, see the tweet and retweet it, which promotes access to the repository. The other thing we found is that they'll visit the repository entry, be dissatisfied with it, and enhance and revise it. We have some repository records that have had 30 or 40 enhancements that we can track back to Twitter retweets. So we're trying to leverage social media. If you do RSS, you can subscribe to Databib as an RSS feed. Every night we automatically export the entirety of the metadata records to a Google spreadsheet, because people like spreadsheets. They're really easy to use.
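As an illustration of how a client might consume an RSS feed of newly added records, here is a minimal sketch using only the Python standard library. The feed content and the `registry.example.org` URLs are made up for the example; the structure of the real Databib feed is not described in the talk.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a feed of new records.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>New repository records (sample)</title>
    <item>
      <title>Example Earth Data Archive</title>
      <link>https://registry.example.org/repository/1</link>
    </item>
    <item>
      <title>Example Ocean Data Bank</title>
      <link>https://registry.example.org/repository/2</link>
    </item>
  </channel>
</rss>"""

def new_records(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in new_records(SAMPLE_FEED):
    print(title, link)
```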
If you prefer something more structured, we also do an RDF/XML dump, so you can download all of the data as RDF/XML. If you're into federated search, you can connect to Databib using an OpenSearch interface. In a web browser, for example, usually in the upper right-hand corner, you can do a Google search or a Yahoo search, or you can specify a search engine. OpenSearch enables you to add Databib there, or to create a client application that queries Databib and uses it as a machine interface. We also expose all of our records as linked data in the form of RDFa, so if you visit a web page that represents a repository and view the source, you'll see the RDF triples embedded in the HTML, again to foster machine access to the information we have. We also use the ShareThis app for people who want to like it, share it, plus-one it, et cetera. The basic idea here is maximum transparency and maximum openness. Everything is under the Creative Commons Zero (CC0) protocol, and when people submit records they click through an agreement to that effect. We provide these APIs, and we provide the source code on Google Code, adhering to the principles of open bibliographic data and trying to eat our own dog food in terms of the open access that we promote to our community. So this brings us to joining forces. We borrowed some good ideas from the French: liberty, equality, and fraternity.
We boiled that down into five principles of agreement, however. If you think about it, this is actually kind of a cool sociology experiment, because you have two different groups in two different countries that more or less had the same idea. We were funded by different agencies, and once that train is on the tracks, it's not coming back to the station. So we both proceeded in a similar direction but down different tracks, reached similar decision points, and made different kinds of decisions. Now, three years later, we are able to take the best features of both and bring them together. How were we able to begin that process? By deciding, instead of focusing on how we're different, to focus on what we want to do together: what are our principles of agreement? One thing we agreed on was openness, so our merged resource will continue to use the CC0 protocol for all of its metadata.
The second principle of agreement was optimal quality assurance. re3data has its own review process and Databib has its review process, so we decided to put them together into a two-stage review process. The initial review will be done by an editor from an editorial board, like you see with Databib. Then, after the record is accepted, there will be a second review for quality and consistency. Again, we're trying to take the best of both worlds. Number three concerns developing new functionalities. We have done an assessment of which qualities of re3data have been very effective and which qualities of Databib have been very effective, so that we can take the best-of-breed features and functionality and put them together. Principle number four is shared leadership: the idea that at the end of the next calendar year we would be equal partners in this merged resource. Last but certainly not least is to seek sustainability. Is there an organization aligned with our purpose and mission that would be willing to take on this merged re3data and Databib entity, so that it would have a life longer than either of the individual projects? We found such an organization, and it is DataCite. It's really a perfect match for us. There's very good alignment of missions, with DataCite promoting the citability and attribution of data use and assigning identifiers for datasets, and Databib and re3data interested in identifying and promoting the data repositories that contain those datasets. So we brought a proposal to the DataCite General Assembly and board of directors in Dublin back in March of this year. It was unanimously approved, and we were able to announce our principles of agreement and our merger at the Research Data Alliance meeting that followed. Right off the bat we decided to exchange metadata records.
We did a dump out of Databib and brought it into re3data, and did a dump out of re3data and brought it into Databib, kind of as a goodwill gesture but also to understand what we were getting into: how to align the two different schemata, how to map them, where information would be truncated, and where there would be problems.
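Mapping one schema onto another and spotting where information would be lost can be sketched as a field map plus a report of unmapped fields. The field names on both sides are hypothetical and do not reflect the actual Databib or re3data schemata.

```python
# Hypothetical mapping: source field name -> target field name.
FIELD_MAP = {
    "title": "repositoryName",
    "url": "repositoryUrl",
    "description": "description",
    "subjects": "subject",
}

def map_record(source, field_map=FIELD_MAP):
    """Map a source record to the target schema.

    Returns the mapped record and a sorted list of source fields that have
    no counterpart in the target, i.e. information that would be truncated.
    """
    mapped = {target: source[src]
              for src, target in field_map.items() if src in source}
    dropped = sorted(key for key in source if key not in field_map)
    return mapped, dropped

mapped, dropped = map_record({
    "title": "Example Earth Data Archive",
    "url": "https://repository.example.org",
    "thumbnail": "thumb.png",   # no counterpart in the hypothetical target
})
print(mapped)
print(dropped)
```

Reporting the dropped fields for a whole dump, rather than one record, gives a quick picture of which parts of one schema the other cannot hold.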
Where we are presently is creating a working group within DataCite that will be co-led by myself and someone from re3data. What we want to do is guide the reconciliation and the merger of these two resources and our metadata.
This also includes some refactoring of software. We'll use the re3data software stack, so we need to look at the Databib functionality and APIs that we want to preserve and move them over and develop them on re3data. It also includes the hosting of the resource after the end of next year.
We also want to formalize our processes, so that we have good documentation of the review process as well as of our governance: how re3data fits into DataCite and is governed under its auspices. Importantly, we also want to identify new opportunities where there's synergy between the activities of identifying data repositories and of minting DOIs and related activities for datasets, and to see if there are new functionalities that we can realize and advance through DataCite. So, what's in a name? One of the harder things you do when you bring two projects together is decide what you are going to call this thing, because it's your baby, right? What we decided was that the shared resource will be called re3data.org, and that we will retain the Databib name for the editorial board and the editorial process. Much like you have a PC that's manufactured by someone else but powered by Intel, you have re3data.org, powered by Databib. And so we're very glad to announce this merger of efforts.
It's not easy. It's made easier by collaborating over beer. But so far it's been a very good process, and we're looking forward to merging the two websites and having a unified service in the near future. Thank you.
Yes, thank you very much for your talk. Any questions? Yes.
Do you contact the repositories to make them check the entries?
When you have a description of a repository, I know there are many qualified people looking at everything, but do you contact the people you are describing to ask them for comments?
Yes. As I said, when we are not sure about things, when there are questions or missing information, we contact the repositories to check the result. And when something is missed, we get a lot of feedback concerning the results. So yes, kind of.
With Databib, we defer to the expertise of the editor.
If the editor feels that he or she needs to contact the repository manager to get more information or to verify information, it's within their purview to do that, but it's not a requirement. This is one of those wrinkles, I think, that we have to iron out with the merger of the two resources.
We also...
Yes, I will tell you in private what I found in the description of my own repository. I'm glad to see it there, but...
Thanks. And to be sure, anybody can suggest an edit to the record and improve or enhance it. To date we haven't had any conflicts.
But it will be interesting to understand what the discrepancies are and how to resolve them.
It's not a discrepancy. It's just a vocabulary word that is not the right one: astrology instead of astronomy, to say what is written.
Yeah, sorry.
No, no, it's fair, it's fair.
And you know, the president of the French Republic made the same mistake 30 years ago in a big meeting about research, so you are not the only one. That is not my point. It's just that these things happen even to the best people.
So a question: how are you funding the editorial piece? Are they volunteers? Explain a little bit about that.
Right. About two years ago we put out a call for editors, and most of the responses we got were from librarians who are engaged in data from different disciplines. I can't remember exactly, but we got 70 or 80 responses to the call, all of them from very good people. What we tried to do was pick a representation of different countries and different disciplines and put together the best team that we could.
They're volunteers?
They're volunteers, yes. Many of the librarians see this as their contribution to the profession: making information accessible, including research data. There's a whole long, rich history and tradition of bibliography.
Thank you. Because Mark Parsons isn't here, I now have to ask the questions. I have two related questions. The first one is: what are you doing when you have two entries that describe the same repository? How are you merging them? Are you deciding who wins, or do you kind of smudge them together and take the best of both? The second, related question is: when will the merged entity be available for people to use, or is it merged now?
We already have the metadata records from Databib within re3data.org, and we are trying to merge each record that is duplicated. That means that we are using the content information from Databib and enriching it.
We don't have a timeline yet for when we'll bring the two resources together into the same website. Part of the mission of this working group is to help figure that out. The metadata exchange that we did was really more an exercise of goodwill.
We did do some deduping based on repository title and URL, and then, at least for Databib, I think we picked the record we liked best, the one that seemed to be the most complete.
So at the moment, if I was trying to get a sense of what's available in a subject area, I would need to search both. Is that correct?
Technically speaking, yes. Strictly speaking, yes. We did do this exchange of metadata records, but the world has moved on.
Yeah, right, exactly.
So that's why the sooner we can bring these two things together and reduce this confusion, the better.
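The deduplication mentioned in this exchange (match on repository title and URL, keep the most complete record) can be sketched as below. The normalisation rules and the use of title and URL together as one key are assumptions; the talk does not say how the matching was actually done.

```python
from urllib.parse import urlsplit

def normalize_title(title):
    """Lower-case the title and collapse runs of whitespace."""
    return " ".join(title.lower().split())

def normalize_url(url):
    """Ignore the scheme, a leading 'www.', and trailing slashes."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")

def completeness(record):
    """Count the fields that carry a value."""
    return sum(1 for value in record.values() if value not in (None, "", []))

def dedupe(records):
    """Keep, for each (title, URL) key, the most complete record."""
    best = {}
    for record in records:
        key = (normalize_title(record["title"]), normalize_url(record["url"]))
        if key not in best or completeness(record) > completeness(best[key]):
            best[key] = record
    return list(best.values())

# Two made-up records that describe the same repository.
records = [
    {"title": "Example Archive", "url": "http://www.example.org/", "description": ""},
    {"title": "example  archive", "url": "http://example.org", "description": "Full text"},
]
print(dedupe(records))
```

Picking a winner is the simplest policy. Merging field by field, the "take the best of both" option raised in the question, would need rules for resolving conflicting values.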
Amber Budden from DataONE. We've had a lot of interest from the community in a data repository matchmaking service, something that is an interactive tool beyond just a database or a catalog. Is that something that you have thought about or plan to develop?
I've heard this idea too, and we've talked about it. I think it comes down to what the hooks are. How do you make the match? What are the metadata elements? Is it matching on funders, on geography, on subject, or what have you? What we've tried to do, at least on the Databib side, is expose everything we have, so that if something is useful for a service like that, someone can put together a quick proof of concept and prototype it. It's not part of the immediate roadmap right now. Our focus is on the merger. But as I was saying, there are all these other opportunities that I think will come out of this collaboration, and that would be on the table.
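A quick proof of concept for the kind of matchmaking discussed here could score repositories on the metadata elements named in the answer: subject, geography, and funder. Everything in this sketch is hypothetical, including the field names, the equal weighting, and the sample data. It is not a planned feature of either registry.

```python
def match_score(needs, repo):
    """Score one repository against a researcher's needs, one point per hit."""
    score = 0
    if needs.get("subject") in repo.get("subjects", []):
        score += 1
    if needs.get("country") and needs["country"] == repo.get("country"):
        score += 1
    if needs.get("funder") in repo.get("funders", []):
        score += 1
    return score

def suggest(needs, repos, limit=3):
    """Return titles of the best-matching repositories, best first."""
    scored = [(match_score(needs, repo), repo["title"]) for repo in repos]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]

# Made-up sample data.
repos = [
    {"title": "Example Ocean Data Bank", "subjects": ["Oceanography"],
     "country": "DE", "funders": ["Example Funder"]},
    {"title": "Example Social Survey Archive", "subjects": ["Sociology"],
     "country": "US", "funders": []},
    {"title": "Example Sea Ice Archive", "subjects": ["Oceanography"],
     "country": "US", "funders": []},
]
needs = {"subject": "Oceanography", "country": "DE", "funder": "Example Funder"}
print(suggest(needs, repos))
```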
Are you intending to merge also with the BioSharing initiative?
Yes, we already did. We have a cooperation with BioSharing.
So are you also merging your data with them?
Well, we are indexing the repositories that are proposed by BioSharing, so yes, it's kind of a merger of data.