Findability of Research Data and Software Through PIDs and FAIR Repositories
Formal Metadata
Title |
| |
Title of Series | ||
Part Number | 2 | |
Number of Parts | 9 | |
Author | 0000-0002-6454-335X (ORCID) E-5011-2016 (RESEARCHERID) n0000-0002-6454-335X (VIVO) 57188695916 (SCOPUS) 0000-0001-5135-5758 (ORCID) | |
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/37823 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | |
Genre |
Transcript: English (auto-generated)
00:00
First part: findability. We already considered briefly what findable means in the sense of FAIR. For F1, the first principle, the metadata and the data are assigned a globally unique and eternally persistent identifier, and "eternally" is very interesting.
00:26
A bit of background here: "eternally persistent" may be a bit redundant, because one should include the other. So we have this whole PID environment here, which is the very first thing
00:49
the authors thought about when it came to the findability aspect, which is very nice. But a persistent identifier in itself only includes generic metadata to make a dataset citable, so that is actually not enough.
01:13
And so, we think, they added F2 here: the data are described with what they call rich metadata.
01:22
Rich metadata may of course include the provenance metadata: the description of how an experiment was designed, what software was used, which version of the software was used, a short abstract, maybe some keywords.
01:40
So if you think about a standard publication and what the journals want you to submit when you submit a manuscript, you can apply many of those principles when you try to publish a research dataset. The wording is not so different, so you may be familiar with it. Many data
02:06
repositories also ask you for keywords or something like that, but it's all optional to submit. Often you just need to provide the metadata behind the PID that is necessary to get a
02:25
PID, like your name as an author, a title, and then, automatically, the year it was published. You get your DOI and that is it, but it does not cover the rich metadata.
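The difference between the minimal metadata needed to obtain a PID and the rich metadata asked for by F2 can be illustrated with a small check. This is only a sketch: the field names and the two field sets are assumptions chosen from the examples mentioned in the talk (author, title, year; abstract, keywords, software, version, experiment description), not the schema of any particular repository or identifier system.

```python
# Hypothetical field names; a real repository defines its own schema.
MINIMAL_FIELDS = {"creator", "title", "publication_year"}
RICH_FIELDS = {
    "abstract",
    "keywords",
    "software",
    "software_version",
    "experiment_description",
}


def metadata_coverage(record):
    """Return (missing minimal fields, missing rich fields) for a record."""
    # A field counts as present only if it has a non-empty value.
    present = {key for key, value in record.items() if value}
    return sorted(MINIMAL_FIELDS - present), sorted(RICH_FIELDS - present)


# Invented example record: enough for a PID, but not rich.
record = {
    "creator": "Doe, Jane",
    "title": "Example dataset",
    "publication_year": 2017,
    "keywords": ["proteomics"],
}

missing_minimal, missing_rich = metadata_coverage(record)
print("missing for a PID:", missing_minimal)
print("missing for rich metadata:", missing_rich)
```

A record like this one would be enough to be citable, while the second list shows what a community might still expect before calling the metadata rich.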
02:44
So here it's also a mandate for the repository administrators to make sure that the scientists at least have the opportunity to provide this kind of rich metadata, whatever it might mean for your discipline, because that's not specified here.
03:03
It's up to the communities to define for themselves what rich metadata means. The PID principle is clear, but even here you have about 20 persistent identifier systems to choose from, and you can even develop your own.
03:21
So here the community has to decide which standards, protocols, and guidelines should be in place for their needs. So the findable part of FAIR is very open to interpretation.
03:47
The same goes, a bit more on the technical side, for F3: data and metadata are registered or indexed in a searchable resource. That is often covered when you get a persistent identifier, but it's not always the case.
04:04
There are about 20 identifier systems out there, and only some of them are linked to a searchable resource, because you can also set up, for example, your local Handle server. It's about 50 dollars or euros a year as an institute; you get your handles, which are then distributed within the institute, and that's basically it.
04:31
And in theory you can shut down the server any time you like. Then the PID, the local handle, is not searchable anymore.
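To make the dependence on a resolver concrete, here is a minimal sketch of how a DOI or a handle is turned into a resolvable link. It uses the two public resolvers, https://doi.org/ and https://hdl.handle.net/, and the DOI of this recording from the metadata above. The function names are illustrative, and no network request is made; whether a link actually resolves depends on the server behind the prefix still running.

```python
from urllib.parse import quote

# Public resolver base URLs for the two identifier systems used here.
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
}


def split_pid(pid):
    """Split an identifier of the form 'prefix/suffix' into its two parts."""
    prefix, _, suffix = pid.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {pid!r}")
    return prefix, suffix


def resolver_url(pid, system):
    """Build the resolver link for a PID without contacting any server."""
    split_pid(pid)  # validates the basic shape
    return RESOLVERS[system] + quote(pid, safe="/")


print(split_pid("10.5446/37823"))
print(resolver_url("10.5446/37823", "doi"))
```

The link is only a pointer: the prefix names whoever registered the identifier, and if that party switches off its server, the same string no longer leads anywhere.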
04:42
So it's not automatically connected to an international or national searchable resource, like a discovery index, Web of Science, Google, or anything you like. That means the choice of the
05:04
persistent identifier has to be made with some care and consideration. The next one, F4: metadata specify the data identifier. That's also a principle that leaves some room for interpretation.
05:26
Because the metadata standards behind the identifier systems can vary, and they can be adapted, changed, and modified all the time, and there's not always a protocol behind it, or a protocol which is itself FAIR in the context of the community.
05:48
So if one chemist, let's say in the US, decides that for this experiment I have to have a certain kind of metadata set, and some of
06:00
it is in standard one, and some of it is in standard two, and I combine them and make my new standard three, or something like that, then the colleague in Europe or in China or wherever doesn't have to agree with that, and it's not part of a community decision. So for the same lab experiment, there may be five, six, seven
06:22
so-called standards, which are not reviewed, not ranked, and not discussed within the community. So it can be a tricky issue to address here.
06:42
So what is your role, or your repository's role, maybe as a teacher, as an instructor, as a person who knows about research data management and passes this knowledge on to colleagues?
07:03
You know that it's good practice to assign a globally unique persistent identifier upon publication, or as a draft at upload, because you can also get a PID which is not yet searchable before the publication state.
07:24
Some repositories offer this feature, especially for the DOI. Repositories also provide a metadata schema in human- and machine-readable formats, so you can look up
07:41
your XML, and XML is a good example of being both machine-readable and human-readable. But as a researcher you of course have to be open enough to learn new formats, not be afraid of them, and not be afraid to ask specifically.
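As an illustration of XML being readable by both people and machines, the following sketch parses a small metadata record with Python's standard library. The element names, the DOI `10.1234/example`, and all values are invented placeholders, not the schema of a real repository or identifier system.

```python
import xml.etree.ElementTree as ET

# Invented record; element names and values are placeholders.
RECORD_XML = """\
<record>
  <identifier type="DOI">10.1234/example</identifier>
  <creator>Doe, Jane</creator>
  <title>Example dataset</title>
  <publicationYear>2017</publicationYear>
  <subject>proteomics</subject>
  <subject>mass spectrometry</subject>
</record>
"""


def parse_record(xml_text):
    """Turn the XML record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    identifier = root.find("identifier")
    return {
        "identifier": identifier.text,
        "identifier_type": identifier.get("type"),
        "creator": root.findtext("creator"),
        "title": root.findtext("title"),
        "year": int(root.findtext("publicationYear")),
        "subjects": [subject.text for subject in root.findall("subject")],
    }


parsed = parse_record(RECORD_XML)
print(parsed["identifier_type"], parsed["identifier"])
print(parsed["subjects"])
```

The same text a person can read in an editor is what the program consumes, which is the point made about XML here.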
08:03
Starting off with the metadata: most of the time you have the PID, the author names, the subject areas, and so on. This metadata input and metadata generation should be supported by these repositories.
08:23
It can be an XML schema, which you can maybe upload really easily. You can use an API, or ask your data curators or your institute to provide a link or an open API that you can use to submit the data. Especially when you have lots of datasets, that would actually be the way to go.
08:51
So not manually typing in all the metadata information, because that's really time-consuming, and if you have more than one or two datasets, it may take ages.
09:02
So here, using the APIs that are provided by most of the good digital data repositories may be one way to go. Then the repositories themselves of course support effective searching.
09:21
What that means I will show shortly with an example of a DOI. If they have this PID system in place, and they use a common one, not a local Handle server, for example, then they are often connected to many different discovery indices, and data publications can actually be found.
09:47
Yes, and the last point I guess we covered already. So in general both should be supported: a manual data and metadata upload as well as the more automatic one.
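The automatic route can be sketched as follows: instead of typing metadata into a form for each dataset, a spreadsheet export is converted into one JSON payload per dataset, ready to be sent to whatever submission API a repository offers. The column names, the payload keys, and the sample rows are assumptions for illustration; the actual upload call is left out because it depends entirely on the repository.

```python
import csv
import io
import json

# Invented spreadsheet export; column names are assumptions.
CSV_TEXT = """\
title,creator,year,keywords
Sample A spectra,"Doe, Jane",2017,proteomics;mass spectrometry
Sample B spectra,"Doe, Jane",2018,proteomics
"""


def rows_to_payloads(csv_text):
    """Convert each CSV row into a metadata dictionary (hypothetical keys)."""
    payloads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payloads.append({
            "title": row["title"],
            "creators": [row["creator"]],
            "publication_year": int(row["year"]),
            # Keywords are separated by semicolons in the spreadsheet.
            "keywords": [k.strip() for k in row["keywords"].split(";") if k.strip()],
        })
    return payloads


payloads = rows_to_payloads(CSV_TEXT)
for payload in payloads:
    # In practice this JSON would be sent to the repository's API.
    print(json.dumps(payload))
```

With two datasets the saving is small, but the same loop handles hundreds of rows without any manual typing.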
10:07
As a scientist, there are also some roles here. One recommendation could be to check whether the datasets you use have a PID, and to cite it.
10:23
Please use the PID to cite the dataset, and not the URL or anything else. That would actually be the way to go. Interestingly, with many journals, when you submit a manuscript and submit your reference
10:44
list, and you do not use Zotero or another reference manager, because some scientists still tend to create their citation list manually, then often only the URL is given and not the DOI, even if the paper or the dataset being cited already has a DOI.
11:06
And that happens quite often. There was a study two years back by Herbert Van de Sompel, and he discovered that in between 60 and 70 percent of all the publication records across about 5,000 journals,
11:22
they used the URL instead of the DOI, and there were many broken links there, even though the publications of course had a DOI. Journals use the DOI system and have been using it for, I would say, at least 10 years; they use Crossref as a DOI agency,
11:44
and they are so familiar with the system. We think that the editors, too, should be aware that the reference list should not include URLs but the DOI. But that's still not the case, so there are many examples of broken URLs out there.
12:07
And so, for the future, please, it would be nice if you could include the DOI. Or another persistent identifier, of course.
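A reference list can be checked for this quite mechanically. The sketch below looks for a DOI in a reference string and rewrites it as a `https://doi.org/` link; references that only carry a plain URL are reported as having no DOI. The regular expression is a common heuristic for DOI-like strings rather than an official definition, and both reference strings are invented.

```python
import re

# Heuristic pattern for DOI-like strings: "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def find_doi(reference):
    """Return the first DOI-like string in a reference, or None."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Drop punctuation that usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def doi_link(reference):
    """Return the reference's DOI as a resolver link, or None if it has none."""
    doi = find_doi(reference)
    return None if doi is None else "https://doi.org/" + doi


with_doi = "Doe J. Example dataset. 2017. http://dx.doi.org/10.1234/abc.def."
url_only = "Doe J. Example dataset. 2017. http://example.org/data/42"

print(doi_link(with_doi))
print(doi_link(url_only))
```

Entries for which the function returns `None` are the ones worth looking up by hand, since the cited work may well have a DOI that was simply not used.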
12:20
Then if you use a repository, you should make sure that the repository offers a PID system. You should give preference to repositories that automate this, so that there is no way to publish a dataset without a DOI.
12:41
Because if you want a publication, if you really want this public appearance of your data, you should make sure that it actually uses a PID system. That is not always the case: as we'll see later, some repositories, especially local ones, do not include a PID system yet.
13:05
One argument is sometimes cost, because the repository administrators in some countries have to pay to get a PID system account and to implement it into their repository structure.
13:24
So it sometimes happens that they do not include this when they set up the service. Please report this, and especially the requirement that you need such a PID, to your repository admin, the data curator, or your institution, whoever is responsible for your connection to a repository.
13:52
Another thing that comes with FAIR, of course, is adding rich metadata. We know that, depending on the discipline, this can be very complicated and time-consuming.
14:05
Again, at the moment, if there's no standard recommended by the repository, please go with good scientific practice and provide as much as you can. And keep in mind: what would a fellow researcher, PhD student, whoever, do within the next 5, 10, 20 years if he or she had to work with my data?
14:31
I guess with that in mind you can decide which metadata should be covered. But of course, there's a lot of work going on in different disciplines at
14:42
the moment to have some more support in place as soon as possible. But again, some disciplines are not there yet, and you can also consider taking part yourself, or encouraging colleagues
15:02
to take part, in groups like the Research Data Alliance, with its community initiatives to put forward rich metadata standards. Vocabularies and ontologies, of course, are also included here. And maybe have some time to participate there or at least
15:26
look them up and see what others have been doing around the globe, because it can really help address this issue. Yes, and the last question, maybe even more complicated: what would a researcher from a different discipline
15:42
do if he or she had to reuse the data based on the metadata information I provided? So that's also a question to maybe step back, take a different angle on things, and maybe ask a friend or colleague from a different discipline to look it over.
16:04
And of course, only if the data publication calls for it. You do not have to do that with all the data, but at least for the data you're going public with and that is going to be reused, we would recommend that you try to
16:28
keep those aspects in mind when you publish data. Some examples; I think it's nicer when you have some good examples here. So this one is
16:46
from the proteomics side, so we have here classical research paper in Giga Science, this journal back here, classical article format.
17:03
You recognize it immediately. And there's also a Giga database behind it. This is referenced by a note in the paper on the availability of the supporting data. And here, again, we have a local identifier in place, which is mentioned in the paper. And in
17:26
addition to this local identifier used by this repository, they also use a DOI for each published data record. And this is in the reference list as a classical citation here. So it's reference 14. Again, with the authors, the title of
17:43
the data set, the publisher here, the Giga Science database in 2017, and again, the DOI in a URL format, so it's machine actionable. So not only for a human, but also for a machine. For example, this connection can be
18:05
made clear in the DOI metadata, ideally, and you have here this nice data citation in place. And many journals at the moment, plus, for example, as a publisher, they also go this way. So you
18:25
can already save your DOI in the beginning, so before you even have the DOI for your publication in place. And they are automatically linked, so not only human readable, but also machine readable,
18:42
linked before the actual publication of both, so paper and data publication, is done. So from a technical point of view, it's not a problem. It's all possible. It's also possible using identifiers other than DOIs. That's not a problem, but it's just also a formal issue for journals, for editors
19:05
to actually support this kind of data publication format. A bit more in the PID background, so which ones are there, how they should
19:22
be used, and how they can be used for research data. PIDs basically are everywhere. We mentioned now the DOI frequently, but there are many more. So maybe I can do a short round of hands up. How many of you have an ORCID account?
19:47
Okay, about half, 50 percent. Okay, thank you. And is there any other researcher ID? One, okay, two. No, it can also be for persons. Yeah, it's both. That's why it's both.
20:11
So we have others, like the Scopus Author ID or the ResearcherID here by Thomson Reuters, and ISNI actually can be both.
20:22
We have organizational IDs and funder IDs, like GRID, like FundRef. They index organizations. Ringgold identity, for example. So this is an example for the funder side and organization side. And then, of course, we also have resource IDs.
20:47
So they can include ePIC handles, for example, which is not your local handle server, but which is actually a handle server that is managed by the GWDG in Göttingen.
21:02
So here they offer, for example, also ePIC persistent identifiers for EUDAT. So it can be compared to the DOI system. Of course, we have handle.net, we have the URN service, and we have many
21:21
more, like especially for some, let's say, disciplines like PEESH, persistent identifiers for cultural heritage entities. And, of course, there are many more. I was told I should repeat the question. So the question was how safe is the usage, more or less, of ORCID, because it's still a company.
21:49
What if they sell out to another organization, or even a private one? So ORCID at the moment is a non-commercial organization based in the U.S.
22:04
And I know the ORCID people quite well and their mandate actually is, so no guarantees here, but their mandate is not to be sold. And their mandate is actually that researchers can keep full control over what is on their ORCID iD, what is displayed publicly and what is not.
22:28
So there are many ORCID profiles out there. I would not recommend it to be that way, but there are many ORCID profiles out there where the ORCID iD just refers to a name or even an acronym and displays nothing more.
22:44
So you can still use this ORCID iD and still your research profile can be kept up to date, but just for yourself, for no one else or only for the working group or the institute you are in and not beyond that. So as a researcher, you may argue you want your papers to be read, you want to be
23:04
visible, you have this competition in place, so you want to be, to a certain degree, a public person. And that's also why we think, again, commercial players like ResearchGate, for example, are so successful because they actually sell their data to others.
23:25
And with ORCID, on the other hand, you have this non-commercial background, they cooperate closely with DOI agencies like Crossref and DataCite. So it's a very close cooperation here. They are all players who say that the data behind the PIDs, actually
23:55
that is public domain, but with a personal profile like ORCID, the researcher can decide what publications are going to be displayed, actually.
24:07
But as soon as you go public with a paper, with a data publication, of course the public means public, so it means the metadata is open. So I would argue that way. I would argue they do not have to put their personal details in, there
24:22
is no field for displaying an age or the country you were born in or stuff like that. You can add your email address, but you do not have to; you can have an ORCID profile that is kept private. And according to German law, let's say it like that, of course the data you submit also to ORCID will stay.
24:49
And if they change their regulations, or going to change their regulations, you are still free to say, okay, I delete my profile and I do not want my data to be reused in any way.
25:01
So Microsoft, they started AI methods and stuff like that, it is much more automated already and there the researcher can be printed more or less, it is fully readable. And with ORCID the advantage is actually that a researcher can still maintain
25:31
control over their data that is placed behind an ORCID profile and displayed behind an ORCID profile and have more control on this side.
25:48
Okay, so thanks for the discussion so far, very interesting. But that's not all. There are even more PIDs, who would have thought. Okay, so we are talking and there have been some startups and some new, I
26:04
would say, GitHub groups and some excitement going on about PIDs for projects, of course. Projects run three years, five years, ten years. They have a URL, they have some information about the project, of course,
26:22
and they say, hey, why not provide a persistent identifier also for a project, index them so that they can be found again. And maybe some of you had the experience when you try to look up old resources, old projects, and the sites are not maintained anymore, you get your 404 because the sites are lost, the servers have been switched, whatever.
26:49
So with a PID for a project, you would still have the metadata information, in theory, still in place: what the project was about, who was responsible, who set it up.
27:02
And so this information wouldn't be lost, again, along with the website, what is actually often happening today. So this is one aspect. Then there's a PID discussion going on for instruments. There is a very active RDA group, an RDA interest group on this aspect.
27:23
So instruments, especially climate science, for example, you go on a ship on a research vessel and you set up a complicated system of many different instruments, you connect them together and
27:40
you use them to collect your data, which is based on public funding, public money, of course. And so why not get instrument IDs that are not specific for certain companies, but that are specific for certain research disciplines.
28:01
So you can, in your group, you can describe this instrument with a certain set of metadata and you can easily reuse those metadata and provenance information when you, for example, try to submit your data set for publication. So that it's connected together. So here, that's another development that's going on at the moment. What is in place already
28:26
is a PID for ship cruises. So again, for research vessels, there are many countries, of course, doing expeditions to faraway places. And they have been for a long time now recording, of course, all this
28:45
information about the routes they're taking, about the stations, the sample stations and so on they are visiting, how long they stay at a station and so on, all the geographical records associated with it.
29:00
And they are now trying to harmonize, actually, this kind of information and how it's going to be used in research data and, of course, also article publications. So that, actually, when you read a paper, in the materials and methods section you no longer just say, okay, on the cruise of Polarstern in 2010, we took samples in the Arctic Ocean.
29:26
Okay, the Arctic Ocean is big. Where exactly did you take those samples? What kind of ice shelves were there? When were you there, for how many days, how deep did you sample or something like that?
29:45
So all that provenance information, all those metadata, they should also be included behind these PIDs, actually, for ship cruises. So it's, yeah, it's another development going on and there are some communities that are very active behind this.
30:05
Another one is for physical samples. For example, for a field station, if you're a geologist and go out to sample, like, minerals or something like that, you can set up a field station and you can describe this field station using an International Geo Sample Number, an IGSN.
30:29
So there's also a PID for that and that whole community also is very active to maintain and to take care of those PID systems behind us.
30:43
And of course, we already talked about it, there's also a new development to provide PIDs for data management plans. We were asked at a conference last year, do researchers really need to care about PIDs? So it was a conference with many repository admins coming together and they asked,
31:06
is this really important for the researchers themselves to know about persistent identifier systems? And we think it's safe to say yes. But what do they need to know? So what does a researcher actually need to care here?
31:21
Because he or she wants to focus on their research and not so much on the infrastructure behind it. So PIDs actually are infrastructure. That's very important. It's a service, but it's an infrastructure aspect. It's like the power from your power plug, it should be there all the time. You shouldn't actually care too much
31:46
about it, but you need to have a certain awareness, I guess, how to use it, what's good use and what's not. We did a survey last year across 1400 scientists in the natural sciences and in engineering
32:04
across Germany, and about 70% said that they are using DOIs for their journal publications. And I thought, of course, it's a natural thing to do because our journal or the editors, they are providing us with the DOIs and we are familiar with the DOI system coming from that side.
32:25
And about 10% were also familiar with the use of digital object identifiers for research data. And even less were familiar with other persistent identifiers for research data.
32:43
So it was the first trend that maybe DOIs and a few others maybe are going to be the ones most frequently used for research data. Then we asked why they are not so familiar, so why didn't you use PIDs or DOIs here especially for other objects, for other publications.
33:11
And over half of them answered that they didn't know about the option to use DOIs for other publications. And they said, oh no, and we do not want any counseling services. We are
33:23
trusting our libraries, we are trusting our journals, our infrastructure providers, that they know what's best. And when they say we do not need a DOI or not a PID for a paper, then it's fine that way. So we do not care, it's not that important for us. As long as we have our DOI for our article, let's say it like that, that's the most important thing.
33:46
And that reflects some of the situation we still have of course. We are still struggling also as infrastructure providers to make it as easy and as natural as possible within the research data workflow,
34:00
within the research workflow at a home laboratory, at a home working group actually to use those PID services and not even know that you are using them. And now I would also invite you to have a look at re3data.org
34:21
if you're not familiar with it. It's a registry of data repositories around the globe. And if you look them up and just do a quick search, you will see that there are over 2000 repositories listed there. So that's quite a lot.
34:43
If you are a researcher in a certain discipline and you have to pick, let's say, for your discipline out of maybe 100 data repositories available, it's already a tough, a very tough choice. And even 50 is still a tough choice. So when you look them up, you see them.
35:04
But if you search specifically for the ones that include PID services, there will only be about 800 of the 2000 that include PID services. And if you narrow down that search even further, for example to a DOI or an ARK, as a
35:26
physicist, maybe you want to use an ARK as a persistent identifier because it's more common in your community. Then you are down to only a couple of choices maybe. So depending on the discipline and the PID services offered, you can narrow it down quite quickly.
35:46
And then, if you have maybe a closer look into the data options they offer, so if they offer an embargo, for example, if you need that, or if they offer certain licenses for your data publication, then you come quickly down to the repositories which may be most suitable for your purpose.
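The narrowing-down described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Repository` class, its field names and the four example entries are invented for this sketch and do not reflect the actual registry schema or real repositories.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical, simplified registry entry (not the real registry schema)."""
    name: str
    subjects: set
    pid_systems: set = field(default_factory=set)
    offers_embargo: bool = False
    licenses: set = field(default_factory=set)


def shortlist(repos, subject, pid_system, need_embargo=False, license_name=None):
    """Narrow a repository list step by step: discipline, PID system, data options."""
    result = [r for r in repos if subject in r.subjects]
    result = [r for r in result if pid_system in r.pid_systems]
    if need_embargo:
        result = [r for r in result if r.offers_embargo]
    if license_name is not None:
        result = [r for r in result if license_name in r.licenses]
    return result


# Invented example entries, for illustration only.
REPOS = [
    Repository("Repo A", {"physics"}, {"DOI"}, True, {"CC BY"}),
    Repository("Repo B", {"physics"}, {"Handle"}, False, {"CC0"}),
    Repository("Repo C", {"physics"}, set(), False, {"CC BY"}),
    Repository("Repo D", {"geology"}, {"DOI"}, True, {"CC BY"}),
]

names = [r.name for r in shortlist(REPOS, "physics", "DOI", need_embargo=True)]
print(names)
```

Each filter removes candidates, so the order mirrors the talk: first the discipline, then the PID service, then options such as embargo or license.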
36:13
So that would be one way to include PIDs in your data management, in your choice of how to pick a good data repository.
36:32
So it's just one way, but we always recommend re3data because at the moment it's the most up-to-date information on data repositories available globally that we have.
36:47
Okay, continuing with the PIDs, so why they are an important part of the FAIR data principles. What is behind the PID? Maybe here in a little bit more detail. So a PID, if the
37:04
metadata is maintained well, can provide a lot of provenance, of course, information behind a digital object. So not only behind research data, but behind any digital object they include. And as we heard before, it can also be a
37:25
description of a field station, for example, it can be a description of a ship cruise and many more aspects of the research workflow. So with provenance and metadata, a PID always also touches on legal aspects, for example, data policies and guarantees.
37:49
So metadata information on research data also can include, for example, licenses, it can include aspects of data metrics,
38:08
also it can include here references to other publications which are important for the usage of this particular data set. Or data resource we are describing. So there are many aspects here and we try to narrow it down here
38:29
a bit on these central points, what you should remember when it comes to this information and the PID concept.
38:40
So provenance means that we have validation and credibility, that a researcher actually should comply with good scientific practices and be sure about what should get actually a PID and what shouldn't. Some data repositories and institutions have very focused and strict data policies in place
39:07
to also to use, for example, different kinds of PID systems for different purposes. So one example in climate science is what is sometimes used, is that there is a handle provided for
39:23
the data set or the data package as a whole, used for modeling, used for first analysis of the data, but actually the data publication, so what is mentioned in a paper later on, is only a part of it.
39:40
Only this small part of the whole data package actually goes into the public repository system and gets a DOI here. So they use different kinds of PID systems for different purposes. But again here it strongly depends on the discipline and on the community and also maybe on your
40:05
local data policies you have what the requirements here would be. So this is the aspect about provenance. Metadata of course is central to the visibility and citeability of a data set and they should be provided with consideration.
40:28
The policies behind the PID system, I guess that's not so interesting for many researchers, so just say they ensure the persistence in the World Wide Web. That's basically the key point behind the policies of many, many different PID systems.
40:48
At least the metadata behind a persistent identifier should be available for a long period of time. So this is one of the key messages.
41:03
Across the different PID systems and agencies, what "long" means or what "persistent" means is not harmonized, so they interpret different time spans here. So it's basically also up to the repository administrators to ensure this metadata availability behind a PID system.
41:31
So here again it depends strongly on the policies. For the machine readability, they are of course an essential part of the future discoverability of the data set and it's always good to have
41:48
a check for example what metadata formats are provided behind a PID system and if they can be reused. Behind a DataCite DOI, as a persistent identifier system for example, you can use the API which is
42:06
open to get the metadata in many different formats if you want to index them. So you can get it in JSON-LD, you can get it in XML, you can get it in RDF XML as well. So there are just three possibilities how you can get the metadata information. So it's quite flexible but not all persistent identifier
42:28
systems are that flexible. So if you're really interested in that or maybe setting up your own discovery index or anything like that, just keep in mind that there are different styles let's say out there and check what metadata formats you would like to use.
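As a rough sketch of how such a metadata request could look: the code below builds an HTTP request that asks the DOI resolver for one of the three formats via the `Accept` header. The media types are given as I understand DataCite's content negotiation service, so please check them against the current documentation; the function names are made up for this sketch, and the DOI is the one from this recording's own metadata. Only `fetch_metadata` needs network access.

```python
import urllib.request

# Assumed media types for the three formats mentioned in the talk.
FORMATS = {
    "json-ld": "application/vnd.schemaorg.ld+json",
    "datacite-xml": "application/vnd.datacite.datacite+xml",
    "rdf-xml": "application/rdf+xml",
}


def build_metadata_request(doi, fmt):
    """Build (but do not send) a request asking the DOI resolver for metadata."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    url = "https://doi.org/" + doi
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch_metadata(doi, fmt, timeout=10):
    """Send the request and return the response body as text (needs network)."""
    request = build_metadata_request(doi, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_metadata_request("10.5446/37823", "json-ld")
print(req.full_url, req.get_header("Accept"))
```

The same DOI is used for every format; only the `Accept` header changes, which is what makes it easy to feed a discovery index from one identifier.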
42:47
For the metrics, we touched that briefly. So they are supported by many PID systems but they are still at the very beginning and there's not a common system which is actually
43:03
widely accepted by researchers or communities when we go to repository administrators or something like that. So there are many different systems out there for the different kind of metrics you would like to carry out. But there are some projects
43:25
I would like to mention. One here, the Make Data Count project. It's based in the US. It's led by the California Digital Library I guess. And DataCite for example is also participating here and it's one of many projects in this
43:48
whole metrics section. But here we'll have to see what will come here in the following years. Also of course there's for example in some current research information systems, some CRIS systems, like here at TIB we use Vivo. They now
44:11
include metrics in the form of the small colorful donuts offered by Altmetric. So you will see maybe some of these donuts also out there.
44:24
But of course it's not focusing on data publications much but basically still on research papers. But maybe there will be a similar development when it comes to data publications. But we don't know yet.
44:43
PIDs provide interoperable metadata. At least most of them do. And most of the PID providers also talk to each other to get a lot of harmonization in place. And one example of this harmonization is the close connection between an ORCID
45:03
profile of a researcher and the DOI system offered by Crossref and DataCite. So we have a collaboration going on between those three PID providers to actually share
45:20
their metadata and to get an auto-update function which is completely optional for the researchers. Where they can say that the ORCID profile should be updated automatically if a new publication is accepted and is published for example by a journal.
45:44
And I would say 95% of them use a Crossref DOI. So the probability that your next paper is going to have a Crossref DOI is quite high. And if you also have an ORCID profile you can authorize this collaboration basically to auto-update your ORCID profile if you wish so.
46:14
Exactly. And so here for this example the choice would go again for the DOI
46:23
system. But again maybe also other PID providers will join this collaboration in the future. So there are talks going on with the whole ePIC handle community and of course there is also exchange for example from the
46:45
EUDAT project with those three going on. So there are a lot of harmonization efforts and so on going on at the moment. Talking a lot about DOIs, an example for the structure of such a DOI can be seen here. Basically it consists of
47:08
a proxy which is the same all the time. And the prefix that is specific for a certain repository for example. It's also always the same structure. It starts with a 10 and then it can include a
47:28
number or even it can indicate a name or something, an acronym for the data repository itself. This practice to include a name here is often used for visibility aspects I would say. It's not recommended actually because the name
47:49
of a repository can change. Especially when you consider the persistent aspect of a PID and including a DOI. So persistent means who does guarantee that a repository will use the same name let's say in the next 15, 20, 30 years or something like that.
48:08
So as a DOI agency we recommend to use a neutral format here for example, a number combination. But again it's up to the
48:21
repositories to choose. So it's not up to the scientists because this is not up to negotiation. You cannot decide which DOI you would like. In general it's automatically created and it's given to you by the repository or by the journal you are using.
48:44
The last part here, that's the suffix, that's the part that is specific to you and unique of course to the data source, the resource type you're referring to with this DOI.
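The three parts just described (proxy, prefix starting with 10, suffix) can be pulled apart with a small helper. This is a minimal sketch, not an official parser: the function name and the list of accepted proxy spellings are assumptions made for illustration, and the DOI is the one from this recording's metadata.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    proxy: str   # resolver address, the same for every DOI
    prefix: str  # starts with "10.", assigned to a repository or publisher
    suffix: str  # unique per resource, chosen by the repository


DEFAULT_PROXY = "https://doi.org/"
KNOWN_PROXIES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def split_doi(value):
    """Split a DOI, given bare or in URL form, into proxy, prefix and suffix."""
    doi = value.strip()
    proxy = DEFAULT_PROXY
    for known in KNOWN_PROXIES:
        if doi.lower().startswith(known):
            proxy, doi = doi[:len(known)], doi[len(known):]
            break
    # Only the first slash separates prefix and suffix; the suffix may contain more.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {value!r}")
    return DoiParts(proxy, prefix, suffix)


parts = split_doi("https://doi.org/10.5446/37823")
print(parts)
```

A bare DOI without the proxy is accepted as well; in that case the default proxy is filled in.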
49:02
Yes, I should also mention that it has been an internationally recognized and supported standard for a couple of years. And it actually means that the DOI always refers to the objects themselves and not to the location. That means
49:23
that the URLs behind the DOI, they can change and they change all the time, they change quite often. So as a repository admin you have to take care that actually the reference behind the DOI is up to
49:40
date, so that your clients, the researchers, your customers, whoever, do not get a broken link. And there are update checkers, there are link checkers for that. So it's automated as well. So that's usually not the problem.
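To make the idea of a link checker concrete, here is a minimal sketch. It is not how any particular agency implements it: the function name, the example DOIs under the placeholder prefix 10.1234 and the URLs are invented, and the reachability test is passed in as a function so the sketch runs without network access.

```python
def find_broken_links(records, is_reachable):
    """Return the DOIs whose registered URL fails the reachability check.

    records: mapping of DOI -> currently registered landing page URL.
    is_reachable: callable taking a URL and returning True or False.
    """
    return sorted(doi for doi, url in records.items() if not is_reachable(url))


# Invented example data: one landing page has moved, the DOI record has not.
registered = {
    "10.1234/example-a": "https://old.example.org/data/a",
    "10.1234/example-b": "https://repo.example.org/data/b",
}
live_urls = {"https://repo.example.org/data/b"}

broken = find_broken_links(registered, lambda url: url in live_urls)
print(broken)

# The repair only touches the URL behind the DOI; the DOI itself stays the same.
registered["10.1234/example-a"] = "https://repo.example.org/data/a"
live_urls.add("https://repo.example.org/data/a")
fixed = find_broken_links(registered, lambda url: url in live_urls)
print(fixed)
```

In a real setup the reachability function would send an HTTP request per URL; here a set of known live URLs stands in for that.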
50:01
The DOI service: since 2005, as I said, TIB has provided the DOI service, and we were among the first ones who started providing the service, initially in cooperation with the German Climate Computing Center in Hamburg, the DKRZ.
50:23
And they were the first to approach us and ask us, hey, we have a data set here and it goes into the report of the IPCC, the Intergovernmental Panel on Climate Change, and we want to cite that. We want to make a reference to this data set and not only to other publications.
50:44
So how to do that? Back then there were not many data repositories, least of all public ones, let's say, like that in place. So the DOI system actually was adapted. It was used by journals back then already. So we adapted
51:03
it, and DataCite was founded together with other founding members, among them, for example, also the California Digital Library. And we then came together, the British Library was also among them, and they
51:20
came together and decided, okay, we put an organization behind this DOI referencing thing. And over time, there also came the necessity to adjust the metadata behind the DOI system in order to include not only research data, but also other resources that are produced in the research life cycle.
51:49
At TIB, of course, we still have a focus here on the science and technology part as a DOI agency. And in Germany, it's structured like this, that we have five DOI agencies in Germany.
52:06
GESIS is one of them. The National Library for Economics is another one. The National Library for Medicine is, again, another one. And here we split up, actually, according
52:24
to the disciplines and provide this DOI service within TIB for academic customers, let's say it like that. So for academic repositories which are financed at least 50% by public funding, this DOI service is for free. So if you set up a repository
52:44
at a university located here in Germany, and you are funded by public money, at least by 50%, the DOI service is completely for free for you. So you get all the support. We have here a support service behind that. And you can apply for
53:05
a prefix or one or more prefixes, and you get the support you need to use this DOI system. Yes. Some of the services that are out there is, for example, managing the prefix assignment, then, of course, first-level support.
53:28
So if you have troubleshooting, technical issues using the DataCite API or something like that, we provide first-level support there. We do also training and counseling services as a DOI agency, and, of course, provide access to the registration platform, which can be accessed
53:54
also manually, because sometimes some repositories only have one to three data sets a year they want to publish and provide a DOI for.
54:03
And sometimes, of course, you have 50,000 a month. So it really, not 50 maybe, but yeah, maybe 10,000 a month that happened before. So it really depends on the system you want to use.
54:20
Yeah. And what does it look like? So you have the International DOI Foundation on the top. That's the global, let's say, managing board for getting a DOI. Then you have the global agencies like Crossref or DataCite here.
54:40
And you have on the national side for the countries and here in Germany for the different research areas, you have the registration agencies who then provide support for the clients.
55:01
Yeah, briefly on the numbers, maybe not so important from the researcher's point of view, but maybe if you want to give this information back also into your institution or into your research data team and so on. About 1.3 million DOIs have been registered so far by TIB. And here's some
55:28
of the distribution. So most of them are actually research data sets. We have some for grey literature, for example, this includes cruise reports and funding reports for German funding agencies.
55:47
10% for images and 0.4% for audiovisual media. And there are many external clients also who use the DOI service. In total, we have 165 data centers, but the numbers of course change daily.
56:08
So we have one to two new data centers a month coming to TIB, one could say. Most of them are university libraries, some are of course also research institutions, for example from Helmholtz.
56:27
So Helmholtz and Leibniz are only two examples. We have Max Planck, we have Fraunhofer, so many more. And we also have of course data centers from other countries. Here's the policy that we actually recommend
56:41
that they have a local agency in place, so country specific, but of course that's not always the case. So within Europe or within the US, we also have of course cooperation agreements actually to have clients also of course from data centers from other countries until they have their own DOI agency nationally.
57:13
Within TIB, of course, we also use the system. So our own cataloging team, they use it for virtual digitalization, for example.
57:25
And in the TIB portal, we have reports, we have audiovisual media, so we use the system for our own as well. So to sum up, PID 101, what are the essential aspects about persistent identifiers for
57:47
researchers, and also resolving some myths about what is a PID and what is not. The definition, the most common one that is agreed on is that PID is a long lasting reference to a digital resource. It can be really any digital resource, so it's not focused anymore on an article publication or on a data publication.
58:11
It can be a video, it can be a digital description of a field station, it can be a lot
58:22
of things. So, again, a PID does not always reference research data or an article. There are different sorts of PIDs, as we have shown, and different use cases. So it really depends on, of course, the type of PID you're looking
58:45
at, and you should check if the PID is the one you should use for an article or data or persons or organizations, field stations, and so on. And it strongly depends, of course, on the intention of the organizations which offer the DOIs and on the metadata schema behind PIDs.
59:15
They are offered by organizations, so if you want one, just ask your local institute, your
59:24
library, or for example, if you submit a paper, they are offered mostly by the journals. As a researcher, and this is important, you do not have to pay for a DOI. You should never, ever, ever pay to get a DOI by yourself. You shouldn't, because that's the job of your institute, of your library, of your
59:47
publisher, where you are going to publish your paper or of your repository, where you want to submit your data publication. So you do not have...
01:00:00
It's not intended, no PID system is intended to be used individually by one researcher, it's not the case. Even if some, maybe some, there are some other organizations or there's someone telling you otherwise, that is not the original intention
01:00:21
and was never, and is not, supported by the big PID agencies. It's just not the case. So PIDs are mostly used for persistent citation. That's a key aspect, and all published resources should have one;
01:00:40
not all have one as of yet, but they should have a PID. It doesn't have to be a DOI, it does not have to be an ARK or something like that, widely accepted by the community. It would be nice if it was, but it does not have to be. But there should be a system, an organization, a library, an institute
01:01:03
that is supporting this persistent identifier. Next one, very important, a good citation should always include a PID and a PID does not really mean a URL. It can mean a URL if there's no other option
01:01:21
as a personal note, but if there's a PID system available and if your object of course has a PID, you should use it in the citation and you should adjust your citation or your reference tool, for example Zotero or many, many others,
01:01:42
EndNote, whatever you use, that PID is always included in your reference list, very important. And a good journal also should check that. Next one, metadata behind a PID are very, very important,
01:02:00
so please take care when providing them or ask your data curator or your institute or your support team to help you take care of that. PIDs are not perfect, PID systems are run by humans, so we are not perfect, maybe machines are, but we are definitely not.
01:02:20
They are issued by organizations, so please bear that in mind. There can be a 404 behind a DOI, and actually there are. So the systems are not perfect, they try to improve, they try to collaborate, they try to do it as well as possible,
01:02:40
but if you really have issues, of course, or if you have suggestions for your repositories, for the data curators, for the persons involved here in issuing persistent identifier services, please contact them, feel free to contact them.
01:03:01
I guess most of them would agree that they can improve their services and some issues can be solved really quickly. So we have had an example of a data center where about 150,000 data sets from one day to another were no longer reachable, so you got also an error here,
01:03:22
and we looked at it and we said, hey, what's going on? We picked up the phone and the issue was solved in two hours. So they just forgot to update the URL behind those 150,000 data sets. So it can be solved very quickly,
01:03:40
but someone has to say here is an issue and then we can find a solution for it. And last but not least, PIDs are useful and also are fun to use and fun to, if you do some metrics on them or something like that, play around.
01:04:00
It actually can be quite a lot of fun. So yeah, if you have some time, just see for yourself and they help to make your work more visible. So some aspects, yes, now you had the one list. Now maybe the list you do not want to know,
01:04:20
you do not need to know, maybe some of you want to know or you want to have it in the back of your mind. Like something like I showed, like total number of PIDs registered. While numbers are always nice for institutions, as a researcher, maybe you do not need to know them
01:04:40
that much, at least when it comes to PID systems, the names of the agencies, it should just work. So depending on the repository you use, depending on the journal you submit a publication to, it should work that you get your DOI. How the persistence works, so that's still an issue they're struggling with
01:05:00
as a PID provider. So one of many, but you shouldn't care about the infrastructure behind it, you shouldn't have to care about it. How they fight each other, why one say they are better than the other or something like that. Of course it happens, also at conferences and so on.
01:05:24
And how perfect the PID system is because they are certainly not and there's room for improvement. And yeah, one can keep that in mind, but it's not that important. So you should care about, as a researcher, you should care about your passion, your research,
01:05:42
and as long as you are not in information science yourself, then yeah, it should be the job actually of the PID providers to focus on communicating the most practical points. And the citability and visibility aspect
01:06:00
as a benefit for the researcher should be crystal clear and we will have some more examples now coming up in this week to actually show you the goodies of having a PID system in place. So we like to refer to PIDs, in our case at TIB, DOIs, as the glue,
01:06:22
the glue that can keep it all together so the different digital CVs, the current research information systems, the different data and other kind of digital repositories you have out there, the discovery indices,
01:06:43
the researcher profiles, like here in our case, that's a VIVO example. So DOIs and PIDs in general can be one way to make it interoperable, to keep it together and actually to have open APIs
01:07:03
or something in place where you can actually use to build up your own system that is harmonized in one way or another and can be addressed more easily. Yes.
01:07:21
Yes, there's still the issue of, well, what are other formats, let's say, using also persistent identifiers and one that has to be mentioned here, of course, are data journals. So for data publications, again,
01:07:41
we have, there's a possibility now for some disciplines like Nature Scientific Data or Biomedical Data Journal, and there will be many more popping up in the next years because besides repositories, that's also a way, and they, of course, they act as both as a,
01:08:03
they can act as a data repository, but as a journal with a journal function, so with a report actually on the research data or something like that as well. So it's one more, let's say, it's one more source actually to use the DOIs or PIDs,
01:08:23
and here it's going to be pretty nice to see, just as a personal note, which DOI agency, so DataCite, Crossref, and so on, will take over here and will be the PID provider for this new kind of digital publication format.
01:08:47
Yeah, and with that, we come to the more practical side. Exactly, and back to the software as well. So first round of questions, who here uses GitHub?
01:09:00
Okay, almost everybody, two thirds, almost everybody, okay. And who here has used Zenodo? A few, two or three people. Okay, we have prepared a little demo, and it's very live, so we'll see how it goes. The main gist here is that there is an official
01:09:21
integration between GitHub and Zenodo that comes from a project of the Mozilla Science Lab called CodeMeter. We heard just now that the PIDs are usually managed by an organization, and the customers are, on the one hand,
01:09:42
the users who want to find a resource, but also the institutions who are hosting the resources. The DOIs are so-called minted persistent identifiers, but there's also a different type of identifiers, and they are called intrinsic. So they are generated from the object itself.
01:10:01
And on GitHub, as you probably know, the Git hashes, or also in the Git system generally, the Git hashes are such unique identifiers which are not assigned to a resource, but they are generated from the system. So therefore, always uniquely identifying something. And therefore, the question of persistence then is shifted
01:10:22
from, in the DOI system, for example, you have these contractual obligations of the organization who has the resources to update the URLs. We just heard from one example where they did it a bit later than they should have done. So this stuff happens, but there's also a technical level where, for example, you can do the redirects on your server
01:10:43
or you can use these Git hashes, for example. And this integration between GitHub and Zenodo now takes advantage of both, basically. So you have, on the one hand, the Git system with the intrinsic identifiers, and on the other hand, this academic preference for having DOIs.
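To make "intrinsic" a bit more concrete, here is a minimal Python sketch of how Git derives the identifier of a file's content (a blob) from the bytes themselves. The function name is ours, and the sketch assumes the classic SHA-1 object format only.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id that Git computes for a blob with this content."""
    # Git hashes a small header, "blob <size>\0", followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes always yield the same id; no registry has to assign it.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b"hello\n") == git_blob_id(b"hello\n"))   # True
    print(git_blob_id(b"hello\n") == git_blob_id(b"hello!\n"))  # False
```

Because the identifier is a function of the content, nobody has to keep a lookup table up to date, but it also says nothing about where the object can be found. That is the gap the minted DOI fills.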
01:11:03
So you can simply sync the two accounts there, or the two systems, and then get for your GitHub repository a DOI minted in addition to the intrinsic identifiers. So that's why this speech bubble here is there. If you're in Rome, dress like the Romans. So if the DOI system is more accepted
01:11:20
in the academic world, here is one option to give your code a DOI. So we're going to have a little demo in a minute of that. And here, this is, Zenodo is, as we mentioned already, a rather generic research data repository. So it is not discipline-specific, which in turn means that they cannot provide you
01:11:42
with too much predetermined, for example, metadata fields that you should fill out. So here, it is a question of our own responsibility. We have to provide these rich metadata so that the DOI actually becomes useful.
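Since a generic repository leaves the metadata largely up to us, a small self-check can help before publishing. The following is only a sketch: the field names are hypothetical and loosely follow the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type); it is not Zenodo's own validation.

```python
from typing import Dict, List

# Hypothetical field names, loosely following the mandatory DataCite properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: Dict[str, object]) -> List[str]:
    """List the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


if __name__ == "__main__":
    # An invented, incomplete record for illustration only.
    draft = {
        "title": "Example R package",
        "creators": ["A. Researcher"],
        "resource_type": "software",
    }
    print(missing_fields(draft))  # ['identifier', 'publisher', 'publication_year']
```

A check like this only covers the bare minimum; keywords, related identifiers and a readable description are what make the record findable in practice.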
01:12:02
There's one alternative that is being well booted up currently, and has also recently opened its doors to the public. It's called Software Heritage. And what they are aiming for is, for example, ingesting all the source code from GitHub, from, I think, the Debian repositories and several others.
01:12:24
GitLab, for example, is also an up and coming, or already quite popular, code hosting service. And yeah, on Software Heritage, the goal is to have all this public source code available as well as a copy, and also in a rather persistent manner
01:12:42
and citable through that resource as well. Should the original copy, for example, on GitHub, ever be removed for any kind of reason, whether that's technical problems or GitHub being bought by Microsoft, yeah, whatever. Yeah, you've heard the news. I wouldn't be too worried about this,
01:13:01
but for much of the code, there is a goal of getting a second copy, basically, on Software Heritage. And generally, it's an interesting side note. There's a workshop going on, I think, on Wednesday from the JISC and the Software Sustainability Institute. So if you follow up on your own notes in this workshop,
01:13:21
please also have a look at the software preservation tag there at the blog of the Software Sustainability Institute, because probably in the next few days or weeks, there will be more info available there as well. So as I said, live demo now. We will see how that goes.
01:13:43
And I have to explain a little bit there. I'm going to switch the slide view a little bit. So I have to explain the background a little bit here. For a different project, I recently, well, no, it's half a year ago, actually,
01:14:01
I started writing an R package, which had the lucky side effect that because we were preparing this workshop in parallel, we can use this R package for some demonstrations. And for example, I did not mint a DOI for this repository yet, but we're going to do it. We're going to try it. I'm pretty sure it works, at least the process.
01:14:23
So we will probably not see the DOI itself because Zenodo also might need some time to maybe review the application and review the upload, but we can set up all of the necessary steps. And yeah, we are here in the issue tracker from this project, from this repository.
01:14:42
And I assembled myself a little task list. So I tried installing this package already. So I'm pretty sure it technically works. So it is worthy of getting a DOI. I tested it in the Zenodo sandbox. And that's when I found out that Zenodo doesn't assign the DOIs right in the next seconds,
01:15:01
but there appears to be some delay. And then maybe tomorrow, when we talk more about GitHub and Git, we can look it up whether it actually works. So what you do when you have a GitHub repository that you want to have upgraded with the DOI first, of course, look at the official documentation here
01:15:26
from GitHub, making your code citable. They explained a little bit about the DOI. They explained that you should make sure that you know which actual repository you want to have with the DOI. And then you're supposed to log into Zenodo.
01:15:41
So let's try that. Yeah, there's a login, either from ORCID or from GitHub, but you can also set up a specific Zenodo account. I'm going to use my GitHub account now.
01:16:03
There we go. This is my email address here from the library. And now I am switching citable code, GitHub. Let's have a look at step two. So we authorized the application.
01:16:20
Now we need in Zenodo to pick the repository that we want. So we're going to the GitHub menu item here. And this page is a long page, so we will just jump to where it is there.
01:16:43
So this is the repository, and we're just going to switch it on. So get started. We're almost ready, actually. And the next step in our citable code guide
01:17:03
is to check whether the repository has this hook enabled. So that is in settings. It was webhooks, right? Webhooks, yeah, here it is.
01:17:21
Looks good. And now creating a new release. You know this terminology from the software world, I guess, a release or a release version. And in the Git system, it works by tagging. We can talk more about this tomorrow.
01:17:41
We generally will talk more about Git and GitHub tomorrow, so some of this stuff doesn't maybe yet make sense. But yeah, Zenodo is now basically waiting for us to tag a release. And the release we can do on GitHub. There's a form for this.
01:18:00
And because I had prepared my little task list, I want to look it up again, see, oh. That happens when you move a repository to a different organization on GitHub.
01:18:20
Your old bookmarks break. But this will no longer happen when we have a DOI, right? So we have switched the Zenodo sync on. This is good. Now we're going to tag a release. And because I had already prepared some version numbering here, which we will also talk about on one of the next days,
01:18:43
this package is at this version, 0.4.2. So we're just going to use that to create a release. So there was a form that was here, and there's some helpful suggestions that if you want to use the version number here,
01:19:01
just use the V and the version number. The release title, now we're coming to this rich metadata aspect. What should the title be? I'm going for something simple, and we'll just say Zenodo DOI. And we should describe this release.
01:19:22
Again, that's a kind of metadata. And as I showed you just now, because I had already compiled this list of changes, so basically the change log from one version to the next, what has been fixed, what has been changed, I think it's a good practice to include the release notes into this release description.
01:19:42
So it is just in Markdown format. And because this is the first release that I'm creating to get this DOI, I will also include the change log from the last versions. So we can close these intermediate pages.
01:20:01
And yeah, people will then see, ah, this release includes this and this fix, this and this has changed, or there seems to be a page or a line break here in between somewhere. Ah yeah, so some minor reformatting here,
01:20:23
which of course did not happen in the preparation of this demo. There's always something, isn't there? So does it look good? I think it looks okay. And then the last piece of meta information here is a binary check. Is this a production ready release or is this a pre-release?
01:20:41
As you've seen in the version number, which again, we will talk about more in the next few days, it was a zero point release. So I am going to say this is more like a pre-release version still, because it is ready to be used and tested, but it is maybe not completely mature. And also in addition to this version number here,
01:21:04
I'm going to append the beta tag. So these were, yeah, this is the form basically. It's not too many steps, I would argue. And we're going to press publish now. No, we're going to switch to the issue again.
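The reasoning behind the pre-release checkbox can be written down as a small rule. This is a sketch under assumptions: the tag format `v<major>.<minor>.<patch>[-suffix]` and the example tag `v0.4.2-beta` are our guess at what was typed into the form, and the rule "0.x or any suffix means pre-release" is the convention described above, not something GitHub enforces.

```python
import re
from typing import NamedTuple, Optional


class ReleaseTag(NamedTuple):
    major: int
    minor: int
    patch: int
    suffix: Optional[str]


# Assumed tag format: "v" + three numbers + optional "-suffix" such as "-beta".
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


def parse_tag(tag: str) -> ReleaseTag:
    """Split a release tag into its numeric parts and optional suffix."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a v<major>.<minor>.<patch>[-suffix] tag: {tag!r}")
    major, minor, patch, suffix = match.groups()
    return ReleaseTag(int(major), int(minor), int(patch), suffix)


def is_prerelease(tag: str) -> bool:
    """Treat 0.x versions and anything carrying a suffix as a pre-release."""
    parsed = parse_tag(tag)
    return parsed.major == 0 or parsed.suffix is not None


if __name__ == "__main__":
    print(is_prerelease("v0.4.2-beta"))  # True
    print(is_prerelease("v1.0.0"))       # False
```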
01:21:24
Yes, I've tagged the release. I've copied all the release notes. So I think my plan is fulfilled. So there we go. The pre-release is on GitHub, but is it also on Zenodo already?
01:21:42
So we're going to switch to Zenodo. Okay, I have to log in again. Oh, there we are. Oh, interesting. In the test, it took several hours of processing, but okay, our DOI is already here.
01:22:01
This was even better than I expected. So there we have it. This is the publication date. We have just published a software as open access. It is available on GitHub. So if we hop back, we will get to the GitHub overview page of this project exactly at this version here.
01:22:22
So this is a release tag. Where is it? I'm not sure why I can search it now, but it is this specific point in time when I made this tag. So as this project evolves, this link from Zenodo will still point to this old time point.
01:22:42
But of course, users can always switch to the master branch. As you probably know in Git, the master is just the default branch where all the work happens. And in this case, the 0.4.2 beta tag is also the most current tag. So that's why it's displayed here.
01:23:01
And now the only thing that is left to do basically is update the code itself with this DOI. So there's a nice shortcut here, get the DOI badge. We will use this Markdown code here, just copy and pasting it into the README file.
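For reference, the badge snippet is just a Markdown image wrapped in a link to the DOI resolver. The sketch below builds such a line; the badge URL pattern is an assumption based on what Zenodo typically offers, and the DOI is a made-up placeholder, so in practice copy the snippet that Zenodo shows for your own record.

```python
def doi_badge_markdown(doi: str) -> str:
    """Build a Markdown badge that links to the DOI resolver."""
    # Assumed badge image location; Zenodo displays the exact snippet per record.
    badge_url = f"https://zenodo.org/badge/DOI/{doi}.svg"
    # The link target goes through the resolver, not to a fixed landing page URL.
    target_url = f"https://doi.org/{doi}"
    return f"[![DOI]({badge_url})]({target_url})"


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_badge_markdown("10.5281/zenodo.0000000"))
```

The important detail is the link target: it points at `https://doi.org/...`, so the README keeps working even if the landing page moves.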
01:23:21
So we have a README here. We're going to edit it. And there's already a badge for the lifecycle, which we can leave. And I will just add the DOI badge directly below it.
01:23:41
Yeah, the commit message. That's something I had not planned in advance, but I think get Zenodo DOI. Oops, I think that's the correct spelling.
01:24:07
You have to be really complete with your metadata. Also commit messages are metadata. And also I can demonstrate one nice little feature here in GitHub or in Git generally,
01:24:21
if you type fix and then the issue number. I wanted to demonstrate this on the next days, but we're going to do it now. It was, I think it was a 14, right? Yes, it was a 14. Then this will automatically be closed, this issue.
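What happens behind the scenes is that the hosting platform scans the commit message for a closing keyword followed by an issue number. The following sketch imitates that scan; the keyword list matches the ones GitHub documents (close, fix, resolve and their variants), but the regular expression itself is our simplification and not GitHub's implementation.

```python
import re
from typing import List

# Closing keywords followed by "#<number>", matched case-insensitively.
CLOSING_REFERENCE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(message: str) -> List[int]:
    """Return the issue numbers that a commit message would close."""
    return [int(number) for number in CLOSING_REFERENCE.findall(message)]


if __name__ == "__main__":
    print(issues_closed_by("Get Zenodo DOI, fix #14"))  # [14]
    print(issues_closed_by("Add DOI badge"))            # []
```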
01:24:43
So I think there's no need to add more description and to extend this. I mean, we're getting a DOI, we're adding the DOI badge. Let's do it.
01:25:00
So there we are. And if we go back to the main page of the repository, people now see that we have a DOI for this. And from there, they can also get back to this nice landing page. So what we have now done basically is created a persistent identifier for a repository in GitHub.
01:25:21
That means Zenodo has grabbed a copy, a zip archive, of exactly this point in time. So it's not the entire Git history for this project. It's just the current set of files. You can browse through the list here. We have this description, which in this case is simply automatically synced
01:25:40
from the GitHub release page. So it's the same thing. I mean, these issue numbers don't make a lot of sense in this case, but it's probably something we can edit here. I haven't, in the tests, I haven't got to this point. Yeah, and then it automatically recognized
01:26:02
that this is software. It set a publication date on Zenodo here. I can add my ORCID. I think the ORCID will not be synced when I put it in GitHub. So I think I should add it here. So it's, did anybody take the time by the way?
01:26:21
I mean, this was now maybe 10 minute process, including all the descriptions and explanations. The affiliation, yeah, that's, well, I guess it would be more like this. So apparently the affiliation is automatically read from the GitHub organization, which in some cases will be correct.
01:26:41
In some cases, it will maybe be a bit too squashed. Yeah, here's the description. So I think I'm going to edit it. The language is interesting here, of course. I mean, they're meaning the human language, not the programming language. And yeah, here would be the possibility
01:27:00
to add more keywords. Do we have time for this? So we have one more hour. Should I demonstrate all of this form? Because I think it's pretty straightforward. So, but I can show you which keywords I had already set in GitHub as well. They're called topics. You can add topics.
01:27:23
There's even some suggestions here. And because in this case, it is a microbiology biobank, the BacDive database, the client, therefore the package is a BacDive client. So it's written in R, it's about microorganisms,
01:27:44
it's a bacterial database. And so all of these things I already tagged in GitHub. So I'm just going to copy and paste the keywords. Not sure how this works. Maybe it has to be, it probably has to be one after the other.
01:28:16
This is not the most interesting part of the demo, sorry.
01:28:23
Actually, I would expect to have some kind of auto-complete here. So maybe if I write BacDive, I will add it. The access rights, yeah, are open access. And the license, it is another one in this case,
01:28:42
because it's a software license, which we will learn more about in the next few days as well, on Friday specifically. Yeah, there were no grants in this case. So I guess I could remove this related or external. I don't have to re-identify this. Well, here, we exactly have one reason
01:29:04
why the DOI system was created, right? We have of course the domain, we have the top-level domain, we have a group name on the GitHub platform, and we have a project name on the GitHub platform. But here, the question of persistence is basically answered by the people managing it.
01:29:20
So a repository can be renamed, a group can be renamed, GitHub could, I don't know, close. So all of these are reasons why a URL would break. It's not a natural law that URLs break, but some people decide to break a URL, unfortunately. And this is also something we have to fight
01:29:42
to be honest, but for now we have ensured that with a DOI, we at least have a copy and a copy of the metadata, a copy of the actual software code, but the related identifier would be the GitHub intrinsic identifier and the Git release tag in this case.
01:30:00
So the explanation here and the note was that the reason Zenodo recommends filling this out is that you want to have the software in this case or the data set as well, maybe in your cases, linked back, for example, to the paper that you published before. We saw an example where a paper referenced downwards
01:30:20
basically to the data set, but then you don't really know maybe the repository where the data set is listed will never take care of referencing upwards to the paper that was of course published later, for example, after the data set. So here you have the possibility to do a bidirectional link and say, if I had already published a paper about this,
01:30:43
I would put the DOI of the paper here and then probably use the option "cites this upload". So the resource here, the paper, cites my software, so that people who discover the software first can also find the paper or as we saw before,
01:31:02
the other way around is currently the more common way. And what Zenodo now will also do, I didn't mention it before, but every time I create a new release as I develop my software and release it under a new version, it will automatically grab a new copy of it. So I have a chain basically of DOIs,
01:31:23
one general DOI for the whole project, but also for each successive version as well. So then in this case, probably there's an automatic feature for it to add the DOI of the next version to the old records. And then Zenodo should, where was it here?
01:31:42
Select this qualifier for this resource that that DOI is a new version of this upload of the old version. So yeah, that would be one option here to enrich the metadata set considerably. And, oh, there's a lot more here.
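The idea of a bidirectional link can be sketched as a pair of statements, one on each record, using relation types that are inverses of each other. The relation names below come from the DataCite vocabulary; the dictionary layout, the function name and both DOIs are hypothetical and only illustrate the idea, they are not the Zenodo form or API.

```python
from typing import Dict, Tuple

# Relation types from the DataCite vocabulary, each mapped to its inverse.
INVERSE_RELATION = {
    "Cites": "IsCitedBy",
    "IsCitedBy": "Cites",
    "IsNewVersionOf": "IsPreviousVersionOf",
    "IsPreviousVersionOf": "IsNewVersionOf",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}


def link_both_ways(
    source_doi: str, relation: str, target_doi: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return the statement for the source record and its mirror for the target."""
    forward = {"on_record": source_doi, "relation": relation, "identifier": target_doi}
    backward = {
        "on_record": target_doi,
        "relation": INVERSE_RELATION[relation],
        "identifier": source_doi,
    }
    return forward, backward


if __name__ == "__main__":
    # Both DOIs are invented placeholders.
    paper, software = "10.1234/example-paper", "10.5281/zenodo.0000001"
    for statement in link_both_ways(paper, "Cites", software):
        print(statement)
```

If only the forward statement exists, someone who lands on the software record has no way to discover the paper. Recording the inverse on the software record closes that gap.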
01:32:00
We can't go through all of this. Contributors, they are in many cases also read automatically from the Git history. So what you will see in many cases when, for example, a larger R package will be uploaded to Zenodo, there's many authors, but it's the Git contributor.
01:32:22
So even somebody who has just fixed a typo will appear in the author list from Zenodo. And there's, of course, pros and cons for this behavior. But I think I want to definitely add my DOI here, my ORCID as well, at least.
01:32:45
No, sorry, I'm already listed up here, right? I'm not a contributor to my own project, but I'm the author. Okay, sorry, I don't need to do this. But if there would have been people who had substantially contributed to this, but maybe not by committing into the project, I could list them here as well
01:33:01
to enrich the data set in a way that Zenodo could not automatically ingest from the Git repository itself. Yeah. Okay, I'm not going to go through all of these here, but enrich the data later. So are there any questions about this part?
01:33:24
I'm going to do one thing. I will change the description to just the first few sentences here of the README
01:33:44
so that I don't need to paste the raw data here, but I can paste the formatted version already. Yeah, that looks a bit better.
01:34:06
Links are okay. Okay. Good. Oh, identifier is required. Ah, so I didn't remove this empty field.
01:34:22
Sorry, now I'm saving it. Now it should be possible to publish. Exactly. So did this already create a new version? No, it didn't create a new version. So that's the only one we have currently. And as you can see here, there will be a version list if I do this repeatedly, but here there's a separate DOI here
01:34:42
for this project in general. So, and I can also use this, but the specific version has this nice badge here with a six one at the end. And it appears that the general version is six zero. So I was, I hit a round number. So the question was whether it was possible to update
01:35:03
a Zenodo record and have the same DOI. So in the metadata, yes, you can have some updates, but if I publish a new version, a new release on GitHub, then that new release will get a new DOI, but the "Cite all versions" DOI here will remain the same
01:35:20
as long as this record exists. So you get a whole family of DOIs basically that are interconnected. When you go through the normal project life cycle of updating, for example, an R package several times, having new versions, then you get a bunch of DOIs. So the question was how to ensure
01:35:40
that the people get exactly the version of a dataset, for example, when they click on a DOI in a reference list. And it is by using these version DOIs, it's not the general DOI for the project. But in some cases you also want to reference the project itself, then you can use this one. But Zenodo will push the version DOI to you.
01:36:01
This is the more visible and more convenient option. In order to have this time travel like feature in GitHub where you go into the different time points where the thing was published, where it was released in that version that you cited in the paper.
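The "family of DOIs" can be pictured as one general DOI plus an ordered list of version DOIs. The class below is a toy model under assumptions: the names are ours, the DOIs are placeholders, and "the general DOI resolves to the latest version" reflects how the demo describes it rather than a guaranteed rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DoiFamily:
    """One general ("all versions") DOI plus one DOI per released version."""

    concept_doi: str
    # Maps version label -> version DOI; dicts keep insertion (release) order.
    versions: Dict[str, str] = field(default_factory=dict)

    def add_version(self, label: str, doi: str) -> None:
        self.versions[label] = doi

    def resolve(self, doi: str) -> str:
        """Return the version label a DOI leads to."""
        if doi == self.concept_doi:
            if not self.versions:
                raise LookupError("no version has been released yet")
            # The general DOI follows the most recent release.
            return next(reversed(self.versions))
        for label, version_doi in self.versions.items():
            if version_doi == doi:
                return label
        raise KeyError(doi)


if __name__ == "__main__":
    family = DoiFamily("10.5281/zenodo.1000")  # placeholder DOIs
    family.add_version("v0.4.2-beta", "10.5281/zenodo.1001")
    family.add_version("v0.5.0", "10.5281/zenodo.1002")
    print(family.resolve("10.5281/zenodo.1001"))  # v0.4.2-beta
    print(family.resolve("10.5281/zenodo.1000"))  # v0.5.0
```

Citing the version DOI in a paper pins the exact release that was used, while the general DOI is the one to use when referring to the project as a whole.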
01:36:20
Okay, further questions. Okay, so then, wow, I'm very happy that this went so quickly, especially in the technical parts. This was a little stumbling block. We're going to close all of this.
01:36:45
And yeah, we're back in the presentation. So to summarize, the PID part, it is an abstraction layer that directs you to the actual locations of objects. That's what it is.
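The abstraction layer can be shown with a toy resolver: the identifier stays fixed while the location behind it is updated by whoever maintains the record. Everything here is invented for illustration (class name, identifier, URLs); a real resolver such as doi.org is a network service, not an in-memory table.

```python
from typing import Dict


class ToyResolver:
    """Maps persistent identifiers to their current location."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._targets:
            raise ValueError(f"{pid} is already registered")
        self._targets[pid] = url

    def update(self, pid: str, new_url: str) -> None:
        # The identifier never changes; only the location behind it does.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._targets[pid]


if __name__ == "__main__":
    resolver = ToyResolver()
    resolver.register("10.1234/dataset-1", "https://old.example.org/data/1")
    # The data center moves its files and updates the record.
    resolver.update("10.1234/dataset-1", "https://new.example.org/datasets/1")
    print(resolver.resolve("10.1234/dataset-1"))  # https://new.example.org/datasets/1
```

The earlier story of the 150,000 unreachable data sets corresponds to a move where the `update` step was forgotten: the identifiers were fine, the table behind them was stale.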
01:37:02
So, and therefore the underlying location can change for any number of reasons, which maybe we can discuss in the evening. And now we come to another exercise which will probably fill the rest of the day for us of how to choose a FAIR repository.
01:37:21
First, a bold statement or something that's maybe also very obvious because there is no really perfect or right repository. As my colleague mentioned before, it depends on which data quality do you have or does the repository require, it depends on the discipline of your subject,
01:37:40
your institution, your funder, they may have requirements as well, or they may be predetermined where you should put your stuff. Of course, reputation still is also a factor you will probably learn about useful repositories just from word of mouth from your colleagues. Some repositories will also help you
01:38:01
make your research visible, maybe simply by tweeting out a link to a new upload that you have. There are many different ways where they can help you and some do, some don't. One interesting question is, of course, the exit strategy. So do they have, for example,
01:38:21
legal contracts in place with a backup location, for example, where they can transfer the data in case their own funding runs out and also has this backup strategy been tested? Well, that's also something you could ask the repositories. Then as we have talked about the FAIR principles a lot, making the data valuable
01:38:42
could mean that the repository forces you a little bit to, for example, add some more metadata than you might think of at first. And also there's a whole family of certificates which repositories can get for themselves
01:39:00
as a certification process. And some of these certificates will be based on a document review, some will be based on an audit where somebody comes here and checks and some will be based also on very hard technical things, but there's a huge variety of this. So in the end, you have to balance the different needs
01:39:22
of yourself, maybe of your colleagues, of the funders, of the institutions and so on. So it is not a super easy decision, we admit it. The Digital Curation Center has a checklist that can help you, as we saw in the software management plan,
01:39:41
can help you think about which considerations you should take into account. Yeah, but in general, of course, I mean, you're all here because you are interested in this topic, but still we want to evangelize a little bit and we want to promote the idea of data sharing through repositories, of course. I mean, you are basically handing over your data
01:40:02
to data curation experts, so they can help you make sure that the data is safeguarded, is regularly backed up and preserved long-term, something that you as an individual researcher or that even skilled research groups may not have the ability to do for the timeframes we are interested in here.
01:40:22
Yeah, and all of this is done with the hope of enabling other people to use your research, to find it and then of course, to cite it and therefore give the credit back to you, the creators of the datasets, the writers of software, the developers of software. We can give you some general recommendations.
01:40:41
So of course, we said PIDs are extremely important. They are almost the first point to get data findable or to get any kind of resource findable. You should look out for use of standards that are widely recognized. So some of them we've talked a lot about here like DataCite, Dublin Core is another one that comes from the library experts
01:41:01
and also there may be discipline specific metadata formats. So at least make sure it's not something they cooked up themselves. Yeah, the licensing, we will talk more about this in the next few days and certificate is a good guideline but as I mentioned, there's a huge variety of factors
01:41:21
that go into a certification process and some are easier to fulfill and some are more difficult to fulfill. It's, as I said, a complicated process. Besides the Re3Data repository search engine, there's also fairsharing.org which is a bit younger
01:41:41
and also as you can see here in the, I'm not going to read all of this but you can see here that they don't have as many records but they also integrate databases and policies. So you get a bit more complete overview of what is available in that domain, not just metadata standards, repositories and identifier schemas.
01:42:02
However, we will use them for a little exercise because there's just more repositories to search and as you can see, the number here is even lower. So these slides were made just maybe one or two weeks earlier and I looked it up just now, there's even more repositories listed in Re3Data in the last few days.
01:42:21
Yeah, but both of them have in common that they are on the one hand projects that are curated by experts but both of them welcome a user input. So you will probably see it in a few minutes. When you find a repository on Re3Data, for example, there is a button there, I think, send correction suggestion or something like this.
01:42:41
So if you do know that some information here is outdated, please update it for everyone. And yeah, this is what we would ask you to do now. Let's have a little exercise and go to Re3Data. So here, it's three more than on Friday already.
01:43:01
So there's people busy even on the weekends incorporating more repositories. You can of course dive in right away but I want to show you a few other ways first. So we can browse the entire Re3 database by subject, by content type or by country. So for example, we can get this nice map.
01:43:21
So if for example, a national funding agency requires you to use a repository from that country, that would be a good idea to jump into the data this way. If you're interested more about what is popular in my subject area, maybe you could do it like this. So let's see whether there's something about biology.
01:43:42
Oh, there's even microbiology, biological chemistry, food chemistry and there's microbiology. So I'm going to just jump in here. Microbial, I guess I would be a microbial researcher. Here there's eight repositories that are tagged as microbiology. So I then can filter them a bit more.
01:44:03
So I don't need the subject filter anymore but as Angelina mentioned, there's all of these aspects that you may need to take care of, like which PID systems do they use. So let's say I want to use a DOI system. So then I only have four left.
01:44:23
Maybe I need to have the option of embargoing my data. Then I am down to a single repository in the microbiology section. We're down to one repository that fulfills these requirements. In this case, it would be Dryad. Not that I want to make any special advertisement for them
01:44:41
but that's just how the filters in my requirements demo came out to be. As you see, there's some summary here. So they have an open access option. There's licensing advice. There's a DOI as we filtered and we can look up some more information.
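The narrowing down that we just did by clicking can be thought of as successive filters over a list of repository records. The sketch below uses an invented mini catalogue with placeholder names; it is not the registry's data model or API, and the counts it prints have nothing to do with the real numbers on screen.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Repository:
    name: str
    subjects: Tuple[str, ...]
    pid_systems: Tuple[str, ...]
    supports_embargo: bool


def filter_repositories(
    repos: List[Repository],
    subject: Optional[str] = None,
    pid_system: Optional[str] = None,
    embargo: Optional[bool] = None,
) -> List[Repository]:
    """Keep only repositories matching every filter that is set."""
    result = repos
    if subject is not None:
        result = [r for r in result if subject in r.subjects]
    if pid_system is not None:
        result = [r for r in result if pid_system in r.pid_systems]
    if embargo is not None:
        result = [r for r in result if r.supports_embargo == embargo]
    return result


# Invented example catalogue.
CATALOGUE = [
    Repository("Repo A", ("microbiology",), ("DOI",), True),
    Repository("Repo B", ("microbiology",), ("DOI",), False),
    Repository("Repo C", ("microbiology",), ("Handle",), True),
    Repository("Repo D", ("geology",), ("DOI",), True),
]

if __name__ == "__main__":
    step1 = filter_repositories(CATALOGUE, subject="microbiology")
    step2 = filter_repositories(step1, pid_system="DOI")
    step3 = filter_repositories(step2, embargo=True)
    print(len(step1), len(step2), len(step3))  # 3 2 1
```

Each additional requirement can only shrink the list, which is why combining many filters quickly leaves one candidate or none, and why some requirements may have to be relaxed.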
01:45:02
And if any of this, as I mentioned before, was wrong now, and maybe you know the subject area and you know that this record is wrong, there is, I think it's here, it's a suggest option. And yeah, what I wanted to show you last is the curation option here, exactly. So if any of this information was outdated now
01:45:23
and you knew it, then you could submit a change request here. So to notify the curators of Re3Data that something is outdated. And you can even cite this whole record. So if for some reason you want to cite the curated version of this record here,
01:45:43
maybe because you're writing a report of overviews of repositories in your research domain, then you can even do that. Yeah, and with that, I would suggest that you get on typing. Please look up, for example, a repository from your home country
01:46:02
or from your research domain, as I demonstrated just now, and think about which of these filters are relevant for you, and think about why some of the filters are maybe restricting the results list too much to be useful,
01:46:20
and which compromises you may have to make when you choose a repository that is as FAIR as possible.