DOI Workflow Best Practices: Regional Exemplars from DataCite Members (Americas)
Formal Metadata
Number of parts: 8
License: CC Attribution 3.0 Unported: You may use, modify, and reproduce the work or its content, distribute it, and make it publicly accessible in original or modified form for any legal purpose, provided that you credit the author/rights holder in the manner specified by them.
Identifiers: 10.5446/69877 (DOI)
Transcript: English (automatically generated)
00:07
Hi everyone, welcome to this DataCite annual member community meeting 2023, where in this session we are going to explore DOI workflow best practices with a fantastic panel of three speakers, and we are going to explore
00:21
their insights and their experiences in their own organizations. So let's begin with some housekeeping ground rules. We remind you that you can join the discussion on Twitter and Instagram with the hashtag #DataCite2023. We invite you to review the DataCite code of conduct,
00:42
which is going to be in the chat in a moment. Thank you so much. And we will invite you to fill out the survey at the end of this event. It's only two questions, so we would appreciate it if you could share your insights after this session so we can improve
01:01
our next events. And we remind you that the slides and the recordings of this event will be shared afterwards on our YouTube channel and of course in our Zenodo community. So with no further ado, let's introduce the panelists. We're going to start
01:23
first with Sara Studwell, who is a librarian and product manager at the Office of Scientific and Technical Information of the U.S. Department of Energy. So Sara, welcome, and the stage is yours. Thank you so much. Let me share my screen. Okay. All right. Well, hello.
01:51
As Arturo said, I'm Sara Studwell and thank you for the opportunity to talk about DOI workflow best practices using PIDs to connect research components through DOI metadata.
02:03
So a little bit about OSTI. The Office of Scientific and Technical Information is similar to a library-type office for the U.S. Department of Energy, and our mission for the Department of Energy is to advance science and sustain technological creativity by making research and development outputs accessible and useful in the modern science landscape, not only to our
02:25
research community but to the general public as well. As part of this, we collect R&D outputs from researchers and labs through our corporate ingest system, E-Link. We also provide PID services to make these outputs discoverable to a wider audience, facilitating linkages between components
02:42
like people, awards, organizations, and those research outputs. And to date we have assigned over 200,000 DOIs through DataCite. And to make these as findable and reusable as possible, we have an amazing curation team that enhances the submitted metadata. We also provide search tools to make these outputs findable. So PIDs provide a number of benefits. One of these is
03:07
to broaden discovery and access. Here is an example of research component linking through persistent identifiers. And I'll say that we're already doing some of this, but we're not doing all of this yet; these are the goals here. So this example is a data record
03:23
we have in our collection that we've assigned a DOI to. On the left you can see the robust metadata. It includes the ORCID IDs for the creators or authors of the data set, which we are currently incorporating, the ROR IDs for both the organization producing the data and the funders, as well as the award DOI, which are pieces of metadata that we want to include.
03:44
So the right shows how we plan to include related identifiers to connect related research objects. You can see the original data record references an article and a piece of software, and was cited by another data set. So this helps communities to not only find and reuse the data but also to see the larger research life cycle and understand the impact of this research.
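The related-identifier linking described here maps onto the `relatedIdentifiers` field of the DataCite metadata schema. A minimal sketch of such a record as a Python dict follows; all DOI and ORCID values are hypothetical placeholders, not identifiers from the talk:

```python
# Sketch of a DataCite-style metadata record for a dataset, linking it to
# related research objects via relatedIdentifiers. All identifier values
# below are hypothetical placeholders.

dataset_record = {
    "doi": "10.99999/example-dataset",
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0000-0000-0000",
            "nameIdentifierScheme": "ORCID",
        }],
    }],
    "relatedIdentifiers": [
        # The dataset references a journal article and a piece of software...
        {"relatedIdentifier": "10.99999/example-article",
         "relatedIdentifierType": "DOI",
         "relationType": "References"},
        {"relatedIdentifier": "10.99999/example-software",
         "relatedIdentifierType": "DOI",
         "relationType": "References"},
        # ...and was cited by another data set.
        {"relatedIdentifier": "10.99999/other-dataset",
         "relatedIdentifierType": "DOI",
         "relationType": "IsCitedBy"},
    ],
}

# A harvester can walk these links to reconstruct the research life cycle.
cited_by = [r["relatedIdentifier"]
            for r in dataset_record["relatedIdentifiers"]
            if r["relationType"] == "IsCitedBy"]
```

`References` and `IsCitedBy` are values from DataCite's controlled `relationType` vocabulary.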
04:07
Another benefit of these persistent identifiers is helping with analysis and impact assessment. So by using these PIDs communities can identify these linked components, understand how they are connected, and then use those connections to track and understand
04:21
the impact of the research. So things like citation analysis and related identifier metrics facilitate impact evaluation through those linked PIDs within the associated metadata. Here are a few screenshots from pids.osti.gov, our website that provides information about persistent identifiers and their uses and then consolidates all of the information
04:43
about the various PID services we provide. And this shows just some visualization examples from that website. As part of our mission, we do offer these PID services both to the DOE community and to other U.S. government agencies. We assign DOIs to data, to software,
05:04
reports, conference presentations and posters, and awards. Specifically, we assign DOIs to data through our Data ID Service and to software through DOE CODE. So we use E-Link for the majority of research submissions, but DOE CODE is our platform for all things software,
05:21
including the submission of code and its associated metadata. And I'll speak a little bit more about DOE CODE in a bit. So labs, facilities, or individuals submit metadata to OSTI via the E-Link or DOE CODE API or UI. And then this metadata describes the object, which is then passed on to DataCite for DOI assignment. And just as DataCite does,
05:42
we encourage providing as much metadata as possible to increase discoverability, access, and reuse of the data and the software. So this is our current approach. I'll also be talking about our plans for the future, but I'll go through what we are currently doing now. So we already have done some of this work, but there is other work underway to include a lot
06:04
more metadata. So currently, researchers or labs will submit their research outputs or their metadata. For some of these output types, OSTI will assign a DOI if the object does not already have one.
06:20
If the object already has a DOI, OSTI will collect that piece of metadata. So additionally, submitters have the option to provide other persistent identifiers, including ORCID IDs, award DOIs, and other related identifiers. The submitted metadata is then put through our enhancement process. So we have a team that will review and curate that metadata. The record is then updated and stored, and this new,
06:46
enhanced record will overwrite the originally submitted metadata. So we also use Scholix, Crossref, and other sources to enhance the metadata when possible. We had previously added related identifiers from some of these external data sources to the
07:02
submitted data records, but after feedback from our communities, we stopped doing so. So all identifiers found in the metadata have been added either by the submitter or data creator, or when found through manual curation. So this graph illustrates the workflow of DOI assignment for data. So we provide this for the DOE data community, and we call this the Data
07:26
ID Service, assigning DOIs to DOE-funded data objects. Data creators, labs, and data stewards will submit the metadata for the data object to E-Link, including a URL to where that data is hosted. So
07:40
we are not a data repository. These DOI landing pages resolve to the repository where the data is hosted, and we do require that URL to the landing page for DOI assignment. So OSTI will assign a DOI based on the prefix for what we call a data
08:00
client, but that's, you know, a project, a lab, etc. So we then assign a suffix, which is an internal OSTI identifier number. OSTI then sends the metadata to DataCite for DOI registration. So this record would now become available in OSTI.GOV or DOE Data Explorer, with the DOI providing a link back to the landing page where the data is hosted. So furthermore, this example
08:25
shows that when another research object like a journal article cites that data, the DOI will also link back to that landing page. We do the same thing for software through DOE CODE for DOI assignment. As mentioned, DOE CODE is the software services platform and search tool
08:46
for DOE-funded code. And so just like data, the metadata for the software or code is submitted to OSTI, and then OSTI will send this metadata on to DataCite for DOI registration. One difference here is that OSTI serves as the landing page for that software,
09:01
meaning that the DOI is going to resolve to DOE CODE, where you can find the metadata, including versioning information, which you can see in the screenshot on the right. So that covers our current workflows. But like I said, we want to do even more. So to accomplish this, OSTI is rebuilding E-Link to reflect the current needs, but also to
09:23
anticipate future requirements. We gather requirements from across OSTI as well as external communities. One major component of that rebuild is the collection of additional metadata, focusing on related identifiers. These are things like ROR IDs, or Research Organization Registry IDs, in the organization metadata fields, and incorporating, collecting, and populating ORCID IDs
09:47
into that submission workflow. We'll be able to send this on to DataCite, providing a more complete picture of that research lifecycle. And this all aligns with our new public access plan as well. So in the new workflow, when a researcher or lab submits metadata, OSTI will
10:04
preserve that original metadata record in our data tables. We will then have a separate layer that will allow for enhancement and also let us track provenance. So this second, enhanced layer is going to contain additional metadata that we will both systematically and manually curate,
10:20
including things like version information, ROR IDs, and related identifiers. This record is then going to be evaluated, and any additional metadata that can be sent to DataCite will be included, both in the original submissions for DOI registration as well as in any updates to the metadata. And so lastly, I just want to mention something that I think is very
10:44
exciting. I mentioned we'll be able to systematically add those ROR IDs to our records through an organization authority that was developed and built in-house. So we've had an internal authority list containing historical names and their synonyms. And we've used this to
11:01
standardize our organization names, but we're now building off this authority to include even more information. So we've been collecting this for decades. And so now we can include that information for affiliations, research organizations, and funding organizations.
11:20
But this authority doesn't just include RORs. It has a ton of information about the organization, including other identifiers, geographical information, and aliases. You can see in this example, the list of aliases is quite long, probably about 20 or so for Oak Ridge National Lab, which is one of our national laboratories. The fact that we have
11:43
collected and curated all of this, I find very impressive. So when a user submits a record to OSTI, the organization names will be standardized, and then OSTI can include the organization identifier metadata when assigning DOIs to research components, creating that persistent identifier link between the DOI and the organization identifier.
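The assignment step that runs through these workflows (an internally chosen prefix and suffix, with metadata then sent to DataCite for registration) can be sketched against DataCite's REST API, which accepts a JSON:API payload at its `/dois` endpoint. The prefix, suffix, and URL below are placeholders, and the sketch only builds the payload rather than sending it:

```python
DATACITE_API = "https://api.datacite.org/dois"

def build_registration_payload(prefix, suffix, landing_url, title):
    """Build a JSON:API payload for DataCite's POST /dois endpoint.

    `prefix` is the registrant's DOI prefix, `suffix` an internal
    identifier (OSTI, for example, uses an internal record number), and
    `landing_url` the required landing page the DOI will resolve to.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": f"{prefix}/{suffix}",
                "event": "publish",      # register and make findable
                "url": landing_url,      # landing page in the host repository
                "titles": [{"title": title}],
            },
        }
    }

payload = build_registration_payload(
    "10.99999",                          # placeholder prefix
    "dataset.1234567",                   # placeholder internal suffix
    "https://repository.example.org/dataset/1234567",
    "Example dataset title",
)

# The real registration would be an authenticated POST, e.g.:
#   requests.post(DATACITE_API, json=payload,
#                 auth=(REPOSITORY_ID, PASSWORD))
```

The repository ID and password in the final comment are stand-ins for the credentials a DataCite member receives for its repository account.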
12:07
So that concludes my presentation. We're very excited about the work being done to incorporate persistent identifier metadata, and we recognize the impact it will have on the discovery of research and on increasing scientific discovery. And I know there are questions,
12:21
but I think we're holding those till the end. So thank you. Thank you so much, Sara. Now let's proceed with our next speaker, Eugene Barsky, from the University of, sorry. Are you seeing what you're supposed to see?
12:49
Eugene Barsky, Research Data Management Librarian from the University of British Columbia in Vancouver. Thank you so much, Eugene. Please go ahead. Thanks so much. I prefer to present from the browser so I can see the chat flowing. And
13:04
good to see so many familiar names, folks who have been working for a generation now in this area. Thanks so much for joining. Greetings from Vancouver, British Columbia. It's early morning, well, not early, 11 o'clock now. And it's a beautiful, crisp day. I hope you have enjoyed
13:23
your day wherever you are. And I'm just not sure if we can maintain the same level of excitement about metadata as Sara does, but I will attempt to do that. So my presentation is about a specific case study and institution. And to explain our case study,
13:42
I think I should better explain what the institution is. So I'm based in Vancouver in Canada. The University of British Columbia is the biggest school on the west coast of Canada, the second largest in Canada, more or less. We have 70,000 students. We have 17,000 faculty
14:02
and staff. We get almost a billion dollars in research funding, and we have an annual budget larger than that of the cities we are located in. So we are a really big school. And as a library, we are also a rather large library. We have more than 300 employees and 80
14:20
librarians. We run many systems. They might be familiar to many of you, but we are a big school, and we need to take care of the stuff that we produce. As a large library, we run four standalone purpose-built repositories. We have run Dataverse for research data for quite a while. We have run DSpace for even longer.
14:45
We run AtoM for archival materials, and we have ContentDM for around 1 million digitized objects. Since we have four different repositories, it makes sense for us to create
15:00
one discovery interface where our users can see all our content in one place. So we built it in-house, and it was released in 2016. Many of you have seen it. I'm just recognizing your names from the chat. It is also open source. It actually powered the Canadian data discovery service for quite a few years as well. So what we do, we take the metadata
15:27
from the source repositories, and we crosswalk it to our common metadata standard, which we have released. All the crosswalks are publicly available on our GitHub. We crosswalk it to one
15:41
common metadata standard, and from there, we crosswalk it to DataCite and schema.org for enhanced discovery. So unlike many other places, we don't mint DOIs in the source repositories. We mint them in the discovery layer.
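The crosswalk step just described, mapping each source repository's fields onto one common schema before pushing to DataCite and schema.org, can be sketched as a simple field-mapping table. The field names here are illustrative stand-ins, not UBC's actual published crosswalks:

```python
# Illustrative crosswalk: translate records from different source
# repositories into one common metadata schema. Field names are
# hypothetical and stand in for the real, published crosswalks.

CROSSWALKS = {
    "dspace":    {"dc.title": "title", "dc.creator": "creator",
                  "dc.date.issued": "date_issued"},
    "contentdm": {"Title": "title", "Creator": "creator",
                  "Date": "date_issued"},
}

def to_common_schema(source: str, record: dict) -> dict:
    """Map a source-repository record onto the common schema.

    Fields without a mapping are dropped; mapped fields are renamed.
    """
    mapping = CROSSWALKS[source]
    return {common: value for field, value in record.items()
            if (common := mapping.get(field)) is not None}

common = to_common_schema("dspace", {
    "dc.title": "Glacier survey data",
    "dc.creator": "Doe, Jane",
    "dc.date.issued": "2016",
    "dc.format": "text/csv",   # unmapped field, dropped
})
```

From the common record, a second crosswalk of the same shape can then emit DataCite or schema.org fields.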
16:03
And in some cases, we write it back into the source repositories. In some cases, we do not. Like ContentDM, it doesn't allow you to write it back. We have been doing it for quite a while. I initiated DOI minting in 2015-16. So we have minted almost 300,
16:26
actually more than 300,000 DOIs now. Just this year, we minted 15,000 DOIs, and we maintain them on a weekly basis. And we will talk about it when we talk about issues with DOIs. If you do it at scale, you have to take responsibility and maintain the stuff you do.
16:46
We have a robust API system, and we allow community integrations, including IIIF, into our work. And we have seen different community integrations implemented, and it's very
17:01
cool to see that your work actually can build into something else. For each of our data sets and also any other digital assets, we have a landing page, which is very useful if you want to expose your digital assets into Google Scholar.
17:21
Google Scholar doesn't like to take data, as you know, but you can ship data to Google Scholar if you work with the discovery interface. And we create landing pages programmatically, obviously, for each of those digital assets. We allow data to be downloaded in multiple formats,
17:43
and metadata is available from JSON to RDF to other formats that different systems can read. Also, for each digital asset, we have a landing page with statistics and locations enabled, which is really cool to see for some faculty and researchers who want to see where
18:02
their research is being looked at and used, especially for data. So, when we implemented this service around eight years ago, we offered it campus-wide. We said, okay, we'll start offering a DOI service for researchers who need it.
18:22
And we learned very quickly within the first few months that it's actually an issue. Many faculty members, many researchers reached out to us and asked for a DOI for a brochure, for a piece of software that does not have a persistent URL solution. And as a result,
18:43
since we signed the license with DataCite, the library is responsible for them. We cannot commit to maintaining those, because if the URLs change, it's on us to edit all those digital items. So, we are very cautious nowadays about committing to any one-off faculty requests
19:06
for minting DOIs. Obviously, there are many advantages for us to mint DOIs. We get the persistent URLs, but most importantly, as all of you here know, each DOI is not
19:21
just a persistent URL. It's also a metadata package. And that metadata package is very helpful when we ship information about our digital assets to other partners, like schema.org, and like we ship it to all ProQuest products as well, Summon, Primo, and so on. And we make
19:42
good use of DataCite APIs and GraphQL for citations. Some things to pay attention to, and I know that all of you know this, but this is actually a painful point. You cannot delete a DOI after minting it. So, if you work with a discovery system, by mistake,
20:05
some systems, source repositories like ContentDM, create pages that you need to delete later. But since you minted DOIs for those automatically, you have to create tombstones. So, you have to provide a resolution for the DOI. And one thing to remember,
20:23
you cannot delete them. You can edit them, but you cannot erase them forever, which means maintenance is required at our scale of 300,000 DOIs. We do it on a weekly basis: we run the report for the DOIs that have not been validated, and we fix them. And as I mentioned, faculty who request DOIs as a service
20:46
is not a good idea, and we really learned that from our own experience. The DataCite folks asked me to talk about ROR and our integration of ROR. I was really interested to hear what Sara had to say before me. So we also, we waited a bit
21:06
before integrating ROR, but as you all know, ROR is a new-ish standard that replaced GRID. We are very excited to be able to play with it. And it works well with ORCIDs and DOIs. It's not proprietary. It's open. And that's how we like things to be.
21:26
So, right now, at this point in time, if somebody looks for all digital items from the University of British Columbia, they're not able to see them, because there's no place where we actually say that all the items we have are UBC items.
21:44
ROR helps us to solve that. If we implement ROR, it allows our institution to show the community that all these digital assets are UBC digital assets. So, ROR is an excellent way
22:02
to expose your collections to anyone who can read the linked data. So, what we are doing, it's a work in progress. It's not completed, but it's a work in progress. We are implementing ROR in all DataCite metadata. So, for all 300,000 DOIs, retrospectively and prospectively,
22:25
we are adding a couple of elements into the DataCite metadata where we claim that each of those DOIs has a contributor, UBC, as the hosting institution. And here is our ROR.
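In the DataCite schema, those two elements correspond to a `contributors` entry with `contributorType` set to `HostingInstitution` and the ROR as a name identifier. A minimal sketch of such an element, plus a retrieval-style query URL against the DataCite REST API, follows; the ROR value is a placeholder, and the query field path is an assumption about how contributor identifiers are indexed:

```python
from urllib.parse import urlencode

# Contributor element claiming the hosting institution, with its ROR ID.
# The ROR value below is a placeholder, not UBC's actual identifier.
hosting_contributor = {
    "name": "University of British Columbia",
    "contributorType": "HostingInstitution",
    "nameIdentifiers": [{
        "nameIdentifier": "https://ror.org/00000000",  # placeholder ROR
        "nameIdentifierScheme": "ROR",
    }],
}

def build_ror_query_url(ror_id: str, page_size: int = 100) -> str:
    """Build a DataCite REST API URL that searches DOI metadata for a
    given ROR. The Lucene field path is an assumption about how
    contributor name identifiers are indexed."""
    params = {
        "query": f'contributors.nameIdentifiers.nameIdentifier:"{ror_id}"',
        "page[size]": page_size,
    }
    return "https://api.datacite.org/dois?" + urlencode(params)

url = build_ror_query_url(
    hosting_contributor["nameIdentifiers"][0]["nameIdentifier"])
# A client would GET this URL and page through data[].attributes.
```

`HostingInstitution` is a value from DataCite's controlled `contributorType` vocabulary, which is what makes the "all our datasets in one query" retrieval possible.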
22:46
This means any institution can use an API and search for a specific ROR, and you'll be able to retrieve all our data sets together. And I gave you the specific elements that we are using,
23:03
but that's a more technical question. So, I will leave it at that, and I thank you for your time, and we'll wait for the questions at the end. Thank you. Thank you so much, Eugene. Very insightful. And to close this conversation, we have
23:23
the participation of Rodrigo Donoso-Vegas, who is the director of the Directorate of Information Services and Libraries at the University of Chile. Thank you so much for joining us, Rodrigo, and please feel free to share your insights. It's okay. Okay, thank you. Thank you, Arturo, for the introduction, and thanks to all participating in
23:46
this session of the DataCite meeting. It's a pleasure to participate in this activity and to present the experience that we have developed at the University of Chile. My presentation will deal with the implementation of integration between information services
24:06
and services such as DOI and ORCID. I would like to introduce our university, which is the main educational center in Chile, my country. We have 47,000 students and 4,000 academic
24:26
researchers, 20 faculties, institutes in all areas of knowledge, one hospital, and five campuses across Santiago de Chile.
24:41
Our library system has 45 libraries, archives, a museum, three types of centers, and 90 librarians. We also have a collection of more than 3 million items and more than
25:01
45,000 digital objects that are available in our digital library, and complementary services for journals, books, and data, among others. In addition to traditional services, we have implemented research support services, specifically to support the needs
25:27
of researchers in relation to research data management, data management plans, open science, among other issues. In 2021, the University of Chile began to lead
25:44
the Chilean DataCite Consortium and started to promote its infrastructure and products. Previously, we had developed the first data repository in Chile. In 2023, we already have six members who have also been implementing their own digital
26:09
infrastructure and using DataCite services. As consortium leaders, we provide support with Fabrica and with integration with different systems, mainly Dataverse.
26:25
The university has been a member of ORCID since 2023. At the university, we have an ecosystem of information services, which includes, from left to right on the screen,
26:46
an academic repository; a digital library, where we use Primo and Alma; a journal portal developed in OJS, a PKP product; a book portal in
27:03
OMP, Open Monograph Press, also a PKP product; research data in Dataverse; and other regional services like the Latin American repository network.
27:21
Specifically, the academic repository service, which includes mainly theses and research articles, the book portal shown on the screen, which includes open access books, the journal portal, and the data portal generate DOIs from DataCite Fabrica
27:45
and are immediately available in DataCite Commons using DataCite metadata standards. For example, in the books portal, developed in Open Monograph Press, a PKP product,
28:03
we automate the DOI management from this platform via API. In this screen, you can see that we click on the title and directly send the information
28:21
with all metadata to DataCite. The visibility that our book now has is very high, given the incorporation of this book into the global infrastructure. In another example,
28:41
the data repository, we include ORCID in the metadata, which allows us to link with local and global services to give greater visibility to our research. In the image, you can see the
29:00
data model that integrates the services, and if we zoom in, you can see the importance of the identifiers, such as DOIs, ORCID, or ROR, to link with academic services, such as CRIS systems. To promote this infrastructure, it's necessary to promote the
29:28
use of a persistent ID, specifically in the case of ORC, we are detecting users who already have ORC, but we have not associated their affiliation to the university.
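The API-driven DOI registration described above could look roughly like the following sketch against the DataCite REST API (`POST https://api.datacite.org/dois`). All metadata values here, including the author, ORCID iD, ROR ID, and URLs, are hypothetical placeholders; a real integration would authenticate with the consortium member's own Fabrica repository credentials.

```python
import json

DATACITE_API = "https://api.datacite.org/dois"  # test system: api.test.datacite.org

def build_doi_payload(doi, title, publisher, year, url):
    """Build a DataCite REST API (JSON:API) payload for registering a findable DOI.

    All metadata values used below are illustrative, not real records.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "event": "publish",  # register as findable; omit for a draft DOI
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Book"},
                "url": url,
                "creators": [{
                    "name": "Pérez, Ana",  # hypothetical author
                    "nameType": "Personal",
                    "nameIdentifiers": [{
                        "nameIdentifier": "https://orcid.org/0000-0001-2345-6789",
                        "nameIdentifierScheme": "ORCID",
                        "schemeUri": "https://orcid.org",
                    }],
                    "affiliation": [{
                        "name": "Universidad de Chile",
                        # hypothetical ROR ID for illustration only
                        "affiliationIdentifier": "https://ror.org/047gc3g35",
                        "affiliationIdentifierScheme": "ROR",
                    }],
                }],
            },
        }
    }

payload = build_doi_payload(
    "10.5072/example-book", "An Example Monograph",
    "Universidad de Chile", 2023, "https://libros.example.edu/book/1",
)
print(json.dumps(payload, indent=2))
# A real client would then POST this payload with HTTP basic auth, e.g.:
# requests.post(DATACITE_API, json=payload, auth=(repository_id, password))
```

Embedding the ORCID iD and ROR ID directly in the registration payload is what makes the later linking with CRIS systems possible without any re-keying of metadata.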
We also give support to academics to create or improve their profiles, considering that this information will reach local systems via API or other integrations. Thank you for your attention.

Thank you, Arturo. Thank you so much, Rodrigo, Sara, Eugene. It has been really interesting to see how you have been working within your institutions to implement these workflows. These are fantastic examples of best practices, and I am sure that the audience will be really interested to know more, or has at least taken some insights from your presentations. I will remind you that we have some time to answer questions you may have for the presenters on today's session, so we invite you to use the Q&A button to ask your questions. We have received a couple of them, and some have already been answered. Thank you so much for your interest and for being so active. Maybe we can start with what we have seen during your presentations. I would like to ask you: what do you consider the main challenge that you have faced in your position to implement these workflows within your community? I don't know if you would like to start, Sara.

Sure. For us at the U.S. Department of Energy, we serve all 17 of the national laboratories, and that covers a very diverse range of research and types of research. So it is about interacting with all of those communities and, especially as we are rebuilding E-Link, gathering their needs and their requirements and making the submission process as streamlined and as user-friendly as possible, so that there are lower barriers to them submitting the related identifier or persistent identifier information. Or, in our case, we are going to be systematically adding those, taking that completely out of their hands so they don't even need to worry about it. That has been a big priority, and I know our development team has done a lot of work to make that possible. Thank you so much. Eugene, would you like to...?
Thanks, Arturo. Thanks, Sara. I will focus on three challenges. One is something to be aware of: none of the PID services, although open and not proprietary, are free. It takes energy and money to commit to maintaining and doing this work, not only in terms of paying an annual license, which also has a cost, but mostly in terms of the time we need to spend as developers and as data analysts to do this work. It is not a super easy piece of work. Two is explaining the value to our administrators. Administrators do not necessarily understand, or have to understand, the value of PIDs and the infrastructure, and how the PIDs can live together and improve things. Some countries have done excellent work in this area: Australia and England have national initiatives. We have one in Canada too, but it is ongoing and happening more slowly. Many administrators don't understand it, and we have to package and sell it to them. And three, right now, at this point in time, there are many holes in the system in terms of PIDs working together. It is not a perfect solution yet; it's more like a seed. We try to solve some problems, but we are far away from making this a smooth integration between all PIDs working together. I will leave it at that. Thanks, Arturo.

Thank you so much. Rodrigo, would you like to share the challenges your organization has faced?
Okay, yes. I think that the main challenge for our institution is to promote the use of identifiers, or PIDs, and to work on standardization. In our case, not all researchers are aware of the importance of identifiers. We also need to develop a standards-based infrastructure and, in addition, to be able to count on trained professionals, mainly in Latin America.

Fantastic. We have one question from Sherry, which is directly for Eugene. She is asking: are you generating DOIs for all datasets from the Dataverse side? If so, why not just mint the DOIs via Dataverse? Or are you sending more metadata to DataCite when minting DOIs through the custom interface?

I was typing the answer, Arturo; I will stop typing and answer it. Hi, Sherry, good to hear from you, as usual. As I said, we have four source repositories. Similarly to what Rodrigo was mentioning, he mentioned Dataverse; I think they run it too. Yes, we mint DOIs. Okay, I'm going to be controversial here, and I was told not to, so I will be gently controversial. We mint DOIs in Dataverse because we have to; it's built in, and our only choices are handles or DOIs. We mint handles in one of our Dataverses, the licensed-data one, and we mint DOIs in the second Dataverse, which is the research Dataverse. So we mint DOIs; it's built in, you can't avoid it. When we import those metadata records into our discovery system, we mint another DOI, because that's how we built the system. What we do at that point, when we have duplicates, is run a piece of work that submits to DataCite a claim that those two DOIs are identical. We are actually claiming that those two DOIs are identical, for citation purposes. Why are we doing it, why are we creating it ourselves? Because we built the system way before all those source-repository PIDs were introduced. When we started to work with Open Collections, Dataverse was not working with DOIs; it was working with handles only, and that made sense for us. It's a legacy problem that we are living with. It's a long answer, but I wish it were easier for us to do. Fortunately, the DataCite schema, now at 4.5, allows you to claim that DOIs are identical, which is very helpful, and I'm thankful for that.

Fantastic, thank you so much.
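The duplicate-DOI claim Eugene describes is expressed in the DataCite metadata schema through a `relatedIdentifier` whose `relationType` is `IsIdenticalTo`. A minimal sketch of how such an update payload might be assembled follows; both DOIs are hypothetical placeholders, not real records.

```python
def claim_identical(local_doi, source_doi):
    """Build a DataCite metadata update stating that local_doi and source_doi
    identify the same object (relatedIdentifier with relationType IsIdenticalTo).

    Both DOIs passed in here are hypothetical placeholders.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": local_doi,
                "relatedIdentifiers": [{
                    "relatedIdentifier": source_doi,
                    "relatedIdentifierType": "DOI",
                    "relationType": "IsIdenticalTo",
                }],
            },
        }
    }

payload = claim_identical("10.5072/discovery.123", "10.5072/dataverse.456")
print(payload["data"]["attributes"]["relatedIdentifiers"][0]["relationType"])
# prints "IsIdenticalTo"
# A real client would PUT this to
# https://api.datacite.org/dois/10.5072/discovery.123 with Fabrica credentials.
```

Registering the relation on the discovery-layer DOI lets citation tooling treat the two records as one object rather than double-counting them.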
There is one particular question asking Eugene: can you edit DOI metadata in Dataverse once it has been created? So, another technical question.

Once the metadata in Dataverse is created and submitted to DataCite, that's the Dataverse DOI metadata, whatever is submitted. I don't remember how many elements; I think only the mandatory elements, no more than six, are submitted from Dataverse. So that's what's submitted to DataCite. When we take that item into our homegrown Open Collections interface, we mint another DOI with a bit more metadata, including ROR, because Dataverse does not have ROR built in at this point in time. Even the ORCID integration is not superb in any way; the ORCID integration in Dataverse, and we are talking about systems here, is not there yet. It's not very good. It will be better eventually, but right now it's not. So we are actually enhancing those metadata records in Open Collections, and we are, again, claiming that those DOIs are the same, identical. But that's just how one institution does it; it's not perfect in any way. That's why I said the ecosystem right now is full of issues. But I'm hopeful that, as we work through those issues, we are going to make them better, and being aware of those issues is the first step to resolving them properly.
Thanks again. Thank you.

A more general question, especially for organizations that are starting their path in implementing these workflows: do you have any suggestions for those who are starting on this path and may face some of the challenges that you have described here? Any suggestions on how to proceed, or how to inform themselves, when they are starting to implement these kinds of workflows within their communities? I don't know if maybe, Eugene or... sorry.

I can go ahead. I guess it depends on the workflow, especially for incorporating persistent identifiers or related identifier metadata, if you are relying on the submitter to provide that information. We all know that, especially with researchers, there is that researcher burden, and just getting the basic metadata is sometimes hard enough. So education, like what Rodrigo was saying: education on the front end, on why it's important to include your ORCID iD, and why it might be important to include the ROR IDs or the related identifiers. You've done all this research, you've done all this work, and these things help increase discoverability. So maybe the education aspect from that end. And then, at least in our use case, because we are relying on the submissions, automating what you can, taking it off of your submitters: incorporating the collection of the ORCID iD on submission, or systematically pulling related identifiers and persistent identifiers whenever possible.
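Systematically pulling persistent identifiers, as Sara suggests, might look something like the sketch below, which walks a DataCite DOI record (shaped like the JSON:API response from `https://api.datacite.org/dois/{doi}`) and collects the ORCID iDs and related DOIs it links to. The record shown is a hand-made stand-in with placeholder identifiers, not real API output.

```python
def collect_pids(record):
    """Collect ORCID iDs and related DOIs from a DataCite JSON:API record."""
    attrs = record["data"]["attributes"]
    pids = []
    # ORCID iDs attached to creators
    for creator in attrs.get("creators", []):
        for ni in creator.get("nameIdentifiers", []):
            if ni.get("nameIdentifierScheme") == "ORCID":
                pids.append(ni["nameIdentifier"])
    # DOIs of related objects (datasets, supplements, versions, ...)
    for rel in attrs.get("relatedIdentifiers", []):
        if rel.get("relatedIdentifierType") == "DOI":
            pids.append(rel["relatedIdentifier"])
    return pids

# Hand-made stand-in for a record fetched from the DataCite API.
sample = {
    "data": {
        "attributes": {
            "creators": [{
                "name": "Doe, Jane",
                "nameIdentifiers": [{
                    "nameIdentifier": "https://orcid.org/0000-0001-2345-6789",
                    "nameIdentifierScheme": "ORCID",
                }],
            }],
            "relatedIdentifiers": [{
                "relatedIdentifier": "10.5072/related.data",
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            }],
        }
    }
}

print(collect_pids(sample))
# ['https://orcid.org/0000-0001-2345-6789', '10.5072/related.data']
```

Harvesting identifiers this way, rather than asking submitters to re-enter them, is exactly the "take it out of their hands" automation described above.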
Thank you so much, Sara. Maybe Eugene, would you like to share?

I think I will leave this one to Rodrigo; Rodrigo definitely has a different perspective, and I have spoken enough. Rodrigo, would you like to?

Okay, yes. I agree that it's important to train the internal team, the internal staff, to raise awareness of, and build consensus around, the important points. And also the users: users need training about ORCID, about ROR, and eventually about the next integrations, about sharing this information with other systems, like CRIS systems. The metadata is only one starting point, but this information lives in other systems too, and this is an important issue.

Thank you so much. I don't know if we have any additional open questions; I think all the questions have been answered. So, if you agree, we can wrap it up for today.
Thank you so much to all the attendees. Thank you so much for keeping up with us; there have been many hours of sessions, and I really appreciate that you have made the effort to join us for the whole program. And of course, a huge thanks and congratulations to our speakers today. I think we have seen fantastic presentations, and I am sure that these best practices can lead to further discussion in the future. So thank you so much for opening the door to these questions. Thank you so much, and we'll see you at the next session. Thank you.