DOIs and Data Citation: Back to Basics - 1st May 2014
Formal metadata
Number of parts | 18
License | CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/35920 (DOI)
Data Citation (part 14 / 18)
Transcript: English (automatically generated)
00:00
Okay, welcome everybody. Good afternoon. My name is Sarah Olsen and welcome to the Australian National Data Service Data Citation and DOIs Back to Basics workshop. We're lucky enough to have two experienced presenters today talking about data citation and DOIs. The first presenter is Geri Ryder. Geri's going to introduce data citation and the
00:30
next presenter will be giving an overview of the ANDS Cite My Data service, which is our online service via the ANDS website to attach or assign DOIs to your research data. So Liz will explain how this works and how you can do this if your institution has the
00:43
capacity for DOI minting. Liz will also be able to answer any technical questions that you might have about this process. So welcome to our presenters and all our participants of course and we'll get started with Geri. Hi everybody thank you for coming along today and it's great to see so much interest in
01:02
data citation. It's a topic close to my heart. I hope this will be a relatively short session, as I'd like to allow plenty of time for questions after both Liz and I have spoken. But we'll start with looking at what is data citation,
01:21
what is a DOI and the relationship between DOIs and data citation and then a quick look at why we care about data citation. Not everyone in the audience may be familiar with ANDS so just briefly ANDS is a
01:41
Commonwealth funded initiative which has been established to enable what we call the transformations of data from unmanaged to managed, disconnected to connected, invisible to findable and single use to reusable and we work across the publicly funded research and data
02:02
sector of Australia. You can find out more about ANDS on the ANDS website. Today we are talking about data citation and DOIs. You might like at some point to visit the ANDS website to download this A4 poster. It actually neatly summarizes a number
02:21
of the concepts that we'll be covering today, but it also does provide some broader context to the topic of data citation, some of which we won't be able to get to in this sort of back-to-basics session today. But it's a nice, neat resource that you might find useful. So let's start by looking at what data citation is.
02:44
Well, quite simply, data citation is the practice of providing a reference to data in the same way that researchers routinely provide a reference to other outputs in their papers, such as journal articles, reports and conference papers.
03:01
And this is done for the same purpose for data as it is for other scholarly outputs. That's to acknowledge the use of materials or resources and to provide enough information to enable others to verify and access the resource. And in this example, where I've just blown up the actual citation, you can see that
03:25
a data set has been formally cited in the reference section of this journal article. And you can see that the structure of it is probably fairly similar to other types of
03:41
references. So just stepping back a little bit, it's probably fair to say that within the scholarly community, it has been and to some extent continues to be common practice for research data to be shared informally between colleagues. Until relatively recently,
04:02
there were few mechanisms for actually publishing research data to make it more broadly available. Mostly it was stored on local servers or perhaps a USB stick in the bottom drawer. So it's probably not surprising that there wasn't a real concept of formal data citation.
04:22
However, now there are a growing number of mechanisms for publishing data and on the screen now you can see just a few examples. And these have really evolved to support global data driven research, but have also served to raise the profile of research data as a first
04:41
class research output rather than as a byproduct of research. When research is written up in papers, it's been common practice for data sources to be included in the acknowledgments of a paper or referred to in the methods section. But this is now changing actually quite rapidly.
05:01
Like other types of scholarly outputs, there needs to be a standardized way of referencing data. And there are standards now emerging. And we know that publishers are looking at how data citation can be incorporated into their instructions for authors. And this slide is intended I guess to show the evolution of data citation where previously it's been important
05:27
really just for perhaps the owner of the data to know where their data is and what it is. Whereby these sort of brief descriptions, if you call them that, are perhaps good enough. Whereas now where we're talking more formal citation and sharing, what we need to see is
05:45
something that's more useful to allow for the discovery of data and the access to data. And my apologies to Pat and Ross who I'm sure have never done anything like on the left hand side of the screen. So here we see the same data set professionally published through the
06:06
Griffith University Research Hub. And you can see that there's a nice clear statement, data citation statement. And while it's covered down here, I think where you can see the blue circle, where it's covered in the actual record
06:25
for this particular data set, I've just blown up the actual citation so that you can see it in a bit more detail. So it's been, as I said, professionally published with a very clear statement of attribution that can be used in a reference list like we saw previously.
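As an editorial illustration of such an attribution statement, the sketch below assembles a data citation from its elements. The ordering used, Creator (PublicationYear): Title. Publisher. Identifier, follows the pattern DataCite recommends and is an assumption here, since the talk does not spell out a format. The function name, the example values and the 10.1234 prefix are all hypothetical.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Join citation elements into a single reference-list string.

    Assumed pattern: Creator (PublicationYear): Title. Publisher. Identifier
    """
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. {identifier}"


# Hypothetical example values, for illustration only.
example = format_data_citation(
    creators=["Smith, J.", "Lee, K."],
    year=2014,
    title="Example Survey Dataset",
    publisher="Example University",
    identifier="https://doi.org/10.1234/example",
)
print(example)
```

The identifier at the end could equally be a handle or a plain URL, which matches the point made later in the talk that a DOI is best practice but not essential.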
06:47
Here's the same citation, just blown up a little bit more so you can see it in a bit more detail. Again, very similar format to what we saw in that first slide where the data set
07:01
was referenced in the paper. But what I've sort of also wanted to have a look at here is this DOI that's appended to the end of the citation. So the DOI is a digital object identifier and we'll have a look at these in a bit more detail shortly. But I guess what
07:21
I wanted to introduce here is the clarification that while DOIs are considered best practice for data citation, they're not essential for data citation. They're two different but related concepts. Authors can certainly cite their own data or data from other sources that they've
07:46
reused in their papers without a DOI. For example, using a URL or a handle to link to the data or a detailed description of the data. In this example from Research Data Australia,
08:01
you can see that a handle has been used in the citation rather than a DOI. So there's the indication of how the data should be cited if reused and this number here is actually the handle that's been assigned, essentially a URL. So I just wanted to clarify there
08:22
that these are two concepts that are related but not necessarily interdependent. Let's have a look at DOIs in a little bit more detail. DOIs or digital object identifiers are globally unique identifiers that can be assigned to various resource types,
08:44
including research data and journal articles. And I'm sure many of you are quite familiar with their use in journal article citations. They provide easy and persistent access to research data and to other resource types that they're assigned to. Some terminology,
09:02
DOIs are minted and are resolvable. So by minting we mean creating a DOI and attaching it to a record that describes research data and by resolvable we mean being able to click on the DOI as a link and have it resolve or take you to the metadata page that describes the data,
09:26
including how to access the data itself. Minting a DOI implies a long-term commitment to maintain the resource it's assigned to. So this is to ensure that anyone clicking on a DOI doesn't receive the dreaded 404 message. So the minter of a DOI needs to commit to keeping
09:48
the DOI and associated metadata page current. So for example if there is a server upgrade and the data or metadata is moved, the DOI must be updated so that it remains current
10:04
and persistent. DOIs are also important in that they support automated tracking of reuse of data, which is now becoming known as data citation metrics. This works in pretty much the same way as citation metrics for other scholarly outputs such as
10:24
journal articles and I'm sure many of you will be familiar with products like Web of Knowledge and Scopus that track citation metrics for journal articles. Well there are similar services now emerging for tracking data citations and these services largely rely on machine matching
10:42
of citations. So standard citation formats and the use of DOIs make this process more reliable and accurate. You'll see DOIs presented in various formats. In some cases, as in the top example here, you might see the name of an organization or a data
11:04
repository embedded in the DOI. But in most cases they're essentially a pretty meaningless but unique machine readable string. As you can see here there's no real pattern to this; the important bit is that it is globally unique.
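To make "resolvable" concrete, here is a minimal sketch that splits a DOI into its prefix and suffix and builds the link a reader would click. It assumes the public resolver at https://doi.org/, which the talk does not name, and the helper names are hypothetical. The example uses the DOI of this recording, 10.5446/35920.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"  # assumed public DOI resolver


def split_doi(doi):
    """Return (prefix, suffix) for a DOI, dropping a leading resolver or 'doi:' label."""
    doi = doi.strip()
    for lead in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    # The prefix is everything before the first slash and always starts with "10."
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """Build the clickable link for a DOI; the suffix is percent-encoded."""
    prefix, suffix = split_doi(doi)
    return RESOLVER + prefix + "/" + quote(suffix, safe="/")


print(resolver_url("doi:10.5446/35920"))
```

Nothing in the string itself says where the data lives. The resolver looks up the URL registered for the DOI, which is why the minter has to keep that registered URL current.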
11:24
So how do you go about getting DOIs assigned to research data? Well ANDS offers the Cite My Data service which Liz will talk about in more detail shortly. It's a service that's offered free to Australian publicly funded research organizations who wish to assign DOIs to
11:44
research data, software or workflows. And when I say free, I guess it's probably fair to say that it's at no cost to you to mint. There is of course still the associated cost of the long term maintenance of the DOI and the associated resources. Just before we move on to
12:02
hearing more about the service, I just wanted to quickly cover why we care about data citation and DOIs. Well data citation is becoming accepted scholarly practice as data is increasingly being recognized as a first class research output. It's only fair and reasonable that it's
12:23
appropriately acknowledged and potentially rewarded. It's also fair to say that journals are now embracing data citation. Some of you may be aware that PLOS recently announced a new data policy which requires authors to formally publish the data associated with
12:44
submitted journal articles. And what we can see here is the publisher community increasingly coming on board with the concept of publishing data and by association a requirement to cite data. We can also see that research funding will have more emphasis on data access and reuse.
13:06
Some of you may be aware that the ARC earlier this year released new funding rules that encourage researchers to, and I'm using their words, consider the ways in which they can best manage, store, disseminate and reuse data generated through ARC funded research.
13:25
So again data access, data reuse really implies data citation acknowledging the origins of that data. And in the future what we can I guess see in our crystal ball is that scholarly
13:41
metrics are likely to include citations to data. So in the same way that researchers and institutions are now often asked to provide statistics and data around reuse of journal publications through citation metrics there's the potential for the same sort of metrics to apply
14:02
to data. And the tools are emerging now to enable this. Thomson Reuters, who run the Web of Knowledge platform which is commonly used for citation metrics for journal articles, were first off the rank with a commercial service offering in this area with the release of the
14:23
data citation index in 2012. And finally a key thing is the DOIs and the assignment of DOIs as best practice for persistent access to data products. So I just wanted to finish up with this data citation readiness checklist. Unfortunately we don't have time today to go into this
14:45
in any great detail but you might find it a useful reference to come back to and hopefully ANDS may run some events in the future that will go beyond the basics and start to address some of these issues in a bit more detail. And I'd like to thank Dave Connell
15:02
from the Australian Antarctic Data Centre for sharing this. The AAD were early adopters of DOIs and data citation and so have a lot of experience that they have happily shared with others in the community who wish to go forward in this area. So that's it for me
15:20
and I'll hand over to Liz now to talk about the ANDS Cite My Data service. Great to be here everyone. Thanks Geri. Now I'm going to give you a brief overview of a service that ANDS offers called Cite My Data. You'll normally come in to us and ask for this service after you've been through Geri's checklist. You've identified the fact that you have data
15:41
that is citable, that you have a data management system that allows you to identify how to cite that data and that you're willing to take on the persistence of keeping that data. And what the ANDS Cite My Data service does is enable research organisations to assign DOIs to research data sets or collections. What needs to be noted is it's a machine to
16:02
machine service. A researcher cannot sign on to ANDS to use a service or fill in a form to obtain a DOI. An organisation makes an agreement with ANDS to be able to access our service from one of their machines to ours. The clients embed the service, usually within their data management workflows. When they've identified that yes they have data they want to keep,
16:24
that they have the information about that data and they're storing it in their own university repositories, that's when within that automated workflow they'll probably access our service to obtain a DOI to go along with the other information they have about their data. It is not accessible for individual researchers to use. We have however developed, Cite My Data
16:45
has been around for a couple of years, we've now developed a user interface for organisations to access to be able to list the DOIs that have been minted for their organisation and to perform various other functions on those DOIs. At this point, and there is no talk of definitely in the future that it will happen, there is no talk of us having that interface
17:03
available to actually mint a DOI. Now more information about Cite My Data and the ANDS approach to it can be found on the Cite My Data page of the ANDS website, ands.org.au. Now to describe where we at ANDS fit in amongst the organisations and within the global aspect of the DOI system, the DOI system,
17:22
the Digital Object Identifier system, provides a framework for persistent identification. For DOI minting services that integrate with this global system, an organisation must be registered as a registration agency. Now ANDS is a member of an established international registration agency
17:45
called DataCite and as a member of DataCite we can now register data centres with DataCite. Once registered, those data centres can then mint DOIs. The data centre for our purposes roughly equates to an organisation or an institution; a university or CSIRO, that would be
18:04
classed as a data centre. The ANDS Cite My Data machine to machine service utilises the DOI services offered through our membership with DataCite but we also added a layer of our own administration and business logic for the organisations who are minting through us. Now more information on DataCite, and it is worth a read because it describes not only DataCite and
18:25
its organisation but the importance of data citation quite well, can be found at www.datacite.org. Once you have gone through your checklist and you know you wish to mint DOIs and you are a not-for-profit organisation within Australia or you're publicly funded, you can register with
18:43
ANDS to mint DOIs and how you go about that is you contact our services division and to register you'll need a DOI account name which would normally be the name of your organisation. You can provide us with an IP address or a range of IP addresses or a list of ranges of IP addresses
19:03
of the organisation machine that will be used to mint the DOI through us. Now this is not compulsory anymore as of our last release of the service, however it is used for the authentication to make sure that the registered organisation is the organisation attempting to
19:21
mint DOIs. We also need a contact name of a person responsible for the DOI registrations from an organisation and an email address for that person as well. When a DOI is minted for an organisation it has to be known in advance what domain the resolvable URL is in and that
19:41
top-level domain must be provided as well to register for the Cite My Data service. It's not only ANDS that wishes to know this but DataCite themselves will not mint a DOI if the resolvable URL passed does not belong to that top-level domain. It's another security feature. Once registered the organisation will have access to mint DOIs through the ANDS API.
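The two registration checks described above, the registered IP ranges and the registered top-level domain, can be sketched as follows. This is an illustration of the idea only, not the ANDS or DataCite implementation. The function names, the example addresses and the domain `example.edu.au` are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def ip_allowed(client_ip, registered_ranges):
    """True if client_ip falls inside any registered address or CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in registered_ranges
    )


def url_in_domain(url, registered_domain):
    """True if the URL's host is the registered domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = registered_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Hypothetical registration details for one organisation.
ranges = ["192.0.2.0/24", "203.0.113.9"]
print(ip_allowed("192.0.2.17", ranges))
print(url_in_domain("https://data.example.edu.au/collection/42", "example.edu.au"))
```

Matching on a leading dot avoids accepting a look-alike host such as `example.edu.au.evil.org`, which a plain substring test would let through.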
20:07
However, initially they will only be able to do so with what's called the test prefix. This is 10.5072 at the moment I believe. This test prefix will be used in our production
20:21
system. It will mint DOIs on the DataCite production system. However, periodically DataCite will go through and just wipe those DOIs. It's just a proof of concept for your organisational data management plan. For a client to implement they will use their ANDS-provided
20:42
authentication details. When they apply for registration to use Cite My Data a client will receive an app ID which is a 32 character long unique identifier. They will also receive what's called a shared secret if they don't wish to use the IP range for their authentication.
21:01
They'll use a shared secret, which is passed through using HTTP authentication, and they will access the service endpoint URLs. Now I've provided an endpoint here. It's long and probably looks quite confusing to most. The services.ands.org.au/doi/1.1 part refers to the version of the service we're using. The word mint refers to the
21:25
activity we wish to do, which is mint. The variable response type will be the response type that the organisation wishes to receive back from the Cite My Data service. That could be JSON, it could be a string or it could be XML. They must also pass along to this service point, in the URL,
21:42
their app ID and the URL, the resolvable URL of the DOI they wish to mint. What they also need to provide to mint a DOI is DataCite XML. DataCite have their own XML schema and that XML will describe the data set. It will describe most importantly the title,
22:02
the creators or authors, the publisher, which is usually the organisation in which you reside, and the year of publication. They must provide these compulsory DataCite XML parameters, but you will also note that these align quite closely with the norms of citation.
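The endpoint just described can be sketched as a small URL builder. This only assembles the address and sends nothing. The host and version path follow the talk; the `https` scheme, the `mint.<response type>/` layout and the query parameter names `app_id` and `url` are assumptions to check against the ANDS technical document. The app ID and landing page below are made up.

```python
from urllib.parse import urlencode

# Host and version as described in the talk; scheme and exact layout are assumed.
BASE = "https://services.ands.org.au/doi/1.1"
RESPONSE_TYPES = ("json", "xml", "string")


def mint_url(response_type, app_id, landing_url):
    """Assemble a mint request URL (parameter names are assumptions)."""
    if response_type not in RESPONSE_TYPES:
        raise ValueError(f"unsupported response type: {response_type!r}")
    query = urlencode({"app_id": app_id, "url": landing_url})
    return f"{BASE}/mint.{response_type}/?{query}"


# Hypothetical 32-character app ID and landing page.
request_url = mint_url(
    "json",
    "0123456789abcdef0123456789abcdef",
    "https://data.example.edu.au/collection/42",
)
print(request_url)
```

The DataCite XML describing the data set would have to accompany the request as well; how it is transmitted is not stated in the talk, so it is left out of this sketch.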
22:21
As mentioned previously, the client will initially be allocated the DataCite test DOI prefix 10.5072 for their testing and implementation. The DOI example that Geri put up previously was a 10.4226 probably, a slash, a 01, a slash and then a unique string of characters. The 01 will refer to the client ID that we at ANDS give an organisation
22:46
that we register. I don't know which client it is but it's not us, we have a zero zero and that will allow people to easily, without using the name of an organisation, to easily identify which organisation has minted that DOI. Now when clients are ready to start minting
23:01
production DOIs because they've proved that their implementation method works, they'll need to sign with ANDS a Cite My Data participation agreement. The bulk of that will make sure that that data is persistent. It is their responsibility to ensure that that
23:22
DOI is always resolved. Once signed they send that agreement to ANDS and once ANDS has agreed that yes, everything's working well, then you'll be assigned a production DOI prefix. For ANDS we have three production DOI prefixes that DataCite have allowed us: 10.4226,
23:44
10.4225 and 10.4227, I think. However once minted under the production prefix those DOIs will never be deleted so they always must be maintained and that really is a big component of that DataCite agreement. How to implement, I've rushed over it I know because I don't know how many
24:01
developers are out there and I'm pretty sure you don't want me to start speaking too many acronyms at you, but we have a fairly extensive document on how to implement the Cite My Data service and that can be found on the ANDS website as cmd-technical-document.pdf. Now I did mention that DataCite XML must be passed to the minting service.
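As a rough illustration of that XML, the sketch below checks a minimal record for the four compulsory elements named in the talk (title, creators, publisher, publication year). It is a presence check only, not the schema validation the service performs. The element paths and the kernel-3 namespace are taken from the DataCite metadata schema as I understand it; the record contents and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Paths of the four compulsory elements named in the talk.
REQUIRED = (
    "titles/title",
    "creators/creator/creatorName",
    "publisher",
    "publicationYear",
)

SAMPLE = """
<resource xmlns="http://datacite.org/schema/kernel-3">
  <identifier identifierType="DOI">10.5072/00/EXAMPLE</identifier>
  <creators><creator><creatorName>Smith, Jane</creatorName></creator></creators>
  <titles><title>Example Survey Dataset</title></titles>
  <publisher>Example University</publisher>
  <publicationYear>2014</publicationYear>
</resource>
""".strip()


def missing_elements(xml_text):
    """Return the required paths that are absent or empty in the record."""
    root = ET.fromstring(xml_text)
    # Drop namespace prefixes so the plain paths above can be used.
    for element in root.iter():
        element.tag = element.tag.rsplit("}", 1)[-1]
    missing = []
    for path in REQUIRED:
        found = root.find(path)
        if found is None or not (found.text or "").strip():
            missing.append(path)
    return missing


print(missing_elements(SAMPLE))
```

Running a check like this before calling the service catches an incomplete record locally; the authoritative answer still comes from validation against the DataCite schema at mint time.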
24:24
There are compulsory elements that must be passed in that XML. The first one I've listed, the URL, isn't actually part of the XML; it gets passed as a parameter on the command line, however it is compulsory. Title, creators, publisher, publication year are all compulsory
24:41
elements of that XML; they must be included. If not it'll just fail the schema validation and it won't mint. That's an example of the XML. It's letting you know that it's a DataCite schema that it gets validated against, in this instance version three of their schema, and you can see that publication year, publisher, title and creator
25:04
are all there. It's worth noting at this point that a lot of our first adopters of Cite My Data were confused that they had to provide to ANDS a different XML schema. They were very used to our RIF-CS schema, which is our way of describing a data set; however, because we are
25:23
minting DOIs we need to have the DataCite XML. It is possible in some instances for us to generate DataCite XML from RIF-CS, especially if a contributor in their RIF-CS object has used the citation metadata element and filled it out, and people who have done that will
25:45
recognize that the compulsory elements in the DataCite schema match up quite neatly with the RIF-CS citation metadata schema. There is no thought in process at the moment however that we are going to go and run through our RIF-CS and develop DataCite XML for
26:03
users. So just a little summary on how you would approach using the Cite My Data service. You'd have to contact ANDS to talk about the service and send the account information described to us. You then have to use your own developers within your own data repository methods to
26:26
automatically mint DOIs, machine to machine, through the ANDS service. You do so at first with the test prefix. Once that has been proved, you would then sign a Cite My Data participation agreement, and that agreement would be sent to ANDS, and then you'll be given a production
26:48
prefix and be able to make production DOIs for your data sets. As I mentioned earlier, due to client requests ANDS also developed a user interface into the Cite My Data service.
27:03
What this user interface currently does is enable a client to list all the DOIs they have minted. They'll be able to check that the URL they provided when minting their DOI is a resolvable URL. They'll also be able to click a button and have in front of them the DataCite XML for that DOI. They can also update the URLs for their DOIs through this
27:26
application. They can also check that all the DOIs minted by them are resolvable and have a report sent to the email address that was provided during the registration process. They can have a look at their registration details, and they can also view the activity log
27:43
of every time they've minted, updated, activated or deactivated a DOI. As of yesterday afternoon, we had 27 organizations or clients who have registered with DataCite via us to use the Cite My Data service, and those clients have minted 5,343 production DOIs through our Cite My Data
28:04
service, and that's it for me. Great, well, thanks very much Jerry and Liz. It's great to have a discussion about the advantages of data citation and DOIs, but also the nuts and bolts of how that's done. We're now going to have an opportunity for our audience members
28:24
to ask questions of Jerry and Liz. So I'll just read the baton. So the first question we've got is: Jerry seems to be saying that a DOI should resolve to metadata, whereas my experience with persistent identifiers (handles) is that they would typically resolve to the digital object
28:42
itself rather than the descriptive metadata. Do we then need two separate persistent IDs, one for the metadata and one for the actual research data? Jerry, is that something that you could offer some advice on? What we would generally recommend is that a DOI would resolve to a splash
29:00
page or a metadata page. What we find is that, with research data, what people commonly want is for people to be able to see things like license conditions, access options and citation requirements, the sort of information that's contained within a metadata page, rather than perhaps taking somebody
29:23
directly to a data set. I mean, that is possible, but the preference is generally that it would resolve to a splash or metadata page that describes the data and the conditions around the reuse of the data. So you wouldn't necessarily need to have two persistent identifiers.
29:44
What you would do is ensure, for the DOI associated with that metadata page, that you maintain the resources so that you would still have access to the data. It's probably also worth mentioning that in some cases a DOI may resolve to a landing page that actually describes
30:06
a physical rather than a digital object, or that describes how to access data rather than giving you direct access to the data. So in some cases you may need to register first, you may need to contact somebody, or there may be some ethical requirements around access to
30:26
the data, and this information can all be provided on the metadata page. I hope that's answered the question. Thanks Jerry. It sounds like it would have, and if people want to ask a follow-up question to that, please do type it in. I'll move to the next
30:43
question we have now. Do handles, as opposed to DOIs, have any disadvantages in terms of data citation metrics? That is, can either be used in Altmetrics or the TR Data Citation Index, for example? So what's the advantage of minting a DOI over a handle, over another handle?
31:05
I guess DOIs seem to have emerged as the gold standard or the preferred persistent identifier. If you look around at many of the data repositories, you'll see that
31:22
they assign DOIs or that they prefer that you assign a DOI. You can see how well established DOIs are in the journal community. So they've really emerged, I guess, as the gold standard, and therefore, if you are able to mint DOIs and assign them to data,
31:45
that is preferable. In our dealings with people like Thomson Reuters, who have developed the Data Citation Index, they will accept records that don't include a DOI, but they have stated that they would prefer a DOI. I think it's because they are guaranteed
32:04
globally unique, and they have this implication of long-term persistence and maintenance. So certainly for those sorts of providers, DOIs seem to be the preferred standard. Thanks Jerry. I'll move to the next question again. Are there any issues to consider if you
32:24
have a range of producing organisations in your data collection? So I imagine that means, if your data collection involves a number of organisations or institutions, through which institution or organisation would you mint the DOI? What we say, I guess, in general is that
32:48
if the data is the result of collaboration, then it really needs to be decided, I guess a bit like when you're publishing a journal article, who the lead person is or who has the capacity
33:06
to publish and perhaps assign a DOI to the data. So certainly you would want to try and avoid situations where the same data set is published multiple times and assigned multiple DOIs through that process. That's really not particularly desirable. So really in those
33:25
cases it would come down to having that conversation amongst the collaborators to determine who will publish and who will assign a DOI. I'm happy to talk to people offline if they've got specific questions that don't get covered today, or if I've misinterpreted.
33:43
Thanks Jerry, that sounds like a really helpful option as well. Another question: for a single experiment with several associated data sets, is it preferred practice to mint a single DOI or to mint multiple DOIs, one per data set? Is it possible to have sub-DOIs? This is a question
34:01
I've heard before, so this is obviously a popular one. Yeah, this does come up, about, I guess, the granularity of what you assign a DOI to, and what we try to encourage people to think about is the granularity at which the data is likely to be reused and therefore cited.
34:26
There may also be some very practical considerations. Speaking to somebody recently who's involved in managing a lot of astronomy data, the practicalities of assigning DOIs and being able to guarantee and maintain those DOIs at a very granular level just,
34:47
you know, wasn't going to happen. So a decision was made to actually assign a DOI at a higher level. So it does really depend a little bit on the discipline and what the expectations are of the reuse and the citation. And if you have a look at the DataCite
35:05
metadata guidelines, it is possible to cite subsets of data, so if DOIs are assigned at quite a high level, you are able to cite a subset within that DOI, within that level. So there are some options there, and there are also some very practical considerations
35:25
around being able to guarantee that persistence and maintain the DOI. Thanks Jerry. That's a great and really extensive answer to that question, which as I said does come up quite a bit. I've just got a follow-up to an earlier question which
35:43
you answered as well, so just to follow it up because people might still be thinking about it. So, just in relation to the question of a range of producing organisations for your data collection, the follow-up question is: does that mean that the organisation minting the DOI is considered the publisher,
36:01
or can you distinguish between the data producer or owner and the DOI minter? So the DataCite metadata schema, whose mandatory requirements Liz described, does allow you to describe a number of roles apart from the creator and the publisher, and I guess
36:28
it depends, so it is possible to describe that through the DataCite metadata schema. It's also possible to describe that in your metadata page or your splash page.
36:41
So I guess it depends on whether you're wanting to use this for tracking reuse or to ensure appropriate acknowledgement of the owner as opposed to the publisher or the creator. But essentially the publisher is the organisation that takes on that role of
37:03
long-term custodianship of the data and the release of that data. Thanks Jerry. This next one might be a question for Liz. If an organisation is already minting DOIs for publications through other means, do you think it would be worth separately using
37:21
Cite My Data to mint DOIs for data? Absolutely, mainly because they are linked to DataCite. That's where your XML will be stored, and DataCite itself is now set up so that people will go and search there to find information about your data and data topics, and it's designed
37:41
for citing data. I would highly recommend that an organisation, even if they are making DOIs for journals, would still come to Cite My Data or another organisation that works through DataCite to make DOIs for their data. That's a big yes. Next question is: what would the format be for citing a subset within a data set? So would it be similar to a book chapter within a
38:12
this comes down to sort of citation conventions, or whether there is currently a convention for doing that? Yeah, there are some conventions. You can have a look at
38:25
some examples in the DataCite metadata schema, and I'm happy to talk with whoever asked that question offline if they've got a specific use case or example that they might like some assistance with. We've looked at examples with people where a subset may actually be something
38:49
where what you're actually citing is perhaps the data in a query against the database, where the data may be updated, or it could be something that's more analogous
39:03
to that example of, you know, a book chapter or pagination within a journal article. It really depends on the type of data, and I'm happy to talk with somebody about specific examples. It might be quite variable depending on what
39:26
kind of data it is. So if people want to take up that very generous offer of Jerry's and contact her directly, she can provide some examples or guidance on that. That seems to be the end of... oh no, apologies, here's another question. Can a single DOI be used for ongoing research data,
39:47
or is it preferred practice to mint a new DOI upon release of new data associated with the experiment? I guess this is a very pertinent question, for example for repeated or longitudinal data sets, and again quite a popular question. I've heard it myself a few times, so I know
40:06
Jerry has a good answer to this one. So there's a couple of ways of looking at this, and again it's a little bit dependent on the type of data, how you see it being reused, and what commitment you're able to make to the minting and maintenance of DOIs.
40:23
But in some cases people choose to assign a new DOI. If it is, for instance, a longitudinal study, you know, cut that off, depending on what it is, but perhaps on a yearly basis, time-slice that
40:41
and assign a DOI to that, and then, you know, do the same thing year in, year out: package those up as, say, an annual data set and assign a DOI to that. In other cases it may be a data set that continues to grow over time, so it may not
41:04
be something that you wish to wrap up in an annual snapshot, and in that case you might choose to assign a DOI to the data set as a whole and again ask people to cite, say, the date and time that it was accessed, when they actually took
41:21
some of the data out to reuse. So again, there can be multiple ways of looking at this, and we've seen examples of both ways being implemented. Thank you Jerry. This next question I think will fall to Liz. If we are minting DOIs
41:42
through ANDS, what will happen if ANDS finds its end? So will we work directly with DataCite? Yes, that's possibly what would happen. Every data center that we have registered with DataCite does actually have its own account with DataCite, and as ANDS at the time of
42:03
registration was a member, DataCite will run on those data centers and they will be able to access DataCite's API to continue to mint as well. We did have a question earlier on, and I might just ask for a bit of clarification from the participant who asked that
42:23
question, because we're just not sure whether it's in relation to an earlier question. Would it be appropriate to add a comment based on experience, for example don't include a readme.txt file? So can you just repeat the question?
42:41
Oh sorry, we lost it. Yes, yeah, it looks like it might have been in relation to another question. Would it be appropriate to add a comment based on experience, for example don't include a readme.txt file with the data? Perhaps the participant who highlighted that question might add a little bit of context around that, but
43:06
we could offer an answer, and if that's not it, they could perhaps contact us. I mean, I guess just in general terms there's absolutely no reason why, as part of a data collection or data set, you wouldn't include a readme.txt file. That actually can be
43:24
quite useful in providing greater context than can be provided perhaps in a simple metadata record, and in some cases those readme.txt files, you know, provide information about software versions that you might need to reuse the data, or tools that might be available to
43:46
reuse the data, or how to cite the data. So it's certainly very acceptable practice to include that sort of information wrapped up as part of the data set, and to provide as much context information with the data as you can.
44:07
Thank you Jerry. As I said, I hope that's answered the question. We've just got a couple of new questions that have come through. Can more than one unit within an institution or a university get their own DOIs, or is it at the university level?
44:24
I'm sure this sort of comes down to the same discussions we have between a group and an institution with RIF-CS. Yes, that would, I'm sorry, that's an almost political answer that you'd need once you'd had the discussion with the ANDS services team.
44:41
I think the preference normally is to have one per institution where possible. We've seen some instances where there's more than one, but I think the preference is one per institution where possible.
45:01
Another question: is there a process or requirement for checking if a DOI already exists in relation to those? No, if you're going to use Cite My Data, we do the checking for you, and in fact we actually allocate your DOI to you, and that non-representative string at the end is always going to be unique, as in you cannot mint a DOI that's already there. It is
45:24
possible, however, if that is what the question meant, that you can have two DOIs that have the same resolution, which is obviously something you'd want to avoid, and that would have been a mistake in your implementation. I'm not quite sure which way that question was meant, as in duplicate resolving, or a duplicate DOI itself, which just cannot happen.
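To make the "duplicate resolving" case concrete, here is a small Python sketch that scans a repository's own list of minted DOIs and reports any landing-page URL shared by more than one DOI. The function name, the example DOIs and the URLs are all hypothetical, and the URL normalisation (trimming whitespace and a trailing slash) is a deliberately crude assumption; a real repository would need its own rules.

```python
from collections import defaultdict


def dois_sharing_a_url(doi_to_url):
    """Group DOIs by landing-page URL and keep URLs used by more than one DOI."""
    by_url = defaultdict(list)
    for doi, url in doi_to_url.items():
        # Crude normalisation (an assumption): ignore surrounding whitespace
        # and a trailing slash so near-identical URLs are grouped together.
        normalised = url.strip().rstrip("/")
        by_url[normalised].append(doi)
    return {url: sorted(dois) for url, dois in by_url.items() if len(dois) > 1}


if __name__ == "__main__":
    # Hypothetical records exported from a repository's own minting log.
    minted = {
        "10.9999/A1": "https://data.example.org/ds/1",
        "10.9999/B2": "https://data.example.org/ds/2",
        "10.9999/C3": "https://data.example.org/ds/1/",
    }
    for url, dois in dois_sharing_a_url(minted).items():
        print(url, "is the target of", ", ".join(dois))
```

A report like this only flags candidates; whether two DOIs pointing at one page is really an implementation mistake still needs a human decision.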
45:47
The comment actually from another ANDS member in relation to the readme.txt comment, which might be helpful as well for the participant: it would be good if the readme
46:00
was made accessible via a URL and then the metadata page could link to it, like the related info element on the Research Data Australia review. Thanks for that comment, and just an update on that question again. So some clarification around the original question was that the participant meant, if the data exists in two places, it might already have
46:25
a DOI. This one I should hand over to Jerry, probably. I think we've had similar discussions about amalgamators of data, so the data itself may exist in two places. Yeah, and if it's already been assigned a DOI in one organisation, say somebody deposits the
46:50
data in, let's say, the PANGAEA data repository and it gets assigned a DOI, and then someone wants to publish or expose the same data through another repository, perhaps an institutional
47:06
repository. Yes, ideally you would reuse the DOI assigned by PANGAEA rather than generate a new one. So in that sense, I guess, if you suspect that the data has already been assigned a DOI or
47:22
published elsewhere, it would be worth checking so that you can reuse the existing DOI rather than create a new one. You don't really want, you know, it's not ideal to have multiple copies of
47:42
the same data published. If you think about it in the same way as other types of scholarly output, it's preferable not to have multiple copies of the same data itself published, but you could have many metadata records describing the data that point to a
48:04
reference copy. We seem to have come to the end of our questions, so we might wrap up there. If there is any further clarification needed, or add-on questions, do feel free to contact them, and at this point we'll say a big thank you to both Liz and Jerry for offering
48:26
expertise today, and thank you to audience members for your questions as well. I think that's really clarified a number of issues for everyone, hopefully. Yeah, thanks everybody. Thank you.