PiDs Short bites #3 - Linking data and publications - the Scholix initiative - 21 Jun 2017

Video in TIB AV-Portal: PiDs Short bites #3 - Linking data and publications - the Scholix initiative - 21 Jun 2017

Formal Metadata

Title
PiDs Short bites #3 - Linking data and
Title of Series
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2017
Language
English

Content Metadata

Subject Area
Abstract
When a CSIRO dataset is mentioned in a journal article by a third party, how can CSIRO find out? How can the CSIRO scientist find out? When several journal articles have already been published using a dataset from the University of Melbourne, how can a prospective new user of that dataset find them? When a research facility’s data archive knows that one of its datasets underpins the conclusions of a journal article, how can the journal publisher’s site link to that dataset? These are the questions that Scholix, a new global initiative, is addressing. About Scholix This webinar provides an introduction and overview of the Scholix (SCHOlarly LInk eXchange) initiative: a high-level interoperability framework aimed at increasing and facilitating exchange of information about the links between data and scholarly literature, as well as between data. The framework is a global community and multi-stakeholder driven effort involving journal publishers, data centers, and global service providers. Speaker: Dr Adrian Burton, Director, Services, Australian National Data Service Of interest to anyone but especially: -- Research data managers -- Data librarians -- Data technologists -- Publishers and journal editors managing research data
Hello everyone, and welcome to today's ANDS webinar on linking data and publications: the Scholix initiative. So let's get started. My name is Natasha Simons and I'm from the Australian National Data Service, or ANDS, and I'm your host for today. So I'd
like to introduce our speaker for today, Dr Adrian Burton. Adrian is the Director of Services for ANDS and he's based in Canberra. I'll now hand over to Adrian. Thanks, Natasha. Today we're talking
about the Scholix initiative. As Natasha indicated, this is about linking data and literature. Why literature? We mean scholarship more broadly, other scholarly communications: it could be journal articles, books, reports. And why data? We're really thinking quite broadly about data: datasets, services, models, software, et cetera. So this actually is a story. It's a story of a lonely data access portal way up in the southern hemisphere, in Australia. A very important dataset is held in there, dataset D that we can see. It's a very important research asset, and the data access portal is pretty pleased that they've been able to make it available for further research. As it happens, half a world away, in the Journal of Studies on the other side of the planet, someone has published an article, that's article A, and it actually mentions the famous dataset D from the data access portal. That is really good news, because that's why they started the data access portal: to build new research on the old research, to return investment on the data, and to spark innovation and new research. So it's really good news that a journal article has been written based on data from the data access portal. The problem is it's a very long way away, and in fact the people over at the data access portal have no idea that this journal article has been written. So all their good work has come to fruition, but there's no way for them to know where and which articles may have referenced it. So the manager of this data center says: we really do need to know what's happening with our data, we really need to know when it's been used in research. So the manager sets a couple of information professionals to work and says: OK, we need to sort this. I want you to search the internet, I want you to get access to all these journals, I want you to get the full text. We need to mine through all these journals and look for the title of our dataset; it'll be there, you know,
people are using it, we need to find it somewhere in the scholarly communications, and you can build tools. And so for a number of years they've got three information professionals scouring the internet, building tools, building up a view of what it means. Now there are lots of
journals, but the problem here is that there are tens of thousands of journals. To get an idea of how many journal articles there are: Crossref has over 75 million DOIs, so that's at least 75 million journal articles, and then there's a lot more scholarly literature than that. So it's quite a big job. Over five years they really put in a lot of work. They've got some full-text access to the particular journals that they know their particular scientists publish in. But it's not a terribly satisfactory result, because it's still not really very comprehensive, they can't be sure, and it's really a bit of coat hanger and string; it isn't terribly robust. The other problem is that there are lots
and lots of data centers as well, and they're all putting in exactly the same effort all over the place: scouring, searching, trying to get access, trying to build tools. So this chapter of the story comes to a close here in a bit of an unsatisfactory way, with everyone really looking for needles in haystacks all over the world. In a parallel chapter there's another
story. A different journal here, off in some remote part of the world, has published another article, article A, which is what journals are supposed to do. But people have said to them: wait a minute, this is really good research, we'd love to be able to get our hands on the data that underpins the findings. We'd like to look at the models, we'd like to see what software was used, we'd like to see the data, but there's no mention of it in your journal. So they think: right, we need to do something about this, we need to start contacting data centers. As it turns out, there is a data center in Australia where a dataset has been deposited, and they even mention the fact that this dataset underpinned the research that was published in that journal, in that journal article A. So the cruel and bitter irony of this is that the data actually had been deposited somewhere, but the journal doesn't know that. The journal starts to think: right, we need to start establishing these
bilateral relationships with a number of important data centers, so that we can find out whether the journal articles that we have have been mentioned in the descriptions of any datasets anywhere in the world. Again, this is even more of a cottage industry, because each data center has a slightly different interface and expresses the information about a link between the dataset and the literature in a slightly different way. This really is a hard slog. There are lots of data centers, obviously, and so that means a lot of these individual bilateral arrangements need to be made to try and find, again, this needle in the haystack: which data centers might have data that underpins an article in the journal. And as we saw, there are lots of journals and other publications around the place. They're either not doing it, because it's just too hard, or if they do, they're replicating all these bilateral arrangements separately. So the second chapter of our story comes again to a rather unsatisfactory end, where we've got a little bit of a view of the links between data and literature, but not really a very upbeat ending. Enter stage left Scholix, trying to move into that center area: a whole set of players in the scholarly communications world trying to see whether we could do this a little bit better. It's a working group sponsored by the Research Data Alliance and the World Data System. It has a number of publishers, peak publishing bodies, data centers, service providers and infrastructure providers, who have all come together in this working group to say: look, we really should be able to do something a little bit better here. The first step was to say: let's all have at
least a common idea of what's happening here and some common language. Really, what we're talking about is quite simple. There are two objects in the scholarly ecosystem: one is a dataset, the second one is a piece of literature. They are linked, and there is a relationship between the two. We get that information from some of the players in the scholarly system. So the working group started to build up at least a set of common language, so that we can start to attack this problem in a shared way. Now, going back to that very messy exchange of information: as it turns out,
some of the members of this working group are natural hubs for this kind of information. Crossref collects all sorts of references from journals all over the world; thousands of journals are actually already providing information to Crossref about the references from their articles. So there's already a kind of natural community hub out there in Crossref. DataCite was another of the members of the working group, and they were already receiving information about datasets. One of the pieces of information you can provide to DataCite is a related identifier, which in lots of cases means a related piece of literature. So DataCite already has this relationship with hundreds and hundreds of data centers around the world, and they're collecting that information. OpenAIRE is another example, a global aggregator of information from institutional repositories. These institutional repositories do contain data and literature, and sometimes they know about the links between those datasets and literature, or vice versa. So we already had some natural community hubs that could at least tidy up a bit of all of that criss-cross of information. The idea was that if some of these natural hubs could simply exchange information between them, that would simplify all those one-on-one relationships that we saw earlier in the story. So that was what was proposed, at least as a start. Now clearly these are not the only community hubs in the world, and the Scholix initiative is open to new hubs and new communities who can bring their information in. The idea was that the thousands of data centers and all the different journals in the world don't all need to be exchanging information with each other; where there are natural hubs, the hubs can just exchange the information, and that makes life easier for everyone. So they did agree
in these communities, and there's a Scholix link information package that they agreed on, so there's now a way for these big community hubs to exchange information. I'm not going into all the details of this; if you would like them, join the working group and you can get all this information. But it's very minimal information about the two objects, the source and the target, one for example being a journal article and the other being a dataset. You submit the basic information about the two objects and a little bit of information about the link itself. So that was part of the work of this working group, to agree on an interchange format that can be exchanged in various ways. Currently they've agreed to exchange it through some very simple open APIs using JSON. Once the information could flow between the hubs, we were able to establish an aggregation of that information called the DLI Service, which stands for Data-Literature Interlinking Service. Our colleagues at OpenAIRE and [unclear] did all the development for this. It's the first such aggregation. All this information is open information, so we're encouraging lots of people to aggregate it, potentially for a domain or for a community, but this DLI Service is the first global aggregation of this information, and it's there really to push forward with a number of the testbed implementations. So the DLI Service aggregates information from those hubs, and so now we have a much tidier kind of
architecture. So what does it mean now for those two rather unsatisfactory stories that we started with? One of those stories, you remember, was that there was a data center that knew about the link between a dataset and a journal article that had been published by this journal, but the journal didn't know about it. So what would that mean in this new world? Here's
an example, a real-world example of this: Scopus. This is a page from Scopus. Scopus is not a journal, but it is an abstracting and indexing database that holds information about journal publications. So here they've put some information about a particular journal publication that had been published. Previously, nobody had any idea whether or what data was linked to this publication. The journal certainly didn't know that, so there's no way that Scopus would have known it. So I've just added in a new
entity here, an abstracting and indexing database. They've abstracted and indexed the information about that journal article, but they still have no idea whether there is any link to the underlying data. What's now possible is for them to fire off a query to that service and find out that actually there is a link to a dataset somewhere, and now they can provide that link. So now what happens when you are in Scopus: as they
load the page, they fire off a little query to the DLI Service. It's based on the identifier of the journal article: the DOI is fired off to say, do you know of any datasets that are related to this journal article? And as we saw, the response came back and said yes, there is. So now there's a little information panel in Scopus that gives the title of this dataset, and you can click on it and go back to the University of Adelaide. So that's a much happier ending to that particular story. And as you can see, it's a nice little panel that's on every appropriate page within Scopus. Do you remember the second story,
and that was where a journal published an article with a reference to some data, but the original data center actually didn't know about that? So how would that look under this new arrangement? So I'm going to use the example
here of a data center: it's GenBank. [unclear] they're very modest about how they market themselves here. So this is a GenBank page for a gene sequence. As much as I know about this, being a linguist, they all look like gobbledygook to me; it's something about mice. So there's a gene sequence here for something about mice. Now the important thing to note here, even if you don't know what it's all about, is that there is an identifier for this data: this sequence has the reference sequence number NM_010185. So under the new arrangements, the data
center that I've portrayed over here, they've got this data about the gene sequence, and they might want to inquire which published literature has a reference to it. So they can now send that identifier across, and the reply comes back saying: yes, actually we do have a journal article that has referenced that sequence. So there it is,
in the journal Veterinary Immunology and Immunopathology, and there are two references to that sequence in there. So that again is a much happier ending. So if we think that
is a happy ending, and, you know, we're now encouraging people to... I should pause here and say that these have been pathfinder projects and testbeds, part of the working group. I think there's a pretty good model there that can work, and the first step is to get a bit more coverage into this Scholix information ecosystem. So how do we get information in there? The
good news is, remember we said that there were thousands of individual data centers and thousands of journals, et cetera? No one needs to change what they're doing there. If you have, for example, a relationship with DataCite, you just simply need to add this little piece of information to the DataCite metadata: it's a related identifier, there's the DOI, and that will be included in the Scholix ecosystem. Information on that is in the DataCite schema, linked down the bottom. If you're a journal, again,
you don't need to do anything different from what you're doing now. You're already giving information to Crossref, in all likelihood, and there are a couple of different ways in which journals can give references to Crossref. One is the standard sort of citation format; the top one is what you can call a related item, which allows you to give a slightly richer view of what the related item is, and as you can see it's got the identifiers and a little description in there. But the thing to note here is that this is all standard Crossref exchange metadata. There's no new language to learn, and no new information pathway either: you just use the existing pathway that you have to Crossref. Again, there's more information from Crossref on how to deposit your data citations; there's a very helpful blog on that, linked at the bottom of that slide. I won't
go into that. The same thing applies for OpenAIRE and institutional repositories. What I will just mention is that if you're in Australia, there is a shortcut method that we have. Because ANDS has been part of this working group, the ANDS Research Data Australia service is a mini Scholix hub, and all the data collection descriptions that ANDS has in Research Data Australia are pumped into the Scholix information world if they have related publications. So a very easy way for Australian repository and data managers to do this is just to include a related information element of type publication, using the standard RIF-CS exchange that we use with all the data centers in Australia. You can include the identifiers, titles, notes, et cetera, so you can get some pretty nice information in there, free of charge, by the exact same route that you use for populating Research Data Australia. The URL for the information on related info, and how to add that to your feed, is on this slide. And just to go back: do you
remember the journal article, or rather the dataset, that appeared in Scopus? It was this particular one from the University of Adelaide, the molecular simulations of proteins and peptides adsorption. So that is the exact thing that is currently being displayed in the Scopus page for the publication, and the information has come from this Research Data Australia page. You can see at the bottom where the related publications are, and because of the system we just take that information and push it into Scholix. I don't know whether anyone from the University of Adelaide is watching today, but [unclear] this information is displayed in there, and that goes for all the providers to Research Data Australia in Australia. All right, so if you're not included in any
of those paths, so potentially you don't have a relationship with Crossref or DataCite or ANDS or OpenAIRE, et cetera, there's always the option of becoming a hub yourself, and we encourage you to join the working group and collect information from your community. So if you are a specialist astronomy data center, for example, you may not have a relationship with DataCite but use your own identifiers. You can just join the working group and start to expose that information, and you will be aggregated and be part of this new ecosystem. So that's another option. Now, how to get information back
from the Scholix ecosystem? That's another interesting question. Really, this is being developed as we speak, so it's probably better to be in the working group if you want to do this, but I'll quickly go over it so you've got the broad idea. There is the
DLI website, where you can go and type stuff in. It also has a set of APIs that have been
developed. The website will give you information about the different methods that you can use. The one that we showed here in Scopus, and that other people are using, is links from PID: you provide a PID and it will return all the research objects that are related to that PID. Please join the working group if
you're thinking of using that information, because it's being optimized as we speak and your use case can help to design what those APIs can be. There's some stuff upcoming from both Crossref and DataCite:
they're exposing their event data using the Scholix model. So those will be more community-focused queries, where they'll only cover the DOI side of things. So, wrapping that up, I think that's a little bit of a happier ending to the story of linking literature and data. I should make it clear that at the moment these are pathfinders. It's not comprehensive: the aggregation of information is not comprehensive yet, and it's not fully established. But we are leveraging very established global infrastructure in DataCite, Crossref, OpenAIRE and a number of operational data centers and journals around the world. So I think there is a really good model there that can be made comprehensive and can become an established part of the scholarly communication system. If you'd like further
information, there's the Scholix website, and ANDS also has some information about working with Scholix. So I will pause
there and hand back to Natasha. Thank you very much, Adrian, that was really great.
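The hub argument from the talk can be put in rough numbers. The following is a minimal sketch, not something shown in the webinar: the function names and the counts of data centers, journals and hubs are illustrative assumptions, chosen only to compare "everyone negotiates with everyone" against "everyone reports to one hub and the hubs exchange with each other".

```python
def bilateral_links(data_centers: int, journals: int) -> int:
    """Every journal sets up a separate arrangement with every data center."""
    return data_centers * journals


def hub_links(data_centers: int, journals: int, hubs: int) -> int:
    """Each party reports to one hub; hubs exchange pairwise with each other."""
    hub_to_hub = hubs * (hubs - 1) // 2
    return data_centers + journals + hub_to_hub


# Illustrative figures only (assumed, not taken from the talk).
print(bilateral_links(1000, 10000))
print(hub_links(1000, 10000, 3))
```

With these assumed figures the bilateral approach needs ten million arrangements, while the hub approach needs one per participant plus three between the hubs.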
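The link information package is described in the talk as minimal information about a source object, a target object and the link itself, exchanged as JSON. A sketch of what such a record could look like follows. The class and field names here are hypothetical and are not the agreed Scholix schema, and the identifiers are placeholders; the working group's own documentation is the place to find the real field names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResearchObject:
    """One end of a link. Field names are illustrative, not the Scholix schema."""
    identifier: str
    identifier_scheme: str
    object_type: str  # for example "literature" or "dataset"
    title: str


@dataclass
class LinkPackage:
    """A source, a target, and a little information about the link itself."""
    source: ResearchObject
    target: ResearchObject
    relationship: str
    link_provider: str  # the hub that reported the link


def to_json(link: LinkPackage) -> str:
    """Serialise the package, including both nested objects, as JSON text."""
    return json.dumps(asdict(link), indent=2)


# Placeholder identifiers and titles, invented for this sketch.
example = LinkPackage(
    source=ResearchObject("10.1234/example-article", "doi", "literature", "Example article A"),
    target=ResearchObject("10.5678/example-dataset", "doi", "dataset", "Example dataset D"),
    relationship="references",
    link_provider="Example hub",
)
print(to_json(example))
```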
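Both happy endings in the talk rest on the same "links from PID" lookup: Scopus sends an article identifier and gets back related datasets, and a data center sends a sequence identifier and gets back the articles that reference it. The sketch below imitates that lookup over an in-memory list rather than calling the DLI Service, whose API is not reproduced here. The function names and all identifiers are assumptions made up for illustration.

```python
from collections import defaultdict


def build_index(links):
    """Index (pid_a, pid_b) pairs so that either end of a link can be looked up."""
    index = defaultdict(set)
    for pid_a, pid_b in links:
        index[pid_a].add(pid_b)
        index[pid_b].add(pid_a)
    return index


def links_from_pid(index, pid):
    """Return every object linked to `pid`, or an empty list if none is known."""
    return sorted(index.get(pid, ()))


# Placeholder links standing in for what an aggregator might hold.
aggregated = [
    ("doi:10.1234/example-article", "doi:10.5678/example-dataset"),
    ("doi:10.1234/other-article", "accession:EXAMPLE_0001"),
    ("doi:10.1234/third-article", "accession:EXAMPLE_0001"),
]
index = build_index(aggregated)

# An indexing database asks about an article; a data center asks about a dataset.
print(links_from_pid(index, "doi:10.1234/example-article"))
print(links_from_pid(index, "accession:EXAMPLE_0001"))
```

Indexing each link in both directions is the design choice that lets the same store answer the journal's question and the data center's question.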
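For data centers that already deposit with DataCite, the talk says the only addition needed is a related identifier carrying the article's DOI. A minimal sketch of building that fragment follows. It uses the `relatedIdentifier` element with its `relatedIdentifierType` and `relationType` attributes from the DataCite metadata schema, but it is only a fragment without namespaces or the rest of the record; the helper name, the placeholder DOI and the choice of `IsReferencedBy` as the default relation are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def related_identifier_xml(article_doi: str, relation_type: str = "IsReferencedBy") -> str:
    """Build a relatedIdentifiers fragment pointing from a dataset to an article."""
    container = ET.Element("relatedIdentifiers")
    item = ET.SubElement(
        container,
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": relation_type},
    )
    item.text = article_doi
    return ET.tostring(container, encoding="unicode")


# Placeholder DOI, invented for this sketch.
print(related_identifier_xml("10.1234/example-article"))
```

Which relation type is appropriate depends on how the dataset and the article are actually related, so the default above should be treated as an example value rather than a recommendation.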