PiDs Short bites #2 - Identifying and linking physical samples with data: using IGSN - 7 Jun 2017

Video in TIB AV-Portal: PiDs Short bites #2 - Identifying and linking physical samples with data: using IGSN - 7 Jun 2017

Formal Metadata

Title: PiDs Short bites #2 - Identifying and linking physical samples with data: using IGSN - 7 Jun 2017
License: CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
This webinar:
1) introduced the IGSN, outlining its structure, use, application and availability for Australian researchers and research institutions
2) discussed the international symposium "Linking Environmental Data and Samples".

Speakers:
-- Dr Jens Klump, OCE Science Leader, Earth Science Informatics, CSIRO
-- Dr Lesley Wyborn, Adjunct Fellow, National Computational Infrastructure Facility and Research School of Earth Sciences at ANU

About IGSN
Do you or your institution collect physical samples in the course of research? Are you interested in how you can reference these online using a world standard, globally unique persistent identifier scheme? The International Geo Sample Number (IGSN) is a unique identifier that preserves the identity of a sample even as it is moved from lab to lab and as data appear in different publications. It applies to any physical samples, not just samples from the earth. This webinar is the second in a series examining persistent identifiers and their use in research.
Welcome to today's webinar, which is on the topic of identifying and linking physical samples with data using IGSN, the International Geo Sample Number. So let's get started. My name is Natasha Simons and I work with the Australian National Data Service, and I'm going to be your host for the webinar today. My colleague Susannah Sabine is in Canberra behind the scenes and co-hosting the webinar with me. So this webinar will look at how you can reference physical samples online using a world standard, globally unique persistent identifier scheme, the International Geo Sample Number, as well as discuss the international Linking Environmental Data and Samples symposium, which was held last week at the CSIRO Black Mountain laboratories in Canberra. This webinar is the second in a series
examining persistent identifiers and their use in research. In the first webinar we looked at citing grey literature using DOIs, and the recording of this is available on the ANDS YouTube channel. The third in the series will look at how to link publications and data through the international Scholix initiative. So I would also like to
acknowledge the Commonwealth Government for their support of ANDS under the NCRIS program. I'd like to introduce our speakers for today: Dr Lesley Wyborn, who is an Adjunct Fellow with the National Computational Infrastructure facility and the Research School of Earth Sciences at the Australian National University in Canberra, and Dr Jens Klump, who is OCE Science Leader, Earth Science Informatics, for CSIRO Mineral Resources and based in Perth. I'll
now hand over to our first speaker, Dr Lesley Wyborn. All right, so what I'm
starting out with is, I'm going to do the first part, which is about identifying samples. It's something that's dear to my heart because, you know, I used to be a field geologist and I have collected thousands of samples in my career. So how we're going to organize this is
to introduce the IGSN and identifiers for samples, outlining the application and availability for researchers, and then Jens will hand over to the global picture and give an update on the symposium that was held last week. So what we want to cover is, you know, what is it and how do you use it, how do you get an IGSN, and the science use case. Physical samples are really fundamental to scientific research: in my opinion a lot of our data are derived from samples, and hence so are our publications. So why do we need
unique identifiers for samples? Part one: you can see on this map, which comes from Kerstin Lehnert, that in the EarthChem database run by the NSF you can see all the samples that are labeled M1, from everywhere, and M1 probably stands for "my number one". And so what we find as we start doing aggregations of samples is that this is a very common problem. A second problem that emerges is, again, I did a lot of
analytical work, and it was quite common to go and get a sample and from that decide to take sample splits, and quite often the sample splits were given different numbers: whenever a split was put on a different machine it was given a different number, or the sample was given to somebody else and they went and relabeled it according to their in-house rules and repositories. And so here are all the different names for a highly valuable sample from a cruise in the Pacific. And so as we are moving now towards data aggregation, we really need to be able to uniquely identify samples and the analytical data and publications derived from these samples. And this was the driver behind Kerstin Lehnert in the US, with others, getting involved in trying to set up a unique identifier system for samples. So what the IGSN
then does is it provides persistent identifiers that are guaranteed to be unique through a hierarchical system. It also facilitates internet-based discovery of and access to digital sample profiles for a variety of applications, and programmatic access to sample metadata catalogs, and it helps network sample repositories and data centers. It ensures preservation of and access to sample data, it also aids in the identification of samples in the literature, and there is one that you can actually click on if you have the time. So what could it be used for? IGSN stands for
International Geo Sample Number, but increasingly we are finding it being used for water, biological materials and all sorts of things. It can be used for collections or groupings of samples, or for a sampling feature such as a borehole or an outcrop, and samples can be linked to each other through the "related identifiers" metadata element. So that green thing down the bottom is actually a rock containing a mineral called olivine, and above it you can see that we often take mineral separates out of those rocks, or put some into solution. And so you bring that one thing from the field, which probably cost you a fortune to collect, and through the IGSN you can link the parent to all the derived samples that come from it. It enables you to track the sample
lifecycle. So in CSIRO it's used for tracking samples and to support sample logistics. So starting out in the field, and we call the IGSN essentially the birth certificate, you have unambiguous identification and metadata capture with a mobile app, and the sample is given the IGSN in the field. Then as we take that sample into the lab, you can identify all the derivative processes and analytical techniques applied to that sample and tie the data to it. And then finally the sample goes into the repositories, and we can trace it through sample collections and samples in storage catalogs, and maintain sample logistics, so we know when a sample is sent out to another museum or another institution. As I say, it's like when you're born you get a birth certificate; now what we're enabling is for samples, as they are collected, to be given a birth certificate that goes with them through life. As Jens will
explain, it is cloned on the DataCite DOI. And so what we can do now is, we've got the specimen IGSN, we then link it to the spectral results, and finally we link it through to the publication. And this IGSN has attracted a fair bit of attention, and it is endorsed
by the Coalition for Publishing Data in the Earth and Space Sciences, and, you know, in Elsevier and a number of science journals you are encouraged to put the IGSN on samples that you cite in the literature. And so again, in the digital age, you can get interested in a particular specimen and trace through its history and anything else that has been done on that specimen. So
as a system overview, what you do is you register a sample with what we call an allocating agent. The allocating agent then registers the sample with the IGSN e.V., which is the international implementation agent. As you can see here, there are three current allocating agents in Australia: CSIRO, Geoscience Australia and Curtin University. So what I'll do next is take you through how these agencies are using it in different ways. So CSIRO became a member of IGSN
in 2013, and they currently use it for the repository of the resources research group over in Perth, which takes in rocks and synthetic materials. In the Capricorn Distal Footprints project they are using it for water, vegetation, soil, rock and regolith, and CSIRO is looking now to use it for their soil collection in Canberra and their insect collections. So that's why we kind of refer to it as the IGSN and not the International Geo Sample Number: it certainly is getting used more broadly. Geoscience Australia has the
second largest collection in the world, with a registered total of 1.6 million samples covering minerals, mineral separates, rocks, thin sections and microscope slides of rocks, and fossils. Geoscience Australia also is about to be, if they're not already, the registration agent for the geological surveys of the states and territories. Curtin University has a
different use case, and they're using it more, as we mentioned earlier, for tracking samples and sample splits through laboratories. And I'd like to acknowledge ANDS, who sponsored the development of this project in collaboration with the Curtin University Library, and CSIRO and, you know, the Geological Survey of Western Australia are actually working together on it. So as I said, we've got the three agents. Curtin is only operating for Curtin University, and I'm hoping to expand that so it can become more available to the rest of the research community. And so again we were able to get some more funding from NCRIS through the
Research Data Services project, and this made it possible to develop a demonstrator for a common geo sample portal, which you can actually see here. And so metadata from the three agents is harvested into a common metadata portal to discover samples created by any Australian IGSN member, and the Australians have agreed to a common metadata schema, even though there's quite a diversity of samples. And so we are hoping that as it grows, if you want to find any information about a physical sample, this will be the place you go. I will now hand over to Jens to take it
into a global perspective, and also to address some of the technical issues and the results from the symposium that we held last week, which was about trying to actually extend it into the environmental areas, that is, from its origins in the solid earth sciences. Thanks, Lesley. Yes, I'd like to
start with saying: may all your problems be technical. Usually technical problems can be overcome, but there's also a whole social network behind all technical solutions, and this is where the global perspective comes in. So the IGSN Implementation Organization is the body that we created to carry this on the global stage. It's a charitable organization incorporated under German law and registered in Potsdam, Germany. At present it has 19 members on 4 continents. The governance model, as
Lesley already mentioned, is a so-called hierarchical model with delegation. You can think of it in the way that you assign IP numbers on the internet, in a network. And the IGSN identifiers themselves are registered through the IGSN agents, and to make sure that there's no overlap in numbers, each IGSN agent is given a so-called namespace for registrations of IGSNs. As an example, all IGSNs registered by Geoscience Australia start with AU, and then after that it's up to Geoscience Australia to make sure that these identifiers are unique. CSIRO starts its identifiers with CS, so we delegated some of that to the Capricorn Distal Footprints project and gave them the namespace CSCAP, and then after that it's their responsibility to make sure the names are unique. They don't interfere with any other projects or infrastructures in CSIRO using IGSN. Technically,
IGSN builds on an existing technical base and community, namely the DataCite model. So we basically cloned DataCite to use their technical base, which is ultimately based on the Handle System for persistent identifiers, but also a lot of the governance and how this is run is based on the example of DataCite. And we work with them very closely, also to see that we align our technical architectures and to make the collaboration and interlinking as easy as possible. The two links here are to our technical documentation and to our code repositories on GitHub. So the status of
IGSN on the global scale is: it's a work in progress, but we have active registration agents in Australia, which are CSIRO, Curtin University and Geoscience Australia, but also GFZ in Potsdam, the German Research Centre for Geosciences, and the data center for the environmental sciences at Columbia University, and the data center for marine and environmental sciences at the University of Bremen. And then sometimes it's difficult for government institutions to join a foreign organization, so the German geological survey, BGR, and the US Geological Survey have some technical or legal issues with joining this organization, so they register IGSNs by proxy through other allocating agents. The interesting story
we're hitting now is that we're not only identifying samples within one institution, but we are now moving samples between institutions, and this is where the real value of IGSN becomes visible. We already mentioned the case of the John de Laeter Centre at Curtin University, and here they have adopted IGSN. So in this case, if the sample has an IGSN, it will be carried through the process, and any data that come from the analyses or processes are linked with this already existing identifier. If the sample is not yet identified with an IGSN, then the John de Laeter Centre assigns it an IGSN. And the other case is something Lesley mentioned already, that sometimes you take sub-samples, and here it becomes a bit more complicated. It depends on where this is done, by whom, and where the sub-sample then resides, so I won't go into details now because that is something that needs to be discussed for the particular use case. The important point here is that any sub-sample should be identified by its own IGSN, to make it uniquely identifiable as well, and then be linked to its parent sample. So what's
happening next? What we saw at the symposium last week is that we have already made good progress building a developer community around IGSN, but that needs to carry on further. We need to document best practices to show how it can be used, and also build reference implementations or services that others can test their services against. And the next step, which we are already taking, is expanding to identifying and linking objects in other domains, not only in the geosciences. But ultimately what we want to see is that other domains start reusing the IGSN technology, so maybe not IGSN in the strict sense through the existing organization, but as we copied DataCite, other domains might copy the IGSN technology and governance model for persistent identifiers in their specific domain. So that's what I wanted to say
about IGSN from my side, and I want to give you a brief report back on the symposium we had last week called Linking Environmental Data and Samples. This was a Cutting Edge Science Symposium, which got its seed funding from the CSIRO Research Plus office, and the goals were to bring international research leaders to Australia and also to provide a forum for early career researchers to engage with each other and with the international experts. So we
have a web page, and it's probably easiest to note the short link shown here. But besides the seed funding from CSIRO, we also received sponsorship from other organizations: the Australian Bureau of Meteorology, Geoscience Australia, the US National Science Foundation, the Earth Science Information Partners in the US, NCI, the Atlas of Living Australia, AuScope and TERN, and those organizations mainly funded travel for international experts. What we discussed at this symposium
was the science drivers: why are we interested in linking anything with something through Semantic Web technologies? And that's because we have these rich holdings of samples that support scientific investigation, and we wanted to discuss, and we did discuss, how we link these to the datasets that were derived from the samples, and then how can we link samples and data to the literature where the samples are interpreted and put into context. And last but not least, how can we include machines as users? Why do we want to do that? Because our body of data, information and knowledge is growing at a much faster pace than any of our minds can comprehend, and machines can be very helpful in trying to find things in these very, very rich holdings. To me it
was also an important point not only to discuss the theory and the future perspective of linked data, but also to get to solutions: can we get it to work? And so we discussed what is the role of infrastructure for building the linked data federation, and how can we support the evolution of linked data. What we saw is that heterogeneity is inherent and we have to have mediation mechanisms; we cannot build one thing for all. And this raises one question, which is how precise do terms need to be, because the commonly held wisdom is that computers don't understand ambiguity, so you have to be 100% precise. But that we cannot achieve, so we have to suffer some degree of imprecision. But vocabularies that are useful will be adopted; that is something that we can already see. But to distill what is the essence: we have a fabric of science, where we ask which elements produce output; we have the process of science, how are these outputs produced; and we have a language of science, how do we describe these elements. This is something where we have to find solutions at differing levels in the linked data framework. As I mentioned
at the start of my part of this presentation, there are social and community factors in getting things like this working, and Paul Box from CSIRO Land and Water said that the greatest organizational effectiveness is achieved when technology systems fit social systems. So it's not that you build it and they will come, but it has to support the processes that already exist; that will make it more likely to have success. Certainly there are cost pressures: who bears the costs, and what would be the incentive to contribute? And we've been building things in this domain for a while, and so it was also very useful to discuss the fail patterns, and we identified two major fail patterns. The one is the Monty Python "Life of Brian" pattern: I am different, so that's why I have to do things differently. That can lead to failure. And the other one is the too-big-to-fail pattern, where something should have ended long ago but we've invested too many resources and so everybody is embarrassed to pull the plug. It would have been better to allow things to fail quickly and start with a fresh view. Thanks very much, Jens.
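The namespace delegation and parent linking described in the talk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IGSN registration service or its API: the `Namespace` class, its `delegate` and `register` methods and the sample numbers are hypothetical, and the prefixes AU, CS and CSCAP are used as the examples given for Geoscience Australia, CSIRO and the Capricorn project.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Namespace:
    """A block of identifiers that one agent (or project) is responsible for."""
    prefix: str
    children: Dict[str, "Namespace"] = field(default_factory=dict)
    # Maps each registered identifier to its parent sample's identifier (or None).
    registered: Dict[str, Optional[str]] = field(default_factory=dict)

    def delegate(self, suffix: str) -> "Namespace":
        """Hand a sub-namespace to another agent; it must not overlap existing ones."""
        if not suffix:
            raise ValueError("a delegated namespace needs a non-empty suffix")
        child_prefix = (self.prefix + suffix).upper()
        for other in self.children:
            if other.startswith(child_prefix) or child_prefix.startswith(other):
                raise ValueError(f"{child_prefix} overlaps namespace {other}")
        if any(igsn.startswith(child_prefix) for igsn in self.registered):
            raise ValueError(f"{child_prefix} already contains registered samples")
        child = Namespace(child_prefix)
        self.children[child_prefix] = child
        return child

    def register(self, local_id: str, parent: Optional[str] = None) -> str:
        """Mint an identifier; uniqueness inside the namespace is checked here."""
        igsn = (self.prefix + local_id).upper()
        if any(igsn.startswith(p) for p in self.children):
            raise ValueError(f"{igsn} falls inside a delegated namespace")
        if igsn in self.registered:
            raise ValueError(f"{igsn} is already registered")
        self.registered[igsn] = parent
        return igsn


root = Namespace("")            # stands in for the top-level organization
au = root.delegate("AU")        # Geoscience Australia
cs = root.delegate("CS")        # CSIRO
cscap = cs.delegate("CAP")      # project namespace delegated by CSIRO

rock = au.register("0001")                    # hypothetical sample number
separate = au.register("0002", parent=rock)   # sub-sample linked to its parent
```

The design point is that the top level only guards against overlapping prefixes, while each agent checks uniqueness inside its own block, so no central lookup is needed when a sample is registered.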
Thanks, Jens and Lesley. Just while people are thinking about questions, perhaps you could give us an idea of how many IGSNs have been assigned and what types of samples those have been assigned to? So in the global total we're approaching 6 million IGSNs. Most of those are geological materials, but we also have an increasing number of water, plant materials, soils, and also places: a borehole is not an object, it's something else, but since the material that's coming from a borehole is very tightly coupled to that feature, we also identify that feature. OK. Researchers are generally a bit more familiar with DOIs. Can you put forward a few arguments that you would put to researchers for why they would select an IGSN over a DOI? Well, there's one historical reason: when we decided to go for a global system, that was back before DataCite existed, and TIB Hannover was running the show. They pushed us back and said it is a really great idea but it's out of scope, so we went our own way. And then we discovered that there are specific issues in how we create these identifiers, the resolving mechanism and what metadata we use that are quite specific and not well covered by the more bibliographic world of DOI. But DOI and DataCite are changing: they're changing the business model, they're changing the way things are run. So we are in the conversation, and let's see how things develop in future. The systems are technically compatible, so maybe we will merge one day. OK, so watch this space. There's a question from Josh Brown: other than the international legal issues, what barriers are there to IGSN being adopted? The legal issues are actually a very specific problem for government institutions that cannot easily join foreign organizations. The main problem for adoption is that it needs to be introduced into workflows, and so that changes how people work, and that is in my view the main barrier to adoption: people have to
consent to changes to what they do, and usually they're busy enough, so they don't want any extra work. So we have to make this simple or provide other good reasons to make it worthwhile. And another issue too is that if you've got an organization that is well set up and has an internal system that guarantees unique identifiers, then if your organization registers, it is a reasonably simple process. What we have noticed is that in organizations that are full of "one", "two" and "three", where people are using repeated numbers and they don't have an internally consistent unique identifier or number, they do struggle a bit: to introduce this is much more complex. And that's why geological surveys were fairly good at this, because they had unique systems. So that's a very good point, because in the case of Geoscience Australia we just had to put AU in front of their numbers and it was done. OK, we're coming up to time. There's one other question there: how does Curtin University Library support IGSN? His name's Mathias, I've forgotten his surname. Yeah, and I would suggest you go and talk to him, but the Curtin University Library has been very supportive of this whole project. Yes, so Joshua, sorry, John Brown has made a good point that Mathias is now at UWA. Yes, so maybe someone else at Curtin University. But perhaps when I do the follow-up email, if there are some contact details for each of the allocating agencies, that would be useful if I could share that in the email for people. Brent McInnes from Curtin University would be the first contact, because he kind of runs the project in collaboration with the library. OK. And John Brown is at Curtin University, so I think he's saying that you could talk to him if you want more information. So I'll check in with you, John, after the webinar. And sorry, one other question: are there competing IDs in this space, or does
this look like this will be the gold standard? Not in the geoscience space, but there certainly are those in the other areas. And one of the things, while we're being sort of open about IGSN, is that it has been one of the more successful ones, and we often wonder why, and we think it's because, one, it has a very good governance structure and, secondly, its compatibility with DataCite and DOIs. Jens, would you like to add anything to that? It was an interesting discussion during that symposium last week, where we had quite a number of people from the biodiversity world who had tried to introduce a Life Science Identifier over the past ten years, but then that system was quietly buried recently because adoption was just too hard: it was technically immature, and it did not have a governance structure that made it easy to apply. So the biodiversity world is now discussing how to proceed. And there may be more information about IGSNs published shortly, we hope? Yes, we're working on an overview paper to describe how the system is set up from an organizational perspective and its science use, and then there will be a separate paper outlining the technical implementation. A question from Will: can the system be used for samples that are not able to appear, I think it's supposed to be, in public? Yes, it can be. So you can think of this in the same way as DOIs are being used: when you resolve a DOI it doesn't always get you to the object, like in the case of a paper. Most papers are not public, you have to have a subscription. So there are good reasons why you don't want to disclose the details of a sample to the public: there can be rare species or a sensitive site. And so the only rule is it has to resolve to something, but what you want to disclose is up to your discretion. Lesley, do you have anything to add to that? It's just that certainly, as I said, we have a lot of fossils in this, and I can assure you the locations
of many of those fossils in certain organizations are not publicly available. But you do know there has been a fossil collected from Alice Springs, but only certain people who are qualified will get what that specific location is. So the system does definitely have safeguards around that. OK. The other thing I want to add, which may be of interest to people listening, is that it's not just for land-based specimens. In the US it is widely used in the ocean drilling program and for marine sciences as well, and once you get into marine areas there are a lot of biosphere samples. So it's just kind of organically starting to grow into other areas, because, as you said, the Life Science Identifier system collapsed, and people just see the need for having unique identifiers for their samples, and this is what's happening. So related to that, the next question: are you aware of anything similar for pathology specimens? I'm not. Jens, what about you? I have read about identifiers for cell cultures, but I'm not aware that they have this kind of resilient resolving mechanism. Also, just to add, that was the basis of what we were doing last week, and why groups like GBIF and TDWG are getting interested in what we've got: it's that core kernel that applies to the registration of a sample with the service group in Germany, but you can then go on and flesh that out with metadata that is more in tune with a rock sample or something else, you know, or a plant sample. You know, the communities develop their own additional metadata, but it's that core component that is the bit that can be cloned for other groups if they so want to. OK, well, that brings us to the end of the questions. So thanks, everyone, for attending today's webinar, and thanks very much to Jens and Lesley for their time in sharing that. There was a lot of good discussion and a lot of interest around IGSNs, which I think we'll have to follow up through the email after the webinar. And as I mentioned earlier, this is actually a series on
persistent identifiers, and the third one is on linking publications and data. You can find out about the webinar series through the ANDS website or by subscribing to ANDS news. So thanks for coming, everybody, and bye. Thank you very much for having us. Thank you.
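The rule mentioned in the questions, that an identifier has to resolve to something while the owner decides what to disclose, could be sketched roughly as follows. This is a hypothetical illustration only: the function `public_view`, the field names and the identifier `XX0001` are made up for the example and do not come from the IGSN metadata schema.

```python
from typing import Dict, Set


def public_view(record: Dict[str, str], withheld: Set[str]) -> Dict[str, str]:
    """Return what a landing page might show for a sample record."""
    if "igsn" not in record:
        raise ValueError("a record needs an identifier to resolve")
    view = {}
    for key, value in record.items():
        # The identifier is always shown; withheld fields are masked, not dropped,
        # so a reader can see that the information exists but is restricted.
        if key != "igsn" and key in withheld:
            view[key] = "withheld"
        else:
            view[key] = value
    return view


# Hypothetical fossil record whose collection site is sensitive.
fossil = {"igsn": "XX0001", "material": "fossil", "location": "exact coordinates"}
page = public_view(fossil, withheld={"location"})
```

Masking rather than deleting the field mirrors the fossil example in the discussion: the public can see that a sample exists and was collected, while the precise location stays with qualified users.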