An Open Source Web Service For Registering and Managing Environmental Samples
Seoul, South Korea

Records of environmental samples, such as minerals, soil, rocks, water, air and plants, are distributed across legacy databases, spreadsheets or other proprietary data systems. Sharing and integrating the sample records across the Web requires globally unique identifiers. These identifiers are essential to locate samples unambiguously and to manage their associated metadata and data systematically. The International Geo Sample Number (IGSN) is a persistent, globally unique label for identifying environmental samples. An IGSN can be resolved to a digital representation of the sample through the Handle system. IGSN names are registered by end-users through allocating agents, which are the institutions acting on behalf of the IGSN registration agency. As an IGSN allocating agent, we have implemented a web service based on existing open source tools to streamline the processes of registering IGSNs and of managing and disseminating sample metadata. In this paper we present the design and development of the web service and its database model for capturing various aspects of environmental samples. Previous work by the System for Earth Sample Registration (SESAR) was aimed primarily at individual investigators, whereas our work focuses on curating sample descriptions from larger collaborative projects. The paper describes the linkage between the IGSN metadata elements and the sampling concepts specified in existing common data standards, e.g., the Open Geospatial Consortium (OGC) Observations and Measurements standard. This mapping allows the application of the IGSN model across different science domains. In addition, we show how existing controlled vocabularies are incorporated into the service development to support the metadata registration of different types of samples.
The proposed sample registration and curating approach has been trialled in the context of the Capricorn Distal Footprints project on a range of different sample types, varying from water to hard rock samples. The observed results demonstrate the effectiveness of the service while maintaining the flexibility to adapt to various media types, which is critical in the context of a multi-disciplinary project.
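The abstract notes that an IGSN resolves to a digital representation of the sample through the Handle system. A minimal Python sketch of that step follows. It assumes the public Handle proxy at hdl.handle.net and the Handle prefix 10273 for IGSNs; the function name, the alphanumeric check and the sample number are illustrative assumptions, not part of the service described here.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net"  # public Handle System proxy
IGSN_HANDLE_PREFIX = "10273"             # Handle prefix assumed for IGSNs


def igsn_to_handle_url(igsn: str) -> str:
    """Build the resolver URL for an IGSN, normalised to upper case."""
    cleaned = igsn.strip().upper()
    # Assumption of this sketch: an IGSN contains ASCII letters and digits only.
    if not (cleaned.isascii() and cleaned.isalnum()):
        raise ValueError(f"unexpected characters in IGSN: {igsn!r}")
    return f"{HANDLE_PROXY}/{IGSN_HANDLE_PREFIX}/{quote(cleaned)}"


# "XX0000001" is a made-up sample number used only for illustration.
print(igsn_to_handle_url("xx0000001"))
```

Following the resulting URL in a browser would, if the number were registered, redirect to the landing page of the sample.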
Yeah, so good afternoon everyone. [inaudible] I am going to talk about a metadata model and the service that we developed to support the registration and management of different kinds of environmental samples in CSIRO.
[inaudible] These samples are collected and stored by different entities, for example individual researchers, laboratories, universities and state agencies, and each sample collector often uses their own way to document the sample description. This leads to several problems. Different names can be used to describe the same sample. It is also possible that the sample name changes over time, for example when you relocate a sample from one location to another [inaudible]. If you only want to use the sample within your own organization, you won't have any problem, because you probably have some records about the samples and you can easily identify them. But what happens when you want to expose the sample to somebody outside your organization? Then the identification of the sample becomes an important factor. So the solution is a common identification system. For example, we have the International Standard Book Number, ISBN, which is a global identifier used for books, and we have the Digital Object Identifier, which can be used to identify publications.
For samples, the standard we have is IGSN, the International Geo Sample Number [inaudible]. It is a persistent identifier. What I mean by persistent is that it is stable compared to a URL: if you use a URL, it can change over time, but a persistent identifier has a stable link which points to the data and the description of the sample. This is an example of an IGSN. It consists of two characters representing the agent (I will talk about the agent later), supplemented by a code. So the prefix indicates the agent, and the code is essentially assigned by the user [inaudible].
[inaudible] The allocating agents are formally registered with the IGSN agency, and a project or an individual researcher can register samples through such an agent. In this example [inaudible] the first part stands for the agent, and the rest is a number that you assign to the sample, so this is the persistent and unique identifier which identifies the sample. Another important aspect of IGSN is that the clients, that is the projects and researchers, cannot obtain these numbers directly. They can only obtain them through an allocating agent [inaudible], and the allocating agent in turn registers them with the IGSN top-level agency. In CSIRO we have a lot of samples, millions of samples of different types [inaudible] kept by the researchers [inaudible], and that is why we want to develop a system.
So this is what I am going to talk about. Some history: we became a member of IGSN in 2013 [inaudible]. I would also like to point out that the work I am presenting is relevant to ongoing efforts in Australia [inaudible].
[inaudible] There has been previous work on a metadata model, and this existing work is mainly focused on geochemical samples [inaudible]. That is the reason why we would like to develop one for CSIRO, to cater for different types of samples.
So there are two things I present as contributions. The first is the descriptive metadata model, and the second is the service, which I call the allocating agent service. Both are currently being used [inaudible]. I show this diagram again because it is very important to understand where each part belongs. The clients are the projects in CSIRO which I mentioned before. They have different kinds of samples, and they use the metadata model that we developed to describe them. This is the allocating agent, which is CSIRO, and then we have the top-level agency. The top-level agency registers the persistent identifier [inaudible], and the IGSN is then sent back to the client. But why do we need another metadata model? Because from the allocating agent to the top-level agency, the metadata model only covers registration information and no information about the sample description. So as an allocating agent we need to develop a metadata model that captures the core characteristics of different kinds of samples. Another thing is that the allocating agent also exposes this to the public: whatever sample descriptions we capture, we would like to expose through OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, so that the public can automatically get the descriptions from the service. [inaudible]
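The harvesting step mentioned here can be sketched in a few lines. OAI-PMH requests are ordinary HTTP GET requests with a `verb` parameter. The base URL below is a hypothetical placeholder, not the actual service endpoint, and `oai_dc` is the Dublin Core format that every OAI-PMH repository is required to support; the function name is also an assumption of this sketch.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://example.org/igsn/oai"  # hypothetical endpoint


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # When continuing a partial list, resumptionToken is the only
        # argument allowed besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

A harvester would fetch this URL, read the returned XML, and repeat the request with the resumption token from the response until no token is returned.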
I am not going to explain each element in detail, but I have grouped the elements into several groups. Some elements describe the sample identification, some describe how the samples are collected in the field, and others cover the spatial and time dimensions and other related information. For example, there are several relations which you can use to say that a sample is sub-sampled from another sample [inaudible], so these are different kinds of relations you can use to describe the samples. Although there are many elements, only a few are mandatory: for example the sample number, which is the IGSN; the sample name [inaudible]; and whether it is public or private. I think this is very important, because in some projects you want to get the number but you do not want to release the metadata to the public yet, so you can say whether it is private or public. You also have the landing page. What is a landing page? It gives you further information about the sample. As I said before, the model only captures the core characteristics of a sample, but if you have more detailed information about the sample, it can be provided through the landing page. Then of course there is the sample type [inaudible], and the sample location, which is very important: where the sample is located [inaudible]. We also use the concept of Linked Data. I almost forgot: we use controlled vocabularies to describe the sample type and feature type [inaudible], to give the user more meaningful information. We also reuse some elements from the IGSN registry schema.
For example, we reuse the log elements and also the relations which I described [inaudible]. The point is that this is not a new schema: we use the existing schema and customize it accordingly to cater for different types of samples.
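The element groups described above can be pictured as a small record type. The following Python sketch is only an illustration: the class and field names are hypothetical, the relation label is invented, and treating the number, name, visibility flag and landing page as the mandatory set is an assumption based on the elements named in the talk, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SampleRelation:
    relation_type: str  # e.g. "isSubSampleOf" (hypothetical label)
    related_igsn: str


@dataclass
class SampleRecord:
    # Elements treated as mandatory in this sketch.
    igsn: str
    name: str
    is_public: bool
    landing_page: str
    # Optional descriptive elements.
    sample_type: Optional[str] = None  # ideally a controlled-vocabulary term
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    relations: List[SampleRelation] = field(default_factory=list)

    def missing_mandatory(self) -> List[str]:
        """Return the names of mandatory text elements that are empty."""
        required = {
            "igsn": self.igsn,
            "name": self.name,
            "landing_page": self.landing_page,
        }
        return [key for key, value in required.items() if not value.strip()]


record = SampleRecord(
    igsn="XX0000001",  # made-up number
    name="",
    is_public=False,
    landing_page="https://example.org/samples/XX0000001",
)
print(record.missing_mandatory())
```

Keeping the visibility flag on the record itself mirrors the point made in the talk: a project can obtain a number first and release the metadata later.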
All right, just as an example of what I mean by identification and sampling. Sample identification consists of the number, the name, the type, the classification, why it is collected and so on. The sampling activity covers, for example, the collection method, the time when it was collected, and the sampling feature, which is the host of the sample. For example, if you collect water [inaudible], the sampling feature is the host that the sample is collected from. It also covers who collected the sample, the method [inaudible] and so on. So this is an overview of the descriptive metadata model. Once we have the metadata model, the next step is the service, which will be used by the clients. The client can be an existing project [inaudible]. They describe their samples according to this descriptive metadata model and then send the descriptions to the service. The service is simply a REST API. Several operations are supported, for example to register the namespace, to register the sample, and to get the metadata of a sample by its number. Let's look at one of them.
For example, to register samples: [inaudible] it will return a list of successful and unsuccessful sample registrations.
[inaudible] So here you have the client, which is the project; this is the allocating agent; and this is the top-level agency. First you send the XML, and then we do the validation: we validate the XML against the schema to make sure that the data are described properly. We also validate the namespace, because each client has a unique one, and we want to ensure that only the client who owns a namespace can register samples under it. The third point is the registration process. The client program can send about 200 to 500 sample descriptions at once, but the problem is in this part,
the top-level agency, which only supports sequential registration: you can register only one sample at a time [inaudible]. So we need to make sure that all of them are registered one by one. The samples which are registered successfully are inserted into our database, so we keep a copy of the sample registration, and then we send the client the list of successfully and unsuccessfully registered samples, because it is highly likely that not all the samples can obtain their IGSNs. It is possible that out of 100 you will get only [inaudible]. So what we do is make sure that the successfully registered sample descriptions are stored, and then we tell the client which samples were registered successfully and which were not.
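The registration loop described above can be sketched as follows. This is a simplified illustration, not the service's actual code: the function names and the dictionary layout are hypothetical, XML schema validation is left out, and the call to the top-level agency is replaced by a stand-in function.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Sample = Dict[str, str]


def register_batch(
    samples: Iterable[Sample],
    client_namespace: str,
    register_one: Callable[[Sample], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Register samples one at a time and return (registered, failed)."""
    registered: List[str] = []
    failed: List[Tuple[str, str]] = []
    for sample in samples:
        igsn = sample.get("igsn", "")
        # Namespace check: a client may only register under its own prefix.
        if not igsn.startswith(client_namespace):
            failed.append((igsn, "namespace does not belong to client"))
            continue
        try:
            register_one(sample)  # one call per sample, strictly sequential
        except Exception as exc:
            failed.append((igsn, str(exc)))
        else:
            # A real service would also store a local copy of the record here.
            registered.append(igsn)
    return registered, failed


def fake_register(sample: Sample) -> None:
    """Stand-in for the top-level agency; rejects one sample for illustration."""
    if sample["igsn"].endswith("0002"):
        raise RuntimeError("rejected by registry")


batch = [{"igsn": "XX0001"}, {"igsn": "XX0002"}, {"igsn": "YY0003"}]
ok, not_ok = register_batch(batch, "XX", fake_register)
print(ok)
print(not_ok)
```

Collecting failures instead of aborting on the first error matches the behaviour described in the talk: the client always receives both lists and can resubmit only the samples that failed.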
I will now show you one example of how the metadata schema and the metadata model have been applied. This is a project called the Capricorn Distal Footprints project. The members of the project come from the University of Western Australia, CSIRO [inaudible]; they are all based in Australia. The project basically aims to find [inaudible] mineral [inaudible], and in the project they collect different kinds of samples, such as plants, water, soil and rock. We have implemented the system [inaudible]: the sample registration system sends the request to get the IGSNs from the top-level registration agency, and passes the IGSNs back to the client [inaudible].
Here you see an example of the XML which was created based on the metadata model I described before, and an IGSN. "CS" stands for CSIRO; this is the namespace [inaudible]. The next part is the prefix for the Capricorn project, and the rest is something which is assigned by the client. [inaudible] This is the handle, the persistent identifier. 10273 is the [inaudible] for IGSN, and it is followed by the IGSN number. If you click on the link, it will give you more detailed information about the sample.
All right, to conclude. What I have described is the metadata model for samples, its implementation and the workflow in CSIRO. The contribution is that the metadata model is not domain specific: it is meant to capture information for different [inaudible]. It is not meant to describe every specific aspect of a sample, but it can be used to describe the main properties of different types of samples [inaudible]. This is important to facilitate the sharing of sample descriptions inside and outside CSIRO. [inaudible] We have already registered some samples from [inaudible], and the next step is also to register samples from [inaudible] reflectance spectra [inaudible]. We will also document the mapping between our metadata model and [inaudible], in order to ensure the correct application of the metadata model which we developed across different domains. Finally, [inaudible] will be used to [inaudible] the sample descriptions from different allocating agents in Australia. Thank you for your attention.
if thank you and the the I wanted to ask you mentioned the size tendency of the back of last slides is that
where the observations and measurements and things of that relates to the your arm and we want to be with them a physical model you into everything all the time because we want to hear it you want from me that I that will be for the fall of pride in different kinds of samples physical objects and from that from on can be used but not all all caucus theories which are in the form of a lot of different things but liabilities have observation and measurement path to move everything that the features and the the anything over that of the modern world the different people in the world that is more of a vicious century Oliver something down face in our case is more about the sample and in the core property of course of is 1 of the and so what I mean is that the social what are they in no more than that of all of them involved the aligned to the current things that was the stages so you talking about 1 9 1 1 5 million on the allocation and an official because there's also on observations and measurements slam enhancing lost anything from your notes from ISO also going to answer yes no this is on the point of this and the entire course of the thing yeah I was going to say something In the other person I have another question sorry all my position that it didn't have to define and it's it's not something that you do but I was
just curious tried really explain I 2 years and have all it's fun
use this is a full digits you sort of an even more than thousand samples for 1999 in the late our the 1 of the ideas and accommodation their recommended length is mine the recommended length 9 that consists of the region in space and the cool and the quantities of data centers but you the EEG and think that they're often but for the this she's right they have a really long and then they get ingredients have the right to a treaty committee this kind of uh these so that Canada is not because the in the vocal in but it is recommended that in the in the in the OK of yeah you thought about including some really neat during the location could also in the sample the number of elected that is something called the GUT location coding system for because it is you do me behind the latitude and longitude of this have been getting the best the IDF idf then because learning good to the last
slide um the 2 of
you that I find in the action you you you you you you would think that there are effective and efficient information because not all the same thing that we have Hang on the the the the the the the the the the the but you know that there's a nice thing about the fire is unique but the thing actually I think we need to let these things you more efficient efficient than people the any other questions the thing the