
DOI Assignment within the ARGO International Project


Formal Metadata

Title: DOI Assignment within the ARGO International Project
Number of Parts: 24
License: CC Attribution 3.0 Germany — You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2014
Production Place: Nancy, France

Transcript: English(auto-generated)
I would like to start my presentation with some information about data management at Ifremer. Ifremer is a French research institute for marine science. It was created 30 years ago and it employs around 1,500 people.
Data management has always been set as a priority at Ifremer. Indeed, marine data, like I guess most life science data, cannot be reproduced and is very expensive to collect. So it would be a shame to put so much money into the system that collects the data and not be able to store or manage the data correctly. In most cases, the cost of the information systems is negligible compared to the cost of the system that collects the data.
So at Ifremer there are two teams for data management. The first team develops the information system and the second one uses the system to collect the data, to validate the data and to distribute the data. Of course, none of this would be possible without the help of the team which manages the servers, such as the supercomputer, which is needed when the data is too big to be computed otherwise.
So at the moment there are about 35 full-time employees taking care of the data at Ifremer. Ifremer is managing six data repositories, such as the Coriolis repository for oceanographic data.
For example, this repository is used to store the Argo floats data together with data from other equipment. We also developed some cross-cutting services, such as the Sextant catalog, which is more a catalog of computed data sets, and we use this catalog to manage the landing pages of our DOIs.
So among other data projects, Ifremer, like other institutions in France, contributes to the Argo floats program. You can see one of these floats at the bottom of the photo.
So the Argo program is a global array of more than 3,000 free-drifting floats which measure temperature, salinity and currents all over the oceans. It is a truly international program, with more than 30 nations participating.
On the following map you can see the latest positions of all the floats still running, from four months ago, and in orange you can see the positions of the floats that have been deployed by France.
So how does it work? The floats are deployed by a ship.
They go down to their drifting depth at 1,000 meters. They drift for around 10 days, then they go down to 2,000 meters, and as they go back up to the surface they record the salinity and the temperature.
Once the surface is reached, the data is sent to the coast by satellite. The Argo data is now a major source of information about the ocean, and it is massively used, for example, in climate research. And you can see in the graphics that the Argo data is used in more and more publications every year.
So every year some new floats have to be deployed to maintain the current number of floats. The lifetime of a float is about four or five years, so it runs until there is no battery left.
At the moment a new generation of floats is being designed, for example in the scope of the French NAOS project. The new floats will have better performance: they will integrate some new kinds of sensors, they will be able to go deeper and to perform under-ice operations.
So let's have a look at how the Argo data is managed. This is a simplified diagram of the real-time data flow.
The data of the floats is transmitted to the land by satellite, then it goes to the national data centers; for example, there is one national data center in France, which is Coriolis. These national data centers have to collect the data, to convert the data into open formats and to apply real-time quality tests, and then they have to provide the data within 24 hours to two global data centers. There are two global data centers, one in Brest, France, and the other one in California, and these two global data centers provide the same global data set of Argo data.
So users can get the data from one of these two data centers through an FTP service. As I said, the national data centers have to perform the same set of quality tests,
for example checking that the format is okay, and they also have to perform some visual control. For example, at Ifremer there are people who, every morning, check the data of the Argo floats before sending them to the global data centers.
This control requires good knowledge of the area: for example, the peak of salinity you can see at 700 meters depth is not an error, because at this position in the Atlantic you can in fact find water that comes from the Mediterranean Sea. When the water gets out of the Mediterranean Sea, it goes north and it stays at this depth. So this peak is not an error.
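An automated real-time quality test of the kind mentioned above can be sketched as a simple range check. The thresholds and function name below are illustrative assumptions, not the official Argo real-time quality-control specification:

```python
# Simplified sketch of a real-time quality test on an Argo profile.
# The global salinity range used here (in PSU) is an illustrative
# assumption, not the actual Argo QC specification.

def range_test(values, lo, hi):
    """Flag each measurement: 1 = good, 4 = bad (outside the global range)."""
    return [1 if lo <= v <= hi else 4 for v in values]

# A salinity profile with one wildly out-of-range value (a sensor spike).
salinity = [35.1, 35.3, 36.2, 35.0, 99.9]
flags = range_test(salinity, 2.0, 41.0)
print(flags)  # the 99.9 PSU spike is flagged as bad
```

A genuine anomaly like the Mediterranean outflow peak passes such a range test, which is exactly why the visual control by someone who knows the area remains necessary.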
So there is a lot of information about Argo data if you want more detail. We had a project at Ifremer to allow data to be cited, and Argo data was the first application of this project.
So we chose the DOI unique identifier system to cite data, through a contract we signed with INIST. And we selected our catalog Sextant, based on GeoNetwork, to host the metadata that is needed to get a DOI. But since we were not happy with the landing pages provided by GeoNetwork, we developed our own set of landing pages.
For example, this new set of landing pages includes a "how to cite" section. This section suggests using the DOI when there is a DOI on the record.
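Such a "how to cite" section can be sketched as a small function that builds a citation string and suggests the DOI only when the record has one. The metadata field names and the DOI value below are hypothetical assumptions:

```python
# Sketch of building the "how to cite" text shown on a landing page.
# Field names and the DOI value are hypothetical examples.

def how_to_cite(record):
    """Build a citation string; append the DOI only if the record has one."""
    cite = f"{record['author']} ({record['year']}). {record['title']}."
    if record.get("doi"):
        cite += f" https://doi.org/{record['doi']}"
    return cite

record = {
    "author": "Argo",
    "year": 2014,
    "title": "Argo float data from the Global Data Assembly Centre",
    "doi": "10.12770/example",  # hypothetical DOI for illustration
}
print(how_to_cite(record))
```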
We also took care that these landing pages were okay for search engine indexation. I don't think the Argo data needs extra visibility, but this is not the case for all data sets: for many data sets, a standard search engine can provide more visibility.
As an illustration, this map shows the origin of all the downloads of the documents available in our document repository during the last year. As you can see, more than 80% of the downloads come from Google. And data sets can in turn give more visibility to the publications on this data.
We also developed a tool to declare the DOIs through the DataCite API.
And then we were ready to declare our first DOI. So the very first DOI we assigned was for the global data set of Argo.
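Declaring a DOI essentially means registering a DOI name together with the URL of its landing page. The sketch below only builds the request body used by the DataCite MDS "doi" call; the DOI name, landing-page URL and the omitted authentication are hypothetical assumptions, not Ifremer's actual values:

```python
# Sketch of the minting step against the DataCite MDS API: a DOI is
# registered by sending "doi=<name>\nurl=<landing page>" in the request
# body. The DOI and URL below are hypothetical examples.

def mds_doi_body(doi, landing_page):
    """Build the text/plain body of a DataCite MDS PUT /doi request."""
    return f"doi={doi}\nurl={landing_page}"

body = mds_doi_body(
    "10.12770/example-argo-gdac",           # hypothetical DOI name
    "https://www.example.org/landing/argo"  # hypothetical landing page
)
print(body)
# In production this body would be sent as an authenticated HTTPS PUT
# request to the DataCite Metadata Store, after depositing the metadata.
```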
So the landing page of this DOI points to the two FTP servers where the data can be obtained: one available in France and the other one in California.
So if this DOI is used, it will give credit to the Argo project. It also provides an appropriate way to cite the data for further research, because the global data center includes some new data every day, and the data may also be corrected every day.
However, in very specific cases it does not allow the reproduction of a result if a potential error is suspected in a publication. So for data that is updated on a regular basis,
DataCite suggests three possibilities, and we selected the second one. So we decided to allow the possibility to cite a specific snapshot, that is, a copy of the entire data set made at a specific time. So every month we make a snapshot of the global data set.
We save it on an FTP server and we put a DOI on these snapshots. We will see if scientists use them. This is one of the main difficulties with DOIs: you have to think about how the scientists would like to cite the data.
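Citing a specific snapshot means each monthly copy needs its own identifier. A minimal sketch, assuming a hypothetical DOI prefix and suffix scheme (Ifremer's actual naming convention may differ):

```python
# Sketch of naming monthly snapshots of the global data set so that each
# snapshot can carry its own DOI. The prefix and suffix scheme are
# hypothetical assumptions, not Ifremer's real convention.
from datetime import date

def snapshot_doi(prefix, snapshot_date):
    """One DOI name per monthly snapshot, e.g. <prefix>/argo-gdac-2014-08."""
    return f"{prefix}/argo-gdac-{snapshot_date:%Y-%m}"

print(snapshot_doi("10.12770", date(2014, 8, 1)))  # 10.12770/argo-gdac-2014-08
```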
So for each new kind of data you have to define a specific DOI strategy. For example, for the ADCP data collected from French vessels, which we will put on the internet fairly soon, we decided to put a DOI on each year of data. So it is another policy of data publication.
This data citation project also helped us to connect the publications in our repository, Archimer, to the data sets. Indeed, it is now possible, for a document uploaded in Archimer, to list the data sets that have been used in the publication.
And if one of these data sets has been declared through our Sextant catalog, an automatic "is cited by" section is also created in the landing page of the data set. To do so, we only need to input the DOI of the data set in Archimer. And only data sets with a DOI can be cited, and only data sets freely available on the web can get a DOI.
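The automatic "is cited by" section amounts to inverting the publication-to-data-set links stored in the document repository. A minimal sketch with hypothetical record identifiers and DOIs:

```python
# Sketch of building "is cited by" sections for data-set landing pages
# by inverting publication -> data-set DOI links. The record IDs and
# DOIs are hypothetical examples.

def is_cited_by(publications):
    """Map each data-set DOI to the publications that cite it."""
    cited_by = {}
    for pub_id, dataset_dois in publications.items():
        for doi in dataset_dois:
            cited_by.setdefault(doi, []).append(pub_id)
    return cited_by

publications = {
    "archimer-00001": ["10.12770/example-argo-gdac"],
    "archimer-00002": ["10.12770/example-argo-gdac",
                       "10.12770/example-adcp-2013"],
}
print(is_cited_by(publications))
```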
So we hope that scientists will deposit more free data sets to get credit from a DOI. At least it will help readers to navigate from publications to data sets on the website.
It also builds backlinks for Google, because, as you know, the more backlinks a document gets, the higher it will appear in Google's results. What we want to do next is to connect our data set catalog to our people finder.
This is what we have already done for Archimer. This means that in the people finder, the bibliographic sections are built automatically from the documents deposited in Archimer. And this is one of the developments that helped the most to get documents into Archimer, because a lot of scientists put documents in Archimer just to have a complete bibliography in the people finder. So we also hope that it will help to have more data sets available in our repository.
Because, you know, for the Argo project there is no problem, since the data comes in automatically. And when Ifremer develops new equipment, we try to develop equipment that provides and stores data automatically. But this is not the case for all data.
For some data, only the scientists have it, so only the scientists can deposit it into the repository. So we hope this kind of development will help to get more and more data freely available in our repository.
With this last development, we are almost ready to offer to the public of the Ifremer website different kinds of information, all linked together. It will help the public to navigate from one piece of information to another.
We also hope that each system will benefit from it. For example, some scientists may deposit more freely available data sets to get a DOI and be able to link them in Archimer. And finally, it will provide more links and thus more visibility on the web.
Thank you.