DOI Assignment within the ARGO International Project
Formal Metadata

Title: DOI Assignment within the ARGO International Project
Number of Parts: 24
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/15287 (DOI)
Production Year: 2014
Production Place: Nancy, France
Transcript: English (auto-generated)
00:01
I would like to start my presentation with some information about data management at Ifremer. Ifremer is a French research institute for marine science. It was created 30 years ago and it employs around 1,500 people.
00:21
Data management has always been set as a priority for Ifremer. Indeed, marine data, like most observational science data I guess, cannot be reproduced and is very expensive
00:45
to collect. So it would be a shame to spend so much money on the system that collects the data and not be able to store or manage the data correctly. In most cases, the cost of the information system is negligible compared to the cost
01:05
of the system that collects the data. So at Ifremer there are two teams for data management. The first team develops the information system and the second one uses the system
01:23
to collect the data, to validate the data and to distribute the data. Of course, nothing would be possible without the help of the team which manages the servers, such as the supercomputer, which is needed when the data is too big to be computed.
01:42
So at the moment there are about 35 full-time employees taking care of the data at Ifremer. Ifremer manages six data repositories, such as the Coriolis repository for oceanographic data.
02:08
For example, this repository is used to store the Argo float data along with data from
02:20
other equipment. We also developed some cross-cutting services, such as the Sextant catalog, which is rather a catalog of processed data sets, and we use this catalog to manage the landing
02:42
pages of our DOIs. So among other data projects, Ifremer, like other institutions in France, contributes to the Argo floats program. You can see one of these floats at the bottom of the photo.
03:05
So the Argo program is a global array of more than 3,000 free-drifting floats which measure temperature, salinity and currents all over the oceans. It is a truly international program, with more than 30 nations participating.
03:31
In the following map you can see the latest positions of all the floats still running as of four months ago, and in orange you can see the positions of the floats that
03:47
have been deployed by France. So how does it work? The floats are deployed by a ship.
04:02
They go down to their drifting depth at 1,000 meters. They drift for around 10 days, then they go down to 2,000 meters, and as they go back up to the surface, they record the salinity and the temperature.
04:26
Once the surface is reached, the data is sent to the coast by satellite. The Argo data is now a major source of information about the ocean and it is massively used,
04:46
for example, in climate research. And you can see in the graphics that the Argo data is used in more and more publications
05:01
every year. So every year some new floats have to be deployed to maintain the current number of floats. The lifetime of a float is about four or five years, so it runs until there is no battery
05:26
left. And at the moment a new generation of floats is being designed, for example in the scope of the French NAOS project. The new floats will have better performance: they will integrate new kinds of sensors, they will be able to go deeper and perform
05:46
under-ice operations. So let's have a look at how the Argo data is managed. This is a simplified diagram of the real-time data flow.
06:07
So the data of the floats is transmitted to the land by satellite, then it goes to the national data centers; for example, there is one national data center in France, which
06:24
is named Coriolis. These national data centers have to collect the data, convert the data to open formats and apply real-time quality tests, and then they have to provide the data within
06:42
24 hours to two global data centers. There are two global data centers, one in Brest, in France, and the other one in California. These two global data centers provide the same global data set of Argo data.
07:03
So users can get the data from one of these two data centers through an FTP service. As I said, the national data centers have to perform the same set of quality tests,
07:22
for example checking that the format is okay, and they also have to perform some visual control. For example, at Ifremer there are people who, every morning, check the data of the
07:41
Argo floats before sending them to the global data centers. This test requires good knowledge of the area: for example, the peak of salinity you can see at 700 meters depth is not an error, because at this position in the Atlantic
08:00
you can in fact find water that comes from the Mediterranean Sea: when the water flows out of the Mediterranean Sea, it goes north and stays at this depth. So this peak is not an error.
08:20
So there is a lot more information about Argo data if you want. We had a project at Ifremer to make data citable, and Argo data was the first application of this project.
08:47
So we chose the DOI unique identifier system to cite data, through a contract we signed with INIST. And we selected our catalog Sextant, based on GeoNetwork, to host the metadata
09:07
that is needed to get a DOI. But since we were not happy with the landing pages provided by GeoNetwork, we developed our own set of landing pages.
09:21
For example, this new set of landing pages includes a "how to cite" section. And this section suggests using the DOI when there is a DOI on the record.
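A "how to cite" section like the one described can be sketched as a small helper that prefers the DOI link whenever the record has one. The field names and citation layout below are illustrative assumptions, not Ifremer's actual schema:

```python
def citation(metadata):
    """Build a suggested citation string for a landing page.

    `metadata` is a plain dict; all field names here are illustrative,
    not the actual Sextant schema.
    """
    parts = [metadata["author"], f"({metadata['year']})", metadata["title"] + "."]
    doi = metadata.get("doi")
    if doi:
        # Prefer the resolvable DOI link when a DOI exists on the record.
        parts.append(f"https://doi.org/{doi}")
    else:
        # Fall back to a plain URL for records without a DOI.
        parts.append(metadata["url"])
    return " ".join(parts)
```

The point of the branch is exactly what the talk describes: the section only suggests the DOI when one exists on the record.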
09:41
We also took care that these landing pages were okay for search engine indexing. I guess the Argo data does not need extra visibility, but this is not the case for all data sets.
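One common way to make a data-set landing page friendly to search engines is to embed a schema.org `Dataset` description as JSON-LD. The talk does not say which markup Ifremer used, so this is a generic sketch of the technique, not their implementation:

```python
import json

def dataset_jsonld(title, description, doi_url):
    """Return a schema.org Dataset description as a JSON-LD <script> tag.

    A generic search-engine-indexing technique; the exact markup used on
    the Sextant landing pages is not specified in the talk.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "identifier": doi_url,  # the resolvable DOI link
        "url": doi_url,
    }
    return '<script type="application/ld+json">%s</script>' % json.dumps(doc)
```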
10:00
Or maybe a standard search engine can provide more visibility for all data sets. As an illustration, this map shows the origin of all the downloads of the documents available in our document repository during the last year.
10:23
As you can see, more than 80% of the downloads come from Google. And we will see that data sets can give more visibility to publications about this data. We also developed a tool to declare the DOIs through the DataCite API.
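Declaring a DOI through DataCite amounts to telling the registry which DOI should resolve to which landing page. The sketch below builds the request for DataCite's Metadata Store (MDS) API, which takes a plain-text body with `doi` and `url` lines; whether Ifremer's tool used exactly this endpoint is an assumption, and credentials plus the actual HTTP call are omitted:

```python
def mint_doi_request(doi, landing_url):
    """Build the endpoint and payload for registering a DOI with the
    DataCite MDS API.

    Assumption: the MDS-style API is used. Authentication and the HTTP
    call itself (an authenticated PUT) are deliberately left out.
    """
    endpoint = f"https://mds.datacite.org/doi/{doi}"
    body = f"doi={doi}\nurl={landing_url}"  # MDS plain-text body format
    return endpoint, body
```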
10:48
And then we were ready to declare our first DOI. The very first DOI we set was for the global data set of Argo.
11:01
So the landing page of this DOI suggests the two FTP servers to get the data: one available in France and the other one in California.
11:21
So if this DOI is used, it will give credit to the Argo project. It also provides an appropriate way to refer to the data for further research. But the global data center includes some new data every day, and the data may also be corrected every day.
11:45
However, in very specific cases this does not allow the reproduction of a result if a potential error is suspected in a publication. So for data that is updated on a regular basis,
12:01
DataCite suggests three possibilities, and we selected the second one: we decided to allow the possibility to cite a specific snapshot, that is, a copy of the entire data set made at a specific time. So every month we make a snapshot of the global data set.
12:24
We save it on an FTP server and we put a DOI on these snapshots. We will see if scientists use them. This is one of the main difficulties with DOIs: you have to think about how the scientists would like to cite the data.
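The monthly snapshot strategy can be sketched as deriving, from the date, both the name of the frozen copy saved on the FTP server and the identifier assigned to it. The talk only says a DOI is put on each monthly snapshot; the file-name and suffix conventions below are purely illustrative:

```python
import datetime

def monthly_snapshot(base_doi, when=None):
    """Name a monthly snapshot of an evolving data set and its identifier.

    Illustrative sketch: the archive name and the '#YYYY-MM' suffix are
    assumptions, not Ifremer's actual conventions.
    """
    when = when or datetime.date.today()
    stamp = when.strftime("%Y-%m")
    return {
        "archive": f"argo_global_{stamp}.tar.gz",  # frozen copy on the FTP server
        "identifier": f"{base_doi}#{stamp}",       # one citable identifier per snapshot
    }
```

Citing the snapshot rather than the live data set is what makes a published result reproducible even though the global data set gains and corrects data every day.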
12:48
So for each new kind of data you have to define a specific DOI strategy. For example, for the ADCP data collected from French research vessels
13:00
that we will put on the internet fairly soon, we decided to put a DOI on each year of data. So it is another policy of data publication. This data citation project also helped us to connect publications
13:24
in our repository, Archimer, to the data sets. Indeed, it is now possible, for a document we upload in Archimer, to list the data sets that have been used in the publication.
13:46
And if one of these data sets has been declared through our Sextant catalog, then an automatic "is cited by" section
14:04
is also created in the landing page of the data set. To do so, we only need to input the DOI of the data set in Archimer. And only data sets with a DOI can be cited, and only data sets freely available on the web can get a DOI.
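The linking just described can be sketched as a pass over the publications that declare data-set DOIs, appending each publication to the "is cited by" list of the matching data-set record. The data structures are assumptions made for illustration:

```python
def add_citations(publications, datasets):
    """Build 'is cited by' lists for data-set landing pages.

    `publications` maps a publication id to the data-set DOIs it
    declares; `datasets` maps a DOI to its catalog record. A minimal
    sketch of the Archimer-to-Sextant linking; both structures are
    assumptions, not the real systems' schemas.
    """
    for pub_id, dois in publications.items():
        for doi in dois:
            record = datasets.get(doi)
            if record is not None:  # only data sets declared in the catalog
                record.setdefault("is_cited_by", []).append(pub_id)
    return datasets
```

Note that DOIs with no matching catalog record are simply skipped, mirroring the rule that only declared data sets get an automatic "is cited by" section.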
14:25
So we hope that scientists may deposit more free data sets to get credit from a DOI. At least it will help readers to navigate from publications to data sets on this website.
14:41
It also builds backlinks for Google, because, you know, the more backlinks a document gets, the higher it will appear in Google's results. What we want to do next is connect our data set catalog to our people finder.
15:05
This is what we have already done for Archimer. This means that in the people finder, the bibliographic sections are built automatically from the documents deposited in Archimer. And this is one of the developments that helped the most to get documents into Archimer,
15:28
because a lot of scientists put documents in Archimer just to have a complete bibliography in the people finder. So we also hope that it will help to have more data sets available in our repository.
15:57
Because, you know, for the Argo project,
16:01
there is no problem, because the data comes automatically. And when Ifremer develops new equipment, we try to do the same: develop equipment that provides and stores data automatically. But this is not the case for all data.
16:20
For some data, only the scientists have it. So only the scientists can deposit it into the repository. We hope this kind of development will help to get more and more data freely available in our repository. So with this last development,
16:41
we are almost ready to offer to the public of the Ifremer website different kinds of information, all linked together. So it will help the public to navigate from one piece of information to another.
17:02
We also hope that each system will benefit from it. For example, some scientists may deposit more freely available data sets to get a DOI and be able to link them in Archimer. And finally, it will provide more links and thus more visibility on the web.
17:24
Thank you. Thank you very much. Thank you.