TIB AV-Portal

Evaluation of the semantic research data management system CaosDB in glaciology


Formal Metadata

Title
Evaluation of the semantic research data management system CaosDB in glaciology

License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
We evaluate the open source software CaosDB for its employment in glaciological ice-core research at the Alfred Wegener Institute in Bremerhaven. Here, the system is used for the logistical management of up to several kilometres long ice cores which are stored as ice-core sections in a freezer storage facility. We provided a loan work flow to facilitate and control the movement of ice samples during ice-core processing. Furthermore, we integrate data analysis algorithms for various processing procedures on the ice-core samples. The application of our research data management system (RDMS) to ice-core research demonstrates in particular how a flexible RDMS with a semantic data model can efficiently improve collaborative research.
Transcript

Today I will talk about an evaluation of CaosDB in glaciology. My talk is divided into two parts: first I will give a quick overview of CaosDB, and in the second part I will talk about how we transferred that system to polar research at the AWI. You might notice that there is another affiliation appearing here:
some of us recently founded a service company providing commercial support and custom developments for the system. So, what is CaosDB?
First of all, I would briefly like to introduce some important requirements for research data management in our research. We are interested in a simple but expressive search functionality, and we need to be able to store file data of any file size. We also need the possibility to store and retrieve not only raw data but also processed data, analysis results and documentation. We want to support all kinds of data analysis software, from simple scripts to high-level software, and we want minimally invasive workflows. These last two points are related: we don't want users to have to adapt to the new system; we want to adapt the new system to the users and how they work. One more requirement is very important for us in scientific environments, especially in biomedical physics, where I come from: we often get new devices, new hardware and new software, so we need a flexible data model — we cannot reprogram the system every time we buy new hardware or software.
To be fair, there are already a lot of solutions for data acquisition and for data publication: for data acquisition there are, for example, electronic lab notebooks, and in the field of data repositories for the final publication of data there are also many well-working options. But CaosDB targets the intermediate stage, where the actual data analysis kicks in and where we also want to be able to directly store the results of our data analysis, and their history, in the system — so not only the raw data but also every intermediate step, until we finally push the final result to a data repository. How does it work in principle?
For the data acquisition, users keep their desired workflows, which will probably generate some kind of data files on some kind of file system. CaosDB has an automatic crawler that needs to be adapted to the specific use case, but in principle it is able to automatically index files on the file system in order to create entities and records in CaosDB.
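The crawler step just described can be sketched in a few lines of Python. This is only an illustration of the indexing idea, not the actual CaosDB crawler, and the matching rule (files ending in `.dat`) is an assumption for the example.

```python
import os
import tempfile

# Sketch of the crawler idea: walk a file tree and turn each matching data
# file into a record-like dict that could then be inserted into CaosDB.
# (Illustrative only; the real CaosDB crawler is configurable per use case.)

def crawl(root):
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".dat"):
                records.append({
                    "type": "DataFile",
                    "name": name,
                    "path": os.path.join(dirpath, name),
                })
    return records

with tempfile.TemporaryDirectory() as root:
    for fname in ("run1.dat", "run2.dat", "notes.txt"):
        open(os.path.join(root, fname), "w").close()
    records = crawl(root)  # indexes run1.dat and run2.dat, skips notes.txt
```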
Afterwards, users can browse all this data through the web user interface, or use any of the available client libraries of their favourite language that communicate with the server software. Depicted here is the Python interface, which is currently the most established one; you can use it both for automatically acquiring data and for inserting data.
Now to the current state and the future of the project. CaosDB originated as a scientific project in the research group Biomedical Physics at the Max Planck Institute for Dynamics and Self-Organization in Göttingen. It has been in development for roughly eight years and has been running in production since 2016. Originally it was developed for our own purposes, but we released it as open-source software last year, and, as I mentioned, just a few weeks ago we founded the service company. CaosDB is currently being tested by other work groups in and outside of Göttingen, and so far we have received very positive feedback. One of these evaluations is what I will talk about now: CaosDB in glaciology at the AWI in Bremerhaven.
So what were the aims of this collaborative project? Initially, the AWI had a very concrete problem: they wanted a system for the efficient management of ice-core boxes, and they wanted the possibility of flexible extensions, for example towards sample management, so that analysis results could later be stored together with the samples and their metadata and connected to them. For us it was interesting to test whether this flexible data model really works outside of our own context — here, in the context of polar research. We also wanted an evaluation of the system in general in a different scientific environment, with possibly different backgrounds of scientists and technicians, and of course the very valuable feedback from completely new users who hadn't used the system before.
So what are we actually talking about? These ice cores look like this; this is the state when they are drilled on the glacier, in Greenland or Antarctica. During the data acquisition the cores are cut into sections, so that it is possible to store them like this in Germany as soon as they have been transferred.
The sections are then stored in boxes on piles of pallets within a freezer storage facility. Here we can see such a pile of boxes; every one of these boxes contains pieces of ice cores, chunks of ice cores, and of course it is very difficult to keep track of them — especially when they are given away for data analysis to other institutes, or, within the same institute, to the people carrying out the data analysis.
So we started designing a very simple data model and inserted it into CaosDB. We essentially have boxes, with a box type; we have loans, which store the properties of boxes that are given away for a certain amount of time; we have pallets, which are connected to the boxes and basically store their location within the freezer facility; and of course a person — a very simple type that can be used for all kinds of responsibilities. This is the initial model; an extension to sample management was added later.
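As a rough illustration, the data model just described might be sketched like this with Python dataclasses. The type and property names follow the talk, but they are assumptions; the actual CaosDB RecordTypes may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the initial ice-core data model (illustrative only).

@dataclass
class Person:                  # used for all kinds of responsibilities
    name: str
    email: str

@dataclass
class Pallet:                  # storage location within the freezer facility
    number: int

@dataclass
class Box:
    number: int
    box_type: str
    pallet: Optional[Pallet]   # link property: where the box currently is
    responsible: Person        # link property: who is responsible for it

@dataclass
class Loan:                    # a box given away for a certain amount of time
    box: Box
    borrower: Person
    expected_return: str       # e.g. "2020-06-01"
    comment: str = ""
    returned: bool = False

curator = Person("A. Curator", "curator@example.org")
box = Box(17, "ice-core sections", Pallet(3), curator)
loan = Loan(box, curator, "2020-06-01")
```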
Right away, with the existing system, this allowed us to run very simple but useful queries: finding a specific record, for example a box or a pallet with a specific number; generating overviews over boxes with specific contents; or finding boxes which reference other entities, for example which reference a pallet with a specific number. The system already proved that it can handle these rather simple but useful requests, and it also showed that much more complex statements — chained queries over linked properties — become possible, allowing users to search, for example, for ice cores that are connected to specific material data.
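For a flavour of what such queries look like, here is a sketch in the style of the CaosDB query language; the RecordType and property names are assumptions taken from the data model above, not the exact queries used at the AWI.

```
FIND Box WITH number = 1234
FIND Box WHICH REFERENCES Pallet WITH number = 5
FIND Loan WHICH REFERENCES Box WHICH REFERENCES Pallet WITH number = 5
```

The last line is an example of the chained queries over linked properties mentioned above.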
This is what it looks like when we search for a box and open its detail view: you see essentially free-text properties here and three link properties — a responsible person, a box type and a pallet. This is basically the representation of the data model we just saw. Now the loan workflow comes into play. Assume we want to borrow this box for an analysis: we can click this little button here at the top right and fill out the form — in this case with my name, my e-mail address, an expected return date, and some comments that are used by the data curator at this institute. The system will handle this by creating another record of the Loan type, which we have seen previously, which stores all the information about the whole process of lending and returning ice-core boxes, and which will also adapt to what is happening. The administrator of the system can accept the borrow request, can later put the box back into place, and can handle the whole process of lending and returning boxes, while the system keeps track of the changes.
This is what the whole box loan workflow looks like: we have a basic workflow for the loan, with its states, and another workflow for the return. Then another challenge came into play: the ice-core storage facility where the boxes are stored is handled by an external company, and as soon as boxes are returned, this company hands out an Excel sheet with new information about which boxes have been put in and on which pallet they are stored. It was a bit clumsy to enter all this information about the new locations manually, so we decided to integrate an automatic import into CaosDB.
We wrote a small JavaScript extension for CaosDB which allows you to upload such an Excel sheet — the sheet provided by the company. If you press the import button, the sheet is uploaded and processed by CaosDB, and all the information that is necessary to keep track of the new locations of the boxes is updated and stored in the corresponding properties. In the end you get a result such as "16 boxes updated successfully". This really helps in managing the daily work with the loan workflow at this institute.
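The import logic behind that button can be sketched roughly as follows. The real feature is a JavaScript extension processing an Excel sheet on the server; this Python sketch uses CSV and a plain dict in place of the database, and the column names ("box", "pallet") are assumptions, not the company's actual format.

```python
import csv
import io

# Minimal sketch of the sheet import: read box -> pallet assignments from a
# sheet and update the stored locations, reporting how many boxes changed.

def import_sheet(csv_text, box_locations):
    """Update box locations from a sheet; return the number of boxes updated."""
    updated = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        box_locations[row["box"]] = row["pallet"]
        updated += 1
    return updated

sheet = "box,pallet\nB-0001,P-12\nB-0002,P-12\n"
locations = {}
n = import_sheet(sheet, locations)
print(f"{n} boxes updated successfully")
```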
For searching CaosDB we of course have the full-text and semantic search facilities of the query language, but we also provide a lot of shortcuts for the end user. These are basically stored query statements for searches that are used very often, and the shortcuts can be updated and created by the users themselves, depending on what is necessary for them. We found that the query language itself already proved to be quite simple to use; people adapt to it really quickly.
Here we see one feature that worked right away because it was already integrated: you can, for example, generate overviews over existing entities — for example over all current loans — with a very simple query.
For the final part, I want to show that it is actually possible to go towards sample management and even to trigger automatic data analyses from within the system itself; for this we created a prototype. Behind each core there is additional structure that builds on it: the ice cores are actually cut into strips and separated into bags and sections, and in the end different types of analysis are run on these sections.
For example, for the fabric sections we have two types of analysis that can be carried out, and one analysis might result, for example, in one plot that is later perhaps published or used for discussion of the data. Because this is needed very often, we thought it might be useful to integrate it into the system itself.
So, for a specific type of analysis where we already have a preprocessed image, we provide a button in the user interface called "analyze data", which you can use to feed this dataset directly into a Python script that is running in the background. The script automatically generates an output, and what works really well is that the system connects this result directly to the analysis and stores it back into CaosDB. So when users use the automatic analysis facility, they can also directly keep track of the processed results, because the output is stored and appears in CaosDB right away.
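A minimal sketch of that "analyze data" mechanism, with a plain dict standing in for CaosDB and a trivial stand-in computation; the names and record structure are assumptions for illustration, not the prototype's actual code.

```python
# Sketch of the server-side effect of the "analyze data" button: run a
# script on a stored dataset and store the result as a new record that is
# linked back to its input, so the provenance is traceable.

def analyze(image):
    """Stand-in for the background Python analysis script."""
    return {"mean_intensity": sum(image) / len(image)}

def run_analysis(db, dataset_id):
    result = analyze(db[dataset_id]["data"])
    result_id = f"{dataset_id}-result"
    # Link the result record back to its input dataset.
    db[result_id] = {"data": result, "source": dataset_id}
    return result_id

db = {"img-1": {"data": [1, 2, 3, 4]}}
rid = run_analysis(db, "img-1")
```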
With this, I would like to thank you for your attention, and to highlight that we host this open-source project on a GitLab instance in Göttingen, and that there is also a paper, currently under review, which you can already find as a preprint. Thank you.