HMC Project MetaCook - start your FAIR journey with VocPopuli
Formal metadata
Author | 0000-0003-4049-4212 (ORCID)
License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
Identifiers | 10.5446/60349 (DOI)
Transcript: English (automatically generated)
00:00
First of all, I'd like to have a discussion today, so I'll try to keep the presentation short. The second disclaimer is that at least half the time is going to be a live demo, and it is possible that something goes wrong and there is a glitch. So let's see if that happens or not.
00:23
And yeah, the title, as you've seen since you've already joined: we believe that with VocPopuli, the software that we've been writing for the past nine months, you can start confidently on your FAIR journey,
00:41
if you'd like to convert your lab or your research to practices that observe the FAIR data principles. But before going into that, how did we get into this topic in the first place? For that, I'd like to, you know,
01:01
tell a quick little story here, which goes back about three and a half years, to when I started my postdoc and was wondering: how can we make the next-generation lab? What do I mean by that? I wanted to convert my lab practices
01:23
to preserve data better, because I had just finished my PhD and I was beating myself up a little bit for not being able to find a lot of my data from the first years of my PhD. So I was thinking, how can we change our practices to preserve data better, which would actually lead to accelerated lab operations?
01:41
It would also allow me to share data with other people, and it would allow me to run machine learning. We started small, with just a few people, but then we grew these efforts to the whole lab because, you know, these things are only useful when there is a good, well-sized community
02:02
observing these FAIR principles. And as I said, yes, FAIR data satisfies all of these. I'm sure you all know what these mean, but basically the benefits that we focused on back then were that it would allow us to communicate with other researchers a lot more easily, and it would make our information machine-actionable.
02:22
And last but not least, something that I always emphasize when I give a talk is that FAIR data is a green movement in itself, although it's not presented that way very often. If your data is FAIR and you can locate the past experiments that you or somebody else has done,
02:42
it leads to saving resources. In any type of experiment, be it modeling or experimental science, there's a huge amount of resources going into these experiments. So FAIR data is actually a green movement. And you might have wondered what this box
03:02
has been doing here the whole time. Well, this is the place where I'm going to introduce our domain science, where we come from, and this is tribology. For those of you who have not heard of it, this is the science and engineering that deals with the issues of friction, wear, and lubrication, basically anything that happens
03:22
between two surfaces at an interface. It can be, I don't know, the joint in your knee, it can be a space shuttle opening some antenna once it goes into outer space, or any application in your car where you have two moving surfaces.
03:41
So, pretty broad. The first thing that we did back then was a proof of concept. One thing you should know about tribology is the following, and here is the example of curling. I'm sure you've seen how this game goes. Basically, as far as I know, and I'm not an expert, there is a stone being slid across an ice floor,
04:03
and then there are people who steer that stone to end up in a particular place. The steering happens by brushing the surface; maybe that's not the technically right term. The whole idea here is that friction is not a property of the material. It's not a property of the ice
04:20
or a property of the stone. It's a property of what has happened to the surfaces, and you can change that very easily by influencing it. So at the end of the day, being FAIR in this type of application means monitoring everything that happens around the particular event.
04:42
These are the defining factors of what's going to happen to this stone. For us, if we transfer this to the lab, this is just a timeline of the actions one takes that lead to an experiment, down here on the right. We realized that there are so many steps before we get to the experiment that we also have to digitalize and make FAIR.
05:01
So this is where a lot of our time went: basically making sure that we do everything in this process chain FAIRly, so that we can claim that we actually have a FAIR experiment. Okay, so then we chose a standard showcase experiment, simplified things, and challenged ourselves: can we do it FAIRly?
05:20
And we kind of defined for ourselves what FAIR means for tribology, our own field. We published two papers on that earlier this year. And around this time we filed the HMC proposal and got funding so that we can bring
05:43
some of our experience into production. So I have a slide of the whole process chain, just to reiterate the amount of work that goes into preparing a sample for an experiment and the fact that digitalization is another beast. So yeah, as I said,
06:01
we were at the proof-of-concept stage, and then we got the HMC funding, which led us to the current framework, which is basically a set of apps that we're designing so that anyone can tap into our experience and make their operations FAIR if they like what we do.
06:20
And this is what I'm going to focus on today: the second part, here on the right. The framework that we're currently working under is the following, this colorful blob over here. I'm going to go through it step by step, but I wanted to give you a visual cue at the beginning before I get rid of most of these to build them up.
06:42
So the first story that I wanted to tell here is about a PI who decides that they want to publish their data in an open repository. If you have had no experience with FAIR data, though, you would probably jump to electronic lab notebooks
07:00
and explore the option of how do I integrate these into my work in the lab and put my data there so that I can make it open. Because yes, of course you can just upload an Excel spreadsheet to something like Zenodo, but it's a lot better if you organize an electronic lab notebook.
07:22
This sort of pipeline would make the process open and would make your data open. However, the schemas for how these electronic lab notebooks are used are usually missing. So it kind of depends on the lab notebook. I guess if you're doing chemistry
07:41
and you go to Chemotion, then there will be a lot of things pre-made for you. But if you go to something more general like Kadi4Mat, then you will need to define your schema and how you enter your metadata a lot more freely. We actually use Kadi4Mat, and this gave rise to us starting on VocPopuli.
08:04
And this is what the demo will be about today. VocPopuli, in a few words, is just a program which lets you define your metadata schema so that you can maintain your records and the way they're organized
08:20
in the lab notebooks externally, where everybody can do it much more simply without worrying about the database aspects of the electronic lab notebooks. So VocPopuli just deals with the metadata. To me, this is where the process actually becomes FAIR, because we know that FAIR stands for findability,
08:41
accessibility, interoperability, and reusability. Defining the metadata schema FAIRly is a basic prerequisite for making data FAIR. Okay, and now I'm going to jump into my demo, where things get risky. To make it even riskier, I opened a brand-new, fresh window here
09:00
so that I go through all the steps that get you to using the app. Basically, we've put a lot of warnings here, as you'll see in a second, because this is really an alpha version of the app, which we released last week just for convenience of collaboration. But by providing this in the open,
09:20
we want to make sure that the people who get to this app know all the risks of using it and basically talk to us before using it. So it's really for early enthusiasts. This is the basic website. One can click start, and at the start screen you get:
09:40
hey, this is in a testing phase, wait until the second quarter of 2023 for a reliable version. Please get in touch with me or Ilia Bagov, who is the main developer of this. And yeah, know that if you publish something, it might become public: if you've entered something and you click publish,
10:01
it might go out in public. So once you get to the app, one prerequisite is that you have a GitLab account, and the app prompts you to go to GitLab and log in there. I will use my own login to sign in.
10:21
And then, to make this even harder for myself, I have two-factor authentication, which I enter, and this will return me to VocPopuli. It takes a second, and there we are. So now we're logged in, and I have an example vocabulary to show you here.
10:44
So basically the first thing to do is to select the vocabulary from a vocabulary list, which exists here. And we're going to use a test internal vocabulary for our tribology lab. Once it's loaded here, you see we have one term in this lab.
11:01
So we have a tribometer, and this is the workhorse of a tribology lab. This is the machine that measures tribological phenomena. If I click on it, you can see that it has some details. It has a definition. It has, or it could have, synonyms and
11:20
some other properties and a data type. There is a discussion board with comments and also a voting option. Now, the whole premise of VocPopuli is that metadata schemata are defined collaboratively, so that the entire lab agrees on a schema
11:45
before any of them has to learn any type of ontology language. This hides a lot of details and makes it easy for any type of user to go in and start defining what they have in their lab. So, say, the tribometer: we can edit this. If I see that there is, for example,
12:02
no picture associated with it, and a picture says a thousand words, maybe I want to add a picture. So what I can do is upload an image. There is a picture of a tribometer. I can submit the term. And once we do that, it takes a second.
12:22
This picture is supposed to show up over here. I said this is an alpha demo, so there we go. But... If I may intervene: that happened because you chose to keep the previous version's images.
12:40
So there is a tick mark, yeah. Usability, right? Yeah. Okay, so this is something we prioritize, usability, but again, it is under construction. So basically this is the mistake I made. I unticked that because I don't want to keep the empty field, and now the picture will be there.
13:02
And again, this goes back to, you know, we want everyone who uses this alpha version to go through us so that Ilia can jump in and assist when things don't go as expected. But there we go. We now have a picture of a typical tribometer, which is used in the lab.
13:28
So if we go back to the term, one point of collaboration is, yes, you can see the different versions. So now I will go to version update two, which includes the picture that I uploaded.
13:41
You can see that the term is in a state of not approved, and because of that, I also have an approve-term button over here. What can happen at this stage is that we can add comments, and I will ask Ilia to add a comment over here. I will add a comment about this definition saying that I think it is complete.
14:04
And I will submit the comment, and it gets added over here. There we go. I suspect Ilia has written his comment, so I'll refresh as well.
14:22
There we go: "This is a nice pic." Thank you. So now we have the discussion board, and all of these comments are recorded here. It's recorded which version of the term was discussed. And if we like all of this, we can click approve, which does a few things in the background. We hide all that complexity,
14:41
but essentially the state of this term will... hey, you have updates to the vocabulary, reload. Basically, this term now moves to a state of being approved. There we go. So this is an approved term. Okay, so this is very basic functionality.
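The term workflow shown so far (versions, comments tied to a version, and an approval state) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VocPopuli's actual data model: the class names, field names, and the rule that a new version resets the approval are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TermVersion:
    """One revision of a term. All field names are hypothetical."""
    definition: str
    datatype: str = "object"
    synonyms: tuple = ()
    image: Optional[str] = None


@dataclass
class Term:
    label: str
    versions: list = field(default_factory=list)
    comments: list = field(default_factory=list)  # (version index, author, text)
    approved_version: Optional[int] = None

    def submit(self, version: TermVersion) -> int:
        """Store a new revision. Assumption: an edit needs a fresh approval."""
        self.versions.append(version)
        self.approved_version = None
        return len(self.versions) - 1

    def comment(self, author: str, text: str) -> None:
        """Record a comment together with the revision it refers to."""
        self.comments.append((len(self.versions) - 1, author, text))

    def approve(self) -> None:
        """Mark the latest revision as approved."""
        self.approved_version = len(self.versions) - 1

    @property
    def is_approved(self) -> bool:
        return bool(self.versions) and self.approved_version == len(self.versions) - 1


definition = "Machine that measures tribological phenomena"
tribometer = Term("tribometer")
tribometer.submit(TermVersion(definition))
tribometer.submit(TermVersion(definition, image="tribometer.png"))
tribometer.comment("reviewer", "I think this is complete.")
tribometer.approve()
```

Keeping every revision instead of overwriting the term is what lets a comment point at the exact version that was discussed.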
15:03
Now we will add another term, another top-level term. So we have the machine. What if we want to add the experiment? I'll click on new term and I will say tribological experiment.
15:21
And this is a process. And I say, this is how we measure friction. Simple enough. I will just leave the other fields as they are and submit the term. Again, that does something in the background.
15:43
Hey, you have updates, reload. My term is ready. You can see that the term has not been read yet, and I can open it. We can go through the same sort of steps: you see the definition, and we can upvote it or downvote it.
16:01
And once it's ready, the person who's in charge of the metadata can click approve, and it is then approved. Now, what is interesting here is that, okay, there's an update. If I go back into it,
16:21
there is actually a complex scheme in the background that administers the IDs. So we have multiple IDs to follow the whole process, and this is what makes everything FAIR. In the background, every one of these actions of the domain experts putting in data is recorded in a Git-like system.
16:43
This can then be pulled and assembled into the entire provenance of the vocabulary so that it's shipped with the data when published. And this makes it very FAIR. So just to show you what happened in the background,
17:02
this is actually the KIT Tribology Lab. Yes, there we go. All of the data that I put there is actually in a GitLab account, and the user, you know, has full control over it. That's why we had to log in with our GitLab credentials,
17:22
but I'm reloading the page. There we go. All of these terms are actually in here and this turned out to be a lot longer than I thought. But if I look for tribological experiment, you can see that the data that I just entered
17:41
is reflected in here: "this is how we measure friction". So everything is actually on GitLab. This is the point I'm trying to make here. In one respect, this is just a front end for GitLab. Okay, so if I go back to the main page, maybe I want to add another term. I'm going to do that very quickly.
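To make the idea of a "Git-like" record of actions more concrete, here is a minimal sketch of an append-only log in which every entry is identified by a hash of its content and of its parent entry, so the full provenance can be replayed and checked. It is a generic illustration and not how VocPopuli or GitLab store things; the function names and entry fields are assumptions.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """Content hash of an entry; sort_keys keeps the hash reproducible."""
    return hashlib.sha1(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def record(log: list, author: str, action: str, payload: dict) -> str:
    """Append an entry whose ID covers its content and its parent's ID."""
    parent = log[-1]["id"] if log else None
    body = {"parent": parent, "author": author, "action": action, "payload": payload}
    entry_id = _digest(body)
    log.append({"id": entry_id, **body})
    return entry_id


def verify(log: list) -> bool:
    """Recompute every ID and check that each entry points at its predecessor."""
    parent = None
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "id"}
        if entry["parent"] != parent or entry["id"] != _digest(body):
            return False
        parent = entry["id"]
    return True


log = []
record(log, "domain expert", "create", {"term": "tribological experiment",
                                        "definition": "This is how we measure friction."})
record(log, "metadata steward", "approve", {"term": "tribological experiment"})
```

Because each ID depends on everything before it, editing an old entry after the fact is detectable, which is the property that makes such a log usable as provenance.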
18:01
This is going to be a microscope. It is going to be an object, and it takes pictures, at the very least, I hope. Submit term. And then we reload once it's here
18:22
and you see I have one more object. One thing that I've hidden so far is the fact that these terms have details: a tribometer is actually defined by some technical specification, some periphery info, and some general info.
18:40
The technical specifications can be structural information about the machine, motion-inducing systems, or array info and sensors. All of these extend until all the details are covered. In fact, all of this can be visualized as a graph and explored. So you see the microscope is not approved,
19:00
so it's in red, but the tribometer is. And then you can explore this: say there is a computer attached to this machine. You can see its roles, and the role of this computer can be the motion control of the machine, and so on. So going back here, you might have noticed that there is a general information field for the tribometer.
19:23
Maybe this field is actually shared between the microscope and the tribometer, because it's general information about equipment: it defines the operator, that is, how you define what the operator is, and it defines the equipment product model over here. So maybe we want to apply all this to the microscope
19:41
and we don't need to redo this whole thing under the microscope. All we need to do is go to the general info and edit the term. Here, one thing I will do is add a broader term.
20:00
This is a heritage from the SKOS terminology. Basically, if I find microscope in here, I can add it, and then I can click submit term. Wait for a second or so. And this will, yep.
20:22
I can also approve this change before it's too late. Great, the risks of a live demo: that did not work. But basically, if I reload my page now, the microscope has the same general information already added to it in the chain.
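The reuse step can be illustrated with a small sketch: a term may have more than one broader term, so "general info" is defined once and shows up under both pieces of equipment. The structure and helper names below are assumptions made for illustration, and the sketch assumes the broader/narrower links contain no cycles.

```python
from collections import defaultdict

# Maps each term to the set of its broader (parent) terms.
broader = defaultdict(set)


def add_broader(term: str, parent: str) -> None:
    """Attach `term` underneath `parent`; a term may have several parents."""
    broader[term].add(parent)


def narrower_terms(parent: str) -> list:
    """Direct children of `parent`, sorted for a stable order."""
    return sorted(term for term, parents in broader.items() if parent in parents)


def all_details(parent: str) -> list:
    """Every term reachable below `parent`, depth first (assumes no cycles)."""
    found = []
    for child in narrower_terms(parent):
        found.append(child)
        found.extend(all_details(child))
    return found


add_broader("technical specification", "tribometer")
add_broader("general info", "tribometer")
add_broader("operator", "general info")
add_broader("equipment product model", "general info")

# The one edit from the demo: give "general info" a second broader term.
add_broader("general info", "microscope")
```

After the last call, `all_details("microscope")` lists the general info branch without any of its sub-terms having been redefined.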
20:43
In this way, the metadata schema is reused and becomes a lot easier to use. Once this is all done, one can actually download this entire behemoth of information
21:01
as it's stored by VocPopuli. You can view all the terms and all the details that are in there. So this is our internal schema. But now is where it gets interesting: you can make an ontology out of this. The basic lightweight ontology,
21:22
and calling it that is not technically correct, but one way you can express the ontology is to serialize it as a SKOS vocabulary. So you can take this whole thing and actually publish it as SKOS. This is going to take about five minutes to run,
21:41
so we actually ran it earlier today. It basically makes a new vocabulary for you, which now contains the SKOS serialization of that vocabulary. In there, you can see not just the VocPopuli internal representation of these terms, but the entire SKOS vocabulary.
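For readers who have not seen SKOS, the following sketch shows roughly what serializing a small vocabulary as SKOS in Turtle syntax involves. It is a hand-rolled illustration, not VocPopuli's exporter: the input dictionary layout and the `https://example.org/vocab/` base IRI are assumptions, while `skos:Concept`, `skos:prefLabel`, `skos:definition`, and `skos:broader` are standard SKOS terms.

```python
def to_skos_turtle(base_iri: str, terms: dict) -> str:
    """Serialize {label: {"definition": str, "broader": [labels]}} as Turtle."""

    def iri(label: str) -> str:
        return "<" + base_iri + label.replace(" ", "_") + ">"

    def literal(text: str) -> str:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"@en'

    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for label, info in terms.items():
        statements = ["a skos:Concept", "skos:prefLabel " + literal(label)]
        if info.get("definition"):
            statements.append("skos:definition " + literal(info["definition"]))
        for parent in info.get("broader", []):
            statements.append("skos:broader " + iri(parent))
        lines.append(iri(label) + " " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


vocabulary = {
    "tribometer": {"definition": "Machine that measures tribological phenomena"},
    "microscope": {"definition": "Takes pictures"},
    "general info": {
        "definition": "General information about equipment",
        "broader": ["tribometer", "microscope"],
    },
}
turtle = to_skos_turtle("https://example.org/vocab/", vocabulary)
print(turtle)
```

In practice an RDF library would handle escaping and IRI rules more robustly; the string building here only serves to show the shape of the output.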
22:03
And this makes it interoperable. It makes the vocabulary reusable in other aspects, in other places. And one interesting aspect to notice here is that we also, on the way, have assigned permanent identifiers to these terms.
22:22
This is based on the PID infrastructure provided by the HMC. So you can see that the PURL that is used is from helmholtz-metadaten.de. And yeah, this puts these terms
22:40
on the linked data infrastructure, and anyone can open them and look at them. So, yeah. Okay, so far we have made our vocabulary and published it as SKOS. One aspect that I haven't talked about yet is, okay, how does this connect to data?
23:01
For that, I'm actually going to open our lab notebook. In the lab notebook, we use Kadi4Mat, as I said. We actually upload data here all the time; we've connected this straight to our machines. So you can see that we made records here seven days ago, four days ago, all the time.
23:20
This is where we put our data. We use the electronic lab notebook in production, I would say. So one example here would be to show you a record of a tribological experiment. Here you can see details about this experiment. I've hidden the identity of the person who did it. But basically, you can see when this was performed,
23:44
and what the conditions were. Hey, this was done in air with controlled humidity, and so on and so on. You can see there are a bunch of details here. We did not make the student who conducted this test enter them manually. This was all done by the machine automatically, and the files were uploaded.
24:02
Hey, there's a CSV with the average data. This was also linked to all the other objects that were used, like the block specimen and another block specimen. And then there is optical surface profilometry: a picture was taken of the surface.
24:20
This is also linked to this. But to really bring the point home, if I open the graph of all these records and I extend the depth level, you can see that this graph kind of explodes pretty quickly once you use an electronic lab notebook in production for a few months. But you can also start making interesting conclusions.
24:41
You see nodes here that are fairly central. And no surprise, the lab equipment tribometer is in the center of everything. But why am I explaining all this when I'm talking about the metadata? Well, this is because to administer what these fields here are when making a record,
25:02
we are actually starting to use VocPopuli. In VocPopuli, you can export your vocabulary as a template. What do I mean by that? Well, I can grab this tribological experiment. Here are my top-level terms.
25:21
I can grab this tribological experiment, and I can say that I want a template for it, tribological experiment template. And then I want to push it to Kadi4Mat. To connect to Kadi4Mat, I currently use an access token. Just to show you what that means:
25:41
basically, you can generate these tokens, which are temporary passwords. I've given this one until 11 today, and only the permission to create templates. So you can steal it now and access Kadi4Mat on my behalf, but you'll only be able to create templates.
26:02
Anyway, I click this button, and I guess, success. If I go back to Kadi4Mat and reload my templates, we now see that a few seconds ago a tribological experiment template was created. You see, I guess it's still being created.
26:23
Oh, this was my bad. Yes, this is correct, but the experiment does not have any sub-terms, so you don't see anything. What I should have done was create a template for the tribometer, which has a bunch more info.
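Why the first template came out empty can be seen in a small sketch of the export logic: a template is built from a term's sub-terms, so a term without sub-terms yields no fields. The dictionary layout, the field keys, and the example IRI below are hypothetical, and the sketch deliberately stops before any upload, since the ELN's API is not described in the talk.

```python
def build_template(term: str, children: dict, iris: dict) -> list:
    """Turn the sub-terms of `term` into empty, nested template fields."""
    return [
        {
            "key": child,
            "iri": iris.get(child),   # identifier that travels with the field
            "value": None,            # to be filled in when a record is made
            "fields": build_template(child, children, iris),
        }
        for child in children.get(term, [])
    ]


# Hypothetical excerpt of the vocabulary shown in the demo.
children = {
    "tribometer": ["technical specification", "periphery info", "general info"],
    "general info": ["operator", "equipment product model"],
}
iris = {"operator": "https://example.org/vocab/operator"}

experiment_template = build_template("tribological experiment", children, iris)
tribometer_template = build_template("tribometer", children, iris)
```

Here `experiment_template` is an empty list, matching what happened in the demo, while `tribometer_template` has three top-level fields with the general info branch nested inside.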
26:46
Kadi4Mat, as easy as that. And then if I go here, there is the tribometer. And by the way, I'm at home right now; all of this is running externally.
27:02
This is not on the same database. All of this is done over the internet. Just wanted to bring this point up because this is not our internal server or any magic like that. This is happening this fast over the internet. But in any case, I just shipped all of these definitions that I showed you earlier in here,
27:20
and the template is already made for me, so that next time you want to use it, you just need to enter the values. And not just that, but an important point here, and this is where the FAIR aspect really kicks in: take this term, vibration mitigation methods. This is, you know, how do we make sure that the machine doesn't vibrate? Well, it has two things.
27:42
So it has options: we can put the machine on a stone table, on a stone table with passive pneumatic dampers, or place it on the lab bench, and so on. But the FAIR aspect of the whole thing is that the term's IRI is now part of the term as it appears in the ELN. If you have seen something similar before,
28:02
please let me know, but I have not. So we really take pride in this option. And if you click on it, I hope this works. There you go. You can explore what the definition is, fully FAIRly, I would say. And because I'm opening this in a browser, you're actually sending the request that generates the page.
28:21
But if you're doing this programmatically, you can put the right flags in there and get the SKOS schema in response. Okay, so yeah, this was the end of my demo. I just checked my checklist. These are the basic functionalities of what we have in VocPopuli right now.
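The "right flags" most likely refer to HTTP content negotiation, where the client states in an `Accept` header which representation it wants. The sketch below builds such a request with Python's standard library. The term IRI is a placeholder, and which media types the real resolver honours is not stated in the talk, so `text/turtle` is an assumption.

```python
import urllib.request


def skos_request(term_iri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request that asks for RDF instead of the HTML page."""
    return urllib.request.Request(term_iri, headers={"Accept": media_type})


# Placeholder IRI; a real term would use its persistent identifier.
request = skos_request("https://example.org/vocab/tribometer")

# Sending it is left commented out because the placeholder does not resolve:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A browser sends `Accept: text/html` by default, which is why clicking the same link shows a rendered page.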
28:41
And I'm going longer than I thought, so just to accelerate things a little bit: basically, what we did in VocPopuli just now is that we made new terms, added details, discussed them, approved them, and added pictures. We also published the SKOS. We published the SKOS earlier,
29:02
but it's essentially the same function. And we connected the whole thing to an ELN and added the PIDs, which wraps the whole thing up and makes it FAIR. Okay, another user story here on my giant slide is about someone who wants to do some complex machine learning on this whole thing.
29:21
For that, they will actually need to run a second app, also within the scope of the MetaCook project with Hereon under the HMC funding scheme, which we're currently developing. This app takes your basic vocabulary, I mean the basic vocabulary from VocPopuli, and converts it into a full-fledged ontology.
29:42
And I guess I'll save this for another time, but it's quite intricate in how we do that with machine learning and quite interesting. In any case, once you do that, you can augment your schema to your ELN and make things even more connected. And yeah, I guess just to bring the point home again,
30:03
this is the VocPopuli side of vocabularies. If you want to make a full ontology, where you don't have just a one-dimensional structure of terms but a fully connected graph, then you'll need a full-fledged ontology. You make this part in VocPopuli, you make this part in OntoFAIRCook,
30:21
and you do that with a semi-supervised machine learning method that we have implemented. The whole thing is actually supported by another few pieces of software that we're making, called the FAIR safe package. We have a few people working on these as well, and it basically supports our usage
30:42
of the electronic lab notebooks. We make front-end interfaces for processes that don't have a computer associated with them, with an app called digital book. We also have a validator. This is a very important app, which basically takes all the details, all the records, all the data from Kadi4Mat. It takes the schema from VocPopuli
31:02
and compares them and makes sure that everything is in check and that we're self-consistent in what we do, so that you don't get a 404 on your links once you start looking for some old metadata. We also have analysis software: we've integrated this whole thing with MATLAB.
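The validator idea, comparing a record from the lab notebook against the schema, can be sketched as follows. This is a toy illustration and not the validator mentioned in the talk: the schema layout, the field names, and the record are hypothetical, with the allowed options borrowed from the vibration mitigation example shown earlier.

```python
def validate(record: dict, schema: dict) -> list:
    """Return human-readable problems; an empty list means the record conforms."""
    problems = []
    for key in record:
        if key not in schema:
            problems.append(f"unknown field: {key}")
    for key, rule in schema.items():
        if key not in record:
            problems.append(f"missing field: {key}")
        elif "options" in rule and record[key] not in rule["options"]:
            problems.append(f"invalid value for {key}: {record[key]!r}")
    return problems


schema = {
    "atmosphere": {},
    "vibration mitigation methods": {
        "options": [
            "stone table",
            "stone table with passive pneumatic dampers",
            "lab bench",
        ]
    },
}

good_record = {"atmosphere": "air", "vibration mitigation methods": "stone table"}
bad_record = {"atmosphere": "air", "vibration mitigation methods": "rubber feet", "operater": "n/a"}

print(validate(good_record, schema))
print(validate(bad_record, schema))
```

Running such a check on every record is one way to catch misspelled keys and out-of-vocabulary values before they turn into broken links later.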
31:21
And as I said, we've integrated the whole thing with LabVIEW as well, so that this whole FAIR thing doesn't happen at the expense of the lab scientists spending extra time filling in millions of details. Okay, so yeah, this is the whole thing, but the point I wanted to make with this
31:42
is that it all starts with the vocabulary. To be FAIR, you need a vocabulary, and making the vocabulary is actually one of the easiest ways to get into the FAIR ecosystem. Okay, these are my acknowledgements. A bunch of people have contributed to this,
32:01
and for VocPopuli, the major funding has been coming from the HMC. So we're very grateful to them. And with that, I'd like to thank you; that ends my presentation, and I'm very much looking forward to your questions.