MaRDMO - An RDMO plugin to document and query MSO Workflows using the MaRDI Knowledge Graph
Formal Metadata
Title |
Series Title |
Number of Parts | 23
Author |
License | CC Attribution 3.0 Germany: You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/62069 (DOI)
Publisher |
Release Year |
Language |
Content Metadata
Subject Area |
Genre |
Abstract |
Leibniz MMS Days 2023, 3 / 23
00:00
Lecture/Conference
00:16
Computer animation
05:57
Computer animation
11:38
Computer animation
Transcript: English (automatically generated)
00:05
I would like to briefly introduce you to MaRDMO, a plugin that we developed within MaRDI to document and query workflows using our MaRDI knowledge graph, and why we would like to do
00:20
this, and how, I will briefly explain in the following. At this point I don't think I have to introduce MaRDI itself anymore, because Thomas already did this quite well in the previous talk. I just want to mention that I'm working specifically in task area 4, the interdisciplinary task area that you have seen before, and in this task area we believe that next to
00:44
the usual mathematical research data, such as equations, models, software, datasets and so on, there is another important aspect of research data: the documentation of workflows, for example the model-simulation-optimization (MSO) workflows which you all know, where you have a real-world problem
01:02
that you somehow simplify; then you have an input, you use a mathematical model and an algorithm to solve it, and you get an output which you can interpret to maybe get a solution to the real-world problem. Such workflows are quite interesting, not just for the researchers who actually invent them, so to say, but also for other
01:23
researchers, to reproduce the initial work but also to use these workflows for their own research, adapt them, develop them further and so on. To do this, of course, several things about such workflows need to be known: trivial things like which research question was addressed in
01:43
this workflow, but also things like which mathematical model was used, which algorithms were applied and with which parameters, in which software all this is implemented, what the properties of the input and output data are, whether test data can be found somewhere, and of course whether the workflow is reproducible and, if yes, under which conditions. And answers to all these
02:05
questions are mostly, not always, but mostly, only found by reading the corresponding publication; then you also have to read through the corresponding supplementary material, and if you are lucky, parts of such workflows were already introduced in
02:21
previous publications, which you can then also read. This rather implicit documentation of such workflows is of course not really helpful when others want to reproduce or reuse your work. Therefore we thought in MaRDI that it would be quite nice to have a more explicit documentation of
02:41
such workflows, for example as a wiki page on our MaRDI portal, which we envision something like this: there would be a first section on this wiki page where all the general information about the workflow is given, so the problem statement, which disciplines are involved, what the research objective is and so on. Then there could be a section where
03:03
the underlying mathematical model is described, so what the variables and parameters are and what the discretization is. Then there could be a section summarizing the process information, with the different process steps and the methods, software, hardware and so on involved. And finally
03:22
there could also be a section about reproducibility. We envision that some of these sections could be written more as free text, so to say, whereas other sections could be kept in a strictly tabular form, summarizing all the important information. And we would also like
03:40
these wiki pages not to stand alone on the web, so to say; they should be connected to other data sources, which is indicated here by some Wikidata identifiers. The identifiers written there are not correct, but they should establish a link to another database. What we would then also like is that we do not just have wiki pages, but
04:06
we can also search specifically for individual workflows. So we would like to be able to search for a workflow answering a specific research question, or using a specific model, a specific algorithm or specific input data. Therefore we thought it would also be quite nice to integrate
04:22
certain aspects of these workflow documentations into our MaRDI knowledge graph as well, because then we can run some simple queries on this graph and get relevant workflows back. Those were the two ideas we had in the beginning, and then we thought: okay, how can we now document such workflows? Therefore, in the first step, we developed
04:44
two documentation templates, one for workflows which are more theoretical, so to say, and one for workflows which also have an experimental component. Then we thought: okay, how can we now get other researchers to actually use our templates to document their workflows? The
05:01
first, rather naive approach was simply to distribute these templates among colleagues and ask them to document one of their workflows. This approach wasn't really successful. However, I have to say that the very, very few people who actually tried it out were quite positive. They said that they really liked reflecting on their own
05:21
workflow in such detail once again, and that they would really like to incorporate this into their future research routine. But overall, this was of course not an approach that could be scaled up. The second idea we had was simply to set up an online questionnaire on some web page where scientists could fill out the questionnaire, the answers would be sent to us, and we could
05:43
then incorporate this into the MaRDI portal. However, we did not follow this path for very long. Then we had the idea that there is already software around, for example the Research Data Management Organizer, which at my institute, the Zuse Institute, in principle at least, every researcher should use
06:02
to do their research data management. Therefore we thought: the software already exists, scientists are using it, so why don't we create a questionnaire for RDMO to document workflows in RDMO? Then we could also develop something like an export plugin, which would connect the RDMO instance with our MaRDI portal, such that researchers could fill out the
06:22
questionnaire, click on one button, and their workflow would be published on our portal. So we would provide the researchers with instant visibility, which we think is a very important aspect in science, and we would allow them to query our MaRDI knowledge graph through the export plugin. For those of you who may not know the Research Data
06:42
Management Organizer, or RDMO: it is software whose development started in a DFG project. It's a web application for dynamic research data management and data management plans. To date, there are numerous discipline-specific question catalogs around, and it's used in
07:01
quite a lot of research institutions in Germany and is also being tested in some institutions outside of Germany. From what I hear, the number of institutes is growing, and the National Research Data Infrastructure (NFDI), I think, also wants to use RDMO overall. Therefore we thought it would be a good idea to create this MaRDMO plugin. And of course,
07:24
as MaRDMO is a plugin for RDMO, it lives somewhere inside of RDMO and consists basically of two things: the export plugin and the questionnaire. The questionnaire has questions serving three different purposes. Some questions are simply parameters,
07:43
so to say, for the export plugin. Then there are questions for the workflow documentation and questions for the workflow search. If the researcher wants to document a workflow and has finished the questionnaire, he can export his workflow documentation in different formats, or he can publish it as a wiki page on the MaRDI portal, and
08:02
then some parts would also be integrated into the knowledge graph. If he decides to do a workflow search, we would query our knowledge graph and return appropriate wiki pages. Here's just a short look at the question catalog of MaRDMO. I'm not going through all the questions in detail, just
08:20
showing you the seven different sections we have. As I said before, they serve three different purposes. For example, in the first section, we just want to know if the researcher wants to search for or document a workflow. If he wants to search for a workflow, then he has to say by which entity he wants to search, so software, method, research
08:40
objective, and so on, and then he would have to describe it. If, on the other hand, he wants to document a workflow, he would have to say whether he wants to document locally or with the help of the MaRDI portal. Then all the questions for the workflow documentation would follow, where a general description would need to be given, the model would have to be described, the process
09:01
information would have to be given, and something about the reproducibility would have to be stated. Overall, there are quite a lot of questions that need to be answered. But first, the number of questions to be answered will decrease with the number of workflows already there, because then there is more information in the knowledge graph which can be reused, and some questions simply become obsolete. And
09:23
secondly, as I said before, we are using two different documentation templates; not all questions are listed here, and not all question sets are relevant for both templates. The other part of MaRDMO is the export plugin. It's a simple Django app that can be installed in RDMO like every other app. It's simply
09:43
executed by clicking one button. If you then do a workflow documentation, you can either download your documented workflow as a Markdown file, preview it in the browser to see, for example, whether it's rendered correctly, or send it to the MaRDI portal. And if you send it to the portal, as I said
10:00
before, there will be this knowledge graph integration. For this, the plugin will query existing databases, use entries from there, copy them to the appropriate places, and if nothing is there, create the entities itself. If, on the other hand, you decide to do a workflow search, we simply run SPARQL queries on the knowledge graph and return appropriate wiki pages. And here's just an
10:22
example of a small part of such a knowledge graph integration. For example, when we create a new workflow entity in the MaRDI knowledge graph, the user would be asked in the questionnaire whether this workflow is based on some publication, and then he would have to provide a DOI. The export plugin would then check whether an entity with this DOI is already in the MaRDI knowledge graph, and if yes, it
10:42
would connect this entity to the new workflow entity. If not, we would query the Wikidata knowledge graph to check for an entity there, and if there is one, we would copy this entity into the MaRDI knowledge graph and connect it. If not, we would use the DOI to query the DOI and ORCID APIs to get
11:00
the full citation of this publication, create a new entity for this publication inside the MaRDI knowledge graph and connect it to the workflow entity. And here is an example of how this would look on the MaRDI portal. On the right-hand side, we have the wiki page describing some
11:20
workflow. As you have seen, at first there is a lot of free text, so to say, and then come a lot of tables summarizing all the information. On the left-hand side, we have the corresponding knowledge graph entry, which is not so nice for the human user, but of course it's not meant to be used directly by the human user. And then this would be a workflow search. In RDMO, we would
11:41
go into the questionnaire and decide to do a workflow search. In this case, we want to search specifically for software, here MATLAB. We would save our decision, return to the project page, click on the MaRDI export, and we then find two workflows on our MaRDI portal and can follow the link to the corresponding wiki page.
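A search like this MATLAB one boils down to a SPARQL query on the knowledge graph. The following is only a minimal sketch of how such a query could be assembled, not the plugin's actual code. The function name and the property ID `P0000` are hypothetical placeholders, and the query assumes an endpoint that predefines the `wdt:` and `rdfs:` prefixes, as Wikibase query services usually do.

```python
def build_workflow_search_query(label: str, property_id: str, language: str = "en") -> str:
    """Build a SPARQL query for workflows linked to an entity with the given label.

    property_id is a placeholder for whatever property links a workflow to the
    searched entity (software, method, ...); it is not a real MaRDI property ID.
    """
    # Escape backslashes and double quotes so the label stays a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?workflow ?workflowLabel WHERE {\n"
        f"  ?workflow wdt:{property_id} ?entity .\n"
        f'  ?entity rdfs:label "{escaped}"@{language} .\n'
        "  ?workflow rdfs:label ?workflowLabel .\n"
        f'  FILTER(LANG(?workflowLabel) = "{language}")\n'
        "}"
    )


# Hypothetical property ID, used for illustration only.
query = build_workflow_search_query("MATLAB", "P0000")
print(query)
```

The query string would then be sent to the knowledge graph's SPARQL endpoint, and the returned workflow entities mapped to their wiki pages.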
12:02
What we hope to get overall from MaRDMO is something like a knowledge loop, so that users actually document their workflows in RDMO and publish them on the portal, but also retrieve workflow documentations from others, reproduce them, reuse them, develop them further, and then again document
12:21
them on the portal, such that the information is kept alive and reproducible, and thereby we can maybe also help to overcome this reproducibility crisis. With that, I'm already at the end of this short talk. I simply want to say that we at MaRDI think that such workflows should be documented more explicitly, and therefore we developed MaRDMO.
12:40
Future tasks include things like quality control, which we have to think about. You can check out the MaRDI portal on the web, and you can also test MaRDMO. The version on GitHub is not yet connected to the real portal, but you can play around with the test portal. Thank you for your attention.
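The publication lookup described in the talk (first the MaRDI knowledge graph, then Wikidata, then the DOI and ORCID APIs) is a simple fallback chain. Here is a minimal sketch of that control flow only, assuming each data source is wrapped in a lookup function that returns a record or `None`. All names and the example DOIs are made up for illustration; no real API is called.

```python
from typing import Callable, Optional, Tuple

# A lookup takes a DOI and returns a record, or None if nothing is found.
Lookup = Callable[[str], Optional[dict]]


def resolve_publication(
    doi: str,
    find_in_mardi: Lookup,
    find_in_wikidata: Lookup,
    fetch_citation: Lookup,
) -> Tuple[str, Optional[dict]]:
    """Return (action, record) following the fallback order from the talk."""
    entity = find_in_mardi(doi)
    if entity is not None:
        # Already in the MaRDI knowledge graph: only connect it to the workflow.
        return "connect_existing", entity
    entity = find_in_wikidata(doi)
    if entity is not None:
        # Found in Wikidata: copy the entity into the MaRDI graph, then connect.
        return "copy_from_wikidata", entity
    citation = fetch_citation(doi)
    if citation is not None:
        # Found nowhere else: create a new entity from the full citation.
        return "create_from_citation", citation
    return "not_found", None


# In-memory stand-ins for the three sources (hypothetical example data).
mardi = {"10.1000/a": {"label": "Paper A"}}
wikidata = {"10.1000/b": {"label": "Paper B"}}
citations = {"10.1000/c": {"title": "Paper C"}}


def resolve(doi: str) -> Tuple[str, Optional[dict]]:
    return resolve_publication(doi, mardi.get, wikidata.get, citations.get)


for doi in ("10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"):
    print(doi, resolve(doi)[0])
```

Passing the lookups in as functions keeps the ordering logic testable without network access; in a real plugin they would wrap the corresponding SPARQL and REST calls.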