MaRDMO - An RDMO plugin to document and query MSO Workflows using the MaRDI Knowledge Graph

Formal Metadata

Title
MaRDMO - An RDMO plugin to document and query MSO Workflows using the MaRDI Knowledge Graph
Series title
Number of parts
23
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release year
Language

Content Metadata

Subject area
Genre
Abstract
The Mathematical Research Data Initiative (MaRDI) has set itself the goal of making information about mathematical objects, e.g. algorithms or models, available in a structured and easy-to-find manner in the form of a knowledge graph. The linking of all these objects with concrete research questions, input and output data, software and hardware is done in specific model-simulation-optimization workflows. To achieve reproducibility of these, often interdisciplinary, workflows, detailed documentation is required. For this purpose, a standardized workflow documentation template was developed in the MaRDI project, which can be completed by answering a simple questionnaire in RDMO. Workflows recorded in this way can be published directly on the MaRDI portal. In addition, central information of the documentations is integrated into the MaRDI Knowledge Graph. Next to the pure documentation of workflows, MaRDMO offers the possibility to retrieve existing workflows from the MaRDI Knowledge Graph in order to provide researchers with suggestions for future projects and to document workflows based on these suggestions. Thus, MaRDMO creates a community-driven knowledge loop that could help to overcome the replication crisis.
Lecture/Conference
Transcript: English (automatically generated)
I would like to briefly introduce you to MaRDMO, which is a plugin that we developed within MaRDI to document and query workflows using our MaRDI Knowledge Graph, and why we would like to do
this and how, I will briefly explain in the following. At this point I don't think that I have to introduce MaRDI itself anymore, because Thomas already did this quite well in the previous talk. I just want to mention that I'm specifically working in task area 4, the interdisciplinary task area that you have seen before, and in this task area we believe that next to
the usual mathematical research data, such as equations, models, software, datasets and so on, there is another important aspect of research data, and that is the documentation of workflows, for example of the model-simulation-optimization workflows which you all know, where you have a real-world problem
that you somehow simplify, and then you have an input, and then you use a mathematical model and an algorithm to solve it, and then you have an output which you can interpret to maybe get a solution to the real-world problem. Such workflows are quite interesting, not just for the researchers who actually invented them, but also for other
researchers, to reproduce the initial work but also to use these workflows for their own research, adapt them, further develop them and so on. To do this, of course, several things about such workflows need to be known: trivial things like which research question was addressed in
this workflow, but also things like which mathematical model was used, which algorithms were applied and with which parameters, in which software all this is implemented, what the properties of the input and output data are, whether test data can be found somewhere, and of course whether this workflow is reproducible and, if yes, under which conditions. And answers to all these
questions are mostly, not always, but mostly just found by reading the corresponding publication, and then you also have to read through the corresponding supplementary material, and if you are lucky, then parts of such workflows were already introduced in
previous publications, which you can then also read. This rather implicit documentation of such workflows is of course not really helpful when others want to reproduce or reuse your work. Therefore we thought in MaRDI that it would be quite nice to have a more explicit documentation of
such workflows, for example as a wiki page on our MaRDI portal, which we would envision something like this: there would be a first section on this wiki page where all the general information about such a workflow is given, so the problem statement, which disciplines are involved, what the research objective is and so on. Then there could be a section where
the underlying mathematical model is described, so what the variables and parameters are and what the discretization is. Then there could be a section summarizing all the process information, with the different process steps and the involved methods, software, hardware and so on. And finally
there could also be a section about the reproducibility. We would envision that some of these sections could be held more in, so to say, free text, whereas other sections could be held in a strictly tabular form, summarizing all the important information. And we would also like
that these wiki pages do not stand, so to say, alone on the web, but are connected to other data sources. This is indicated here by the Wikidata identifiers used; the identifiers written there are not correct, but they should establish a link to another database. What we would also like is that we do not just have wiki pages, but
we would like to be able to specifically search for individual workflows. So we would like to be able to search for a workflow answering a specific research question, or using a specific model, a specific algorithm or specific input data. Therefore we thought it would also be quite nice to integrate
certain aspects of these workflow documentations into our MaRDI Knowledge Graph, because then we can do some simple queries on this graph and we would get the relevant workflows back. Those were basically the two ideas we had in the beginning, and then we thought, okay, how can we now document such workflows? Therefore, in the first step, we developed basically
two documentation templates, one for workflows which are more theoretical, so to say, and one for workflows which also have an experimental component. And then we thought, okay, how can we now get other researchers to actually use our templates to document their workflows? And the
first, rather naive approach was to simply distribute these templates among colleagues and ask them to document one of their workflows. This approach wasn't really successful. However, I have to say that the very, very few people who actually tried it out were quite positive. They said that they really liked to reflect on their own
workflow in such detail once again, and they would really like to incorporate this into their future research routine. But overall, this was of course not an approach that could be scaled any larger. The second idea we had then was simply to create an online questionnaire on some web page, where the scientists could go and fill out this questionnaire, and the answers would be sent to us and we could
then incorporate this into the MaRDI portal. However, we did not follow this path for very long. Then we had the idea that there is already software around, for example the Research Data Management Organizer, which at my institute, the Zuse Institute, in principle at least, every researcher should use
to do their research data management. Therefore we thought: the software already exists, scientists are using it, so why don't we create a questionnaire for RDMO to document workflows in RDMO? And then we could also develop something like an export plugin, which would connect the RDMO instance with our MaRDI portal, such that the researchers could fill out the
questionnaire, click on one button, and their workflow would be published on our portal. So we would provide the researchers with instant visibility, which we think is a very important aspect in science. And we would allow them, through the export plugin, to query our MaRDI Knowledge Graph. For those of you who maybe do not know the Research Data
Management Organizer, or RDMO: it is software whose development started in a DFG project. It's a web application for dynamic research data management and data management plans. Up to today, there are numerous discipline-specific question catalogs around, and it's used in
quite many research institutions in Germany and also tested in some institutions outside of Germany. From what I hear, the number of institutes is growing. And also the National Research Data Infrastructure, I think, wants to use RDMO overall. Therefore, we thought it would be a good idea to create this MaRDMO plugin. Yeah, and of course,
as MaRDMO is a plugin to RDMO, it lives somewhere inside of RDMO and consists basically of two things, the export plugin and the questionnaire. The questionnaire has questions serving three different purposes. Some questions simply are parameters,
so to say, for the export plugin. Then there are questions for the workflow documentation and questions for the workflow search. If the researcher wants to do a documentation and has finished the questionnaire, he can export his workflow documentation in different formats, or he can publish it as a wiki page on the MaRDI portal, and
then some parts would also be integrated into the knowledge graph. And if he decides to do a workflow search, we would query our knowledge graph and return appropriate wiki pages. Here is just a short look at the question catalog of MaRDMO. I'm not going through all the questions in detail, just
showing you the seven different sections we have. As I said before, they serve three different purposes. So, for example, in the first section, we just want to know whether the researcher wants to search for or document a workflow. If he wants to search for a workflow, then he has to say by which entity he wants to search, so software, method, research
objective, and so on, and then he would have to describe it. If, on the other hand, he wants to do a workflow documentation, he would have to say whether he wants to document locally or with the help of the MaRDI portal. Then all the questions for the workflow documentation would come, where a general description would need to be given, the model would have to be described, the process
information would have to be given, and something about the reproducibility would have to be stated. Overall, there are quite a lot of questions that need to be answered, but for one, the number of questions to be answered will decrease with the number of workflows already there, because then there is already more information in the knowledge graph which can be included, and some questions simply become obsolete. And
secondly, as I said before, we are using two different documentation templates, not all questions are listed here, and not all question sets are relevant for both of these templates. The other part of MaRDMO is the export plugin. It's a simple Django app that can be installed in RDMO like every other app. It's simply
executed by clicking one button. If you then do a workflow documentation, you can either download your documented workflow as a Markdown file, you can preview it in the browser to see, for example, if it's rendered correctly, or you can send it to the MaRDI portal. And if you send it to the portal, as I said
before, there will be this knowledge graph integration. For this, the plugin would query existing databases, use entries from there, and copy them to the appropriate places, and if nothing is there, it will create the entities by itself. If, on the other hand, you decide to do a workflow search, we will simply run SPARQL queries on the knowledge graph and return appropriate wiki pages. And here's just an
example of a small part of such a knowledge graph integration. So, for example, when we create a new workflow entity in the MaRDI Knowledge Graph, the user would be asked in the questionnaire whether this workflow is based on some publication, and then he would have to provide a DOI. The export plugin would then check whether an entity with such a DOI is already in the MaRDI Knowledge Graph, and if yes, it
would connect this entity to the new workflow entity. If not, we would query the Wikidata knowledge graph to check for an entity there, and if there is an entity, we would basically copy this entity into the MaRDI Knowledge Graph and connect it, and if not, we would use the DOI to query the DOI and ORCID APIs to get
the full citation of this publication, create a new entity for this publication inside the MaRDI Knowledge Graph, and connect it to the workflow entity. Yeah. And here it is shown, by way of example, how this would look on the MaRDI portal. On the right-hand side, we have the wiki page describing some
workflow. As you have seen, at first there is a lot of free text, so to say, and then come a lot of tables summarizing all the information. On the left-hand side, we have the corresponding knowledge graph entry, which is not so nice for the human user, but of course it's not meant to be used directly by the human user. And then this would be a workflow search. So, in RDMO, we would
go into the questionnaire. We would decide to do a workflow finding. In this case, we would want to search specifically for software, in this case MATLAB. We would save our decision, return to the project page, click on the MaRDI export, and we then find two workflows on our MaRDI portal and can follow the link to the corresponding wiki page.
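The search just described boils down to a SPARQL query against the knowledge graph. The following is a minimal Python sketch of how such a query might be assembled, not the query MaRDMO actually sends: the identifiers `P_INSTANCE_OF`, `P_USES` and `Q_WORKFLOW`, the `example.org` prefixes, and the function name are all hypothetical placeholders, since the talk does not give the real ones.

```python
# Hypothetical identifiers: the real MaRDI Knowledge Graph IDs are not given in the talk.
P_INSTANCE_OF = "P_INSTANCE_OF"
P_USES = "P_USES"
Q_WORKFLOW = "Q_WORKFLOW"


def build_workflow_search_query(label: str, language: str = "en") -> str:
    """Return a SPARQL query for workflows that use an entity with the given label."""
    # Escape backslashes and double quotes so the label stays a valid string literal.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX wd: <https://example.org/entity/>
PREFIX wdt: <https://example.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?workflow ?workflowLabel WHERE {{
  ?workflow wdt:{P_INSTANCE_OF} wd:{Q_WORKFLOW} ;
            wdt:{P_USES} ?entity ;
            rdfs:label ?workflowLabel .
  ?entity rdfs:label "{safe_label}"@{language} .
  FILTER(LANG(?workflowLabel) = "{language}")
}}
""".strip()


# Build the query for the software search from the demo.
query = build_workflow_search_query("MATLAB")
print(query)
```

The resulting string could be sent to any SPARQL endpoint; each returned `?workflow` would then be mapped to its wiki page.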
And what we hope to get overall with MaRDMO is something like a knowledge loop, so that users actually document their workflows in RDMO, publish them on the portal, but also retrieve workflow documentations from others, reproduce them, reuse them, further develop them, and then again document
them on the portal, such that the information is kept alive and reproducible, and thereby we can maybe also help to overcome this replication crisis. With that, I'm already at the end of this short talk. I simply want to say that we at MaRDI think that such workflows should be documented more explicitly, and therefore we developed MaRDMO.
Future tasks include things like quality control, which we have to think about. You can check out the MaRDI portal on the web. You can also test MaRDMO. The version on GitHub is not yet connected to the real portal, but you can play around with the test portal. And yeah, thank you for your attention.
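
The publication lookup described in the talk (MaRDI Knowledge Graph first, then Wikidata, then the DOI and ORCID APIs) is a simple fallback chain. The following is a minimal Python sketch under stated assumptions: the lookup callables, the `Resolution` type, and the example DOIs are hypothetical stand-ins supplied by the caller, so no real MaRDI, Wikidata, or DOI client API is implied. Returning `None` when nothing is found is also an assumption, as the talk does not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolution:
    source: str      # where the publication was found
    entity_id: str   # ID of the entity in the (stand-in) MaRDI graph
    created: bool    # True if a new MaRDI entity had to be created


# Each lookup takes a DOI and returns an identifier or citation, or None.
Lookup = Callable[[str], Optional[str]]


def resolve_publication(doi: str, in_mardi: Lookup, in_wikidata: Lookup,
                        fetch_citation: Lookup,
                        create_entity: Callable[[str], str]) -> Optional[Resolution]:
    """Resolve a DOI to a MaRDI entity, trying the closest source first."""
    # 1. Already in the MaRDI Knowledge Graph: reuse the existing entity.
    entity = in_mardi(doi)
    if entity is not None:
        return Resolution("mardi", entity, created=False)
    # 2. Known to Wikidata: copy the entity over.
    wikidata_id = in_wikidata(doi)
    if wikidata_id is not None:
        return Resolution("wikidata", create_entity(wikidata_id), created=True)
    # 3. Fall back to citation metadata fetched via the DOI.
    citation = fetch_citation(doi)
    if citation is not None:
        return Resolution("doi", create_entity(citation), created=True)
    return None  # assumption: the caller decides what to do if nothing is found


# Usage with in-memory stand-ins instead of real services (all values made up).
mardi = {"10.1000/a": "Q1"}
wikidata = {"10.1000/b": "Q42"}
citations = {"10.1000/c": "Doe, J. (2020). Example."}
created = []


def create(payload: str) -> str:
    # Record what would be written and hand back a fresh placeholder ID.
    created.append(payload)
    return f"Q{100 + len(created)}"


for doi in ("10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"):
    print(doi, resolve_publication(doi, mardi.get, wikidata.get, citations.get, create))
```

Passing the lookups in as callables keeps the order of the fallback steps visible and lets the chain be exercised without network access.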