The MaRDI portal for mathematical research data
Formal Metadata
Number of Parts | 22
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/57499 (DOI)
Leibniz MMS Days 2022 (13 / 22)
Transcript: English (auto-generated)
00:01
So our next speaker is David Nolte from Zuse-Institut Berlin. He's a postdoctoral researcher in the MaRDI team, and he will be talking today about the MaRDI portal for mathematical research data.
00:27
Yes, thank you for the introduction. Nice to be here. So this talk is not about research results but rather about a community service that we are developing in the MaRDI portal.
00:42
So this is part of the MaRDI project, specifically of Task Area 5, which is hosted at the Zuse-Institut and at FXAT. So what do I mean by portal? It's really a wiki, a web portal based on the MediaWiki and Wikibase technology stack known
01:04
from Wikipedia and Wikidata, which will allow us to find and access research data from a variety of sources, repositories that right now exist
01:20
in many different locations, are not connected to each other and do not communicate. So here we are putting these things together, which is an important step towards implementing the FAIR principles for research data.
01:42
I'm going to talk about what this means in a second. First, some context: this is part of the National Research Data Infrastructure (NFDI), which is managed by the DFG and started a couple of years ago. It is planned to comprise 30 consortia representing the German scientific landscape, and the aims
02:05
of the project are to create a permanent digital repository of knowledge in which research data is made available to researchers according to these FAIR principles. So what are these principles? The FAIR principles are guidelines for research data management and stewardship
02:27
in the digital age, suited not only to human users but also to machines, so that automated applications, for example, can also find, explore,
02:42
and access data automatically. FAIR stands for findable, accessible, interoperable and reusable. Findable means that data has to have rich annotation by means of metadata, so that it can easily be found, also by machines.
03:00
Once data is found, it needs to be accessible, so there have to be protocols that ensure that the data can actually be accessed. Interoperable means that, since data is part of some workflow, there needs to be a common language for data, a common way in which data is described by metadata, such
03:22
that data from different sources and tools can actually be linked and work together. And finally, the ultimate goal of the FAIR principles is that data is reusable: other researchers can work with data once it has been published, from another institute
03:42
for instance, and research is reproducible, which is a big issue in data-intensive fields right now. Okay, so the NFDI is currently made up of 19 consortia from engineering sciences,
04:03
natural sciences, humanities and social sciences, and life sciences. A third round of calls is currently under way, and by the start of next year there will be 30 consortia in total, so right now it's 19, then 30. One of these consortia is MaRDI, the Mathematical Research Data Initiative,
04:28
representing all of mathematics. Its mission is to create a robust infrastructure for mathematical research data and to set standards and confirmable workflows for certified data that is trusted
04:44
and validated, and also to provide services for the mathematical community and for the wider community. Its vision is to build a community that embraces a FAIR data culture and research look.
05:02
So, briefly, on how the MaRDI project is structured. There are four task areas from different subdomains of mathematics: computer algebra, scientific computing, statistics and machine learning, and also an interdisciplinary task area which works together with other consortia,
05:28
and these try to implement the FAIR principles in their respective subfields, which have different needs, different requirements and different kinds of data, and all their findings are going to be implemented in the portal
05:40
which we're building. Then there's also a task area for data culture and community integration and governance. So what is mathematical research data? It's not only numerical data, tables of numbers, which one might think of at first, but also mathematical expressions and formulas, mathematical models,
06:05
software, code, implementations of algorithms, 3D objects, visualizations and, of course, documents, and possibly much more. All of these have to be considered in this framework.
06:20
So the current situation is that there are already a lot of interesting and very powerful services and data repositories for different communities, for example OpenML, MORwiki and zbMATH, which are important, but they're not connected to each other. So they perform a service
06:43
for a certain community, but that's it. Each one sits in a silo, disconnected from the others. In order to really harvest the potential of digital data and digital data repositories, this project wants to link this knowledge and make it all accessible
07:06
also across services. So the portal that I'm talking about will be a unified gateway, a centralized solution through which we can access the data from all these different sources, and, like I said, for machines as well as for humans.
07:24
And then there are a couple of services that are planned to be implemented during this project, for example knowledge graphs, databases for numerical algorithms, a model database, a benchmark framework,
07:40
repositories for computer algebra and many more. There are also going to be external services, like Zenodo or the Digital Library of Mathematical Functions, which are going to be integrated into the portal. To describe how this can be useful, here is a very simple use case.
08:01
Say we want to start working on a numerical method, for example a linear solver, maybe to create a new variant of GMRES. Then we could go to the portal and explore all the existing variants of that algorithm. For example, the algorithm database would give us the algorithms,
08:25
their relationships; we get all the publications on the topic, software implementations, software environments, actual code within virtual containers, and maybe test and performance data. And then there are the actual data sets that we can work on,
08:42
data that has been certified, that we can trust and that we can use for benchmarks. We can see who the experts in the field are and whom we should contact if we start working on that topic. And then there are also going to be services provided by the community, the benchmark framework, for example,
09:01
which may in the future make it possible to remotely execute algorithms and workflows, and offer data storage. So on the technical side, the portal really consists of a knowledge graph, which is based on Wikibase
09:20
which is used by Wikidata, for example. It is well proven, widely used and highly scalable. It is also widely used by the other NFDI consortia that we're going to work with. This knowledge graph will only import metadata, so we're not going to import all the databases in the world
09:41
but try to get the metadata, leave the data in the actual repositories and link to it. And there are some requirements that we have to meet in order to account for mathematics. So there's an ontology of mathematical data: what are the relevant properties that mathematical objects have?
10:01
What are the data types? This has to be accommodated. There are going to be persistent identifiers so that the data is actually citable, and so that if the actual data is deleted, we can still access the metadata, and so forth. There are going to be advanced mathematical search functions
10:21
like formula search, which is already implemented; we're going to have a talk about this after mine. There will be advanced filtering for mathematical properties in the search. For example, we want only data sets of symmetric positive definite matrices
10:42
and we can filter for those. The knowledge graph is generated in an automated fashion from curated sources that we trust. So there's no need to enter data by hand; this is all automated, and in addition there are going to be APIs
11:00
for our partners so that they can also insert their data. Some more features: like I said, there's going to be a user-friendly interface and also the possibility to run advanced queries against the knowledge graph by means of the SPARQL query language.
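As a rough illustration of the property filtering and SPARQL access described above, the Python sketch below builds a query for data sets of symmetric positive definite matrices and wraps it in a GET request following the SPARQL 1.1 Protocol. It is a minimal sketch under stated assumptions: the endpoint URL, the base URIs and the identifiers `P1`, `P2`, `Q1`, `Q2` are hypothetical placeholders, not the portal's actual vocabulary; only the `wd:`/`wdt:` prefix convention is borrowed from Wikidata.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URIs and identifiers; the portal's real ones will differ.
ENTITY_BASE = "https://portal.example.org/entity/"
DIRECT_PROP_BASE = "https://portal.example.org/prop/direct/"
INSTANCE_OF = "P1"   # hypothetical property: "instance of"
HAS_PROPERTY = "P2"  # hypothetical property: "has matrix property"
DATASET = "Q1"       # hypothetical class: "data set"
SPD = "Q2"           # hypothetical item: "symmetric positive definite"


def build_filter_query(class_id: str, prop_id: str, value_id: str, limit: int = 100) -> str:
    """Select items of class `class_id` whose property `prop_id` has value `value_id`."""
    return (
        f"PREFIX wd: <{ENTITY_BASE}>\n"
        f"PREFIX wdt: <{DIRECT_PROP_BASE}>\n"
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:{INSTANCE_OF} wd:{class_id} .\n"
        f"  ?item wdt:{prop_id} wd:{value_id} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """Wrap the query in a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = build_filter_query(DATASET, HAS_PROPERTY, SPD)
req = build_request("https://query.example.org/sparql", query)  # hypothetical endpoint
```

Calling `urllib.request.urlopen(req)` would send the request; it is left out here because the endpoint is only a placeholder.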
11:20
There's going to be a machine-actionable interface. We're going to provide extensive documentation and best-practice guides, and the MaRDI knowledge graph is going to be integrated with the overarching knowledge graph of the entire NFDI. So there are going to be links
11:41
and interactions with the other disciplines. Finally, in the future there may be a distributed computing service, so that code, algorithms and workflows can also be executed directly through the portal and act on the data that's stored there.
12:04
Okay, that's it already. The expected launch date is the end of the year. The initial version is already available at this link, where you can track the progress, but until then, thank you for your attention. Thank you very much.
12:25
Now it's time for questions. Thank you very much for your presentation. You talked about data that you trust, which kind of raises the question: on what basis can you decide
12:42
what data to trust? Sorry, I think it's a bit of a provocative question. That's a question for the domain-specific task areas. I think they will provide criteria
13:01
or provide the actual data that they know they can trust. I don't know how they will decide that. Okay, thanks. It's a very important question, obviously. My question is on data,
13:23
the amount of data storage: as we heard in the presentation from Professor Rizzola, data on the order of petabytes. Is it anticipated that data of such size can also be stored on MaRDI?
13:41
No, because we're not going to store the actual data. We assume that data is already in some repository of some group or institution, and we just import the metadata, so the description of the data. We can find it because we have all the information
14:02
that describes the data: what is there, how it is stored, what its properties are, and also the link to the actual data and the information on how to get the data from the source. But we're not going to copy the data to our site. Okay, thanks.
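The answer above implies that a portal entry is a description of a data set, not a copy of it. A minimal sketch of such an entry follows, assuming field names, a placeholder DOI, example URLs and a size value that are all hypothetical; the portal's real data model is defined by its ontology, which the talk does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical portal entry: it describes a data set but does not contain it."""
    pid: str                                                  # persistent identifier, e.g. a DOI
    title: str
    properties: dict[str, str] = field(default_factory=dict)  # mathematical properties
    source_url: str = ""                                      # where the data actually lives
    access_method: str = ""                                   # how to obtain it from the source
    size_bytes: int | None = None                             # reported size; nothing is copied


def matches(record: MetadataRecord, **wanted: str) -> bool:
    """True if every requested property has the requested value."""
    return all(record.properties.get(key) == value for key, value in wanted.items())


record = MetadataRecord(
    pid="10.0000/example",  # placeholder DOI
    title="Sparse test matrices",
    properties={"matrix_property": "symmetric positive definite"},
    source_url="https://repository.example.org/datasets/42",
    access_method="HTTPS download",
    size_bytes=2 * 10**15,  # about 2 PB, which stays at the source
)
```

The record is searchable through its properties, while the petabytes themselves never leave the source repository.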
14:25
How are the metadata unified? You say the data are already in the repositories, so they might have different kinds of metadata. How is it then realized that you have this top level
14:42
and that you get all the information, or is it different when I look for information, find something, go there, and then it is different in every place? How is it managed? Well, some of the repositories, the ones that were on the slide, are also part of the MaRDI project.
15:01
So I guess they are also working on providing, or on using, metadata that interoperates and links. On the other hand, we have to, well, that's part of the
15:20
well, there are some universal languages that can be used to describe metadata, and also translators, so there's some flexibility. But I think one part is partners that we can work with, who can maybe be coerced to provide the data
15:40
in a form that we can use, and for the others, where we cannot, we just have to accept that and try to use the most universal way to describe them.
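One common way to reconcile different metadata vocabularies, in the spirit of the "translators" mentioned above, is a crosswalk that renames each repository's fields onto a shared schema. The sketch below assumes two hypothetical repositories, `repo_a` and `repo_b`, with made-up field mappings (`repo_b` borrows Dublin Core-style names purely for illustration); it is not how the portal actually harmonizes metadata.

```python
# Hypothetical field mappings from two repositories onto one shared schema.
CROSSWALKS: dict[str, dict[str, str]] = {
    "repo_a": {"name": "title", "doi": "pid", "url": "source_url"},
    "repo_b": {"dc:title": "title", "dc:identifier": "pid", "landingPage": "source_url"},
}


def to_common(repo: str, raw: dict[str, str]) -> dict:
    """Rename known fields to the shared schema; keep unknown ones under "extra"."""
    mapping = CROSSWALKS[repo]
    common: dict = {"extra": {}}
    for key, value in raw.items():
        if key in mapping:
            common[mapping[key]] = value
        else:
            common["extra"][key] = value  # kept, not dropped, so no information is lost
    return common


a = to_common("repo_a", {"name": "Test matrices", "doi": "10.0000/a", "license": "CC-BY"})
b = to_common("repo_b", {"dc:title": "Test matrices", "landingPage": "https://repo-b.example.org/1"})
```

Keeping unmapped fields under `extra` reflects the "accept and use the most universal description" approach for partners whose metadata cannot be changed.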
16:01
Yeah, a major problem when the data is stored elsewhere or distributed among all sorts of repositories is link rot. So, I don't know, say sometime in the 2030s MaRDI is live, it is well accepted, people are using it, and
16:21
maybe I'm looking for an interesting project someone did a decade ago, and the data is supposed to be on some university servers, but in the meantime the whole IT infrastructure over there has changed, and the data no longer exists or is somewhere else.
16:42
Are there strategies in place or planned for that sort of maintenance, to monitor disappearing data, or to automatically or manually chase down
17:01
changed locations, that sort of thing? Yeah, there are no actual details up to now, but we're going to implement data lifecycle management, so we're aware of that, and we're going to track whether the data is still available
17:22
or not as part of the plan. Also, sometimes it's clear from the beginning, for example with publishers, that they can guarantee how long the service is going to be available; in other cases it's not. But the idea is that at least
17:41
the metadata is always there and can be cited even if the data has been deleted. Of course, this has to be done in a controlled manner, not just with the data disappearing without letting people know, but yeah, that's going to be done.
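The lifecycle idea in this answer, tracking whether the data is still reachable while keeping the metadata citable, could look roughly like the sketch below. The `Entry` class, the status strings and the `is_reachable` callback are hypothetical; in practice the check might be an HTTP request against the source URL, which is passed in as a function here so the sketch runs without network access.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Entry:
    """Hypothetical portal entry with a lifecycle status."""
    pid: str
    source_url: str
    status: str = "available"
    last_checked: date | None = None


def check_entries(entries: list[Entry], is_reachable: Callable[[str], bool], today: date) -> list[Entry]:
    """Update each entry's status; metadata is never deleted, only flagged."""
    lost = []
    for entry in entries:
        entry.last_checked = today
        if is_reachable(entry.source_url):
            entry.status = "available"
        else:
            entry.status = "data unavailable"  # PID and metadata stay citable
            lost.append(entry)
    return lost  # candidates for notifying users or locating a moved copy


# Usage with a stand-in reachability check instead of a real HTTP request.
entries = [
    Entry("10.0000/a", "https://old-server.example.org/a"),
    Entry("10.0000/b", "https://repo.example.org/b"),
]
lost = check_entries(entries, lambda url: "old-server" not in url, date(2030, 1, 1))
```

Flagging instead of deleting is what lets the metadata remain citable after the data itself has disappeared.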
18:02
Okay, let's thank the speaker again. Thank you.