10 Years Later: The Mathematics Subject Classification and Linked Open Data - TIB AV-Portal

Related material

Conference on Intelligent Computer Mathematics (CICM)

Runnwerth, Mila

Formal metadata

Title

10 Years Later: The Mathematics Subject Classification and Linked Open Data

Author

Runnwerth, Mila

Contributors

0000-0002-1019-9151 (ORCID)

0000-0002-4957-5812 (ORCID)

Schubotz, Moritz

0000-0001-7141-4997 (ORCID)

License

CC Attribution - ShareAlike 3.0 Germany:
You may use and adapt the work or its content for any legal purpose, and copy, distribute, and make it publicly available in unchanged or adapted form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in adapted form, only under the terms of this license.

Identifiers

10.5446/54361 (DOI)

Publisher

Conference on Intelligent Computer Mathematics (CICM)

Release year

Language

Production year

2021

Production place

Timisoara, Romania

Content metadata

Subject area

Genre

Abstract

Ten years ago, the Mathematics Subject Classification MSC 2010 was released, and a corresponding machine-readable Linked Open Data collection was published using the Simple Knowledge Organization System (SKOS). Now, the new MSC 2020 is out. The new SKOS form includes explicit marking of the changes from 2010 to 2020, some translations of English code descriptions into Chinese, Italian, and Russian, and extra material relating MSC to other mathematics classification efforts. We also outline future potential uses for MSC2020-SKOS in semantic indexing and sketch its embedding in a larger vision of scientific research data.

Keywords

Mathematics Subject Classification

Linked Open Data

Knowledge Engineering

Transcript: English (automatically generated)

00:00

Thank you for the opportunity to give this presentation. We are a group of five people who are trying to bring the Mathematics Subject Classification into a Linked Open Data representation, and this is how we are going about it. I have to warn you that this is still work in progress.

00:22

We're not finished yet, and you will see what the challenges were. First, for those of you who are not familiar with the Mathematics Subject Classification: it is an indexing scheme that is used in libraries and in publishing houses,

00:41

and also in research institutions, to annotate mathematical information and literature. It is also used to derive metrics on scientific activity, for example. It is the most popular indexing scheme

01:03

in mathematics; I think it's the only one with this reputation. And it is updated every decade. So who edits this classification system? It is edited jointly by Mathematical Reviews, which

01:23

is located in the US, and by zbMATH, which we can call its European sister institution. And there is a little ad break right now, because zbMATH has become open this year.

01:44

And I would like to encourage you to check out the link. You can find these slides on Zenodo, and you can participate in zbMATH Open or take a survey. But back to the MSC now.

02:02

We are in the fourth decade of the MSC. The MSC has actually turned 30 this year, because it started in 1991. And since it is updated every 10 years, each decade,

02:22

we are now in the fourth generation of the MSC, which has grown over time. And what you can see here is that there is no single home for all the generations.

02:41

The first generation can be found on MSC 2010. But the second generation we already had to track down via Google, on some other mathematics website. Luckily, MSC 2010 is still very much findable

03:02

on the official channels. And MSC 2020 can be found on both MathSciNet and zbMATH, but only in human-readable form. And this is the core of this talk: we wanted to transfer the MSC from human-readable

03:21

into machine-readable form. But we were not the first to do this, and I will come to that in a moment. A few numbers and facts about the Mathematics Subject Classification: there are more than 6,000 classes altogether

03:43

in the MSC. We have three hierarchical levels. There are 63 top-level classes, which include, for example, algebraic geometry, graph theory, differential equations, and numerical analysis.

04:03

So, the top topics in mathematics. Then we have second-level classes, which number on the order of 1,000. And we have third-level classes, of which there are more than 5,000, which cover the really deep topics.
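The three levels described above can be read off the written form of a code. The following is a minimal sketch, assuming the customary notation (for example `68-XX` for a top-level class, `68Txx` for a second-level class, `68T05` for a third-level class) and treating hyphenated codes such as `68-01` as a separate "meta" group; the function names `msc_level` and `parent` are illustrative and not taken from the project.

```python
import re

# Assumed written forms of MSC codes; the level names are labels chosen for this sketch.
_PATTERNS = [
    ("top", re.compile(r"^\d{2}-[Xx]{2}$")),      # e.g. 68-XX
    ("meta", re.compile(r"^\d{2}-\d{2}$")),       # e.g. 68-01
    ("second", re.compile(r"^\d{2}[A-Z]xx$")),    # e.g. 68Txx
    ("third", re.compile(r"^\d{2}[A-Z]\d{2}$")),  # e.g. 68T05
]


def msc_level(code: str) -> str:
    """Return the level label for an MSC code, or raise ValueError."""
    for name, pattern in _PATTERNS:
        if pattern.match(code):
            return name
    raise ValueError(f"not a recognised MSC code: {code!r}")


def parent(code: str):
    """Return the code one level up, or None for a top-level class."""
    level = msc_level(code)
    if level == "top":
        return None
    if level in ("second", "meta"):
        return code[:2] + "-XX"
    return code[:3] + "xx"  # third level -> second level


if __name__ == "__main__":
    for c in ("68-XX", "68-01", "68Txx", "68T05"):
        print(c, msc_level(c), parent(c))
```

In a SKOS model, the result of `parent` would correspond to a `skos:broader` link from the narrower class to the broader one.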

04:25

There are also meta classes, for example for reference works or historical works. And new in this generation is that each top-level class is assigned a meta class on research data and software, which we are quite happy about.

04:44

If you are wondering how the MSC is structured, you can see two nice pictures here. A Wolfram project produced this visualization, which shows you that the structure varies quite a lot across the MSC, which depicts

05:06

the research activity, but also the evolution of a mathematical research topic. So in 2010, there was already

05:22

a SKOS version of the MSC, which we tried to copy and improve. And what did it do exactly? Its motivations were that you could

05:40

derive other machine-interpretable or human-readable versions, for example a LaTeX version or an HTML version, from the Linked Open Data version. That is still a bit of an ideal; it's not really possible right now.

06:03

Then they wanted to encourage people to build semantic applications. They wanted to do visualizations, which we saw earlier. One very important use case is mapping to other classifications; especially in libraries

06:23

we do this, because we mostly use one classification and try to map this one classification to other classifications. And I think it was meant to encourage an open maintenance process and an open editorial process.

06:44

What happened there? Well, they used SKOS Core, which is the obvious choice. But they also introduced an extension, the MSC vocab, because the MSC has its own relations,

07:02

which we see below this SKOS Core block: see also, see mainly, related part of, and, very interesting, see conditionally. If you only know the PDF version, for example, which is shown here (I don't know if you can see my mouse),

07:22

then you see these brackets, these very edgy braces. And these are among the relations that we try to cover in an MSC-specific way, because they do not correspond to the SKOS relations of the same name.

07:45

And what does it look like for Semantic Web aficionados? You see that we loaded this into Protégé. And Protégé was very unhappy with this 2010 version and reported a lot of errors.

08:02

And Protégé didn't recognize the standards that were loaded into this SKOS version, for example SKOS itself. And you see that in this matrix we have a lot of zeros, which showed us that something must be wrong. So what happened there?

08:23

One of the things that happened is that the URIs were misconstructed in some way. We found spaces, for example, which are not allowed in a URI. We also found that XML literals were not used very tidily, and we tried to repair that.
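A check for this kind of URI problem can be sketched in a few lines. The talk does not say how the URIs were actually repaired; percent-encoding is assumed here as one possible fix, and the example URI under `example.org` as well as the helper names `has_illegal_chars` and `repair_uri` are hypothetical.

```python
import re
from urllib.parse import quote, urlsplit, urlunsplit

# Characters that may not appear unescaped in a URI (whitespace and a few delimiters).
_ILLEGAL = re.compile(r'[\s<>"{}|\\^`]')

# Characters left untouched when re-encoding; "%" is kept so that
# already-encoded sequences are not encoded a second time.
_SAFE = "/%:@!$&'()*+,;="


def has_illegal_chars(uri: str) -> bool:
    """True if the URI contains whitespace or another disallowed character."""
    return bool(_ILLEGAL.search(uri))


def repair_uri(uri: str) -> str:
    """Percent-encode disallowed characters in the path and fragment (one assumed repair strategy)."""
    parts = urlsplit(uri.strip())
    path = quote(parts.path, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, fragment))


if __name__ == "__main__":
    broken = "http://example.org/msc/68 Txx"  # hypothetical URI with a space
    print(has_illegal_chars(broken))
    print(repair_uri(broken))
```

Whether to encode the space or to drop it is an editorial decision, because either choice changes the identifier that other datasets may already link to.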

08:46

So we tried to standardize and harmonize the situation a bit. However, there were really great things already in place, which follow Semantic Web best practices.

09:03

Each URI that could be resolved led to a web page that provided information on that specific URI. Here it is 68-XX, which is the top-level

09:20

class of computer science. And we could see all the triples that were given and everything that has to be extended, for example. Oh, sorry. For example, now I got lost.

09:41

Oh, this is a bad example. Yeah, we had to include RDF, for example, and XML. So what were the lessons learned from this first SKOS version?

10:01

There were two layers of lessons learned. First, for the editorial process, they standardized the relations within the MSC and separated the class names from the descriptions, which had kind of bled into one another.

10:20

And we had some lessons learned with respect to Linked Open Data standards. We fixed the identifiers. We changed the reification, which we'll see in a minute. The XML literals were tidied up. And we had to find the MSC vocab extension

10:42

because it had been lost. That was an indication for us that seemingly nobody had asked about this extension, without which the SKOS model wasn't workable at all. But we found it, and we tried to tidy it up as well.

11:02

So what did we do exactly? What is new? We have specific use cases. We didn't do it just for fun, but because we have use cases that are actually waiting for this version. One is automated indexing in libraries,

11:22

where we use Annif to do this. And Annif needs the classification in the form of a SKOS model. Then we have a nationwide, extensive classification mapping where every library classification and every terminology is mapped onto the others automatically,

11:43

which works more or less. Then we need a mathematical template for the Open Research Knowledge Graph. The Open Research Knowledge Graph tries to derive semantic web models for articles and conference contributions.

12:03

And we learned quite early that without a classification for each subject, this doesn't work very well. And we think that the MSC might be a big improvement there. Then we made the formula representation consistent.

12:24

MathML was mixed with LaTeX and so on, and we decided to use only MathML. We also tried to automate this whole process so that we don't have to type everything by hand. And we used OpenRefine, Protégé, and Excel for this.

12:45

And we didn't use the Python scripts that did the conversion from LaTeX to SKOS last time; instead, we tried to go from SKOS model to SKOS model. And one major thing that was asked of us

13:02

was to provide good license information. So we included a license in the Turtle file itself, and in the GitHub repository where it is now. But this can only be a provisional solution, because we need a proper home

13:21

to guarantee the transitions from one MSC model to the next. Thank you very much.
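The license statement in the Turtle file mentioned in the talk can be illustrated with a small sketch. It assumes the license is attached to the concept scheme with the Dublin Core property `dcterms:license`; the talk does not say which property or which license the project uses, so the scheme URI, the license URI, and the function name `scheme_header` below are placeholders.

```python
def scheme_header(scheme_uri: str, license_uri: str, title: str) -> str:
    """Build a minimal Turtle header for a SKOS concept scheme with a license.

    The title is inserted as-is, so it must not contain double quotes.
    """
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        "\n"
        f"<{scheme_uri}> a skos:ConceptScheme ;\n"
        f'    dcterms:title "{title}"@en ;\n'
        f"    dcterms:license <{license_uri}> .\n"
    )


if __name__ == "__main__":
    # Both URIs are placeholders, not the project's real identifiers.
    print(scheme_header(
        "https://example.org/msc2020/",
        "https://example.org/license",
        "Mathematics Subject Classification 2020",
    ))
```

Stating the license inside the data file means it travels with every copy of the model, independently of the repository that currently hosts it.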