ANTELOPE: open source service for accessible semantic annotation in GLAM
Formal Metadata

Title: ANTELOPE: open source service for accessible semantic annotation in GLAM
Number of Parts: 15
License: CC Attribution - ShareAlike 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content, also in adapted form, is shared only under the conditions of this license.
Identifiers: 10.5446/69648 (DOI)
Transcript: English (auto-generated)
00:05
I will talk about our tool Antelope. Antelope is a tool that helps researchers with the task of semantically annotating their data. It is funded mainly by NFDI4Culture, but we expect it to have an impact in other domains of knowledge that face the same tasks.
00:33
We had a look at our own projects: when we wanted to do semantic annotation, we used a lot of specialized tools for linking entities, searching in terminologies, and transferring data.
00:50
And I thought about an image of how these pipelines and workflows looked, and I came up with this picture.
01:02
I think many of the tools we use today that apply machine learning or natural language processing, for example, are very specialized tools. They do their job well, and often you have many possibilities to fine-tune the methods they use.
01:22
So these are really great tools. The problem is that most of these single tools are very complex. And especially if we have interdisciplinary projects, for example, in our case, art historians who want to use machine learning to tag their data, or library staff,
01:45
there is a high technical barrier for these people to use such specialized tools. Lowering it would allow them to make more use of their data, annotate that data, and benefit from the results.
02:00
But that technical barrier is currently, from our point of view, too high, and that was the motivation for building Antelope. And one could say that with high technology, the tools are by nature more complex and less usable.
02:22
But there are examples to the contrary: the second picture here, a smartphone, is high technology that is usable by nearly everyone. If you gave a smartphone to a person who had never used one before, he or she would figure out the main features in a short time.
02:44
But that also means that we have to hide functionality and focus on user interfaces as well as workflows. For example, one of our intentions was not to break the workflow of researchers when they do semantic annotation. We don't want them to have to leave their research environment, use another tool, or open another web page just to reach a specialized tool.
03:09
And that's why we created Antelope: to bring all, or most, of the functionality that people use when they do semantic annotation into a single API and a single front end that acts as middleware to access all this functionality.
03:25
In our case, we use Wikibase a lot in our projects, and we have a lot of tools that enhance Wikibase or make it easier to use. For example, we just created a Wikibase deployment pipeline if you want to host it yourself.
03:42
Antelope is one of these services: it can be used standalone, but it will also be used within Wikibase in our projects to get semantic annotation functionality for your data out of Wikibase.
04:02
We provide for Antelope a web-based front end, which is mainly there to get in touch with Antelope's functionality. You can use it to annotate some data, but if you have a lot of data, want bulk processing, and want to integrate it really into your workflow, I think you wouldn't use the web front end we created.
04:22
You would use either the API, or the widgets we are now creating that can be integrated into your own research software. But the main functionality can be explored via this web page, and there are three main functions. The first one, on the left side here, is terminology search: searching for
04:42
the correct terminology or ontology for my project, and searching for terms within these terminologies. The second one is entity linking in texts, or text classification. And the third one is entity linking or classification in images.
05:01
Let's have a look at the first one, terminology search. Here you can search for terms, and you can either define your own dictionary or terminology to search within, or use our preselected datasets.
05:24
For example, you can specify that you want to search within GND or Iconclass, and we also have access to the TIB terminology service and all the terminologies in it. Then we do a search for the term, and we not only search for the term, we also create a graph out of it.
05:46
So what you see on the right side is that we create the class hierarchy. For example, if we search for Mona Lisa, we get a lot of entity results from Wikidata or from DBpedia, but just by looking at the Q number, we do not know: is this Mona Lisa the musical, or Mona Lisa the painting?
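The disambiguation via class hierarchies described here could be sketched roughly as follows. The candidate records, their IDs, and the field names are invented for illustration; they are not actual Antelope output.

```python
# Candidate entities returned for a label search, each with the class
# hierarchy resolved for it (IDs and data are purely illustrative).
candidates = [
    {"id": "Q-painting", "label": "Mona Lisa", "classes": ["painting", "work of art"]},
    {"id": "Q-musical", "label": "Mona Lisa", "classes": ["musical", "work of art"]},
]

def filter_by_class(cands, wanted_class):
    """Keep only candidates whose class hierarchy contains wanted_class."""
    return [c for c in cands if wanted_class in c["classes"]]

paintings = filter_by_class(candidates, "painting")
print(paintings[0]["id"])  # → Q-painting
```

Showing the class hierarchy next to each hit is what lets a user (or a script like this) pick the right entity without opening every Q number by hand.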
06:04
And that's essential for working with the entity, and that's the information our function provides here. So this is term search and term identification. The second function is entity linking in texts. For example, if you have a museum which has a lot
06:26
of full text about objects, you may want to link entities from your controlled vocabulary to these texts. This is what we provide here, using word embedding models. One option is to link entities exactly as written, but we can also define a semantic distance that should be applied.
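A toy sketch of embedding-based linking with a distance threshold; the two-dimensional vectors and the threshold value are invented for illustration, while the real service uses pretrained word embedding models:

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def link_entity(token_vec, vocab_vecs, max_distance=0.4):
    """Return the vocabulary label nearest to token_vec,
    or None if it lies outside the allowed semantic distance."""
    best, best_sim = None, -1.0
    for label, vec in vocab_vecs.items():
        sim = cosine(token_vec, vec)
        if sim > best_sim:
            best, best_sim = label, sim
    return best if (1 - best_sim) <= max_distance else None

# Toy "embeddings": the token for tomato sits close to vegetable.
vocab = {"vegetable": [0.9, 0.1], "philosophy": [0.0, 1.0]}
print(link_entity([0.95, 0.05], vocab))  # → vegetable
```

Setting `max_distance` to zero would only link exact matches; widening it admits semantically near terms, which is the trade-off the talk describes.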
06:52
So here, for example, a point is linked to a sensitive point, or a tomato is linked to
07:04
a vegetable. So you don't have the exact word, but within the semantic distance you get the same or a nearby meaning. You can also use these functions not only to identify word-level entities; you could also
07:23
use them to link full texts to the entities in your terminology or dictionary. For example, in a philosophy project, we linked paragraphs of philosophical texts to entities.
07:44
And in this dictionary, there were 5,000 words organized into 50 subclasses, such as ancient philosophy or realism. Then, for example, you can display a word cloud showing how many links this text gets from a specific category.
08:07
So this is, for example, a use case where you can use entity linking to display a visualization that gives users an insight into what a text is about, or which knowledge domain it belongs to, without using a machine learning model to interpret the content of the text.
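The word-cloud idea boils down to counting, per dictionary category, how many entity links a text received. A minimal sketch with made-up links (the words and categories are invented, loosely following the philosophy example):

```python
from collections import Counter

# Hypothetical entity links found in one text: (surface word, dictionary category)
links = [
    ("being", "ancient philosophy"),
    ("form", "ancient philosophy"),
    ("perception", "realism"),
    ("substance", "ancient philosophy"),
]

category_counts = Counter(cat for _, cat in links)

# Relative weights per category, usable as word-cloud sizes.
total = sum(category_counts.values())
weights = {cat: n / total for cat, n in category_counts.items()}
print(weights)  # → {'ancient philosophy': 0.75, 'realism': 0.25}
```

No model interprets the text here; the machine learning only proposed the links, and the visualization is plain counting, which is exactly the kind of retained control the talk argues for.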
08:29
And this is, I think, a good technique for using machine learning without handing too much control to the machine.
08:43
And the last feature is nearly the same thing, but for images rather than text. For example, you can use it to identify Iconclass entities within an image, or you can use your own dictionary, for example to let the system decide whether it is an oil painting or a pen drawing.
09:10
Maybe you ask: is it more about architecture, or more about landscapes? So you can use it for broad classification, but also for fairly detailed entity linking.
09:23
And you can also use more information. For example, if you have descriptions of the image, you can use the image and description in combination to get a scoring of the entities. And we suggest using this in semi-automated processes.
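Combining an image classifier's scores with scores from the textual description could look like the weighted blend below; the labels, score values, and the weighting scheme are all assumptions for illustration, not the service's actual scoring method.

```python
def combined_score(image_scores, text_scores, alpha=0.6):
    """Blend per-label scores from an image classifier and a description
    matcher; alpha weights the image side, (1 - alpha) the text side."""
    labels = set(image_scores) | set(text_scores)
    return {
        label: alpha * image_scores.get(label, 0.0)
               + (1 - alpha) * text_scores.get(label, 0.0)
        for label in labels
    }

img = {"oil painting": 0.7, "landscape": 0.5}   # from the image model
txt = {"oil painting": 0.9}                     # from the description match
scores = combined_score(img, txt)
best = max(scores, key=scores.get)
print(best)  # → oil painting
```

In a semi-automated workflow, the ranked `scores` would be shown to a researcher as suggestions rather than written to the record automatically.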
09:42
Normally, the best result is not always in first place among the suggested entities, but in a semi-automated process a researcher can take the machine's suggestions and get to results faster than with fully manual entity linking.
10:06
And all this functionality is also provided through a programming interface, an API. The front end you see on the web page uses this API. That means the whole service can be integrated into any research software that has access to the web.
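Integrating the service from your own software would amount to ordinary HTTP calls. The sketch below only builds such a request without sending it; the base URL, endpoint path, and payload shape are hypothetical, so consult the actual Antelope API documentation for the real ones.

```python
import json
from urllib.request import Request

def build_annotation_request(text, base="https://antelope.example.org"):
    # Hypothetical endpoint and payload; the real API paths and
    # parameters may differ.
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{base}/entity-linking",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_annotation_request("The Mona Lisa hangs in the Louvre.")
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` (or any HTTP client) is all a research tool would need, which is why the front end itself is just one consumer of the same API.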
10:25
And the whole code is also open source, so you could host it yourself. The machine learning functionality is not developed by us; we connect external services that specialize, for example, in image
10:42
recognition or text entity linking, and bring them together in one place. And when we look at this text entity linking functionality, I think what is essential when we use this kind of technology is transparency.
11:00
Because if we use it to extract categories from texts and visualize for the user what a text is about, we need to look at which models we use and what these models actually do. So in this project, we have this dictionary of 5,000 words categorized into 50 subcategories.
11:25
And then we used Antelope to create an AI model view: we visualized, in the AI model, the 5,000 words of this dictionary.
11:41
And we asked the model, via Antelope, what the semantic distance between these words is from its point of view. But we colorized the words here as they were in the original dictionary. So blue is all ancient philosophy, for example, and red is another knowledge domain.
12:01
And here you see that there are gaps between words. There are word clusters which, from the model's point of view, are very close to each other. And you can see that some words that are distinguished in the original dictionary cannot be distinguished by the trained AI model.
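This model check can be sketched as comparing the model's distances against the dictionary's categories: if the centroids of two categories sit closer than some threshold in the embedding space, the model cannot separate them. The toy vectors, category names, and threshold below are invented for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy word embeddings, grouped by their original dictionary category.
by_category = {
    "ancient philosophy": [[0.1, 0.2], [0.2, 0.1]],
    "realism": [[0.15, 0.18]],   # the model places this inside the first cluster
    "aesthetics": [[0.9, 0.8]],
}

def indistinguishable(cats, threshold=0.2):
    """Flag category pairs whose centroids the model places closer
    than the threshold, i.e. groups the model cannot separate."""
    names = list(cats)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if euclid(centroid(cats[a]), centroid(cats[b])) < threshold:
                flagged.append((a, b))
    return flagged

print(indistinguishable(by_category))  # → [('ancient philosophy', 'realism')]
```

A flagged pair is exactly the situation the talk describes: if the research question requires separating those categories, this model is the wrong choice.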
12:28
And it would be a clear decision not to use this model if the research question requires that this group of words be distinguished. So these are some of the features we get when we use Antelope.
12:43
This visualization, for example, is part of our Wikibase for Research project. As an outlook, we will further integrate all of our projects. Currently, Antelope is a standalone service that can be used together with Wikibase.
13:05
But now we are creating a widget, a full integration of Antelope into Wikibase, which can be used within Wikibase. And we are also bringing it into the new NFDI terminology service to make it even easier to work with these tools.
13:26
So thank you very much. If you are interested in using the service, have a look at the web page, and also at our Wikibase for Research project, which makes it easier to work with your own Wikibase instance and bring it up within a few minutes.
13:45
Thank you very much. Thank you, Kolya, for your talk. As far as I can see, there are no questions yet in the chat.
14:01
To connect to the first talk about machine learning: do you also share the machine learning models and datasets you are creating in the project? Yes, we are using public models, so we don't train the models ourselves.
14:23
Because we think that what people need is this: they start with common models, for example trained on Wikipedia data or whatever, but in the end they need custom-trained models for their specific domain of knowledge. What we are currently doing is working together with some researchers, for example in philosophy, to create use cases
14:49
and to enable them to create their own models for their knowledge domain. And next year, we will provide a new version of Antelope where you
15:00
can bring your own model in and use the Antelope UI and API with your own models, or with a model that was trained in your field of knowledge. Okay, thank you. Now there's a question in the chat.
15:20
Will the Antelope integration also be available in Wikibase Cloud at some point in the future? That's a good question; maybe. As we are not the developers of Wikibase Cloud, we are in touch with the whole Wikibase community, but it's not a development we can do ourselves.
15:43
So at first, it will be available in self-hosted instances. But we make that easier with Wikibase for Research, which you can use to install it within minutes. You just mark in a file that you want the Antelope widget installed, and it will be installed automatically.
16:08
Okay. We still have some time. Are there more questions? Otherwise, I will ask a question that probably won't surprise you, because I already asked it in another context.
16:22
Are you planning to support the entity reconciliation API developed in the W3C community group for terminology lookup and entity linking? Currently, we are not supporting it actively, but we are looking at it with a lot of interest.
16:40
I think there are a lot of activities out there that see the same problems and are searching for solutions. And from our point of view, it's great to combine these activities, or at least share information. I think in the end we won't have one activity, one pipeline, or one program that solves all of our use cases and problems.
17:04
So we will have several solutions that may differ slightly in their user groups or workflows. But I'm a big fan of aligning these activities and sharing information to get the best results out of it.