Crowdsourcing Scholarly Discourse Annotations
Formal Metadata
Title |
Author |
License | CC Attribution 3.0 Germany: You may use, adapt, copy, distribute, and make the work or content publicly available in unchanged or adapted form for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
Identifiers | 10.5446/52370 (DOI)
Publisher |
Publication year |
Language |
Content Metadata
Subject area |
Genre |
Transcript: English (automatically generated)
00:02
Hello, my name is Allard Oelen and I'm a PhD student from Hanover, Germany. Today I will present "Crowdsourcing Scholarly Discourse Annotations". Let's get started. Scholarly communication is largely document-based. For example, when reading articles we are often faced with PDF files.
00:23
While document-based communication generally works fine for humans since it's created for human consumption after all, machines cannot easily parse the contents. Or in other words, the articles are not machine actionable. And one of the consequences is that finding relevant articles becomes more cumbersome
00:41
as more articles are getting published. In this example, more than a million results are displayed in Google Scholar when looking for user interfaces based on machine learning. When knowledge graphs are used, it is possible to create paper descriptions that can be understood by machines. This makes it possible for machines to understand the semantics or the meanings of the article
01:04
and, for example, provide better support for finding articles. In this example, a paper is displayed. In grey, you see the metadata of the paper and in red some of the contents, in this case an annotated sentence with its respective discourse class.
01:23
Having structured paper data has many benefits. However, creating this data is not that straightforward. Broadly speaking, there are two approaches. The first is to automatically generate these descriptions, for example using natural language processing techniques. But this is currently not very accurate and often requires domain-specific models
01:44
and they are not always available. The second is the manual creation of structured data, for example using crowdsourcing. Compared to automated techniques, the accuracy will be higher, but since human labor is involved, scalability is an issue.
02:00
Also, it requires skilled users to make decisions regarding the modeling of the data. In this research, we suggest a hybrid approach to generate these structured paper descriptions using a paper annotation system. This uses both machine intelligence and human intelligence and it tackles the weaknesses of the techniques when they are employed in isolation.
02:22
It even provides a synergy in the form of an intelligent user interface. To create such a hybrid system, we will answer the following research questions. How to design an intelligent user interface to populate a scholarly knowledge graph using crowdsourcing? And a second question, how to employ a machine-in-the-loop approach to assist users in this process?
02:44
Let's now discuss the system design. This is an overview of the system design. The interface is displayed in the middle and the use cases for generating structured data on the left and for using the structured data on the right. The workflow is as follows. A user or a researcher uploads a paper in PDF format to the system.
03:04
Afterwards, the user annotates key sentences within the paper. Each sentence is annotated with a discourse class describing what the sentence is about. Based on the annotations, a knowledge graph is created. Let's now have a look at the individual components.
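The workflow described above, from sentence annotations to a knowledge graph, can be sketched as follows. This is a hypothetical illustration only: the predicate names (`hasAnnotation`, `hasText`, `hasDiscourseClass`) and the node-naming scheme are invented for this sketch and are not the system's actual data model.

```python
# Hypothetical sketch: turning (sentence, discourse class) annotations
# into simple subject-predicate-object triples for a knowledge graph.
# All predicate and node names are illustrative, not the real schema.

def annotations_to_triples(paper_doi, annotations):
    """Convert (sentence, discourse_class) pairs into triples."""
    triples = []
    for i, (sentence, discourse_class) in enumerate(annotations):
        node = f"{paper_doi}#annotation-{i}"
        triples.append((paper_doi, "hasAnnotation", node))
        triples.append((node, "hasText", sentence))
        triples.append((node, "hasDiscourseClass", discourse_class))
    return triples

triples = annotations_to_triples(
    "10.1234/example",
    [("We propose a hybrid annotation system.", "Contribution")],
)
```

Each annotation becomes a small subgraph hanging off the paper node, so downstream use cases (search, abstract generation) can query by discourse class.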
03:21
First, let's look at the data entry use cases and specifically the paper submission use case. This describes a scenario where researchers can annotate their papers during the paper submission process. This could, for example, be during the camera-ready submission. In this case, the author is of course familiar with the paper's content
03:41
and can therefore make good decisions on which sentences are most important and therefore which sentences should be annotated. Now let's have a look at the user interface. In the interface, you'll see the article is displayed on the right side and the annotations with their discourse classes on the left.
04:01
Now we'll discuss the three machine-assistance components, starting with the automatic sentence highlighting. The first smart component automatically highlights potentially important sentences in the paper. For that, the paper is first automatically summarized using an extractive summarization tool.
04:20
The resulting summary is split into sentences, and each of those sentences is highlighted in the PDF file. Here you can see the highlighted sentences within the paper. And here you can see a button to disable the highlights in case they are not helpful. Highlights only serve as suggestions and it's therefore not mandatory to use them.
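A minimal, stdlib-only sketch of frequency-based extractive summarization, standing in for the summarization tool the talk mentions (which is not named): score each sentence by the summed document-wide frequency of its words and keep the top-scoring ones in their original order.

```python
# Naive extractive summarization: a stand-in sketch, not the actual
# tool used by the system. Sentences are scored by summed word
# frequency over the whole text.
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z]+", s.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original reading order of the selected sentences.
    return [s for s in sentences if s in top]
```

The selected sentences would then be located and highlighted in the PDF viewer.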
04:42
Next, the maximum sentences per annotation. This component counts the number of selected sentences for a single annotation. A warning is displayed if more than two sentences are selected. This encourages users to only select key sentences and not to annotate whole paragraphs. Also, this warning is only a suggestion.
05:02
It's still possible to annotate more than two sentences. Next, the automatic class suggestions. Once a sentence is selected, a set of potentially relevant classes are displayed. These classes are determined using a zero-shot classifier and are based on the contents of the selected sentence.
05:20
Also here, in case the suggestions are not helpful according to the user, they can be ignored. Now let's have a look at the human role in this process. A user, of course, has to select the sentences within the article and then choose the relevant discourse class. They can choose from 25 classes which come from the Discourse Elements Ontology, or DEO.
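The class-suggestion step can be sketched like this. The real system uses a zero-shot classifier over the selected sentence; here a simple keyword-overlap heuristic stands in for it, and both the class names and their cue words are illustrative examples, not the actual 25 DEO classes or any trained model.

```python
# Sketch of sentence-based class suggestion. A stand-in heuristic:
# the real system uses a zero-shot classifier. Classes and cue words
# below are illustrative only.
CUE_WORDS = {
    "Motivation": {"problem", "challenge", "need", "important"},
    "Methods": {"method", "approach", "algorithm", "procedure"},
    "Results": {"results", "found", "showed", "score"},
}

def suggest_classes(sentence, top_k=2):
    tokens = set(sentence.lower().replace(".", "").split())
    scored = [(len(tokens & cues), cls) for cls, cues in CUE_WORDS.items()]
    scored.sort(reverse=True)
    # Only suggest classes with at least one matching cue word.
    return [cls for overlap, cls in scored[:top_k] if overlap > 0]
```

As in the interface, the suggestions are optional: the user can always pick from the full class list instead.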
05:44
Finally, there are multiple use cases related to how to use the generated knowledge. This includes an improved search system or automatic generation of paper abstracts. Now we will look at the demonstration of the interface. This is the annotation interface.
06:02
The first thing we do is to upload a paper in PDF format. Then we can start annotating a sentence by making a selection. We can either select a type from the list of all classes, or we can choose one of the suggested types.
06:22
In this case, the class problem statement seems to be right. After adding the annotation, it is displayed in the left sidebar. The completion bar indicates if there are at least two annotations for the five most important classes. We can enable the smart sentence detection, which will highlight potentially interesting sentences throughout the article.
06:50
A highlighted sentence can be annotated by clicking on it and then selecting an appropriate class. In case more than two sentences are selected for a single annotation, a warning will be displayed.
07:08
This warning informs the user that it is better to annotate a maximum of two sentences. Finally, the result can be saved. Now I will discuss the evaluation of this research.
07:23
The main goal of the evaluation was to determine the task feasibility and see whether researchers are willing to annotate their papers with this system, for example during paper submission. In total, 23 researchers were part of the evaluation. They were asked to annotate a paper they authored themselves, or to annotate a recently read article.
07:44
Afterwards, they were asked to fill out a questionnaire. The questionnaire consisted of three parts. First, to assess the attitudes towards the task itself and the smart system components. Secondly, a system usability evaluation was conducted to evaluate usability.
08:00
And finally, the task load was evaluated using the NASA Task Load Index. Now the results. The System Usability Scale results show a score of 76, which is considered good. The NASA Task Load Index was 35, which is low compared to a meta-analysis related to this index.
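For context, a System Usability Scale score is computed from a respondent's ten Likert answers (1 to 5) with the standard SUS formula: odd items contribute (answer − 1), even items contribute (5 − answer), and the sum is scaled by 2.5 onto a 0-100 range. A minimal sketch:

```python
# Standard SUS scoring for one respondent's ten 1-5 Likert answers.
def sus_score(answers):
    assert len(answers) == 10, "SUS has exactly ten items"
    total = 0
    for i, a in enumerate(answers, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (a - 1) if i % 2 == 1 else (5 - a)
    return total * 2.5  # scale the 0-40 sum to 0-100
```

A maximally positive respondent (5 on odd items, 1 on even items) scores 100; neutral answers throughout score 50.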
08:22
And in this case, the lower the better. Finally, here are some results of the participants' attitude towards the interface. For example, most participants are willing to annotate a paper during the camera-ready submission. Interestingly, they were mostly positive about the integration of smart technologies in the interface.
08:43
Even though they indicated that both the smart type suggestions and the sentence highlighting tools were not always providing helpful suggestions, they still appreciated the integration. Finally, the conclusion. We conclude that machine-assisted paper annotation is a feasible task.
09:01
Researchers are willing to annotate their papers. Also, they appreciate the smart elements. The other evaluation results show the usability of the system is good and the workload is acceptable. The limitations are mainly related to the evaluation. For this, more participants should be included from more diverse backgrounds.
09:22
This is also something we are addressing in future work. Additionally, we will use other NLP tools, possibly supported by humans, to further process the annotated sentences. This was the presentation. Thank you for your attention and please let me know if there are any questions.