
A learning-based approach to combine medical annotation results.


Speech transcript
Many of you want to construct knowledge graphs, which is a central theme of this conference, and my talk is about how we can improve the construction of such graphs with approaches for annotating documents, that is, for extracting the entities that are relevant for knowledge graph construction. This is joint work between the University of Leipzig and the Luxembourg Institute of Science and Technology. Let me first recall what annotation means.
The input of an annotation approach is a piece of text or a document, and we would like to identify entities in it, that is, text fragments such as a disease mention like "coronary artery disease". Once we have identified such entities, we would like to link them to concepts of an ontology. So this is the second part: given this piece of text, we link the recognized mention to a certain concept, in this case from the Unified Medical Language System (UMLS), which was already mentioned yesterday and which is a huge ontology; the same holds for the other disease mention.
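An annotation can thus be modeled as a link from a document fragment to an ontology concept, together with a confidence score. A minimal sketch, with an illustrative (not authoritative) UMLS identifier:

```python
from dataclasses import dataclass

# Hypothetical minimal representation of one annotation: a document
# fragment linked to an ontology concept, with a tool-reported score.
# The CUI value below is illustrative, not taken from the talk.
@dataclass(frozen=True)
class Annotation:
    doc_id: str      # document the fragment comes from
    fragment: str    # the annotated text span
    concept: str     # concept identifier, e.g. a UMLS CUI
    score: float     # tool confidence in [0, 1]

ann = Annotation("doc1", "coronary artery disease", "C1956346", 0.92)
print(ann.concept)
```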
So why do we need this? Especially in the biomedical domain we have a lot of documents. For instance, electronic health records capture the history of a patient, and applications produce further results based on them. We also have case report forms: if we want to conduct a clinical study, for example, we have to recruit patients, and therefore we define eligibility criteria that have to be satisfied by the participants of the study. Annotations make such documents comparable. Yesterday someone talked about clinical studies, and if you would like to compare clinical studies it is quite difficult to compare unstructured text; with annotations, you can. Likewise, to construct a knowledge graph you have to identify the entities of your domain and somehow formalize how these entities are related to each other. So now we know why we need annotations.
When I started my PhD I thought, OK, I will develop a new annotation tool that is better than the rest, because I was quite motivated. But after a few months I recognized that there are already several tools, and they are not so bad; they just produce results of different quality depending on the domain. It is also quite difficult to configure these tools, because each one has a lot of configuration possibilities. So the question became: how can we reuse the existing tools? The idea is to apply a set of tools and combine their results to get a final annotation mapping for a set of documents.
In our previous work, presented at this same conference in 2017, we proposed an approach where a set of tools annotates a set of documents with concepts from the Unified Medical Language System. Each tool generates an annotation mapping, which consists of pairs of a document fragment and a certain concept. On these sets of pairs we can apply set-based operations: the union, so that we take all annotations of all tools into the final result; the intersection, where we only keep annotations that all tools have identified; or a majority strategy, where we keep only those annotations that the majority of tools have identified. The result is again an annotation mapping for these documents.
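The set-based strategies just described might be sketched as follows, modeling an annotation mapping as a plain set of (fragment, concept) pairs; the concept identifiers are purely illustrative:

```python
from collections import Counter

# Sketch of set-based combination of annotation mappings. Scores are
# ignored here, which is exactly the limitation discussed next.
def combine(mappings, strategy="majority"):
    if strategy == "union":
        return set().union(*mappings)
    if strategy == "intersection":
        return set.intersection(*map(set, mappings))
    if strategy == "majority":
        counts = Counter(pair for m in mappings for pair in set(m))
        return {pair for pair, n in counts.items() if n > len(mappings) / 2}
    raise ValueError(strategy)

tool1 = {("q1", "C0011849"), ("q1", "C0020538")}
tool2 = {("q1", "C0011849")}
tool3 = {("q1", "C0011849"), ("q2", "C0027051")}
print(combine([tool1, tool2, tool3]))  # majority keeps only the pair all tools agree on
```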
What we observed, however, is that in some situations we get false negatives. When does that happen? If, for instance, only one tool identifies a certain correct annotation, and the combination strategy is the majority or the intersection, we lose this correct annotation. There are also cases where we get false positives, because we do not use the scores, that is, the confidence values generated by the tools: each annotation carries a certain probability of being correct, but if the majority of tools identify an annotation with only low confidence, it is still included in the final result. These are the issues of the set-based combination. The idea is therefore to utilize the scores generated by the tools, and to use a set of verified annotations to build classification models with which we can filter the correct annotations.

So now to the approach. We have a set of documents that are not annotated so far, and a set of tools, and we apply each tool to the documents to obtain the annotations per tool. Based on these results we draw a number of the generated annotations that should be verified by a domain expert. Here we consider the ratio of positive and negative examples, so that we draw enough annotations to obtain a balanced training data set: for instance, with a ratio of 50 per cent we require 50 per cent correct and 50 per cent incorrect annotations. Once we have this training set, we build an annotation vector for each verified annotation; the vector represents the confidence values of the individual tools, and it can be labeled as correct or incorrect. The set of labeled annotation vectors is then used to train classification models such as support vector machines, decision trees, random forests, or neural networks. Now I am able to use such a model to classify annotations that have not been verified, and the prediction tells us whether an annotation is correct or not.
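The balanced drawing of annotations for expert verification described above can be sketched as follows; `draw_balanced` and its parameters are hypothetical names, and the expert's verdicts are simulated by toy labels:

```python
import random

# Sketch: from a pool of tool-generated annotations with expert verdicts,
# draw `size` items so that roughly `pos_ratio` of them are correct.
# In practice the expert labels the drawn sample; here labels are given.
def draw_balanced(labeled_pool, size, pos_ratio=0.5, seed=42):
    rng = random.Random(seed)
    pos = [a for a, ok in labeled_pool if ok]
    neg = [a for a, ok in labeled_pool if not ok]
    n_pos = min(int(size * pos_ratio), len(pos))
    n_neg = min(size - n_pos, len(neg))
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)

pool = [(f"ann{i}", i % 3 == 0) for i in range(30)]  # toy pool, 10 correct
sample = draw_balanced(pool, size=10)
print(len(sample))  # 10
```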
What do these annotation vectors look like? In our example we have pieces of text from the documents and three tools. For one piece of text, tool 1 identified a concept as an annotation; for another piece of text, two of the three tools identified concepts. We now transform these intermediate results into vectors: for each pair of a concept and a certain document fragment we build a vector, where each entry represents the confidence of one tool. For instance, if a concept is recognized by tool 1 as well as by tool 2, the vector entries are the score 1.0 for tool 1 and 0.86 for tool 2, while the entry for tool 3 is 0; in the other case the entries are 1 and 0.6. To compensate for the influence of a single tool, we additionally include a basic score, which can be a basic string similarity measure such as soft TF-IDF between the fragment and the concept name. Now we have an annotation vector that can be labeled: this annotation is correct, or this annotation is incorrect.
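Building one annotation vector from the per-tool scores plus a tool-independent base score might look like this sketch; `difflib`'s ratio stands in for the soft-TF-IDF similarity used in the talk, and the tool names and scores are illustrative:

```python
import difflib

# Sketch: one vector per (fragment, concept) pair -- one entry per tool
# (the tool's confidence, or 0.0 if the tool missed the pair), plus a
# tool-independent string-similarity base score.
def annotation_vector(fragment, concept_label, tool_scores, tools):
    vec = [tool_scores.get(t, 0.0) for t in tools]  # per-tool confidences
    base = difflib.SequenceMatcher(None, fragment.lower(),
                                   concept_label.lower()).ratio()
    return vec + [base]

tools = ["MetaMap", "cTAKES", "own"]
v = annotation_vector("coronary artery disease",
                      "Coronary Artery Disease",
                      {"MetaMap": 1.0, "cTAKES": 0.86},
                      tools)
print(v)  # third entry is 0.0 because the third tool missed the pair
```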
With such a set of labeled annotation vectors we are able to build classification models. In the vector space, as shown here, you can separate the correct annotations from the incorrect annotations by a hyperplane, and afterwards we can classify annotations that have not been verified, like this one here, and decide whether the annotation is correct or not.
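As a stand-in for the SVM and random-forest models actually used, a minimal pure-Python perceptron illustrates the hyperplane idea on toy labeled vectors (all numbers below are made up for illustration):

```python
# Sketch: learn a linear separator (hyperplane) over labeled annotation
# vectors, then classify an unverified vector. A perceptron is used only
# as a simple stand-in for the models named in the talk.
def train_perceptron(X, y, epochs=100, lr=0.1):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                  # yi is +1 (correct) or -1
            act = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * act <= 0:                     # misclassified: update
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

# toy labeled vectors: [tool1, tool2, tool3, base-similarity]
X = [[1.0, 0.9, 0.8, 0.95], [0.9, 0.0, 0.7, 0.9],
     [0.2, 0.0, 0.0, 0.3],  [0.0, 0.1, 0.0, 0.2]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(predict(w, b, [0.95, 0.85, 0.0, 0.9]))  # classify an unverified vector
```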
That was the approach; now we come to the evaluation. In our case we have a set of documents containing eligibility criteria and quality assurance forms, which look like questionnaires, and the task was to annotate each question with the concepts contained in it. In our experimental setup we applied the following tools: MetaMap, which is quite famous in the biomedical domain, cTAKES, and our own previously developed annotation tool. We chose the following parameters: for the ratio between the number of positive and negative examples we used 20, 30, 40 and 50 per cent, and we also varied the sample size, that is, the number of annotations to be verified, between 50, 100 and 200. As the basic score we used soft TF-IDF string similarity.
Now to the results. In this figure you see the recall on the Y axis and the precision on the X axis, and each point characterizes one configuration: we did not just have three tools, but each tool with different configurations, because MetaMap as well as cTAKES offer a lot of configuration possibilities. What we observed is that with a higher share of positive examples, for instance 50 per cent positive examples in the training data, we get a higher recall but a lower precision; in the opposite case, with a low share of positive examples, we get a higher precision but a lower recall compared to the other ratios.
Next, the results for the different sample sizes. What we observed is that we already get good results for a low number of verified annotations, but there is a slight increase if we use more training data; this is plotted here for the random forest, averaged over the different ratios and configurations. We also investigated different classification models, such as the SVM, the random forest and decision trees, and we did not observe a lot of difference between them in this case.
This slide summarizes the evaluation: a comparison between the different single tools, whose quality we would like to improve, and the different combinations, and we can observe that we are able to do so. This was the result of cTAKES alone on the datasets, without any optimization; here we used the selection approach from our previous work, which I did not explain in detail, and it already yields an improvement; we also get an improvement if we apply the set-based combination approach, but we get a higher improvement with the machine-learning approach. So indeed we are able to improve the results of the single tools by using a combination, and especially by using machine-learning approaches.
To conclude: we proposed a machine-learning-based combination of annotation mappings generated by different tools. To this end we generate annotation vectors based on the scores computed by each tool, and the results show that we can improve the annotation quality compared to the set-based combinations and the single-tool results. For future work we would like to consider different similarity measures to extend the vectors, which seems a promising direction. As also mentioned, we want to use active learning techniques, so that we can easily extend our training data and improve our models; for that we need techniques that allow the user to verify annotations easily and fast, so that we can use the results of this validation to improve our classification models for annotating medical documents. Thank you for your attention.

Metadata

Formal metadata

Title A learning-based approach to combine medical annotation results.
Series title Data Integration in the Life Sciences (DILS2018)
Author Christen, Victor
Cardoso, Silvio Domingos
Contributors Lin, Ying-Chi
Groß, Anika
Pruski, Cédric
Da Silveira, Marcos
Rahm, Erhard
License CC Attribution 3.0 Germany:
You may use, adapt and reproduce the work or its content in changed or unchanged form for any legal purpose, and distribute and make it publicly available, provided you credit the author/rights holder in the manner they have specified.
DOI 10.5446/38608
Publisher Technische Informationsbibliothek (TIB)
Publication year 2018
Language English
