
A learning-based approach to combine medical annotation results.


Many of you construct knowledge graphs, so a relevant question for this conference is how to improve knowledge graph construction, and in particular the approaches for annotating documents to extract the entities that are relevant for building the graph. This is joint work with the Luxembourg Institute of Science and Technology and the University of Leipzig.

As a reminder about annotation: the input of an annotation approach is a piece of text or a document, and we would like to identify entities in this text, for instance a disease mention such as coronary artery disease. Once we have identified these entities, we would like to link them to concepts of an ontology. So the output of the annotation step is that this piece of text is linked to a certain concept, in this case a concept of the Unified Medical Language System (UMLS), which was already mentioned yesterday and which is a huge ontology, and the same for the other disease mentions.
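The result of this step can be pictured as a small record per annotation. A minimal sketch, with illustrative field names and an illustrative concept identifier:

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    """A document fragment linked to an ontology concept, together with the
    annotating tool's confidence (field names are illustrative)."""
    fragment: str
    concept_id: str   # e.g. a UMLS concept unique identifier (CUI)
    score: float

# Illustrative example; the CUI is a placeholder, not taken from the talk.
ann = Annotation("coronary artery disease", "C0010054", 0.92)
```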
Why do we need annotations? Especially in the biomedical domain we have a lot of documents, for instance electronic health records that capture the history of a patient, and we have case report forms. For example, if we want to conduct a clinical trial, we have to recruit patients for the study, and therefore we have to define eligibility criteria that must be satisfied by the patients participating in the study. With annotations we achieve comparability, so that we are able to compare different studies. Yesterday someone talked about clinical studies: it is quite difficult to compare unstructured text, but with annotations the studies become comparable. Annotations also help to construct knowledge graphs: you have to identify the entities of your domain and formalize how these entities are related to each other. So now we know why we need annotations.
When I started my PhD, my plan was: okay, I will develop a new annotation tool that is better than the rest, because I was quite motivated. But after a few months I recognized that there are already several tools, and they are not so bad; let's say they produce different quality depending on the domain. It is also quite difficult to compare these tools, because each tool has a lot of configuration possibilities. So the question became: how can we reuse the existing tools? The idea is to apply a set of tools and combine their results to get a final annotation mapping for a set of documents.
In our previous work, presented at this same conference in 2017, we proposed an approach where a set of tools annotates a set of documents with concepts from the Unified Medical Language System. Each tool generates an annotation mapping, and an annotation mapping consists of pairs linking a document fragment to a certain concept. Based on these sets of pairs we can apply set-based operations: the union, where we take all annotations of all tools into the final result; the intersection, where we only keep the annotations that all tools have identified; or the majority, where we keep only those annotations that the majority of tools have identified. The result is again an annotation mapping for these documents.
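The set-based strategies can be sketched over annotation mappings represented as sets of (fragment, concept) pairs. This is a simplified illustration, not the authors' code, and the UMLS CUIs are only examples:

```python
from collections import Counter

def combine(mappings, strategy="majority"):
    """Set-based combination of per-tool annotation mappings."""
    counts = Counter(pair for m in mappings for pair in set(m))
    n = len(mappings)
    if strategy == "union":           # every annotation any tool produced
        return set(counts)
    if strategy == "intersection":    # only annotations all tools agree on
        return {p for p, c in counts.items() if c == n}
    if strategy == "majority":        # annotations found by more than half the tools
        return {p for p, c in counts.items() if c > n / 2}
    raise ValueError(strategy)

# Three hypothetical tool outputs over the same document.
tool_a = {("heart attack", "C0027051"), ("diabetes", "C0011849")}
tool_b = {("heart attack", "C0027051")}
tool_c = {("heart attack", "C0027051"), ("stroke", "C0038454")}
```

With these three mappings, union keeps all three distinct annotations, while majority and intersection keep only the one all tools agree on, which illustrates how a correct annotation found by a single tool gets lost.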
But what we have observed is that in some situations we get false negatives. When do we get false negatives? If, for instance, only one tool identifies a correct annotation, a combination strategy such as majority or intersection loses this correct annotation. There are also cases where we get false positives, because the set-based strategies do not use the confidence values that are generated by the tools: each annotation has a certain probability of being correct, but if the majority of tools identify an annotation with low confidence, this annotation is nevertheless included in the final result. These are the issues of the set-based combination. The idea now is to utilize the confidence scores generated by the tools and to use a set of verified annotations to build classification models with which we can filter out the correct annotations.

So now to the approach. We have a set of documents that are not annotated so far, and we have a set of tools. We apply each tool to the set of documents and collect the annotations of each tool. Based on these results, we draw a number of annotations generated by the tools that should be verified by a user, i.e., a domain expert. We thereby consider a ratio of positive and negative samples, so that we draw as many annotations as needed to obtain a balanced training data set: for instance, with a ratio of 50 per cent we require 50 per cent correct and 50 per cent incorrect annotations. Once we have our training set, we build an annotation vector for each annotation; this vector represents the confidence values of the individual tools, and the vectors can be labeled. A set of labeled annotation vectors can then be used to train classification models such as support vector machines, decision trees, random forests, or neural networks. Afterwards, we can use such a model to classify the annotations that have not been verified, and the prediction tells us whether an annotation is correct or not.
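The balanced selection of annotations for expert verification could be sketched like this (function and parameter names are illustrative, and `verify` stands in for the human expert's yes/no judgement):

```python
import random

def draw_balanced_sample(candidate_annotations, sample_size, pos_ratio, verify):
    """Draw annotations for expert verification until the requested numbers
    of positive and negative training examples are reached."""
    n_pos = round(sample_size * pos_ratio)
    n_neg = sample_size - n_pos
    pos, neg = [], []
    # Inspect candidates in random order and route them by the expert's verdict.
    for a in random.sample(candidate_annotations, len(candidate_annotations)):
        (pos if verify(a) else neg).append(a)
        if len(pos) >= n_pos and len(neg) >= n_neg:
            break
    return pos[:n_pos], neg[:n_neg]
```

For example, a 50 per cent ratio with sample size 20 yields 10 verified-correct and 10 verified-incorrect annotations.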
How do these annotation vectors look? In our example we have pieces of text from an eligibility criterion and three tools: for one fragment, two of the three tools identified a concept, and for another fragment, two of the tools identified two concepts. We now transform this intermediate result into vectors: for each pair of a concept and a certain document fragment we build one vector, where each entry represents the confidence of a certain tool. For instance, one concept was recognized by tool 1 as well as by tool 2, with a score of 1 for tool 1 and 0.86 for tool 2, while the entry for tool 3 is 0. To compensate the influence of any single tool, we additionally include a basic score, which can be a basic string similarity measure such as soft TF-IDF computed between the document fragment and the concept name. Now we have an annotation vector that can be labeled: this annotation is correct, or this annotation is incorrect.
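The vector construction described here might look as follows. This is a sketch with illustrative names; the talk uses soft TF-IDF as the basic similarity, for which a crude token-overlap measure stands in:

```python
def token_overlap(a, b):
    """Crude token-set similarity (Jaccard); stand-in for soft TF-IDF."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def annotation_vector(pair, tool_scores, basic_sim=token_overlap):
    """Feature vector for one (fragment, concept-name) pair: one confidence
    entry per tool (0.0 if that tool did not produce the pair) plus a basic
    similarity score that dampens the influence of any single tool."""
    fragment, concept_name = pair
    return [tool_scores[t].get(pair, 0.0) for t in sorted(tool_scores)] + \
           [basic_sim(fragment, concept_name)]

# Hypothetical intermediate result: two of three tools found this pair.
pair = ("coronary artery disease", "coronary arteriosclerosis")
tool_scores = {"tool1": {pair: 1.0}, "tool2": {pair: 0.86}, "tool3": {}}
vec = annotation_vector(pair, tool_scores)
```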
With such a set of labeled annotation vectors we are able to build classification models. In the vector space, as shown here, we can separate the correct annotations from the incorrect annotations by a hyperplane, and after that we can classify the annotations that were not verified, like this one here, and decide whether each annotation is correct or not.
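The talk trains SVMs, decision trees, random forests and neural networks on the labeled vectors. As a minimal, dependency-free stand-in (an illustration, not the authors' implementation), a perceptron learns exactly the kind of separating hyperplane w·x + b = 0 sketched on the slide:

```python
def train_perceptron(X, y, epochs=50, lr=0.1):
    """Learn a hyperplane separating correct (1) from incorrect (0) vectors."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != label:  # mistake-driven update toward the true label
                delta = lr * (label - pred)
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
    return w, b

def classify(w, b, x):
    """1 = annotation predicted correct, 0 = predicted incorrect."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy training set: correct annotations tend to have high tool confidences.
X = [[0.9, 0.8, 0.7], [0.95, 0.9, 0.8], [0.1, 0.0, 0.2], [0.2, 0.1, 0.0]]
y = [1, 1, 0, 0]
w, b = train_perceptron(X, y)
```

An unverified vector can then be classified, e.g. `classify(w, b, [0.85, 0.9, 0.75])` predicts a correct annotation for this toy data.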
So this was the overall approach; now we come to the evaluation. In our case we have a set of documents about eligibility criteria and quality assurance forms, which look like questionnaires, and the task was to annotate each question with the concepts contained in that question. In our experimental setup we applied the following tools: cTAKES and MetaMap, which are quite famous in the biomedical domain, and our own annotation tool. We chose the following parameters: for the ratio between the number of positive and negative examples, 20, 30, 40 and 50 per cent; we also varied the sample size, i.e., the number of annotations to be verified, between 50, 100 and 200; and we used soft TF-IDF as the basic string similarity measure.
Now to the results. In this figure you see the recall on the Y axis and the precision on the X axis, and each point characterizes one configuration: we do not just have three tools, but each tool with different configurations, because cTAKES as well as MetaMap offer a lot of configuration possibilities, so we show all configurations here. What we have observed is that with a higher share of positive examples, for instance 50 per cent positive examples in the training data, we get a higher recall but a lower precision; in the opposite case, with a low share of positive examples, we get a higher precision but a lower recall compared to the other ratios.

Next, the results for the different sample sizes. What we have observed is that we already get good results for small sample sizes, but the quality increases the more training data we use; this is shown in the plots for the random forest over the different ratios and configurations. We have also investigated different classification models, such as SVM and random forest, and we did not observe a lot of difference between them in this case.
This slide summarizes the evaluation: the comparison with the different single tools, whose quality we wanted to improve upon. We can observe that we are able to do that; in the case of cTAKES, for instance, we get better results for the datasets without any optimization. We also see an improvement over our previous work, and while the set-based combination approach already brings an improvement, we get a higher improvement with the machine learning approach. So in summary, we were able to improve the results of the tools by using a combination, and especially by using machine learning approaches.
To conclude: we proposed a machine-learning-based combination of annotation mappings generated by different tools. For this we generate annotation vectors based on the scores computed by each tool. The results show that we can improve the annotation quality compared to the set-based combinations and the single-tool results. For future work, we would like to consider different similarity measures to extend the vectors. Another promising approach, as already mentioned, is to use active learning techniques, so that we can easily extend our training data and improve our models; for that we need techniques that allow the user to verify annotations easily and fast, so that we can use the results of the validation to improve our classification models for annotating medical documents. Thank you for your attention.


Formal Metadata

Title A learning-based approach to combine medical annotation results.
Title of Series Data Integration in the Life Sciences (DILS2018)
Author Christen, Victor
Cardoso, Silvio Domingos
Contributors Lin, Ying-Chi
Groß, Anika
Pruski, Cédric
Da Silveira, Marcos
Rahm, Erhard
License CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/38608
Publisher Technische Informationsbibliothek (TIB)
Release Date 2018
Language English

Content Metadata

Subject Area Information technology, Life Sciences
Abstract There exist many tools to annotate mentions of medical entities in documents with concepts from biomedical ontologies. To improve the overall quality of the annotation process, we propose the use of machine learning to combine the results of different annotation tools. We comparatively evaluate the results of the machine-learning based approach with the results of the single tools and a simpler set-based result combination.
Keywords biomedical annotation
annotation tool
machine learning





