We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Word2Vec model to generate synonyms on the fly in Apache Lucene

Formale Metadaten

Titel
Word2Vec model to generate synonyms on the fly in Apache Lucene
Serientitel
Anzahl der Teile
56
Autor
Mitwirkende
Lizenz
CC-Namensnennung 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
Identifikatoren
Herausgeber
Erscheinungsjahr
Sprache

Inhaltliche Metadaten

Fachgebiet
Genre
Abstract
If you want to expand your query/documents with synonyms in Apache Lucene, you need to have a predefined file containing the list of terms that share the same semantic. It's not always easy to find a list of basic synonyms for a language and, even if you find it, this doesn’t necessarily match with your contextual domain. The term "daemon" in the domain of operating system articles is not a synonym of "devil" but it's closer to the term "process". Word2Vec is a two-layer neural network that takes as input a text and outputs a vector representation for each word in the dictionary. Two words with similar meanings are identified with two vectors close to each other. This talk explores our contribution to Apache Lucene that integrates this technique with the text analysis pipeline. We will show how you can automatically generate synonyms on the fly from an Apache Lucene index and how you can use this new feature along with Apache Solr with practical examples!