
Word2Vec model to generate synonyms on the fly in Apache Lucene

Formal Metadata

Title
Word2Vec model to generate synonyms on the fly in Apache Lucene
Title of Series
Number of Parts
56
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
To expand queries or documents with synonyms in Apache Lucene, you need a predefined file that lists the terms sharing the same meaning. A basic synonym list for a language is not always easy to find, and even when you find one, it will not necessarily match your domain. In a collection of articles about operating systems, for example, "daemon" is not a synonym of "devil"; it is much closer to "process". Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in its vocabulary. Words with similar meanings are mapped to vectors that lie close to each other. This talk explores our contribution to Apache Lucene, which integrates this technique into the text analysis pipeline. Using practical examples, we show how to generate synonyms automatically, on the fly, from an Apache Lucene index, and how to use this new feature with Apache Solr.
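The core idea, treating the nearest neighbours of a word's vector as its synonyms, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the API of the Lucene contribution: the vectors below are invented toy values (a real model would be trained with Word2Vec on your own corpus), and the names `synonyms`, `expand`, `max_synonyms` and `min_similarity` are hypothetical. Closeness is measured with cosine similarity, and a similarity threshold keeps weakly related neighbours out.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
# In an operating-systems corpus, "daemon" should land near "process", not "devil".
VECTORS = {
    "daemon": [0.9, 0.1, 0.0],
    "process": [0.85, 0.15, 0.05],
    "thread": [0.7, 0.3, 0.1],
    "devil": [0.0, 0.2, 0.95],
    "demon": [0.05, 0.3, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def synonyms(term, vectors, max_synonyms=2, min_similarity=0.8):
    """Return up to max_synonyms terms whose vectors are most similar to term's.

    Only neighbours with cosine similarity >= min_similarity are kept;
    unknown terms get no synonyms.
    """
    if term not in vectors:
        return []
    query = vectors[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored = [(t, s) for t, s in scored if s >= min_similarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [t for t, _ in scored[:max_synonyms]]


def expand(tokens, vectors, **kwargs):
    """Naive query expansion: emit each token followed by its generated synonyms."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(synonyms(tok, vectors, **kwargs))
    return out


if __name__ == "__main__":
    print(synonyms("daemon", VECTORS))  # ['process', 'thread']
    print(expand(["kill", "daemon"], VECTORS, max_synonyms=1))  # ['kill', 'daemon', 'process']
```

The threshold matters as much as the ranking: every word has *some* nearest neighbour, so without a minimum similarity the generator would emit unrelated terms for rare words. In a real analysis chain, a synonym filter would typically emit the generated terms at the same token position as the original word rather than appending them to a flat list as this sketch does.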