Query Embeddings: Web Scale Search powered by Deep Learning and Python
Formal Metadata
Title 
Query Embeddings: Web Scale Search powered by Deep Learning and Python

Title of Series  
Part Number 
45

Number of Parts 
169

Author 

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content, in adapted or unchanged form, for any legal and non-commercial purpose, provided the work is attributed to the author in the manner specified by the author or licensor, and the work or content, including adapted forms, is shared only under the conditions of this license.
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Ankit Bahuguna. Query Embeddings: Web Scale Search powered by Deep Learning and Python. A web search engine allows a user to type a few words of a query, and it presents a list of potentially relevant results within a fraction of a second. Traditionally, keywords in the user query were fuzzy-matched in real time against the keywords within different pages of the index, with no real focus on understanding the meaning of the query. Recently, deep learning + NLP techniques have tried to represent sentences or documents as fixed-dimensional vectors in a high-dimensional space. These special vectors inherit the semantics of the document. Query embeddings is an unsupervised deep-learning-based system, built using Python, word2vec, Annoy and keyvi, which recognizes similarity between queries and their vectors for a web-scale search engine within the Cliqz browser. The goal is to describe how query embeddings contribute to our existing Python search stack at scale, and the latency issues prevailing in a real-time search system. Also included is a preview of a separate vector index for queries, utilized by the retrieval system at runtime via ANNs to get the closest queries to the user query, which is one of the many key components of our search stack. Prerequisites: basic experience in NLP, ML, deep learning, web search and vector algebra. Libraries: Annoy.

00:00
Our next speaker is Ankit Bahuguna, and he will be talking about query embeddings: web-scale search powered by deep learning and Python. Thank you. I will be talking about query embeddings, which is a system we have developed at Cliqz that uses deep learning. First, a bit about myself: I am a software engineer in research at Cliqz, with a background in computer science and data management systems, and I work on building our
00:40
search engine, which is part of our product, as is the browser. The areas that interest me are information retrieval, deep learning and natural language processing, and I have been involved in open source since 2012. So, about Cliqz: we are
00:57
based in Munich, majority owned by Hubert Burda Media, with an international team of about 90 experts from 28 different countries, and we combine the power of search and browsing, so
01:10
that we are essentially redefining the browsing experience; you can check it out at cliqz.com. Here I am talking about search, so let us start with what it looks like. Normally, when you open your web browser, you either go to a link directly or you do a search. What the Cliqz experience gives you is a browser with search built in, which is intelligent enough to directly show you results based on what you type. Say you search for something
01:40
like the weather: you get a weather card directly in the dropdown, and you can search for any place you like. Interestingly, I found out on Monday that it was 41 degrees back home. And of course, if you want to search for news, you get news results. So it is a combination of a lot of features built into a browser, with a complete search engine technology behind it; that is what Cliqz is. A
02:09
bit of history about how traditional search works. Search is a very long-standing problem, studied for a long time in information retrieval. The classic approach was to build an inverted index of the documents, match the keywords of the query against it at query time, and rank the matches; the whole process was about coming up with the best
02:34
documents for the user's query. Over time search engines evolved, and with Web 2.0 there was a lot of new media coming in and people expected more from the web. So we came up with a
02:51
search that is based on matching user queries, where we have a query index, and that index is built from query logs. If you type "facebook" or "fb", it has to take you to facebook.com. Given such an index, you can construct a much more meaningful search experience for the user, because it is enriched by how many times people actually issued a query and ended up on the same page. What we aim to do is construct alternative queries given a user query: if we find the query directly in the index, great; but if it is something different which we have not seen before, we try to construct similar queries at runtime and search for their results in the index. The query index looks something like this: you have a query, and it has URL IDs, meaning each URL is linked to some value; the URL is the actual page you go to given the query, together with frequency counts and other statistics, which allow us to make a prediction of what the right page is that the user actually intended.
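The index structure just described can be sketched in a few lines of Python; the queries, URLs and counts below are made up for illustration and are not actual Cliqz data:

```python
# Toy query index: each query maps to (url, frequency) pairs, where the
# frequency counts how often users who issued that query ended up on the URL.
query_index = {
    "facebook": [("https://facebook.com", 98210), ("https://facebook.com/login", 4410)],
    "fb": [("https://facebook.com", 51033)],
}

def best_page(query):
    """Predict the page the user most likely intended for a known query."""
    candidates = query_index.get(query)
    if not candidates:
        return None  # unseen query: fall back to constructing similar queries
    return max(candidates, key=lambda pair: pair[1])[0]
```

An unseen query returns `None`, which is exactly the case where the similar-query construction described later takes over.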
04:00
To give an overview of the search problem itself in a bit more depth, the search problem can be seen
04:06
as a two-step process: the first step is recall and the second is ranking. Given an index of billions of pages, recall means getting the best set of candidate pages for a user query: say, the ten thousand pages out of those billions which best fit the query. Then comes the ranking problem: given those ten thousand pages, what we want is the
04:38
top three results. As you may know, on any search engine result page hardly anybody looks beyond the first page, so it is very important that the top three or top five results are the best results for your query, and that is what we care about: given a user query, we try to come up with the three best results from billions of pages in the index. So that is the
05:07
background. What we do at Cliqz is use the traditional method of search, fuzzy-matching the words in the query to a document, but we also utilize something a bit deeper and different: semantic vectors, that is, distributed representations of words. What we actually try to do is represent our queries as vectors, which are
05:34
fixed-dimensional floating-point lists of numbers. Given a query, its vector should semantically capture the meaning of the query. This particular idea is called a distributed representation: words which appear in the same context share semantic meaning, and the meaning of the query is defined by this vector. These vectors are learned in an unsupervised manner, focusing
06:06
on the context of the words in the sentences or queries, and the area in which this is studied is neural probabilistic language models. The similarity between queries is measured as the cosine distance between two vectors: if two vectors are close together in the vector space, they are more similar. Hence, we get the closest queries to the user query by finding the closest vectors in the space, and this gives us a good recall candidate set
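The cosine distance used as the similarity measure here is straightforward to compute; a minimal pure-Python version:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Smaller distance means more similar queries."""
    return 1.0 - cosine_similarity(u, v)
```

In production one would vectorize this with NumPy, but the definition is the same.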
06:37
as the first set that we fetch from the index, the one which most accurately corresponds to the user query. A simple example to illustrate
06:48
this: a user types a simple query like "sims pc download", which is a game. What the system gives us is a sorted list of similar queries along with their cosine distance to the query vector the user typed. So given the query "sims pc download" we get a sorted list where the first entry is the closest, something like "sims game pc download". Never mind that the word order is a little different: the vector for "sims pc download" comes out very close to the vector for "download sims pc", because the representation is essentially a bag of words, and we want to optimize for space as well, so eventually the vectors come out nearly the same. As you move down the list the distance increases, and as we will see later, at some point you start getting far-off results, so we are usually concerned only with the queries closest to the user's query.
07:56
A bit more on how this learning process works. What we actually utilize in production is an unsupervised learning technique to learn these word representations. Effectively, given the vector representations of words, you would like the distance between two words w and w' to reflect their similarity in meaning. For example, if you take the vector for "king", subtract the vector for "man" and add the vector for "woman", you get a vector which is close to that of "queen". The algorithm that achieves this is called word2vec, and with it we learn these representations as the corresponding vectors. A bit more about
08:43
word2vec: it was introduced by Mikolov et al. in 2013. There are two different models: continuous bag-of-words (CBOW) and continuous skip-gram. These are distributed representations learned by shallow neural networks, and both models are trained using stochastic gradient descent and backpropagation.
09:04
A bit more intuition on how this works. In CBOW, on the left we have the context words, say a window of five words, and we try to predict the center word: given "the cat sat on the mat", the word "sat" has to be predicted from the other context words. The skip-gram model does the exact reverse: given the center word in the sentence and a context window, you try to predict the surrounding words. With these two models you can define a vector for each word, stored as a lookup table, and learn them using stochastic gradient descent. I will mostly skip the mathematics,
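The (center, context) training pairs that skip-gram extracts from a sentence can be sketched as follows; the sentence and window size are just the talk's running example:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
```

CBOW would use the same pairs grouped the other way around, predicting the center word from all of its context words at once.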
09:48
but in short, what we try to optimize is this: a neural probabilistic language model maximizes how often you see a particular word given its context, against how often you see words that are not in its context. A good language model will say: given a certain sequence of words, you are likely to see this word next, and given another sequence of words, you will not
10:14
see a certain word. That is what the model learns. Here is one example of how a
10:22
traditional language model looks. For example, with "the cat sits on the mat", you try to predict which word comes after the sequence, out of the vocabulary or dictionary that you have. The only catch is that your vocabulary can be very, very large: you may be trying to predict the probability of a word out of 7 to 10 million words in your corpus, computing the probability
10:51
of one single word normalized across all of them. To avoid this, we use something called noise contrastive estimation, in which
11:01
we do not score our word against the entire vocabulary. Instead, we pick a small set of noise words, say 5 or 10. Where the true sequence is "the cat sits on the mat", you are pretty sure that "mat" is the right word, but the noise words could be anything, such as "the cat sits on the other", and so on. These words will not complete the exact sequence that you find in the corpus; you sample them at random from a uniform distribution and use these noise words as negative training examples. What
11:37
the model effectively learns is: given the sequence, which word is the right one to come next, and which words are not. If the system differentiates these over and over again, on millions of examples, trained over several iterations, you end up with a model which is able to separate the positions of the right words from the positions of the wrong words by a clear distance. Let us see how this works with an
12:09
example. Say there is a document like "the quick brown fox jumps over the lazy dog" and we have a context window of size 1. Taking the first three words, "the quick brown", with "quick" as the center word, the surrounding words are "the" and "brown". In the CBOW model we would predict "quick" from "the" and "brown"; in skip-gram, which we found works much better in production, it is the reverse: we predict the context words from the target word, so we predict the probability of "the" and of "brown" given "quick". The objective function is defined over the entire dataset, and our dataset is built from a lot of Wikipedia, a lot of query logs,
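A simplified version of the noise-contrastive objective described here, in the negative-sampling form popularized by word2vec; this is a sketch of the idea with toy vectors, not the exact production loss:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nce_loss(center, true_ctx, noise_ctxs):
    """Negative log-likelihood that pushes the true (center, context) pair's
    score toward 1 and each (center, noise) pair's score toward 0."""
    loss = -math.log(sigmoid(dot(center, true_ctx)))
    for noise in noise_ctxs:
        loss += -math.log(1.0 - sigmoid(dot(center, noise)))
    return loss
```

Training then takes the gradient of this loss with respect to the embedding vectors and applies an SGD update, exactly the ascent-on-log-likelihood step the talk describes next.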
13:05
page titles and descriptions, and a number of other textual sources that we have, from which the model learns what a probable sentence is and what a probable sequence of words is. We use SGD for this: say at training step t you have the pair "quick" and "the",
13:22
and probabilistically you also select some noise examples for the non-noise pair, say "quick" and "sheep", where "sheep" should not be part of this context. You then compute the loss for this pair of observed and noisy examples and get the objective function. What we try to do is maximize a value of this form: the probability of the true pair, "quick" in the context of "the", should be given a score of 1, and the noise pair, "quick" and "sheep", should be given a score of 0. By updating the embedding values we can maximize this objective, which is essentially a log-likelihood, and we can actually
14:12
do gradient ascent on top of it. We perform an update on the embeddings and repeat this process over and over for different examples across the entire corpus, and we end up with a lookup table of word vectors. We can choose the dimensionality of the vectors; as you will see in my slides, we use 100 dimensions to represent a word, and that works well for us. So what do these
14:35
word embeddings actually look like? What you get is
14:39
something like this. If you project these word vectors into space, you find that the vector from "man" to "woman" is roughly the same as the vector from "king" to "queen", and you find not just this male-female relation but also verb tenses, like "walking" and "walked", "swimming" and "swam". That is because you might have sentences like "the person is walking" and "the person is running", so "walks" and "runs" occur in the same kinds of context, and this is what the learner captures very nicely. We also get other relational features, like countries and capitals: Spain relates to Madrid the way Germany relates to Berlin. These are country-capital relationships. This slide is a projection down to two dimensions using t-SNE; it is a bit small, but you can see clusters at the bottom and at the top, and words with more similar meanings are actually closer in the vector space. This is a very important property, because if
15:58
you can leverage this and construct sentence or document representations the same way, you get similar documents close together in the space as well, and that is exactly what query embeddings address. The way we generate a query vector from these word vectors is as follows: for the same query
16:14
"sims pc download", we have a vector for each of the words. We do not just use these word vectors as they are; we also compute a term relevance for each word in the query. What you see is a score for each word, and this tells us that "sims" is the most important, most relevant word in the query, because it is the name of the game. Next,
16:43
we use this term relevance to calculate a weighted average of these vectors. What a weighted average means is: given the vectors of the different words and their term-relevance weights, you average the vectors weighted by relevance, and you get an averaged representation of those words. Effectively, our query vector is this weighted-average representation: given each word vector and its relevance, we get a single vector, so at the end "sims pc download" is nothing but this one 100-dimensional vector, and that is what we use as the query embedding.
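The weighted average described above can be written compactly; the 2-dimensional vectors and relevance scores below are made up purely for illustration:

```python
def query_vector(word_vectors, relevance):
    """Relevance-weighted average of the word vectors of a query."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, vec in word_vectors.items():
        w = relevance.get(word, 0.0)
        weight_sum += w
        for k in range(dim):
            total[k] += w * vec[k]
    return [x / weight_sum for x in total]

# Toy example: "sims" carries most of the relevance, so it dominates.
vectors = {"sims": [1.0, 0.0], "pc": [0.0, 1.0]}
weights = {"sims": 3.0, "pc": 1.0}
embedding = query_vector(vectors, weights)
```

In production the same operation runs over 100-dimensional word2vec vectors.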
17:27
A word about the term relevance itself: we have two different flavors of it. The usual choice would be the raw frequency of the words, but we found that it does not scale very well; you could also use something like TF-IDF over the pages. What we use instead is this: given the queries linked to a page, how many times does the term appear in those top-5 queries? Given the data that we have, that is a much better indicator: from the word statistics, this count relative to the term frequency gives us something like a term dominance score. The relative variant is a normalization of that score, and we found that normalizing the scores across all the queries of the index gives slightly better results.
18:22
All of this data is precomputed rather than calculated at query time, and it is stored in an index. For example, for each word we store features like
18:31
frequency and document frequency; you can look all of this up, and similarly for the other words. So far I have
18:41
described what we actually create: a query index. Given a document index which holds all the documents, we have all the queries, and we want to compute a vector for each. We cannot do this for all possible queries, there are just too many, so given all the pages in the index, we pick the top 5 queries which effectively represent each page, which we can get from the page models. Roughly, we come up with about 465 million queries which represent all the pages in the index, and we learn a query vector for each one of them. If you simply ran the whole system over this naively it would take ages, so the problem now is: how do we get similar queries from these 465 million? Given a user query, find the
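Selecting the top-N queries per page out of a query index might look like the following sketch; the function name and data layout are hypothetical, chosen to match the toy index shown earlier:

```python
from collections import defaultdict

def collect_candidate_queries(query_index, top_n=5):
    """Invert a query -> [(url, freq)] index and keep, for each page,
    only the top-N queries (by frequency) that lead to it."""
    page_queries = defaultdict(list)
    for query, urls in query_index.items():
        for url, freq in urls:
            page_queries[url].append((freq, query))
    candidates = set()
    for url, scored in page_queries.items():
        for _, query in sorted(scored, reverse=True)[:top_n]:
            candidates.add(query)
    return candidates
```

Applied to billions of pages with N=5, a step like this yields the set of a few hundred million representative queries that then get embedded.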
19:37
closest 50 queries out of these 465 million. How do we find them? Should we use brute force? It is too slow;
19:47
and we cannot use hashing techniques effectively either, because they are not accurate enough for our vectors: the vectors are semantic units, and a small loss in precision would lead to weird results. So what our solution required was a cosine-similarity lookup that somehow scales to 465 million queries with very low latency. The answer we came up with was approximate nearest neighbor (ANN) vector models, and they turned out to be pretty helpful for us.
20:21
The one that we use is Annoy. It comes from Spotify, it builds approximate nearest neighbor models for all the query vectors we have, and it is actually used in production at Spotify for music recommendations. We cannot train on all the queries and documents at once, because it is too memory-intensive, so we do not train them all together: we have a cluster where we shard these models along with the search index. We train shards of about 46 million queries each, with a certain number of trees (what these trees are, I will explain next). The size of each model is around 27 gigabytes, so across the ten shards what you get out of training is around 270 gigabytes, and everything is stored in RAM, because for us the most important thing is latency,
21:19
and serving from RAM happens pretty quickly; this architecture is what we actually use in production. At runtime you query all the shards simultaneously and then sort the combined results by the cosine distances you get back. Different shards will return different close queries, and eventually you want the best-representing queries overall, the ones with the lowest distances, and that works out quite nicely. Of course, sharding like this is something of a heuristic, but homogeneous shards behave well for the system and it does not noticeably decrease the recall. But first I want to explain how we actually
22:07
use Annoy and how it actually works. It is one of my favorite libraries: you can use it whenever you take a vector-based approach to recall or ranking and you need to find the nearest points to any query point in sublinear time. You cannot compare against every point one by one, that does not scale; you want a data structure that gets you those nearest queries quickly, and the best-suited data structure for that is a tree. So each of my query vectors is represented as a point in the space, and what we try to find is, given a
22:49
certain point, which points are nearest: the user query vector is like a new point landing somewhere in this space, and we want to find the nearest ones. To build the model, what Annoy does is
23:03
split the space: it picks two points at random and splits the space between them, and then it does this again and again, and you get
23:13
something like a tree, a segmentation with a certain number of points per cluster in the different parts of the tree,
23:23
You end up with a binary tree, and the nice property of this binary tree is that points which are close to each other in the vector space are more likely to be close to each other in the tree itself. So if you walk down the tree to a leaf,
23:39
that leaf will be composed of nodes that are all similar in the vector space, and this is a very important feature.
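The recursive random splitting just described can be sketched in a few lines of plain Python. This is an illustrative toy, not Annoy's actual implementation (Annoy is C++ and far more optimised), and the function and parameter names are mine:

```python
import random

def split(points, leaf_size=4):
    """Recursively partition points with random hyperplanes (Annoy-style sketch).

    Each internal node picks two random points; the hyperplane equidistant
    from them splits the space. Returns a nested (left, right) tuple tree
    whose leaves are small buckets (lists) of points.
    """
    if len(points) <= leaf_size:
        return points  # leaf bucket
    a, b = random.sample(points, 2)
    # The normal of the splitting hyperplane is the vector from a to b,
    # and the hyperplane passes through their midpoint.
    normal = [bi - ai for ai, bi in zip(a, b)]
    midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    left, right = [], []
    for p in points:
        # The sign of the projection onto the normal decides the side.
        side = sum(n * (pi - mi) for n, pi, mi in zip(normal, p, midpoint))
        (left if side <= 0 else right).append(p)
    # Guard against a degenerate split leaving one side empty.
    if not left or not right:
        return points
    return (split(left, leaf_size), split(right, leaf_size))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(32)]
tree = split(pts)
```

Each leaf holds at most `leaf_size` points, and points that land in the same leaf are likely to be close in the original space, which is exactly the property the tree search exploits.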
23:50
So how do we search for a point in the tree with these splits that we have built? Say x is our user query vector, and we try to find the k nearest vectors to this vector and the queries related to them.
24:07
What you do is search for the point by walking down one path of the tree, and you get, say, seven neighbours out of the leaf. You then use cosine similarity as the metric for how close they are: it takes values between minus one and one, and a value close to one means the vectors are very similar, so it naturally tells you how close two vectors are. But there is a problem: what if you want more than the seven neighbours that one leaf gives you? So we do not just navigate down one branch of the tree; we also navigate into the other branch, and this is managed with a proper priority queue, which tells us which parts of the tree to visit next to collect the closest vectors. You not only look at the side of the split your point falls on, but also at the slightly "wrong" side, because both of those regions of the hyperspace can be close to the user's vector.
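The cosine measure mentioned here is simple to write down; a minimal pure-Python version (numpy would normally be used for speed):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite -> -1.0
```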
25:16
But sometimes you find that, because the splits are random, you can actually lose some near neighbours: a split happened to cut right between two close points. To minimise this, you train a forest of trees, and it looks something like this: you do not build just one tree, you randomise the splits across many trees, so effectively you build the index over many random configurations at once and search all of them in real time. This gives you a pretty good representation, and when you query it you get good similarity between the vectors. So we
25:57
train a forest of trees, so that one bad split in a single tree cannot hurt us much.
26:03
One thing about Annoy, maybe it's a feature, maybe a limitation, is that it does not let you store string values; it only allows you to store integer indexes. So for a given query you assign a unique index, say 501, and under that index you store the vector; at query time Annoy gives you back the indexes of all the items close to yours. To map those indexes back to queries, at Cliqz we have a system called keyvi, which is a key-value index and is also the backbone of our entire search index. We found it much better than Redis or anything else we compared it with in terms of read speed and maintainability. We developed it in-house; it is written in C++ and has Python wrappers. So it actually stores your index-to-query
26:51
mapping. What effectively happens at runtime is: the user types a query, you generate a vector for it, you search the Annoy model for the closest queries and get their indexes, you then look those indexes up to get the actual queries, and finally you fetch the pages for all the queries closest to the user's query. This is how we improve our recall.
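Put together, the runtime path described here can be sketched as below. Everything in it is a stand-in: a brute-force scan plays the role of Annoy, plain dicts play the role of keyvi, and `embed()` is a stub for the real embedding model; the queries, ids, and URLs are invented for the example:

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Vector index: integer id -> query vector (Annoy itself stores only the ids).
id_to_vector = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
# Sidecar key-value stores (keyvi plays this role in the real system).
id_to_query = {
    0: "statue of liberty height",
    1: "how tall is the statue of liberty",
    2: "python tutorial",
}
query_to_pages = {
    "statue of liberty height": ["en.wikipedia.org/wiki/Statue_of_Liberty"],
    "how tall is the statue of liberty": ["nps.gov/stli"],
    "python tutorial": ["docs.python.org/3/tutorial"],
}

def embed(query):
    # Stub: a real system would derive this vector from the query's words.
    return [1.0, 0.05]

def candidate_pages(user_query, k=2):
    qv = embed(user_query)
    # Nearest ids by cosine similarity (Annoy does this step sublinearly).
    nearest = sorted(id_to_vector,
                     key=lambda i: cosine(qv, id_to_vector[i]),
                     reverse=True)[:k]
    pages = []
    for i in nearest:
        pages.extend(query_to_pages[id_to_query[i]])
    return pages

print(candidate_pages("statue liberty height"))
```

The unseen query never has to match any indexed query literally; it only has to land near semantically similar queries in the vector space.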
27:20
We get a much richer set of candidate pages than we had before, with a much higher chance of the expected pages being among them. The reason it works is that we now go beyond synonyms and simple fuzzy matching: we are actually using vectors that have learned semantics. It goes wrong sometimes, but most of the time you will find a definite improvement, because the vectors have learned which words belong to the same
27:48
context, and that is a very important feature. Queries are now matched in real time using
27:54
cosine similarity between query vectors, on top of the classical information retrieval techniques that were already in place. Overall, the recall improvement over our previous system was around five to seven per cent, which translated into an improvement of about one per cent in the final top-three results. That gives us a clear indication that
28:19
these vectors are actually useful. The system triggers only for queries we have never seen before, and that is a very important point: if we have seen a query before and already mapped it to a certain page, we have a definite answer for it. But for queries which are new to us, which are not in the index, we have to go beyond the traditional techniques, and that is where these vectors come in.
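The triggering behaviour described above amounts to a simple routing decision. A hypothetical sketch (the names and example queries are mine, not Cliqz's):

```python
# Queries already present in the keyword index take the traditional path;
# the embedding-based lookup fires only for queries never seen before.
seen_queries = {"python tutorial", "statue of liberty height"}

def route(user_query):
    if user_query in seen_queries:
        return "keyword-index"       # known query: direct, definite answer
    return "query-embeddings"        # unseen query: fall back to vectors

print(route("python tutorial"))          # known query
print(route("statue liberty how tall"))  # never seen before
```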
28:56
Before I conclude, I wanted to show what it actually looks like. This is the browser, this is the search page, and we
28:59
have this dropdown which comes up as you type; the idea was to remove the whole intermediate step of a search engine results page, so that you can actually get directly to
29:09
the page you want. So, about the libraries:
29:12
there is Annoy, from Spotify, which is available on GitHub, and keyvi, which is open source on GitHub as well; I think you will actually find it useful, and it is a pretty actively developed project. Word2vec models can be trained
29:26
using gensim if you want to build a prototype, but I would recommend the original C tool because it is a bit more optimised, and there are certain variations of the models that have been developed since. There are other Cliqz open-source projects as well.
29:44
You can actually contribute to those as well. If you want to find the slides, they are
29:49
available online; look for the title of this talk. So before I conclude,
29:57
I just want to say that we are still working on this system, and we have come quite far but we are not done. We are looking at other deep learning approaches, like LSTMs and memory networks, but the downside of those approaches is that most of our user queries are keyword-based: you do not usually find people typing "what is the
30:18
height of the statue of liberty"; they type something like "statue of liberty height". The richer linguistic relationships might be captured by LSTMs, but they are more complicated, and the current system is flexible enough to still give pretty good results. We are also trying a new method of query-to-page similarity using document vectors, again using
30:43
a paragraph-vector model, and trying to build a solid retrieval system for those pages which have never been clicked before. We have a lot of such pages, and we are trying to find out what would be the best way to represent them, using vectors instead of the traditional approach. That is the last part, so with that,
31:07
thank you, and I will finish with a quote, given by John Rupert Firth in 1957: "You shall know a word by the company it keeps." Mikolov actually developed word2vec
31:20
using that same contextual approach to words, and it has served us well because of that. So thank you; I am happy to take questions now.

[Audience question: why did you build keyvi rather than use an existing key-value store?] One of the reasons was that we wanted a unified view. We tried a lot of these key-value stores: we tried Redis, we tried a traditional database, we tried full-text search engines. The thing is, our data is a bit different, in the sense that sometimes for a key we need the value to be a list of vectors, sometimes it is just strings, sometimes it is repeated strings where you have the same data structures again and again, so you can optimise a lot more if you write those parts yourself, and that is why we started doing it. keyvi is a much bigger project and I am not really the expert on it, but what I can say is that it has a lot of features: it compresses your keys into a finite-state automaton, a sort of shared-structure compression, and that gives you a much smaller index; it is faster to index, faster to read, and it scales. And you do not have to hold it all in memory; it is memory-mapped, so you can have far more data than fits in RAM. Our use case was that we wanted reads to be optimised, because we have almost no writes: you compile the index once, and from then on you only take the user query and read data from the index for those keywords.

[Audience question: you talked about having no writes to the database; how do you handle new data and new queries for training your embeddings? There are online-learning implementations of nearest neighbours that just update the index.] That is true. What we do is, we
have a rebuild cycle, where we recompile the whole index at regular intervals, and it is automated, so we get new queries and new query vectors with each rebuild. It is not a one-time system, but it is not immediate either: if tomorrow I want to include a set of results which are brand new, I cannot do that within the same cycle. To address that issue we have a news vertical: the news index handles the most recent content, so anything that is trending right now shows up in the news section. And since the concepts behind most queries were already available, say on Wikipedia, beforehand, we have those concepts covered from the beginning, and that is what we use. But for a genuinely new word, some "xyz" term that only comes up tomorrow, we probably will not have a vector for it; that is a very hard problem for us. Anyone else? OK, thank you again.