Query Embeddings: Web Scale Search powered by Deep Learning and Python
Formal Metadata
Title 
Query Embeddings: Web Scale Search powered by Deep Learning and Python

Title of Series  
Part Number 
45

Number of Parts 
169

Author 

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content, in adapted or unchanged form, for any legal and non-commercial purpose, provided the work is attributed to the author in the manner specified by the author or licensor, and the work or content, including adapted forms, is shared only under the conditions of this license.
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Content Metadata
Subject Area  
Abstract 
Ankit Bahuguna. Query Embeddings: Web Scale Search powered by Deep Learning and Python. A web search engine allows a user to type a few words of a query, and it presents a list of potentially relevant results within a fraction of a second. Traditionally, keywords in the user query were fuzzy-matched in real time against the keywords within different pages of the index, with no real focus on understanding the meaning of the query. Recently, deep learning + NLP techniques have tried to represent sentences or documents as fixed-dimensional vectors in a high-dimensional space. These special vectors inherit the semantics of the document. Query embeddings is an unsupervised deep-learning-based system, built using Python, word2vec, Annoy and keyvi, which recognizes similarity between queries and their vectors for a web-scale search engine within the Cliqz browser. The goal is to describe how query embeddings contribute to our existing Python search stack at scale, and the latency issues prevailing in a real-time search system. Also included is a preview of a separate vector index for queries, utilized by the retrieval system at runtime via ANNs to get the closest queries to the user query, which is one of the many key components of our search stack. Prerequisites: basic experience in NLP, ML, deep learning, web search and vector algebra. Libraries: Annoy.

00:00
Our next speaker is Ankit Bahuguna, and he will be talking about query embeddings: web-scale search powered by deep learning and Python. Thank you. I will be talking about query embeddings, which is a system we have developed at Cliqz that uses deep learning. First, a bit about myself: I am a software engineer in research at Cliqz, with a background in computer science and data management systems, and I work on building our
00:40
search engine, which is part of our product, as is the browser. The areas that interest me are information retrieval, deep learning and natural language processing, and I have been involved in open source since 2012. So, about Cliqz: we are
00:57
based in Munich, majority owned by Hubert Burda Media, with an international team of about 90 experts from 28 different countries, and we combine the power of search and browsing, so
01:10
that we are essentially redefining the browsing experience; you can check it out at cliqz.com. Here I am talking about search, so let us start with what it looks like. Normally, when you open your web browser, you either go to a link directly or you do a search. What the Cliqz experience gives you is a browser with search built in, which is intelligent enough to directly show you results based on what you type. Say you search for something
01:40
like the weather: you get a weather card directly in the dropdown, and you can search for any place you like. Interestingly, I found out on Monday that it was 41 degrees back home. And of course, if you want to search for news, you get news results. So it is a combination of a lot of features built into a browser, with a complete search engine technology behind it; that is what Cliqz is. A
02:09
bit of history about how traditional search works. Search is a very long-standing problem, studied for a long time in information retrieval. The classic approach was to build an inverted index of the documents, match the keywords of the query against it at query time, and rank the matches; the whole process was about coming up with the best
02:34
documents for the user's query. Over time search engines evolved, and with Web 2.0 there was a lot of new media coming in and people expected more from the web. So we came up with a
02:51
search that is based on matching user queries, where we have a query index, and that index is built from query logs. If you type "facebook" or "fb", it has to take you to facebook.com. Given such an index, you can construct a much more meaningful search experience for the user, because it is enriched by how many times people actually issued a query and ended up on the same page. What we aim to do is construct alternative queries given a user query: if we find the query directly in the index, great; but if it is something different which we have not seen before, we try to construct similar queries at runtime and search for their results in the index. The query index looks something like this: you have a query, and it has URL IDs, meaning each URL is linked to some value; the URL is the actual page you go to given the query, together with frequency counts and other statistics, which allow us to make a prediction of what the right page is that the user actually intended.
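The index structure just described can be sketched in a few lines of Python; the queries, URLs and counts below are made up for illustration and are not actual Cliqz data:

```python
# Toy query index: each query maps to (url, frequency) pairs, where the
# frequency counts how often users who issued that query ended up on the URL.
query_index = {
    "facebook": [("https://facebook.com", 98210), ("https://facebook.com/login", 4410)],
    "fb": [("https://facebook.com", 51033)],
}

def best_page(query):
    """Predict the page the user most likely intended for a known query."""
    candidates = query_index.get(query)
    if not candidates:
        return None  # unseen query: fall back to constructing similar queries
    return max(candidates, key=lambda pair: pair[1])[0]
```

An unseen query returns `None`, which is exactly the case where the similar-query construction described later takes over.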
04:00
To give an overview of the search problem itself in a bit more depth, the search problem can be seen
04:06
as a two-step process: the first step is recall and the second is ranking. Given an index of billions of pages, recall means getting the best set of candidate pages for a user query: say, the ten thousand pages out of those billions which best fit the query. Then comes the ranking problem: given those ten thousand pages, what we want is the
04:38
top three results. As you may know, on any search engine result page hardly anybody looks beyond the first page, so it is very important that the top three or top five results are the best results for your query, and that is what we care about: given a user query, we try to come up with the three best results from billions of pages in the index. So that is the
05:07
background. What we do at Cliqz is use the traditional method of search, fuzzy-matching the words in the query to a document, but we also utilize something a bit deeper and different: semantic vectors, that is, distributed representations of words. What we actually try to do is represent our queries as vectors, which are
05:34
fixed-dimensional floating-point lists of numbers. Given a query, its vector should semantically capture the meaning of the query. This particular idea is called a distributed representation: words which appear in the same context share semantic meaning, and the meaning of the query is defined by this vector. These vectors are learned in an unsupervised manner, focusing
06:06
on the context of the words in the sentences or queries, and the area in which this is studied is neural probabilistic language models. The similarity between queries is measured as the cosine distance between two vectors: if two vectors are close together in the vector space, they are more similar. Hence, we get the closest queries to the user query by finding the closest vectors in the space, and this gives us a good recall candidate set
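The cosine distance used as the similarity measure here is straightforward to compute; a minimal pure-Python version:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Smaller distance means more similar queries."""
    return 1.0 - cosine_similarity(u, v)
```

In production one would vectorize this with NumPy, but the definition is the same.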
06:37
as the first set that we fetch from the index, the one which most accurately corresponds to the user query. A simple example to illustrate
06:48
this: a user types a simple query like "sims pc download", which is a game. What the system gives us is a sorted list of similar queries along with their cosine distance to the query vector the user typed. So given the query "sims pc download" we get a sorted list where the first entry is the closest, something like "sims game pc download". Never mind that the word order is a little different: the vector for "sims pc download" comes out very close to the vector for "download sims pc", because the representation is essentially a bag of words, and we want to optimize for space as well, so eventually the vectors come out nearly the same. As you move down the list the distance increases, and as we will see later, at some point you start getting far-off results, so we are usually concerned only with the queries closest to the user's query.
07:56
A bit more on how this learning process works. What we actually utilize in production is an unsupervised learning technique to learn these word representations. Effectively, given the vector representations of words, you would like the distance between two words w and w' to reflect their similarity in meaning. For example, if you take the vector for "king", subtract the vector for "man" and add the vector for "woman", you get a vector which is close to that of "queen". The algorithm that achieves this is called word2vec, and with it we learn these representations as the corresponding vectors. A bit more about
08:43
word2vec: it was introduced by Mikolov et al. in 2013. There are two different models: continuous bag-of-words (CBOW) and continuous skip-gram. These are distributed representations learned by shallow neural networks, and both models are trained using stochastic gradient descent and backpropagation.
09:04
A bit more intuition on how this works. In CBOW, on the left we have the context words, say a window of five words, and we try to predict the center word: given "the cat sat on the mat", the word "sat" has to be predicted from the other context words. The skip-gram model does the exact reverse: given the center word in the sentence and a context window, you try to predict the surrounding words. With these two models you can define a vector for each word, stored as a lookup table, and learn them using stochastic gradient descent. I will mostly skip the mathematics,
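The (center, context) training pairs that skip-gram extracts from a sentence can be sketched as follows; the sentence and window size are just the talk's running example:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
```

CBOW would use the same pairs grouped the other way around, predicting the center word from all of its context words at once.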
09:48
but in short, what we try to optimize is this: a neural probabilistic language model maximizes how often you see a particular word given its context, against how often you see words that are not in its context. A good language model will say: given a certain sequence of words, you are likely to see this word next, and given another sequence of words, you will not
10:14
see a certain word. That is what the model learns. Here is one example of how a
10:22
traditional language model looks. For example, with "the cat sits on the mat", you try to predict which word comes after the sequence, out of the vocabulary or dictionary that you have. The only catch is that your vocabulary can be very, very large: you may be trying to predict the probability of a word out of 7 to 10 million words in your corpus, computing the probability
10:51
of one single word normalized across all of them. To avoid this, we use something called noise contrastive estimation, in which
11:01
we do not score our word against the entire vocabulary. Instead, we pick a small set of noise words, say 5 or 10. Where the true sequence is "the cat sits on the mat", you are pretty sure that "mat" is the right word, but the noise words could be anything, such as "the cat sits on the other", and so on. These words will not complete the exact sequence that you find in the corpus; you sample them at random from a uniform distribution and use these noise words as negative training examples. What
11:37
the model effectively learns is: given the sequence, which word is the right one to come next, and which words are not. If the system differentiates these over and over again, on millions of examples, trained over several iterations, you end up with a model which is able to separate the positions of the right words from the positions of the wrong words by a clear distance. Let us see how this works with an
12:09
example. Say there is a document like "the quick brown fox jumps over the lazy dog" and we have a context window of size 1. Taking the first three words, "the quick brown", with "quick" as the center word, the surrounding words are "the" and "brown". In the CBOW model we would predict "quick" from "the" and "brown"; in skip-gram, which we found works much better in production, it is the reverse: we predict the context words from the target word, so we predict the probability of "the" and of "brown" given "quick". The objective function is defined over the entire dataset, and our dataset is built from a lot of Wikipedia, a lot of query logs,
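A simplified version of the noise-contrastive objective described here, in the negative-sampling form popularized by word2vec; this is a sketch of the idea with toy vectors, not the exact production loss:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nce_loss(center, true_ctx, noise_ctxs):
    """Negative log-likelihood that pushes the true (center, context) pair's
    score toward 1 and each (center, noise) pair's score toward 0."""
    loss = -math.log(sigmoid(dot(center, true_ctx)))
    for noise in noise_ctxs:
        loss += -math.log(1.0 - sigmoid(dot(center, noise)))
    return loss
```

Training then takes the gradient of this loss with respect to the embedding vectors and applies an SGD update, exactly the ascent-on-log-likelihood step the talk describes next.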
13:05
page titles and descriptions, and a number of other textual sources that we have, from which the model learns what a probable sentence is and what a probable sequence of words is. We use SGD for this: say at training step t you have the pair "quick" and "the",
13:22
and probabilistically you also select some noise examples for the non-noise pair, say "quick" and "sheep", where "sheep" should not be part of this context. You then compute the loss for this pair of observed and noisy examples and get the objective function. What we try to do is maximize a value of this form: the probability of the true pair, "quick" in the context of "the", should be given a score of 1, and the noise pair, "quick" and "sheep", should be given a score of 0. By updating the embedding values we can maximize this objective, which is essentially a log-likelihood, and we can actually
14:12
do gradient ascent on top of it. We perform an update on the embeddings and repeat this process over and over for different examples across the entire corpus, and we end up with a lookup table of word vectors. We can choose the dimensionality of the vectors; as you will see in my slides, we use 100 dimensions to represent a word, and that works well for us. So what do these
14:35
word embeddings actually look like? What you get is
14:39
something like this. If you project these word vectors into space, you find that the vector from "man" to "woman" is roughly the same as the vector from "king" to "queen", and you find not just this male-female relation but also verb tenses, like "walking" and "walked", "swimming" and "swam". That is because you might have sentences like "the person is walking" and "the person is running", so "walks" and "runs" occur in the same kinds of context, and this is what the learner captures very nicely. We also get other relational features, like countries and capitals: Spain relates to Madrid the way Germany relates to Berlin. These are country-capital relationships. This slide is a projection down to two dimensions using t-SNE; it is a bit small, but you can see clusters at the bottom and at the top, and words with more similar meanings are actually closer in the vector space. This is a very important property, because if
15:58
you can leverage this and construct sentence or document representations the same way, you get similar documents close together in the space as well, and that is exactly what query embeddings address. The way we generate a query vector from these word vectors is as follows: for the same query
16:14
"sims pc download", we have a vector for each of the words. We do not just use these word vectors as they are; we also compute a term relevance for each word in the query. What you see is a score for each word, and this tells us that "sims" is the most important, most relevant word in the query, because it is the name of the game. Next,
16:43
we use this term relevance to calculate a weighted average of these vectors. What a weighted average means is: given the vectors of the different words and their term-relevance weights, you average the vectors weighted by relevance, and you get an averaged representation of those words. Effectively, our query vector is this weighted-average representation: given each word vector and its relevance, we get a single vector, so at the end "sims pc download" is nothing but this one 100-dimensional vector, and that is what we use as the query embedding.
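The weighted average described above can be written compactly; the 2-dimensional vectors and relevance scores below are made up purely for illustration:

```python
def query_vector(word_vectors, relevance):
    """Relevance-weighted average of the word vectors of a query."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, vec in word_vectors.items():
        w = relevance.get(word, 0.0)
        weight_sum += w
        for k in range(dim):
            total[k] += w * vec[k]
    return [x / weight_sum for x in total]

# Toy example: "sims" carries most of the relevance, so it dominates.
vectors = {"sims": [1.0, 0.0], "pc": [0.0, 1.0]}
weights = {"sims": 3.0, "pc": 1.0}
embedding = query_vector(vectors, weights)
```

In production the same operation runs over 100-dimensional word2vec vectors.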
17:27
A word about the term relevance itself: we have two different flavors of it. The usual choice would be the raw frequency of the words, but we found that it does not scale very well; you could also use something like TF-IDF over the pages. What we use instead is this: given the queries linked to a page, how many times does the term appear in those top-5 queries? Given the data that we have, that is a much better indicator: from the word statistics, this count relative to the term frequency gives us something like a term dominance score. The relative variant is a normalization of that score, and we found that normalizing the scores across all the queries of the index gives slightly better results.
18:22
All of this data is precomputed rather than calculated at query time, and it is stored in an index. For example, for each word we store features like
18:31
frequency and document frequency; you can look all of this up, and similarly for the other words. So far I have
18:41
described what we actually create: a query index. Given a document index which holds all the documents, we have all the queries, and we want to compute a vector for each. We cannot do this for all possible queries, there are just too many, so given all the pages in the index, we pick the top 5 queries which effectively represent each page, which we can get from the page models. Roughly, we come up with about 465 million queries which represent all the pages in the index, and we learn a query vector for each one of them. If you simply ran the whole system over this naively it would take ages, so the problem now is: how do we get similar queries from these 465 million? Given a user query, find the
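Selecting the top-N queries per page out of a query index might look like the following sketch; the function name and data layout are hypothetical, chosen to match the toy index shown earlier:

```python
from collections import defaultdict

def collect_candidate_queries(query_index, top_n=5):
    """Invert a query -> [(url, freq)] index and keep, for each page,
    only the top-N queries (by frequency) that lead to it."""
    page_queries = defaultdict(list)
    for query, urls in query_index.items():
        for url, freq in urls:
            page_queries[url].append((freq, query))
    candidates = set()
    for url, scored in page_queries.items():
        for _, query in sorted(scored, reverse=True)[:top_n]:
            candidates.add(query)
    return candidates
```

Applied to billions of pages with N=5, a step like this yields the set of a few hundred million representative queries that then get embedded.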
19:37
closest 50 queries out of these 465 million. How do we find them? Should we use brute force? It is too slow;
19:47
and we cannot use hashing techniques effectively either, because they are not accurate enough for our vectors: the vectors are semantic units, and a small loss in precision would lead to weird results. So what our solution required was a cosine-similarity lookup that somehow scales to 465 million queries with very low latency. The answer we came up with was approximate nearest neighbor (ANN) vector models, and they turned out to be pretty helpful for us.
20:21
The one that we use is Annoy. It comes from Spotify, it builds approximate nearest neighbor models for all the query vectors we have, and it is actually used in production at Spotify for music recommendations. We cannot train on all the queries and documents at once, because it is too memory-intensive, so we do not train them all together: we have a cluster where we shard these models along with the search index. We train shards of about 46 million queries each, with a certain number of trees (what these trees are, I will explain next). The size of each model is around 27 gigabytes, so across the ten shards what you get out of training is around 270 gigabytes, and everything is stored in RAM, because for us the most important thing is latency,
21:19
and serving from RAM happens pretty quickly; this architecture is what we actually use in production. At runtime you query all the shards simultaneously and then sort the combined results by the cosine distances you get back. Different shards will return different close queries, and eventually you want the best-representing queries overall, the ones with the lowest distances, and that works out quite nicely. Of course, sharding like this is something of a heuristic, but homogeneous shards behave well for the system and it does not noticeably decrease the recall. But first I want to explain how we actually
22:07
use Annoy and how it actually works. It is one of my favorite libraries: you can use it whenever you take a vector-based approach to recall or ranking and you need to find the nearest points to any query point in sublinear time. You cannot compare against every point one by one, that does not scale; you want a data structure that gets you those nearest queries quickly, and the best-suited data structure for that is a tree. So each of my query vectors is represented as a point in the space, and what we try to find is, given a
22:49
certain point, which points are nearest: the user query vector is like a new point landing somewhere in this space, and we want to find the nearest ones. To build the model, what Annoy does is
23:03
split the space: it picks two points at random and splits the space between them, and then it does this again and again, and you get
23:13
something like a tree, a segmentation with a certain number of points per cluster in the different parts of the tree,
23:23
You end up with a binary tree, and the nice property of this binary tree is that points which are close to each other in the vector space are more likely to be close to each other in the tree itself. So if you walk down the tree to a leaf,
23:39
that leaf will be composed of nodes that are all similar in the vector space, and this is a very important feature.
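The recursive random splitting just described can be sketched in a few lines of plain Python. This is an illustrative toy, not Annoy's actual implementation (Annoy is C++ and far more optimised), and the function and parameter names are mine:

```python
import random

def split(points, leaf_size=4):
    """Recursively partition points with random hyperplanes (Annoy-style sketch).

    Each internal node picks two random points; the hyperplane equidistant
    from them splits the space. Returns a nested (left, right) tuple tree
    whose leaves are small buckets (lists) of points.
    """
    if len(points) <= leaf_size:
        return points  # leaf bucket
    a, b = random.sample(points, 2)
    # The normal of the splitting hyperplane is the vector from a to b,
    # and the hyperplane passes through their midpoint.
    normal = [bi - ai for ai, bi in zip(a, b)]
    midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    left, right = [], []
    for p in points:
        # The sign of the projection onto the normal decides the side.
        side = sum(n * (pi - mi) for n, pi, mi in zip(normal, p, midpoint))
        (left if side <= 0 else right).append(p)
    # Guard against a degenerate split leaving one side empty.
    if not left or not right:
        return points
    return (split(left, leaf_size), split(right, leaf_size))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(32)]
tree = split(pts)
```

Each leaf holds at most `leaf_size` points, and points that land in the same leaf are likely to be close in the original space, which is exactly the property the tree search exploits.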
23:50
So how do we search for a point in the tree with these splits that we have built? Say x is our user query vector, and we try to find the k nearest vectors to this vector and the queries related to them.
24:07
What you do is search for the point by walking down one path of the tree, and you get, say, seven neighbours out of the leaf. You then use cosine similarity as the metric for how close they are: it takes values between minus one and one, and a value close to one means the vectors are very similar, so it naturally tells you how close two vectors are. But there is a problem: what if you want more than the seven neighbours that one leaf gives you? So we do not just navigate down one branch of the tree; we also navigate into the other branch, and this is managed with a proper priority queue, which tells us which parts of the tree to visit next to collect the closest vectors. You not only look at the side of the split your point falls on, but also at the slightly "wrong" side, because both of those regions of the hyperspace can be close to the user's vector.
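The cosine measure mentioned here is simple to write down; a minimal pure-Python version (numpy would normally be used for speed):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite -> -1.0
```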
25:16
But sometimes you find that, because the splits are random, you can actually lose some near neighbours: a split happened to cut right between two close points. To minimise this, you train a forest of trees, and it looks something like this: you do not build just one tree, you randomise the splits across many trees, so effectively you build the index over many random configurations at once and search all of them in real time. This gives you a pretty good representation, and when you query it you get good similarity between the vectors. So we
25:57
train a forest of trees, so that one bad split in a single tree cannot hurt us much.
26:03
One thing about Annoy, maybe it's a feature, maybe a limitation, is that it does not let you store string values; it only allows you to store integer indexes. So for a given query you assign a unique index, say 501, and under that index you store the vector; at query time Annoy gives you back the indexes of all the items close to yours. To map those indexes back to queries, at Cliqz we have a system called keyvi, which is a key-value index and is also the backbone of our entire search index. We found it much better than Redis or anything else we compared it with in terms of read speed and maintainability. We developed it in-house; it is written in C++ and has Python wrappers. So it actually stores your index-to-query
26:51
mapping. What effectively happens at runtime is: the user types a query, you generate a vector for it, you search the Annoy model for the closest queries and get their indexes, you then look those indexes up to get the actual queries, and finally you fetch the pages for all the queries closest to the user's query. This is how we improve our recall.
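Put together, the runtime path described here can be sketched as below. Everything in it is a stand-in: a brute-force scan plays the role of Annoy, plain dicts play the role of keyvi, and `embed()` is a stub for the real embedding model; the queries, ids, and URLs are invented for the example:

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Vector index: integer id -> query vector (Annoy itself stores only the ids).
id_to_vector = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
# Sidecar key-value stores (keyvi plays this role in the real system).
id_to_query = {
    0: "statue of liberty height",
    1: "how tall is the statue of liberty",
    2: "python tutorial",
}
query_to_pages = {
    "statue of liberty height": ["en.wikipedia.org/wiki/Statue_of_Liberty"],
    "how tall is the statue of liberty": ["nps.gov/stli"],
    "python tutorial": ["docs.python.org/3/tutorial"],
}

def embed(query):
    # Stub: a real system would derive this vector from the query's words.
    return [1.0, 0.05]

def candidate_pages(user_query, k=2):
    qv = embed(user_query)
    # Nearest ids by cosine similarity (Annoy does this step sublinearly).
    nearest = sorted(id_to_vector,
                     key=lambda i: cosine(qv, id_to_vector[i]),
                     reverse=True)[:k]
    pages = []
    for i in nearest:
        pages.extend(query_to_pages[id_to_query[i]])
    return pages

print(candidate_pages("statue liberty height"))
```

The unseen query never has to match any indexed query literally; it only has to land near semantically similar queries in the vector space.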
27:20
We get a much richer set of candidate pages than we had before, with a much higher chance of the expected pages being among them. The reason it works is that we now go beyond synonyms and simple fuzzy matching: we are actually using vectors that have learned semantics. It goes wrong sometimes, but most of the time you will find a definite improvement, because the vectors have learned which words belong to the same
27:48
context, and that is a very important feature. Queries are now matched in real time using
27:54
cosine similarity between query vectors, on top of the classical information retrieval techniques that were already in place. Overall, the recall improvement over our previous system was around five to seven per cent, which translated into an improvement of about one per cent in the final top-three results. That gives us a clear indication that
28:19
these vectors are actually useful. The system triggers only for queries we have never seen before, and that is a very important point: if we have seen a query before and already mapped it to a certain page, we have a definite answer for it. But for queries which are new to us, which are not in the index, we have to go beyond the traditional techniques, and that is where these vectors come in.
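The triggering behaviour described above amounts to a simple routing decision. A hypothetical sketch (the names and example queries are mine, not Cliqz's):

```python
# Queries already present in the keyword index take the traditional path;
# the embedding-based lookup fires only for queries never seen before.
seen_queries = {"python tutorial", "statue of liberty height"}

def route(user_query):
    if user_query in seen_queries:
        return "keyword-index"       # known query: direct, definite answer
    return "query-embeddings"        # unseen query: fall back to vectors

print(route("python tutorial"))          # known query
print(route("statue liberty how tall"))  # never seen before
```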
28:56
Before I conclude, I wanted to show what it actually looks like. This is the browser, this is the search page, and we
28:59
have this dropdown which comes up as you type; the idea was to remove the whole intermediate step of a search engine results page, so that you can actually get directly to
29:09
the page you want. So, about the libraries:
29:12
there is Annoy, from Spotify, which is available on GitHub, and keyvi, which is open source on GitHub as well; I think you will actually find it useful, and it is a pretty actively developed project. Word2vec models can be trained
29:26
using gensim if you want to build a prototype, but I would recommend the original C tool because it is a bit more optimised, and there are certain variations of the models that have been developed since. There are other Cliqz open-source projects as well.
29:44
You can actually contribute to those as well. If you want to find the slides, they are
29:49
available online; look for the title of this talk. So before I conclude,
29:57
I just want to say that we are still working on this system, and we have come quite far but we are not done. We are looking at other deep learning approaches, like LSTMs and memory networks, but the downside of those approaches is that most of our user queries are keyword-based: you do not usually find people typing "what is the
30:18
height of the statue of liberty"; they type something like "statue of liberty height". The richer linguistic relationships might be captured by LSTMs, but they are more complicated, and the current system is flexible enough to still give pretty good results. We are also trying a new method of query-to-page similarity using document vectors, again using
30:43
a paragraph-vector model, and trying to build a solid retrieval system for those pages which have never been clicked before. We have a lot of such pages, and we are trying to find out what would be the best way to represent them, using vectors instead of the traditional approach. That is the last part, so with that,
31:07
thank you, and I will finish with a quote, given by John Rupert Firth in 1957: "You shall know a word by the company it keeps." Mikolov actually developed word2vec
31:20
using that same contextual approach to words, and it has served us well because of that. So thank you; I am happy to take questions now.

[Audience question: why did you build keyvi rather than use an existing key-value store?] One of the reasons was that we wanted a unified view. We tried a lot of these key-value stores: we tried Redis, we tried a traditional database, we tried full-text search engines. The thing is, our data is a bit different, in the sense that sometimes for a key we need the value to be a list of vectors, sometimes it is just strings, sometimes it is repeated strings where you have the same data structures again and again, so you can optimise a lot more if you write those parts yourself, and that is why we started doing it. keyvi is a much bigger project and I am not really the expert on it, but what I can say is that it has a lot of features: it compresses your keys into a finite-state automaton, a sort of shared-structure compression, and that gives you a much smaller index; it is faster to index, faster to read, and it scales. And you do not have to hold it all in memory; it is memory-mapped, so you can have far more data than fits in RAM. Our use case was that we wanted reads to be optimised, because we have almost no writes: you compile the index once, and from then on you only take the user query and read data from the index for those keywords.

[Audience question: you talked about having no writes to the database; how do you handle new data and new queries for training your embeddings? There are online-learning implementations of nearest neighbours that just update the index.] That is true. What we do is, we
have a rebuild cycle, where we recompile the whole index at regular intervals, and it is automated, so we get new queries and new query vectors with each rebuild. It is not a one-time system, but it is not immediate either: if tomorrow I want to include a set of results which are brand new, I cannot do that within the same cycle. To address that issue we have a news vertical: the news index handles the most recent content, so anything that is trending right now shows up in the news section. And since the concepts behind most queries were already available, say on Wikipedia, beforehand, we have those concepts covered from the beginning, and that is what we use. But for a genuinely new word, some "xyz" term that only comes up tomorrow, we probably will not have a vector for it; that is a very hard problem for us. Anyone else? OK, thank you again.