Query Embeddings: Web Scale Search powered by Deep Learning and Python

Video in TIB AV-Portal: Query Embeddings: Web Scale Search powered by Deep Learning and Python

Formal Metadata

Query Embeddings: Web Scale Search powered by Deep Learning and Python
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Ankit Bahuguna - Query Embeddings: Web Scale Search powered by Deep Learning and Python. A web search engine allows a user to type a few words of a query and presents a list of potentially relevant results within a fraction of a second. Traditionally, keywords in the user query were fuzzy-matched in real time against keywords within the different pages of the index, without really trying to understand the meaning of the query. Recently, deep learning and NLP techniques have made it possible to represent sentences or documents as fixed-dimensional vectors in a high-dimensional space. These vectors inherit the semantics of the document. Query embeddings is an unsupervised, deep-learning-based system, built using Python, Word2Vec, Annoy and Keyvi, which recognizes similarity between queries via their vectors, for a web-scale search engine within the Cliqz browser. The goal is to describe how query embeddings contribute to our existing Python search stack at scale, and the latency issues prevailing in a real-time search system. Also included is a preview of a separate vector index for queries, used by the retrieval system at runtime via approximate nearest neighbours (ANNs) to get the closest queries to the user query — one of the many key components of our search stack. Prerequisites: basic experience in NLP, ML, deep learning, web search and vector algebra. Libraries: Annoy.
Our next speaker is Ankit Bahuguna, who will be talking about query embeddings: web scale search powered by deep learning and Python.

Thank you. I will be talking about query embeddings, which is a system we have developed at Cliqz that uses deep learning. First, a bit about myself: I am a software engineer in research at Cliqz, with a background in computer science and information systems, and I work on building our search engine, which is part of our product, as is the browser. The areas that interest me are information retrieval and deep learning. So, about Cliqz: we are
based in Munich, majority-owned by Hubert Burda Media, with an international team of around 90 experts from 28 different countries, and we combine the power of search and browsing. We are redefining the browsing experience at cliqz.com, and you can check out the browser yourself. Here I am talking about search. When you open your web browser, you usually do one of two things: you follow a link, or you search. The Cliqz experience gives you a browser with search built in, which is intelligent enough to directly show you results based on what you type. If you search for something like the weather, you get an instant answer — I found out this way that today in Munich it is 41 degrees. And if you search for news, you get news results inline. So it is a combination of a lot of features built into a browser, with a full search engine behind it. Now, a
bit of history on how traditional search works. Search is a very long-standing problem: classical information retrieval built models of the documents and then matched the keywords of the query against them at query time, and the whole point of the process was to come up with the best documents for the user's query. Over time search engines evolved; with Web 2.0 a lot of rich media came in, and people expected more from the web.

Our search is based on user queries: the query index is built from query logs. If you type "facebook" or "fb", it should lead to facebook.com. Given such an index you can construct a much more meaningful search experience for the user, because it is enriched by how many times people actually issued a query and landed on the same page. What we aim at is to construct alternative queries given a user query: if we find the query directly in the index, great, but if it is something we have not seen before, we try to construct similar queries at runtime and search for their results in the index. The index looks something like this: a query maps to URL ids, where each URL id is linked to an actual URL, together with frequency counts — and this allows us to predict which page the user actually intended.
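As a toy sketch of this idea (the real index at Cliqz is stored in Keyvi; the structure, URLs and counts below are made up purely for illustration), a query-to-URL index with click frequencies might look like:

```python
# Toy query index: each query maps to the URLs users actually clicked
# for it, together with click counts (all data made up).
query_index = {
    "facebook": [("https://facebook.com", 95210), ("https://facebook.com/login", 4110)],
    "fb":       [("https://facebook.com", 88340)],
}

def best_page(query):
    """Return the most frequently clicked URL for a known query, else None."""
    hits = query_index.get(query)
    if not hits:
        return None
    return max(hits, key=lambda pair: pair[1])[0]

print(best_page("fb"))  # the most-clicked page for "fb"
```

The frequency counts are what make this more than a keyword index: they encode what real users meant by the query.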
To give an overview of the search problem itself in a bit more depth: it can be seen as a two-step process. The first step is recall, the second is ranking. Given an index of billions of pages, recall means fetching the best set of candidate pages for a user query — say, the 10,000 pages out of billions that best fit the query. Then comes the ranking problem: given those 10,000 pages, produce the top results. As you may know, on any search engine result page hardly anyone looks past the first page, so it is very important that the top three or top five results are the best results for the query. That is what we care about: given a user query, come up with the three best results out of billions of pages in the index.
So what we aim at at Cliqz is to combine the traditional method of fuzzily matching the words of the query against a document with something a bit deeper and different: semantic vectors, that is, distributed representations of words. We represent our queries as vectors — a fixed-dimensional list of floating point numbers. Given a query, its vector should capture the meaning of the query semantically. This is called a distributed representation: words which appear in the same context share semantic meaning, and the meaning of the query is defined by this vector. These vectors are learned in an unsupervised manner, focusing on the context of the words in the sentences or queries; the area where this is studied is neural probabilistic language models. The similarity between two queries is measured as the cosine distance between their vectors: if two vectors are close together in the vector space, the queries are more similar. So we fetch the closest queries to the user query by finding the closest vectors in the space, and this gives us the candidate set fetched from the index which most accurately corresponds to the user query.
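Cosine similarity between two vectors is straightforward to compute with NumPy (a minimal sketch; the three toy vectors below are made up):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 = same direction, 0.0 = orthogonal)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q1 = [0.2, 0.8, 0.1]
q2 = [0.25, 0.75, 0.05]   # almost the same direction as q1
q3 = [0.9, -0.1, 0.4]     # points somewhere else entirely

assert cosine_similarity(q1, q2) > cosine_similarity(q1, q3)
```

Cosine distance, as used in the talk, is simply one minus this similarity.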
A simple example to illustrate this: a user types a query like "sims pc download", which is about a game. What the system gives us is a sorted list of similar queries along with their cosine distance to the query vector the user typed. So for "sims pc download" we get a sorted list where the first entry, say "sims game pc download", is the closest. Mind you, the system has no deeper understanding of what to download or what not to — the vector for "sims game pc download" is simply very close to the vector for "sims pc download". The query vector is a bag of words, because we also want to optimize for space, so word order does not matter and the vectors come out essentially the same. As you move down the list the cosine distance increases, and as we will see later, you start getting far-off results; we are usually concerned with the queries closest to the user's query.
A bit more on how this learning process works. What we actually use in production is an unsupervised learning technique to learn these word representations. Effectively, given the distributed representations of words, you would like the distance between two words w and w' to reflect their similarity in meaning. For example, if you take the vector for king, subtract the vector for man and add the vector for woman, you get a vector which is close to that of queen. The algorithm that achieves this is word2vec, and we learn the representations as the corresponding vectors. Word2vec was introduced by Mikolov et al. in 2013. They proposed two different models: continuous bag of words (CBOW) and continuous skip-gram. Both learn the distributed representations with a shallow neural network, and both models are trained using stochastic gradient descent and backpropagation.
An intuition of how this works: in CBOW you take the context words — a window of, say, five words — and you try to predict the center word. Given "the cat sat on the mat", the word "sat" has to be predicted from the other context words. The skip-gram model does the exact reverse: given the center word in the sentence and a context window, you try to predict the surrounding words. With these two models you can define a vector for each word, stored as a lookup table, and learn them using stochastic gradient descent. I will probably skip the heavy math,
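To make the two training setups concrete, here is a small sketch (plain Python, with a window of one word on each side; the sentence is the talk's own example) of the (input, target) pairs each model trains on:

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: predict each surrounding word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """CBOW: predict the center word from its surrounding context words."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((tuple(context), center))
    return pairs

sentence = "the cat sat on the mat".split()
print(skipgram_pairs(sentence)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

The same corpus thus yields either "center → neighbour" pairs (skip-gram) or "context → center" pairs (CBOW).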
but briefly: what a neural probabilistic language model tries to optimize is, given how many times you see a particular word in a given context versus how many times you see words not in that context — a good language model will say that given a certain sequence of words, one continuation is likely and another is not. That is what the model learns. Here is an example of how a traditional language model looks: for "the cat sits on the …" you try to predict which word from your vocabulary comes after the sequence. The only catch is that your vocabulary can be very, very large: if you have 7 to 10 million words in your corpus, you have to predict the probability

of one single word across all of them, which is expensive. To avoid this, we use something called noise contrastive estimation: we do not test our word against the entire vocabulary. Instead we pick a small set of noise words, say 5 or 10. For the sequence "the cat sits on the mat" you are pretty sure "mat" is the right word, but you also sample other words, so the sequence becomes "the cat sits on the hand" or something like that. These noise words will not form sequences you would actually find in the corpus; you draw them at random from a distribution over the vocabulary and use them as negative training examples. So

effectively, what the model learns now is: given the sequence, which is the right next word, and which are not. If the model discriminates this over and over again, on millions of examples and across several iterations, you get a model which is able to separate the right words from the wrong words by a respectable distance in vector space. Let's see how this works with an
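A minimal numpy sketch of the per-example objective — this is the negative-sampling variant popularized alongside word2vec, which is a simplification of noise contrastive estimation; the embedding tables and the sampled ids below are random, made-up stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 100, 10_000
in_vecs  = rng.normal(scale=0.1, size=(vocab, dim))   # "input" embeddings
out_vecs = rng.normal(scale=0.1, size=(vocab, dim))   # "output" embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(center_id, true_ctx_id, noise_ids):
    """Loss for one (center, context) pair: push the score of the true
    context word up, and the scores of the sampled noise words down."""
    v = in_vecs[center_id]
    pos = np.log(sigmoid(out_vecs[true_ctx_id] @ v))        # observed pair -> 1
    neg = np.sum(np.log(sigmoid(-out_vecs[noise_ids] @ v))) # noise pairs  -> 0
    return -(pos + neg)   # minimized by SGD, i.e. gradient ascent on the log likelihood

loss = pair_loss(center_id=42, true_ctx_id=7, noise_ids=rng.integers(0, vocab, 5))
assert np.isfinite(loss) and loss > 0
```

Only the sampled noise words enter the loss, so the cost per example no longer depends on the vocabulary size.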
Take the document "the quick brown fox jumps over the lazy dog" and a context window of size one. Given the first three words, "the quick brown", the center word is "quick" and the surrounding words are "the" and "brown". In production we found that predicting the target from the context worked better for us, so effectively we predict "quick" from "the" and "brown". The objective function is defined over the entire dataset — whatever dataset we have, which is built from a lot of Wikipedia, a lot of crawled titles and descriptions, and a number of other textual sources — and from it the model learns what a plausible sentence is, i.e. what a likely sequence of words is. We use SGD for this: at training time t you have a pair like ("the", "quick"), and you sample some noise examples, say ("the", "sheep") — "sheep" should not be part of this context. Next you compute the loss for this pair of observed and noise examples, and that gives the objective function: the probability of the observed pair should be pushed towards 1, and the probability of the noise pair towards 0. By updating the parameters that the objective depends on, we maximize it — it is a log likelihood, so we can
do gradient ascent on top of it. We perform an update on the embeddings and repeat this process over and over, for different examples, over the entire corpus, and we come up with a lookup table of words and their vectors. We can choose the dimensionality of the vectors; as you will see on my slides, we use 100 dimensions to represent a word, and that works well for us.
So what do these word embeddings actually look like? If you project the learned vectors into space, what you find is that the vector from "man" to "woman" is roughly the same as the vector from "king" to "queen". And you find this not just for gender relations but also for verb tenses, like walking/walked and swimming/swam — because sentences like "the person is walking" and "the person is running" put these words in the same context, and the model captures that nicely. You also get relations like countries and capitals: Spain and Madrid, Italy and Rome, Germany and Berlin. This slide is a projection to two dimensions using t-SNE — I know it is a bit small — but at the bottom and at the top you can see clusters of related identifiers, and words with similar meaning are actually closer together in the vector space. This is a very important property.
If you can leverage this and construct a sentence or document representation, you probably get similar documents close together in the space as well — and that is exactly what query embeddings do. The way we generate a query vector from these word vectors is as follows. For the same query, "sims pc download", we take the vector for each of the words, but we do not just average the raw word vectors: each term also gets a term relevance score. This score tells us that "sims" is the most important, most relevant word in the query, because it is the name of the game. Next, we use these relevance scores as weights to calculate a weighted average of the word vectors: given the vectors of the different words and their term relevance weights, you compute a weighted average, and effectively the query vector is this averaged representation. At the end, "sims pc download" is nothing but this 100-dimensional vector, and that is what we use as the query embedding.

For the term relevance we tried two different modes. The usual one is plain word frequency, but we found it does not scale very well — you could also use something like tf-idf. What we ended up using is based on our own data: given the queries linked to a page, how many times a word appears in the top 5 queries across pages. Given the data we have, that is a much better indication of term importance than raw corpus frequency. On top of it we apply a normalization: we found that normalizing the scores across the whole index gives slightly better results.
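A small numpy sketch of the weighted average (the 4-dimensional toy vectors and the relevance weights below are made up; in the real system the vectors are 100-dimensional and the weights come from query-log statistics):

```python
import numpy as np

def query_vector(word_vecs, relevance):
    """Weighted average of word vectors; weights are the term relevance scores."""
    V = np.asarray(word_vecs, dtype=float)   # shape (n_words, dim)
    w = np.asarray(relevance, dtype=float)   # shape (n_words,)
    return (w[:, None] * V).sum(axis=0) / w.sum()

# toy vectors for "sims", "pc", "download"
vecs = [[0.9, 0.1, 0.0, 0.2],
        [0.1, 0.8, 0.1, 0.0],
        [0.0, 0.2, 0.9, 0.1]]
weights = [0.7, 0.1, 0.2]   # "sims" dominates the query meaning

qv = query_vector(vecs, weights)
print(qv.shape)   # (4,)
```

With equal weights this reduces to a plain mean; the relevance weights pull the query vector towards the most informative term.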
We compute these term statistics offline for each term in the index: for each word you can look up features like term frequency, document frequency and so on, and similarly for the other terms. So
what we actually create is a query vector index. Given our search index, which has all the documents, we have all the queries, and we compute a vector for each. We cannot do this for every query — there are just too many — so, given all the pages in the index, we pick the top 5 queries which effectively represent each page. That comes out to roughly 465 million queries across all the pages in the index, and we learn a query vector for each one of them. The problem now is: how do we get similar queries out of these 465 million? Given a user query, find the
closest 50 queries out of 465 million. Brute force is far too
slow. Hashing techniques do not work well either, because they are not accurate enough for dense semantic vectors: even a small loss in precision leads to visibly worse results. So what our solution required was a cosine-similarity lookup that somehow scales to hundreds of millions of queries at low latency. The answer was approximate nearest neighbour (ANN) vector models, and they turned out to be very helpful for us.
The library we use is Annoy. It was developed at Spotify precisely for this — building approximate nearest neighbour models over large sets of vectors — and it is actually used in production at Spotify for music recommendations. We cannot train one model over all 465 million queries at once; it is too slow and too memory-intensive. So we do not train them all together: we have a cluster where we shard these models along with the search index. We train ten shards of roughly 46 million queries each, each with a certain number of trees (what these trees are I will explain next). Each shard's model comes out on the order of 27 GB, so after training you have around 270 GB in total, and everything is stored in RAM, because for us the most important thing is latency:
searching one shard happens very quickly. At runtime, the user query goes out to all shards simultaneously; we collect the nearest queries from each shard and sort them by cosine distance. Different shards will have different closest queries, so eventually you merge them and keep the best representatives. One heuristic worth noting: near-duplicate queries across shards do not really hurt the recall part of the system, so the sharding works out fine in practice.
But first I want to explain how Annoy actually works — it is one of my favorite libraries, and you can use it whenever you take a vector-based approach for recall or ranking and need to find the nearest points to a given query point in sublinear time. You cannot compare against the points one by one; that does not scale. You want a data structure that narrows the search down to good candidates quickly, and the natural data structure for that is a tree. So each of my query vectors is represented by a point — a single query — and given a
certain query point, i.e. the user's query vector landing somewhere in this space, we want to find the nearest points to it. To build the model, you
split the space: take two points at random and split the space by the hyperplane between them, then you do it again recursively, and you get
something like a tree — a segmentation with a certain number of points in each cluster, in different parts of the tree. You
end up with a binary tree, and the nice point about this binary tree is that points that are close to each other in space are more likely to be close to each other in the tree itself. So if you navigate down through the tree
to a leaf, that leaf will be composed of points that are similar in the vector space, and this is a very important property.
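A stripped-down sketch of this construction (pure numpy; real Annoy is C++ and far more careful about split selection — this only illustrates the random-hyperplane idea):

```python
import numpy as np

rng = np.random.default_rng(1)

def build_tree(points, ids, leaf_size=16):
    """Recursively split points by hyperplanes between two random points."""
    if len(ids) <= leaf_size:
        return ids                                   # leaf: a small bucket of ids
    a, b = points[rng.choice(ids, size=2, replace=False)]
    normal = a - b                                   # hyperplane normal between a and b
    midpoint = (a + b) / 2.0
    side = (points[ids] - midpoint) @ normal > 0     # which side each point falls on
    left, right = ids[side], ids[~side]
    if len(left) == 0 or len(right) == 0:            # degenerate split: stop here
        return ids
    return (normal, midpoint,
            build_tree(points, left, leaf_size),
            build_tree(points, right, leaf_size))

def query_leaf(node, x):
    """Walk one branch down to a leaf; real Annoy additionally explores
    neighbouring branches via a priority queue."""
    while isinstance(node, tuple):
        normal, midpoint, left, right = node
        node = left if (x - midpoint) @ normal > 0 else right
    return node

pts = rng.normal(size=(500, 8))
tree = build_tree(pts, np.arange(500))
candidates = query_leaf(tree, rng.normal(size=8))
print(len(candidates))    # a small bucket of candidate neighbours
```

Each leaf holds a handful of mutually close points, so exact distance computation is only needed against the candidates, not against all 500 points.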
do we search for a point in the tree, given these splits? Say x is our user query vector, and we want the k nearest vectors to this query vector, to give us the queries related to it. What you do is search for the point by just walking down one path of the tree, and you get, say, the seven neighbours in that leaf, and you use cosine similarity as the metric for how close they are: close to 1 means the vectors are very similar, and it takes values between -1 and 1, so it naturally tells you how close two vectors are. The problem is, you end up with only those seven neighbours, and maybe you want more. So we don't just navigate down one branch of the tree, we also navigate down the second branch, and this is done by maintaining a priority queue which tells us which parts of the tree to visit next to get the closest vectors. So you not only look at the "right" side of each split but also at the slightly worse side, because both of those areas in hyperspace might be close to the user query.
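The metric mentioned here is plain cosine similarity, which ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction); a minimal version:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 = same direction,
    0 = orthogonal, -1 = opposite. Annoy's 'angular' metric is derived
    from this same quantity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [1, 0]))   # identical direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite -> -1.0
```

Because it depends only on the angle between vectors, it is insensitive to vector length, which is why it works well for comparing embeddings.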
But sometimes, because the splits are random, you can actually miss some close neighbours, because a split just happened to cut between two nearby points. To minimize this, you train a forest of trees, and it looks something like this: you don't train just one sequence of splits, you randomize the splits across many trees, so effectively you build many models with different random configurations at once and search all of them in real time. You then union the candidates from all the trees, and this gives you a pretty good representation: you get good similarity between queries. So we train a forest of trees.
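The forest idea, several independently randomized trees whose candidate sets are unioned so that one unlucky split doesn't lose a neighbour, can be illustrated with a much-simplified random-projection forest. This is a hypothetical class, not Annoy's API; Annoy itself exposes `AnnoyIndex(dim, 'angular')`, `add_item`, `build(n_trees)` and `get_nns_by_vector`:

```python
import random
from collections import defaultdict

class RandomProjectionForest:
    """Simplified stand-in for Annoy's forest of trees: each 'tree' hashes
    a vector by the signs of a few random hyperplanes, and the candidate
    sets of all trees are unioned, so a single unlucky split in one tree
    is usually covered by another tree."""

    def __init__(self, dim, n_trees=5, n_planes=3, seed=0):
        rng = random.Random(seed)
        # One list of random hyperplane normals per tree.
        self.trees = [[[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_planes)] for _ in range(n_trees)]
        self.buckets = [defaultdict(list) for _ in range(n_trees)]

    def _key(self, planes, vec):
        # Sign pattern of the vector against each hyperplane of this tree.
        return tuple(sum(p * x for p, x in zip(plane, vec)) > 0
                     for plane in planes)

    def add_item(self, index, vec):
        for planes, bucket in zip(self.trees, self.buckets):
            bucket[self._key(planes, vec)].append(index)

    def candidates(self, vec):
        # Union the matching bucket of every tree; rank this small
        # candidate set by exact cosine similarity afterwards.
        out = set()
        for planes, bucket in zip(self.trees, self.buckets):
            out.update(bucket[self._key(planes, vec)])
        return out
```

Each tree alone is a coarse, possibly wrong partition of the space; the union across trees is what makes the recall good.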
One drawback of Annoy, or maybe it's a feature, is that it doesn't let you store string values; it only lets you store integer indexes. So you have to map each query string to a unique index, say 501, under which its vector is stored, and when you query Annoy you get back the indexes of all the vectors close to yours. For that mapping, what we have at Cliqz is a system called keyvi, which is a key-value index that is also responsible for our entire search index. We found it much better than Redis or anything else we compared it with in terms of reads and maintainability. We developed it in-house; it's written in C++ with Python wrappers. So keyvi stores the index-to-query mapping. What you effectively see then is: the user query comes in, you generate a vector for it, you search that vector in the Annoy model, it gives you the indexes of the closest queries, you look those indexes up to get the actual queries, and effectively you can fetch the pages for all the queries that are closest to the user query. This is how we improve what we call candidate recall.
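The end-to-end flow, user query vector, then nearest query indexes, then query strings, then candidate pages, could look roughly like this, with plain dicts standing in for the Annoy index and the keyvi store (all names and data here are made up for illustration):

```python
import math

# Toy stand-ins: in production these are an Annoy index and a keyvi
# key-value index; plain Python structures sketch the same data flow.
query_vectors = {                     # index -> query vector (Annoy's role)
    501: [1.0, 0.0],
    502: [0.9, 0.1],
    503: [0.0, 1.0],
}
index_to_query = {                    # index -> query string (keyvi's role)
    501: "berlin weather",
    502: "weather in berlin",
    503: "python tutorial",
}
query_to_pages = {                    # query string -> candidate pages
    "berlin weather": ["wetter.de/berlin"],
    "weather in berlin": ["weather.com/berlin"],
    "python tutorial": ["docs.python.org/tutorial"],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def candidate_pages(user_vec, k=2):
    """User query vector -> nearest query indexes -> strings -> pages."""
    nearest = sorted(query_vectors,
                     key=lambda i: -cosine(user_vec, query_vectors[i]))[:k]
    pages = []
    for idx in nearest:
        pages.extend(query_to_pages[index_to_query[idx]])
    return pages

print(candidate_pages([1.0, 0.05]))
# -> ['wetter.de/berlin', 'weather.com/berlin']
```

The exhaustive `sorted` here is exactly what Annoy replaces with sublinear tree search; the surrounding plumbing is the same.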
The results are amazing in the sense that we get a much richer set of candidate pages than before, with a higher share of the expected pages among them. The reason it works is that we are now going beyond synonyms and simple fuzzy matching: the model has actually learned semantics. It goes wrong sometimes, but most of the time you'll find there is a definite improvement, because the model has learned which words are related in context, and that's a very important feature. Queries are now matched in real time using cosine similarity between query vectors, on top of the classic information retrieval techniques that everyone uses. Overall, the recall improvement over our previous system was around 5 to 7 per cent, and that translated into an improvement of about 1 per cent in the final top 3 results. That gives us a clear indication that these vectors are actually useful. Note that the system triggers only for queries we have never seen before, and that's also a very important point: for queries we have seen before, where users already navigated to a certain page, we have a definite answer; but for queries which are new to us, which are not in the index, you have to go beyond the traditional techniques, and this is what this system does. Before I conclude I wanted to show what it actually looks like, so this is the extension,
this is the search page, and we have this dropdown which comes up as you type; the idea of this was to remove the whole extra step of the search engine results page, so that you can get directly to the page you want. So the libraries
are: Annoy from Spotify, which is available on GitHub, and keyvi, which is also open source on GitHub; you'll find it useful, and it's a pretty actively developed project. Word2vec models can be trained using gensim if you want to build a prototype, but I would recommend the original C implementation because it's a bit more optimized. There are also variations of these models being developed, because of the competition in this space, and there are other open source projects that you can actually contribute to if you want. You can find the slides online. So before I conclude,
I just want to say that we are still working on this system. We have it more or less production ready, but we are also trying to look at other deep learning approaches, like LSTMs or memory networks. The only downside of those approaches is that most of our user queries are keyword-based: you don't usually find people typing "what is the height of the Statue of Liberty"; they type something like "statue of liberty height". Richer linguistic relationships might be captured better by those more complicated models, but the current system is flexible enough to still give you pretty good results. We are also trying to extend this query-to-query similarity into query-to-page similarity using document vectors, again with something like a paragraph vector model, and trying to also cover pages which have never been clicked before: for those pages we try to find out what would be the best way to represent them, using vectors rather than the traditional approach. That's the last part. With that,
thank you, and I'll finish with a quote from J. R. Firth, from 1957: "You shall know a word by the company it keeps." Word2vec was actually developed
using the same contextual approach to words, and it has served us well. So thank you; I'm happy to take questions.

Q: Why did you build keyvi instead of using an existing key-value store?

A: One of the reasons was that we wanted a unified view. We tried a lot of the existing key-value stores: we tried Redis, we tried a traditional database, we tried existing search engines. What we found is that our needs are a bit different, in the sense that sometimes the value for a key needs to be a list of vectors, sometimes it is just strings, sometimes they are repeated strings where you have the same data structures again and again; you can optimize a lot more if you write those parts yourself, and that's how we started. keyvi is a much bigger project and I'm not really the expert on it, but what I can say is that it has a lot of features: it compresses your keys into a finite state machine, which is a form of compression, and that gives you a much smaller index; it's faster to build, faster to read, and it scales. And we don't have to keep everything in memory: the index is memory-mapped, so you can have much more data than RAM. Our use case was that we wanted reads to be optimized, because we have no writes at all at query time: you compile the index once, and then at runtime you take the user query and read the data for those keywords from the index.

Q: You already talked about having no writes to the index, so I was wondering how you handle new data, new queries, to train your embeddings. As far as I know there are no well-known online-learning implementations of approximate nearest neighbours, so you just have to rebuild the index, right?

A: True. What we do is we
have a release cycle where we recompile the index, and we have automated getting new queries and new query vectors in with each release. So it's not a one-time system, but it's not immediate either: if tomorrow I want to include a set of results which are new, say new queries that only appear tomorrow, I cannot do that within this index. To address that we have a news vertical: the news model handles the most recent content, so anything that is trending right now shows up in the news section. We also have concepts, built on the same finite state technology, for things that were already available, say on Wikipedia, before; those concepts are in the index from the beginning. But for a brand-new word, some "xyz" word that only comes up tomorrow, we probably won't have a vector for it; that's a very hard problem. Anyone else? OK, thank you.

