
#bbuzz: Fast scalable evaluation of ML models over large data sets using open source


Formal Metadata

Title: #bbuzz: Fast scalable evaluation of ML models over large data sets using open source
Number of Parts: 48
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: Modern solutions to search and recommendation require evaluating machine-learned models over large data sets with low latency. Producing the best results typically requires combining fast (approximate) nearest neighbour search in vector spaces to limit candidates, filtering to surface only the appropriate subset of results in each case, and evaluation of more complex ML models such as deep neural nets computing over both vectors and semantic features. Combining these needs into a working and scalable solution is a large challenge, as separate components solving for each requirement cannot be composed into a scalable whole for fundamental reasons. This talk will explain the architectural challenges of this problem, show the advantages of solving it on concrete cases, and introduce an open source engine, Vespa.ai, that provides a scalable solution by implementing all the elements in a single distributed execution.
Transcript: English (auto-generated)
Hey, I'm Jon Bratseth, the architect of vespa.ai, and I'll be talking about fast and scalable evaluation of machine learning models, especially over large data sets.
Many of you probably know this, but if not: there's an ongoing revolution happening in search, aka search 2.0. People are moving from text tokens and token-based retrieval, with relevance computed from a typically fairly small set of scalar features by gradient boosted trees, to embedding both the queries and the documents in a vector space and doing retrieval by nearest neighbor search in that vector space, and then computing relevance with some variant of deep neural net, which means using large tensors with maybe thousands or millions of features. In addition to this change happening in search, this technology on the right, the vector embeddings and so on, is also used in another set of areas that typically use the same techniques, such as recommendation, personalization, ad targeting, and so on. So while there are good technologies for each of these pieces that you want to put together to make a solution on the right, it's hard to productionize it, as I'll talk about in a minute. The pieces are typically some kind of search engine, a library for doing approximate nearest neighbor search, and some kind of model server or library for evaluating these deep neural nets. Each of these pieces is good, and if you're submitting to a Kaggle competition or something like that, it's quite easy to put this together and make it work for your submission, and that's it. But when you want to productionize it, you run into a bunch of challenges. Combining these things with good performance is challenging.
In practice, if you're doing a production solution, you typically want to combine nearest neighbor search with other query features, because real solutions typically have filters. For example, if you are searching for news articles, you may want to filter out some publications for some customers, or some languages or countries, or something like that.
And all kinds of solutions have similar business needs. In addition, while this revolution is ongoing and text embeddings and so on are becoming better, typically you get good results only by combining these embedding techniques with traditional text search, BM25 and so on. So you need a kind of hybrid, where you typically do text matching based on both neural nets and traditional text search, combine features of both, and return that; that gives you the best results. So how do you combine these things? You need to do the text search, or the search with filters, in some way in the search engine, then use the nearest neighbor library to search for the nearest neighbors, and then combine the results somehow. If you just do that naively, it will be very expensive, because you're doing two different searches that may give you completely disjoint results which you then need to merge somehow. And also, how do you know that you're asking for enough candidates to actually be able to combine these into a single result? That's another hard problem. If your filters filter out 99% of all the documents, then you probably won't find enough matches in your nearest neighbor search to even return a result. So this is a hard problem, and productionizing it is also challenging, because in a production service you need things like sustained real-time updates, including removal of documents and so on, which libraries typically don't do very well. That doesn't matter for competitions, but it matters a lot for real production systems. You also need reasonably fast restart times, so libraries that only keep this stuff in memory without persisting it won't really work well in practice; some of them do and some of them don't. If you have disjoint systems for this and you update them separately, then you also need to deal with the case where an update succeeds in one and not the other, and they diverge over time, and so on. Lastly, scaling these solutions is also pretty difficult. When you scale to more data or more CPU per query, you need to partition your content and spread it over many nodes. Now you have the problem that you can't really run the model inference you want to do, on a subset of your results, on a different node, because that will quickly saturate your network. For example, if you have a 10 Gbit network and vectors with 500 floats, then you can only ship about a thousand docs per query, and your total capacity will be around 300 queries per second; adding more nodes won't help, because you saturate the network backbone. So model servers won't really help you anymore; you need some kind of lower-level inference library that you integrate on all of the content partitions, which is a lot more work and more challenging.
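(A rough back-of-the-envelope for those numbers, as a hedged illustration:

    $1000\ \text{docs} \times 500\ \text{floats} \times 4\,\text{B} = 2\,\text{MB per query}$
    $10\,\text{Gbit/s} \approx 1.25\,\text{GB/s} \;\Rightarrow\; 1.25\,\text{GB/s} \div 2\,\text{MB} \approx 625\ \text{queries/s}$

That is the theoretical line rate; with protocol overhead and realistic utilization, it lands in the few-hundred-queries-per-second range quoted in the talk, regardless of how many nodes you add.)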
You have the same kind of problem with approximate nearest neighbor integration. So how do you solve all this? Well, one way is to just use Vespa.ai, an open source platform that supports all of these things out of the box. It started as a web search engine a long time ago, so it has all the traditional text search features: text-based relevance with positions, linguistics such as stemming, BM25, the weak AND operator (which is important for scaling text search over tokens), optimized support for gradient boosted decision trees, the text snippeting that you want in text search, and so on. But it also has support for nearest neighbor search and approximate nearest neighbor search in vector spaces, support for adding tensor data to your documents and queries and doing tensor math, and integration with ONNX and TensorFlow to import complex machine learning models directly and run them on the content nodes, so you get the scaling I just talked about for free. And you can combine all these features in a single query and in a single relevance model, so you can get the best of both worlds and experiment with these different features. Lastly, it's built for high-availability production systems, so you can change the hardware, the machine learning models, the data, the logic and so on while you're serving and writing, without interruption. Vespa is built to scale to hundreds of billions of documents and hundreds of thousands of queries per second, and can typically sustain a couple of tens of thousands of writes per node per second, including writes that remove documents, change fields, all of these things. I won't be talking too much about Vespa itself, but let me mention some of its usages so that you can be assured it's a real production system. We use it extensively at the company that employs me, which is Verizon Media, serving over a billion users with Vespa, at about 350,000 queries per second.
Some of the use cases are delivering personalized content to all the users that visit the Yahoo pages and so on, which means doing all the things I just talked about, really: we map the user to a vector space and do a vector search to come up with the best articles, and then run machine-learned models to fine-tune what we're returning. We do that for every user visiting one of these sites, in real time, while they are loading the page. We're doing the same kind of thing on the ad network owned by the company, which is the third largest in the world; there we do similar but even more complex things, because bidding is taken into account, and all of that runs on Vespa and is served in real time. So, a quick overview of Vespa. It's a two-tier system. You have a stateless Java container layer on top that handles the incoming queries and writes and so on, or you can have multiple different container clusters if you like. Below that, you have content clusters that store the actual content and maintain reverse indices for text, indices for vectors and nearest neighbor search, and so on; these do all the distributed query execution, including finding the matches, evaluating machine-learned models, and so on. Because these systems can contain many nodes and many processes, there is also an administration and config cluster that sets up and manages these nodes for you. What the user sees is a more high-level abstraction, which we call an application package; I'll show you an example of one later. The application package basically describes the system that you want to run, and contains any custom Java components, the machine-learned models, and so on. When you work with the application, you just change the application package and deploy it, and the system will safely carry out the change from the currently running system to the system described by the new version of the application package.
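(A hedged sketch of what an application package typically contains; the exact layout varies by Vespa version and the file names here are illustrative:

    my-application/
      services.xml        -- which clusters to run, on which nodes
      deployment.xml      -- where to deploy, for managed/cloud setups
      schemas/*.sd        -- document types, fields, indexing, rank profiles
      models/*            -- machine-learned models (ONNX, XGBoost, LightGBM, ...)
      components/*.jar    -- custom Java components such as searchers

Deploying a new version of this directory is what triggers the safe rollout described above.)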
We typically do this in a CD fashion, where a process pulls from GitHub or whatever you're using, builds the application package, and just submits it, and it will be rolled out safely in production. So, how does approximate nearest neighbor search work in Vespa?
For the user, it's just another query item that you can combine with any others in the query tree. So you can combine text search and nearest neighbor in the same query, and even have multiple nearest neighbor operators, over different fields or whatever, in the same query. The approximate nearest neighbor implementation we use is based on the HNSW algorithm, a graph-based algorithm, which is generally among the fastest. We have our own implementation that delivers on the needs I talked about earlier, like supporting removal of nodes from the graph and so on. It also works efficiently with other query terms, so we can combine it with filters and so on and still do an efficient approximate nearest neighbor search.
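(As a hedged illustration, not shown in the talk: in Vespa's query language, YQL, such a combination might look roughly like this, where the document type `article`, the field `abstract_embedding`, and the query tensor name `q_vec` are hypothetical, and annotation syntax varies a bit by version.

    select * from article
    where {targetHits: 100}nearestNeighbor(abstract_embedding, q_vec)
      and language contains "en"

The point is that the filter and the graph search are evaluated together by the engine, instead of intersecting two separately produced result sets.)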
How does model inference work in Vespa? Vespa has a tensor data model, where you can add tensors to documents, to queries, and to the application package. A tensor is just a multi-dimensional collection of numbers. Each of the dimensions can be sparse or dense, and you can combine sparse and dense dimensions in the same tensor, as in the example here, where you have a two-dimensional tensor with a sparse key and a dense vector, so it's really a map of vectors.
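(A hedged sketch of that notation; the type spec and the values are illustrative only:

    # one sparse dimension "key" and one dense dimension "x": a map of vectors
    tensor<float>(key{}, x[3]) : { a: [1.0, 2.0, 3.0], b: [4.0, 5.0, 6.0] }

Here looking up "a" or "b" in the sparse dimension yields a dense three-element vector.)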
Then you can use tensor math to express machine-learned models or business logic over these tensors. There's a small set of core operations, which we use in our tensor engine for optimization, and then a larger set of higher-level functions, which are the ones you will typically use in your models, and which map down to those primitive functions, join and map and so on. That's quite neat, but not that interesting for users, I guess; you just use the high-level functions. Or, if you don't want to write your expressions by hand, you can deploy TensorFlow or ONNX or XGBoost or LightGBM models directly in Vespa, and Vespa will do the translation automatically when you deploy the model. We have our own tensor execution engine inside Vespa that is optimized for repeated execution of the models over many data items, which is what you typically want in these kinds of systems: you're not just evaluating a single data point per query, but many data points, articles or movies or whatever it is.
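(A hedged example of how a high-level function desugars to the primitives; `q_vec` and `embedding` are hypothetical names:

    # dot product of a query vector and a document vector over dimension x
    sum(query(q_vec) * attribute(embedding), x)
    # is shorthand for the primitive form
    reduce(join(query(q_vec), attribute(embedding), f(a, b)(a * b)), sum, x)

You would normally write only the first form; the engine works with the second.)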
Just to show a quick example of the hybrid model thing I talked about earlier: what we see almost every time we look at the performance we get out of these various models is that you don't get the best model by using either some traditional text features or a neural net model alone; you get the very best performance by combining both. Here we have some traditional text features in one regular rank profile, and another rank profile which is just the distance in the vector space for this embedding, and then we have a hybrid model which is simply the sum of both, and it outperforms the other two. It's a very simple example, because it's from one of our sample applications, but it illustrates the point.
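(A hedged sketch of what such a hybrid rank profile can look like in a Vespa schema; the field names are hypothetical:

    rank-profile hybrid inherits default {
        first-phase {
            # text features plus vector-space closeness, simply added together
            expression: bm25(title) + bm25(abstract) + closeness(field, abstract_embedding)
        }
    }

The weighting here is implicit and equal; in practice you would tune the weights.)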
I'm going to go through another example application in a bit more depth, and I've chosen the application we call CORD-19, at cord19.vespa.ai. When the pandemic broke out, the Allen Institute released a data set of papers about, or at least somehow related to, the coronavirus, initially about 40,000 and now about 138,000 of them. My team took a week or two out to build a tool to help explore this data set, so that researchers could more quickly do science to learn about the new disease, which seemed like an important thing to do at the time. It combines traditional text search features with article similarity search, and also grouping and filtering, which is something you typically want when you do exploration. And here everything is open: the data set, Vespa itself, the Vespa application that implements CORD-19, and the frontend that we built on top. That's the advantage of this example: it's an open data set and everything is open source. The disadvantage is that the data set is very small, just 130,000-odd articles, but Vespa scales to about a million times as much content without really changing anything other than adding more nodes, because you need more resources for that, obviously.
So let me exit the presentation and show you the CORD-19 application and how it works. This is the front page. You can write a query as you would expect, but I'll just click on one of these now; this one, for example. So this is a rather complex query, and you get results as you would expect, and here you see all the matches you get in the various sources, journals, and so on; this is a grouping feature in Vespa. You can also search for similar articles here. What that does is add this "related to" term to the query, which is picked up by a custom Java component in this application that fetches that article from Vespa, fetches the embedding vector of that article, and then adds that embedding vector to the query that is sent down, so that you get the combination of the text features you added in the query and the nearest neighbor search around that article. So you get the combination of both, and that's very useful when you are exploring, because you have an article that somehow represents the topic you're interested in, and you combine that with text search features that express more precise conditions on what you're interested in. You can also open the article itself, which is served from Vespa as well, and there you can do a similar-articles search by the different embedding vectors that are provided, and things like that.
Okay, so how is this implemented? Let's go into it in a bit more detail. This is the GitHub repo for the frontend, and we have a separate repo for the backend, the Vespa application, which is an example of an application package, as I mentioned before. I'll go through what it contains, but first, I have it checked out here, so I'll go to the source, and there you can see the size of the whole thing. It contains a LightGBM model that we have been experimenting with; that's a lot of lines, but of course those are auto-generated by the machine learning. Apart from that, it's just about 600 lines of code implementing this entire CORD-19 application, which will scale to any size you want, and where you can combine vector similarity and text search, snippeting, grouping and aggregation, and all of these things.
Let's look at what it actually contains. The application itself basically consists of these two files. First, a services file, which describes the clusters you want to run. In this case, we run one of those stateless Java container clusters and one content cluster that holds the content. In the container cluster we have some custom Java components, which we'll take a quick look at later, and then we just specify the resources this cluster should run on. This runs on the public Vespa Cloud, so we can just specify the resources we want and deploy it, and the system will get those resources on AWS and run it. In this case, we specify the resources of each node, and then we say we want from two to four nodes, depending on the load we are seeing. For the content cluster, there's a little bit of tuning here, including tuning of the snippets; apart from that, we just reference the single schema that we use for the documents and, again, specify the resources, and that's it. Second, we have a deployment XML, which specifies where these should run; this just runs in a single AWS region. If you are self-hosting Vespa, it's really the same thing, except that instead of specifying resources, you list the actual hosts that you want to run each cluster on; that's the only difference, and then you also don't need the deployment XML file.
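(A hedged, minimal sketch of such a services file; element details vary by Vespa version, and the cluster names and node counts here are illustrative:

    <services version="1.0">
      <container id="default" version="1.0">
        <search/>                    <!-- query processing and custom searchers -->
        <document-api/>              <!-- write endpoint -->
        <nodes count="[2, 4]"/>      <!-- autoscale between two and four nodes -->
      </container>
      <content id="articles" version="1.0">
        <redundancy>2</redundancy>
        <documents>
          <document type="article" mode="index"/>
        </documents>
        <nodes count="2"/>
      </content>
    </services>

One container cluster on top, one content cluster below, as in the two-tier overview earlier.)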
So what else is here? There's the machine-learned LightGBM model, some certificates, and a specification of the stuff you can send in the query, which is these embedding vectors. And then there's the single schema we use, which describes the data we have here: a single type representing the scholarly article itself. It has a bunch of fields, as you would expect, with the title, the content itself, the citations, and whatnot, plus some embedding vectors: one for the abstract, one for the title, and then another embedding vector supplied by the Allen Institute team, called the SPECTER embedding. Those are all one-dimensional dense tensors.
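(A hedged sketch of what such a field can look like in the schema; the field name, the 768-dimension size, and the distance metric are assumptions, not taken from the talk:

    field specter_embedding type tensor<float>(x[768]) {
        indexing: attribute | index    # index enables (approximate) nearest neighbor search
        attribute {
            distance-metric: angular
        }
    }

Declaring the tensor as an attribute keeps it in memory for fast access during ranking.)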
The schema also describes how we can rank, or more generally evaluate machine-learned models over, these documents, using rank profiles. There's a bunch of those here. I won't go into them in detail, but there's one that just uses normal text features, one that uses BM25 together with the normal text features, and then one that uses the LightGBM model. That one could also be combined with other features and expressions and tensors and whatnot, because all of this is just math, as you can see here; you could say "plus the LightGBM model" here, or whatever. We also have some rank profiles used by the searchers listed here, where we just access what we call the raw score of this embedding-vector nearest neighbor search, which returns a distance. And that's really all you need to create an application. In addition, we have some custom Java code here to implement the stuff I mentioned around searching for related articles. We call these components, which can intercept the query and/or the result, searchers. They implement a single method, the search method, which gets the query and returns the result. In this case, it just looks to see if there is one of these "related to" items in the query. If not, it just returns, which means it does nothing and you have a normal search. Otherwise, it translates that "related to" item into the approximate nearest neighbor operator, which happens in a subclass. Let's take a quick look at that as well. Here you can see that it just news up a nearest neighbor item. Here we don't allow approximate nearest neighbor, because the data set is so small; that's the only thing you would change, other than adding resources, if you wanted to scale to a billion documents. You would definitely set allow-approximate to true, but other than that, everything would be the same. The create-item method is used up here, where we combine it with the other items in the query. And here you can also see an example where we create two nearest neighbor items and combine them with OR, to search for nearest neighbors in both the abstract and the title. Things like that you can do freely, and it just works.
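(A hedged Java sketch in the spirit of that component; the class, field, tensor, and property names are hypothetical, not the actual CORD-19 code:

    import com.yahoo.prelude.query.AndItem;
    import com.yahoo.prelude.query.NearestNeighborItem;
    import com.yahoo.prelude.query.OrItem;
    import com.yahoo.search.Query;
    import com.yahoo.search.Result;
    import com.yahoo.search.Searcher;
    import com.yahoo.search.query.QueryTree;
    import com.yahoo.search.searchchain.Execution;

    public class RelatedArticleSearcher extends Searcher {

        @Override
        public Result search(Query query, Execution execution) {
            if (query.properties().getString("related_to") == null)
                return execution.search(query); // no related-to item: normal search

            // Nearest neighbor search in both embedding fields, OR-ed together
            OrItem nearestNeighbors = new OrItem();
            nearestNeighbors.addItem(createNearestNeighborItem("title_embedding"));
            nearestNeighbors.addItem(createNearestNeighborItem("abstract_embedding"));

            // AND the nearest neighbor branch with whatever else is in the query
            QueryTree tree = query.getModel().getQueryTree();
            AndItem root = new AndItem();
            root.addItem(tree.getRoot());
            root.addItem(nearestNeighbors);
            tree.setRoot(root);
            return execution.search(query);
        }

        private NearestNeighborItem createNearestNeighborItem(String field) {
            // The second argument names the query tensor holding the embedding
            NearestNeighborItem item = new NearestNeighborItem(field, "related_vector");
            item.setTargetNumHits(100);
            item.setAllowApproximate(false); // exact search: this data set is small
            return item;
        }
    }

Setting allow-approximate to true is the one change mentioned above for scaling to very large data sets.)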
That's all I really wanted to cover, and that's really all there is in this application. You can easily check it out yourself: go to github.com/vespa-engine/sample-applications and you can find it there, or just go to the CORD-19 application and click the open source link on top.
To wrap up: vector-based retrieval and tensor-based relevance, which is one way to look at these deep neural networks, at least when you just want to do inference, is emerging as an alternative to traditional search, and it's already the state of the art for recommendation, personalization, targeting, and so on. But even though there are good tools for each of the pieces, productionizing these methods on your own is hard: it's difficult to combine them into a production-quality system that has good performance in all cases, sustains that performance as you make changes, can combine with filtering and traditional search and so on, and is also operable and scalable. If you don't want to do all that work, you can just try out vespa.ai, which provides all of it in a single integrated solution, with better performance than you would get by combining these pieces on your own. You can find Vespa at vespa.ai. So that's all; then we can switch to live and take questions.
So thanks, Jon, for the great presentation. CORD-19 search seems super useful, so all the best with that. Guys, we still have a couple of minutes; if you have any questions, please ask them on the Slack channel. I guess, Jon, you already provided the link to the GitHub site. Maybe while we are waiting: do you have any further insight into how you plan to evolve Vespa? What's the roadmap?

So where we're spending most of our effort right now is really on the cloud service for all the applications that are using it. In my company we provide a cloud service, and we just very recently started providing that cloud service to external customers as well. So we are mostly focusing on making that more broadly available, and adding more features to make it cheaper to run, and things like that.

So we do seem to have one question. The question is from Edward. It's basically... oh, sorry, okay, there's one more before that, from Maya. She would like to understand how we can build vector embeddings for articles. I guess it's more like, how can you add them?

Yeah, I think maybe the question is how to come up with the vectors. That's the machine learning part, really, and that's somebody else's problem as far as we're concerned. We just make it fast to retrieve them and compute with them once you have created the vectors. But how you create the embeddings, that's the machine learning part, which typically happens outside this.

We have another question; it's from Edward, and he's asking whether the Vespa architecture allows plugging in new artificial neural network algorithms. So basically, how extensible is the architecture?

So the tensor language we have allows you to express pretty much all the models I've seen. Recently, when people came up with BERT-type models, transformer models with lots of matrices and so on, we had to extend the tensor math language a bit, but apart from that, it should handle all kinds of models you would come up with, because the core operations I mentioned, like map and reduce and so on, are very general. So you can pretty much implement all kinds of computations over tensors on top of them.
And I think there was one more question as well. Yeah, actually, I think Edward has a follow-up question. I'm not sure I understand all the... wait, I see it again; but you see it as well, maybe you can take it. Yeah: can we plug other approximate nearest neighbor search algorithms into Vespa?

No, you cannot, not without lots and lots of work. That's basically what we have been doing for about six months now: plugging one algorithm for this into Vespa, which means implementing it in C++ so that it works with the rest of the engine and supports all the operations we need to support, at high throughput, including removal of documents and so on. Most of these algorithms don't handle this very well, so I don't think it would work well in production to just plug something in; you need to implement it from scratch with all these requirements taken into account, if it's for more than experimenting. But I think we have chosen the right algorithm for this now, so I don't think there's a great need to plug in something else, to be honest.

Right. And there's another question by Maya, also related to embeddings and text retrieval features.
I think that's partly answered already, but maybe you want to comment a bit more. Yeah, do you include them in a single model, she's asking.

Yeah. So, combining embeddings with text retrieval features, there are two parts to it. One is the retrieval: you want to retrieve both the nearest neighbors of some vector, and also the documents that are not near neighbors but match the same tokens, so you want to retrieve a mix of both. That's logically easy, but difficult to do efficiently, because to do it efficiently you really want to evaluate both things in parallel, taking filters into account and so on. That's another reason why you need to integrate this deep into the engine to make it efficient. But we have done that, so when you're using it, you just create the nearest neighbor item, or several items, in the query tree, and you can combine it with AND and OR and so on with text items. The other part is relevance, and as you saw, in one model that we actually got pretty good results from, we just added together the closeness in vector space with some simple text features, like BM25 or whatever; just adding them together, perhaps weighted somehow, is fine.

So do you have this benchmarking result somewhere?

Yeah, we do. It's part of a sample application that we provide, so you can run the whole thing yourself. Actually, if you look in the parent directory of the thing I shared earlier, you'll find all the sample applications, with benchmarks as well.

Super, great. I don't see any other questions, so thanks again, Jon. Everyone, we can of course continue the discussion in the breakout channels, so basically the vbuzz2. And yeah, thanks again for the presentation, and have a nice evening.