#bbuzz: Fast scalable evaluation of ML models over large data sets using open source
Formal Metadata

Title: #bbuzz: Fast scalable evaluation of ML models over large data sets using open source
Series: Berlin Buzzwords 2020, part 36 of 48
Number of Parts: 48
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68793 (DOI)
Language: English
Transcript: English (auto-generated)
00:11
Hey, I'm Jon Bratseth, the architect of vespa.ai, and I'll be talking about fast and scalable evaluation of machine learning models, especially over large datasets.
00:25
Many of you probably know, but if not: there's an ongoing revolution happening in search, aka search 2.0, where people are moving from text tokens and token-based retrieval, plus relevance using a typically fairly small set of scalar features and gradient-boosted
00:49
trees for machine-learned relevance, to embedding both the queries and the documents in a vector space and doing retrieval by nearest neighbor in this vector space.
01:03
And then doing relevance typically using some variant of deep neural nets, which means using large tensors with maybe thousands or millions of features. In addition to this change happening in search, this technology on the right, the
01:25
vector embeddings and so on, is also used in another set of areas built on typically the same technologies, such as recommendation, personalization, ad targeting, and so on. So while there are good technologies for each of these pieces that
01:46
you want to put together to make a solution like the one on the right, it's hard to productionize, as I'll talk about in a minute. So the pieces are typically some kind of search engine, a library for doing approximate nearest neighbor search, and some kind of model
02:05
server or library for evaluating these deep neural nets. So each of these pieces is good, and if you're submitting to a Kaggle competition or something like that, it's
02:21
quite easy to put this together and make it work for your submission, and that's it. But when you want to productionize it, you run into a bunch of challenges. Combining these things with good performance is challenging.
02:40
In practice, if you're doing a production solution, you typically want to combine nearest neighbor search with other query features, because real solutions typically have filters. For example, if you are searching for news articles, you may want to filter out some publications for some customers, or some languages or countries, or something like that.
03:06
And all kinds of solutions have similar business needs. In addition, while this revolution is ongoing and text embeddings and so on are becoming better, typically you get good results only by
03:24
combining these embedding techniques with traditional text search, BM25 and so on. So you need a kind of hybrid, where you typically do text matching based on both
03:42
neural nets and traditional text search, combine features of both, and return that; that gives you the best results. So how do you combine these things? You need to do the text search, or the filters, in some way in the
04:03
search engine, then use a nearest neighbor library to search for the nearest neighbors, and then combine the results somehow. If you just do that naively, it will be very expensive, because you're doing two different searches that may give you completely disjoint results, and then you need to combine them
04:22
somehow. And also, how do you know that you're asking for enough to actually be able to combine these into a single result set? That's another hard problem. If your filters filter out 99% of all the documents, then you probably won't find enough
04:44
matches in your nearest neighbor search to even return a result. So this is a hard problem and productionizing it is also challenging because in a production service, you need things like sustained real-time updates, including
05:03
removal of documents and so on, which libraries typically don't do very well. That doesn't matter for competitions, but it matters a lot for real production systems. And you also need reasonably fast restart times, so libraries that only do this stuff in memory without persisting won't
05:26
really work well in practice. Some of them do and some of them don't. If you have disjoint systems for this and you update them separately, then you also
05:40
need to deal with the case where the update succeeds in one and not the other and they diverge over time and so on. Lastly, scaling these solutions is also pretty difficult. What this means is when you scale to more data or more CPU per query, you need to
06:03
partition your content and spread it over many nodes. Now you have the problem that you can't really run the model inference that you want to do on a subset of your results on a different node because that will quickly
06:22
saturate your network. For example, if you have a 10 Gbit network, then you can only do about a thousand docs per query if you have vectors with 500 floats. If you do that, your total capacity will be about 300 queries per second, and adding more
06:44
nodes won't help, because you saturate the network backbone. So model servers won't really help you here; you need some kind of lower-level inference library that you integrate on all of the content partitions, which is a lot more work and more challenging.
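To make the arithmetic behind those numbers concrete, here is a rough back-of-envelope version of it (assuming 8 bytes per value; with 4-byte floats the ceiling roughly doubles, but the conclusion is the same):

```
1,000 docs/query × 500 values/doc × 8 bytes/value ≈ 4 MB shipped per query
10 Gbit/s backbone                                ≈ 1.25 GB/s
1.25 GB/s ÷ 4 MB/query                            ≈ ~300 queries/s in total,
                                                    no matter how many nodes you add
```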
07:05
You have the same kind of problem with approximate nearest neighbor integration. So how do you solve all this? Well, one way to do it is to just use Vespa (vespa.ai), which is an open source platform
07:20
that supports all of these things out of the box. It started as a web search engine a long time ago, so it has all the traditional text search features: text-based relevance with positions, linguistics, stemming, and so on; BM25; the weakAnd operator, which is important for scaling text search over tokens;
07:49
optimized support for gradient-boosted decision trees; text snippeting that you want to do in text search; and so on. But it also has support for nearest neighbor search and approximate
08:02
nearest neighbor search in vector spaces, support for adding tensor data to your documents and queries and doing tensor math, and integration with ONNX and TensorFlow to import complex machine learning models directly and run them on the content
08:21
nodes, so you get the scaling I just talked about for free. And you can combine all these features in a single query and in a single relevance model, so you can get the best of both worlds and experiment with these different features. And lastly, it's built for high-availability production systems.
08:42
So you can change the hardware, the machine learning models, the data and logic and so on while you're serving and writing, without interruption. Vespa is built to scale to hundreds of billions of
09:01
documents, hundreds of thousands of queries per second, and can typically do a couple of tens of thousands of writes per node per second sustained. And that includes writes that remove documents, change fields, all of these things. So I won't be talking too much about Vespa itself, but mention some of its usages
09:25
so that you can be assured it's a real production system. It's used extensively at the company that employs me, which is Verizon Media. We're serving over a billion users with Vespa, at about 350,000 queries per second.
09:48
And some of the use cases are delivering personalized content to all the users that visit the Yahoo pages and so on, which means evaluating a bunch of models while
10:06
doing all the things I just talked about, really, where we map the user to a vector space and do a vector search to come up with the best articles, and then run machine-learned models to fine-tune what we're returning, and so on.
10:21
And we do that for every user that visits one of these sites, in real time, when they load the page. And we're doing the same kind of thing on the ad network owned by the company, which is the third largest in the world, where we're doing similar things, but even more complex
10:41
because you take bidding into account and all of that runs on Vespa and is serving in real time. So just a quick overview of Vespa. It's a two-tier system. You have a stateless Java container on top that handles the incoming queries and writes
11:00
and so on, or you can have multiple different container clusters if you like. Below that, you have content clusters that store the actual content, maintain reverse indices for text, indices for vector nearest neighbor search, and so on, and which do
11:23
all the distributed query execution, including finding the matches, evaluating machine-learned models, and so on. Because these systems can contain many nodes, many processes and so on, we also have an administration and config cluster that sets up and manages these nodes for you.
11:44
What the user sees is a more high-level abstraction, which we call an application package, and which I'll show you an example of later. The application package basically describes the system that you want to run, and contains any Java components that you want to run, the machine-learned models, and so on.
12:05
And when you work with the application, you just change the application package and deploy it, and the system will safely carry out the change from the currently running system to the system described by the new version of the application package.
12:20
So we typically do this in a CD fashion, where we have a process that pulls from GitHub or whatever you're using, builds the application package, and just submits it, and it will be rolled out safely in production.
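On a self-hosted installation, that deployment step is a prepare/activate pair; a minimal sketch, assuming the package is checked out in my-app/ (the path is illustrative):

```
# Validate the new application package and distribute it to the nodes
vespa-deploy prepare my-app/
# Atomically switch the running system over to the new package
vespa-deploy activate
```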
12:44
So how does approximate nearest neighbor search work in Vespa? For the user, it's just another query item that you can combine with any others in the query tree. So you can combine text search and nearest neighbor in the same query, and even have multiple nearest neighbor operators over different fields in the same query.
13:04
The approximate nearest neighbor implementation we use is based on the HNSW algorithm, a graph-based algorithm, which is generally among the fastest. We have our own implementation that delivers
13:22
on the needs I talked about earlier, like supporting removal of nodes from the graph and so on. And it also works efficiently with other query terms, so we can combine it with filters and so on and still do an efficient approximate nearest neighbor search.
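To illustrate what that combination looks like (this example is mine, not from the talk; the field names and the targetHits value are made up, and the annotation syntax shown is the current one), a YQL query mixing nearest neighbor search, a filter, and regular text matching might be:

```
select * from doc where
    {targetHits: 100}nearestNeighbor(abstract_embedding, query_embedding)
    and language contains "en"
    and userQuery()
```

The query-side tensor named query_embedding is passed along with the request, and the whole thing is evaluated as one query, filters included.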
13:46
How does model inference work in Vespa? So Vespa has a tensor data model where you can add tensors to both documents and queries and the application package.
14:04
So a tensor is just a multi-dimensional collection of numbers. Each of the dimensions can be sparse or dense, and you can combine sparse and dense dimensions in the same tensor, as I show an example here where you have a two-dimensional
14:24
tensor with a sparse key and a dense vector, so it's really a map of vectors.
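Written in Vespa's tensor literal form, a tensor like that could look as follows (a sketch; the type and values are illustrative). The first dimension is sparse (mapped, string keys), the second dense (indexed, fixed size):

```
tensor<float>(key{},x[3]):{ a:[1.0, 2.0, 3.0], b:[4.0, 5.0, 6.0] }
```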
14:42
Then you can use tensor math to express machine learning models or business logic over these tensors. There's a small set of core operations, which we use in our tensor engine for optimization, and then a larger set of higher-level functions, which are the ones you will typically use in your models, but which map to those primitive functions that we
15:03
have, join and map and so on. That's quite neat, but not that interesting for users, I guess; you just use the high-level functions.
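For example, a hand-written ranking expression computing the dot product between a query-supplied tensor and a document tensor could look like this (a sketch; the tensor names are made up). The multiplication is a join and the sum is a reduce over the primitive operations:

```
sum(query(user_profile) * attribute(document_profile))
```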
15:24
Or if you don't want to write your expressions by hand, you can just deploy TensorFlow, ONNX, XGBoost, or LightGBM models directly in Vespa, and Vespa will do the translation automatically when you deploy the model. We have our own tensor execution engine inside Vespa that is optimized for repeated execution of the models over many data items, which is what you typically want to do in
15:42
these kinds of systems, right? You're not just evaluating a single data point per query, but many data points, articles or movies or whatever it is. And just to show a quick example of the hybrid model thing I talked about earlier: what we
16:04
see almost every time when we look at the performance we get out of these various models is that you don't get the best model by using either some traditional text features or by
16:23
using a neural net model alone; you get the very best performance by combining both. Here we have some traditional text features in one regular rank profile, and another
16:40
rank profile which is just the distance in the vector space for this embedding. And then we have a hybrid model, which is just the sum of both, and that outperforms the other two. It's a very simple example, because it's from one of our sample applications, but it illustrates the point.
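In schema terms, the three profiles look roughly like this (a sketch modeled on what that sample application does; the field and profile names are illustrative):

```
rank-profile text {
    first-phase {
        expression: bm25(title) + bm25(abstract)
    }
}
rank-profile semantic {
    first-phase {
        expression: closeness(field, abstract_embedding)
    }
}
rank-profile hybrid {
    first-phase {
        expression: bm25(title) + bm25(abstract) + closeness(field, abstract_embedding)
    }
}
```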
17:09
So I'm going to go through another example application in a bit more depth, and I've chosen the application we call CORD-19 Search, built with vespa.ai. When the pandemic broke out, the Allen Institute released a data set of initially 40,000,
17:28
now about 138,000, papers about the coronavirus, or at least somehow related to the coronavirus. And my team took a week or two out to build a tool to help explore
17:46
this data set so that researchers could more quickly do science to learn about the new disease, which seemed like an important thing to do at the time. So this combined traditional
18:00
text search features with article similarity search, and also grouping and filtering, which is something you typically want when you're exploring. And here, everything is open: the data set, Vespa itself, but also the
18:20
Vespa application that implements CORD-19 Search, as well as the frontend that we built on top. So that's the advantage of this: it's an open data set and everything is open source. The disadvantage is that the data set is very small, just 130,000 articles, but
18:45
Vespa scales to about a million times as much content without really changing anything other than adding more nodes, because you obviously need more resources for that. So let me
19:03
exit the presentation and show you the CORD-19 application and how it works. So this is the front page. Here you can write a query as you would expect, but I'll just click on one of these now.
19:21
This one, for example. So this is a rather complex query, and you get results as you would expect. And here you see all the matches that you get in the various sources, journals, and so on; this is the grouping feature in Vespa. And then you can also do a search
19:44
for similar articles here. What you're doing then is adding this related-to term to the query, which is picked up by a custom Java component in this application that fetches that article from Vespa, fetches the embedding vector of that article, and then
20:07
adds that embedding vector to the query, which is then sent down to get a combination of the text features that you added in the query here and the nearest neighbor search
20:21
over that article. So you get the combination of both. And that's very useful when you are exploring, because you have an article that somehow represents the topic you're interested in, and then you combine that with text search features that more precisely express
20:43
conditions on what you're interested in. You can also open the article itself, which is served from Vespa as well. And here you can also do a similar-article search by the different embedding vectors that are provided, and things like that.
21:05
Okay, so how is this implemented? Let's go into it in a bit more detail. So this is the GitHub repo for the frontend, and we have a separate repo for the backend, which is,
21:25
oh, sorry, wrong link, for the Vespa application, which is an example of an application package, which I mentioned before. So this is the repo for the Vespa application.
21:42
I'll go through what it contains, but first, I have it checked out here. So I checked out this repo and go to source, and there you can see the size of the whole thing. It contains a LightGBM model that we have been experimenting with. So there's,
22:05
well, that's a lot of lines of code, but of course it's auto-generated by the machine learning. Apart from that, it's just about 600 lines of code implementing this entire CORD-19
22:20
application, which will scale to any size you want. And you can combine vector similarity and text search, snippeting, grouping, and aggregation, and all of these things. So let's look at what it actually contains. So you have the application itself,
22:45
which basically contains these two files, a services file, which describes the clusters that you want to run. In this case, we run one of these stateless Java container clusters and one content cluster that holds the content. In the container cluster, we have
23:07
some custom Java components, which we'll take a quick look at later. And then we just specify the resources that this cluster should run on. This runs on the public Vespa Cloud, so we can just specify the resources we want and deploy it,
23:24
and the system will get those resources on AWS and run it. In this case, we just specify the resources of each node, and then we say we want from two to four nodes, depending on the load we are seeing. For the content cluster, there's a little bit of tuning here, and also tuning of
23:48
the snippets. And apart from that, we just reference the single schema that we use for the documents, again specifying the resources, and that's it. And then we have a deployment
24:01
XML, which specifies where these should run; this just runs in a single AWS region. If you are self-hosting Vespa, it's really the same thing, but instead of saying this, you just list the actual hosts that you want to run for each cluster here.
24:21
That's the only difference, and then you don't need the deployment XML file.
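A minimal sketch of what such a services.xml can look like (the ids, node counts, and resources here are illustrative, not the actual CORD-19 values):

```xml
<services version="1.0">
    <!-- Stateless Java container cluster: query/write handling and custom searchers -->
    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <nodes count="[2,4]">
            <resources vcpu="4" memory="8Gb" disk="50Gb"/>
        </nodes>
    </container>
    <!-- Content cluster holding the documents and indices -->
    <content id="content" version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="doc" mode="index"/>
        </documents>
        <nodes count="2"/>
    </content>
</services>
```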
24:44
So what else is here? There's the machine-learned LightGBM model, some certificates, and a specification of the stuff you can send in the query, which is these embedding vectors. And then there's the single schema that we use, which describes the data we have here, which is a single type representing the scholarly article itself. So it has a bunch of fields,
25:04
as you would expect, with the title, the content itself, the citations, and whatnot. And in addition, some embedding vectors. So we have an embedding vector for
25:20
the abstract and for the title, and then we have another embedding vector supplied by the Allen Institute team, called the SPECTER embedding. Those are all one-dimensional dense tensors.
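The schema fields look roughly like this (a sketch; the tensor dimension size and names are illustrative, and the attribute settings shown are what you would add to support nearest neighbor search with an angular distance metric):

```
schema doc {
    document doc {
        field title type string {
            indexing: index | summary
        }
        field abstract type string {
            indexing: index | summary
        }
        field specter_embedding type tensor<float>(x[768]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
        }
    }
}
```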
25:50
The schema also describes how we can rank documents, or more generally evaluate machine-learned models, in rank profiles. There's a bunch of those here. I won't go into them in detail, but there's one that just does normal text features, and one that uses BM25, also with the normal text features.
26:05
And then we have one that uses the LightGBM model. And this could also be combined with other features and expressions and tensors and whatnot, because all of this is just
26:23
math, as you can see here; you could say plus the LightGBM model here, or whatever.
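So a profile mixing the LightGBM model with text features could look like this (a sketch; the profile name and model file name are illustrative):

```
rank-profile hybrid-gbdt {
    first-phase {
        expression: bm25(title) + lightgbm("lightgbm_model.json")
    }
}
```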
26:41
We also have some rank profiles that are used by the searchers, where we just access what we call the raw score of this embedding-vector nearest neighbor search, which will return a distance. And that's really all you need to create an application. In addition, we have some custom Java code here to implement the stuff I mentioned around searching for
27:10
related articles. We call these components that can intercept the query and/or the results a searcher. They just implement a single method, the search method, which gets the query
27:25
and returns the results. In this case, it's just looking to see if there is one of these related-to items in the query. If not, it just returns, which means it does nothing and you have a normal search. Otherwise, it translates that related-to item into the approximate nearest neighbor
27:46
operator, which is done in a subclass. Let's take a quick look at that as well. Here you can see it just creates a new nearest neighbor item. Here we don't
28:03
allow approximate nearest neighbor search, because the data set is so small. That's the only thing you would change, other than adding resources, if you wanted to scale to a billion documents: you would definitely set allow-approximate to true. But other than that, everything
28:20
would be the same. The createItem method is used up here, where we combine it with the other items in the query. And here you can also see an example where we create two
28:43
nearest neighbor items and combine them with OR, to search for nearest neighbors in both the abstract and the title. Things like that you can do freely, and it just works.
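A condensed sketch of such a searcher (the class, property, and field names here are made up, and fetching the related article's embedding from Vespa is omitted; the real code is in the repo):

```java
import com.yahoo.prelude.query.NearestNeighborItem;
import com.yahoo.prelude.query.OrItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class RelatedArticleSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // No related-to item in the query: do nothing, just run the normal search
        String relatedTo = query.properties().getString("related-to");
        if (relatedTo == null) return execution.search(query);

        // Search both embedding fields and OR the two nearest neighbor items together
        OrItem nearestNeighbors = new OrItem();
        nearestNeighbors.addItem(createItem("abstract_embedding"));
        nearestNeighbors.addItem(createItem("title_embedding"));
        query.getModel().getQueryTree().and(nearestNeighbors);
        return execution.search(query);
    }

    private NearestNeighborItem createItem(String field) {
        // The related article's embedding is sent along as the query tensor "query_embedding"
        NearestNeighborItem item = new NearestNeighborItem(field, "query_embedding");
        item.setTargetNumHits(100);
        item.setAllowApproximate(false); // the data set is small, so use exact search
        return item;
    }
}
```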
29:02
That's all I really wanted to cover, and that's really all there is in this application. You can easily check this out yourself: if you go to github.com/vespa-engine/sample-apps, you can find it there. Or just go to the CORD-19 site and click the open source link on top.
29:28
To wrap up, vector-based retrieval and tensor-based relevance, which is one way to look
29:41
at these deep neural networks, at least when you just want to do inference, is emerging as an alternative to traditional search. And it's already the state of the art for recommendation, personalization, targeting, and so on. But productionizing these methods
30:03
on their own, even though there are good tools for each of the pieces, it's hard to combine them into a production-quality system that has good performance in all cases, sustains good performance as you make changes,
30:23
and that you can combine with filtering and traditional search and so on, and which is also operable and scalable. So if you don't want to do all that work, you can just try out vespa.ai, which provides all of it in a single integrated solution with better performance
30:43
than you would get, for sure, by combining these pieces on your own. And you can find Vespa at vespa.ai. So that's all. Then we can switch to live and take questions.
31:02
So thanks, Jon, for the great presentation. CORD-19 Search seems super useful, so all the best with that. Guys, we still have a couple of minutes. If you have any questions, please ask them on the Slack channel. I guess, Jon, you already provided the link to the GitHub
31:21
site. Maybe while we are waiting: do you have any thoughts on how you plan to evolve Vespa? What's the roadmap? So, where we're spending most of our effort right now is really on the cloud service, for all the applications that are using it.
31:48
In my company, we provide a cloud service and we just very recently started providing that cloud service to external customers as well. So we are mostly focusing on
32:01
making that more broadly available, adding more features for making it cheaper to run and things like that. So we do seem to have one question. The question is from Edward.
32:24
It's basically... Oh, sorry. Okay. So there's one more before that, from Maya. She would like to understand how we can build vector embeddings for articles. I guess it's more like, how can you add them? Yeah, I think maybe the question is how to
32:42
come up with the vectors. So that's the machine learning part, really. And that's somebody else's problem as far as we're concerned. We just make it fast to retrieve them and compute with them once you have created the vectors. But how you create the embeddings,
33:00
that's the machine learning part, which typically happens outside this. We have another question, basically from Edward, and he's asking: does the Vespa architecture allow plugging in
33:20
new artificial neural network algorithms? So basically, how extensible is the architecture? So the tensor language we have allows you to express pretty much all the models I've seen.
33:42
Recently, when people came up with BERT-type models, transformer models with lots of matrices and so on, we had to extend the tensor math language a bit. But apart from that, it should handle all kinds of models you would come up with, using what we have there already,
34:05
because the core operations that I mentioned like map and reduce and so on are very general. So you can pretty much implement all kinds of computations over tensors on top of them.
34:21
And I think there was one more question as well. Yeah. So actually I think Edward has a follow-up question. I'm not sure I understand all the... Wait, I see it again. Maybe you see it as well. Yeah: whether we can plug in other approximate nearest neighbor search algorithms into Vespa.
34:46
No, you cannot, not without lots and lots of work. It's basically what we have been doing for about six months now: plugging in one algorithm for this into Vespa, which means implementing it in C++ so that it works with the rest of the engine and supports all the
35:06
operations that we need to support, with high throughput, including removal of documents and so on. Most of these algorithms don't handle this very well. So I don't think it would work well for production to plug something in; you need to implement it from scratch with all
35:23
these requirements taken into account if it's more than for experimenting. But I think we have chosen the right algorithm for this now. So I don't think there's a great need to plug in something else, to be honest. Right. And there's another question by Maya also related to embeddings and text retrieval features.
35:45
I think you partly answered that; maybe you just want to comment a bit more. Yeah, do you include them in a single model, she's asking. Yeah. So combining embeddings with text retrieval features, there are two parts to it.
36:03
One is the retrieval: you want to retrieve the nearest neighbors to some vector, but you also want to retrieve the documents that are not near neighbors but match the same tokens. So you want to retrieve a mix of both. And that's sort of logically easy, but difficult to do efficiently, because when you want to do it efficiently,
36:28
you want to evaluate both things in parallel, really, taking filters into account and so on. So that's another reason why you need to integrate this deep into the engine to make it efficient. But we have done that. So when you're using it, you just create the nearest
36:43
neighbor item, or several items, in the query tree, and you can just combine it with AND and OR and so on with text items. And the other part is relevance. As you saw, in one model that we actually got pretty good results from, we just added
37:01
together the closeness in vector space with some simple text features, features like BM25 or whatever, and just added them together, probably weighted somehow. That works fine. So do you have these benchmarking results somewhere?
37:22
Yeah, we do. Yeah, it's part of a sample application that we provide so you can run the whole thing yourself. Actually, if you look in the parent directory of the thing I shared earlier, you'll find all the sample applications with benchmarks as well.
37:43
Super, great. I don't see any other questions, so thanks again, Jon. And everyone, we can of course continue the discussion in the breakout channels, so basically the bbuzz2 one. And yeah, thanks again for the presentation, and have a nice evening.