#bbuzz: Fast scalable evaluation of ML models over large data sets using open source

Plain Schwarz

Bratseth, Jon

Formal Metadata

Title

#bbuzz: Fast scalable evaluation of ML models over large data sets using open source

Title of Series

Berlin Buzzwords 2020

Author

Bratseth, Jon

Contributors

Biswas, Debmalya (Moderation)

License

CC Attribution 3.0 Unported:
You are free to use, adapt, copy, distribute, and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/68793 (DOI)

Publisher

Plain Schwarz

Release Date

2020

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Modern solutions to search and recommendation require evaluating machine-learned models over large data sets with low latency. Producing the best results typically requires combining three things: fast (approximate) nearest neighbour search in vector spaces to limit the candidates, filtering to surface only the appropriate subset of results in each case, and evaluation of more complex ML models, such as deep neural nets, that compute over both vectors and semantic features. Combining these needs into a working, scalable solution is a major challenge, because separate components that each solve one requirement cannot, for fundamental reasons, be composed into a scalable whole. This talk explains the architectural challenges of this problem, shows the advantages of solving it on concrete cases, and introduces an open source engine, Vespa.ai, that provides a scalable solution by implementing all of these elements in a single distributed execution.
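The combined pipeline the abstract describes can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not Vespa's API or implementation: it uses exact (brute-force) dot-product search instead of approximate nearest neighbour search, a category equality check as the filter, and a made-up `second_phase_score` function standing in for a more complex model such as a neural net. All names (`Doc`, `search_partition`, `search`, `rerank_count`) are assumptions for illustration. Each partition applies the filter, the nearest neighbour selection, and the model scoring in one pass over its own data, and only the top hits from each partition are merged.

```python
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    vector: list[float]
    category: str      # hypothetical attribute used for filtering
    popularity: float  # hypothetical "semantic" feature used by the model


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def second_phase_score(query_vec: list[float], doc: Doc) -> float:
    # Hypothetical stand-in for a complex model: combines vector
    # similarity with a non-vector feature of the document.
    return dot(query_vec, doc.vector) + 0.1 * math.log1p(doc.popularity)


def search_partition(docs, query_vec, allowed_category, k, rerank_count):
    # Filtering and nearest neighbour selection happen in the same pass,
    # so the filter cannot empty an already truncated candidate list.
    candidates = heapq.nlargest(
        rerank_count,
        (d for d in docs if d.category == allowed_category),
        key=lambda d: dot(query_vec, d.vector),
    )
    # The more expensive model runs only on the limited candidate set.
    scored = [(second_phase_score(query_vec, d), d.doc_id) for d in candidates]
    return heapq.nlargest(k, scored)


def search(partitions, query_vec, allowed_category, k=2, rerank_count=10):
    # Each partition is evaluated where its data lives; only the
    # per-partition top-k hits are merged into the global result.
    hits = []
    for docs in partitions:
        hits.extend(search_partition(docs, query_vec, allowed_category, k, rerank_count))
    return heapq.nlargest(k, hits)


partitions = [
    [Doc("a", [1.0, 0.0], "news", 10), Doc("b", [0.9, 0.1], "sports", 100)],
    [Doc("c", [0.0, 1.0], "news", 1000), Doc("d", [0.8, 0.2], "news", 0)],
]
result = search(partitions, [1.0, 0.0], "news", k=2)
print([doc_id for _, doc_id in result])  # → ['a', 'd']
```

Running nearest neighbour search first and filtering afterwards in a separate component could leave too few matching candidates when the filter is selective. Doing both, plus model scoring, in one execution per partition is the kind of composition the talk argues for.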