Comparing vector implementations in generic databases

Plain Schwarz

Golubenco, Tudor

Formal Metadata

Title

Comparing vector implementations in generic databases

Title of Series

Berlin Buzzwords 2024

Number of Parts

Author

Golubenco, Tudor

License

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/70263 (DOI)

Publisher

Plain Schwarz

Release Date

2024

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

We're going to look in particular at (at least) two vector search implementations in popular tools that a lot of people already use:
* pgvector for PostgreSQL
* the Lucene vector implementation for Elasticsearch and OpenSearch
We recently had to evaluate the two for a particular use case, and the comparison is quite interesting. There are pros to each, for example:
* pgvector means less infrastructure and cost, and is always strongly consistent
* Elasticsearch/OpenSearch can do automatic sharding
* in PostgreSQL it is easier to shard by tenant, using schemas or partitioned indexes
* Lucene can combine vector search with full-text search
We'll go through the above and also discuss when going for a dedicated vector DB makes sense.