
Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search

Formal Metadata

Title
Jina Embeddings V2: From Raw Data to Bilingual Hybrid Search
Title of Series
Number of Parts
64
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
In this talk, we explore the design, training, and application of the bilingual Jina Embeddings V2, a state-of-the-art German-English embedding model developed here in Berlin. Because traditional exact-match and term-based retrieval methods have inherent shortcomings, we apply this bilingual model in a hybrid search setup: by combining vector-based search with conventional BM25 search, we harness the strengths of both approaches and markedly improve search results. Participants gain insight into how embedding models are trained, how data is sourced and prepared for them, and how our open-source German-English model can be integrated into a search pipeline to improve results. The talk is aimed at anyone in the search field who is keen on the latest search and retrieval technologies, and it offers practical knowledge on improving search systems with embeddings.
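The abstract does not say how the BM25 and vector results are combined, so the following is only a minimal sketch under stated assumptions. It fuses the two rankings with reciprocal rank fusion (RRF, with the commonly used constant k=60), which is one standard choice and not necessarily the method presented in the talk. The BM25 scorer is a small pure-Python implementation, and the 3-dimensional vectors are hypothetical placeholders standing in for embeddings that, in a real pipeline, would come from the bilingual model.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real systems use proper analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with BM25 (Lucene-style idf)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency of each term
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, docs, doc_vecs, k=60):
    lexical = bm25_scores(query, docs)
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    # BM25 only "retrieves" documents that share at least one query term.
    lex_rank = [i for i in sorted(range(len(docs)), key=lambda i: -lexical[i])
                if lexical[i] > 0]
    sem_rank = sorted(range(len(docs)), key=lambda i: -semantic[i])
    return rrf([lex_rank, sem_rank], k=k)


# Hypothetical toy corpus; the vectors are hand-made stand-ins for embeddings.
docs = [
    "Berlin ist die Hauptstadt von Deutschland",
    "Berlin is the capital of Germany",
    "BM25 is a ranking function for term-based retrieval",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.95, 0.05, 0.0], [0.0, 0.1, 0.9]]

if __name__ == "__main__":
    print(hybrid_search("capital of Germany", [1.0, 0.0, 0.0], docs, doc_vecs))
```

Running it prints `[1, 0, 2]`: the English document matches both lexically and semantically, while the German document shares no terms with the English query yet still ranks second through the vector side. This cross-lingual recall is the kind of gain that term-based retrieval alone cannot provide.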