
Boosting Ranking Performance with Minimal Supervision

Formal Metadata

Title
Boosting Ranking Performance with Minimal Supervision
Series Title
Number of Parts
60
Author
License
CC Attribution 3.0 Unported:
You may use, modify, and reproduce the work or its contents in altered or unaltered form, distribute it, and make it publicly available for any legal purpose, provided that you credit the author/rights holder in the manner they specify.
Identifiers
Publisher
Publication Year
2023
Language
English

Content Metadata

Subject Area
Genre
Abstract
Transformer language models are highly effective text rankers; however, training Transformer-based neural ranking models requires vast amounts of labeled training data, which is costly and time-consuming to obtain. What if you could teach a ranking model without behavioral click data or human annotations? Enter generative large language models (LLMs) such as GPT-3. This talk showcases a novel approach to generating labeled data with minimal human supervision. First, using just three human-labeled query-document examples, an open-source LLM generates synthetic questions for every document in the index. The synthetic data then trains a much smaller, cost-efficient Transformer ranking model, which outperforms a strong BM25 baseline by 10 nDCG@10 points on a popular relevance dataset. This method saves costly annotation effort, enables faster adaptation of search ranking to new domains, and allows organizations to revolutionize their search capabilities without breaking the bank.
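
The abstract describes a two-stage pipeline: few-shot synthetic query generation with an open-source LLM, followed by training a compact Transformer ranker on the generated pairs. Below is a minimal, hypothetical Python sketch of that idea; the model names, prompt wording, the load_corpus helper, and the training details are illustrative assumptions, not the speaker's exact setup.

# Hypothetical sketch of the two-stage pipeline described above; model names,
# prompt text, and hyperparameters are illustrative assumptions.
import random
from transformers import pipeline
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Stage 1: few-shot synthetic query generation with an open-source LLM.
generator = pipeline("text-generation", model="tiiuae/falcon-7b-instruct")  # placeholder model

# Three hand-written query/document examples, mirroring the "three human-labeled
# examples" mentioned in the abstract.
FEW_SHOT = (
    "Document: The Eiffel Tower was completed in 1889.\n"
    "Query: when was the eiffel tower built\n\n"
    "Document: BM25 is a bag-of-words ranking function used in search engines.\n"
    "Query: what kind of ranking function is bm25\n\n"
    "Document: Cross-encoders score a query and a document jointly.\n"
    "Query: how do cross encoders score documents\n\n"
)

def synthetic_query(document_text: str) -> str:
    prompt = FEW_SHOT + f"Document: {document_text}\nQuery:"
    out = generator(prompt, max_new_tokens=32, do_sample=True, temperature=0.8)
    # The generated text echoes the prompt; keep only the first newly generated line.
    return out[0]["generated_text"][len(prompt):].strip().split("\n")[0]

# Stage 2: train a small cross-encoder ranker on the synthetic pairs.
documents = load_corpus()  # hypothetical helper returning a list of document strings
train_examples = []
for doc in documents:
    query = synthetic_query(doc)
    train_examples.append(InputExample(texts=[query, doc], label=1.0))  # positive pair
    train_examples.append(InputExample(texts=[query, random.choice(documents)], label=0.0))  # random negative

ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
ranker.fit(train_dataloader=DataLoader(train_examples, shuffle=True, batch_size=16), epochs=1)

In practice the generated queries would be filtered for quality and paired with harder negatives before training, and the resulting ranker would be evaluated with nDCG@10 against a BM25 baseline, as in the results the talk reports.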