
Boosting Ranking Performance with Minimal Supervision

Formal Metadata

Title
Boosting Ranking Performance with Minimal Supervision
Title of Series
Number of Parts
60
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2023
Language
English

Content Metadata

Subject Area
Genre
Abstract
Transformer language models are highly effective text rankers; however, training Transformer-based neural ranking models requires vast amounts of labeled training data, which is costly and time-consuming to collect. What if you could teach a ranking model without behavioral click data or human annotations? Enter generative large language models (LLMs) such as GPT-3. This talk showcases a novel approach to generating labeled data with minimal human supervision. First, with just three human-labeled query-document examples, an open-source LLM generates synthetic questions for all documents in the index. Then, the synthetic data is used to train a much smaller, cost-efficient Transformer ranking model, which outperforms a strong BM25 baseline by 10 nDCG@10 points on a popular relevance dataset. The method saves costly annotation effort, enables faster adaptation of search ranking to new domains, and allows organizations to revolutionize their search capabilities without breaking the bank.
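To make the two-stage pipeline in the abstract concrete, here is a minimal, hedged sketch in Python. It is not the speaker's implementation: the model names, the few-shot prompt wording, the toy corpus, and the crude negative sampling are all illustrative assumptions; only the overall idea (few-shot synthetic query generation with an LLM, then training a small cross-encoder ranker on the synthetic pairs) comes from the abstract.

```python
# Illustrative sketch only. Model names, prompt text, corpus, and negative
# sampling are assumptions for demonstration, not details from the talk.
from transformers import pipeline
from sentence_transformers import CrossEncoder, InputExample
from torch.utils.data import DataLoader

# --- Stage 1: few-shot synthetic query generation ---------------------------
# Three hand-written document/query pairs serve as the only human supervision.
FEW_SHOT = (
    "Document: The Eiffel Tower is 330 metres tall.\n"
    "Query: how tall is the eiffel tower\n\n"
    "Document: Python 3.12 was released in October 2023.\n"
    "Query: python 3.12 release date\n\n"
    "Document: BM25 is a bag-of-words ranking function.\n"
    "Query: what is bm25\n\n"
)

# Small stand-in for a larger open-source LLM.
generator = pipeline("text-generation", model="gpt2")

def generate_query(document: str) -> str:
    """Prompt the LLM with the few-shot examples plus one unlabeled document."""
    prompt = FEW_SHOT + f"Document: {document}\nQuery:"
    out = generator(prompt, max_new_tokens=16, do_sample=True, num_return_sequences=1)
    # Keep only the newly generated text after the prompt, first line only.
    return out[0]["generated_text"][len(prompt):].strip().split("\n")[0]

corpus = [
    "Transformers use self-attention to model token interactions.",
    "nDCG@10 measures ranking quality over the top ten results.",
]
synthetic_pairs = [(generate_query(doc), doc) for doc in corpus]

# --- Stage 2: train a small cross-encoder ranker on the synthetic pairs -----
train_examples = []
for i, (query, doc) in enumerate(synthetic_pairs):
    train_examples.append(InputExample(texts=[query, doc], label=1.0))   # positive
    neg_doc = corpus[(i + 1) % len(corpus)]                              # crude negative
    train_examples.append(InputExample(texts=[query, neg_doc], label=0.0))

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2", num_labels=1)
ranker.fit(train_dataloader=train_loader, epochs=1, warmup_steps=10)

# The trained ranker scores (query, document) pairs for re-ranking.
print(ranker.predict([("what is bm25", "BM25 is a bag-of-words ranking function.")]))
```

In practice the generated queries would be filtered (for example by consistency checks or round-trip retrieval) and negatives mined from BM25 results rather than picked naively, but the structure above mirrors the described workflow: a handful of labeled examples, LLM-generated questions for every indexed document, and a compact ranker trained on the synthetic data.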