
Boosting Ranking Performance with Minimal Supervision

Formal Metadata

Boosting Ranking Performance with Minimal Supervision
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date: 2023

Content Metadata

Subject Area
Transformer language models are highly effective text rankers. However, training Transformer-based neural ranking models requires large amounts of labeled data, which is costly and time-consuming to collect. What if you could train a ranking model without behavioral click data or human annotations? Enter generative large language models (LLMs) such as GPT-3. This talk showcases an approach to generating labeled training data with minimal human supervision. First, given just three human-labeled query-document examples, an open-source LLM generates synthetic questions for every document in the index. Then, the synthetic data is used to train a much smaller, cost-efficient Transformer ranking model, which outperforms a strong BM25 baseline by 10 nDCG@10 points on a popular relevance dataset. The method reduces annotation costs, speeds up the adaptation of search ranking to new domains, and lets organizations improve their search without a large labeling budget.
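
The data-generation step described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual pipeline: the prompt template, the names `Example`, `build_prompt`, and `synthetic_pairs`, and the `generate` callable standing in for the LLM are all assumptions introduced here. The abstract does not say which model or prompt format was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One human-labeled (document, query) pair used as a few-shot demonstration."""
    document: str
    query: str


def build_prompt(examples: list[Example], document: str) -> str:
    """Format the demonstrations, then leave the final query slot open for the LLM."""
    parts = [f"Document: {ex.document}\nQuery: {ex.query}" for ex in examples]
    parts.append(f"Document: {document}\nQuery:")
    return "\n\n".join(parts)


def synthetic_pairs(examples, documents, generate):
    """Yield (synthetic_query, document) training pairs.

    `generate` is a hypothetical callable mapping a prompt string to generated
    text; it stands in for whichever LLM is used.
    """
    for doc in documents:
        text = generate(build_prompt(examples, doc)).strip()
        # Keep only the first generated line; models often continue the pattern.
        query = text.splitlines()[0].strip() if text else ""
        if query:
            yield query, doc
```

Each yielded pair can then serve as a positive training example for the smaller ranking model. How negatives are chosen and how the ranker is trained are not covered by the abstract, so they are left out of this sketch.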