Lies, damned lies and large language models

Cite

Related Material

EuroPython

Burchell, Jodie

Formal Metadata

Title

Lies, damned lies and large language models

Title of Series

EuroPython 2024

Number of Parts

131

Author

Burchell, Jodie

Contributors

N. N. (Moderation)

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/69471 (DOI)

Publisher

EuroPython

Release Date

2024

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Would you like to use large language models (LLMs) in your own project, but are troubled by their tendency to frequently “hallucinate”, or produce incorrect information? Have you ever wondered if there was a way to easily measure an LLM’s hallucination rate, and compare this against other models? And would you like to learn how to help LLMs produce more accurate information? In this talk, we’ll have a look at some of the main reasons that hallucinations occur in LLMs, and then focus on how we can measure one specific type of hallucination: the tendency of models to regurgitate misinformation that they have learned from their training data. We’ll explore how we can easily measure this type of hallucination in LLMs using a dataset called TruthfulQA in conjunction with Python tooling including Hugging Face’s `datasets` and `transformers` packages, and the `langchain` package. We’ll end by looking at recent initiatives to reduce hallucinations in LLMs, using a technique called retrieval augmented generation (RAG). We’ll look at how and why RAG makes LLMs less likely to hallucinate, and how this can help make these models more reliable and usable in a range of contexts.