
Building Open Source Language Models

Formal Metadata

Title
Building Open Source Language Models
Title of Series
Number of Parts
798
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
LINAGORA, as a leader in the OpenLLM France community, has made it a priority to pull back the curtain on the process of building Large Language Models (LLMs). While most LLMs in use today – even the “open” ones – reveal few if any details about their training, and especially about the data on which they are trained, we have decided to share it all. In this talk, we discuss why using an open model trained on traceable data matters for business and research alike, and we examine some of the difficulties involved in pursuing an open strategy for LLMs. We bring to the table our experience with data collection and the training of LLMs, including the Claire family of language models.