
Compress giant language models using knowledge distillation

Formal Metadata

Title
Compress giant language models using knowledge distillation
Alternative Title
Compress language models to effective & resource-saving models with knowledge distillation
Series Title
Number of Parts
56
Author
License
CC Attribution 3.0 Unported:
You may use, adapt, and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Language models have drawn a lot of attention in NLP in recent years. Despite their short history of development, they have delivered astonishing performance in all sorts of NLP tasks, such as translation, question answering, information extraction and intelligent search. However, we should not forget that giant language models are not only data hungry, but also energy hungry. State-of-the-art language models such as BERT, RoBERTa and XLNet comprise hundreds of millions of parameters, which can only be trained with the help of dozens of sophisticated and expensive chips, and the CO2 generated in the process is massive. Justifying such high energy consumption is not easy in times of climate change. For companies to benefit from the performance of state-of-the-art language models without putting too much strain on their computing budgets, the models must be shrunk substantially; of course, performance should not suffer as a result. One way to achieve this is knowledge distillation, a common model compression technique in which a small student model is trained to reproduce the behaviour of a large teacher model. In this presentation, we will show how you can use knowledge distillation to generate models that achieve performance comparable to state-of-the-art language models, effectively and in a resource-saving manner.
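
To make the idea concrete, below is a minimal sketch of a distillation objective in PyTorch. This is a generic formulation, not the specific setup used in the talk; the function name, temperature and alpha values are illustrative assumptions. The compact student model is trained to match the temperature-softened output distribution of the large teacher model in addition to the ground-truth labels.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: the student mimics the teacher's temperature-softened
    # output distribution (KL divergence between the two distributions).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradients stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha controls how strongly the teacher's
    # soft predictions dominate the training signal.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Hypothetical usage: run the frozen teacher and the smaller student on the
# same batch, then backpropagate only through the student.
#
# with torch.no_grad():
#     teacher_logits = teacher(input_ids)   # large model, e.g. a BERT-sized teacher
# student_logits = student(input_ids)       # compact student model
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward()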