The TIB AV-Portal
The TIB AV-Portal is an open access platform for scientific videos with a focus on technology as well as architecture, chemistry, computer science, mathematics and physics. Over time, the scope of the portal has expanded to include a wider range of scientific disciplines, including the humanities and social sciences, economics, law and medicine (see subjects). Users of the portal have access to a variety of video content, including lecture and conference recordings, simulations, animations, experiments, interviews and video abstracts as well as open educational resources.
The portal was developed from July 2011 to April 2014 by the Lab Non-Textual Materials of the German National Library of Science and Technology (TIB) in cooperation with the Hasso Plattner Institute for Software Systems Engineering. It went online in April 2014 and, until 2019, was operated and further developed by yovisto GmbH on behalf of the TIB. In September 2018, the "Scrum Team AV-Portal" was founded at the TIB and entrusted with migrating all of the portal's software projects to the TIB infrastructure. Since 2020, the TIB Scrum Team has been responsible for the operation and continuous development of the portal.
The TIB AV-Portal makes it easier to search for, cite and publish scientific videos, and also offers the option of downloading videos, licensing them or ordering them as DVDs. Thanks to the use of Creative Commons licenses, most of the content is freely reusable.
The services of the TIB AV-Portal include:
- Hosting and long-term archiving of videos and accompanying materials
- DOI registration for permanent citation
- Rights clearance and license advice
- Speech, text and image recognition
- Continuous further development of the portal
- Editorial support
- An open, ad-free and GDPR-compliant platform
Scene detection – shot boundary detection segments the video based on cuts. A visual table of contents generated from this provides a quick overview of the entire video content and facilitates targeted access to specific sections. Each scene can be cited to the second via Media Fragment Identifier. The technology used is PySceneDetect.
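As a rough illustration of citing a scene to the second, the sketch below appends a temporal fragment in the style of the W3C Media Fragments URI specification ("#t=start,end", given in seconds) to a video URL. The base URL, the function names and the scene boundaries are hypothetical and chosen only for illustration; the portal's actual citation links may be formed differently.

```python
from typing import Optional


def media_fragment_url(base_url: str, start_s: int, end_s: Optional[int] = None) -> str:
    """Return base_url with a temporal media fragment (#t=start[,end]) appended."""
    if start_s < 0:
        raise ValueError("start must not be negative")
    if end_s is not None and end_s <= start_s:
        raise ValueError("end must be greater than start")
    # Drop any fragment already present so the result carries exactly one.
    base = base_url.split("#", 1)[0]
    fragment = f"t={start_s}" if end_s is None else f"t={start_s},{end_s}"
    return f"{base}#{fragment}"


def cite_scenes(base_url: str, scenes: list) -> list:
    """Build one citable URL per (start, end) scene boundary pair."""
    return [media_fragment_url(base_url, start, end) for start, end in scenes]


# Hypothetical video URL and scene boundaries (in seconds), for illustration only.
VIDEO_URL = "https://example.org/media/12345"
SCENES = [(0, 42), (42, 130), (130, 305)]

for url in cite_scenes(VIDEO_URL, SCENES):
    print(url)
```

Whole seconds are used here because the text speaks of citation "to the second"; the fragment syntax itself also allows fractional seconds and clock-style time values.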
Text recognition – video optical character recognition captures and indexes written language, such as text on presentation slides, and makes it searchable. Tesseract is used for this purpose.
Speech recognition – automatic speech recognition transcribes the spoken language in the video. The result is a transcript with time stamps that enables precise searching. The AI-based speech recognition system "Whisper" transcribes around 100 languages, including English, German, French, Spanish and Ukrainian, and can translate them into English. Accordingly, video subtitles and transcripts are offered both in the original language and in English translation.
Image recognition – visual concept detection indexes the moving image with visual concepts such as "computer animation" or "experiment". The Scrum Team AV-Portal is currently working with the Visual Analytics Research Group on the further development of image recognition. In future, an OpenCLIP model will probably be used to label the visual concepts.
Keywording – named-entity linking links the individual video segments with subject headings from the Integrated Authority File (GND). The terms are disambiguated and have relationships to other terms, which enables a more effective search.
As a rule, videos are only analysed automatically if this is legally permissible. The spoken language in a video is converted into a machine-readable transcript that can be displayed as subtitles and searched. Non-English videos are also translated into English; these translations are available both as subtitles and as searchable transcripts.
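To make the step from a time-stamped transcript to displayable subtitles more concrete, here is a minimal sketch that turns transcript segments into a WebVTT subtitle file. Both the segment layout (dictionaries with "start", "end" and "text" keys, times in seconds) and the choice of WebVTT as the output format are assumptions made for illustration; the source does not say which formats the portal uses internally.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: list) -> str:
    """Render transcript segments as the text of a WebVTT file."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical transcript segments, for illustration only.
SEGMENTS = [
    {"start": 0.0, "end": 4.2, "text": " Welcome to this lecture."},
    {"start": 4.2, "end": 9.75, "text": " Today we look at shot boundary detection."},
]

print(segments_to_vtt(SEGMENTS))
```

Because every cue keeps its start and end time, the same segment list could feed both the subtitle display and a search index that jumps to the matching position in the video.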
Annotations of language, text and images are generated exclusively for videos belonging to the six core subjects of the TIB – Leibniz Information Centre for Science and Technology and University Library, namely engineering, architecture, chemistry, computer science, mathematics and physics. The reason is that a vocabulary for automatic annotation is currently available only for these subjects.
The TIB AV-Portal offers both a German and an English user interface. The interface language can be chosen via the language selection in the user settings (top right).
Much of the metadata is available in both German and English, so search terms can be entered in either language. If a German search term is entered, its English translations are also searched automatically, and vice versa.
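One possible reading of this bilingual search behaviour is query expansion: the entered term is looked up in a translation table and both language variants are matched against the metadata. The sketch below shows that idea with a toy dictionary and toy records. All names and data are hypothetical, and the portal's actual search implementation is not described in this text.

```python
# Hypothetical toy dictionary; the portal's real translation data is not described here.
DE_TO_EN = {"brücke": "bridge", "schwingung": "oscillation"}
EN_TO_DE = {en: de for de, en in DE_TO_EN.items()}


def expand_query(term: str) -> list:
    """Return the normalised term plus its translation, if one is known."""
    key = term.strip().lower()
    variants = [key]
    translation = DE_TO_EN.get(key) or EN_TO_DE.get(key)
    if translation:
        variants.append(translation)
    return variants


def search(term: str, records: list) -> list:
    """Return the ids of records whose title contains any query variant."""
    variants = expand_query(term)
    return [
        rec["id"]
        for rec in records
        if any(variant in rec["title"].lower() for variant in variants)
    ]


# Hypothetical metadata records, for illustration only.
RECORDS = [
    {"id": 1, "title": "Tacoma Narrows Bridge collapse"},
    {"id": 2, "title": "Schwingung eines Pendels"},
    {"id": 3, "title": "Introduction to graph theory"},
]

print(search("Brücke", RECORDS))
print(search("oscillation", RECORDS))
```

A simple substring match keeps the sketch short; a real search index would typically add tokenisation, stemming and ranking on top of this.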
The automatic speech recognition system Whisper, which we use to create subtitles and searchable transcripts, can transcribe around 100 languages. These include German, English, Spanish, French, Italian, Greek, Russian, Ukrainian, Polish, Japanese, Turkish and Chinese.