Open data export of TIB AV-Portal metadata

The German National Library of Science and Technology (TIB) aims to promote the use and distribution of its collections. To this end, TIB publishes both the authoritative metadata and the automatically generated, time-based metadata of videos in the TIB AV-Portal as Open Data. Only the metadata and thumbnails of videos whose metadata and thumbnails may be used under the Creative Commons CC0 1.0 Universal license are included. Please note that parts of the data were generated automatically and may therefore contain errors or be incomplete.

In addition, TIB offers the metadata of the TIB AV-Portal via an OAI interface in the formats OAI Dublin Core, MARC XML and RDF XML.

License
Acknowledgement
Datasets
Documentation
- JSON Lines
  - Media
  - Series
- RDF

License

For the use of the metadata and the provided thumbnails, the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication apply.
A summary and the legally binding text of the license are available from Creative Commons.

Acknowledgement

When using TIB data, please link to https://av.tib.eu/opendata to help promote the use and distribution of this data.

Datasets

Filename	Format	Size	Date created	Version
tib-av-portal-opendata-2025-06-30-jsonl.zip (zipped)	application/jsonl	~535MiB (unzipped ~2.2GiB)	30.06.2025	2025-06-30
tib-av-portal-opendata-2025-06-30-ttl.zip (zipped)	text/turtle	~503.1MiB (unzipped ~3.4GiB)	30.06.2025	2025-06-30

Documentation

All datasets are provided as ZIP archives for download. Each archive contains two files:

media.EXT: All media, including videos, audio files and offline media.
series.EXT: All corresponding series.

The datasets are offered in two formats, each identified by its file extension (EXT above):

JSON Lines (.jsonl)
RDF Turtle (.ttl)

JSON Lines

In the JSON Lines format, each line contains one complete dataset as a JSON object.

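For readability, the example dataset below is shown pretty-printed and split into annotated fragments; in the actual files, each dataset occupies a single line. An ellipsis (...) marks places where further entries of the same kind may follow.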
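
Because the archives are large (the JSON Lines archive unzips to about 2.2 GiB), it can be convenient to stream datasets straight out of the ZIP file instead of extracting it first. The following is a minimal Python sketch using only the standard library. It assumes that media.jsonl and series.jsonl may sit at any path inside the archive and locates them by file name; the helper names `find_member` and `iter_jsonl` are hypothetical.

```python
import io
import json
import zipfile
from typing import IO, Iterator, Union


def find_member(archive: zipfile.ZipFile, basename: str) -> str:
    """Return the path inside the archive whose file name is `basename`.

    Assumption: the folder layout inside the ZIP is not documented,
    so only the file name (e.g. "media.jsonl") is matched.
    """
    for name in archive.namelist():
        if name.rsplit("/", 1)[-1] == basename:
            return name
    raise KeyError(f"{basename} not found in archive")


def iter_jsonl(source: Union[str, IO[bytes]], basename: str) -> Iterator[dict]:
    """Yield one parsed dataset per non-empty line, without extracting the ZIP."""
    with zipfile.ZipFile(source) as archive:
        member = find_member(archive, basename)
        # Decode the compressed byte stream as UTF-8 text, line by line.
        with archive.open(member) as raw, io.TextIOWrapper(raw, encoding="utf-8") as text:
            for line in text:
                line = line.strip()
                if line:
                    yield json.loads(line)
```

For example, `iter_jsonl("tib-av-portal-opendata-2025-06-30-jsonl.zip", "media.jsonl")` would yield the media datasets one at a time, keeping memory use independent of the file size.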

Media

Dataset type. For media datasets, the value is always “media”.

{
  "type": "media",

ID of the dataset.

  "id": 1,

Duration of the medium in milliseconds.

  "duration": 100000,

Metadata for this dataset.

  "metadata": {

Title information for the dataset: the main title, subtitles and alternative titles. Subtitles and alternative titles may occur multiple times.

Each title entry has a value field, which contains the title, and a lang field, which indicates the language of the title.

    "title": {
      "value": "Title",
      "lang": "en"
    },
    "subtitles": [
      {
        "value": "Subtitle",
        "lang": "en"
      },
      ...
    ],
    "alternativeTitles": [
      {
        "value": "Alternative Title",
        "lang": "en"
      },
      ...
    ],

Each dataset can have multiple abstracts. The value field contains the actual text of the abstract, while the lang field specifies the language of the abstract.

    "abstracts": [
      {
        "value": "Abstract",
        "lang": "en"
      },
      ...
    ],

A list of keywords for this dataset. The value field contains the keyword, while the lang field specifies the language of the keyword.

    "keywords": [
      {
        "value": "Keyword",
        "lang": "en"
      },
      ...
    ],

Publication year.

    "publicationYear": 2025,

Production year as a string, either a single year or a range (e.g. "2025" or "2021-2023").

    "productionYear": "2025",

Place of production.

    "productionPlace": "Production place",

Language of the medium as an ISO 639-2/B code.

In addition, the codes 'qno' (silent film) and 'qot' (original sound without spoken text) may occur.

    "language": "ger",

Reference to the series the medium belongs to, given as the ID of a dataset in series.jsonl.

    "series": {
      "id": 1
    },

Lists of authors (creators), contributors, publishers and producers, each with the corresponding identifiers.

uri: internal identifier
name: name of the person or organization
identifiers: see the description of "identifiers" below

    "creators": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "contributors": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "publishers": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "producers": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],

List of licenses for the medium.

    "licenses": [
      {
        "uri": "identifier",
        "shortName": "short name"
      },
      ...
    ],

Additional identifiers for this dataset.

label: display text
url: URL under which the identifier resolves (e.g. http://d-nb.info/gnd/1080328793 for a GND identifier)
type: ORCID, GND, ISIL, ...

    "identifiers": [
      {
        "label": "label",
        "url": "url",
        "type": "type"
      },
      ...
    ],

List of genres and subjects with display text in both German and English.

    "genres": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],
    "subjects": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],

Additional information for IWF films.

    "iwfTechData": "",
    "iwfSignature": "",
    "iwfClassCodes": [
      {
        "value": "",
        "lang": "de"
      },
      ...
    ],

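List of transcriptions.

source: origin of the transcript (e.g. "Whisper" for automatic transcripts)
type:
- Transcription: transcript in the original language
- Translation: translated transcript
version: version information of the source (e.g. "whisper-ctranslate2=0.5.0@medium")
usableAsSubtitle: whether the transcript can be used as subtitles
automatic: whether the transcript was generated automatically (ASR)
vtt: full transcript in WebVTT format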

    "transcriptions": [
      {
        "id": 1,
        "source": "",
        "type": "",
        "language": "de",
        "version": "",
        "usableAsSubtitle": true,
        "automatic": false,
        "vtt": ""
      },
      ...
    ]
  },

References to other versions of this medium, most commonly the same video in a different language.

  "otherVersionIds": [
    {
      "id": 1,
      "language": "de"
    },
    ...
  ],

Time-based metadata.

  "segments": {

List of scene-cut timestamps in milliseconds.

    "scenes": [
      {
        "time": 0
      },
      ...
    ],

List of detected entities, grouped by timestamp.

time: timestamp in ms
items:
- uri: identifier of the detected entity (e.g. a GND or visual concept IRI)
- source: asr (spoken language), ocr (on-screen text) or vcd (image content)
- type: thing, concept, person, organization or unknown
- labels: display text in German and/or English

    "annotations": [
      {
        "time": 0,
        "items": [
          {
            "uri": "",
            "source": "",
            "type": "",
            "labels": {
              "de": "Name",
              "en": "Name"
            }
          },
          ...
        ]
      },
      ...
    ]
  }
}
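
Annotations are grouped by timestamp, so finding where a particular entity occurs means inverting that structure. The sketch below maps each annotation label to the sorted timestamps (in ms) at which it appears, and formats millisecond offsets in a MM:SS style similar to the startTime values in the RDF examples. Field names follow the example above; the helper names and the fallback to any available label when the requested language is missing are assumptions of this sketch.

```python
from collections import defaultdict


def format_ms(ms: int) -> str:
    """Format a millisecond offset as MM:SS, or H:MM:SS for media over an hour."""
    total_seconds = ms // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def occurrences(record: dict, lang: str = "en") -> dict[str, list[int]]:
    """Map each annotation label to the sorted timestamps (ms) where it occurs.

    Assumption: if no label exists in `lang`, any available label is used.
    """
    result: dict[str, set[int]] = defaultdict(set)
    for segment in record.get("segments", {}).get("annotations", []):
        for item in segment.get("items", []):
            labels = item.get("labels", {})
            label = labels.get(lang) or next(iter(labels.values()), None)
            if label:
                result[label].add(segment["time"])
    return {label: sorted(times) for label, times in result.items()}
```

Applied to a media dataset, `occurrences(record)["Diagram"]` would list every timestamp at which that concept was annotated, and `format_ms` turns each one into a human-readable position.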

Series

series.jsonl uses the same schema, but only a subset of its properties, as the example below shows.

The following fields are aggregated from the media belonging to the series: publishers, genres and subjects.

{
  "type": "series",
  "id": 1,
  "metadata": {
    "title": {
      "value": "Title",
      "lang": "en"
    },
    "abstracts": [
      {
        "value": "Abstract",
        "lang": "en"
      },
      ...
    ],
    "publishers": [
      {
        "uri": "identifier",
        "name": "name"
      },
      ...
    ],
    "identifiers": [
      {
        "label": "label",
        "url": "url",
        "type": "type"
      },
      ...
    ],
    "genres": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],
    "subjects": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ]
  }
}
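
Media datasets reference their series through `metadata.series.id`, so media.jsonl and series.jsonl can be joined on that ID. The following is a minimal sketch operating on already-parsed datasets; the function name and the shape of its result are hypothetical choices for this example. Grouping by series ID rather than by title avoids merging distinct series that happen to share a title.

```python
from typing import Iterable


def group_media_by_series(media: Iterable[dict], series: Iterable[dict]) -> dict[int, dict]:
    """Return {series_id: {"title": ..., "media": [media_id, ...]}}."""
    # Start with one (initially empty) group per series dataset.
    groups = {
        s["id"]: {"title": s["metadata"].get("title", {}).get("value"), "media": []}
        for s in series
    }
    for record in media:
        ref = record.get("metadata", {}).get("series")
        # Skip media without a series reference or with an unknown series ID.
        if ref and ref.get("id") in groups:
            groups[ref["id"]]["media"].append(record["id"])
    return groups
```

Series without any media in the export keep an empty `media` list, which makes it easy to spot them.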

RDF

The RDF datasets contain the same information as the JSON Lines datasets. For detailed documentation of the fields, see the JSON Lines section above.

The RDF data is serialized as Turtle.

Namespaces

The following table shows the RDF prefixes and ontologies used in the dumps.

Prefix	Namespace	Vocabulary
dcterms	http://purl.org/dc/terms/	DCMI Metadata Terms
gnd	http://d-nb.info/gnd/	The Integrated Authority File
iso639	http://id.loc.gov/vocabulary/iso639-2/	ISO 639-2 Languages
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#	Resource Description Framework
schema	http://schema.org/	Schema.org Vocabulary
tib	http://av.tib.eu/resource/	TIB AV-Portal Ontology
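
Tools that work with full IRIs may need the prefixed names used in the dumps expanded. A minimal sketch whose dictionary simply transcribes the table above; the helper name `expand` is hypothetical, and it does not handle backslash escapes that Turtle permits in local names.

```python
# Prefix table transcribed from the Namespaces section.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "gnd": "http://d-nb.info/gnd/",
    "iso639": "http://id.loc.gov/vocabulary/iso639-2/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
    "tib": "http://av.tib.eu/resource/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as "schema:name" into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```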

AV-Portal Ontology

Documentation of the TIB AV-Portal namespace http://av.tib.eu/resource/ (prefix tib:).

Predicates

Resources

genre/<ID>
media/<ID>
series/<ID>
subject/<ID>
transcription/<ID>
visualconcepts/<ID>
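
The resource IRIs follow a `<type>/<ID>` pattern under http://av.tib.eu/resource/, and in the examples below each media and series resource carries a `schema:url` of the form https://av.tib.eu/media/<ID> or https://av.tib.eu/series/<ID>. The sketch splits a resource IRI into type and ID and derives the portal URL for media and series. It assumes that this URL pattern, which is only shown in examples here, holds for all records; the helper names are hypothetical.

```python
RESOURCE_BASE = "http://av.tib.eu/resource/"
RESOURCE_TYPES = {"genre", "media", "series", "subject", "transcription", "visualconcepts"}


def parse_resource(iri: str) -> tuple[str, str]:
    """Split e.g. "http://av.tib.eu/resource/media/42" into ("media", "42")."""
    if not iri.startswith(RESOURCE_BASE):
        raise ValueError(f"not a TIB AV-Portal resource: {iri!r}")
    kind, sep, ident = iri[len(RESOURCE_BASE):].partition("/")
    if not sep or kind not in RESOURCE_TYPES or not ident:
        raise ValueError(f"unexpected resource path: {iri!r}")
    return kind, ident


def portal_url(iri: str) -> str:
    """Derive the public page URL (assumed pattern, taken from the schema:url examples)."""
    kind, ident = parse_resource(iri)
    if kind not in {"media", "series"}:
        raise ValueError(f"no portal page pattern known for {kind!r}")
    return f"https://av.tib.eu/{kind}/{ident}"
```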

Examples

Metadata

<http://av.tib.eu/resource/media/42>
        rdf:type                    schema:MediaObject;
        tib:iwfClassCode            "biology"@en;
        tib:iwfSignature            "X 00";
        tib:iwfTechData             "Film, 16mm";
        dcterms:subject             <http://av.tib.eu/resource/subject/Life_Sciences>;
        schema:abstract             "my abstract."@en;
        schema:alternateName        "Mein Titel"@de;
        schema:alternativeHeadline  "Secondary Title"@en;
        schema:contributor          [ schema:name  "Contributor" ];
        schema:creator              [ schema:name  "Second Creator" ];
        schema:creator              [ schema:identifier  <https://orcid/0000-0000-0000>;
                                      schema:name        "John Smith"
                                    ];
        schema:dateCreated          "1998-1999";
        schema:datePublished        "2000";
        schema:genre                <http://av.tib.eu/resource/genre/Documentation_Report>;
        schema:identifier           <https://doi.org/10.5072/test>;
        schema:inLanguage           iso639:eng;
        schema:isPartOf             <http://av.tib.eu/resource/series/11>;
        schema:keywords             "foobar"@en;
        schema:license              <http://creativecommons.org/licenses/by-nc-sa/3.0/de/>;
        schema:locationCreated      "Hannover";
        schema:name                 "My Title"@en;
        schema:producers            [ schema:name  "Producer" ];
        schema:publisher            [ schema:name  "Publisher" ];
        schema:thumbnailUrl         <https://av.tib.eu/thumbnail/42>;
        schema:url                  <https://av.tib.eu/media/42> .

Transcriptions

<http://av.tib.eu/resource/transcription/100>
        tib:asrAutomatic         true;
        tib:asrSource            "Whisper";
        tib:asrType              "Transcription";
        tib:asrUsableAsSubtitle  true;
        tib:asrVersion           "whisper-ctranslate2=0.5.0@medium";
        schema:language          "en";
        schema:transcript        "WEBVTT\n\n00:00:00.000 --> 00:00:05.000\nHello world!\n" .
<http://av.tib.eu/resource/media/42>
        rdf:type              schema:MediaObject;
        schema:thumbnailUrl   <https://av.tib.eu/thumbnail/42>;
        schema:transcription  <http://av.tib.eu/resource/transcription/100>;
        schema:url            <https://av.tib.eu/media/42> .
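
The `vtt` field in JSON Lines and `schema:transcript` in RDF hold the complete transcript as a WebVTT document. For simple tasks such as full-text search, the cues can be extracted with a few lines of code. This is a deliberately minimal sketch, not a complete WebVTT parser: it assumes cues are separated by blank lines and may be preceded by a cue identifier, skips header, NOTE, STYLE and REGION blocks, discards cue settings, and leaves inline tags in the text. The helper names are hypothetical.

```python
import re

# Matches "hh:mm:ss.ttt" or "mm:ss.ttt".
_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_time_to_ms(stamp: str) -> int:
    """Convert a WebVTT timestamp to milliseconds."""
    match = _TIMESTAMP.fullmatch(stamp)
    if not match:
        raise ValueError(f"bad WebVTT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_cues(vtt: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for every cue in a WebVTT string."""
    cues = []
    # Blocks are separated by blank lines; the first block is the "WEBVTT" header.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing line may be preceded by an optional cue identifier.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # header, NOTE, STYLE or REGION block
        start, _, rest = lines[timing].partition("-->")
        end_fields = rest.split()  # end time, then optional cue settings
        if not end_fields:
            raise ValueError(f"cue without end time: {lines[timing]!r}")
        text = "\n".join(lines[timing + 1:]).strip()
        cues.append((vtt_time_to_ms(start.strip()), vtt_time_to_ms(end_fields[0]), text))
    return cues
```

Applied to the transcript string in the example above, `parse_cues` yields a single cue, `(0, 5000, "Hello world!")`.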

Scenes

<http://av.tib.eu/resource/media/42>
        rdf:type             schema:MediaObject;
        tib:scene            [ rdf:type          schema:Clip;
                               schema:startTime  "01:40"
                             ];
        tib:scene            [ rdf:type          schema:Clip;
                               schema:startTime  "00:10"
                             ];
        schema:thumbnailUrl  <https://av.tib.eu/thumbnail/42>;
        schema:url           <https://av.tib.eu/media/42> .
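
In the Turtle dumps, scene and segment start times appear as strings such as "01:40", whereas the JSON Lines `scenes` and `annotations` use millisecond offsets. An RDF graph is also an unordered set of triples, so repeated `tib:scene` statements carry no order (the example above lists 01:40 before 00:10). The sketch converts these strings to milliseconds and sorts them. It assumes the format is MM:SS, optionally with a leading hours field for longer media, which the examples here do not confirm; the helper names are hypothetical.

```python
def start_time_to_ms(value: str) -> int:
    """Convert "MM:SS" (or the assumed "HH:MM:SS") to milliseconds."""
    fields = [int(part) for part in value.split(":")]
    if not 2 <= len(fields) <= 3:
        raise ValueError(f"unexpected start time: {value!r}")
    seconds = 0
    for field in fields:
        # Each field is worth 60 units of the field that follows it.
        seconds = seconds * 60 + field
    return seconds * 1000


def sorted_scene_starts(values: list[str]) -> list[int]:
    """Return scene start times in chronological order, in milliseconds."""
    return sorted(start_time_to_ms(v) for v in values)
```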

Annotations

<http://av.tib.eu/resource/media/42>
        rdf:type             schema:MediaObject;
        tib:segment          [ rdf:type          schema:Clip;
                               tib:annotatedBy   [ tib:annotation        <http://av.tib.eu/resource/visualconcepts/diagram>;
                                                   tib:annotationSource  "vcd";
                                                   tib:annotationType    "concept"
                                                 ];
                               schema:startTime  "11:40"
                             ];
        tib:segment          [ rdf:type          schema:Clip;
                               tib:annotatedBy   [ tib:annotation        <http://av.tib.eu/resource/visualconcepts/diagram>;
                                                   tib:annotationSource  "vcd";
                                                   tib:annotationType    "concept"
                                                 ];
                               tib:annotatedBy   [ tib:annotation        gnd:4193845-8;
                                                   tib:annotationSource  "asr";
                                                   tib:annotationType    "thing"
                                                 ];
                               schema:startTime  "00:50"
                             ];
        schema:thumbnailUrl  <https://av.tib.eu/thumbnail/42>;
        schema:url           <https://av.tib.eu/media/42> .
gnd:4193845-8  rdf:label  "Summation"@en , "Summe"@de .
<http://av.tib.eu/resource/visualconcepts/diagram>
        rdf:label  "Diagram"@en , "Diagramm"@de .

Series

<http://av.tib.eu/resource/series/42>
        rdf:type     schema:Series;
        schema:name  "My Series"@en;
        schema:url   <https://av.tib.eu/series/42> .
