RDF export of TIB AV-Portal metadata

The German National Library of Science and Technology (TIB) aims to promote the use and distribution of its collections. To this end, TIB publishes both the authoritative metadata and the time-based, automatically generated metadata of videos in the TIB AV-Portal as Open Data. Only the metadata and thumbnails of videos whose metadata and thumbnails may be used under the Creative Commons CC0 1.0 Universal license are made available. Please note that part of the data was generated automatically and may therefore contain errors or be incomplete.

In addition, TIB offers the metadata of the TIB AV-Portal via an OAI interface in the formats OAI Dublin Core, MARC XML and RDF/XML.

License

For the use of the metadata and the provided thumbnails, the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication apply.

Acknowledgement

When using TIB data, please link to https://av.tib.eu/opendata to promote the use and distribution of the data.

Datasets JSON Lines

Complete dataset

| Filename | Format | Size | Date created | Version |
|---|---|---|---|---|
| tib-av-portal-opendata-2025-02-24.zip (zipped) | application/jsonl | ~702.6MiB (unzipped ~2.7GiB) | 24.02.2025 | 2025-02-24 |

Documentation of the JSON Lines Data Dumps

The JSON Lines download is a ZIP archive containing two files: media.jsonl and series.jsonl. The file media.jsonl contains all media, including videos, audio files and offline media; the file series.jsonl contains the datasets for the series.

Both files use the JSON Lines format: each line holds one complete dataset as a JSON object. For readability, the example datasets below are shown in pretty-printed, structured form.
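Because each line is a self-contained JSON object, the files can be processed as a stream, without unpacking the archive or loading it into memory. A minimal Python sketch, assuming both files sit at the root of the ZIP archive under the names media.jsonl and series.jsonl (the function name iter_jsonl is illustrative):

```python
import json
import zipfile
from typing import Iterator


def iter_jsonl(zip_path: str, member: str) -> Iterator[dict]:
    """Yield one parsed dataset per non-empty line of `member` inside the ZIP archive.

    The member is read line by line, so the unzipped data (~2.7 GiB) is
    never held in memory as a whole.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the member is stored at the archive root; adjust the
        # name if it lives in a subfolder (see archive.namelist()).
        with archive.open(member) as raw:
            for line in raw:  # iterating a ZipExtFile yields bytes lines
                line = line.strip()
                if line:  # skip blank lines such as a trailing newline
                    yield json.loads(line)  # json.loads accepts UTF-8 bytes
```

For example, `for record in iter_jsonl("tib-av-portal-opendata-2025-02-24.zip", "media.jsonl"): ...` visits every media dataset in turn.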

File media.jsonl
{

Type of the dataset. For media datasets, the value is always “media”.

  "type": "media",

ID of the dataset.

  "id": 1,

The length of the medium in ms.

  "duration": 100000,

Metadata for this dataset.

  "metadata": {

The titles of the dataset: the main title, subtitles, and alternative titles. Subtitles and alternative titles may occur multiple times.

Each title entry has a value field, which contains the title, and a lang field, which indicates the language of this title.

    "title": {
      "value": "Title",
      "lang": "en"
    },
    "subtitles": [
      {
        "value": "Subtitle",
        "lang": "en"
      },
      ...
    ],
    "alternativeTitles": [
      {
        "value": "Alternative Title",
        "lang": "en"
      },
      ...
    ],

Each dataset can have multiple abstracts. The value field contains the actual text of the abstract, while the lang field specifies the language of the abstract.

    "abstracts": [
      {
        "value": "Abstract",
        "lang": "en"
      },
      ...
    ],

A list of keywords for this dataset. The value field contains the keyword, while the lang field specifies the language of the keyword.

    "keywords": [
      {
        "value": "Keyword",
        "lang": "en"
      },
      ...
    ],

Publication Year

    "publicationYear": 2025,

Production year, given as a string: either a single year (e.g. "2025") or a range (e.g. "2021-2023").

    "productionYear": "2025",

Production Place

    "productionPlace": "Production place",

Language of the medium (ISO 639-2/B).

In addition, 'qno' (silent film) and 'qot' (original sound without spoken text) may occur.

    "language": "ger",

Reference to the series in the file series.jsonl, identified by its id.

    "series": {
      "id": 1
    },

Lists of creators (authors), contributors, publishers and producers, each with corresponding identifiers:

  • uri: internal identifier
  • name: name of the person or organization
  • identifiers: see section "identifiers" below
    "creators": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "contributors": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "publishers": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],
    "producers": [
      {
        "uri": "identifier",
        "name": "name",
        "identifiers": [
          {
            "label": "1080328793",
            "url": "http://d-nb.info/gnd/1080328793",
            "type": "GND"
          },
          ...
        ]
      },
      ...
    ],

List of licenses for the medium.

    "licenses": [
      {
        "uri": "identifier",
        "shortName": "short name"
      },
      ...
    ],

Additional identifiers for this dataset.

  • label: Display text
  • url: URL of the identifier
  • type: ORCID, GND, ISIL, ...
    "identifiers": [
      {
        "label": "label",
        "url": "url",
        "type": "type"
      },
      ...
    ],

List of genres and subjects with display text in both German and English.

    "genres": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],
    "subjects": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],

Additional information for IWF films.

    "iwfTechData": "",
    "iwfSignature": "",
    "iwfClassCodes": [
      {
        "value": "",
        "lang": "de"
      },
      ...
    ],

List of transcriptions.

  • type:
    • Transcription: transcript in the original language
    • Translation: translated transcript
  • mainTranscript: whether this is the main transcript
  • usableAsSubtitle: whether the transcript can be used as subtitles
  • borked: whether the transcript is likely to contain errors
  • automatic: whether the transcript was generated automatically (ASR)
  • vtt: full transcript in WebVTT format
    "transcriptions": [
      {
        "id": 1,
        "source": "",
        "type": "",
        "language": "de",
        "version": "",
        "mainTranscript": true,
        "usableAsSubtitle": true,
        "borked": false,
        "automatic": false,
        "vtt": ""
      },
      ...
    ]
  },

References to other versions of this dataset. For videos, this is often the same video in a different language.

  "otherVersionIds": [
    {
      "id": 1,
      "language": "de"
    },
    ...
  ],

Time-based metadata.

  "segments": {

List of timestamps of the scene cuts in ms.

    "scenes": [
      {
        "time": 0
      },
      ...
    ],

List of detected entities, grouped by timestamp.

  • time: timestamp in ms
  • items:
    • source: asr (speech recognition), ocr (text recognition) or vcd (visual concept detection)
    • type: thing, concept, person, organization or unknown
    • labels: display text in German or English
    "annotations": [
      {
        "time": 0,
        "items": [
          {
            "uri": "",
            "source": "",
            "type": "",
            "labels": {
              "de": "Name",
              "en": "Name"
            }
          },
          ...
        ]
      },
      ...
    ]
  }
}
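The nested structure above can be navigated with plain dictionary access. The following Python sketch shows a few typical lookups on one parsed media dataset; the helper names are hypothetical, and it assumes that optional lists may be absent and that the annotation source values are the lowercase strings listed above ("asr", "ocr", "vcd").

```python
from typing import Optional


def pick_title(record: dict, lang: str = "en") -> Optional[str]:
    """Return the main title if it is in `lang`, otherwise the first
    alternative title in `lang`, otherwise the main title as is."""
    meta = record.get("metadata", {})
    main = meta.get("title")
    if main and main.get("lang") == lang:
        return main["value"]
    for alt in meta.get("alternativeTitles", []):
        if alt.get("lang") == lang:
            return alt["value"]
    return main["value"] if main else None


def gnd_ids(record: dict, role: str = "creators") -> list[str]:
    """Collect GND identifiers of the agents in one role list
    ("creators", "contributors", "publishers" or "producers")."""
    ids = []
    for agent in record.get("metadata", {}).get(role, []):
        for ident in agent.get("identifiers", []):
            if ident.get("type") == "GND":
                ids.append(ident["label"])
    return ids


def ms_to_hms(ms: int) -> str:
    """Format a millisecond value (duration, scene cut, annotation time) as H:MM:SS."""
    seconds = ms // 1000
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def annotations_by_source(record: dict, source: str) -> list[tuple[int, str]]:
    """List (time in ms, label) pairs for one annotation source, preferring English labels."""
    hits = []
    for entry in record.get("segments", {}).get("annotations", []):
        for item in entry.get("items", []):
            if item.get("source") == source:
                labels = item.get("labels", {})
                hits.append((entry["time"], labels.get("en") or labels.get("de", "")))
    return hits
```

For the placeholder record shown above, `ms_to_hms(100000)` gives `"0:01:40"`.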
File series.jsonl

series.jsonl uses the same schema, but only a subset of its properties, as the following example shows.

The following fields are aggregated from the media belonging to the series: publishers, genres and subjects.

{
  "type": "series",
  "id": 1,
  "metadata": {
    "title": {
      "value": "Title",
      "lang": "en"
    },
    "abstracts": [
      {
        "value": "Abstract",
        "lang": "en"
      },
      ...
    ],
    "publishers": [
      {
        "uri": "identifier",
        "name": "name"
      },
      ...
    ],
    "identifiers": [
      {
        "label": "label",
        "url": "url",
        "type": "type"
      },
      ...
    ],
    "genres": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ],
    "subjects": [
      {
        "uri": "uri",
        "labels": {
          "de": "Name",
          "en": "Name"
        }
      },
      ...
    ]
  }
}
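Media datasets point to their series through metadata.series.id, which matches the id of a record in series.jsonl. The Python sketch below groups media by series and collects the distinct entries of a list field across one series, similar in spirit to the aggregation of publishers, genres and subjects described above. It is an illustrative approximation (deduplicating by uri is an assumption), not TIB's actual aggregation logic, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def _series_id(record: dict) -> Optional[int]:
    # metadata.series may be missing or null for media without a series
    return (record.get("metadata", {}).get("series") or {}).get("id")


def media_by_series(media: Iterable[dict]) -> dict[int, list[int]]:
    """Map each series id to the ids of the media that reference it."""
    groups: dict[int, list[int]] = defaultdict(list)
    for record in media:
        sid = _series_id(record)
        if sid is not None:
            groups[sid].append(record["id"])
    return dict(groups)


def aggregate_field(media: Iterable[dict], series_id: int, field: str) -> list[dict]:
    """Distinct entries (by "uri", first occurrence wins) of a list field such as
    "publishers", "genres" or "subjects" across the media of one series."""
    distinct: dict[str, dict] = {}
    for record in media:
        if _series_id(record) != series_id:
            continue
        for entry in record["metadata"].get(field, []):
            distinct.setdefault(entry["uri"], entry)
    return list(distinct.values())
```

A dictionary built from series.jsonl (series id → record) can then serve as a lookup table while iterating over media.jsonl.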

Datasets RDF

Complete dataset

| Filename | Format | Size | Date created | Version |
|---|---|---|---|---|
| tib-av-portal-export-2024-11-28.ttl (zipped) | text/turtle | ~1.6GiB (unzipped ~17.4GiB) | 29.11.2024 | 2024-11-28 |

Dumps of publisher IWF Wissen und Medien gGmbH i.L.

These dumps are a subset of the complete dataset. They contain only the videos of the publisher IWF Wissen und Medien gGmbH i.L.

| Filename | Format | Size | Date created | Version |
|---|---|---|---|---|
| tib-av-portal-export-iwf-2024-11-28.ttl (zipped) | text/turtle | ~27.6MiB (unzipped ~244.9MiB) | 29.11.2024 | 2024-11-28 |

Additional Data and Mappings

Mapping of TIB AV-Portal Subjects to DBpedia and GND

| Filename | Format | Size | Date created | Version |
|---|---|---|---|---|
| tib-av-portal-subjects-1.0.0.ttl | application/turtle | 11kB | 18.03.2016 | 1.0.0 |

Mapping of TIB AV-Portal VCD Classes to DBpedia, Wikidata, and GND

| Filename | Format | Size | Date created | Version |
|---|---|---|---|---|
| tib-av-portal-classes_vcd-1.0.1.ttl | application/turtle | 11kB | 26.06.2018 | 1.0.1 |
| tib-av-portal-classes_vcd-1.0.1.n3 | application/turtle | 48kB | 26.06.2018 | 1.0.1 |

Documentation of the RDF Data Dumps

This documentation gives a brief overview of the structure of the dump data and shows how it can be imported into an RDF store and queried with SPARQL.

Structure of the data

This section introduces the structure of the TIB AV-Portal RDF data.

The following table shows the RDF prefixes used in the dumps.

| Prefix | Namespace | Vocabulary |
|---|---|---|
| bibframe | http://bibframe.org/vocab/ | Bibframe Vocabulary |
| dbp | http://dbpedia.org/resource/ | DBpedia Resources |
| dcterms | http://purl.org/dc/terms/ | DCMI Metadata Terms |
| dctypes | http://purl.org/dc/dcmitype/ | DCMI Type Vocabulary |
| foaf | http://xmlns.com/foaf/0.1/ | Friend of a Friend Vocabulary |
| gnd | http://d-nb.info/gnd/ | Integrated Authority File (GND) |
| schema | http://schema.org/ | Schema.org Vocabulary |
| tib | http://av.tib.eu/resource/ | TIB AV-Portal Resources |
| cnt | http://www.w3.org/2011/content# | Representing Content in RDF |
| itsrdf | http://www.w3.org/2005/11/its/rdf# | Internationalization Tag Set (ITS) |
| nif | http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# | NLP Interchange Format |
| oa | http://www.w3.org/ns/oa# | Open Annotation Data Model |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Resource Description Framework |

Note: In Turtle syntax, slashes are not allowed unescaped in the local part of a prefixed name; they must be escaped with a backslash ('\'), as in tib:video\/16453.
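When turning full IRIs from the dumps into prefixed names, this escaping rule matters. The following Python sketch compacts an IRI using a few prefixes from the table above and backslash-escapes reserved characters in the local part. Strictly, the Turtle grammar requires the same escaping for other reserved characters such as ?, &, = and , too, so the sketch escapes those as well; the function names and the reduced prefix map are illustrative.

```python
# Reserved characters from the Turtle PN_LOCAL_ESC production that this
# sketch always escapes. '_', '-', '.' and '%' are left as is: they are
# usually legal inside a local name (positional rules, e.g. no trailing '.',
# are not checked here).
RESERVED = set("~!$&'()*+,;=/?#@")

# A subset of the prefixes listed in the prefix table.
PREFIXES = {
    "tib": "http://av.tib.eu/resource/",
    "gnd": "http://d-nb.info/gnd/",
    "schema": "http://schema.org/",
    "oa": "http://www.w3.org/ns/oa#",
}


def escape_local(local: str) -> str:
    """Backslash-escape reserved characters in the local part of a prefixed name."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in local)


def to_prefixed_name(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Compact an IRI to prefix:local form, or fall back to <iri>."""
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, ns in sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if iri.startswith(ns):
            return f"{prefix}:{escape_local(iri[len(ns):])}"
    return f"<{iri}>"
```

For example, `to_prefixed_name("http://av.tib.eu/resource/video/16453")` returns `tib:video\/16453`.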

Example 1: Video Standard Metadata (datatype properties / literals)
tib:video\/16453 schema:name        "Wall-crossing and geometry at infinity of Betti moduli spaces"@en ;
                 schema:description "Linear algebraic differential equation (in one variable) depending on a small ..."@en ;
                 schema:keywords    "Betti moduli"@en , "chaos theory"@en , "singularity"@en ;
                 schema:dateCreated "1973-01-01T00:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
                 schema:duration    "1:16:48" .
Example 2: Video Standard Metadata (object properties)
tib:video\/16453 rdf:type            schema:Movie ;
                 schema:url          <https://av.tib.eu/media/16453> ;
                 schema:producer     gnd:4028361-6 ;
                 schema:publisher    tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 ;
                 schema:license      <http://creativecommons.org/licenses/by/3.0/deed.en> ;
                 schema:availability schema:OnlineOnly ;
                 bibframe:doi        <http://dx.doi.org/10.5446/16453> ;
                 schema:thumbnailUrl <https://av.tib.eu/images/avpimg1fdaede78b338bba137140fd805cd382> .

tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 foaf:name "Institut des Hautes Études Scientifiques (IHÉS)" .

Note: Wherever possible, we have mapped publishers, producers, creators, etc. to existing knowledge bases or authority files (e.g. GND). In some cases, a mapping has not been made yet or is not possible at all. In these cases, the resource is represented by an IRI with the 'tib:' prefix together with its corresponding information, e.g. foaf:name. In future versions of the dumps, these IRIs may be replaced by the corresponding knowledge base or authority file resources, where possible.
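Because agents without an authority mapping are currently represented by tib: IRIs that may be replaced in later dumps, code that stores or joins on agent IRIs may want to tell the two cases apart. A small Python sketch using the gnd and tib namespaces from the prefix table; the function names are illustrative:

```python
from typing import Optional

# Namespaces as listed in the prefix table.
GND_NS = "http://d-nb.info/gnd/"
TIB_NS = "http://av.tib.eu/resource/"


def agent_source(iri: str) -> str:
    """Classify an agent IRI (publisher, producer, creator, ...) by its namespace."""
    if iri.startswith(GND_NS):
        return "gnd"
    if iri.startswith(TIB_NS):
        return "tib"  # local fallback IRI; may be replaced in a later dump
    return "other"


def gnd_id(iri: str) -> Optional[str]:
    """Return the GND identifier (e.g. "4028361-6") of a GND IRI, else None."""
    return iri[len(GND_NS):] if iri.startswith(GND_NS) else None
```

Keying a local index on the GND identifier rather than on the full IRI keeps it stable for agents that are already mapped, while "tib" agents can be flagged for re-checking against the next dump.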

Example 3: OCR result


tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 dcterms:isPartOf tib:video\/16453 .

tib:ocr\/16453_42436_42436_x368y316h15w292 oa:hasTarget   tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 ;
                                           oa:hasBody     tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 ;
                                           oa:annotatedBy tib:annotator\/OCR-1.0.0 ;
                                           rdf:type       oa:Annotation .

tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 rdf:type     nif:Context ;
                                                    rdf:type     nif:RFC5147String ;
                                                    nif:isString "optimal" .
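The segment IRIs encode time and region in their query part: t=smpte-25:h:mm:ss:ff is an SMPTE timecode at 25 frames per second (optionally a start,end range, as in Example 5), and xywh=x,y,w,h is a pixel rectangle. The Python sketch below decodes both, assuming exactly this form (no subframes, no other time schemes); the function names are hypothetical. In Examples 3, 4 and 5, the last number of the annotation IRI (42436, 1557, 7522) equals the start frame computed this way, which suggests, though this page does not state it, that it encodes the start frame.

```python
from urllib.parse import parse_qs, urlsplit

FPS = 25  # the "smpte-25" scheme counts 25 frames per second


def smpte_to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert an h:mm:ss:ff timecode to an absolute frame number."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * fps + frames


def parse_segment_iri(iri: str) -> dict:
    """Decode the t=smpte-25:... and xywh=... parameters of a segment IRI."""
    params = parse_qs(urlsplit(iri).query)
    result: dict = {}
    if "t" in params:
        scheme, _, times = params["t"][0].partition(":")
        if scheme != "smpte-25":
            raise ValueError(f"unsupported time scheme: {scheme}")
        start, _, end = times.partition(",")  # "start" or "start,end"
        result["start_frame"] = smpte_to_frames(start)
        result["start"] = result["start_frame"] / FPS  # seconds
        if end:
            result["end_frame"] = smpte_to_frames(end)
            result["end"] = result["end_frame"] / FPS
    if "xywh" in params:
        x, y, w, h = (int(v) for v in params["xywh"][0].split(","))
        result["xywh"] = (x, y, w, h)
    return result
```

For the OCR segment above, the start frame is (28 × 60 + 17) × 25 + 11 = 42436, i.e. 1697.44 seconds into the video.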
Example 4: VCD result


tib:video\/16453?t=smpte-25:0:01:02:07 dcterms:isPartOf tib:video\/16453 .

tib:vcd\/16453_1347007_1557 oa:hasTarget   tib:video\/16453?t=smpte-25:0:01:02:07 ;
                            oa:hasBody     tib:visualconcepts\/Lecture ;
                            oa:annotatedBy tib:annotator\/VCD-1.0.0 ;
                            oa:motivatedBy oa:tagging ;
                            rdf:type       oa:Annotation .

tib:visualconcepts\/Lecture rdf:type oa:SemanticTag .
Example 5: Named Entity Linking of OCR/ASR


tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 dcterms:isPartOf tib:video\/16453 .

tib:asr\/16453_13753838_7522 oa:hasTarget   tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 ;
                             oa:annotatedBy tib:annotator\/ASR-1.0.0 ;
                             rdf:type       oa:Annotation ;
                             oa:hasBody     tib:asr\/16453_13753838_7522?char=0,5617 .

tib:asr\/16453_13753838_7522?char=0,5617 rdf:type nif:Context ;
                                         rdf:type nif:RFC5147String .

tib:asr\/16453_13753838_7522?char=4743,4747 nif:referenceContext   tib:asr\/16453_13753838_7522?char=0,5617 ;
                                            itsrdf:taIdentRef      gnd:4038613-2 ;
                                            itsrdf:taAnnotatorsRef tib:annotator\/NEL-1.0.0 ;
                                            rdf:type               nif:Phrase ;
                                            rdf:type               nif:String ;
                                            nif:beginIndex         "4743" ;
                                            nif:endIndex           "4747" ;
                                            nif:anchorOf           "sets" .
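The ?char=start,end suffix follows the RFC 5147 character-range scheme: a phrase is the slice from start (inclusive) to end (exclusive) of its context string, and the phrase's nif:beginIndex and nif:endIndex carry the same two numbers. The Python sketch below cuts a phrase out of its context, assuming the full context text is available (e.g. via nif:isString, as for the OCR context in Example 3); the function names are hypothetical.

```python
import re

# Matches an RFC 5147 character range such as "?char=4743,4747" at the end of an IRI.
_CHAR_RANGE = re.compile(r"[?#&]char=(\d+),(\d+)$")


def char_range(iri: str) -> tuple[int, int]:
    """Return the (start, end) character offsets encoded in an IRI."""
    match = _CHAR_RANGE.search(iri)
    if match is None:
        raise ValueError(f"no char=start,end range in {iri!r}")
    return int(match.group(1)), int(match.group(2))


def anchor_text(context_text: str, context_iri: str, phrase_iri: str) -> str:
    """Cut a phrase out of its context; offsets are shifted by the context's own start."""
    context_start, _ = char_range(context_iri)
    start, end = char_range(phrase_iri)
    return context_text[start - context_start:end - context_start]
```

Applied to the ASR context of Example 5, the range 4743,4747 yields the four characters of the anchor "sets", which can serve as a consistency check against nif:anchorOf.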