RDF export of TIB AV-Portal metadata
The German National Library of Science and Technology (TIB) aims to promote the use and distribution of its collections. In this context, TIB publishes the authoritative metadata and the time-based, automatically generated metadata of videos in the TIB AV-Portal as Open Data. Only the metadata and thumbnails of videos whose metadata and thumbnails may be used under the Creative Commons License CC0 1.0 Universal are made available. Please note that parts of the data were generated automatically and may therefore contain errors or be incomplete.
In addition, TIB offers the metadata of the TIB AV-Portal via an OAI interface in the formats OAI Dublin Core, MARC XML and RDF XML.
License
For the use of the metadata and provided thumbnails, the conditions of the Creative Commons License CC0 1.0 Universal (CC0 1.0) Public Domain Dedication shall apply.
Acknowledgement
When using the data of TIB, please link to the page https://av.tib.eu/opendata in order to promote the use and distribution of this data.
Datasets JSON Lines
Total
Filename | Format | Size | Date created | Version |
---|---|---|---|---|
tib-av-portal-opendata-2025-02-24.zip (zipped) | application/jsonl | ~702.6MiB (unzipped ~2.7GiB) | 24.02.2025 | 2025-02-24 |
Documentation of the JSON Lines Data Dumps
The JSON Lines download is a ZIP file containing two files: media.jsonl and series.jsonl. The file media.jsonl contains all media, including videos, audio files and offline media; series.jsonl contains the datasets for the series.
Both files are stored in JSON Lines format: each line is one complete JSON object representing one dataset. For readability, the documentation shows an example dataset in pretty-printed form, with explanations placed between the fields.
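Because each line is an independent JSON object, the files can be processed as a stream without unpacking the whole archive first. A minimal Python sketch using only the standard library; it assumes media.jsonl and series.jsonl sit at the top level of the ZIP file and are UTF-8 encoded, and the function name iter_records is illustrative, not part of any TIB tooling.

```python
import io
import json
import zipfile
from typing import Iterator


def iter_records(zip_path, member: str) -> Iterator[dict]:
    """Yield one parsed JSON object per line of a .jsonl member of a ZIP archive.

    zip_path may be a file name or a binary file-like object.
    Reads line by line, so the unzipped data never has to fit in memory.
    Assumption: the member is UTF-8 encoded and stored at the archive's top level.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:  # tolerate blank lines, e.g. after the last record
                    yield json.loads(line)
```

For example, sum(1 for _ in iter_records("tib-av-portal-opendata-2025-02-24.zip", "media.jsonl")) would count the media records without loading them all at once.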
File media.jsonl
{
Type of the dataset: for media datasets, the value is always “media”.
"type": "media",
Id of the dataset.
"id": 1,
The length of the medium in ms.
"duration": 100000,
Metadata for this dataset.
"metadata": {
The title fields of the dataset: a main title and, optionally, subtitles and alternative titles, both of which may occur multiple times.
Each title entry has a value field, which contains the title, and a lang field, which indicates the language of this title.
"title": {
"value": "Title",
"lang": "en"
},
"subtitles": [
{
"value": "Subtitle",
"lang": "en"
},
...
],
"alternativeTitles": [
{
"value": "Alternative Title",
"lang": "en"
},
...
],
Each dataset can have multiple abstracts. The value field contains the actual text of the abstract, while the lang field specifies the language of the abstract.
"abstracts": [
{
"value": "Abstract",
"lang": "en"
},
...
],
A list of keywords for this dataset. The value field contains the keyword, while the lang field specifies the language of the keyword.
"keywords": [
{
"value": "Keyword",
"lang": "en"
},
...
],
Publication year.
"publicationYear": 2025,
Production year, either a single year (example: 2025) or a range of years (example: 2021-2023).
"productionYear": "2025",
Production place.
"productionPlace": "Production place",
Language of the medium (ISO 639-2/B).
In addition, 'qno' (silent film) and 'qot' (original sound without spoken text) may occur.
"language": "ger",
Reference to the series this medium belongs to, given by the series id from the file series.jsonl.
"series": {
"id": 1
},
Lists of creators (authors), contributors, publishers and producers, each with corresponding identifiers:
- uri: internal identifier
- name: person/organization
- identifiers: see section "identifiers" below
"creators": [
{
"uri": "identifier",
"name": "name",
"identifiers": [
{
"label": "1080328793",
"url": "http://d-nb.info/gnd/1080328793",
"type": "GND"
},
...
]
},
...
],
"contributors": [
{
"uri": "identifier",
"name": "name",
"identifiers": [
{
"label": "1080328793",
"url": "http://d-nb.info/gnd/1080328793",
"type": "GND"
},
...
]
},
...
],
"publishers": [
{
"uri": "identifier",
"name": "name",
"identifiers": [
{
"label": "1080328793",
"url": "http://d-nb.info/gnd/1080328793",
"type": "GND"
},
...
]
},
...
],
"producers": [
{
"uri": "identifier",
"name": "name",
"identifiers": [
{
"label": "1080328793",
"url": "http://d-nb.info/gnd/1080328793",
"type": "GND"
},
...
]
},
...
],
List of licenses for the medium.
"licenses": [
{
"uri": "identifier",
"shortName": "short name"
},
...
],
Additional identifiers for this dataset.
- label: Display text
- url: URL of the identifier
- type: ORCID, GND, ISIL, ...
"identifiers": [
{
"label": "label",
"url": "url",
"type": "type"
},
...
],
List of genres and subjects with display text in both German and English.
"genres": [
{
"uri": "uri",
"labels": {
"de": "Name",
"en": "Name"
}
},
...
],
"subjects": [
{
"uri": "uri",
"labels": {
"de": "Name",
"en": "Name"
}
},
...
],
Additional information for IWF films.
"iwfTechData": "",
"iwfSignature": "",
"iwfClassCodes": [
{
"value": "",
"lang": "de"
},
...
],
List of transcriptions.
- type: Transcription (transcript in the original language) or Translation (translated transcript)
- mainTranscript: Is this the main transcript?
- usableAsSubtitle: Is the transcript usable as a subtitle?
- borked: Is the transcript likely to contain errors?
- automatic: Was the transcript generated automatically (ASR)?
- vtt: Full transcript in WebVTT format.
"transcriptions": [
{
"id": 1,
"source": "",
"type": "",
"language": "de",
"version": "",
"mainTranscript": true,
"usableAsSubtitle": true,
"borked": false,
"automatic": false,
"vtt": ""
},
...
]
},
References to other versions of this dataset; for videos, this is often the same video in a different language.
"otherVersionIds": [
{
"id": 1,
"language": "de"
},
...
],
Time-based metadata.
"segments": {
List of timestamps of the scene cuts in ms.
"scenes": [
{
"time": 0
},
...
],
List of detected entities, each assigned to a timestamp.
- time: Timestamp in ms
- items:
  - source: asr (spoken language), ocr (text) or vcd (image)
  - type: thing, concept, person, organization or unknown
  - labels: display text in German or English
"annotations": [
{
"time": 0,
"items": [
{
"uri": "",
"source": "",
"type": "",
"labels": {
"de": "Name",
"en": "Name"
}
},
...
]
},
...
]
}
}
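A minimal Python sketch of how some of the fields above might be consumed once a media record has been parsed into a dictionary: it splits productionYear into a start and end year, converts the millisecond scene-cut timestamps to seconds, and groups annotation labels by source (asr, ocr, vcd). The function names are illustrative, and the code assumes the record is shaped like the example above; missing segments are treated as empty.

```python
from __future__ import annotations

from collections import defaultdict


def parse_production_year(value: str) -> tuple[int, int]:
    """Split "2025" or "2021-2023" into (start, end); a single year gives start == end."""
    start, sep, end = value.partition("-")
    return (int(start), int(end) if sep else int(start))


def scene_cuts_seconds(record: dict) -> list[float]:
    """Scene-cut times of a media record, converted from ms to seconds."""
    scenes = record.get("segments", {}).get("scenes", [])
    return [scene["time"] / 1000 for scene in scenes]


def labels_by_source(record: dict, lang: str = "en") -> dict[str, list[tuple[float, str]]]:
    """Group annotation labels by source as (time in seconds, label) pairs."""
    grouped: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for annotation in record.get("segments", {}).get("annotations", []):
        seconds = annotation["time"] / 1000
        for item in annotation.get("items", []):
            label = item.get("labels", {}).get(lang)
            if label:  # a label may exist in only one of the two languages
                grouped[item["source"]].append((seconds, label))
    return dict(grouped)
```

Grouping by source keeps the three detection channels apart, which is useful if, for instance, only OCR labels should feed a search index.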
File series.jsonl
series.jsonl uses the same schema, but only a subset of the properties, as the example below shows.
The fields publishers, genres and subjects are aggregated from the media belonging to the series.
{
"type": "series",
"id": 1,
"metadata": {
"title": {
"value": "Title",
"lang": "en"
},
"abstracts": [
{
"value": "Abstract",
"lang": "en"
},
...
],
"publishers": [
{
"uri": "identifier",
"name": "name"
},
...
],
"identifiers": [
{
"label": "label",
"url": "url",
"type": "type"
},
...
],
"genres": [
{
"uri": "uri",
"labels": {
"de": "Name",
"en": "Name"
}
},
...
],
"subjects": [
{
"uri": "uri",
"labels": {
"de": "Name",
"en": "Name"
}
},
...
]
}
}
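Since each media record points to its series via metadata.series.id, the two files can be joined on the series id. A minimal Python sketch, assuming the records have already been parsed into dictionaries and that media belonging to no series simply lack the series field; group_media_by_series is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Iterable


def group_media_by_series(media: Iterable[dict], series: Iterable[dict]) -> dict[int, dict]:
    """Map each series id to its series record plus the ids of its media.

    Assumes media reference their series via record["metadata"]["series"]["id"]
    and that this field is absent for media that belong to no series.
    """
    result = {s["id"]: {"series": s, "media_ids": []} for s in series}
    for record in media:
        series_ref = record.get("metadata", {}).get("series")
        if not series_ref:
            continue  # medium belongs to no series
        entry = result.get(series_ref.get("id"))
        if entry is not None:  # skip references to series missing from the dump
            entry["media_ids"].append(record["id"])
    return result
```

Only the series index is kept in memory, so the media argument can be a generator that reads media.jsonl line by line.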
Datasets RDF
Total
Filename | Format | Size | Date created | Version |
---|---|---|---|---|
tib-av-portal-export-2024-11-28.ttl (zipped) | text/turtle | ~1.6GiB (unzipped ~17.4GiB) | 29.11.2024 | 2024-11-28 |
Dumps of publisher IWF Wissen und Medien gGmbH i.L.
These dumps are a subset of the complete dataset and contain only the videos of the publisher IWF Wissen und Medien gGmbH i.L.
Filename | Format | Size | Date created | Version |
---|---|---|---|---|
tib-av-portal-export-iwf-2024-11-28.ttl (zipped) | text/turtle | ~27.6MiB (unzipped ~244.9MiB) | 29.11.2024 | 2024-11-28 |
Additional Data and Mappings
Mapping of TIB AV-Portal Subjects to DBpedia and GND
Filename | Format | Size | Date created | Version |
---|---|---|---|---|
tib-av-portal-subjects-1.0.0.ttl | text/turtle | 11kB | 18.03.2016 | 1.0.0 |
Mapping of TIB AV-Portal VCD Classes to DBpedia, Wikidata, and GND
Filename | Format | Size | Date created | Version |
---|---|---|---|---|
tib-av-portal-classes_vcd-1.0.1.ttl | text/turtle | 11kB | 26.06.2018 | 1.0.1 |
tib-av-portal-classes_vcd-1.0.1.n3 | text/n3 | 48kB | 26.06.2018 | 1.0.1 |
Documentation of the RDF Data Dumps
This documentation gives a brief overview of the structure of the dump data and shows how it can be imported into an RDF store and queried with SPARQL.
Structure of the data
This section introduces the structure of the TIB AV-Portal RDF data.
The following table shows the RDF prefixes used in the dumps.
Prefix | Namespace | Vocabulary |
---|---|---|
bibframe | http://bibframe.org/vocab/ | Bibframe Vocabulary |
dbp | http://dbpedia.org/resource/ | DBpedia Resources |
dcterms | http://purl.org/dc/terms/ | DCMI Metadata Terms |
dctypes | http://purl.org/dc/dcmitype/ | DCMI Type Vocabulary |
foaf | http://xmlns.com/foaf/0.1/ | Friend of a Friend Vocabulary |
gnd | http://d-nb.info/gnd/ | Integrated Authority File (GND) |
schema | http://schema.org/ | Schema.org Vocabulary |
tib | http://av.tib.eu/resource/ | TIB AV-Portal Resources |
cnt | http://www.w3.org/2011/content# | Representing Content in RDF |
itsrdf | http://www.w3.org/2005/11/its/rdf# | Internationalization Tag Set (ITS) |
nif | http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# | NLP Interchange Format |
oa | http://www.w3.org/ns/oa# | Open Annotation Data Model |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Resource Description Framework |
Note: In Turtle syntax, a slash is not allowed unescaped in the local part of a prefixed name; it must be escaped with a backslash ('\'), as in tib:video\/16453.
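When working with the dumps outside a Turtle parser, e.g. when building SPARQL queries or comparing IRIs as strings, it can help to convert between prefixed names and full IRIs. A minimal Python sketch using a subset of the prefixes from the table above. It follows the Turtle 1.1 grammar, in which a set of characters (the PN_LOCAL_ESC production) may appear in a local name only when backslash-escaped; this set includes '/' but also '?', '&', '=' and ','. Assumptions: '%' is left untouched because existing %XX escapes are taken to be valid, and characters that cannot occur in a local name at all (such as spaces) are not checked; IRIs containing them must be written in full in angle brackets. The names expand and compact are illustrative.

```python
import re

# A subset of the prefixes listed in the table above.
PREFIXES = {
    "tib": "http://av.tib.eu/resource/",
    "gnd": "http://d-nb.info/gnd/",
    "schema": "http://schema.org/",
    "oa": "http://www.w3.org/ns/oa#",
}

# Characters that Turtle allows in a local name only when escaped.
# '_' never needs escaping; '-' only needs it as the first character (handled below).
# Escaping '.' everywhere is always valid and avoids an illegal trailing '.'.
_ESCAPED = set("~.!$&'()*+,;=/?#@")


def expand(prefixed: str) -> str:
    """Turn a prefixed name such as tib:video\\/16453 into the full IRI."""
    prefix, sep, local = prefixed.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed!r}")
    return PREFIXES[prefix] + re.sub(r"\\(.)", r"\1", local)  # drop escapes


def compact(iri: str) -> str:
    """Turn a full IRI into a prefixed name, escaping the local part as needed."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            local = iri[len(namespace):]
            escaped = "".join(
                "\\" + ch if ch in _ESCAPED or (ch == "-" and i == 0) else ch
                for i, ch in enumerate(local)
            )
            return f"{prefix}:{escaped}"
    raise ValueError(f"no known prefix matches {iri!r}")
```

For example, compact("http://av.tib.eu/resource/video/16453") gives tib:video\/16453, and expand reverses it.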
Example 1: Video Standard Metadata (datatype properties / literals):
tib:video\/16453 schema:name "Wall-crossing and geometry at infinity of Betti moduli spaces"@en ;
schema:description "Linear algebraic differential equation (in one variable) depending on a small ..."@en ;
schema:keywords "Betti moduli"@en , "chaos theory"@en, "singularity"@en ;
schema:dateCreated "1973-01-01T00:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
schema:duration "1:16:48" .
Example 2: Video Standard Metadata (object properties)
tib:video\/16453 rdf:type schema:Movie ;
schema:url <https://av.tib.eu/media/16453> ;
schema:producer gnd:4028361-6 ;
schema:publisher tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 ;
schema:license <http://creativecommons.org/licenses/by/3.0/deed.en> ;
schema:availability schema:OnlineOnly ;
bibframe:doi <http://dx.doi.org/10.5446/16453> ;
schema:thumbnailUrl <https://av.tib.eu/images/avpimg1fdaede78b338bba137140fd805cd382> .
tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 foaf:name "Institut des Hautes Études Scientifiques (IHÉS)" .
Note: Wherever possible, we have mapped publishers, producers, creators, etc. to existing knowledge bases or authority files (e.g. GND). In some cases, a mapping has not been made yet or is not possible at all. In those cases, the resource is represented by an IRI with the 'tib:' prefix together with its corresponding information, e.g. foaf:name. In future versions of the dumps, these IRIs may be replaced by the corresponding knowledge base or authority file resources, where possible.
Example 3: OCR result
tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 dcterms:isPartOf tib:video\/16453 .
tib:ocr\/16453_42436_42436_x368y316h15w292 oa:hasTarget tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 ;
oa:hasBody tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 ;
oa:annotatedBy tib:annotator\/OCR-1.0.0 ;
rdf:type oa:Annotation .
tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 rdf:type nif:Context ;
rdf:type nif:RFC5147String ;
nif:isString "optimal" .
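The target IRIs use W3C Media Fragments syntax: t=smpte-25:h:mm:ss:ff is an SMPTE timecode at 25 frames per second, optionally followed by ",end" for a range, and xywh=x,y,w,h is a rectangular region. A minimal Python sketch that decodes such an IRI; it assumes plain smpte or smpte-<fps> timecodes without sub-frames (drop-frame variants such as smpte-30-drop are not handled) and xywh values in pixels without a unit prefix. parse_fragment and smpte_to_frames are hypothetical names. For the OCR target above, it yields frame 42436 (1697.44 s), the same number that appears in the annotation IRI tib:ocr\/16453_42436_42436_x368y316h15w292; this suggests, though the documentation does not state it, that annotation IRIs encode frame numbers.

```python
from urllib.parse import parse_qs, urlsplit


def smpte_to_frames(timecode: str, fps: int) -> int:
    """Convert 'h:mm:ss' or 'h:mm:ss:ff' into a frame count (no sub-frames)."""
    parts = [int(p) for p in timecode.split(":")]
    hours, minutes, seconds = parts[:3]
    frames = parts[3] if len(parts) > 3 else 0
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def parse_fragment(iri: str) -> dict:
    """Decode the t= (SMPTE timecode) and xywh= parameters of a target IRI."""
    params = parse_qs(urlsplit(iri).query)
    result = {}
    if "t" in params:
        scheme, _, times = params["t"][0].partition(":")
        if scheme == "smpte":
            fps = 30  # Media Fragments default for the bare "smpte" prefix
        elif scheme.startswith("smpte-"):
            fps = int(scheme[len("smpte-"):])  # raises on e.g. "smpte-30-drop"
        else:
            raise ValueError(f"unsupported time scheme {scheme!r}")
        start, _, end = times.partition(",")
        result["fps"] = fps
        result["start_frame"] = smpte_to_frames(start, fps)
        result["start_seconds"] = result["start_frame"] / fps
        if end:  # a range such as 0:05:00:22,0:05:03:00
            result["end_frame"] = smpte_to_frames(end, fps)
            result["end_seconds"] = result["end_frame"] / fps
    if "xywh" in params:
        x, y, w, h = (int(v) for v in params["xywh"][0].split(","))
        result["region"] = {"x": x, "y": y, "w": w, "h": h}
    return result
```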
Example 4: VCD result
tib:video\/16453?t=smpte-25:0:01:02:07 dcterms:isPartOf tib:video\/16453 .
tib:vcd\/16453_1347007_1557 oa:hasTarget tib:video\/16453?t=smpte-25:0:01:02:07 ;
oa:hasBody tib:visualconcepts\/Lecture ;
oa:annotatedBy tib:annotator\/VCD-1.0.0 ;
oa:motivatedBy oa:tagging ;
rdf:type oa:Annotation .
tib:visualconcepts\/Lecture rdf:type oa:SemanticTag .
Example 5: Named Entity Linking of OCR/ASR
tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 dcterms:isPartOf tib:video\/16453 .
tib:asr\/16453_13753838_7522 oa:hasTarget tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 ;
oa:annotatedBy tib:annotator\/ASR-1.0.0 ;
rdf:type oa:Annotation ;
oa:hasBody tib:asr\/16453_13753838_7522?char=0,5617 .
tib:asr\/16453_13753838_7522?char=0,5617 rdf:type nif:Context ;
rdf:type nif:RFC5147String .
tib:asr\/16453_13753838_7522?char=4743,4747 nif:referenceContext tib:asr\/16453_13753838_7522?char=0,5617 ;
itsrdf:taIdentRef gnd:4038613-2 ;
itsrdf:taAnnotatorsRef tib:annotator\/NEL-1.0.0 ;
rdf:type nif:Phrase ;
rdf:type nif:String ;
nif:beginIndex "4743" ;
nif:endIndex "4747" ;
nif:anchorOf "sets" .
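The NIF resources identify text spans with RFC 5147 fragments: char=b,e denotes the characters between positions b and e of the context string, so for a phrase, nif:anchorOf should equal that slice of its reference context (for the OCR context above, char=0,7 covers "optimal"). A minimal Python sketch of that lookup, assuming offsets count Unicode code points as Python string indexing does; char_range and span_text are hypothetical helper names.

```python
from __future__ import annotations

import re

# Matches an IRI ending in "?char=b,e" or "#char=b,e".
_CHAR_FRAGMENT = re.compile(r"[?#]char=(\d+),(\d+)$")


def char_range(iri: str) -> tuple[int, int]:
    """Extract (begin, end) character positions from a char= fragment."""
    match = _CHAR_FRAGMENT.search(iri)
    if match is None:
        raise ValueError(f"no char= fragment in {iri!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if begin > end:
        raise ValueError(f"begin {begin} is after end {end}")
    return begin, end


def span_text(context_string: str, phrase_iri: str) -> str:
    """Return the part of the context string that a phrase IRI points to."""
    begin, end = char_range(phrase_iri)
    return context_string[begin:end]  # positions lie between characters
```

Comparing span_text(...) with nif:anchorOf is a cheap consistency check when loading the NIF data.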