RDF export of TIB AV-Portal metadata
The German National Library of Science and Technology (TIB) aims to promote the use and distribution of its collections. In this context, TIB publishes the authoritative and time-based, automatically generated metadata of videos of the TIB AV-Portal as Linked Open Data. Only metadata and thumbnails of videos which allow usage of their respective metadata and thumbnails under the Creative Commons License CC0 1.0 Universal are made available. Please note that the data was partially generated by an automatic process and may therefore contain errors or might be incomplete.
In addition, TIB offers the metadata of the TIB AV-Portal via an OAI interface in the formats OAI Dublin Core, MARC XML or RDF XML.
Datasets
Total
Filename | Format | Size | Date created: | Version: |
---|---|---|---|---|
tib-av-portal-export-2023-09-26.ttl (zipped) | text/turtle | ~1.4GiB (unzipped ~13.9GiB) | 27.09.2023 | 2023-09-26 |
Dumps of publisher IWF Wissen und Medien gGmbH i.L.
These dumps are a subset of the total stock. They contain only the videos of the publisher IWF Wissen und Medien gGmbH i.L.
Filename | Format | Size | Date created: | Version: |
---|---|---|---|---|
tib-av-portal-export-iwf-2023-09-26.ttl (zipped) | text/turtle | ~25.0MiB (unzipped ~210.6MiB) | 27.09.2023 | 2023-09-26 |
Additional Data and Mappings
Mapping of TIB AV-Portal Subjects to DBpedia and GND
Filename | Format | Size | Date created: | Version: |
---|---|---|---|---|
tib-av-portal-subjects-1.0.0.ttl | application/turtle | 11kB | 18.03.2016 | 1.0.0 |
Mapping of TIB AV-Portal VCD Classes to DBpedia, Wikidata, and GND
Filename | Format | Size | Date created: | Version: |
---|---|---|---|---|
tib-av-portal-classes_vcd-1.0.1.ttl | application/turtle | 11kB | 26.06.2018 | 1.0.1 |
tib-av-portal-classes_vcd-1.0.1.n3 | application/turtle | 48kB | 26.06.2018 | 1.0.1 |
License
For the use of the metadata and provided thumbnails, the conditions of the Creative Commons License CC0 1.0 Universal (CC0 1.0) Public Domain Dedication shall apply.
(Click here to view summary and legally binding version of the license.)
Acknowledgement
When using the data of TIB, please link to the page https://av.tib.eu/opendata in order to promote the use and distribution of this data.
Documentation of the Data Dumps
This documentation gives a brief overview of the structure of the dump data and shows how it can be imported into an RDF store and queried with SPARQL.
Structure of the data
This section will introduce the structure of the TIB AV-Portal RDF data.
The following table shows the RDF prefixes used in the dumps.
Prefix | Namespace | Vocabulary |
---|---|---|
bibframe | http://bibframe.org/vocab/ | Bibframe Vocabulary |
dbp | http://dbpedia.org/resource/ | DBpedia Resources |
dcterms | http://purl.org/dc/terms/ | DCMI Metadata Terms |
dctypes | http://purl.org/dc/dcmitype/ | DCMI Type Vocabulary |
foaf | http://xmlns.com/foaf/0.1/ | Friend of a Friend Vocabulary |
gnd | http://d-nb.info/gnd/ | Integrated Authority File (GND) |
schema | http://schema.org/ | Schema.org Vocabulary |
tib | http://av.tib.eu/resource/ | TIB AV-Portal Resources |
cnt | http://www.w3.org/2011/content# | Representing Content in RDF |
itsrdf | http://www.w3.org/2005/11/its/rdf# | Internationalization Tag Set (ITS) |
nif | http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# | NLP Interchange Format |
oa | http://www.w3.org/ns/oa# | Open Annotation Data Model |
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Resource Description Framework |
Note: In Turtle syntax, unescaped slashes are not allowed in the local part of a prefixed name; each slash has to be escaped with '\'.
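To make the escaping rule concrete, here is a minimal Python sketch that converts between full IRIs and prefixed names for the `tib` namespace from the table above. The helper names `to_prefixed_name` and `to_iri` are hypothetical and not part of any TIB tooling, and only the slash is handled; other reserved characters would need similar treatment.

```python
# Namespace taken from the prefix table above; helper names are illustrative only.
TIB = "http://av.tib.eu/resource/"


def to_prefixed_name(iri: str, prefix: str = "tib", namespace: str = TIB) -> str:
    """Turn a full IRI into a prefixed name, escaping slashes in the local part."""
    if not iri.startswith(namespace):
        raise ValueError(f"{iri!r} is not in namespace {namespace!r}")
    local = iri[len(namespace):]
    return f"{prefix}:" + local.replace("/", "\\/")


def to_iri(pname: str, prefix: str = "tib", namespace: str = TIB) -> str:
    """Turn a prefixed name with escaped slashes back into a full IRI."""
    head, sep, local = pname.partition(":")
    if sep != ":" or head != prefix:
        raise ValueError(f"{pname!r} does not use prefix {prefix!r}")
    return namespace + local.replace("\\/", "/")


if __name__ == "__main__":
    print(to_prefixed_name("http://av.tib.eu/resource/video/16453"))  # tib:video\/16453
    print(to_iri("tib:video\\/16453"))  # http://av.tib.eu/resource/video/16453
```

Writing the full IRI in angle brackets, e.g. `<http://av.tib.eu/resource/video/16453>`, avoids the escaping altogether.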
Example 1: Video Standard Metadata (datatype properties / literals):
tib:video\/16453 schema:name "Wall-crossing and geometry at infinity of Betti moduli spaces"@en ;
schema:description "Linear algebraic differential equation (in one variable) depending on a small ..."@en ;
schema:keywords "Betti moduli"@en , "chaos theory"@en, "singularity"@en ;
schema:dateCreated "1973-01-01T00:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
schema:duration "1:16:48" .
Example 2: Video Standard Metadata (object properties)
tib:video\/16453 rdf:type schema:Movie ;
schema:url <https://av.tib.eu/media/16453> ;
schema:producer gnd:4028361-6 ;
schema:publisher tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 ;
schema:license <http://creativecommons.org/licenses/by/3.0/deed.en> ;
schema:availability schema:OnlineOnly ;
bibframe:doi <http://dx.doi.org/10.5446/16453> ;
schema:thumbnailUrl <https://av.tib.eu/images/avpimg1fdaede78b338bba137140fd805cd382> .
tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 foaf:name "Institut des Hautes Études Scientifiques (IHÉS)" .
Note: As far as possible, we mapped publishers, producers, creators, etc. to existing knowledge bases or authority files (e.g. GND). In some cases, a mapping has not been made yet or is simply impossible. In those cases, the resource is represented by an IRI with the ‘tib:’ prefix and its corresponding information, e.g. foaf:name. In future versions of the dumps, these IRIs may be replaced by the corresponding knowledge base or authority file resources, if possible.
Example 3: OCR result
tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 dcterms:isPartOf tib:video\/16453 .
tib:ocr\/16453_42436_42436_x368y316h15w292 oa:hasTarget tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 ;
oa:hasBody tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 ;
oa:annotatedBy tib:annotator\/OCR-1.0.0 ;
rdf:type oa:Annotation .
tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 rdf:type nif:Context ;
rdf:type nif:RFC5147String ;
nif:isString "optimal" .
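The target IRIs in this example carry a temporal part (`t=smpte-25:...`) and a spatial part (`xywh=...`). The following Python sketch shows one way to decode them, assuming that `smpte-25` denotes an `hours:minutes:seconds:frames` timecode at 25 frames per second and that `xywh` lists x, y, width and height in pixels, as in the W3C Media Fragments URI syntax. The names `Fragment` and `parse_fragment` are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Fragment(NamedTuple):
    fps: int
    start_frame: int        # frame count from the start of the video
    start_seconds: float    # the same position expressed in seconds
    box: Optional[Tuple[int, ...]]  # (x, y, width, height) or None


_TIME = re.compile(r"t=smpte-(\d+):(\d+):(\d+):(\d+):(\d+)")
_BOX = re.compile(r"xywh=(\d+),(\d+),(\d+),(\d+)")


def parse_fragment(iri: str) -> Fragment:
    """Decode the start time and optional bounding box of a fragment IRI."""
    query = iri.partition("?")[2]
    time = _TIME.search(query)
    if time is None:
        raise ValueError(f"no smpte time fragment in {iri!r}")
    fps, hours, minutes, seconds, frames = map(int, time.groups())
    whole_seconds = hours * 3600 + minutes * 60 + seconds
    start_frame = whole_seconds * fps + frames
    box_match = _BOX.search(query)
    box = tuple(map(int, box_match.groups())) if box_match else None
    return Fragment(fps, start_frame, start_frame / fps, box)


if __name__ == "__main__":
    fragment = parse_fragment(
        "tib:video\\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15"
    )
    print(fragment.start_frame)  # 42436
    print(fragment.box)          # (368, 316, 292, 15)
```

Only the start of a time range is decoded. The computed frame number 42436 also appears in the OCR annotation IRI of this example, which suggests, but does not prove, that the identifiers are derived from frame positions.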
Example 4: VCD result
tib:video\/16453?t=smpte-25:0:01:02:07 dcterms:isPartOf tib:video\/16453 .
tib:vcd\/16453_1347007_1557 oa:hasTarget tib:video\/16453?t=smpte-25:0:01:02:07 ;
oa:hasBody tib:visualconcepts\/Lecture ;
oa:annotatedBy tib:annotator\/VCD-1.0.0 ;
oa:motivatedBy oa:tagging ;
rdf:type oa:Annotation .
tib:visualconcepts\/Lecture rdf:type oa:SemanticTag .
Example 5: Named Entity Linking of OCR/ASR
tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 dcterms:isPartOf tib:video\/16453 .
tib:asr\/16453_13753838_7522 oa:hasTarget tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 ;
oa:annotatedBy tib:annotator\/ASR-1.0.0 ;
rdf:type oa:Annotation ;
oa:hasBody tib:asr\/16453_13753838_7522?char=0,5617 .
tib:asr\/16453_13753838_7522?char=0,5617 rdf:type nif:Context ;
rdf:type nif:RFC5147String .
tib:asr\/16453_13753838_7522?char=4743,4747 nif:referenceContext tib:asr\/16453_13753838_7522?char=0,5617 ;
itsrdf:taIdentRef gnd:4038613-2 ;
itsrdf:taAnnotatorsRef tib:annotator\/NEL-1.0.0 ;
rdf:type nif:Phrase ;
rdf:type nif:String ;
nif:beginIndex "4743" ;
nif:endIndex "4747" ;
nif:anchorOf "sets" .
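The two index values 4743 and 4747 in this example delimit the four-character phrase "sets" inside the 5617-character context string. A minimal Python sketch of that relationship follows, assuming zero-based offsets with an exclusive end. The function name `anchor_of` and the short context string are made up for illustration; the real transcript text is not reproduced here.

```python
def anchor_of(context: str, begin_index: int, end_index: int) -> str:
    """Return the substring addressed by character offsets (end exclusive)."""
    if not 0 <= begin_index <= end_index <= len(context):
        raise ValueError("offsets fall outside the context string")
    return context[begin_index:end_index]


# Hypothetical transcript snippet standing in for the real ASR context.
context = "we consider open sets in the plane"
begin = context.index("sets")
phrase = anchor_of(context, begin, begin + len("sets"))

if __name__ == "__main__":
    print(phrase)
```

The same slicing applies to the `?char=begin,end` suffix of the phrase IRI, whose two numbers repeat the offsets.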
How to import the dumps to a triple store
The following table lists some popular RDF stores that can be used to import and work with the provided RDF dumps.
Virtuoso Opensource | https://vos.openlinksw.com/owiki/wiki/VOS/ |
Eclipse RDF4J (formerly Sesame) | http://rdf4j.org/ |
Apache Jena TDB | https://jena.apache.org/documentation/tdb/ |
Blazegraph | https://www.blazegraph.com/ |
For a quick start, you can use Blazegraph as follows:
Download the Blazegraph JAR and follow the instructions for starting Blazegraph at: https://github.com/blazegraph/database/wiki/Main_Page
Once you have started Blazegraph, you should be able to access it in your web browser at:
http://localhost:9999/blazegraph/
Now, download and unzip the TIB AV-Portal dump file from the tables above.
To import the TIB AV-Portal dump file into Blazegraph (cf. the screenshot below):
- switch to the “UPDATE” tab in Blazegraph
- enter the absolute path or URL of the downloaded and extracted dump file into the text input field
- select Type: “File Path or URL” from the dropdown menu
- press the “Update” button below
The update should start now, indicated by “Running updates ...”. It will likely take some time (about 10 to 30 minutes, depending on your computer) to finish, indicated by a message such as “Modified: 10099269 Milliseconds: 1441798”.
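As an alternative to the web interface, the import could also be scripted. The sketch below sends a SPARQL 1.1 `LOAD` update to a local Blazegraph instance using only the Python standard library. The endpoint URL is an assumption about a default installation and may differ on your system, the function names are hypothetical, and the dump file must be readable by the Blazegraph server process.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed SPARQL endpoint of a default local Blazegraph; adjust if yours differs.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_load_update(dump_path: str) -> str:
    """Build a SPARQL LOAD statement for a local, already extracted dump file."""
    file_iri = Path(dump_path).absolute().as_uri()
    return f"LOAD <{file_iri}>"


def load_dump(dump_path: str, endpoint: str = ENDPOINT) -> str:
    """POST the LOAD statement as a form-encoded SPARQL update request."""
    body = urlencode({"update": build_load_update(dump_path)}).encode("ascii")
    request = Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the statement only; call load_dump(...) to actually run the import.
    print(build_load_update("tib-av-portal-export-2023-09-26.ttl"))
```

Loading the full dump this way should take roughly as long as through the "UPDATE" tab, since the server performs the same work.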
How to query the data with SPARQL
In Blazegraph, switch to the “QUERY” tab and enter the following example queries.
Use the following prefixes with every query:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX gnd: <http://d-nb.info/gnd/>
PREFIX schema: <http://schema.org/>
PREFIX tib: <http://av.tib.eu/resource/>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
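When queries are sent from a script rather than typed into the "QUERY" tab, the prefix block can be prepended automatically. The following Python sketch does this and posts the query to a SPARQL endpoint. The endpoint URL is an assumption about a default local Blazegraph installation, and `with_prefixes` and `run_query` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed SPARQL endpoint of a default local Blazegraph; adjust if yours differs.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

# The prefixes listed above, in the same order.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "gnd": "http://d-nb.info/gnd/",
    "schema": "http://schema.org/",
    "tib": "http://av.tib.eu/resource/",
    "itsrdf": "http://www.w3.org/2005/11/its/rdf#",
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "oa": "http://www.w3.org/ns/oa#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def with_prefixes(query: str) -> str:
    """Prepend the PREFIX declarations to a SPARQL query body."""
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
    return header + "\n" + query.strip() + "\n"


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """POST the query and return the parsed SPARQL JSON results."""
    body = urlencode({"query": with_prefixes(query)}).encode("ascii")
    request = Request(
        endpoint,
        data=body,
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    print(with_prefixes("SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
```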
Example 1: Show video with ID 16453 and all of its triples
SELECT *
WHERE {
tib:video\/16453 ?p ?o .
}
Example 2: Show all videos of publisher 'IWF (Göttingen)'
SELECT DISTINCT ?movie
WHERE {
?movie rdf:type schema:Movie .
?movie schema:publisher <http://av.tib.eu/resource/IWF_%28G%C3%B6ttingen%29> .
}
Example 3: Show all videos having the term ‘big data’ in their title
SELECT DISTINCT ?movie ?name
WHERE {
?movie rdf:type schema:Movie .
?movie schema:name ?name .
FILTER REGEX(STR(?name), 'big data', 'i') .
}
Example 4: How many videos are annotated with a visual concept?
SELECT (COUNT(DISTINCT ?video) AS ?count)
WHERE {
?annotation oa:annotatedBy tib:annotator\/VCD-1.0.0 .
?annotation oa:hasTarget ?videoFragment .
?annotation oa:hasBody ?concept .
?videoFragment dcterms:isPartOf ?video .
}
Example 5: Show videos which have GND entity ‘http://d-nb.info/gnd/4298379-4’ annotated
SELECT ?video
WHERE {
?phrase itsrdf:taIdentRef gnd:4298379-4 .
?phrase nif:referenceContext ?context .
?annotation oa:hasBody ?context .
?annotation oa:hasTarget ?videofragment .
?videofragment dcterms:isPartOf ?video .
}
Example 6: How many videos have OCR analysis results?
SELECT (COUNT(DISTINCT ?video) AS ?count)
WHERE {
?annotation oa:annotatedBy tib:annotator\/OCR-1.0.0 .
?annotation oa:hasTarget ?videofragment .
?videofragment dcterms:isPartOf ?video .
}
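If a counting query like this is run from a script and the endpoint returns the W3C SPARQL 1.1 Query Results JSON Format, the count can be read from the first binding. The Python sketch below shows this on a made-up response; the value 42 is a placeholder, not an actual figure from the dumps, and `first_value` is a hypothetical helper name.

```python
import json


def first_value(results: dict, variable: str) -> str:
    """Return the value bound to `variable` in the first result row."""
    bindings = results["results"]["bindings"]
    if not bindings or variable not in bindings[0]:
        raise KeyError(f"no binding for ?{variable}")
    return bindings[0][variable]["value"]


# Made-up response in SPARQL JSON results format; the number is a placeholder.
SAMPLE = """
{
  "head": {"vars": ["count"]},
  "results": {"bindings": [
    {"count": {"type": "literal",
               "datatype": "http://www.w3.org/2001/XMLSchema#integer",
               "value": "42"}}
  ]}
}
"""

count = int(first_value(json.loads(SAMPLE), "count"))

if __name__ == "__main__":
    print(count)
```

Values arrive as strings regardless of their datatype, hence the explicit `int(...)` conversion.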