RDF export of TIB AV-Portal metadata

The German National Library of Science and Technology (TIB) aims to promote the use and distribution of its collections. In this context, TIB publishes the authoritative metadata as well as the time-based, automatically generated metadata of videos in the TIB AV-Portal as Linked Open Data. Only the metadata and thumbnails of videos whose metadata and thumbnails may be used under the Creative Commons License CC0 1.0 Universal are made available. Please note that some of the data was generated automatically and may therefore contain errors or be incomplete.

Datasets

Total

Filename | Format | Size | Date created | Version
tib-av-portal-export-1.2.3.rdf.zip | rdf/xml | 247M (unzipped 4441M) | 16.11.2018 | 1.2.3
tib-av-portal-export-1.2.3.nt.zip | text/n-triples | 243M (unzipped 3369M) | 16.11.2018 | 1.2.3
tib-av-portal-export-1.2.3.ttl.zip | text/turtle | 237M (unzipped 2267M) | 16.11.2018 | 1.2.3
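Since the unzipped dumps are several gigabytes in size, it can be convenient to read the N-Triples variant straight from the ZIP archive without extracting it. The following Python sketch (standard library only) streams the file and counts triples per predicate. It assumes the archive contains exactly one member ending in `.nt`; the member name is not documented on this page.

```python
import io
import zipfile
from collections import Counter


def count_predicates(zip_path: str) -> Counter:
    """Count triples per predicate in the N-Triples file inside a dump ZIP archive.

    Assumption: the archive holds exactly one member whose name ends in ".nt".
    """
    counts: Counter = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(".nt")]
        if len(members) != 1:
            raise ValueError(f"expected exactly one .nt member, found {members}")
        # Decode while streaming instead of extracting the multi-GB file first.
        with io.TextIOWrapper(archive.open(members[0]), encoding="utf-8") as text:
            for line in text:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                # In N-Triples, subject and predicate never contain whitespace.
                _subject, predicate, _rest = line.split(None, 2)
                counts[predicate] += 1
    return counts
```

For example, `count_predicates("tib-av-portal-export-1.2.3.nt.zip").most_common(10)` returns the ten most frequent predicates. Splitting on whitespace is enough here because subjects and predicates never contain spaces; a full RDF parser is only needed once the objects themselves are of interest.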

TIB Subjects: Engineering as well as Architecture, Chemistry, Information Technology, Mathematics and Physics

These dumps are a subset of the total stock. They only contain the videos of the TIB subjects engineering as well as architecture, chemistry, information technology, mathematics and physics.

Filename | Format | Size | Date created | Version
tib-av-portal-export-tib-subjects-1.2.3.rdf.zip | rdf/xml | 239M (unzipped 4333M) | 16.11.2018 | 1.2.3
tib-av-portal-export-tib-subjects-1.2.3.nt.zip | text/n-triples | 235M (unzipped 3289M) | 16.11.2018 | 1.2.3
tib-av-portal-export-tib-subjects-1.2.3.ttl.zip | text/turtle | 228M (unzipped 2205M) | 16.11.2018 | 1.2.3

Dumps of publisher IWF Wissen und Medien gGmbH i.L.

These dumps are a subset of the total stock. They only contain the videos of the publisher IWF Wissen und Medien gGmbH i.L.

Filename | Format | Size | Date created | Version
tib-av-portal-export-iwf-1.2.3.rdf.zip | rdf/xml | 11M (unzipped 167M) | 16.11.2018 | 1.2.3
tib-av-portal-export-iwf-1.2.3.nt.zip | text/n-triples | 11M (unzipped 125M) | 16.11.2018 | 1.2.3
tib-av-portal-export-iwf-1.2.3.ttl.zip | text/turtle | 11M (unzipped 90M) | 16.11.2018 | 1.2.3

Additional Data and Mappings

Mapping of TIB AV-Portal Subjects to DBpedia and GND

Filename | Format | Size | Date created | Version
tib-av-portal-subjects-1.0.0.ttl | application/turtle | 11kB | 18.03.2016 | 1.0.0

Mapping of TIB AV-Portal VCD Classes to DBpedia, Wikidata, and GND

Filename | Format | Size | Date created | Version
tib-av-portal-classes_vcd-1.0.1.ttl | application/turtle | 11kB | 26.06.2018 | 1.0.1
tib-av-portal-classes_vcd-1.0.1.n3 | application/turtle | 48kB | 26.06.2018 | 1.0.1

License

For the use of the metadata and provided thumbnails, the conditions of the Creative Commons License CC0 1.0 Universal (CC0 1.0) Public Domain Dedication shall apply.
A summary of the license, with a link to its legally binding text, is available at https://creativecommons.org/publicdomain/zero/1.0/.

Acknowledgement

When using TIB data, please link to the page https://av.tib.eu/opendata in order to promote the use and distribution of this data.

Documentation of the Data Dumps

This documentation gives a brief overview of the structure of the dump data and shows how it can be imported into an RDF store and queried with SPARQL.

Structure of the data

This section will introduce the structure of the TIB AV-Portal RDF data.

The following table shows the RDF prefixes used in the dumps.

Prefix | Namespace | Vocabulary
bibframe | http://bibframe.org/vocab/ | Bibframe Vocabulary
dbp | http://dbpedia.org/resource/ | DBpedia Resources
dcterms | http://purl.org/dc/terms/ | DCMI Metadata Terms
dctypes | http://purl.org/dc/dcmitype/ | DCMI Type Vocabulary
foaf | http://xmlns.com/foaf/0.1/ | Friend of a Friend Vocabulary
gnd | http://d-nb.info/gnd/ | Integrated Authority File (GND)
schema | http://schema.org/ | Schema.org Vocabulary
tib | http://av.tib.eu/resource/ | TIB AV-Portal Resources
cnt | http://www.w3.org/2011/content# | Representing Content in RDF
itsrdf | http://www.w3.org/2005/11/its/rdf# | Internationalization Tag Set (ITS)
nif | http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# | NLP Interchange Format
oa | http://www.w3.org/ns/oa# | Open Annotation Data Model
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | Resource Description Framework

Note: In Turtle syntax, a slash ('/') is not allowed in the local part of a prefixed name unless it is escaped with a backslash ('\'), e.g. tib:video\/16453.
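The escaping rule can be made concrete with a small helper that shortens full IRIs to prefixed names and expands them again. This is a sketch, not part of the dumps: it uses a subset of the prefix table above, and the escaped character set follows the PN_LOCAL_ESC production of the Turtle 1.1 grammar. The same grammar also requires escaping characters such as '?', '&', '=' and ',', which occur in the media fragment IRIs of Examples 3 to 5; a strict Turtle parser would expect these either escaped or written as full IRIs in angle brackets.

```python
# Subset of the prefix table above.
PREFIXES = {
    "tib": "http://av.tib.eu/resource/",
    "gnd": "http://d-nb.info/gnd/",
    "schema": "http://schema.org/",
}

# Characters that Turtle 1.1 accepts in a local name only when backslash-escaped.
# ('_', '-', '.' and '%' are also escapable but are left as-is here, since they are
# usually allowed verbatim; a leading '-' or a trailing '.' is not handled.)
TURTLE_ESCAPES = set("~!$&'()*+,;=/?#@")


def escape_local(local: str) -> str:
    """Backslash-escape characters that may not appear verbatim in a local name."""
    return "".join("\\" + ch if ch in TURTLE_ESCAPES else ch for ch in local)


def to_prefixed(iri: str) -> str:
    """Shorten a full IRI to a prefixed name, or keep it in angle brackets."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{escape_local(iri[len(namespace):])}"
    return f"<{iri}>"


def expand(prefixed: str) -> str:
    """Expand a prefixed name that may contain backslash escapes to a full IRI."""
    prefix, local = prefixed.split(":", 1)  # first ':' separates prefix and local part
    return PREFIXES[prefix] + local.replace("\\", "")
```

With these helpers, `to_prefixed("http://av.tib.eu/resource/video/16453")` produces `tib:video\/16453`, and `expand` reverses it.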

Example 1: Video Standard Metadata (datatype properties / literals):
tib:video\/16453 schema:name           "Wall-crossing and geometry at infinity of Betti moduli spaces"@en ;
schema:description    "Linear algebraic differential equation (in one variable) depending on a small ..."@en ;
schema:keywords       "Betti moduli"@en , "chaos theory"@en , "singularity"@en ;
schema:dateCreated    "1973-01-01T00:00:00+01:00"^^<http://www.w3.org/2001/XMLSchema#gYear> ;
schema:duration       "1:16:48" .
Example 2: Video Standard Metadata (object properties)
tib:video\/16453 rdf:type              schema:Movie ;
schema:url            <https://av.tib.eu/media/16453> ;
schema:producer       gnd:4028361-6 ;
schema:publisher      tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29 ;
schema:license        <http://creativecommons.org/licenses/by/3.0/deed.en> ;
schema:availability   schema:OnlineOnly ;
bibframe:doi          <http://dx.doi.org/10.5446/16453> ;
schema:thumbnailUrl   <https://av.tib.eu/images/avpimg1fdaede78b338bba137140fd805cd382> .

tib:Institut_des_Hautes__tudes_Scientifiques_%28IH_S%29  foaf:name  "Institut des Hautes Études Scientifiques (IHÉS)" .

Note: Wherever possible, we have mapped publishers, producers, creators, etc. to existing knowledge bases or authority files (e.g. GND). In some cases, a mapping has not been made yet or is not possible at all. In those cases, the resource is represented by an IRI with the 'tib:' prefix and its associated information, e.g. foaf:name. In future versions of the dumps, these IRIs may be replaced by the corresponding knowledge base or authority file resources, where possible.
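To see which agents are still represented by TIB-internal IRIs, one can filter the N-Triples dump for agent properties whose object lies in the tib: namespace. The sketch below works on any iterable of N-Triples lines, for example an open file. The property set is an assumption based on the note above: schema:publisher and schema:producer appear in Example 2, while the use of schema:creator in the dumps is assumed.

```python
import re
from typing import Iterable, Set

TIB_NS = "http://av.tib.eu/resource/"

# Agent properties to inspect (schema:creator is an assumption, see above).
AGENT_PROPERTIES = {
    "http://schema.org/publisher",
    "http://schema.org/producer",
    "http://schema.org/creator",
}

# An N-Triples statement whose subject, predicate and object are all IRIs.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def unmapped_agents(lines: Iterable[str]) -> Set[str]:
    """Collect agent IRIs that are still TIB-internal instead of e.g. GND resources."""
    found: Set[str] = set()
    for line in lines:
        match = IRI_TRIPLE.match(line.strip())
        if match is None:
            continue  # literal objects, blank nodes, comments, blank lines
        _subject, predicate, obj = match.groups()
        if predicate in AGENT_PROPERTIES and obj.startswith(TIB_NS):
            found.add(obj)
    return found
```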

Example 3: OCR result

Image: Example 4

tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 dcterms:isPartOf tib:video\/16453 .

tib:ocr\/16453_42436_42436_x368y316h15w292   oa:hasTarget    tib:video\/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15 ;
oa:hasBody      tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 ;
oa:annotatedBy  tib:annotator\/OCR-1.0.0 ;
rdf:type        oa:Annotation .

tib:ocr\/16453_42436_42436_x368y316h15w292?char=0,7 rdf:type nif:Context ;
rdf:type nif:RFC5147String ;
nif:isString "optimal" .
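The video fragment IRIs in Examples 3 to 5 encode a SMPTE time code at 25 frames per second (`t=smpte-25:h:mm:ss:ff`, optionally as a `start,end` range) and, for OCR results, a pixel box (`xywh=x,y,width,height`), following the W3C Media Fragments syntax. The helper below is a hypothetical sketch that turns such an IRI into seconds and box coordinates; it ignores SMPTE sub-frames and supports only the smpte-25 scheme used in the examples.

```python
from urllib.parse import parse_qs, urlsplit

FPS = 25  # "smpte-25" denotes SMPTE time codes at 25 frames per second


def smpte25_to_seconds(timecode: str) -> float:
    """Convert 'h:mm:ss:ff' (the frame part is optional) to seconds."""
    parts = [int(p) for p in timecode.split(":")]
    hours, minutes, seconds = parts[:3]
    frames = parts[3] if len(parts) > 3 else 0
    return hours * 3600 + minutes * 60 + seconds + frames / FPS


def parse_fragment(iri: str) -> dict:
    """Extract the time range and the optional xywh box from a fragment IRI."""
    query = parse_qs(urlsplit(iri).query)
    result = {}
    if "t" in query:
        scheme, _, times = query["t"][0].partition(":")
        if scheme != "smpte-25":
            raise ValueError(f"unsupported time scheme: {scheme}")
        start, _, end = times.partition(",")
        result["start"] = smpte25_to_seconds(start)
        result["end"] = smpte25_to_seconds(end) if end else None
    if "xywh" in query:
        x, y, w, h = (int(v) for v in query["xywh"][0].split(","))
        result["box"] = {"x": x, "y": y, "w": w, "h": h}
    return result
```

For the OCR fragment above (full IRI `http://av.tib.eu/resource/video/16453?t=smpte-25:0:28:17:11&xywh=368,316,292,15`), this yields a start of 1697.44 s (28 min 17 s plus 11 of 25 frames) and the box x=368, y=316, w=292, h=15.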
Example 4: VCD result

Image: Example 5

tib:video\/16453?t=smpte-25:0:01:02:07 dcterms:isPartOf tib:video\/16453 .

tib:vcd\/16453_1347007_1557  oa:hasTarget   tib:video\/16453?t=smpte-25:0:01:02:07 ;
oa:hasBody     tib:visualconcepts\/Lecture ;
oa:annotatedBy tib:annotator\/VCD-1.0.0 ;
oa:motivatedBy oa:tagging ;
rdf:type       oa:Annotation .

tib:visualconcepts\/Lecture  rdf:type oa:SemanticTag .
Example 5: Named Entity Linking of OCR/ASR

Image: Example 6

tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 dcterms:isPartOf tib:video\/16453 .

tib:asr\/16453_13753838_7522 oa:hasTarget   tib:video\/16453?t=smpte-25:0:05:00:22,0:05:03:00 ;
oa:annotatedBy tib:annotator\/ASR-1.0.0 ;
rdf:type       oa:Annotation ;
oa:hasBody     tib:asr\/16453_13753838_7522?char=0,5617 .

tib:asr\/16453_13753838_7522?char=0,5617 rdf:type nif:Context ;
rdf:type nif:RFC5147String .

tib:asr\/16453_13753838_7522?char=4743,4747 nif:referenceContext tib:asr\/16453_13753838_7522?char=0,5617 ;
itsrdf:taIdentRef gnd:4038613-2 ;
itsrdf:taAnnotatorsRef tib:annotator\/NEL-1.0.0 ;
rdf:type nif:Phrase ;
rdf:type nif:String ;
nif:beginIndex "4743" ;
nif:endIndex "4747" ;
nif:anchorOf "sets" .
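The NIF offsets follow the RFC 5147 `char=begin,end` scheme: the phrase `?char=4743,4747` covers the characters from position 4743 up to, but not including, position 4747 of its reference context, which is why nif:anchorOf is the four-letter string "sets". A minimal sketch of that lookup, assuming the context's nif:isString value is at hand (Example 5 does not show it) and that offsets count Unicode code points:

```python
import re
from typing import Tuple

# Matches an RFC 5147 character range at the end of a NIF IRI.
CHAR_RANGE = re.compile(r"[?&]char=(\d+),(\d+)$")


def char_range(iri: str) -> Tuple[int, int]:
    """Return (begin, end) from a NIF IRI ending in 'char=begin,end'."""
    match = CHAR_RANGE.search(iri)
    if match is None:
        raise ValueError(f"no char range in {iri!r}")
    return int(match.group(1)), int(match.group(2))


def anchor_of(phrase_iri: str, context_text: str, context_iri: str) -> str:
    """Cut a phrase out of its context string (offsets relative to the context start)."""
    begin, end = char_range(phrase_iri)
    context_begin, _ = char_range(context_iri)
    return context_text[begin - context_begin:end - context_begin]
```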

How to import the dumps to a triple store

The following table lists some popular RDF stores that can be used out of the box to import and work with the provided RDF dumps.

Virtuoso Open Source | http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/
Sesame (now Eclipse RDF4J) | http://rdf4j.org/
Apache Jena TDB | https://jena.apache.org/documentation/tdb/
Blazegraph | https://www.blazegraph.com/

For a quick start, you can use Blazegraph as follows:

Download the Blazegraph JAR file from https://www.blazegraph.com/download/ and follow the instructions there to start Blazegraph.

Once Blazegraph is running, you should be able to access it in your web browser at:
http://localhost:9999/blazegraph/

Now, download and unzip the TIB AV-Portal dump file from the tables above.
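Blazegraph's UPDATE form expects the absolute path of the extracted file. If you prefer to script this step, the following Python sketch (standard library only) unpacks an archive and returns the absolute paths of the files it contains; the target directory is up to you.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dump(zip_path: str, target_dir: str) -> List[Path]:
    """Unpack a downloaded dump archive and return the absolute paths of its files."""
    target = Path(target_dir).resolve()  # absolute path, as Blazegraph expects
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        names = [name for name in archive.namelist() if not name.endswith("/")]
    return [target / name for name in names]
```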

To import the TIB AV-Portal dump file into Blazegraph (cf. the screenshot below):

  • switch to the “UPDATE” tab in Blazegraph
  • enter the absolute path of the locally downloaded and extracted dump file into the text input field
  • select Type: “File Path or URL” from the dropdown menu
  • press the “Update” button below

The update should start now, indicated by “Running updates ...”. It will likely take some time (about 10 to 30 minutes, depending on your computer) to finish, indicated by a message such as “Modified: 10099269 Milliseconds: 1441798”.
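As an alternative to the web form, the same import can presumably be triggered with a SPARQL 1.1 Update `LOAD` request sent to Blazegraph's SPARQL endpoint. The sketch below only builds the HTTP request. The endpoint URL `http://localhost:9999/blazegraph/sparql` (default namespace of a local quick-start instance) is an assumption, and the Blazegraph process must be able to read the file path it receives.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: SPARQL endpoint of a locally started Blazegraph instance.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def load_request(dump_file: str, endpoint: str = ENDPOINT) -> Request:
    """Build a form-encoded SPARQL 1.1 Update request that LOADs a local dump file."""
    file_iri = Path(dump_file).absolute().as_uri()  # e.g. file:///data/export.ttl
    body = urlencode({"update": f"LOAD <{file_iri}>"}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` blocks until the load has finished, which can take as long as the form-based import, so a short timeout should be avoided.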

Blazegraph Screenshot

How to query the data with SPARQL

In Blazegraph, switch to the “QUERY” tab and enter the following example queries.

Use the following prefixes with every query:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX gnd: <http://d-nb.info/gnd/>
PREFIX schema: <http://schema.org/>
PREFIX tib: <http://av.tib.eu/resource/>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Example 1: Show video with ID 16453 and all of its triples
SELECT *
WHERE {
  tib:video\/16453 ?p ?o .
}
Example 2: Show all videos of publisher 'IWF (Göttingen)'
SELECT DISTINCT ?movie
WHERE {
  ?movie rdf:type schema:Movie .
  ?movie schema:publisher <http://av.tib.eu/resource/IWF_%28G%C3%B6ttingen%29> .
}
Example 3: Show all videos having the term ‘big data’ in their title
SELECT DISTINCT ?movie ?name
WHERE {
  ?movie rdf:type schema:Movie .
  ?movie schema:name ?name .
  FILTER REGEX(STR(?name), 'big data', 'i') .
}
Example 4: How many videos are annotated with a visual concept?
SELECT (COUNT(DISTINCT ?video) AS ?count)
WHERE {
  ?annotation oa:annotatedBy tib:annotator\/VCD-1.0.0 .
  ?annotation oa:hasTarget ?videoFragment .
  ?annotation oa:hasBody ?concept .
  ?videoFragment dcterms:isPartOf ?video .
}
Example 5: Show videos which have GND entity ‘http://d-nb.info/gnd/4298379-4’ annotated
SELECT DISTINCT ?video
WHERE {
  ?phrase itsrdf:taIdentRef gnd:4298379-4 .
  ?phrase nif:referenceContext ?context .
  ?annotation oa:hasBody ?context .
  ?annotation oa:hasTarget ?videofragment .
  ?videofragment dcterms:isPartOf ?video .
}
Example 6: How many videos have OCR analysis results?
SELECT (COUNT(DISTINCT ?video) AS ?count)
WHERE {
  ?annotation oa:annotatedBy tib:annotator\/OCR-1.0.0 .
  ?annotation oa:hasTarget ?videofragment .
  ?videofragment dcterms:isPartOf ?video .
}
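The example queries can also be run from a script. The following Python sketch sends a query, together with the prefixes listed above, to a SPARQL endpoint using the SPARQL 1.1 Protocol (form-encoded POST) and flattens the JSON results. The endpoint URL is again an assumption for a local Blazegraph instance, and `query_request` and `bindings` are hypothetical helper names.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: SPARQL endpoint of a locally started Blazegraph instance.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

# The prefixes listed above, prepended to every query.
SPARQL_PREFIXES = """\
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX gnd: <http://d-nb.info/gnd/>
PREFIX schema: <http://schema.org/>
PREFIX tib: <http://av.tib.eu/resource/>
PREFIX itsrdf: <http://www.w3.org/2005/11/its/rdf#>
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX oa: <http://www.w3.org/ns/oa#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
"""


def query_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a form-encoded SPARQL 1.1 Protocol POST that asks for JSON results."""
    body = urlencode({"query": SPARQL_PREFIXES + query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def bindings(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

With `q` set to the body of Example 6, `bindings(urllib.request.urlopen(query_request(q)).read().decode("utf-8"))` should return a single row with a `count` value.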

URI-Dereferencing

The TIB AV-Portal supports the dereferencing of the AV-Portal internal URIs in the RDF export.

Two methods are offered: (1) HTTP accept header and (2) file extensions.

1 Dereferencing via HTTP accept header

The following content types can be directly requested via HTTP accept header:

application/ld+json -> json-ld
application/n-triples -> nt
application/rdf+json -> rdf-json
application/rdf+xml -> xml
application/turtle -> ttl
application/x-turtle -> ttl
text/n3 -> ttl
text/plain -> nt
text/rdf+n3 -> ttl
text/turtle -> ttl

For example, the RDF data for the resource 'http://av.tib.eu/resource/video/12284' can be retrieved with 'curl' via:

curl -k -L -H "Accept: application/rdf+xml" 'http://av.tib.eu/resource/video/12284'

The option '-k' disables SSL certificate verification.

The option '-L' makes curl follow redirects (e.g. from HTTP to HTTPS).
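The same request can be made from Python with the standard library. In this sketch, `urlopen` follows redirects much like `curl -L`, but, unlike `curl -k`, it keeps SSL certificate verification enabled. The short format names in `ACCEPT` are our own labels for a subset of the content types listed above.

```python
from urllib.request import Request, urlopen

# Subset of the Accept header values listed above; the keys are our own labels.
ACCEPT = {
    "jsonld": "application/ld+json",
    "ntriples": "application/n-triples",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def rdf_request(uri: str, fmt: str = "turtle") -> Request:
    """Build a GET request asking for an RDF serialization of an AV-Portal URI."""
    return Request(uri, headers={"Accept": ACCEPT[fmt]})


def fetch_rdf(uri: str, fmt: str = "turtle", timeout: float = 30.0) -> str:
    """Dereference the URI; urlopen follows redirects (e.g. HTTP -> HTTPS)."""
    with urlopen(rdf_request(uri, fmt), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `fetch_rdf("http://av.tib.eu/resource/video/12284", "rdfxml")` corresponds to the curl call above.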

2 Dereferencing via file extension

The following file extensions can also be used to retrieve RDF data. In this case, content negotiation is performed via an HTTP 302 redirect.

.json -> json-ld
.n3 -> nt
.nt -> nt
.rdf -> xml
.ttl -> ttl
.xml -> xml
.ntriples -> nt

For example:

curl -k -L 'http://av.tib.eu/resource/video/12284.n3'

Dereferencing via file extension does not work for URIs containing the '?' character; for these, only the first method (HTTP Accept header) can be used.
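The extension method and its restriction can be captured in a small helper, again a hypothetical sketch rather than part of the AV-Portal: it appends one of the listed extensions and rejects URIs containing '?', which have to be dereferenced via the Accept header instead.

```python
# File extensions listed above and the serialization each one returns.
EXTENSIONS = {
    ".json": "json-ld",
    ".n3": "nt",
    ".nt": "nt",
    ".rdf": "xml",
    ".ttl": "ttl",
    ".xml": "xml",
    ".ntriples": "nt",
}


def extension_url(uri: str, extension: str) -> str:
    """Append a serialization extension to an AV-Portal resource URI."""
    if extension not in EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    if "?" in uri:
        # Fragment URIs (e.g. ...?t=smpte-25:...) need the Accept header method.
        raise ValueError("URIs containing '?' must be dereferenced via the Accept header")
    return uri + extension
```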


AV-Portal 3.5.0 (cb7a58240982536f976b3fae0db2d7d34ae7e46b)
