We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Searching through large graphs using Elasticsearch

Formal Metadata

Title
Searching through large graphs using Elasticsearch
Title of Series
Number of Parts
56
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The National Audiovisual Institute (INA) is a repository of all French audiovisual archives, being responsible for archiving over 180 radio and television services, 24/7, since 1995. The generated metadata describing this content currently represents the equivalent of over 50 million documents (e.g.: images, audio and video fragments, text excerpts, etc.). Due to the heterogeneity of the content, the data model is directly inspired from the conceptual models of cultural heritage, represented by a large graph with complex relations between generic entities. The challenge for building a global search engine for this particular use case is twofold: on one hand, the capacity to index and maintain the entire set of documents updated in a reasonable amount of time, and on the other hand the implementation of complex full text search capabilities with high performance. Our talk describes the key choices for the graph representation, facilitating the indexing process of the documents, as well as the technical framework set up around Elasticsearch, implementing dedicated search APIs required by different functional areas. We also briefly mention the implementation optimisations that lead to a full process of 50 million documents in less than 48 hours, for an equivalent of 800GB Elasticsearch index.