The rapid growth of traditional and social media, sensors, and other key web technologies has led to an equally rapid increase in the collection of spatio-temporal data. Horizontally scalable architectures offer a technically feasible and affordable way to handle this growth, allowing organizations to scale their hardware incrementally as data volumes increase.

GeoMesa is an open-source, distributed, spatio-temporal database built on the Accumulo column-family store. Leveraging a novel spatio-temporal indexing scheme, GeoMesa enables efficient (E)CQL queries by parallelizing execution across a distributed cloud of compute and storage resources, while adhering to Accumulo's fine-grained security policies. GeoMesa integrates with GeoTools to expose these distributed capabilities through a familiar API. GeoServer plugins also enable integration, via OGC standard services, with a much wider range of technologies and languages, such as Leaflet, Python, uDig, and Quantum GIS.

In this presentation, Anthony Fox will discuss the design of spatio-temporal indexes in distributed "NoSQL" databases, the performance characteristics and tradeoffs of the GeoMesa index, and how it can be leveraged to scale compute-intensive spatial operations across very large data sources. This discussion will detail how GeoMesa distributes data uniformly across the cloud nodes to maximize the parallelization of queries and other computations. Specific computationally intensive analytics include distributed heat map generation over time, nearest neighbor queries, and spatio-temporal event prediction. He will present common analytic workflows against spatial data expressed as batch map-reduce jobs, dynamic ECQL queries, and real-time Storm topologies. Using the Global Database of Events, Language, and Tone (GDELT) dataset as a working example, Mr. Fox will demonstrate how a completely open-source architecture stack, including GeoMesa, enables ad hoc and real-time analytics.

This presentation will be of interest to data scientists, geospatial systems developers, DevOps engineers, and users of massive spatio-temporal datasets.
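
To make the idea of a spatio-temporal index over a sorted key-value store more concrete, the following is a minimal sketch and not GeoMesa's actual key layout. It assumes a hypothetical row key made of three parts: a shard prefix derived from a hash of the feature ID (to spread writes and scans across nodes), a day-resolution time bucket, and a Z-order (Morton) code that interleaves quantized longitude and latitude. The names `NUM_SHARDS`, `quantize`, `interleave`, and `row_key` are illustrative only.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 8  # hypothetical shard count, chosen only for illustration


def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * scale)


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order code.

    Bits of x land in odd positions and bits of y in even positions,
    so points that are close in space tend to share a key prefix.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def row_key(feature_id, lon, lat, when, bits=16):
    """Build an illustrative 'shard~day~zcode' row key (an assumption)."""
    # Hash-based shard prefix spreads rows evenly across tablets/nodes.
    shard = zlib.crc32(feature_id.encode("utf-8")) % NUM_SHARDS
    # Coarse time bucket keeps rows from the same day adjacent.
    day = when.strftime("%Y%m%d")
    z = interleave(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        bits,
    )
    width = bits // 2  # 2 * bits binary digits = bits / 2 hex digits
    return f"{shard:02d}~{day}~{z:0{width}x}"


if __name__ == "__main__":
    ts = datetime(2014, 1, 1, tzinfo=timezone.utc)
    print(row_key("event-1", -77.03, 38.90, ts))
```

Under these assumptions, a query for a bounding box and time range would translate into one scan range per shard and time bucket, which is what lets the scans run in parallel across nodes. The trade-off is that every query must fan out to all shards.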