GeoMesa: Scalable Geospatial Analytics

Video in TIB AV-Portal: GeoMesa: Scalable Geospatial Analytics

Formal Metadata

GeoMesa: Scalable Geospatial Analytics
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year
Production Place
Washington, DC

Content Metadata

Subject Area
The proliferation of smartphones with embedded geolocation sensors has led to an explosion of geospatial data in all domains. Every mobile app now asks users to enable location services and generates copious geotagged data. Existing solutions for managing this data rely on traditional approaches using geospatial RDBMS platforms. GeoMesa is an open source, scalable spatio-temporal index built on top of the Accumulo distributed column family database that provides efficient, OGC standards-based access to and querying of very large datasets. GeoMesa provides WMS and WFS services over HTTP for data access, as well as an API based on GeoTools. Spatial analytics in GeoMesa can leverage Hadoop to perform computations in parallel on a cloud. Sensitive personal information inherent in consumer geolocated data can be protected using Accumulo's cell-level security. This talk will cover the indexing structure in GeoMesa and how it enables scalable geospatial analytics in a cloud platform.
My name is Anthony Fox. I'm from Charlottesville, Virginia, from Commonwealth Computer Research, and we developed GeoMesa, which we open sourced about a year ago and which is now under LocationTech. I'm going to talk for a few minutes today about GeoMesa: what it is and how it came to be. I'll talk a little about distributed databases, which is what GeoMesa is built on, then dive into how we do the indexing that enables geospatial data in a non-relational database, and then move on to a few analytics that show how we leverage GeoMesa, built on top of Accumulo.
So, what is GeoMesa? Many of you have been directed by customers to move to a cloud, and some of you have an actual justification and a need to move to a cloud. In our case it was a little bit of both. We were directed to migrate an analytic that had intense computational requirements to a cloud-based system, and the tools that we had come to rely on, the rich geospatial functionality that is available in PostGIS, were just not available to us on that platform. So we developed all of the geospatial capability that we needed to support our analytic, quickly realized that it was useful on its own, decoupled it from the analytic, and open sourced it. GeoMesa is the result. It is a distributed spatio-temporal database, built in particular on the Accumulo column family database, which I'll go into more detail about momentarily. The goal of GeoMesa is to be a runtime-only dependency: it implements the standard GeoTools API, and it also exposes data in these distributed databases through standardized services like OGC WFS and WMS by way of a GeoServer plugin. The point is that you should never have to import a GeoMesa class into your application. You should work with the GeoTools interfaces directly, and the geospatial computations are transparently executed on the cloud. Here are some visualizations. What you're seeing here is all associated with the GDELT dataset, the Global Database of Events, Language, and Tone [unclear]. It has about 150 million geocoded events since 1979, about 100 gigabytes of uncompressed data. Just to give you some rough numbers, we can ingest that into the system on a small virtualized cloud in approximately 15 minutes, and then run analytics and visualizations against that dataset.
So, some justification for cloud-based geo: why would you need it? Well, you don't have to look very far to see high-velocity spatio-temporal data. Twitter does something like 100 to 200 thousand tweets [unclear]. Only a small percentage of those are geotagged, but a small percentage of a large number is still a large number. Then there are click streams [unclear]: everybody's clicks are geolocated, so marketers and advertisers are very interested in geolocated click streams. I'm going to show you how you can actually correlate geolocated streams with an analytic we developed. Satellite imagery is pretty obvious. Vehicle and traffic sensors and so on generate tons of high-velocity data. So you have a need for a distributed database, because you have more data than can fit on a single machine's disk or be processed within a single machine's memory. So the customer dictates that you use a column family database. Google published a paper in about 2006 or 2007 on a system called Bigtable, and quickly after that a number of derivative databases popped up. Amongst those derivative databases are HBase, which is built directly on the Bigtable model, Cassandra, and Accumulo, which focuses on security: it has fine-grained, cell-level security, which we're starting to leverage in GeoMesa. These distributed databases, in particular these key-value stores, have very flexible schemas, so it's easy to get up and going with them. But they are not schemaless: the application always imposes a schema on the data, because to process it and understand it you have to do something with it. What this means is that the complexity of the query is pushed up into the application layer. You have to make some interesting trade-offs here, because you potentially have a multi-tenant database in which each application dictates how it does scans and how it uses the resources, so you have to balance flexible schemas against potentially conflicting data access patterns. It's horizontally scalable, which is a very nice feature, and I'll talk about how we leverage that. In particular, Accumulo has the notion of tablet servers.
A single table is spread across many tablet servers, and we can actually take advantage of the processing and I/O bandwidth of multiple tablet servers. Accumulo will handle failover, balancing and rebalancing, all the nitty-gritty distributed complexity. But here is another trade-off. In PostGIS you have very nice and sophisticated R-trees and quadtrees, your traditional spatial indexes, which work incredibly well on relatively modest-sized datasets. When you go to one of these column family databases you don't have many indexes. You have to design your tables accordingly, because the only index that is actually available to you is the implicit lexicographic ordering of the keys in your key-value store. So this is a critical element that has to be thought through carefully when pushing geospatial data into these systems, because geospatial data is obviously multi-dimensional, and once you add time it becomes three-dimensional.
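The point about key order being the only index can be illustrated with a small sketch. This is a toy, in-memory stand-in for a sorted key-value table, not Accumulo's API; the class name, the `<date>~<id>` row key layout, and the sample rows are all assumptions made for illustration.

```python
import bisect


class SortedKeyValueStore:
    """Toy stand-in for a sorted key-value table: the only 'index' is key order."""

    def __init__(self):
        self._keys = []    # kept sorted lexicographically
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in lexicographic order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyValueStore()
# Hypothetical row keys of the form "<date>~<id>". A date-range query becomes
# a cheap range scan; a query on any other attribute would need a full scan.
store.put("2014-04-10~b", "event b")
store.put("2014-04-09~a", "event a")
store.put("2014-04-11~c", "event c")

april_10_onward = [key for key, _ in store.scan("2014-04-10", "2014-04-12")]
```

Whatever goes first in the row key decides which queries are range scans, which is why the choice of key layout matters so much for multi-dimensional data.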
So how do I go about leveraging distributed databases for the qualities that are their advantages? Well, you get partitioning very easily. Partitioning almost comes for free, whereas with relational databases you have to do a lot of work and configuration to do partitioning. With distributed databases partitioning is effectively given to you, and you are able to distribute queries across multiple machines. Concurrent queries by different applications or users can be spread so that load is balanced across the cluster. We use striping to distribute computation within a single query: a single query can actually be split up across all of the tablet servers. If you have 100 tablet servers, then you can potentially have 100 processors executing your query. This makes some of the very large datasets operational and interactive. The trade-off you're making is that when you stripe the data across all the resources, you incur a cost when you have concurrent queries from many users, because all the resources are being brought to bear on every single query. We have a flexible element in the indexing structure that allows you to tune the amount of parallelism you get per table. Accumulo has an extension point called server-side iterators, which we leverage in GeoMesa to provide geospatial, CQL and ECQL query semantics at the data rather than as a secondary scan. We also embed custom analytics, and I'll show some examples, inside the server-side iterators, which essentially become an ad hoc, interactive, MapReduce-like computation.
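A minimal sketch of the striping idea, under stated assumptions: each row key is prefixed with a shard number so that writes spread across tablets, and a query issues one range per shard so that every tablet server can work on it in parallel. The shard count, the `~` separator, and all function names are hypothetical and are not GeoMesa's actual key layout.

```python
import zlib

NUM_SHARDS = 4  # hypothetical; more shards means more parallelism per query


def shard_for(feature_id, num_shards=NUM_SHARDS):
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def row_key(feature_id, index_value, num_shards=NUM_SHARDS):
    """Prefix the index value with a shard so rows stripe across tablets."""
    return f"{shard_for(feature_id, num_shards):02d}~{index_value}~{feature_id}"


def scan_ranges(index_start, index_stop, num_shards=NUM_SHARDS):
    """One (start, stop) range per shard: every shard takes part in every query."""
    return [
        (f"{shard:02d}~{index_start}", f"{shard:02d}~{index_stop}")
        for shard in range(num_shards)
    ]
```

The shard count is the tuning knob: a larger value brings more servers to bear on each query, at the cost of every concurrent query touching every shard.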
I'll go through the indexing really quickly at this point. The key to working with the column family or key-value data stores is that they have an implicit lexicographical ordering, so we use space-filling curves, which are a one-dimensional data structure that allows you to project multiple dimensions into a linear space.
I have a nice video here that shows how the space-filling curve fills the space that we're interested in. The z axis is time. The other thing you see there is the striping: each portion of the geographic and time space contributes to each tablet, so the data is striped across all the tablets in a structured way.
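As a rough illustration of projecting three dimensions into a linear space, the sketch below interleaves the bits of quantized longitude, latitude and time into a single integer along a Z-order (Morton) curve. This is an assumption chosen for brevity, not the key format used in GeoMesa; the function names and the 10-bit resolution are hypothetical.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    index = int((value - lo) / (hi - lo) * cells)
    return min(max(index, 0), cells - 1)  # clamp so hi falls in the last cell


def interleave3(x, y, t, bits):
    """Interleave the bits of x, y and t, most significant bit first."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((t >> i) & 1)
    return z


def z3_key(lon, lat, t, t_min, t_max, bits=10):
    """One sortable integer for a (lon, lat, time) point."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    w = quantize(t, t_min, t_max, bits)
    return interleave3(x, y, w, bits)
```

Points that are close in space and time tend to share the leading bits of the key, so they tend to sort near one another, which is what makes range scans over the curve useful.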
For complex polygons and linestrings, we essentially decompose the geometry into multi-resolution geohashes. Geohash is the implementation of space-filling curves that we're using, and we store each decomposed geohash as an element in the index space.
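The decomposition into multi-resolution geohashes can be sketched as follows. To keep the example short it covers an axis-aligned bounding box rather than an arbitrary polygon, which is a simplification; the function names and the refinement rule (keep cells that are fully inside, subdivide boundary cells down to a maximum length) are assumptions, not a description of GeoMesa's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def cell_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # geohash bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # bit 1: keep the upper half
            else:
                rng[1] = mid  # bit 0: keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def cover(box, max_len, prefix=""):
    """Decompose box (min_lon, min_lat, max_lon, max_lat) into geohashes.

    Cells fully inside the box are kept at a coarse resolution; cells that
    straddle the boundary are subdivided until they reach max_len characters.
    """
    result = []
    for ch in BASE32:
        gh = prefix + ch
        c = cell_bounds(gh)
        if c[2] <= box[0] or c[0] >= box[2] or c[3] <= box[1] or c[1] >= box[3]:
            continue  # no overlapping area with the box
        inside = c[0] >= box[0] and c[1] >= box[1] and c[2] <= box[2] and c[3] <= box[3]
        if inside or len(gh) == max_len:
            result.append(gh)
        else:
            result.extend(cover(box, max_len, gh))
    return result
```

For example, `cover((0.0, 0.0, 45.0, 22.5), 2)` refines the one-character cell `s` into its two-character children that fall inside the box, while a box that matches a cell exactly is covered by that single coarse geohash.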
Query planning amounts to computing the key ranges that are candidates for results of your query.
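A minimal sketch of that planning step, assuming the query geometry has already been reduced to a set of covering geohash prefixes and that row keys begin with a fixed-length geohash longer than any prefix. The `~` range terminator, the key layout, and the function names are hypothetical.

```python
import bisect


def plan_ranges(prefixes):
    """Turn covering geohash prefixes into sorted, non-overlapping key ranges."""
    ranges = []
    for prefix in sorted(set(prefixes)):
        # Skip a prefix already covered by a shorter one, e.g. "dr5" under "dr".
        if ranges and prefix.startswith(ranges[-1][0]):
            continue
        # "~" sorts after every character of the geohash alphabet.
        ranges.append((prefix, prefix + "~"))
    return ranges


def execute(sorted_keys, ranges):
    """Scan each candidate range against a sorted list of row keys."""
    hits = []
    for start, stop in ranges:
        lo = bisect.bisect_left(sorted_keys, start)
        hi = bisect.bisect_left(sorted_keys, stop)
        hits.extend(sorted_keys[lo:hi])
    return hits


# Hypothetical row keys: a 5-character geohash, then "~" and a feature id.
keys = sorted(["dr5ru~1", "dr72h~2", "dqcjq~3", "9q8yy~4"])
candidates = execute(keys, plan_ranges(["dr5", "dr", "dq"]))
```

The ranges only select candidates. Rows that fall in a covering cell but outside the exact query geometry still need a finer filter, which is the kind of work the talk assigns to server-side iterators.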
To get to the analytics that we developed: we've implemented them all as WPS services, and they're deployed in GeoServer, so they're discoverable. This one is a spatio-temporal prediction analytic. It predicts events in space and time. One of the interesting things about this analytic is that this is Santiago, Chile, and the events are robberies in Santiago, so it's predicting robberies, shown as the hot areas. Just in the lower area, below the red, is, I believe, a stadium. We recently transitioned from a linear model to a nonlinear model that is able to detect that the threat of robberies is higher around the stadium, and that's all implemented as a WPS service.
Any type of associative computation can be cast in this form, as a native MapReduce-style method implemented within iterators. For instance, density computations can be done very rapidly by computing a sparse density matrix, or any kind of transformation matrix, within each tablet server. You have hundreds of workers that are brought to bear on your computation, and the reduce side applies an associative operation, which in the case of densities is just summation. You can express many different types of computations this way.
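The density computation described above can be sketched in plain Python. Each inner list stands in for the points held by one tablet server; the "map" side builds a sparse grid per tablet and the "reduce" side sums the partial grids. The cell size, the sample coordinates, and the function names are assumptions, and nothing here uses Accumulo's iterator API.

```python
from collections import Counter
from functools import reduce


def partial_density(points, cell_size):
    """'Map' side: one sparse grid per tablet; only occupied cells are stored."""
    grid = Counter()
    for lon, lat in points:
        grid[(int(lon // cell_size), int(lat // cell_size))] += 1
    return grid


def merge(a, b):
    """'Reduce' side: summation is associative, so partial grids can be
    combined in any grouping and still give the same total."""
    return a + b


# Hypothetical points, grouped by the tablet that holds them.
tablets = [
    [(-77.03, 38.90), (-77.04, 38.91)],
    [(-77.03, 38.92), (-70.65, -33.45)],
]
density = reduce(merge, (partial_density(t, 1.0) for t in tablets), Counter())
```

Because the merge is associative, the partial grids could equally be combined pairwise on intermediate nodes before reaching the client.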
Another interesting analytic that we developed as a WPS, implemented within the server-side iterators, is interpolated time-space queries. Basically, I would like to see who you might have interacted with on a trip that you made, so it's time interpolation as well as space interpolation.
We like to think of it as tweeting on the New Jersey Turnpike. You have a person who is tweeting as they travel on the New Jersey Turnpike, and I would like to know who they might have met with, or where they might have stopped and who they interacted with. So you have to do a complex series of queries that step through time, through each geometry, and interpolate a track based on the road network or some other underlying layer, to find the possible interactions.
This is the result. It is executed through WPS, and with gap filling and road-snapped tracks we can get more information, quicker.
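A simplified sketch of an interpolated time-space query follows. It interpolates a track linearly between timestamped fixes in planar coordinates, which is an assumption made for brevity; the talk describes snapping to a road network instead. The function names, the tuple layouts, and the distance threshold are hypothetical, and track timestamps are assumed to be strictly increasing.

```python
import math


def interpolate(track, t):
    """Linearly interpolate (x, y) at time t along a track of (t, x, y) fixes.

    The track must be sorted by strictly increasing time.
    """
    if t <= track[0][0]:
        return track[0][1:]
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
    return track[-1][1:]


def possible_interactions(track, observations, radius):
    """Ids of observations (id, t, x, y) that fall within `radius` of the
    track's interpolated position at the observation's own timestamp."""
    hits = []
    for obs_id, t, x, y in observations:
        if not (track[0][0] <= t <= track[-1][0]):
            continue  # outside the time span of the trip
        px, py = interpolate(track, t)
        if math.hypot(x - px, y - py) <= radius:
            hits.append(obs_id)
    return hits
```

In a distributed setting the same test would be applied to candidate rows selected by the spatio-temporal index, rather than to every observation.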
A note about what's next for GeoMesa. We're pushing for a 1.0 release in June. We're implementing role-based authentication and authorization, which will integrate with the cell-level security in Accumulo. We have added a method of querying according to relational projections, which allows us to subset the data and rapidly return just what we need for each query. In the future we're looking at integrating deeply with GeoServer and the Hadoop ecosystem, so WPSs will be executed across a variety of compute paradigms like Storm, MapReduce and Spark, and caching can be pushed into [unclear] within Accumulo. We're also looking at other data [unclear] and how those might improve query performance. A couple of quick links here: the LocationTech website lists some information about GeoMesa, we have a GeoMesa website that has tutorials and other demonstrations, and there is a users mailing list and a dev mailing list. I'll take any questions.



