GeoMesa: Scalable Geospatial Analytics


Formal Metadata

Title
GeoMesa: Scalable Geospatial Analytics
Title of Series
Number of Parts
14
Author
Fox, Anthony
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI
Publisher
LocationTech, Andrew Ross
Release Date
2014
Language
English
Production Year
2014
Production Place
Washington, DC

Content Metadata

Subject Area
Abstract
The proliferation of smartphones with embedded geolocation sensors has led to an explosion of geospatial data in all domains. Every mobile app now asks users to enable location services and generates copious geotagged data. Existing solutions for managing this data rely on traditional geospatial RDBMS platforms. GeoMesa is an open source, scalable spatio-temporal index built on top of the Accumulo distributed column family database that provides efficient, OGC standards-based access to and querying of very large datasets. GeoMesa provides WMS and WFS services over HTTP for data access, as well as an API based on GeoTools. Spatial analytics in GeoMesa can leverage Hadoop to perform computations in parallel on a cloud. Sensitive personal information inherent in consumer geolocated data can be protected using Accumulo's cell-level security. This talk will cover the indexing structure in GeoMesa and how it enables scalable geospatial analytics in a cloud platform.
My name is Anthony Fox. I'm from Charlottesville, Virginia, from Commonwealth Computer Research, and we developed GeoMesa, which we open sourced about a year ago and then connected with the LocationTech team; GeoMesa is now under LocationTech. I'm going to talk today about GeoMesa: what it is and how it came to be. I'll talk a little about distributed databases, which is what GeoMesa is, then dive into how we do the indexing to enable geospatial data in a non-relational database, and then move on to a few analytics that show how we leverage what is built on top of it.
So what is GeoMesa? Many of you have been dictated by customers to move to a cloud, or some of you have an actual justification and need to move to a cloud. In our case it was a little bit of both. We were directed to migrate an analytic that had intense computational requirements to a cloud-based system, and the tools that we had come to rely on, the rich geospatial functionality that is available in PostGIS, were just not available to us on the backend. So we developed all of the geospatial capability that we needed to support our analytic, quickly realized that it was useful on its own, decoupled it from the analytic, and open sourced it. GeoMesa is the result. It is a distributed spatio-temporal database, in particular built on the Accumulo column family database, which I'll go into in more detail momentarily. The goal of GeoMesa is to be a runtime-only dependency, in that it implements the standard GeoTools interfaces and also exposes data in these distributed databases through standardized OGC services such as WFS and WMS. The point is that you should never have to import a GeoMesa class into your application; you should import the relevant GeoTools interfaces and work with those directly, and the geospatial computations are transparently executed on the backend. The visualizations you're seeing here are all associated with the GDELT dataset, the Global Database of Events, Language, and Tone [unclear]. It's 150 million geocoded events since 1979, about 100 gigs of uncompressed data. Just to give you some performance numbers, we can ingest that into the system on a small virtualized cloud in approximately 15 minutes, and then do analytics and visualizations against that dataset.
So, some justification for a cloud-based GIS: why would you need this? Well, you don't have to look very far to see high-velocity spatio-temporal data. Twitter does 100 to 200 thousand tweets per second; a small percentage of those are geotagged, but a small percentage of a large number is still a large number. Clickstreams do 1 million [unclear], and everybody's clickstream is geolocated, so marketers and advertisers are very interested in geolocated clickstreams. I'll show you how you can actually correlate geolocated streams in an analytic we developed. Satellite imagery is pretty obvious. Vehicle and traffic sensors [unclear], et cetera, generate tons of high-velocity data. So you have a need for a distributed database because you have more data than can fit on a single machine's disk or can be processed within a single machine's memory. Or the customer dictates that you use a column family database. Google published a paper in about '06 or '07 on a system called Bigtable, and quickly after that a number of derivative databases popped up. Among those derivative databases are HBase, which is built directly on the Bigtable model, Cassandra, and Accumulo. Accumulo focuses on security: it has high-resolution, cell-level security, which we're starting to [unclear] in GeoMesa. These distributed databases, in particular these key-value stores, have very flexible schemas, so it's easy to get up and going with them. But they are not schemaless: your application always imposes a schema on the data [unclear]. What this means is that the complexity of the query is pushed up into the application layer. You have to make this interesting trade-off where you potentially have a multi-tenancy database and each application is dictating how it does scans and how it uses the resources, so you have to balance between flexible schemas and potentially conflicting data access patterns. It's horizontally scalable, which is a very nice feature, and I'll talk about how we leverage that.
In particular, Accumulo has the notion of tablet servers. A single table is spread across many tablet servers, and we can take advantage of the processing and I/O bandwidth of multiple tablet servers. Accumulo handles failover, balancing and rebalancing, all the nitty-gritty distributed complexity. But here is another trade-off: in PostGIS you have very nice and sophisticated R-trees and quadtrees, your traditional spatial indexes, which work incredibly well on relatively moderate-size datasets. When you go to one of these column family databases, you don't have any indexes. You have to design your tables accordingly, and the only index that is actually available to you is the implicit lexicographic order of the keys in your key-value store. So this is a critical element that has to be thought through carefully when pushing geospatial data, because geospatial data is obviously multi-dimensional; when you add time, you're up to three dimensions.
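
To make the point about lexicographic order concrete, here is a minimal Python sketch. It is not Accumulo's API: `SortedKeyValueStore`, the `geohash~id` key layout, and the sample keys are assumptions made purely for illustration. The only query primitive the toy store offers is a range scan over sorted keys, which is why the key design has to carry the spatial index.

```python
import bisect


class SortedKeyValueStore:
    """Toy stand-in for a sorted key-value table (hypothetical, not Accumulo)."""

    def __init__(self):
        self._keys = []    # keys kept in lexicographic order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKeyValueStore()
# Assumed key layout: a spatial prefix, a separator, then a record id.
for key in ["dqb0~event3", "9q8y~event1", "dqcj~event2", "dqcm~event4"]:
    store.put(key, key.split("~")[1])

# Every key that starts with "dq" sorts between "dq" and "dr",
# so one contiguous range scan finds all of them.
dq_rows = list(store.scan("dq", "dr"))
```

Because keys that share a prefix are adjacent in sort order, a prefix that encodes location turns a spatial lookup into a single contiguous scan.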
So how do you go about leveraging distributed databases for the qualities they have that are advantages? Well, you can do partitioning very easily. Partitioning almost comes for free, whereas with relational databases you have to do a lot of work and administration to do partitioning. With distributed databases, partitioning is effectively given to you, in that you are able to distribute queries across multiple machines. These are concurrent queries by different applications or users, and the load can be spread across the cluster. We use striping to distribute computation within a single query. We can actually split a single query up among all of the tablet servers: if you have 100 tablet servers, then you can potentially have 100 processors executing your query. This makes some very large datasets operational and interactive. The trade-off you're making is that when you stripe the data across all the resources, you incur a cost when you have concurrent queries from many users, because all the resources are being brought to bear on every single query. We have a flexible element in the indexing structure that allows you to tune the amount of parallelism you gain per table. Accumulo has an extension point called server-side iterators, which we leverage in GeoMesa to provide geospatial CQL and ECQL query semantics at the data rather than as a secondary scan. We can also embed custom analytics, and I'll show some examples of that, inside the server-side iterators, which essentially becomes ad hoc, interactive, MapReduce-like computation.
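
The striping idea can be sketched as follows. This is a toy model under stated assumptions, not GeoMesa's actual row-key format: the shard prefix, `NUM_SHARDS`, the `shard~index~id` layout, and all function names are hypothetical, and a thread pool stands in for tablet servers scanning their own partitions in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumed tunable: more shards, more parallelism per query


def shard_of(feature_id):
    """Deterministically assign a record to a shard (CRC32 is an assumption)."""
    return zlib.crc32(feature_id.encode("utf-8")) % NUM_SHARDS


def row_key(feature_id, index_key):
    """Hypothetical key layout: shard prefix, spatial index key, record id."""
    return f"{shard_of(feature_id):02d}~{index_key}~{feature_id}"


def scan_shard(rows, shard, start, end):
    """Range scan restricted to one shard's slice of the key space."""
    prefix = f"{shard:02d}~"
    return [r for r in rows if prefix + start <= r < prefix + end]


def striped_scan(rows, start, end):
    """Fan the same index range out to every shard and merge the results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        parts = list(pool.map(lambda s: scan_shard(rows, s, start, end),
                              range(NUM_SHARDS)))
    return sorted(r for part in parts for r in part)


index_keys = ["dqb0", "dqcj", "9q8y", "dqcm", "dqb1", "u4pr"]
rows = sorted(row_key(f"f{i}", k) for i, k in enumerate(index_keys))
hits = striped_scan(rows, "dq", "dr")
```

The cost described in the talk is visible here: every query touches every shard, so many concurrent users all compete for the same workers.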
[unclear] I'll go through this really quickly. At this point, the key to working with a column family or key-value data store that has an implicit lexicographical order is to use a space-filling curve, a one-dimensional data structure that allows you to project multiple dimensions into a linear space.
I have a nice video here [unclear] that shows how the space-filling curve fills the space that we're interested in. Obviously, the z-axis is time. [unclear] The point is the striping: each portion of the geographic and time space contributes to each tablet server, so the data is striped across all the tablet servers in a structured way [unclear].
With complex polygons and linestrings, we essentially decompose the polygon into multi-resolution geohashes. Geohash is the implementation of space-filling curves that we're using [unclear], and then you store each decomposed geohash as an element in your index space.
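
For readers unfamiliar with geohashes, the following is a minimal sketch of the standard public geohash encoding for a single point, written from the general algorithm rather than taken from GeoMesa's code. The function name and the sample coordinates are illustrative assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Interleave longitude and latitude bits, 5 bits per base-32 character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=4)  # "u4pr"
fine = geohash_encode(57.64911, 10.40744, precision=7)
```

A shorter geohash is always a prefix of a longer one for the same point, which is what makes multi-resolution cells compatible with a lexicographically sorted key space.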
Query planning amounts to computing the ranges that are candidates for results of your query.
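
One way to picture that planning step, as a simplified sketch rather than GeoMesa's planner: walk the geohash tree, keep the cells that intersect the query box, and turn each kept prefix into a key range. The function names, the `max_len` cutoff, and the use of `~` as a range terminator (it sorts after every geohash character) are assumptions for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def cell_bounds(prefix):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True
    for ch in prefix:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            interval = lon if use_lon else lat
            mid = (interval[0] + interval[1]) / 2
            if (value >> shift) & 1:
                interval[0] = mid
            else:
                interval[1] = mid
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def covering_prefixes(bbox, max_len=3, prefix=""):
    """Geohash prefixes whose cells overlap bbox = (min_lon, min_lat, max_lon, max_lat)."""
    qx0, qy0, qx1, qy1 = bbox
    cx0, cy0, cx1, cy1 = cell_bounds(prefix)
    if cx1 <= qx0 or cx0 >= qx1 or cy1 <= qy0 or cy0 >= qy1:
        return []  # cell is disjoint from the query box
    inside = qx0 <= cx0 and cx1 <= qx1 and qy0 <= cy0 and cy1 <= qy1
    if inside or len(prefix) == max_len:
        return [prefix]  # fully covered, or as fine as we are willing to go
    result = []
    for ch in BASE32:
        result.extend(covering_prefixes(bbox, max_len, prefix + ch))
    return result


def scan_ranges(prefixes):
    """Each prefix becomes a [start, end) range over the sorted keys."""
    return [(p, p + "~") for p in sorted(prefixes)]


prefixes = covering_prefixes((10.0, 57.0, 11.0, 58.0), max_len=3)
ranges = scan_ranges(prefixes)
```

The ranges are only candidates: cells at the cutoff resolution can extend past the query box, so rows returned by the scans still need an exact geometry filter.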
Given the time, let me get to the analytics that we developed. We've implemented them all as WPS services, and they're deployed in GeoServer, so they're discoverable [unclear]. This one is a spatio-temporal prediction analytic: it predicts events in space and time. One of the interesting things about this analytic is that this is Santiago, Chile, and it's robbery events in Santiago, predicting robberies as the hot areas [unclear]. Just in the lower area, below the red, below the center I believe, is a stadium. We recently transitioned from a linear model to a nonlinear model that is able to detect that the threat of robberies is higher around the stadium [unclear]. All of that is implemented as a WPS service [unclear].
Any type of associative computation can be cast as a fold, and this MapReduce-like method is implemented within iterators. For instance, density computations can be done very rapidly by computing a sparse density matrix, or any kind of transformation matrix, within each tablet server. You have hundreds of workers that are brought to bear on your computation, and the reduce side applies an associative operation, which in the case of densities is just summation. But you can express many different types of computations this way.
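
The density example can be sketched in a few lines of Python. This is an illustration of the map-then-associative-reduce pattern only, not server-side iterator code: the grid size, the sample points, and the three hypothetical "tablet" partitions are assumptions.

```python
from collections import Counter
from functools import reduce


def grid_cell(lon, lat, cell_size=1.0):
    """Snap a point to an integer grid cell (floor division handles negatives)."""
    return (int(lon // cell_size), int(lat // cell_size))


def partial_density(points, cell_size=1.0):
    """'Map' side: each partition builds its own sparse density matrix."""
    return Counter(grid_cell(lon, lat, cell_size) for lon, lat in points)


def combine(a, b):
    """'Reduce' side: summation is associative, so merge order does not matter."""
    return a + b


# Hypothetical partitions, standing in for data held by three tablet servers.
tablets = [
    [(10.2, 57.1), (10.7, 57.9), (11.3, 57.2)],
    [(10.5, 57.5), (-70.6, -33.4)],
    [(-70.7, -33.5), (-70.2, -33.9)],
]
partials = [partial_density(points) for points in tablets]
density = reduce(combine, partials, Counter())
```

Only the small per-partition matrices travel to the client, not the raw points, which is where the speed-up in this pattern comes from.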
Another interesting analytic, which we developed as a WPS implemented within the server-side iterators, is interpolated time-space queries. Basically, I would like to see who you might have interacted with on a trip that you made, so it's time interpolation as well as space interpolation.
We like to think of it as tweeting the New Jersey Turnpike. You have a person who is tweeting as they travel on the New Jersey Turnpike, and I would like to know who might have been [unclear] with them, or who they might have stopped and interacted with. So you have to do a complex series of queries that step through time [unclear] and interpolate a track based on the road network or some other underlying layer, to find the possible interactions.
This is the result of that. It's executed via WPS, and with gap filling and snapped-to-road tracks we can get more information.
A quick word now about GeoMesa. We're pushing for a 1.0 release in June [unclear]. We're implementing role-based authentication and authorizations to integrate with the cell-level security in Accumulo. We have a new method of query planning. Relational projections allow us to subset the data and rapidly return just what we need for each query. In the fall, we're looking at integrating deeply with GeoServer and the Hadoop ecosystem, so WPSs will be executed across a variety of compute paradigms like Storm, MapReduce, and Spark, and tile caching can be pushed into [unclear] within Accumulo. We're also looking at other data structures [unclear] and how those might improve query performance. A couple of quick links here: the LocationTech website lists some information about GeoMesa, we have a GeoMesa website that has tutorials and other demonstrations, and there is a users mailing list and a dev mailing list. I'm happy to take questions.