
Big (enough) data and strategies for distributed geoprocessing

Formal Metadata

Title
Big (enough) data and strategies for distributed geoprocessing
Title of Series
Number of Parts
188
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Producer
Production Year
2014
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
Genre
Abstract
Big data gets a lot of press these days, but even if you're not geocoding the Twitter firehose, "big enough" data can be a pain - whether you're crashing your database server or simply running out of RAM. Distributed geoprocessing can be even more painful, but for the right job it's a revelation!

This session will explore strategies you can use to unlock the power of distributed geoprocessing for the "big enough" datasets that make your life difficult. Granted, geospatial data doesn't always fit cleanly into Hadoop's MapReduce framework. But with a bit of creativity - think in-memory joins, hyper-optimized data schemas, and offloading work to API services or PostGIS - you too can get Hadoop MapReduce working on your geospatial data!

Real-world examples will be taken from work on GlobalForestWatch.org, a new platform for exploring and analyzing global data on deforestation. I'll be demoing key concepts using Cascalog, a Clojure wrapper for the Cascading Java library that makes Hadoop and MapReduce a lot more palatable. If you prefer Python or Scala, there are wrappers for you too.

Hadoop is no silver bullet, but for the right geoprocessing job it's a powerful tool.
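To make the MapReduce-on-geospatial-data idea concrete, here is a minimal sketch (not taken from the talk) using mrjob, one of the Python wrappers the abstract alludes to. It snaps hypothetical point records (CSV rows of lat,lon) to half-degree grid cells in the mapper and counts points per cell in the reducer; the class name, input format, and grid size are illustrative assumptions, not anything specified by the speaker.

```python
from mrjob.job import MRJob


class GridCellCount(MRJob):
    """Count points per half-degree grid cell (hypothetical example)."""

    def mapper(self, _, line):
        # Expect CSV rows of "lat,lon"; silently skip malformed rows.
        try:
            lat, lon = (float(v) for v in line.split(',')[:2])
        except ValueError:
            return
        # Snap the point to the nearest half-degree grid cell.
        cell = '%.1f,%.1f' % (round(lat * 2) / 2, round(lon * 2) / 2)
        yield cell, 1

    def reducer(self, cell, counts):
        # Sum the per-cell partial counts emitted by the mappers.
        yield cell, sum(counts)


if __name__ == '__main__':
    GridCellCount.run()
```

Run locally with `python grid_cell_count.py points.csv` for testing, or against a Hadoop cluster with the `-r hadoop` runner; the same pattern carries over to Cascalog or Scalding if you prefer Clojure or Scala.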
Keywords