Big (enough) data and strategies for distributed geoprocessing

Cite

Related Material

FOSS4G

Open Source Geospatial Foundation (OSGeo)

Kraft, Robin

Formal Metadata

Title

Big (enough) data and strategies for distributed geoprocessing

Title of Series

FOSS4G 2014 Portland

Number of Parts

188

Author

Kraft, Robin

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/31665 (DOI)

Publisher

FOSS4G

Open Source Geospatial Foundation (OSGeo)

Release Date

2014

Language

English

Producer

Foss4G

Open Source Geospatial Foundation (OSGeo)

Production Year

2014

Production Place

Portland, Oregon, United States of America

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Big data gets a lot of press these days, but even if you're not geocoding the Twitter firehose, "big enough" data can be a pain - whether you're crashing your database server or simply running out of RAM. Distributed geoprocessing can be even more painful, but for the right job it's a revelation!This session will explore strategies you can use to unlock the power of distributed geoprocessing for the "big enough" datasets that make your life difficult. Granted, geospatial data doesn't always fit cleanly into Hadoop's MapReduce framework. But with a bit of creativity - think in-memory joins, hyper-optimized data schemas, and offloading work to API services or PostGIS - you too can get Hadoop MapReduce working on your geospatial data!Real-world examples will be taken from work on GlobalForestWatch.org, a new platform for exploring and analyzing global data on deforestation. I'll be demoing key concepts using Cascalog, a Clojure wrapper for the Cascading Java library that makes Hadoop and Map/Reduce a lot more palatable. If you prefer Python or Scala, there are wrappers for you too.Hadoop is no silver bullet, but for the right geoprocessing job it's a powerful tool.

Keywords