
Don't Copy Data! Instead, Share it at Web-Scale


Formal Metadata

Title: Don't Copy Data! Instead, Share it at Web-Scale
Number of Parts: 188
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2014
Production Place: Portland, Oregon, United States of America

Content Metadata

Abstract: Since its start in 2006, Amazon Web Services has grown to over 40 different services. S3, our object store and one of our first services, is now home to trillions of objects and regularly peaks at 1.5 million requests/second. S3 is used to store many data types, including map tiles, genome data, video, and database backups. This presentation's primary goal is to illustrate best practice around open data sets on AWS. To do so, it showcases a simple map tiling architecture, built using just a few of those services, CloudFront (CDN), S3 (object store), and Elastic Beanstalk (application management), in combination with FOSS tools: Leaflet, MapServer/GDAL, and yas3fs. My demo will use USDA's NAIP dataset (48 TB), plus other higher-resolution data at the city level, and show how you can deliver images derived from over 219,000 GeoTIFFs to both TMS and OGC WMS clients for the 48 states, without pre-caching tiles, while keeping your server environment appropriately sized via auto-scaling. Because the NAIP data sits in a requester-pays bucket that allows authenticated read access, anyone with an AWS account has immediate access to the source GeoTIFFs and can copy the data in bulk anywhere they desire. However, I will show that the pay-for-use model of the cloud allows for open-data architectures that are not possible with on-prem environments, and that for certain kinds of data, especially big data, rather than move the data, it makes more sense to use it in situ in an environment that can support demanding SLAs.
Transcript: English (auto-generated)
My name is Mark Korver. I'm with Amazon Web Services. I'm part of the public sector team at Amazon. That's why if you look up there you'll see that I'm a solution architect. I work largely with state local government. And because our customers in the education space are
so active, there's so much going on, especially in higher ed. I spend a lot of time working with our higher ed customers also. So I'm also the mapping guy on the team. My specialty and a good chunk of my background has to do with, you know, geo apps on the web.
I've been doing that for some years now. And, you know, it's interesting because this is the one conference that I've been wanting to come to for like 10 years, and so I'm very, very happy to be here, to be invited to speak here. This is, you know, this is the preeminent conference
about open source and mapping. And, you know, I have a real sense of gratitude for the whole open source community because that allowed me to run a business. Most of my business was actually in Tokyo, Japan, but we did a lot of projects that had
at their core open source components. And so we were doing things like the, you know, the first double-byte implementation of MapServer back in, I don't know, 2000 or something. So I'm very, very happy to be here and get to talk about something that's really true from our heart.
And so anyway, so much for the history. I want to kind of try to go a little bit forward into what I hope is the future. And as you can see from my title, I want to, this presentation was actually prepared to help explain best practice around open data.
And I gave a kind of a, the first version of this at our symposium in Washington, D.C. I think that was in June. Since then I've been, you know, given this for smaller groups a few times, but this is really the second time I'm doing this. But I know I have a, you know,
I guess I can go more technical, which is always more fun. But the core idea here is, and you know, this is the one thing that I shouldn't have to explain too much here, but it shouldn't be about copying data anymore. Right, we live in a linked world. Especially in mapping, we've known
for many years what REST endpoints are, right? We know about web services. And so, you know, today I'm representing a company that provides you IT on the fly as a result of a REST call, right? And on top of that, of course you're building systems
that, for example, are RESTful and do all kinds of interesting things on top of an infrastructure that you can build and destroy within minutes via code of your choice. And so, so I work with a lot of different use cases.
So, you know, one day it might be like genome analysis, another day it might be Alzheimer's research, a large universities having to share data at scale. So kind of big data for scientific analysis. And at the core of many of those use cases,
and I would say that the larger they get, the more our object store becomes a central feature of that system. And so it's a kind of, it's best practice around not working in terms of traditional file systems but working in terms of object endpoints. In this case, I'm specifically talking about
what's called our simple storage service, S3, which I'm sure many of you are aware of and I've heard other people at this conference, you know, talking about it yesterday, right? So I want to focus on that. And what I'm going to show you is a test set that I need to do a call out to the folks at Mapbox.
Let me give you a little bit of background. Mapbox asked the USDA for the most recent set of the NAIP data set. So this is one-meter-per-pixel data, coast to coast, I think it's 48 states. And it was delivered to Mapbox on 24 serial ATA disks,
each one two terabytes, and then they contacted me, and I said, hey, this is a customer, they run on AWS, and the idea is, Mark, this is essentially a public data set, can you help us out here? Can we get this into your public data set program? So it's not quite in our public data set program yet.
It might be, we would rather it be the USDA's data set in their bucket. I'll speak a little bit more to that later. But so what you're going to see is best practice around building a tiling system that's focused
on delivery of aerial image data in this case. So it's not open street map data. It's one meter class aerial imagery. And if you have that kind of data sitting in the AWS cloud in a region, one of our regions,
how can you leverage our services with the least amount of custom code, leveraging open source projects, to get to market most quickly? So you'll see what we call the auto-scaling application, very little code that essentially allows you
to give any number of people out there that you want access to, 48 terabytes of data that you don't have to pre-cache, right? So I'm going to go ahead, and there's a couple of slides. I think I only have like four or five slides. I don't intend to slide deck you today.
It's mostly going to be a real-time demo. But I want to make a couple points clear before we start. And so in a sense we're trying to correct for the problem of what in a mapping world we call clip and ship. So typically you go to some website,
maybe it's a federal website or a national data website, you go find the data, and you go clip whatever portion of the world that you want out of that, and you somehow download it. And if you've done this before, you essentially go to a site, you might have a map, you create a bounding box, and then you might get an email saying that zip file will be available
to you now, right? There's this whole kind of manual process with clip and ship, and that's been around for many years. One of the earliest projects that I worked on was actually for Japan Space Imaging. We built a shopping cart for satellite imagery. I think it's the same idea though.
So you go through this manual process, and if you're lucky you get email after a few hours saying it's available via FTP or something. And so that's the norm, and you still see that out there. So in the world of mapping, especially, when you're talking about things like compressed image data
or things like LiDAR data, typically there might be some data out there, but there's this whole exercise around going and getting it, and then essentially making another copy of that and putting it somewhere on-prem, and then if it's a large data set, typically you have to worry, you have the same
storage problem, where are you gonna put it, and you have to keep it close to a performant server so you can actually use it after you download it. So there's copies all over the world, everywhere. So when the USDA comes out with the new 2015 NAIP data, what happens?
Copies proliferate. Everybody now has a storage problem. But in the interconnected world of the cloud, web-based services, theoretically it should just be one copy. Why should we need to have more than one copy? We'd rather have one copy that's a definitive source that's well-maintained, well-curated, it's got all the metadata, and nobody's
moving that thing around. When we received the 24 two terabyte disks from USDA, of course there were errors, right? There's a whole, you know, another week there trying to figure out where the errors are, and correcting for the errors, so just that, even shipping the whole thing is problematic, because there's a lot of files,
there's close to half a million files, if you include the metadata for this particular set. So there's storage cost, there's network cost, there's a computational cost, and then, you know, since you're distributing, every time there's any kind of minor update, you have this huge cost around updating those distributed copies.
I mean, we bear that every day in our world of NAIP. We're all used to doing this, and we think this is some kind of normative pattern, right, it shouldn't be anymore. So, what makes cloud storage different? Well, one is, because it's available as an endpoint that you can either make completely public,
or secure in a very granular way, it's up to you, it's not siloed in some data center behind some firewall, right? That's one, and then number two is you can provision a real-time granular access to it or not, right? So you can have, so, probably the best way
to think about it is many of you have smartphones in your pocket right now, if you've got, for example, an application that allows you to take pictures or gather some kind of sensor data and upload it, there's a very good chance that you're uploading that via what we call a signed link to an object store, right, not through a server.
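The signed-link idea can be sketched in plain Python. To be clear, this is not AWS's real presigning (that uses Signature Version 4, which boto3's generate_presigned_url implements); it is a toy illustration, with a made-up secret, bucket name, and URL shape, of how a backend can hand a client a time-limited, HMAC-signed upload URL that the storage endpoint can verify without any database lookup:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Shared only between the app backend and the storage endpoint.
# (Invented for this sketch; real S3 presigning uses SigV4 credentials.)
SECRET = b"server-side-secret"

def sign_upload_url(bucket: str, key: str, expires_in: int = 900) -> str:
    """Build a toy signed upload URL the backend hands to the phone app."""
    expires = int(time.time()) + expires_in
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires, "Signature": signature})
    return f"https://{bucket}.example-store.com/{key}?{query}"

def verify_upload(bucket: str, key: str, expires: int, signature: str) -> bool:
    """What the storage endpoint does: check expiry, recompute, compare."""
    if expires < int(time.time()):
        return False  # link has expired
    payload = f"PUT\n{bucket}\n{key}\n{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The app then PUTs the file straight to the returned URL; the vendor's server never proxies the bytes.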
You're loading it directly to a storage system that, for example, Amazon Web Services provides that particular application vendor. The third thing is, and this is now on the cost side of the equation, it's not simply a technical thing,
right, you can offload the network egress cost. So, and this is probably the most important point, so when you store data in the cloud, and I'm assuming that whether it's our object store or another service provider's object store,
you're basically paying for how much data you have in there, right, now, or for the last two, three days, and then typically you're charged for how much network bandwidth you use for that data going out the door. That's the network, so that's a variable cost. So if you've got, for example, right now with us,
I think it's less than three cents a gigabyte a month, so one gig costs something on the order of three pennies a month to store, but depending on how often that data goes out the door, and technically that means it goes out of one of our regions, there's a variable cost
associated with that, right. And so, when I say offload the egress cost, what that means is that you can have somebody else pay for the network charge. You continue to pay for storage, but you can set it up such that somebody else pays for data going out the door, and that's very important because that allows you
to actually release really large public data sets without getting your network hammered, without having your FTP servers all of a sudden go down, because it's not your problem anymore. You've given us the heavy lifting. You've given us a job of doing the heavy lifting around allowing access to that data.
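To make the storage-versus-egress split concrete, here is a back-of-envelope estimator. The storage rate is the ballpark figure from the talk (about $0.03 per GB-month); the egress rate is an assumed illustrative figure, not a quoted AWS price — check current pricing. With requester pays, the owner keeps only the storage term and the downloader pays the egress term:

```python
def monthly_cost(stored_gb: float, egress_gb: float,
                 storage_rate: float = 0.03,   # $/GB-month, ballpark from the talk
                 egress_rate: float = 0.09):   # $/GB, assumed illustrative figure
    """Split a bucket's monthly bill into its storage and egress parts."""
    storage = stored_gb * storage_rate
    egress = egress_gb * egress_rate
    return storage, egress

# The 48 TB NAIP set, downloaded in full once during the month:
storage, egress = monthly_cost(stored_gb=48_000, egress_gb=48_000)
# Under requester pays, the owner's bill is the storage term only;
# whoever pulled the data out of the region pays the egress term.
```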
So, in the cloud, you still have to pay for storage. It's your data. You control that data, but there should just be one copy of that data, right. Just by storing it in S3, you have 11 nines of durability,
so it looks like one endpoint, and I'll be showing this to you, for that geotiff file, right, but in the background, we're of course making multiple copies of it for you, but you can't see it. It's not your problem. It's our problem. We need to make sure that we satisfy the SLA around that data, okay,
and because you can offload the network cost, you don't have to worry about network. You don't have to worry about provisioning network on your end just because somebody might come today to go get, come and get the data. You don't have to worry about your network getting maxed out because somebody decides to download the whole thing in one hour, right.
That's our problem. You don't have to worry about compute costs because you're not standing up, for example, traditional FTP servers or putting it on some website anymore, right, so that's not, again, not your problem, and then because you're maintaining just one definitive copy of that data, right, you don't have to, you don't have the cost of updating
all those distributed copies because they're not out there anymore. There's no need for them, right. So, you know, I shouldn't have to tell this group, but, you know, we need to think in terms of URLs, right, rather than in terms of copies. It's not about copying the data.
So today I'll speak, you know, we have, I think, we have over 40 services now, so part of my role is to act as a guide to our 40 services, but I'll be very frank with you, they grow at such a phenomenal rate
that even the solutions architects on our teams, the technical people on our team can barely keep up, so oftentimes, you know, if it's something that just came out two weeks ago, then I have to refer customers to the actual product team kind of thing, but I have many years of experience about building tiling systems on S3, and S3 is one of the original three services on AWS.
It was S3, EC2, which is a virtual machine service, and SQS, the queuing system, so there were just three in the beginning, but those three, at least back when I was a customer of Amazon Web Services, allowed us to pretty much build anything, because those are the most core parts, right,
queuing, storage, and compute. Now we have 40. So I'll spend a little time explaining some lesser known features of S3. The main key item here, if you walk out the door with anything, I just want you to remember two words, and that's requester pays. Requester pays is a feature of S3
that allows you to offload the network egress charge, and that's key to today's talk, and that's key to our best practice around government open data strategy using cloud wisely. S3 has many clients. S3's been around for a long time, so there's, you know, command line,
like there's, you know, Perl clients, there's, you know, basically every language you can imagine, there's a client. We have a full set of SDKs that we support on our site, from, you know, Ruby to PHP to Node that natively support S3,
and then today, because I'm running Windows, I'll be showing you a client called CloudBerry, but there are Mac clients, and for example, in the Java world, there's a project that's been around for a long time called JetS3t, which is used in a lot of projects. So it's very mature, it's been around a long time.
It helps our customers, and so you can imagine our larger customers like Netflix, or, you know, Shell Oil Company, are all using S3 somewhere at the core of their architecture.
And then, so this is what I really want you to remember, this idea of requester pays on a bucket, and I will show you how this works. And here's a little picture here that kind of maps to not just how it technically gets set up, but really the most salient thing here, the most important thing here is that,
so this is a bucket. A bucket is just a top-level name for an S3 container, so every AWS account is allowed to create 100 S3 buckets, and the reason it's limited to 100 is because that's a global name space. If you go to, for example,
well, I need to be careful, I think, about what customers I name, but there are, for example, some large university customers out there that have their whole www.some-university.edu site sitting in S3, right? There are some emergency sites out there that use WordPress, for example, to generate HTML,
the HTML gets pushed to S3, and S3 then handles the emergency traffic, right, because it scales massively. So it's a very, very simple architecture. But here, what I wanted to show was, this area here is one particular AWS account owner, right?
And that owner has a bucket. You can have 100 buckets, and you can actually have more than 100 buckets because you can have a number of accounts, but in that bucket, you can have the data of the world. The bucket has no limits. It's just objects, it's key-value, so the object store allows you
to keep pushing data into that bucket, right? So you can have as many tiles as you want. You can have as many Oracle backups as you want. The only limit is on individual object size. Each object is limited to, you know, five terabytes, right, but you can keep pumping five-terabyte objects into that bucket as fast as you want,
as long as you want, and you won't run out of space. So from a map tiling cache perspective, it's perfect that way, and it comes up, I think, in many talks, right? And here, so this is one account, and so here's the bucket, and there's a virtual machine. In our environment, we call it EC2; it's an EC2 server.
And you notice right here that when this virtual machine that's living in a region moves data from one of its buckets, this transaction here is free. I need to back up a little bit, right? So I'll show this to you in the console in a second.
But when you have an AWS account, you get access to now eight regions. So we have a global footprint. You can fire up systems, so for example, if you're a government customer, you can be running them in our Virginia region, or Oregon region, or in our GovCloud region, which is actually in the Portland area.
But just as easily, you can go to Tokyo, or Singapore, or to the EU, and do the same thing. It's just a drop-down list. And so the point I'm trying to make here is, this colored area here is one of those regions. So if you go and get data from a bucket,
you have a virtual machine in one of those regions, and it's doing get operations on the bucket, that's free. Putting data in is free all the time, right? Puts to S3 are always free. Now if you take the data out of the bucket, out of the region, and basically pull it out
to somewhere on the internet, then there is a charge. That is the data egress component, right? And when you turn the requester pays flag on, in combination with marking whatever objects you want as public or authenticated access, what it does is,
make it such that this other account, so account A over here, and account B, pays for the data egress charge. Okay? And that's the key point. So why does that matter?
That matters because, you know, the web has made it possible for everybody in the room to publish, right? So you know, I can write a paper, I can link it to some other paper, or I can make some data set available, but with just that, I might not be able to operate at web scale, right, I might not be able to scale,
and even if I was, I might have to pay the cost for network there. With requester pays, you can actually offload that to whoever wants to get the data. So today I'm gonna show you that there's many views to the same data, so going back to my point,
it's just a bunch of GeoTIFFs that are essentially gridded and prepped, and these are the GeoTIFFs from the prime contractors that flew the planes, that flew the Leica sensors, you know, the ADS80, or whatever it was. They did a bunch of QA work, and finally at the end of the day, the files get copied to some hard disks,
and the USDA probably receives those, copies those again, does a bunch more QA work, and then after many weeks, we can get access to it, right? But the way it should be is that there's one definitive copy of the data, it lives in the cloud, it might be in multiple regions, it might be in multiple
cloud vendors, right, but there should be a lot fewer copies out there. And so, I'm gonna go ahead and jump into the demo. So what we're looking at here is Leaflet.
I'm kind of an OpenLayers guy, but I took this chance to learn a little bit of Leaflet, not that I'm doing anything complex here. So it's just Leaflet, and the important point here is this system, if you back off a little bit, so you can tell, for those of you who are from Oakland,
you'll be able to see that this is actually the city of Oakland's data set. If you back off a little bit, now we're looking at the USDA's NAIP data. And what's important here is, I'm using one client
to look at data that I can look at, so these are tiles obviously, that's why, here we got the slippy map thing going on, right? So I'm pretty sure that these tiles have been built already before, so they're already on S3. But in a second you'll see that if I move to another part of the USA, the tiles will take a second
to come up, because they're being generated on the fly. But the important thing here is these images, which are just tiles, 256 by 256 tiles, as we're all familiar with, are based on content that's living in another account bucket.
So in this case, it's Amazon Web Services' public data set account, not my account, not my working account. And if you look in that account, there's a whole bunch of stuff including microbiome data and genomic data, all kinds of cool public data sets. But up here towards the top, there's aws-naip.
And you'll see, right, so all those mappers here will go, oh, okay, I get it, these are states, and there's California. And I rearranged this a little bit to simplify kind of the, it's not actually a directory system. These are all object keys.
So it'll look like a directory system so that we can then, next year we can receive the 2015 data and just slide it in here and maintain that one copy aspect of it. So this is a little bit rearranged, but basically the same data. Here's the one meter resolution data. I think Idaho has half a meter. It's the only state that has half a meter. So if you go look at Idaho, it'll say 0.5 meters.
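Since S3 keys are flat strings that merely look like directory paths, a layout like the one above is just a key-naming convention. A minimal sketch of the idea — the exact prefix order here is illustrative, patterned on the state/year/resolution arrangement described, not the literal aws-naip layout:

```python
def naip_source_key(state: str, year: int, resolution_m, fips: str,
                    filename: str) -> str:
    """Key for a source GeoTIFF; a new year's data slides in alongside,
    preserving the one-copy layout (prefix order is illustrative)."""
    return f"{state}/{year}/{resolution_m}m/{fips}/{filename}"

def tile_cache_key(z: int, x: int, y: int) -> str:
    """Key for a rendered 256x256 tile in a TMS-style cache bucket."""
    return f"tiles/{z}/{x}/{y}.png"
```

Because keys are just strings, "receiving the 2015 data" is nothing more than writing objects under a new `{year}` prefix; no directories are created or moved.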
And the original data was delivered with a group of shapefiles that define the tile boundaries, the index, right, as you'd expect. And then the original data is all, nowadays it's all four-band, right? So it's RGB plus infrared, IR.
And then there's a bunch of metadata here. And if you go look at the original data, these are FIPS codes. And then there are a bunch of almost-200-megabyte files in here. And if you have an account with us,
and if you use a tool, for example CloudBerry, or any tool that can make requester-pays requests, you all have access to this data. All it is is aws-naip. You have access to 48 terabytes of data. Now, I need to caution you: this is a test data set.
So I can't give you an SLA that it's going to be there next week, right? But if you fired up a client right now that could make requester-pays requests, you could go and see this data. You could go and download this data as quickly as you want to. And that's, I think, a very important point.
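Since the bucket is requester-pays, the only twist over a normal S3 read is that the caller must explicitly opt in to paying for the transfer. A sketch in boto3 terms; the bucket name aws-naip is from the talk, but the key is made up:

```python
# Requester-pays buckets bill the *caller* for transfer, and S3 rejects
# requests that don't opt in, so every call must carry
# RequestPayer="requester" (the x-amz-request-payer header under the hood).

def requester_pays_get_kwargs(bucket, key):
    """Arguments for a requester-pays GetObject call (boto3-style)."""
    return {"Bucket": bucket, "Key": key, "RequestPayer": "requester"}

kwargs = requester_pays_get_kwargs("aws-naip", "ca/2014/1m/rgbir/example.tif")

# With AWS credentials configured, the actual fetch would look like:
#   import boto3
#   body = boto3.client("s3").get_object(**kwargs)["Body"].read()
```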
So at this point, people who have done the exercise of going through some clip-and-ship system are probably realizing, oh, all I need is a client, and it could be on my notebook here, or it could be in my workspace VDI, you know,
a container in the cloud, and I can quickly copy all this data into my own account before Mark stops talking, because I want this stuff, right? And if you've ever ordered data, for example NAIP data from the USDA, you can see that this would be a much faster method
for you to get access to the data. So, you know, that being said, you can do all that, but I'm suggesting that would be a mistake. You don't need to do that, okay? You shouldn't have to copy, going back to what I was saying earlier. So on the right-hand side is somebody else's account, right?
Preferably this would be the account of whoever owns the data. From my perspective, best case, it would be a national agency, right, or state and local governments that have maybe banded together with other counties for a group buy of aerial data, right?
And they're exposing high-resolution aerial imagery because it's public data anyway, right? And as long as they don't incur a cost in disseminating the information, why not do this? There's no cost associated
with making it public if you take this route. On the left-hand side, you can see this is my working account, and I've got a bunch of stuff in here. I apologize, I have a whole bunch of badly named buckets, but down here there's one called naip-tms, and so this is the, you know,
you can think of this as a level-one cache, right? So rather than the cache being on the server that generated the tile, or in memcached, or whatever you want to use for your caching layer, it's just in S3, okay? I'm using S3 as a cache,
and how real-time that cache is, or how long its duration is, is up to you, right? Whether it lasts for one day or stays in this S3 bucket for a year, that's all tweakable, and you don't actually have to write code. You just change the lifecycle policy,
and I can show that to you in a second. So this is just a TMS cache, just Mercator data, and it's exactly what you'd expect, right? Layers, and you drill down, and eventually you see some JPEGs. And these JPEGs here are
exactly these guys, okay? So technically I could just delete all of these, and they would get built again, right? So going back to here, I'll show you a little bit more about how this works. So here,
right now I'm looking at the NAPE data. I'm gonna turn Firebug on. I thought I just turned it on. Here we go, and as we all, you know, like to do with somebody else's mapping systems, right, we can kind of explore how this works.
Let's see, I'm gonna go all, and you can kind of see what's going on here. As I move this thing around, it throws a 403 when it can't find a tile, and right here I have this DNS name, a domain name that I own.
These are all using our content distribution network, so these are DNS names pointing to a CloudFront distribution that I've created, which again leverages the AWS infrastructure. So this is another layer of cache that's closer to us, right? That's the way to think of it.
And if I go over here, you can see I've got a couple of test layers. I'm borrowing the MapQuest OSM tiles here, and here I have a direct link to the S3 bucket, so right now I'm looking directly at our object store, nothing in between, just the object store.
For most use cases, that's just fine, a simple architecture. And over here I'm getting access to exactly the same data, but via our CloudFront content distribution network, so it goes to the CDN, and the CDN then goes to S3, right?
You'll notice, though, that as I move this thing around, it's going to this thing called the Tiler, right? Because neither the CDN nor the object store has the data. So S3 has an interesting feature where, if you throw a certain kind of error,
you can essentially provide a filter and do a redirect. In this case, I'm doing a redirect to the system that makes tiles, and that's called the Tiler. So if you click on one of these Tiler requests and open it up in a new tab, you can see it just made one for us, right? And you can see that all it's really doing
is taking this tile name, so this is just the typical TMS naming scheme, and under the hood, what it's doing is exercising an auto-scaling WMS service that's running on EC2, right? A completely separate system
that is the definitive source for the tiles in this case. You can see this in practice by just chopping this part out, and it'll go into a check mode, and you can see the actual WMS request down here, right,
there you go. So if I copy this guy, all of a sudden you're looking at a WMS server, okay? So this here is load-balanced, using our Elastic Load Balancer, and right now I have it set up
with two University of Minnesota MapServer instances. I'm familiar with MapServer, so I tend to use it all the time, especially for imagery-type stuff. So I'm running two EC2 instances that know how to deliver WMS content, right? And this little piece of code here, all it's doing is asking, okay, what's the tile this person wants,
and translating that into the appropriate WMS request. And behind the scenes, within the region (typically this wouldn't be exposed publicly like this), there's an auto-scaling system that's easily tweakable, so you can say, it's two now, but I need it to be 20 tomorrow,
it's just a simple change, and it's coughing up one of these tiles, right? But it does a couple of things. It services the request, so it makes sure the client is happy, of course, so the client gets the data. But as soon as it delivers the data,
it fires off another thread and copies the same data to the object store, so that any subsequent request is satisfied from S3 rather than the Tiler. It's a very simple architecture. The core idea is that you're not trying to do the caching on the server itself;
you're leveraging services available in the cloud, such as S3, to do the caching for you. And so if, for example, we go to the management console over here, so I got a tab open,
so I'm just curious, how many people have seen the management console here? Could I just, you see some hands? Oh, quite a few, okay. So for all of you, I don't have to explain what this is, but for those who haven't seen this, so this is a GUI that allows you to use any of our 40 services, right?
Right now, we're looking at one called EC2 that allows you to control our virtual machine service, and it's actually, there's another tab that allows you to do our VPC, our Virtual Private Cloud, which is actually a subset of EC2, but anyway, this allows you to spin up virtual machines
anytime you want, but more importantly, turn them off and not pay for them as soon as you turn them off, right? And it's very easy, you hit launch, and I'm not gonna do this because I don't want this to turn into some kind of sales event here, but you hit launch, you make a couple selections, is it Windows, Linux?
In this case, these are all Linux machines, and then you can go and fire up whatever you want. But within the EC2 tab, one thing I want to show you is that I have MapServer running, and down here there's this thing called auto-scaling groups.
So I have an auto-scaling group; it's just two machines right now building those tiles. And which one was it? This one here, I think. If you come here, you'll see it's got a couple of twos, a min and a max of two, and all I have to do is come in here and, if I wanted, I could just make this,
for example, max four, min four, and if I save that, the EC2 system will just go ahead and clone a couple of copies and fire them up. And typically, when you did something like that, and you wanted to scale from two to 20 or 200 or whatever you desire,
and if you're working in a world of 48 terabytes, for example (which, from our perspective, is actually not that large), you'd have to worry about all the traditional things around making some kind of traditional file system available to MapServer or GeoServer
or whatever tiling server you're using. In this architecture, I don't have to worry about that, because I'm using yet another open-source package, and I'll show you what that looks like by actually SSHing into one of these machines,
so now I'm going back to another part of the console that shows me my virtual machines. I just want to look at the ones that are actually running. Some of them are probably going into startup mode. So here I have one, and I can get the DNS name down here.
Oops. This is the hardest part of my demos, copying this. And then I've got PuTTY running here somewhere. I need to make sure my key is correct, because the key depends on what part of the world I'm in.
Looks okay, so I'm gonna go ahead and open it. And this is, I'm pretty sure, Ubuntu, and I'm in the door, right? So right now, all I did was set up an SSH session to one of these virtual machines that's running University of Minnesota MapServer.
Well, it's actually a MapServer/GDAL combination. So if I hit this, you can see I've got a couple of mount points down here, and you can see the open-source tool that I'm using right there, yas3fs, and it's basically making S3 look like a drive.
So, as you'd expect, if I cd into data/naip and do an ls, it'll take a second, but there are all the states. Now, remember, this is 48 terabytes, right?
And this is a virtual machine that has a couple of, I think, 160-gig-or-so SSDs that look local to it, okay? What this system basically does is give it access to all 48 terabytes of data. It can go get any of those GeoTIFFs we were looking at before,
because it's looking at a shapefile index, right? But it'll only go and get the ones it needs right now to do the tiling it needs. So it's essentially acting as a cache for the 48 terabytes, caching on SSDs
that are local to the host this particular virtual machine is running on, okay? So, all of a sudden, as long as I have S3 and the layout is correct and the data is good to go, I don't have to maintain, for my 20 WMS servers, the data store that they can see, right?
It's all one, right? It's one copy, right? Now, that might be interesting from a Mark's-got-his-system-up-and-running perspective, but what's more interesting is that because those GeoTIFFs are marked requester-pays, and because every object in that bucket
is marked for authenticated access, if you have an account, you can do the same thing, right? You can run your system and not have to copy the data. You do have to fire up your virtual machines in the region where this data resides; otherwise you'll have a latency effect
and another cost factor to deal with. But everyone in this room, as soon as you have an account, and I'm talking about one page of code here, and almost all of the code is open-source, FOSS code, you can have a NAIP server, a tiling server
that will deliver the United States. The other aspect, remember I said the S3 bucket has no limits; you can keep pumping data in there. So, for example, when next year all the states go from one meter to half a meter, what does that mean? You have four times the amount of data, right?
You have much more work around processing that data to create, for example, internally optimized files. Typically you'd take these uncompressed files that are delivered by the USDA, internally tile them, and probably JPEG-compress them; a bunch of batch-processing work
that you'd have to do in order to get that back into your working stack. You don't have to do that anymore. Or even if you did, you now have HPC resources you can use for a day to do that batch processing. Why? Because you're in the cloud, right? I'm talking really generically about the cloud,
I mean, those are the design patterns, those are the kinds of strategies you can take, because we're going to publicly available endpoints now, not an on-prem environment, okay? So, here you see the data,
and I'm gonna jump back now into the console, because there are a couple of other things I wanted to point out that have to do with S3. So over here, we're looking at the source data, right? And that source data is delivered as four-band,
but if you're just doing a base layer, for example for a public site, you don't need four bands. Typically what you'll do is create an RGB-only derivative of that, right? So that's what this is. This was not delivered by the USDA. This I built using, actually, Beanstalk
and GDAL. So over here, instead of those almost-200-meg files, the 100-percent originals, these are the same files now compressed,
internally tiled, only three bands, and JPEG-compressed at, I think it was, quality 90, and they're much smaller, right? The point I'm trying to make here is that if one person does this, nobody else should ever have to do it again, right? So it's another aspect of one copy should do, right?
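That derivative step can be sketched as a gdal_translate invocation: keep bands 1 through 3, tile internally, JPEG-compress. The flags are standard GDAL creation options; the file names and the quality value are illustrative:

```python
# Sketch of the RGB-derivative build. PHOTOMETRIC=YCBCR is a common
# companion to JPEG compression for RGB imagery; running this for real
# requires GDAL to be installed.
def rgb_derivative_cmd(src, dst, quality=90):
    return [
        "gdal_translate",
        "-b", "1", "-b", "2", "-b", "3",   # keep RGB, drop band 4 (NIR)
        "-co", "TILED=YES",                # internal tiling for fast reads
        "-co", "COMPRESS=JPEG",
        "-co", f"JPEG_QUALITY={quality}",
        "-co", "PHOTOMETRIC=YCBCR",
        src, dst,
    ]

cmd = rgb_derivative_cmd("m_3411861_ne_11_1_20140505.tif", "rgb.tif")
# subprocess.run(cmd, check=True)
```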
So this is also in the bucket, right? You don't want to go look at the uncompressed originals, because they're going to be slow, right? And you probably don't need that fourth band, unless you're doing, you know, crop-related analytics or something. You might, but typically you probably want this, and this data is now just in the same bucket, right?
It doesn't have to be on some different volume because you ran out of space on the original. I just added it and made it obvious in the bucket, and it's just part of the package. So this is the kind of thing the content owners could do, right, because everybody on the planet is probably doing it anyway,
and in order to reduce the amount of heavy lifting around actually using this kind of content over time, right? So that's what that is. And let's see, over here I'm back in the console. I'm sorry I keep jumping back and forth. So I'm looking at this bucket called naip-tms.
So this is essentially the cache, right? And if you look at this thing, there's a bunch of stuff that S3 can do that a lot of people don't know about. The one thing we're using right now is static hosting mode. So S3 can act as just your website, right? Earlier I was talking about
universities doing things like using WordPress to push to S3, or you can have your personal website on S3, simple to do. But one of the things you can also do is not just enable index.html, but also handle redirection.
So for example, in this case, if you get a 403, right, what do you do? And you saw this a little while ago, right? You send it to the Tiler. The Tiler gets the incoming request for the tile, decodes it, generates a WMS request,
creates a tile, serves the tile, but more importantly, puts it into S3 for the next request. Very simple. Another thing here, let me scroll up a second. So there's static web; I'm just leveraging static website hosting, right? An existing feature of S3.
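The whole redirect-and-fill pattern just recapped (403 on a cache miss, redirect to the Tiler, translate the TMS path into a WMS GetMap, serve the tile, write it back to S3) can be sketched as follows. The host names, bucket name, and layer name are made up; the WebsiteConfiguration shape is what S3's put-bucket-website call expects, and the tile math is standard TMS / Web Mercator arithmetic:

```python
import threading
from urllib.parse import urlencode

# 1) S3 static-website routing rule: on a 403 (tile not cached yet),
#    redirect the client to the Tiler.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},
    "RoutingRules": [{
        "Condition": {"HttpErrorCodeReturnedEquals": "403"},
        "Redirect": {"HostName": "tiler.example.com",
                     "Protocol": "http",
                     "HttpRedirectCode": "302"},
    }],
}
# boto3.client("s3").put_bucket_website(
#     Bucket="naip-tms", WebsiteConfiguration=website_config)

# 2) The Tiler decodes /z/x/y and turns it into a WMS GetMap bounding box.
WORLD = 20037508.342789244          # Web Mercator half-extent, in meters

def tms_to_wms_url(z, x, y, layer="naip"):
    size = 2 * WORLD / 2 ** z       # tile edge length in meters at zoom z
    minx = -WORLD + x * size        # TMS convention: y counts up from south
    miny = -WORLD + y * size
    bbox = f"{minx},{miny},{minx + size},{miny + size}"
    q = {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
         "LAYERS": layer, "SRS": "EPSG:3857", "BBOX": bbox,
         "WIDTH": "256", "HEIGHT": "256", "FORMAT": "image/jpeg"}
    return "http://wms.example.com/wms?" + urlencode(q)

# 3) Serve the freshly rendered tile, and write it back to the cache on a
#    side thread so the next request is an S3 hit.
def serve_tile(key, render, cache_put):
    data = render(key)
    t = threading.Thread(target=cache_put, args=(key, data))
    t.start()
    return data, t
```

In the demo, `cache_put` would be an S3 PutObject into the naip-tms bucket; here it is left as a plain callable.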
Down here I have something called lifecycle. You'll see that for zoom levels 16 through 19, I have a lifecycle policy that just deletes the tiles, right? So, you know, this is in test/dev mode, so I just delete them after, like, 24 hours, right?
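That deletion rule is just data. A sketch of a lifecycle configuration expiring zoom levels 16 through 19 after a day; the bucket name and prefix layout are illustrative, while the Rules shape is what S3's lifecycle API expects:

```python
# One expiration rule per high-zoom prefix: tiles simply age out, and the
# Tiler regenerates them on the next miss. No cache-eviction code to write.
def tile_expiry_rule(prefix, days=1):
    return {"ID": f"expire-{prefix.rstrip('/').replace('/', '-')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": days}}

lifecycle = {"Rules": [tile_expiry_rule(f"{z}/") for z in range(16, 20)]}
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="naip-tms", LifecycleConfiguration=lifecycle)
```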
In production, you could probably keep them live much longer than that, right? But the point here is that typically there's all kinds of heavy lifting around even maintaining this aspect of a cache this large. Because this is on S3, in an object store,
it's just a lifecycle policy, right? It's the same model that, for example, folks in the world of video use to process classroom video. They're taking all kinds of classroom video, or they're taking video off of roadways or something (that's a little bit scarier),
and they'll pump it into S3. They'll probably encode it into another, more mobile-friendly format, right? And having done that, they'll use exactly the same lifecycle feature to pump it into Glacier, which is our archival store, which again drops the price down.
It's simply a matter of coming down here and adding a rule, right? Very little, basically no coding; it's just setup. If you want to write the code, of course, you can automate all of this via whatever language you like to work in. This is just a GUI implementation of a bunch of RESTful endpoints
that allow you to do things like, you know, change the lifecycle policy. So the last thing I wanted to speak to, since we're almost out of time.
I think I have one more slide, if I can find it. So the last idea here is, you know, typically we think about S3 in terms of, you know, static content, right?
So things like, you know, our web page, our HTML base, you know, front end or something that basically doesn't change that frequently, right? But if you look at, you know, more interesting and high scale use cases in the Amazon cloud, what you find is that,
yes, of course it works for, you know, relatively static stuff like a database backup. For example, a backup of an Oracle database using RMAN or something. And that might be done, you know, once a night or something, right? But we have a lot of customers out there
that are increasingly using S3 as more of a, you know, short term data store, right? You can do that because all you're doing is, you know, changing up the lifecycle policy. And so, and this is, I think, important, especially in, for example, government use cases
where we're talking about things like open data. So it maps to this idea of, you know, we have a lot of, for example, government customers that are interested in, for example, providing API endpoints to be more open. But the fact of the matter is, it might be a lot easier for them to have a system
that just pumps data frequently, or more frequently, into the object store, and lets the end user figure out how they want to use the data, right? It's a different model. So rather than a WMS or WMTS endpoint for, let's say, some kind of open government data,
it might make more sense for the government customer to be pumping, you know, CSV files into the object store, basically because the government doesn't know what the customer's use case may be, and the customer may rather have something
that resides in something that doesn't require a government SLA. It's just an S3 bucket, because then it becomes our SLA, right? There's a big difference there. So rather than focusing on providing open data via APIs that are run and controlled by the government, it might make more sense,
for example, for government use cases, whether it's, you know, geo data or some PDF file or something, to actually just pump that into an object store and let the customer, whether that's an individual citizen or whether that's a, you know, private sector entity that's building, you know, like a traffic application on top of that, access to the raw data
from which they can then build an API endpoint or a RESTful endpoint kind of thing. So I don't want people leaving thinking that it's just good for some long-duration cache; it's actually good for very short-duration content too. It really is a cache, not just a static data store.
So that's it for my presentation. Thank you very much for listening. I'll be here until sometime tomorrow afternoon, I think, and I'm at the booth kind of back in the corner over there. So if you're interested in hearing more, I'm happy to help you. I apologize, I'm the only one here.
It's kind of a last minute thing. I knew I was coming, but I, you know, I'm like solo. So my one thing I want to ask you is if you can leave me with your business card, I would very much appreciate it because I've been told to come back with data because we're a data-driven company
and I want to make sure that we sponsor more of these events. So thank you very much. Appreciate it. Good job. Thanks.