Big data analysis with Tile Reduce and Turf.js

Video in TIB AV-Portal: Big data analysis with Tile Reduce and Turf.js

Formal Metadata

Big data analysis with Tile Reduce and Turf.js
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Year
Production Place
Seoul, South Korea

Content Metadata

Subject Area
Tile Reduce is a new open source MapReduce framework for analyzing massive geodata. It is a tile analysis framework built on the JavaScript GIS library Turf.js. It runs on your local computer or in the AWS cloud and scales to run thousands of processors in parallel. At Mapbox we use Tile Reduce to detect issues in global street vector data such as OpenStreetMap, and for data comparison and data conflation. This talk will walk through the architecture of Tile Reduce and highlight its advantages, limitations and future developments.
[inaudible] everybody.
I have to offer my apologies on behalf of my colleague Alex; he wasn't able to make the trip, unfortunately. There are two tools that I'm hoping to talk to you about today that Mapbox has been investing in for large-scale geospatial analysis, and which I think could be useful to your own workflow, and they are
Turf and Tile Reduce, and they complement one another. Turf is a modular library for geospatial analysis, and Tile Reduce is a framework for performing geospatial analysis, using Turf or otherwise, at very large scale. So I'm going to dive into Turf first, talking through what it can do, and then do an applied example with Tile Reduce to show how you can actually put all this stuff together. So Turf, first
of all, I should say, is not exclusively a Mapbox project. You can find it at turfjs.org. It is an open source project that predates Mapbox's involvement in it, but we have been investing very heavily in the project. And Turf,
as you might imagine, is designed to manipulate map data. In this way it's quite similar to existing GIS technologies that you might already be using. I imagine many of you have workflows that involve things like PostGIS, ArcGIS or QGIS. Turf replicates many of the functions of those packages, but it offers two substantial advantages over whatever workflow you might have. First, it
speaks GeoJSON natively, both for input and output. This is a default assumption for everything that Turf does. We think that GeoJSON is increasingly the lingua franca for open geodata, and part of what underlies this assumption is that GeoJSON, I believe, is going to be formalized by a working group as an official web standard very shortly, so this is a good bet to make. Turf also provides a number of helper functions to transform your data into the kind of GeoJSON data structures it expects, coming in and out. The benefit of this, though, is that it means
that the results of your analysis using Turf can be displayed absolutely everywhere: not only in technologies that have Mapbox in their name, but in proprietary solutions like ArcGIS or open third-party projects like QGIS. The other major advantage that Turf brings
to the table is that it is written in modern JavaScript. There are existing geospatial analysis suites written in JavaScript, but they are often ports from technologies that were not written with modern JavaScript development in mind. Turf is happy to play with technologies like Browserify, with Node.js, with whatever you might be using. And this, I say with some lament in my voice, is really the future. I'm most comfortable writing Python, as I imagine some people in this room are, but JavaScript offers substantial advantages, both in terms of eking out every last bit of performance from your processor and in terms of portability. That second point is the one that I want to emphasize right now. JavaScript, thanks to
the engine, runs absolutely everywhere. Obviously it can perform large-scale computation on the server side, in the cloud, but of course it can also run in browsers. It can run comfortably on your laptop; we'll be doing a demo of that later. And at this point the JavaScript interpreters on your mobile phone are also quite capable of crunching real numbers for geo applications. In fact, Turf is running right
now in the slides. So this should be much, much
larger, and I apologize for it not being so. This is an active slippy map showing a Turf analysis of 311 calls in San Francisco over the past week. If you don't have 311 in your city, it's a service that a number of cities around the world are adopting, where you can report the need for a city service like trash pickup or removal of graffiti, things like that. There's an open protocol called Open311 that a lot of cities put these requests into and publish, so it's a nice example dataset to use as the source points. Turf is binning these in real time into the neighborhoods of San Francisco, which is what that geometry is, and adjusting their style. And, I should say, apologies for the color of the slides; I'm not sure what's going on with the display.
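To make the binning step concrete, here is a minimal sketch in plain Node.js of counting point features per polygon, with GeoJSON going in and GeoJSON coming out. It is not Turf's implementation: the function names, the `count` property and the sample data are all assumptions made for illustration, and polygon holes are ignored.

```javascript
// Ray-casting test: is `point` ([x, y]) inside a closed linear ring?
// `ring` is an array of [x, y] positions whose first and last entries match,
// as in a GeoJSON Polygon.
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray going right from the point cross this edge?
    const crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Takes two GeoJSON FeatureCollections (polygons and points) and returns a new
// FeatureCollection of the polygons, each with a hypothetical `count` property.
function countPointsInPolygons(polygons, points) {
  return {
    type: 'FeatureCollection',
    features: polygons.features.map((poly) => {
      const outerRing = poly.geometry.coordinates[0]; // holes ignored in this sketch
      const count = points.features.filter((pt) =>
        pointInRing(pt.geometry.coordinates, outerRing)).length;
      return { ...poly, properties: { ...poly.properties, count } };
    }),
  };
}

// Made-up sample data: two square "neighborhoods" and three service requests.
const square = (name, x0, y0, x1, y1) => ({
  type: 'Feature',
  properties: { name },
  geometry: {
    type: 'Polygon',
    coordinates: [[[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]],
  },
});
const request = (lon, lat) => ({
  type: 'Feature',
  properties: {},
  geometry: { type: 'Point', coordinates: [lon, lat] },
});

const neighborhoods = {
  type: 'FeatureCollection',
  features: [square('west', 0, 0, 1, 1), square('east', 1, 0, 2, 1)],
};
const requests = {
  type: 'FeatureCollection',
  features: [request(0.5, 0.5), request(0.25, 0.75), request(1.5, 0.5)],
};

const binned = countPointsInPolygons(neighborhoods, requests);
console.log(binned.features.map((f) => `${f.properties.name}: ${f.properties.count}`));
```

Because the result is still an ordinary FeatureCollection, it can be handed to any tool that reads GeoJSON, which is the portability argument made above.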
So this is a project built around flexibility, not only in terms of where you can run these kinds of analyses, but in how the project is structured and how it's administered. It is an open source project. Turf is completely modular: you can use as much or as little of the library as you'd like, without including a gigantic JS file.
A few of its more specific features, and these will be pretty familiar to people who do this kind of work. Buffering, of course:
this is how you would invoke a buffer call in Turf. This is contracting or extending the extent of a spatial object, a point, a line or a polygon, by a set amount. We try to make it easy, with calls that take units that are human-readable. And this is a live demo that,
again, is a bit too small to see accurately, but this is a race route in San Francisco for a popular foot race, and a dataset of water fountains within it. You can see I can adjust the buffering very quickly and find the intersection. There's no crunching here; it's just happening in real time in the browser. Smoothing is another option through
Turf, by taking the Bezier of a line, with an adjustable tolerance. Likewise, you can do
simplification, using Douglas-Peucker.
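As an illustration of line simplification, here is a minimal sketch of the Ramer-Douglas-Peucker algorithm in plain JavaScript, assuming that is the algorithm meant here. It is not Turf's code: the function names are hypothetical, coordinates are treated as planar x/y, and the tolerance is in the same units as the coordinates.

```javascript
// Perpendicular distance from point p to the infinite line through a and b
// (planar approximation; inputs are [x, y] pairs).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / len;
}

// Ramer-Douglas-Peucker: keep the endpoints, find the vertex farthest from the
// chord between them, and recurse on both halves if it exceeds the tolerance.
function simplifyCoords(coords, tolerance) {
  if (coords.length <= 2) return coords.slice();
  const first = coords[0];
  const last = coords[coords.length - 1];
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < coords.length - 1; i++) {
    const d = perpendicularDistance(coords[i], first, last);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  if (maxDist <= tolerance) return [first, last];
  const left = simplifyCoords(coords.slice(0, index + 1), tolerance);
  const right = simplifyCoords(coords.slice(index), tolerance);
  // The split vertex ends `left` and starts `right`; keep it only once.
  return left.slice(0, -1).concat(right);
}

// Made-up GeoJSON LineString with one nearly collinear vertex at [1, 0.05].
const line = {
  type: 'Feature',
  properties: {},
  geometry: {
    type: 'LineString',
    coordinates: [[0, 0], [1, 0.05], [2, 0], [3, 2], [4, 0]],
  },
};

const simplified = {
  ...line,
  geometry: {
    ...line.geometry,
    coordinates: simplifyCoords(line.geometry.coordinates, 0.1),
  },
};
console.log(JSON.stringify(simplified.geometry.coordinates));
```

A larger tolerance removes more vertices; a tolerance of zero only drops vertices that lie exactly on the chord between their neighbors' endpoints.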
Contours: if you've got a grid of points with z values, Turf will happily calculate contour lines for you
by using the isolines method, with a definable resolution and, as you can see, a range of
breaks. Here is a dataset of census population in New York City. Those yellow dots represent the size of the population per area, and you can see the isolines have been calculated in the browser from that information as I scroll around.
And finally, aggregation. Turf is just as capable of doing the sort of statistical analysis based on geometry that you might want to perform. You saw some of that in the 311 example already. Here's an example, again using the water fountain dataset that we have. Turf lets us generate an arbitrary hex grid (it should be visible here; it's really hard to see, it's purple, sorry), and I can then intersect these points against the grid and calculate the number of water fountains in each one of the cells instantaneously. Obviously you could also perform operations like taking an average of the number in each grid cell, or the maximum, the minimum, all the basic aggregation functions you'd expect from a suite like PostGIS. Of course, that's just the tip of
the iceberg. This is the current function list as of the composition of the slide deck; to be perfectly honest, I don't know how up to date it is, but it is expanding rapidly and, as I said, it's an open source project and open to whatever other functionality you might need. So what we
sometimes say in these talks is that Turf is GIS for the web. I think that's actually
an understatement. Turf is GIS for everything: you can run it on pretty much any platform that you might want to throw at it, and that is its major advantage. Even if you aren't already a JavaScript developer,
you can write your analysis once and run it everywhere, without worrying about multiple implementations, multiple code surfaces, multiple areas for bugs to pop up. You just need to spend your time thinking about the problem once, implementing it, and throwing the appropriate amount of geometry at it, whether it's on a mobile phone or a compute cluster.
So let me talk a little about the other major advantage of this JavaScript implementation, which is the amount of processing power that it can bring to the table. The examples I've shown so far run comfortably in Chrome, which is what I'm showing the slides on, but we can go beyond that.
Here's an example of the kind of analysis you probably wouldn't want to run in the browser, though you could. This is an outline of the counties of the United States, and this is a fairly high-resolution GeoJSON file of the paths of tornadoes in the United States, maintained by the US Geological Survey: every one they have on record since they started keeping records. It's about 50 megs, probably more than you'd want to push over the wire into a browser. But using Turf we can very easily take the midpoint of each one of those, or find where those lines intersect against that
GeoJSON, and normalize by the total surface area of each polygon, producing an accurate analysis of where tornadoes tend to happen in the United States. This takes about 2 seconds to run on a laptop like this. It's a decent chunk of data, and that's unoptimized. But we can also move beyond that
and start really taking advantage of every bit of computing power that's present on a machine like this, or anywhere else.
This is where Tile Reduce comes in. As you might guess from the name, Tile Reduce is a MapReduce tool. For those of you not familiar with the MapReduce concept, it's a way of thinking about parallelization of computing problems, where a very large problem is mapped into a repeated task which can be distributed across a large number of nodes. That's the map step. And there's a reduce step, where the outputs of that expensive computation are combined into one answer or set of answers. This is how Google solves a lot of problems, it's how Hadoop works, and it's how Tile Reduce works, except that the map phase is tied to individual tiles to be processed, and in this case I'm speaking of vector tiles. I should pause for a moment to explain that as well. I'm sure that a lot of people are already familiar with the vector tile concept; for those who aren't, it's an arrangement of geodata that uses XYZ indexing, just like raster tiles, but instead of being a JPEG it is the underlying geometry that would normally be used to render those images, packed in an extremely efficient manner into a binary format. What this means is that you can serve vector tiles to a client and it can draw the tiles itself, on a handset, in the browser, whatever, and it's great. But it also preserves the source data, so you can run geospatial analyses on a vector tile set, and that's what Tile Reduce is about doing. But now, where would you
get a vector tile dataset, you might ask? There are a bunch of places, and of course you can create your own. As a service, Mapbox regenerates a planet-wide version of OpenStreetMap every night in vector tile format, and you can download it from the site. It's about 30 gigs compressed, 45 gigs uncompressed. That's a lot, but it's doable for any modern laptop. The actual format this will come in is an .mbtiles file, which is a SQLite database that lets us pack the tiles together very quickly and space-efficiently. You can also divide the tiles out individually and serve them over the network, and Tile Reduce can read vector tiles across the network, but if you want to do a large-scale geoprocessing job you'll probably want to have the data locally, just because otherwise I/O is what's going to take the most time.
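The XYZ indexing mentioned above is what lets a job be split into per-tile tasks. The following is a minimal sketch using the standard Web Mercator slippy-map tile formula to list the tiles that cover a bounding box at a given zoom. The function names and the sample bounding box are assumptions for illustration, and Tile Reduce's own tile enumeration may work differently.

```javascript
// Convert longitude/latitude in degrees to slippy-map tile indices at `zoom`,
// using the standard Web Mercator tiling formula.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom; // number of tiles along each axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so that edge values such as lon = 180 stay inside the grid.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// List every [x, y, z] tile that touches a [west, south, east, north] box.
// Tile y grows southward, so the top-left corner uses the northern latitude.
function tilesForBbox([west, south, east, north], zoom) {
  const topLeft = lonLatToTile(west, north, zoom);
  const bottomRight = lonLatToTile(east, south, zoom);
  const tiles = [];
  for (let x = topLeft.x; x <= bottomRight.x; x++) {
    for (let y = topLeft.y; y <= bottomRight.y; y++) {
      tiles.push([x, y, zoom]);
    }
  }
  return tiles;
}

// Made-up bounding box roughly the size of a city, split at zoom 15.
const cityBbox = [-77.12, 38.79, -76.91, 39.0];
const cityTiles = tilesForBbox(cityBbox, 15);
console.log(`${cityTiles.length} tiles, i.e. ${cityTiles.length} independent map tasks`);
```

Each tile in the list can be processed independently, which is what makes the map phase easy to spread across cores or machines.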
Let's look at an example of how Tile Reduce works in practice. The example that I've implemented here is based on a personal experience I had a number of years ago, going to a friend's wedding in Atlanta, Georgia. I don't know how many of you have been to Atlanta, Georgia, but they are very proud of their signature crop, which is peaches. I did not appreciate this before going to Atlanta: they name a lot of things after peaches. I went to the hotel that I thought was the right one, on Peachtree Boulevard or Street or whatever it was, and it took me long enough, certainly long enough to watch the taxi pull away, to realize that this hotel was much too small and much too full of high school volleyball players to possibly be the one that I wanted for the wedding. So I found myself sprinting across an eight-lane highway at about 2 AM to get to the one that I wanted. This was before Uber, and there were no taxis coming. So I have held a grudge against Atlanta ever since, and I haven't been able to quantify it and really put in statistical terms how horrible the naming scheme is until Tile Reduce; I thought this was a nice opportunity to do that. So I will demonstrate this live to you right now. I've started a job running here while I start to run through what's involved. Let's take a look at two files. Initially, the map phase: this is the task that gets repeated again and again on a per-tile basis. (I hope that this is much more legible.) So this is a Node.js file, and for those of you who haven't written Node, it's not as terrifying as it might look at first. A lot of this we're going to ignore, but I wanted to have a working example for you, so I'll walk through it really quickly. The first thing we do is include the Turf library; that's probably familiar to anyone who's done any programming. The second thing we do is export the function that's going to be doing the work. This is just a way of making sure that Tile Reduce knows where to find the function that's going to be doing the actual operations. And the function will always have
the same three parameters. The first one's called tile layers; that's where the geodata comes in. The second is called options; that's where configuration for the entire job lives. And the third one is the callback: this is the function that we need to call when our work is done. It's a really common thing in Node.js programming, if you haven't seen it before; it allows for parallelized, very fast asynchronous execution. In the guts of this function we do a couple of things. First we initialize some variables to keep track of our task. I should clarify: our task is going to be looking at every road in the tile, calculating its length, and figuring out whether its name matches one of a series of fruits, and if it does, we're going to accordingly increase the total road count and the total road length per fruit. So we initialize variables to keep track of the number of kilometers and the total count, and some regular expressions to match these fruits. Then we loop through every feature in the tile layer that we've been served. Some of this stuff, like osmdata.osm.features, is just specific to this being the OpenStreetMap source; you can pretty much ignore all of it. We're going to check a few of the properties to make sure that the feature has been tagged as a highway, because this source will include everything in OpenStreetMap: it will include cafes and the like, it will include building footprints. So there are a few checks to make sure that we're looking at a road, that it contains LineString geometry, and that it has a name, because obviously checking for the name of a fruit if there's no name would be pointless. We calculate the length of the road very easily, in kilometers, using turf.lineDistance, add it to a total count, and then just loop through each one of the fruits, testing against the name and updating the totals. When we're done, we pass the object that we've been using to tally the totals back to the callback. It's pretty simple. And so the next thing, I think,
that we should take a look at is the function where the reducing happens. That is in index.js, by convention. This file is arranged similarly. We include a couple of libraries: first Tile Reduce, which of course is what we're focused on, and second sprintf, which is just a string formatting convenience library. We define options: this tells us where to find the map function that we just walked through. Layers gives us some information about where to find the source data; this is just the location of the .mbtiles file here. And zoom: this particular .mbtiles file is built at zoom level 15, which is a good zoom level for this kind of analysis. You can run these analyses at whatever zoom level you want, but 15 is the right number for this particular source. We define a couple of bounding boxes, because I want to do a comparison across cities, and then you can see here we're instantiating the Tile Reduce job, using the Washington, DC bounding box first and the options from above. There are just a few things left to do. I define two event handlers to pay attention to. The first is reduce: when one of those fruit road jobs finishes and passes back the totals it's calculated, this is what's going to catch the result and add it to our global totals. The second is the end function: when absolutely everything is done, the end event will fire and this code will run. This is much more complicated than it needs to be, but I wanted to get emoji up on screen for you guys, so I went ahead and did that. And the last thing we do, on line 45, is invoke run on the Tile Reduce job. You can see that a while ago we produced the results for Washington, DC. It looks like we got 23 roads named with cherry something, so that could be Cherry Hill or Cherrydale or Cherry Lane, whatever. And this took about 41 seconds to run on this machine. Let me adjust this (I'll open a new tab and adjust this really quick) to Atlanta, and run this again. One thing I want to point out right away is the asynchronous and parallelized nature of this. This is a CPU activity display.
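The map function and the reduce and end handlers walked through above can be sketched together in plain Node.js. This is an illustration under stated assumptions, not the speaker's code and not Tile Reduce itself: `createJob` is a hypothetical synchronous stand-in for the framework, the `roads` layer name and the sample features are made up, and a haversine helper stands in for Turf's line length function.

```javascript
const { EventEmitter } = require('events');

// Haversine length of a LineString's coordinates ([lon, lat] pairs), in km.
function lineLengthKm(coords) {
  const R = 6371; // mean Earth radius in kilometers
  const toRad = (deg) => (deg * Math.PI) / 180;
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    const [lon1, lat1] = coords[i - 1];
    const [lon2, lat2] = coords[i];
    const a = Math.sin(toRad(lat2 - lat1) / 2) ** 2 +
      Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
      Math.sin(toRad(lon2 - lon1) / 2) ** 2;
    total += 2 * R * Math.asin(Math.sqrt(a));
  }
  return total;
}

// Fruit names to look for (a made-up subset).
const FRUITS = { peach: /peach/i, cherry: /cherry/i };

// Map step: called once per tile with (tile layers, options, callback).
function mapTile(tileLayers, opts, done) {
  const result = { totalKm: 0, totalCount: 0, fruits: {} };
  for (const fruit of Object.keys(FRUITS)) result.fruits[fruit] = { km: 0, count: 0 };

  for (const feature of tileLayers[opts.layer].features) {
    const props = feature.properties || {};
    // Only named roads with line geometry; skips cafes, buildings and so on.
    const isNamedRoad = props.highway && props.name &&
      feature.geometry.type === 'LineString';
    if (!isNamedRoad) continue;

    const km = lineLengthKm(feature.geometry.coordinates);
    result.totalKm += km;
    result.totalCount += 1;
    for (const [fruit, pattern] of Object.entries(FRUITS)) {
      if (pattern.test(props.name)) {
        result.fruits[fruit].km += km;
        result.fruits[fruit].count += 1;
      }
    }
  }
  done(null, result);
}

// Hypothetical stand-in for the framework: runs the map function over each
// tile in turn, emitting 'reduce' per result and 'end' once all are done.
function createJob(tiles, mapFn, opts) {
  const job = new EventEmitter();
  job.run = () => {
    for (const tileLayers of tiles) {
      mapFn(tileLayers, opts, (err, result) => {
        if (err) throw err;
        job.emit('reduce', result);
      });
    }
    job.emit('end');
  };
  return job;
}

// Made-up sample tiles: each road runs one degree of latitude north.
const road = (name, highway) => ({
  type: 'Feature',
  properties: { name, highway },
  geometry: { type: 'LineString', coordinates: [[0, 0], [0, 1]] },
});
const cafe = {
  type: 'Feature',
  properties: { name: 'Peach Cafe', amenity: 'cafe' },
  geometry: { type: 'Point', coordinates: [0, 0] },
};
const tiles = [
  { roads: { features: [road('Peachtree Street', 'residential'), road('Main Street', 'primary'), cafe] } },
  { roads: { features: [road('Peachtree Lane', 'residential')] } },
];

// Reduce step: fold each tile's result into the global totals.
const totals = { totalKm: 0, totalCount: 0, fruits: { peach: { km: 0, count: 0 }, cherry: { km: 0, count: 0 } } };
const job = createJob(tiles, mapTile, { layer: 'roads' });
job.on('reduce', (result) => {
  totals.totalKm += result.totalKm;
  totals.totalCount += result.totalCount;
  for (const fruit of Object.keys(result.fruits)) {
    totals.fruits[fruit].km += result.fruits[fruit].km;
    totals.fruits[fruit].count += result.fruits[fruit].count;
  }
});
job.on('end', () => {
  const share = (100 * totals.fruits.peach.count) / totals.totalCount;
  console.log(`${totals.totalCount} roads, ${share.toFixed(1)}% with "peach" in the name`);
});
job.run();
```

The stand-in runs tiles one after another for simplicity; the point of the real framework is that the same map function can be run on many tiles in parallel, because each call only sees its own tile and the reduce handler only sees finished results.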
The job is getting started by walking through the .mbtiles file, and now you can see Tile Reduce really at work, spread across all four cores of my CPU, right away maxing things out. And OK, we're done already. It took about 18 seconds, and you can see there are way too many peaches: 1.2 percent of all ways in Atlanta have peach in their name, which is ridiculous. I should say you can plot the output of this; you don't just have to pass around JavaScript objects with totals. You can pass GeoJSON and construct a geometry layer. So in a slightly edited version of this I can construct geometry which I can put on a map very easily to show where all those Peachtree lanes and streets and roads are, and trust me when I say many of them are connected to each other, which is especially egregious if you ask me. So this is a trivial
example, but a good one. Things
get pretty interesting when you move up to the cloud. If you were looking at that htop display, I've got four cores on this machine. This
is an Amazon c3.8xlarge machine. This is not Amazon's biggest, but it's one that we use a lot. It costs about $1.80 US to run for an hour and gives you 32 cores, which gives you an idea of how cheap it is to scale up this kind of analysis. Things
get really interesting when you move beyond cities. Analyzing a city on a desktop is fine, but what happens when you have a worldwide dataset? This is data from RunKeeper. For those of you who don't know RunKeeper, it's an application for tracking fitness activities like running or biking or swimming that's pretty popular in the US and in some parts of Europe, and one of the options they give their users is to share the data that they capture during their runs as routes that other people can try. So for all the publicly shared routes, we can collect them, we can chop off the beginning and end to preserve people's privacy, so we don't see the houses they're going into, and we plot them on a map like this, which shows the intensity of different exercise routes. So that's a cool visualization. Things get really interesting, though, when
you take something like this and (you're going to have to take my word for it, given the color washout here) the green lines here represent OpenStreetMap geometry, and it's conspicuously missing from this branch here. If we start putting together Turf functions to detect where we're missing geometry between layers, we can figure out where we need to do more mapping, where we need our team of mappers to add to the map, and we can do this on a global basis using Tile Reduce. Here's a stadium that was missing, with people running around it; here are a bunch of coastal areas (people really like to run by the water, as you can see from San Francisco). And we can run this analysis in about an hour using 20 of those instances. That's an incredibly quick analysis of a worldwide geospatial problem. As I mentioned, Turf
is free and open source, and I would encourage you to check it out at turfjs.org or on GitHub. We welcome contributions. This is the current list of
contributors, and I would particularly like to single out Morgan Herlocker, who is the author not only of most of these slides but of most of Turf. If you have accolades or questions for him, I think he'd be glad to hear from you, but I'll be very happy to take your
questions. Thank you.
Thanks for a great presentation. Just an observation, thinking about the distributed computing aspects of using tiles: if you have a really long road that spans several tiles, then the number of objects you're counting may be counted twice.
Yes, that is true. For purposes of this sort of toy demonstration it's not a huge deal, but if we were actually worried about it we could try to disambiguate using the OSM ID. In this particular case we are counting twice, because we're looking at [inaudible], but it's more often the case that we're doing problems like comparing a probe dataset for overlap, and there the tiling works quite nicely.
You mentioned Amazon instances. Is it already set up to run an analysis across a number of machines?
That is an insightful question. There is an additional layer above Tile Reduce that we use for this. Tile Reduce is memory-constrained and designed around a single machine. You can imagine it's not too hard to dish out different bounding boxes, for whatever geometry you want to cover, to an instance that spins itself up and runs. The actual technology for doing that relies on some projects that we're in the process of open sourcing but haven't yet. So I think the short answer is: keep your eyes on the Mapbox blog; we'll have more for you on that if you want to run a global-scale analysis. But in the short term it's easy enough to roll your own if you've got a compute cluster where you want to run stuff.
Thank you.
Hi, thank you for the presentation. You mentioned that Turf is an open source project. Is Tile Reduce open source as well, and under what license?
Yes, I did fail to mention that they're both open source projects. Like most of what we open source, the license is, I think,
the ISC or the MIT license. Thank you. Thank you.