Big data analysis with Tile Reduce and Turf.js

Automated media analysis (beta)

Speech transcript
Let's get going, everybody.
I have to offer my apologies on behalf of my colleague Alex; unfortunately, he was unable to make the trip. There are two tools that I'm hoping to talk to you about today that Mapbox has been investing in for large-scale geospatial analysis, and which I think could be useful in your own workflow. They are Turf and Tile Reduce, and they complement one another. Turf is a modular library for geospatial analysis, and Tile Reduce is a framework for performing geospatial analysis using Turf at very large scale. So I'm going to dive into Turf first, talking through what it can do, and then do some applied examples with Tile Reduce to show how you can actually put all this stuff together.
So, Turf first. First of all, I should say that Turf is not exclusively a Mapbox project. You can find it at turfjs.org; it is an open source project that predates Mapbox's involvement, but we have been investing very heavily in it.
Turf, as you might imagine, is designed to manipulate map data. In this way it's quite similar to existing GIS technologies that you might already be using; I imagine many of you have workflows that involve tools like PostGIS or QGIS. Turf replicates many of the functions those packages offer, but it has two substantial advantages over whatever workflow you might have. First, it speaks GeoJSON natively, both for input and output. This is a default assumption for everything Turf does. We think that GeoJSON is increasingly the lingua franca for open geodata, and it's a fairly safe assumption: I believe a working group is being set up to make GeoJSON an official web standard very shortly, so this is a good bet to make. Turf also provides a number of helper functions to transform data into the kind of GeoJSON data structures it expects. The benefit of working in and out of GeoJSON is that the results of your analysis using Turf can be displayed absolutely everywhere: not only in technologies that have "map" in their name, but also in proprietary solutions and in open third-party projects.
The other major advantage that Turf brings to the table is that it is written in modern JavaScript. There are existing geospatial analysis suites written in JavaScript, but they are often ports from technologies that were not written with modern JavaScript development in mind. Turf is happy to play with technologies like Browserify or Node.js, or with whatever you might be using. I say this with some hesitation, but I think JavaScript is really the future. I'm most comfortable writing Python, as I imagine some people in this room are, but JavaScript offers substantial advantages, both in terms of eking out every last bit of performance from your processor and in terms of portability. That second point is the one I want to emphasize right now: JavaScript runs absolutely everywhere. Obviously it can perform large-scale computation on the server side, in the cloud, but of course it can also run in browsers. It can run comfortably on your laptop (I'll be doing a demo of that later), and at this point the JavaScript interpreters on your mobile phone are also quite capable of crunching real numbers for geo applications. In fact, Turf is running behind these slides.
This should be much, much larger, and I apologize for that. This is an active slippy map showing Turf analysis of 311 calls in San Francisco over the past week. If you don't have 311 in your city, it's a service that a number of cities around the world are adopting, where you can report the need for a city service like trash pickup or removal of graffiti, things like that. There's an open protocol called Open311 that a lot of cities put these requests into and publish, so it makes nice example data. Here Turf is binning these points in real time into the neighborhoods of San Francisco, counting them against that geometry and adjusting the style accordingly. And apologies for the colors on the slides; I'm not sure what's going on with the display.
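The 311 demo is, at its core, a point-in-polygon count: each request is assigned to the neighborhood polygon that contains it. The sketch below shows that idea in plain JavaScript under stated assumptions: GeoJSON Polygon features without holes, [lng, lat] coordinates treated as planar, and a hypothetical `count` output property. It uses the standard ray-casting test and is not Turf's own implementation.

```javascript
// Ray-casting test: a point is inside a closed ring if a ray cast from it
// crosses the ring's edges an odd number of times. Coordinates are planar.
function pointInRing(pt, ring) {
  const [x, y] = pt;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses = (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Returns a copy of `polygons` with the number of contained points stored
// in properties[countProp]. Only the outer ring of each polygon is used.
function countPointsInPolygons(points, polygons, countProp = 'count') {
  return {
    type: 'FeatureCollection',
    features: polygons.features.map((poly) => {
      const ring = poly.geometry.coordinates[0];
      const count = points.features.filter((p) =>
        pointInRing(p.geometry.coordinates, ring)).length;
      return { ...poly, properties: { ...poly.properties, [countProp]: count } };
    }),
  };
}
```

Checking every point against every polygon costs points x polygons tests; that is fine at city scale, and a spatial index is the usual next step for larger inputs.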
So this is a project built around flexibility, not only in terms of where you can run these kinds of analyses but in how the project is structured and how it's administered. It is an open source project, and Turf is completely modular: you can use as much or as little of the library as you'd like, without including a gigantic JS file.
On to a few of its more specific features. These will be pretty familiar to people who do this kind of work: buffering, of course.
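Buffering a single point is the easiest case to sketch: the result is a polygon whose vertices all lie a fixed distance from the center. The minimal sketch below, assuming a spherical Earth of radius 6371 km and [lng, lat] input, computes those vertices with the spherical destination-point formula. It is an illustration only; buffering lines and polygons, as Turf does, needs a full geometry engine. `bufferPoint` and its `steps` parameter are hypothetical names.

```javascript
const EARTH_RADIUS_KM = 6371; // spherical-Earth assumption
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Point reached by travelling distanceKm from [lng, lat] on the given
// bearing (degrees clockwise from north), on a sphere.
function destination([lng, lat], distanceKm, bearingDeg) {
  const d = distanceKm / EARTH_RADIUS_KM; // angular distance in radians
  const brg = toRad(bearingDeg);
  const lat1 = toRad(lat);
  const lng1 = toRad(lng);
  const lat2 = Math.asin(Math.sin(lat1) * Math.cos(d) +
    Math.cos(lat1) * Math.sin(d) * Math.cos(brg));
  const lng2 = lng1 + Math.atan2(Math.sin(brg) * Math.sin(d) * Math.cos(lat1),
    Math.cos(d) - Math.sin(lat1) * Math.sin(lat2));
  return [toDeg(lng2), toDeg(lat2)];
}

// Approximates a circular buffer around `center` as a closed polygon ring
// with `steps` vertices spaced evenly by bearing.
function bufferPoint(center, radiusKm, steps = 64) {
  const ring = [];
  for (let i = 0; i < steps; i++) {
    ring.push(destination(center, radiusKm, (360 * i) / steps));
  }
  ring.push(ring[0]); // GeoJSON rings must be closed
  return {
    type: 'Feature',
    properties: {},
    geometry: { type: 'Polygon', coordinates: [ring] },
  };
}
```

For example, `bufferPoint([-122.4194, 37.7749], 0.5)` returns a 64-sided polygon approximating a 500 m circle around that point.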
This slide shows how you would invoke a buffer call in Turf. Buffering means contracting or extending the extent of a spatial object (a point, a line, or a polygon) by a set amount, and we try to make it easy, with calls that take human-readable units. This is a live demo that, again, is a bit too small to see accurately, but it shows the route of a popular foot race in San Francisco and a dataset of water fountains. I can adjust the buffering very quickly and find the intersections. There's no server-side crunching here; it's just happening in real time in the browser.
Smoothing is another option in Turf, by taking the Bezier curve of a line. Turf also lets us do simplification, with an adjustable tolerance, using the Douglas-Peucker algorithm.
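Simplification with Ramer-Douglas-Peucker can be sketched in a few lines: keep the endpoints, find the vertex farthest from the segment joining them, and recurse on both halves only if that distance exceeds the tolerance. This minimal version treats coordinates as planar, so the tolerance is in coordinate units (degrees for lng/lat input); it is not Turf's code, and `simplify` here is a hypothetical stand-alone function.

```javascript
// Distance from point p to the infinite line through a and b (planar).
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  if (dx === 0 && dy === 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
  return Math.abs(dy * p[0] - dx * p[1] + b[0] * a[1] - b[1] * a[0]) /
    Math.hypot(dx, dy);
}

// Ramer-Douglas-Peucker simplification of an array of [x, y] coordinates.
function simplify(coords, tolerance) {
  if (coords.length < 3) return coords.slice();
  const last = coords.length - 1;
  // Find the interior vertex farthest from the endpoint-to-endpoint segment.
  let maxDist = 0;
  let index = 0;
  for (let i = 1; i < last; i++) {
    const d = perpendicularDistance(coords[i], coords[0], coords[last]);
    if (d > maxDist) {
      maxDist = d;
      index = i;
    }
  }
  // Every vertex is within tolerance: the endpoints alone are enough.
  if (maxDist <= tolerance) return [coords[0], coords[last]];
  // Otherwise split at the farthest vertex and simplify both halves,
  // dropping the duplicated split vertex when joining them.
  const left = simplify(coords.slice(0, index + 1), tolerance);
  const right = simplify(coords.slice(index), tolerance);
  return left.slice(0, -1).concat(right);
}
```

Raising the tolerance removes more vertices, which is the adjustable trade-off between fidelity and size that the demo exposes.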
Contours: if you've got a grid of points with values, Turf will happily calculate contour lines for you using the isolines method, with a given resolution and, as you can see, a range of breaks. Here is a dataset of census population in New York City. The size of the yellow dots represents population per area, and you can see the isolines have been calculated in the browser from that information as I scroll around.
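Turf's isolines method takes scattered points, interpolates them onto a grid, and traces contours through that grid. The sketch below covers only the last step and swaps in marching squares, a standard grid-contouring technique, without claiming it matches Turf's internals. It assumes a regular grid of values (`grid[row][col]`, with x = column and y = row in grid units) and one contour level, and returns unjoined segments.

```javascript
// Marching squares for a single contour level. Returns an array of
// segments [[x1, y1], [x2, y2]] in grid units. Saddle cells (4 crossings)
// are resolved by pairing crossings in edge order, one of the two valid choices.
function contourSegments(grid, level) {
  const segments = [];
  // Linear interpolation of where the level crosses the edge p1 -> p2.
  const crossing = (p1, v1, p2, v2) => {
    const t = (level - v1) / (v2 - v1);
    return [p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1])];
  };
  for (let r = 0; r < grid.length - 1; r++) {
    for (let c = 0; c < grid[r].length - 1; c++) {
      // Cell corners clockwise from top-left, as [position, value].
      const corners = [
        [[c, r], grid[r][c]],
        [[c + 1, r], grid[r][c + 1]],
        [[c + 1, r + 1], grid[r + 1][c + 1]],
        [[c, r + 1], grid[r + 1][c]],
      ];
      const points = [];
      for (let k = 0; k < 4; k++) {
        const [p1, v1] = corners[k];
        const [p2, v2] = corners[(k + 1) % 4];
        // An edge is crossed when its ends lie on opposite sides of the level.
        if ((v1 < level) !== (v2 < level)) points.push(crossing(p1, v1, p2, v2));
      }
      // 2 crossings give one segment; 4 crossings (saddle) give two.
      for (let k = 0; k + 1 < points.length; k += 2) {
        segments.push([points[k], points[k + 1]]);
      }
    }
  }
  return segments;
}
```

Running it once per break value gives one set of segments per contour level; joining segments into polylines and converting grid units back to lng/lat would complete the pipeline.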
And finally, aggregation. Turf is just as capable of doing the sort of statistical analysis based on geometry that you might want to perform; you saw some of that in the 311 example already. Here's another example, again using the water fountain dataset that we had before. Turf lets us generate an arbitrary hex grid (it should be visible here; yeah, it's really hard to see what's purple, sorry), and I can then intersect these points against the grid and calculate the number of water fountains in each cell instantaneously. It can obviously also perform operations like taking the average of the values in each grid cell, or the maximum or minimum: all the basic aggregation functions you'd expect from a suite like this. Of course, that's just the tip of the iceberg. This is the current functionality as of the composition of this slide deck. To be perfectly honest, I don't know the date of that, but the list is expanding rapidly, and as I said, it's an open source project, open to any other functionality you might need.
We sometimes say in these talks that Turf is GIS for the web. I think that's actually an understatement: Turf is GIS for everything. You can run it on pretty much any platform you might want to throw at it, and that is its major advantage, even if you aren't already a JavaScript developer.
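The water-fountain aggregation demo described above boils down to bucketing points into cells and computing per-cell statistics. The following sketch uses square cells rather than Turf's hex grid, because the cell lookup is then a single `Math.floor`; `aggregateByGrid`, the "col,row" cell-key format, and the numeric property name are assumptions for illustration.

```javascript
// Buckets GeoJSON point features into square cells of `cellSize` coordinate
// units and returns per-cell count, mean, min and max of properties[field],
// keyed by "col,row". Empty cells are simply absent from the result.
function aggregateByGrid(points, cellSize, field) {
  const cells = new Map();
  for (const f of points.features) {
    const [x, y] = f.geometry.coordinates;
    const key = `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;
    const v = f.properties[field];
    const cell = cells.get(key) || { count: 0, sum: 0, min: Infinity, max: -Infinity };
    cell.count += 1;
    cell.sum += v;
    cell.min = Math.min(cell.min, v);
    cell.max = Math.max(cell.max, v);
    cells.set(key, cell);
  }
  const result = {};
  for (const [key, c] of cells) {
    result[key] = { count: c.count, mean: c.sum / c.count, min: c.min, max: c.max };
  }
  return result;
}
```

Counting fountains per cell is just the `count` field; `mean`, `min`, and `max` correspond to the other aggregation functions mentioned in the talk.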
You can write your analysis once and run it everywhere, without worrying about multiple implementations, multiple code surfaces, and multiple areas for bugs to pop up. You just need to spend your time thinking about the problem once, implementing it, and throwing an appropriate amount of hardware at it, whether that's a mobile phone or a multi-core compute cluster.
So let me talk a little about the other major advantage of this JavaScript implementation, which is the amount of processing power it can bring to the table. The examples I've shown so far run comfortably in Chrome, which is what's showing the slides, but we can go beyond that.
Here's an example of the kind of analysis you probably wouldn't want to run in the browser, though you could. This is an outline of the counties of the United States, and this is a fairly high-resolution GeoJSON file of the paths of tornadoes in the United States, maintained by the US Geological Survey: every one they have on record since they started keeping records. It's 50 megs, probably more than you'd want to push over the wire into a browser. But using Turf, we can very easily take the midpoint of each one of those paths, find where those points intersect the county GeoJSON, and normalize by the total surface area of each polygon, producing an accurate analysis of where tornadoes tend to happen in the United States. This takes about 2 seconds to run on a laptop like this one. It's a decent chunk of data, and that's unoptimized. But we can also move beyond that.
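The tornado analysis chains three steps: reduce each track to a single point, test which county contains it, and divide the count by the county's area. A minimal sketch follows, assuming LineString tracks, hole-free Polygon counties with a hypothetical `name` property, and a spherical Earth. It uses the haversine distance, ray casting, and Chamberlain and Duquette's spherical polygon-area approximation; it is not the code from the talk.

```javascript
const R_KM = 6371; // spherical-Earth radius
const rad = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two [lng, lat] points (haversine formula).
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const a = Math.sin(rad(lat2 - lat1) / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(rad(lng2 - lng1) / 2) ** 2;
  return 2 * R_KM * Math.asin(Math.sqrt(a));
}

// Point halfway along a track by distance, interpolating linearly in
// lng/lat inside the segment that contains the halfway mark.
function halfwayPoint(coords) {
  const seg = [];
  for (let i = 1; i < coords.length; i++) seg.push(haversineKm(coords[i - 1], coords[i]));
  let remaining = seg.reduce((sum, d) => sum + d, 0) / 2;
  for (let i = 0; i < seg.length; i++) {
    if (seg[i] > 0 && remaining <= seg[i]) {
      const t = remaining / seg[i];
      const [a, b] = [coords[i], coords[i + 1]];
      return [a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])];
    }
    remaining -= seg[i];
  }
  return coords[coords.length - 1];
}

// Ray-casting point-in-polygon test against a closed ring (planar).
function pointInRing([x, y], ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    if ((yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) inside = !inside;
  }
  return inside;
}

// Approximate spherical area of a closed ring in km^2 (Chamberlain & Duquette):
// |sum of (lng2 - lng1) * (2 + sin lat1 + sin lat2)| * R^2 / 2, angles in radians.
function ringAreaKm2(ring) {
  let sum = 0;
  for (let i = 0; i < ring.length - 1; i++) {
    const [lng1, lat1] = ring[i];
    const [lng2, lat2] = ring[i + 1];
    sum += rad(lng2 - lng1) * (2 + Math.sin(rad(lat1)) + Math.sin(rad(lat2)));
  }
  return Math.abs((sum * R_KM * R_KM) / 2);
}

// Tornado midpoints per county, normalized per 1000 km^2.
function tornadoDensity(tracks, counties) {
  const mids = tracks.features.map((t) => halfwayPoint(t.geometry.coordinates));
  return counties.features.map((county) => {
    const ring = county.geometry.coordinates[0];
    const count = mids.filter((p) => pointInRing(p, ring)).length;
    return { name: county.properties.name, count, perThousandKm2: (1000 * count) / ringAreaKm2(ring) };
  });
}
```

Normalizing by area matters because large counties would otherwise dominate the raw counts.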
We can start really taking advantage of every bit of computing power that's present on a machine like this, or anywhere else.
And this is where Tile Reduce comes in. As you might guess from the name, Tile Reduce is a MapReduce tool. For those of you not familiar with the MapReduce concept, it's a way of thinking about parallelizing computing problems: a very large problem is mapped into a repeated task that can be distributed across a large number of nodes, which is the map step, and then there's a reduce step, where the outputs of that expensive computation are combined into one answer or set of answers. This is how Google solves a lot of problems. Tile Reduce works the same way, except that the map phase is tied to individual tiles being processed, and in this case I'm talking about vector tiles. I should pause for a moment to explain those as well. I'm sure a lot of people here are already familiar with the vector tile concept, but for those who aren't, it's an arrangement of geodata that uses XYZ indexing just like raster tiles, but instead of a JPEG, each tile holds the underlying geometry that would normally be used to render those images, packed in an extremely efficient manner into a binary format. What this means is that you can serve vector tiles to a client, and the client can draw the tiles itself, on a handset or in the browser or wherever, and that's great. But it also preserves the source data, so you can run geospatial analyses on a vector tile set, and that's what Tile Reduce is all about.
Now, where would you get a vector tile dataset, you might ask? There are a bunch of places, and of course you can create your own. As a service, Mapbox generates a planet-wide version of OpenStreetMap every night in vector tile format, and you can download it from the site: about 30 gigs compressed, 45 uncompressed. That's a lot, but it's doable for any modern laptop. The actual format this comes in is an MBTiles file, which is a SQLite database that lets us pack the tiles together very quickly and space-efficiently. You can also divide the tiles out individually and serve them over the network, and Tile Reduce can read vector tiles across the network, but if you want to do a large-scale geoprocessing job, you probably want to have the data locally, just because otherwise I/O is what's going to take the most time.
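Tile Reduce's map phase runs once per XYZ tile, so it helps to see how a location maps to a tile. The sketch below uses the standard Web Mercator (OSM "slippy map") tile-numbering formula to find the tile containing a [lng, lat] point at zoom z and to list the tiles covering a bounding box given as [west, south, east, north]. The function names are hypothetical, and latitudes are assumed to lie within the Web Mercator range of roughly ±85.05°.

```javascript
// Tile containing a [lng, lat] point at zoom z, returned as [x, y, z].
function lngLatToTile(lng, lat, z) {
  const n = 2 ** z; // tiles per side at this zoom
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lng + 180) / 360) * n);
  const y = Math.floor(((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n);
  // lng = 180 or the exact southern edge would index one past the last tile.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return [clamp(x), clamp(y), z];
}

// All tiles at zoom z that intersect a [west, south, east, north] bbox.
// Tile y grows southwards, so the north-west corner gives the minimum x and y.
function tilesCoveringBbox([west, south, east, north], z) {
  const [minX, minY] = lngLatToTile(west, north, z);
  const [maxX, maxY] = lngLatToTile(east, south, z);
  const tiles = [];
  for (let x = minX; x <= maxX; x++) {
    for (let y = minY; y <= maxY; y++) tiles.push([x, y, z]);
  }
  return tiles;
}
```

At zoom 15, the level used later in the talk, the world is 2^15 = 32,768 tiles on a side, so even a city-sized bounding box expands into many independent map tasks.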
So let's show an example of how Tile Reduce works in practice. The example I've implemented here is based on a personal experience I had a number of years ago, going to a friend's wedding in Atlanta, Georgia. I don't know how many of you have been to Atlanta, but they are very proud of their signature crop, which is peaches, and I did not appreciate before going that they name a lot of things after peaches. I went to the hotel that I thought was the right one, on Peachtree-something Street or whatever it was, and it took me long enough (certainly long enough to watch the taxi pull away) to realize that this hotel was much too small and much too full of high-school volleyball players to possibly be the one I wanted for the wedding. So I found myself sprinting across a highway at about 2 AM to get to the one I wanted; this was before Uber, and there were no taxis coming. I've held a grudge against Atlanta ever since, and I've never been able to quantify, in statistical terms, how horrible the naming scheme there is, so I thought this was a nice opportunity to do it. I want to demonstrate this live, so I've started a job running here while I run through what's involved.
Let's take a look at two files. First, the map phase: this is the task that gets repeated again and again on a per-tile basis. Hopefully this is a bit more legible. This is the Node.js file, and for those of you who haven't written Node, it's not as terrifying as it might look at first; a lot of it we're going to ignore, but I wanted to have a working example for you, so I'll walk through it really quickly. The first thing we do is include the Turf library; that's probably familiar to anyone who's done any programming. The second thing we do is export the function that's going to be doing the work. This is just a way of making sure that Tile Reduce knows where to find the function that performs the actual operations, and that function will always have the same three parameters. The first one is called tileLayers; that's where the geodata comes in. The second is called options; that's where configuration for the entire job lives. The third one is the callback, the function that we need to call when our work is done. It's a really common pattern in JavaScript programming, if you haven't seen it before; it allows for parallelized, very fast asynchronous execution.
In the guts of this function we do a couple of things. I should clarify our task first: we're going to look at every road in the tile, calculate its length, and figure out whether its name matches one of a series of fruits; if it does, we record the increase in that fruit's total road count and total road length. So we initialize variables to keep track of the number of kilometers and the total count, and set up regular expressions to match these fruits. Then we loop through every feature in the tile layer that we've been served. Some of this, like the layer names, is specific to this being OpenStreetMap source data, so you can pretty much ignore it. We check a few of the properties to make sure the feature has been tagged as a highway, because this data includes everything in OpenStreetMap: cafés, building footprints, and so on. So there are a few checks to make sure that we're looking at a road, that it contains line geometry, and that it has a name, because obviously checking for the name of a fruit when there's no name would be pointless. We calculate the length of the road very easily using turf.lineDistance, keep track of the total kilometers and total count, and then iterate through each of the fruits, testing it against the name and updating the totals. When we're done, we pass the object that we've been using to tally the results back to the callback. It's pretty simple.
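Below is a minimal sketch of a per-tile map function along the lines the talk describes. Several details are assumptions, not Tile Reduce's documented contract: the tile arrives as `tileLayers.osm.features` in GeoJSON form, roads carry OpenStreetMap `highway` and `name` tags, the callback is called as `done(err, result)`, and the fruit list is made up. Road length uses the haversine formula in place of `turf.lineDistance` so the block runs without dependencies, and only plain LineString roads are measured.

```javascript
// Hypothetical fruit patterns; no `g` flag, so RegExp.test keeps no state.
const FRUITS = { peach: /peach/i, cherry: /cherry/i, apple: /apple/i };

// Haversine length of a LineString in kilometres (spherical Earth, R = 6371 km).
function lineLengthKm(coords) {
  const R = 6371;
  const rad = (deg) => (deg * Math.PI) / 180;
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    const [lng1, lat1] = coords[i - 1];
    const [lng2, lat2] = coords[i];
    const a = Math.sin(rad(lat2 - lat1) / 2) ** 2 +
      Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(rad(lng2 - lng1) / 2) ** 2;
    total += 2 * R * Math.asin(Math.sqrt(a));
  }
  return total;
}

// Per-tile map step: tallies all named roads, plus, per fruit, the roads
// whose names match it. `options` is unused here (job-wide configuration).
function mapTile(tileLayers, options, done) {
  const totals = { roads: 0, km: 0, fruits: {} };
  for (const fruit of Object.keys(FRUITS)) totals.fruits[fruit] = { roads: 0, km: 0 };

  for (const feature of tileLayers.osm.features) {
    const props = feature.properties || {};
    // OSM data also holds cafes, building footprints, etc.: keep only
    // named highways with plain LineString geometry.
    if (!props.highway || !props.name) continue;
    if (!feature.geometry || feature.geometry.type !== 'LineString') continue;

    const km = lineLengthKm(feature.geometry.coordinates);
    totals.roads += 1;
    totals.km += km;
    for (const [fruit, pattern] of Object.entries(FRUITS)) {
      if (pattern.test(props.name)) {
        totals.fruits[fruit].roads += 1;
        totals.fruits[fruit].km += km;
      }
    }
  }
  done(null, totals); // hand this tile's tally to the reduce side
}
```

Keeping the per-tile result a small plain object means only totals, not geometry, travel from the workers back to the reducer.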
Now let's take a look at the file where the reducing happens. By convention that's index.js, and this file is arranged similarly. We include a couple of libraries: Tile Reduce, which of course is what we're focused on, and sprintf, which is just a string-formatting convenience library. Then we define the options. The map entry tells Tile Reduce where to find the map function that we just walked through. The layers entry gives it information about where to find the source data; here, that's just the location of the MBTiles file. And zoom: this particular MBTiles file is built at zoom level 15, which is a good zoom level for this kind of analysis. You can run these analyses at whatever zoom level you want, but 15 is the right number for this particular source. We define a couple of bounding boxes, because I wanted to do a comparison across cities, and then you can see that we instantiate the Tile Reduce job using the Washington, DC bounding box first and the options from above. There are just a few things left to do. I define two events for Tile Reduce to pay attention to. The first is reduce: when one of those per-tile jobs finishes and passes back the totals it has calculated, this is what catches the result and adds it to our global totals. The second is end: when absolutely everything is done, this code runs. It's much more complicated than it needs to be, but I wanted to put emoji up on screen for you guys, so I went ahead and did that. And the last thing we do, on line 145, is kick off the job.
What you can see, from a run a while ago, is Tile Reduce producing the results for Washington, DC. It looks like we got 23 roads named cherry-something: Cherry Hill, Cherrydale, Cherry Lane, whatever. That took about 41 seconds to run on this machine. Let me adjust this really quickly to Atlanta and run it again. One thing I want to point out right away is the asynchronous, parallelized nature of this. This is a CPU activity display: the job is getting started by walking through the MBTiles file, and you can see the real work spread across all four cores of my CPU right away, maxing things out. OK, and we're done already: that took about 18 seconds, and you can see there are way too many peaches. 1.2 percent of roads in Atlanta have "peach" in their name, which is ridiculous.
I should say you can also plot the output of this. You don't just have to pass back JavaScript objects with totals; you can pass GeoJSON and construct a geometry layer. So in a slightly edited version of this, I can construct geometry that I can put on a map very easily, to show where all those Peachtree Lanes and Streets and Roads are, and trust me when I say many of them are connected to each other, which is especially egregious if you ask me. So this is a trivial example, but a good one.
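The reduce side can be sketched as two plain functions: one folds each tile's totals into running global totals as results arrive, and one formats the final shares once everything has finished, mirroring the `reduce` and `end` events described in the talk. The per-tile result shape `{ roads, km, fruits: { name: { roads, km } } }` is an assumption made for this sketch, and neither function is part of Tile Reduce's API.

```javascript
// Folds one tile's result into the accumulator (mutates and returns acc).
function reduceTotals(acc, tile) {
  acc.roads += tile.roads;
  acc.km += tile.km;
  for (const [fruit, t] of Object.entries(tile.fruits)) {
    const f = acc.fruits[fruit] || (acc.fruits[fruit] = { roads: 0, km: 0 });
    f.roads += t.roads;
    f.km += t.km;
  }
  return acc;
}

// Formats each fruit's share of all named roads, once every tile has reported.
function summarize(acc) {
  const lines = [];
  for (const [fruit, f] of Object.entries(acc.fruits)) {
    const pct = acc.roads ? (100 * f.roads) / acc.roads : 0;
    lines.push(`${fruit}: ${f.roads} roads, ${f.km.toFixed(1)} km, ${pct.toFixed(1)}% of named roads`);
  }
  return lines;
}
```

With an array of per-tile results, `tileResults.reduce(reduceTotals, { roads: 0, km: 0, fruits: {} })` produces the global totals, and `summarize` turns them into report lines. Because addition is order-independent, tiles can report in whatever order the parallel workers finish.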
Things get pretty interesting when you move up to the cloud. You saw on that htop display that I've got four cores on this machine. This is an Amazon c3.8xlarge machine. It's not Amazon's biggest, but it's one that we use a lot. It costs about 80 cents US to run for an hour and gives you 32 cores, which gives you an idea of how cheap it is to scale up this kind of analysis.
Things get really interesting when you move beyond a city. Analyzing a city on a desktop is fine, but what happens when you have a worldwide dataset? This is data from RunKeeper. For those of you who don't know RunKeeper, it's an application for tracking fitness activities like running, biking, or swimming that's pretty popular in the US and in some parts of Europe. One of the options it gives users is to share the data they capture during their runs as routes that other people can try. So for all the publicly shared routes we can collect, we chop off the beginning and end to preserve people's privacy, so that we don't see whose houses they're going into, and we plot them on a map like this, which shows the intensity of different exercise routes. That's a cool visualization, but things get really interesting when you take something like this and compare it with OpenStreetMap. You're going to have to take my word for it, given the colors here, but the green lines represent OpenStreetMap geometry, and one is conspicuously missing from this branch. If we start putting together Turf functions to detect where geometry is missing between layers, we can figure out where we need to do more mapping, where we need our team of mappers to add to the map, and we can do that on a global basis using Tile Reduce. Here's a stadium that was missing, with people running around it, and a bunch of coastal areas: people really like to run by the water, as you can see in San Francisco. We can run this analysis in about an hour using 20 of those instances. That's an incredibly quick analysis of a worldwide geospatial problem.
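One simple way to flag missing geometry is to look for route points that are far from every mapped road. The sketch below, assuming routes and roads as arrays of [lng, lat] coordinates and a threshold in metres, measures point-to-segment distance in a local equirectangular projection centred on each point, which is adequate at street scale. `unmatchedPoints` and its inputs are hypothetical; the talk does not say which Turf functions were combined.

```javascript
const EARTH_R_M = 6371000; // spherical-Earth radius in metres

// Local equirectangular projection to metres, scaled at latitude lat0.
function project([lng, lat], lat0) {
  const k = Math.PI / 180;
  return [EARTH_R_M * lng * k * Math.cos(lat0 * k), EARTH_R_M * lat * k];
}

// Distance in metres from point p to segment ab (all [lng, lat]).
function pointSegmentDistanceM(p, a, b) {
  const lat0 = p[1];
  const [px, py] = project(p, lat0);
  const [ax, ay] = project(a, lat0);
  const [bx, by] = project(b, lat0);
  const dx = bx - ax;
  const dy = by - ay;
  const len2 = dx * dx + dy * dy;
  // Parameter of the closest point on ab, clamped to the segment.
  const t = len2 === 0 ? 0 : Math.max(0, Math.min(1, ((px - ax) * dx + (py - ay) * dy) / len2));
  return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
}

// Route points farther than thresholdM from every segment of every road.
function unmatchedPoints(route, roads, thresholdM) {
  return route.filter((p) => roads.every((road) => {
    for (let i = 1; i < road.length; i++) {
      if (pointSegmentDistanceM(p, road[i - 1], road[i]) <= thresholdM) return false;
    }
    return true;
  }));
}
```

In a Tile Reduce job this check would run per tile, with the unmatched points passed back as GeoJSON so they can be put on a map for mappers to review.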
As I mentioned, Turf is free and open source, and I would encourage you to check it out on GitHub; we welcome contributions. Here is a current list of contributors, and I want to use this slide to single out Morgan Herlocker, who is the author not of most of the slides but of most of Turf. If you have accolades or questions for him, I think he'd be happy to hear from you. But I'll be very happy to take your
questions. Thank you.
Question: Thanks for a great presentation. One observation: I was just thinking about the distributed computing aspects of using Tile Reduce. If you have a really long road that stretches across several tiles, you may end up counting that object twice.
Answer: Yes, that is true. For the purposes of this sort of toy demonstration it's not a huge deal, but if we were actually worried about it, we could disambiguate using the OSM ID. In this particular case we are counting twice, but it's more often the case that we're doing problems like comparing a probe dataset for overlap, and there it works quite nicely.
Question: You mentioned using a number of cloud instances. How do you run the analysis across a large number of machines?
Answer: That is an insightful question. There is an additional layer above Tile Reduce that we use for this; Tile Reduce itself is memory-constrained and designed around a single machine. You can imagine it's not too hard to dish out different bounding boxes, for whatever geometry you want to cover, to instances that spin themselves up and run on them. The actual technology for doing that relies on some projects that we're in the process of open sourcing but haven't yet, so the short answer is: keep your eyes on the Mapbox blog; we'll have more for you on that if you want to run a global-scale analysis. In the short term, it's easy enough to roll your own if you've got a compute cluster where you want to run this stuff.
Question: Hi, thank you. [Partly inaudible; the question concerned Turf's open source background and the licenses of Turf and Tile Reduce.]
Answer: Yes, I failed to mention that they're both open source projects. Like most of our open source work, they're under the ISC or MIT license.
Thank you.
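The double-counting issue raised in the first question can be handled in the reduce step by counting distinct OpenStreetMap IDs instead of features. A minimal sketch, assuming each per-tile result carries the IDs of its matching roads in a hypothetical `ids` array:

```javascript
// A road clipped into several tiles is reported once per tile, so count
// distinct OSM IDs across all per-tile results instead of raw features.
function countRoads(tileResults) {
  const seen = new Set();
  let raw = 0;
  for (const result of tileResults) {
    raw += result.ids.length; // naive count: one per tile appearance
    for (const id of result.ids) seen.add(id);
  }
  return { raw, distinct: seen.size };
}
```

Summed lengths need separate care: each tile holds only the clipped part of a road, although vector tiles often include a small buffer beyond the tile edge, so clipped pieces can overlap slightly.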

Metadata

Formal metadata

Title: Big data analysis with Tile Reduce and Turf.js
Series title: FOSS4G Seoul 2015
Author: Lee, Tom
License: CC Attribution - NonCommercial - ShareAlike 3.0 Germany:
You may use, modify, and reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
DOI: 10.5446/32040
Publisher: FOSS4G
Year of publication: 2015
Language: English
Producer: FOSS4G KOREA
Production year: 2015
Production location: Seoul, South Korea

Content metadata

Subject: Computer science
Abstract: Tile Reduce is a new open source MapReduce framework for analyzing massive geodata. Tile Reduce is a tile analysis framework built on the JavaScript GIS library Turf.js. It runs on your local computer or in the AWS cloud and scales to run thousands of processors in parallel. At Mapbox we use Tile Reduce to detect issues in global street vector data like OpenStreetMap, for data comparison, and for data conflation. This talk will walk through the architecture of Tile Reduce and highlight its advantages, limitations, and future developments.
