
Finding the Where in Big Fuzzy Data


Speech Transcript (automated)
All right, how's everyone doing? Is this anyone's first FOSS4G? Awesome, welcome. It's the last talk of the day, so thanks for sticking around. I'll get started; we don't have long, so I'll use the time wisely. I'm Andrew Turner, CTO of Esri R&D DC. I enjoy having a fully acronymic title. We do a lot of big data work there, and open source, along with other projects that relate to this whole process of analyzing data, and hopefully by the end of this talk it will all make a little more sense.
What's interesting is what's going on now in the big data world in general. There's a book by Anthony Townsend about smart cities, and it tells an intuitive story about the growth of cities. Living in an urban environment, things happen densely and fast. During the end of the 19th century there was vast growth in how fast people were moving into cities, and the urban share of the population has since crossed over 50%. Prior to the 1840s to 1860s there were fewer than 2 million people living in cities, only about 10% of the world, but by 1920 there were 50 million people in cities. A huge boom of people moving into cities and urban environments: dense, packed areas, with people immigrating and moving around a lot, hard to track.
In the US we have, as a constitutional mandate of our government, a decadal census: every 10 years, count all the people, as best we can measure. By 1880, with everyone moving into cities here in the US, it took seven years to calculate all the results, six or seven years, and then three years later you start the next one and it happens all over again. They estimated the 1890 census covered too many people and would take too long; they wouldn't have published the results of the 1890 census until after the next census had already taken place. So essentially it was an unbounded problem they couldn't solve: people were building and filling cities faster than we could count them, and moving too fast.
So a young, enterprising census clerk named Herman Hollerith saw an opportunity. He borrowed an idea from looms, which used these cool things called punch cards to program a pattern, and did the same thing for tabulating the census. Each punch card recorded how many people were in your household, what race you were, where you came from. You put the card in the machine and pulled the handle, and underneath were these little cups of mercury with pins that dipped down; wherever a pin passed through a hole in the card it closed the circuit and moved a dial forward. You loaded the cards and pulled the lever and kept doing that, and when a counter reached 99,999 and rolled back to 0, you wrote down the number, posted a 0, and carried on. It was so effective that the 1890 census, which they thought would take 10 to 15 years to calculate, was publishing results for certain cities within two months, and for the entire US within two years. That is kind of the beginning of big data: more data than you can handle, so you have to start automating the processing. Hollerith went on to found the Tabulating Machine Company, which became IBM. That is how IBM started, with these little mercury cups and punch cards.
So, I'm from Esri R&D DC. Our office is based there; we're local to the government, building tools that are actually used to help solve these important problems. Technically we're based in Virginia, but it's close enough. We're really trying to help make tools more accessible and understandable to solve these problems, and a lot of it is becoming open source. I'll explain why, for big data in particular.
Big data is what's happening now with this explosion of data and information: how do we actually dig into it in meaningful ways to discover answers to problems before it's too late?
The common concept with big data is the three Vs. It's a kind of simple, overly simplistic view: you have huge volumes, or it's moving too fast, or you have a lot of different heterogeneous data types. That's all true, and it varies widely, from a field in a spreadsheet to things like the Internet of Things.
We have bigger problems coming. What happens when every single vehicle, and your car already has thousands of sensors in it, starts publishing all that data? When every light post, street corner, intersection, and building window starts transmitting data? It's huge.
What really happened is that we now have a ubiquitous global network over which we can push anything anywhere in the world in milliseconds, and hard drives and computers became amazingly cheap and commodity. So the data arrives as lots and lots of small packets of information, but it's actually still hard to move these huge volumes of data once they accumulate. We became data hoarders: capture everything, and figure out later which of it is useful.
What big data really means now is that you stop moving the data. It's something you capture and keep at rest, and then you start throwing algorithms at it, and the algorithms are very small by comparison.
Think about the typical GIS workflow before, for anyone here who has been doing geo calculations for a long time: you downloaded the data, ran it locally or loaded it into your own database on a server, and then processed it. That was fine if you wrote a lot of code against data that was relatively small. But now the data would take too long just to pipe down to your server. Instead you say: the data is big, my functions are small, so move my functions to where the data resides. That is one of the key principles: stop moving your data around, and push your algorithms to the data. I'll talk about tools that do that.
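That "ship the function, not the data" principle is essentially map and reduce. Here is a tiny local simulation of the idea (plain Python, with hypothetical county names as the data):

```python
from collections import Counter

def map_phase(partition):
    """Map: runs where each data partition resides, emitting compact
    (key, 1) pairs instead of shipping raw records over the network."""
    return [(county, 1) for county in partition]

def reduce_phase(mapped):
    """Reduce: combine the small intermediate results into final counts."""
    counts = Counter()
    for key, n in mapped:
        counts[key] += n
    return dict(counts)

# Two "servers", each holding a shard of the data locally.
shard_a = ["Multnomah", "Clackamas", "Multnomah"]
shard_b = ["Washington", "Multnomah"]

# Only the compact mapped output moves, not the raw data.
partials = map_phase(shard_a) + map_phase(shard_b)
print(reduce_phase(partials))  # {'Multnomah': 3, 'Clackamas': 1, 'Washington': 1}
```

In a real cluster the map phase runs on the machines holding each shard; only the tiny intermediate pairs travel to the reducers.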
The other piece is open source, something I think everyone here likes. We've seen its power over our careers; it's our passion and belief. What's interesting here, specific to big data, is that this is a new kind of domain: how do we actually analyze these things using non-traditional methods?
It's like Legos. What open source means is that I can discover and try out new ideas that were never imagined before. In the Unix philosophy you have lots of little modules that you glue together to do things, combining them in ways the original authors could never have imagined. Now, how do I do that across vast numbers of machines: pipe and play with it, try an idea, see a result in a few seconds, and say, I like that, now run it forever? I'll show examples of how that works. And as a tool builder, you can't build the one tool anymore that solves every problem. Every problem now is unique: the data you need to pursue is unique, the heterogeneity is different, the volumes are different, the velocities are different. So we need to enable developers, and the end users themselves, as much as possible, to put their own intelligence against that data.
So, there are about three different types of big data processing we think about, a kind of framework for the different methodologies and the tools we provide. The traditional one is batch processing: taking what you would have run on a desktop and running it across lots of machines, wrapping up the processing and analysis, and then visualizing the result, or getting a number and making a decision or taking an action on it. MapReduce is one type of batch processing. It's pretty good, but it can be very slow: you kick it off, you wait, and you don't know whether your answer is right or wrong until the process finishes, every time. Stream processing is becoming much more interesting: you have tens or hundreds of thousands of features per second, and you want to know when something interesting happens as the stream goes by, and then learn based on that. I don't want to watch the stream; let me know when it crosses a certain threshold. The third kind of analysis is search and discovery: I have an idea of the shape of the needle I'm looking for in my haystack, the general kind of problem, so let me know whenever any of these things cross those thresholds. Essentially you ask a number of questions up front and get notified whenever the answers show up.
free that kind of emerging as is become a terminology capture listings together this land architecture on what is in these 2 sides you of the uh the left side is such stream processing engine ominous or go through a couple tool that do this where you're watching the data as streams by armed Europe you're running various aggregate statistics I you're looking for moving window is on and its caption for general learning of of what's going on in once in his crosses us something happened there was more crime just happen a neighborhood than I would've expected that that kicks off about crossing jobs and how this happened what's happened the last day that's now I need a process to understand what this is so it's applying these 2 things that are in synchronous to Western crossings just in your general learning here something happened and about pressing helps to understand why it happened so and then
Here is a visualization, kind of a mesmerizing view, but you can start doing things like just watching all the data stream through: are my algorithms doing what they need to be doing, alerting me when, in this case, a storm is about to form or hurricanes are about to get violent? You visualize it not for someone to sit and watch, but so it tells me when a certain threshold is crossed. This one is built on a canvas-based globe visualization we've built. Anyway, my point is: how do we make all these tools available so people can put their unique knowledge against them? So we've open sourced a set of tools that we call GIS Tools for Hadoop. It's really a set of tools that lets you go do these different kinds of batch, streaming, and search-and-discovery learning mechanisms.
Applying these tools means working against the common frameworks, and in some cases building pieces ourselves. Here is the ecosystem of tools we're working with and helping build out; if you haven't seen them, I recommend diving in and checking them out. Hadoop came out of Yahoo originally as an open source MapReduce framework. It's pretty well established; its success is why the last five years of big data have been almost synonymous with doing Hadoop, but it's not the only answer. There are really powerful open source tools on top of Hadoop: Wukong is a really nice Ruby wrapper that lets you prototype a job on small samples, Pig is another kind of query interface, and Hive lets you write SQL-like queries. There are also several really nice big databases that we're helping spatially enable, and Kafka and Storm are stream processing engines; I'll show an actual application example using those for stream processing, since we're spatially enabling these engines too. Elasticsearch is essentially the new big thing for search: Lucene and Solr have now grown up into this Elasticsearch engine, which has some really advanced spatial capabilities in it, everything from basic queries of data and features across billions of records very easily, to even alerting when a certain threshold is crossed, asking whether something is significant in an area, with analytics built into the engine. And Apache Spark is coming up to be kind of the new in-memory stream and micro-batch engine, batch processing in tens or hundreds of milliseconds, so it looks like it may replace MapReduce for many things. Again, I'm kind of blowing by these ideas; there are links at the end you can follow.
The core of this, something that is kind of amazing for Esri and why we're here, is that we took one core Java library, the Esri Geometry API (we're not great at naming, but it is the Java engine for doing spatial processing), and open sourced it. It's akin to JTS, which is probably still the best-known open source geometry engine out there, but JTS is under a copyleft license, the sort-of viral kind (that's changing, I believe, which is also encouraging), while ours is under the Apache License. So take it, use it, everyone: with that permissive license you can even use it in commercial products. Go forth and apply open source against the problems you work on. Our tools are going to build on it, and we also want everybody to be able to apply their unique ideas and concepts against it. We want to enable that, to really get to the best answers possible.
If you've used any other geometry engine, this will feel very familiar. It's really full-featured: it handles all the numerous geometry types, topological operations, and relational combinations between them, all built natively into the library.
It imports and exports different formats, GeoJSON, Esri JSON, WKT, so it handles all kinds of serialization. Here is what that looks like and how easy it is, especially when it's wrapped in a REPL rather than writing plain Java, for presentation purposes. The idea is that with the Java library you can take in JSON, convert it to a native geometry object, then dump it back out in another format, or apply spatial processing to it; I'll show an example of that in a few slides. It gives you the low-level operations that you then keep wrapping up to do higher-level operations. The library also has validation, to make sure you have good geometries and closed loops.
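As a rough illustration of that round trip, text in, native object, operate, text out, here is a hand-rolled GeoJSON example using only Python's standard library (a stand-in sketch, not the Geometry API itself; the coordinates and the "shift" operation are arbitrary):

```python
import json

# Text -> native object.
geojson = '{"type": "Point", "coordinates": [-77.0, 38.9]}'
geom = json.loads(geojson)

# A toy "spatial operation": shift the point one degree east.
x, y = geom["coordinates"]
shifted = {"type": "Point", "coordinates": [x + 1.0, y]}

# Native object -> text again, ready for another system.
out = json.dumps(shifted)
print(out)  # {"type": "Point", "coordinates": [-76.0, 38.9]}
```

The real library does the same dance with efficient native geometry objects and supports several text and binary formats on both ends.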
There are all the other operations too: boundaries, buffers, clips, all my basic operations, and then even quadtrees, which become really important for doing large-scale distributed spatial processing. What that looks like: I can build a quadtree, push objects into it, and then I have this index that I can use in memory, or serialize out, and run queries against. These are the little pieces, the base of the Legos, that you need to start building things around. And we have wrappers around it, MapReduce wrappers and Python wrappers, to make it easy to use from different languages, and it's very fast.
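A minimal quadtree along those lines might look like this (a from-scratch teaching sketch, not the Geometry API's implementation; capacity and coordinates are arbitrary):

```python
class QuadTree:
    """Minimal point quadtree: each node covers a bounding box and
    splits into four children once it holds more than `capacity` points."""
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity
        self.points = []
        self.children = None

    @staticmethod
    def _contains(box, x, y):
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    def insert(self, x, y):
        if not self._contains(self.box, x, y):
            return False
        if self.children is None:
            self.points.append((x, y))
            if len(self.points) > self.capacity:
                self._split()
            return True
        return any(c.insert(x, y) for c in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [QuadTree(x0, y0, mx, my, self.capacity),
                         QuadTree(mx, y0, x1, my, self.capacity),
                         QuadTree(x0, my, mx, y1, self.capacity),
                         QuadTree(mx, my, x1, y1, self.capacity)]
        for p in self.points:  # redistribute points into the children
            any(c.insert(*p) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return points in the query box, pruning subtrees that miss it."""
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return []
        hits = [p for p in self.points
                if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        if self.children:
            for c in self.children:
                hits.extend(c.query(qx0, qy0, qx1, qy1))
        return hits

tree = QuadTree(0, 0, 100, 100, capacity=2)
for pt in [(10, 10), (20, 20), (80, 80), (85, 90), (90, 85)]:
    tree.insert(*pt)
print(sorted(tree.query(75, 75, 100, 100)))  # [(80, 80), (85, 90), (90, 85)]
```

The point of the structure is the pruning in `query`: a lookup only descends into the quadrants that overlap the search box, which is what makes the distributed joins described later cheap.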
Most people don't really want to work at that low level; they want to work at higher abstractions. So we've also open sourced and built tools above that, to let you do more familiar, high-level processing across these data. We have the Geometry API in Java: if you want to build your own MapReduce jobs, great, do that, but if you don't, it can be a steep learning curve. Above that is Hive, a framework that gives you SQL-like queries: you write SQL, and it turns it into MapReduce jobs; we extended Hive to have spatial operations in the queries. On top of that are the GIS Tools for Hadoop, which include a lot of different samples, examples, and patterns. And then for ArcGIS itself we put it into the desktop: just click a button. The nice thing is that someone can push a button on the desktop, or go under the hood into the libraries and get as crazy and custom as they want.
Just a very quick picture of how this works with quadtrees and distributed processing: we actually build spatial indices on different servers across the cluster. When a feature comes in, it's pushed via a high-level spatial index that determines which machine it should operate against; once it gets to that machine, a finer-grained quadtree index takes over within it. So when a request comes through for something like a join, at the high level it says: here is your feature, somewhere in New Mexico, go against this server and this index; and within that, we'll find out which county, police district, or neighborhood you're in. That is the concept of map and reduce: mapping the work out to the local machines where the data resides, doing the lookup on each machine, and reducing the results back. The quadtree is the technique that makes that lookup fast.
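A toy sketch of that two-level lookup (all index contents, server names, and bounding boxes here are hypothetical, just to show the routing idea):

```python
# Coarse spatial index: region -> server holding that region's data.
coarse_index = {"NM": "server-1", "OR": "server-2"}

# Per-server fine-grained index: county -> bounding box (x0, y0, x1, y1).
fine_index = {
    "server-1": {"Bernalillo": (-107.2, 34.8, -106.1, 35.2)},
    "server-2": {"Multnomah": (-122.9, 45.4, -122.3, 45.7)},
}

def locate(region, x, y):
    """Two-level lookup: route to the server for the coarse cell,
    then scan that server's fine-grained index for a containing box."""
    server = coarse_index[region]
    for county, (x0, y0, x1, y1) in fine_index[server].items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return county
    return None

print(locate("NM", -106.6, 35.1))  # Bernalillo
```

In the real system the fine index on each server is a quadtree rather than a linear scan, but the routing shape is the same: coarse index picks the machine, fine index answers the question locally.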
At a higher level there is Hive, which I mentioned. Most people know SQL, it's a very common language, and Hive gives you SQL across these big datasets. In this case I'm looking at the number of earthquakes by county, and I just want a calculation against that. It looks like a query against a relational database, but now it's handling potentially millions or billions of records, distributed across however many machines you choose.
Here is an example of what this looks like. We took FAA data, 14 million flight location points, and computed counts by county, kind of a silly little example. The first time we did this, a few months ago, using our tools on top of Hadoop, it took 30 minutes to run the operation. Not horrible: kick it off, go get a coffee, come back to your 49 records. Not bad, but we knew we could do better. We optimized how we shard the data, and optimized some of the spatial processing on the server itself, and in the end it now takes about 56 seconds to calculate against 14 million points, which is not too shabby. This is still batch; you can imagine streaming would be near real time. But it means asking a question no longer leaves you time to go get a coffee, which starts to improve your productivity. This was the previous version; version 1.2 actually just got released this morning, I heard from the team, with more optimizations in it. We haven't run benchmarks against that yet, but it should be a little faster, and it will be supporting more than just points and polygons, so there are more benchmarks to come. We'd love to hear from you if you have use cases like that.
Another interesting example: this was a project for an automotive company in Japan, and you can kind of guess which one. The question was where they should try to promote carpooling. The idea is that people who tend to live next to each other and work in the same area could be connected together for shared commutes, which helps with traffic reduction. So, from 249 million vehicle track points of commutes, we looked at where drivers were in the morning within 15 minutes of each other, and whether they ended up at the same place within 15 minutes of each other, within 500 meter grid cells. From that we could drill down to finding where there are, say, a hundred people who could actually be carpooling, all starting in the same neighborhood and going to the same workplace. I think that took about 30 minutes to run, and again we can improve it. The idea is letting users get into some pretty interesting questions. And the answer isn't even a map: it's a list of addresses and origin-destination pairs, not a map at all, but underneath it was a spatial question.
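The grid-cell matching idea reads like this in miniature (a toy Python sketch assuming projected coordinates in metres; the 15-minute time-window check from the talk is omitted, and all trip data is made up):

```python
from collections import defaultdict

CELL = 500  # grid cell size in metres

def cell(x, y):
    """Snap a coordinate to its 500 m grid cell."""
    return (int(x // CELL), int(y // CELL))

def carpool_candidates(trips):
    """Group commuters whose morning origin and destination fall in the
    same grid cells; any group of two or more is a carpool candidate."""
    groups = defaultdict(list)
    for driver, (ox, oy), (dx, dy) in trips:
        groups[(cell(ox, oy), cell(dx, dy))].append(driver)
    return [g for g in groups.values() if len(g) >= 2]

trips = [
    ("a", (120, 340), (9100, 8120)),   # same origin/destination cells as "b"
    ("b", (180, 460), (9050, 8300)),
    ("c", (5100, 200), (9100, 8120)),  # different origin cell
]
print(carpool_candidates(trips))  # [['a', 'b']]
```

Snapping to cells turns a fuzzy "near each other" question into an exact group-by key, which is exactly the kind of operation that distributes well across a cluster.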
Similar work, which we use a lot of these tools for, though I can't talk about all of it publicly, is for the Port of Rotterdam. They want to know, from all the ships that come in and out of the different ports, where the most traffic and congestion is, and where they should put better signaling and things like that. So we took a year of AIS data, the ship-tracking system, a year of all the ships coming in and out, and did some coarse spatial aggregations, hexbins, so check that off the bingo card, to see where the congestion was over time. And the idea is to now model it in real time: is this changing over time, rather than just an answer about the past? That kind of thing again is the Lambda architecture: they've processed the data to know the pattern, they alert when that pattern changes, and that kicks off more processing afterwards.
As you might remember from the earlier example, the same pattern also comes up in cities that have audio detectors on rooftops that can detect gunshots, with lots of different sounds streaming in. When something crosses a certain decibel level, an event kicks off a batch processing job to triangulate, from all the microphones, where that gunshot probably was.
Another example, something we've shown in the past and some of you have seen, was looking at social media tweets during a disaster, in this case Hurricane Sandy a few years ago. We did a normalized aggregation of social media mentions of power outages: people tweeting about power outages compared to people tweeting in general in Manhattan, compared to people talking about the hurricane globally, compared to everyone talking on Twitter globally. So we ask that question: OK, whenever this threshold is crossed, whenever we're talking more about power outages than I would expect, let me know, because part of the disaster is probably a power outage, among other things. The visualization keeps running, and in one panel I want to know what's important. Using Storm we were able to start processing these tweets, strip out the added values, and push them over a WebSocket to the browser, and then visualize it. Within a few seconds you start seeing what happened: a threshold was crossed, something I care about just happened, and I can now dive into the specific features and call out the individual tweets, because I had defined those specific ones. What did they say? Our power transformer exploded. You can grab photos of the explosion, verify it, and over time see how people moved in response to that power outage. By the next day people had all moved north toward Grand Central Station in Manhattan, where there was still power, so water and blankets could be sent to where people were going to be, not where they had been living. That is the kind of real-world, real-time learning.
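The normalization described here, the local mention rate divided by a global baseline, can be sketched like this (all numbers are hypothetical, and the threshold of 5 is arbitrary):

```python
def outage_signal(local_outage, local_total, global_outage, global_total):
    """Normalized rate: the share of local tweets mentioning outages,
    divided by the global share. A value well above 1 means 'more
    outage talk here than the world at large would predict'."""
    local_rate = local_outage / local_total
    global_rate = global_outage / global_total
    return local_rate / global_rate

# Hypothetical counts: 200 of 1,000 Manhattan tweets mention power,
# versus 1,000 of 100,000 tweets globally: roughly a 20x baseline.
signal = outage_signal(200, 1000, 1000, 100000)
print(signal > 5)  # True -> threshold crossed, alert and kick off batch job
```

Dividing by the global rate is what keeps the alert from firing just because everyone on Twitter is talking about the storm; it isolates the locally anomalous part of the signal.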
In case the benchmark numbers matter, just to look at this: it's a geo-enabled Storm process running on a Mac, just for benchmarks. Reading through about 10,000 tweets per second, parsing tweets at about 6,000 per second, and geo-joining them by grid cells at about 5,000 tweets per second, on a single Mac. And that's just Storm. Storm is essentially the new Hadoop for streams, like Hadoop Streaming; it's what Twitter open sourced after acquiring the company that built it, and it's what they use for their ad engine. So it can handle the volumes, and spatially enabled it can handle these location-based questions.
Going forward, I was hoping to have another demo of this; I don't yet, so I will soon, and I'll blog about it. We recently released something called ArcGIS Open Data, which gives a way for every government in the world to make its data open and accessible via GeoJSON and other formats. We have a thousand-odd sites created globally, with thousands of datasets, and more showing up. What we're hoping is that now, with these big data tools, people will go off and take all this amazing open government data and turn it into meaningful questions around climate, disaster resilience, the location impact of schools, poverty, health, and other aspects, answering these important questions for society through open source tools and open data. That's personally what I'm driving for, and what we'll be doing with this over the next six months.
So, to wrap up, these are the URLs to check out. esri.github.io has over 300 open source projects you can go explore, including the ones I've shown here; "GIS Tools for Hadoop" is the string to look for for the specific stuff I've talked about. One of our engineers in particular is very prolific, an amazing engineer who did a lot of the analyses shown here, Mansour Raad. He blogs prolifically, and every bit of code he writes is on GitHub. His blog, Thunderhead Explorer, covers a lot of the specific applications around these tools, and there are a lot of other kinds of examples on my account as well.
So that's my super brief big open source data talk. Thank you very much, I appreciate you being here, and I'm happy to take any questions. [Audience question, inaudible, about downloading the tools.] The tools are all available; I'm happy to chat afterward, at the booth in this section, or over coffee. So again, thanks very much, and have a good afternoon.

Metadata

Formal Metadata

Title Finding the Where in Big Fuzzy Data
Series Title FOSS4G 2014 Portland
Author Turner, Andrew
License CC Attribution 3.0 Germany:
You may use, adapt, and copy the work or its content for any legal purpose, and distribute and make it publicly available in original or modified form, provided you credit the author/rights holder in the manner they specify.
DOI 10.5446/31651
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication Year 2014
Language English
Producer FOSS4G
Open Source Geospatial Foundation (OSGeo)
Production Year 2014
Production Location Portland, Oregon, United States of America

Content Metadata

Subject Area Computer Science
Abstract We've gone to plaid. It is now easier to store any and all information that we can, because it might be useful later. Like a data hoarder, we would rather keep everything than throw any of it away. As a result, we are now knee-deep in bits that we are not quite sure are useful or meaningful. Fortunately, there is now a mature, and growing, family of open-source tools that make it straightforward to organize, process, and query all this data to find useful information. Hadoop has been synonymous with, and arguably responsible for, the rise of 'The Big Data'. But it's not your grandfather's mapreduce framework anymore (ok, in internet time). There are a number of emerging open-source frameworks, tools, and techniques that each provide a different specialty when managing and processing fast, big, voracious data streams. As a geo-community we understand the potential for location to be the common context through which we can combine disparate information. In large amounts of data with wide variety, location enables us to discover correlations that can be amazing insights that otherwise were lost when looking through our pre-defined and overly structured databases. And by using modern big data tools, we can now rapidly process queries, which means we can experiment with more ideas in less time. This talk will share open-source projects that geo-enable these big data frameworks, as well as use case examples of how they have been used to solve unique and interesting problems that would have taken forever to run or may not have even been possible.
Keywords big data
hadoop
analysis
