
Analyzing floating car data with clickhouse db, postgres and R


Formal Metadata

Title
Analyzing floating car data with clickhouse db, postgres and R
Title of Series
Number of Parts
295
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Spatio-temporal datasets such as sensor data or floating car data can be rather overwhelming because they quickly reach the order of billions of records. In this talk I show how we turned billions of floating car data entries into a workable data stream that outputs visually attractive and useful maps and graphs over a routable network. I will start by summarizing the relatively new open-source ClickHouse database and how this column store helps in dealing with massive temporal datasets. Next I explain how we set up the pipeline with Postgres/PostGIS, pgRouting and R in order to create analyses in seconds, and share some interesting results that you can get from these large traffic datasets. The talk will be relatively code-focused (mainly SQL and R) but will also show some in-depth analyses of car data.
I think it's the last session for today, and I'll be both chairing it and giving the first presentation, so if I'm running out of time, give me a sign. Welcome. I just noticed that the title of the presentation doesn't really match the title on the schedule; I changed it at some point. But don't worry, the content is still the same. It is all still in there: ClickHouse, Postgres and floating car data.
My main point will be how to deal with all that data and what you can get from it. OK, first a bit more about what I'm going to talk about: what the data looks like, a lot of floating car data; how to use all that data, which is really quite something; and the last thing, and that's the nice part, what you get from it.
And there will be a bit of R. I promised you R in the title, so I have some pictures made with R, though not too much code, fortunately. First, the context. My name is Tom van Tilburg, I work for Geodan, a company in the Netherlands, and we have a project together with the National Data Warehouse (NDW),
which collects traffic data. Actually it is nineteen public bodies together: municipalities and other government organizations. They all store their traffic data in one location, which they use for traffic management, information, some analysis and special analysis purposes.
And we got the opportunity to do some experiments with that data. OK, what is road data, and what is floating car data? I put it into three categories. You have the incidents, which are locations along the road where something happens: it could be an accident, it could be a bridge that opens, it could be events which are already planned ahead. You have sensors, which obviously measure the speed; they measure how many cars pass by, and they can even measure how big the car is, how many wheels it has,
and how fast it goes from A to B, using cameras, for instance.
And you have floating car data, which is a bit different. That's data you get from devices like a TomTom, or from Google if you have Google location turned on on the phone in your pocket, which may track you. That is usually only speed, because the vehicle counts from it are not very reliable.
About the sensors: I think you will have seen these sensors in the road. They are these black, hair-like lines with which they can measure how many wheels pass. If you have two of these sensors behind each other, like here, where they are putting them in the road, they can also measure speed. And there are quite a lot of them. This is the situation in Amsterdam.
Every red dot you see on the map is one sensor, or actually multiple sensors, usually, because there are multiple lanes next to each other. Over the whole country there are roughly fourteen thousand of these locations, each with from one up to six lanes, I believe, and they measure the speed and the flow every minute. That is only for the main roads. As you see, the underlying roads in the centre have very few points, only a couple of them. That doesn't mean
they're not being measured, because every traffic light has a kind of measurement in it; it just means they're not centralized in one database. So roughly every five hundred metres of road you pass one sensor on the highway.
The nice thing about this is that it's open data; you can download it. It's XML, in a format called DATEX II, which is quite unreadable. But if you read the documentation and know how to handle it, you can derive quite a lot from it, which is nice. Now there's also floating car data. As I said, these are the little devices in your car which show you where you are and where you go, but they also send that information to a central server, and companies can buy that if they like.
So they did: they bought a big set of floating car data from a provider, which covers roughly four hundred thousand kilometres of roads in the Netherlands. That's about ten million segments; every segment is about fifty metres long, and the data is aggregated per minute.
Unfortunately, because the data is commercially sold, it's closed source. This gives you an idea of the road network in the Netherlands. It is every road that exists, and that includes little roads on a dike where nobody goes, but they're still in there and theoretically they're being measured.
If you would collect all this floating car data, how much is that? To give a bit of an idea: it is seven hundred fifty thousand records per minute. That is about one billion records a day, which roughly translates to two hundred fifty gigabytes per month.
That will be three terabytes a year of storage. On top of that, every month they provide a new road network, because there can be changes in the roads. That's about ten million segments, and it should be a routable network, of course.
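
As a rough check on these volumes, the arithmetic can be sketched in Python. The 750,000 records per minute and the 246 GB for roughly one month (a figure quoted later in the talk) come from the speaker; the 30-day month and the decimal gigabyte are assumptions made here for illustration.

```python
# Back-of-the-envelope check of the data volumes quoted in the talk.
RECORDS_PER_MINUTE = 750_000        # figure from the talk
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30                 # assumption: a 30-day month
GB_PER_MONTH = 246                  # figure quoted later in the talk

records_per_day = RECORDS_PER_MINUTE * MINUTES_PER_DAY
records_per_month = records_per_day * DAYS_PER_MONTH

# Assumption: 1 GB = 1e9 bytes, 1 TB = 1000 GB.
bytes_per_record = GB_PER_MONTH * 1e9 / records_per_month
tb_per_year = GB_PER_MONTH * 12 / 1000

print(f"{records_per_day:,} records per day")
print(f"{records_per_month:,} records per month")
print(f"{bytes_per_record:.1f} bytes per stored record")
print(f"{tb_per_year:.2f} TB per year")
```

Under these assumptions the numbers line up with the talk: about one billion records a day, about 32 billion a month, and roughly three terabytes a year.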
How do we store that? Because obviously you need to use them together. The first thought is to put everything in a normal relational, GIS-enabled database, which in this case is obviously Postgres with PostGIS on it. That was our first thought, because it has PostGIS, which we all know and love, and it has pgRouting on it, so you get routing kind of for free.
It has an enormous amount of indexing options, which is interesting because this is time-series data, which needs different indexing than normal. The only thing is that this kind of database gets a little problematic if the data doesn't fit into memory any more; these indexes then take quite some time.
Now there's something new, and that is the ClickHouse database. It is made by Yandex, the Russian Google, and it was open-sourced not a very long time ago, as far as I know. It is a column-store database. Who has heard of a column-store database?
OK, for those who haven't: I'll explain the column-store database a bit later, but it's a slightly different way of storing data, which is not very suitable for transactions or updates. That means you just put your data in and read it, but never change anything about it. It's very suitable for analysis of very large datasets that roughly look the same over time, and it compresses pretty well, for free.
What is column storage? Imagine an old-fashioned LP player, where the data is put on one track and the disc spins all the time. If you have a normal database, like a Postgres database, you store it record by record: first one record, then the next one. If you want to skip from the first record to the millionth record, you have to
follow that whole track until you reach the millionth. ClickHouse is a bit different: they first take the first column and put all of that column on the track, and only then do they start with the second. The nice thing is that you can hop very quickly from the first column to the second one,
which gives a very quick way of skipping through the data. What doesn't work well is removing a record in the middle, because then they have to move everything in between, and that takes a lot of time. So that is what a column store is. Now, how do we get the data inside? A little bit of code.
First of all, the top bit [unclear] looks much like what you would write in Postgres; it is just an ordinary command. The second bit you see is inserting the data from the floating car datasets, which in this case are CSV files, which are sent directly into the ClickHouse client.
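
To make the record-by-record versus column-by-column layout described above concrete, here is a minimal Python sketch. It does not reflect how ClickHouse or Postgres store data internally; the sample records and all names are hypothetical.

```python
from array import array

# Hypothetical floating car records: (segment_id, minute_of_day, speed_kmh).
records = [
    (101, 0, 98), (102, 0, 87), (103, 0, 45),
    (101, 1, 97), (102, 1, 85), (103, 1, 40),
]

# Row-oriented layout: the values of one record sit next to each other.
row_store = [value for record in records for value in record]

# Column-oriented layout: all values of one column sit next to each other.
column_store = {
    "segment_id": array("I", (r[0] for r in records)),
    "minute": array("I", (r[1] for r in records)),
    "speed_kmh": array("I", (r[2] for r in records)),
}


def mean_speed_row_store(flat, width=3, speed_offset=2):
    """Average speed from the row layout: step across every record to pick one field."""
    speeds = flat[speed_offset::width]
    return sum(speeds) / len(speeds)


def mean_speed_column_store(columns):
    """Average speed from the column layout: read only the contiguous speed column."""
    speeds = columns["speed_kmh"]
    return sum(speeds) / len(speeds)


print(mean_speed_row_store(row_store))
print(mean_speed_column_store(column_store))
```

Both functions return the same average; the difference is that the column layout touches only the one column the query needs, which is also why deleting a single record in the middle is awkward: every column has to be shifted.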
Now, what do you get? What you see here is an overview of what's in that ClickHouse database, in the column store. The first row shows the data for roughly one month: as I said, two hundred forty-six gigabytes; that is thirty-two billion records.
Just to give you an idea. And to get an idea of how you query these databases: it queries exactly the same as Postgres; it's just the SQL you can see at the top. But the interesting part is at the bottom, because there you see what happened: the elapsed time is seven point five seconds, and in that time it processed one point three billion rows.
That is nine hundred sixty megabytes per second being processed. Well, [unclear] of course, but this is the amount of data which has theoretically been parsed. What you see here is the average speed for every
hour on a specific day, and it gives you a nice free histogram that you just get in your output. OK, that was code, which is maybe not that interesting; now on to some pictures, and this is the part where we use R.
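
The query itself is not reproduced in the transcript. As a stand-in, the following Python sketch computes the same kind of result, the average speed per hour on one day, on a few hypothetical records, and derives the row throughput from the figures quoted above (1.3 billion rows in 7.5 seconds). The dates, speeds and function name are invented for illustration.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample records: (timestamp, speed in km/h).
samples = [
    (datetime(2019, 6, 3, 13, 0), 100.0),
    (datetime(2019, 6, 3, 13, 30), 96.0),
    (datetime(2019, 6, 3, 16, 0), 40.0),
    (datetime(2019, 6, 3, 16, 45), 20.0),
    (datetime(2019, 6, 4, 16, 0), 90.0),  # different day, filtered out below
]


def average_speed_per_hour(records, day):
    """Group the records of one day by hour and average the speed per group."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of speeds, count]
    for timestamp, speed in records:
        if timestamp.date() == day:
            totals[timestamp.hour][0] += speed
            totals[timestamp.hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


hourly = average_speed_per_hour(samples, date(2019, 6, 3))
print(hourly)

# Throughput implied by the figures from the talk.
rows_per_second = 1.3e9 / 7.5
print(f"{rows_per_second:,.0f} rows per second")
```

In SQL this corresponds to a filter on the day, a GROUP BY on the hour and an AVG over the speed column; the sketch only mirrors that logic.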
The first bit is routing. For those who don't know: you use the Postgres database so that you can use pgRouting on top of it, and you can easily go from point A to point B; the routing is done with the simple query over there. In this case we went along the highway from [unclear].
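
The routing query is not shown in the transcript. pgRouting offers shortest-path functions such as `pgr_dijkstra`; whether that is the function used in the talk is an assumption. The plain-Python sketch below shows the underlying idea, Dijkstra's algorithm, on a tiny hypothetical graph whose edge costs stand for segment lengths in metres.

```python
import heapq

# Hypothetical road graph: node -> list of (neighbour, segment length in metres).
graph = {
    "A": [("B", 50), ("C", 120)],
    "B": [("C", 50), ("D", 200)],
    "C": [("D", 60)],
    "D": [],
}


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total length, list of nodes) or None."""
    queue = [(0, source, [source])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph[node]:
            if neighbour not in visited:
                heapq.heappush(queue, (cost + length, neighbour, path + [neighbour]))
    return None


print(shortest_path(graph, "A", "D"))
```

The returned list of nodes plays the role of the route segments, which can then be used as keys to look up measurements in the column store.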
Those segments can be combined with the data in the column store: you get the data for each particular road segment out of the database. And then this is what traffic looks like at one o'clock. You see that same route which we took; this is the speed and this is the
distance over here, and you see not much is happening. There is one stretch of highway where you can't go that fast, because there is a lower maximum speed on it. If you go to four o'clock in the afternoon, the situation is different: you see dramatic spikes in it, which are obviously traffic jams. OK.
To show you what a traffic jam actually looks like, I found this very nice video; let me play it. Yeah. This is an experiment where they actually create a traffic jam. Watch the white car: this one will be causing the traffic jam. It will brake at some point.
There it goes. What you see happening is that the cars continue, but the jam, where they stop, actually moves backwards. It is now over there, and you see it's going very quickly at some point. So the location of the traffic jam is moving opposite to the direction of the cars.
Remember that, because you need it for the next graph I'll show. I think this one is slightly complicated, but once you know how to read it, it's not that difficult. You have time on the bottom, you still have distance on the vertical axis, and the colour is the speed. So if you travel by car in a normal situation, remember the one o'clock situation,
you travel with a little bit of slower speed on the green part, and basically you go in a straight line from A to B. If you travelled later, it would look like this: you end up in traffic jams, you go slower, and it takes you more time to go from A to B. So obviously those blue clouds are traffic jams, and you see that the traffic jams actually move in the opposite direction to the cars.
What is interesting is that these traffic jams have a constant speed. That's strange; somebody told me they actually travel at eighteen kilometres an hour in the Netherlands, a fixed speed.
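
How such a backward-moving jam front turns into a speed can be sketched as follows: the slope of the jam's position over time in the time-distance diagram is its propagation speed. The observations below are invented so that they reproduce the eighteen kilometres per hour mentioned above; they are not measurements from the talk, and the function name is hypothetical.

```python
# Hypothetical observations of the upstream end of one traffic jam:
# (minutes since first observation, position along the route in km).
jam_front = [(0, 12.0), (5, 10.5), (10, 9.0), (15, 7.5), (20, 6.0)]


def wave_speed_kmh(observations):
    """Least-squares slope of position over time, converted from km/min to km/h."""
    n = len(observations)
    mean_t = sum(t for t, _ in observations) / n
    mean_x = sum(x for _, x in observations) / n
    covariance = sum((t - mean_t) * (x - mean_x) for t, x in observations)
    variance = sum((t - mean_t) ** 2 for t, _ in observations)
    return covariance / variance * 60


speed = wave_speed_kmh(jam_front)
print(f"jam front moves at {speed:.1f} km/h along the route")
```

The negative sign indicates that the position along the route decreases over time, that is, the jam travels against the driving direction.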
There is, however, one line in there, that one, which I couldn't explain. What is that line doing there? It probably has something to do with this: on this particular day the weather in the Netherlands looked like this. You see a dramatic front of rain going through the centre here.
My theory was that this has some influence on the traffic. Now, how do you prove that? You have to load all this rain radar data into a database, so that's what you see here. This is just to prove it's possible; just believe it.
Next, this goes into the column-store database again. It's quite a lot of data: a value every five minutes for every kilometre, and we collected about two months of it. So you start summarizing to see what happens, and you draw a graph of it.
To be frank, this is the only bit of R code I have in my presentation. And to make it worse, the R code actually consists for ninety percent of SQL. But maybe that's a good takeaway: SQL is your friend if you're doing R, because you want to remove as much data as possible before you start putting it into R and processing it. So that's basically what it is: we cut the data down already in the database and only then move to R, and the graph you see comes straight from that.
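
The takeaway of aggregating in the database before handing data to the analysis language can be sketched like this. The example uses Python with the standard-library `sqlite3` module as a stand-in for the ClickHouse/Postgres and R combination from the talk; the table, the column names and the data are synthetic and purely illustrative.

```python
import sqlite3

# In-memory database standing in for the real store (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fcd (segment_id INTEGER, minute INTEGER, speed_kmh REAL)")

# Synthetic per-minute speeds for 3 segments over 2 hours: 360 raw rows.
rows = [
    (segment, minute, 100.0 - 10 * segment - (minute % 3))
    for segment in range(1, 4)
    for minute in range(120)
]
conn.executemany("INSERT INTO fcd VALUES (?, ?, ?)", rows)

# Let the database reduce the data: one row per segment and hour.
summary = conn.execute(
    """
    SELECT segment_id, minute / 60 AS hour, AVG(speed_kmh) AS avg_speed
    FROM fcd
    GROUP BY segment_id, hour
    ORDER BY segment_id, hour
    """
).fetchall()

print(f"{len(rows)} raw rows reduced to {len(summary)} summary rows")
for segment_id, hour, avg_speed in summary:
    print(segment_id, hour, round(avg_speed, 1))
```

Only the small summary crosses over into the analysis environment; the raw rows never leave the database.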
So this is the end result. You see that same graph at the top again, and at the bottom is the same location: it's again time over here and distance along the road over there. And you see that strange line again over there.
And this is the precipitation in millimetres per hour, I believe; eight millimetres is in itself little. And then you see there is a yellow band right over there. Now, I don't have numbers, but this seems like a pretty good fit. So I think you would be able to kind of prove that these kinds of fronts of foul weather moving over the country do have an influence on your travel time and your travel behaviour.
The reason I show this is not that I want to prove that the weather has an influence on the traffic; I think everybody believes that. The reason is that I want to show you how easily this can be done with a database like the ClickHouse database, and that really helps a lot, because you have to look into the data to find these kinds of
patterns. And we were indeed able to calculate this in a matter of minutes.
And with this I have come to the end of my presentation. I think I stayed within time. Thank you very much. [unclear]
Are there any questions?
You mentioned that this technology is very good for large datasets. Do you have any kind of estimate of what the threshold in data size would be, at which there is a reason to use the ClickHouse approach?
To be honest, I think it is not so much about the amount of data. The question was: when do you use Postgres, and when do you go to ClickHouse? It is more about the kind of data you have that makes you benefit from a column-store database. So even if
it is not too much data, not too much still being thousands of records of course, but your data is time series, for instance, or sensor data, then I would rather use a column store, unless I need advanced geographical queries, which ClickHouse is not able to do; in that case I use Postgres for them.
There is, however, talk that in a new Postgres version, thirteen I believe, maybe even twelve, there might be an option to add a column store to Postgres itself.
You said that you use ClickHouse as the column store. Did you try other column stores before choosing ClickHouse, or was it the obvious choice for some reason?
The answer is yes, we did. I used [unclear], which is a column store as well. It is not bad, but it doesn't have as nice support; I mean that the manual is not as readable as ClickHouse's.
The installation took a little bit more effort. I also used the cstore data wrapper in Postgres, which also enables a column store; I used that one too, and it doesn't match the speed of ClickHouse. If there are no more questions, then we have enough time to change rooms. Thanks so much.