Merken

Making Data Dance

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
the the time and and so and already out so welcome to the data and analytics track your else come and clock and the clock and we're talking about making data dance or anything think of this as promised a radical into patterns for fun and profit are excited to see so many people here in the audience I also want to play with data are excited about data I'm super excited about it and I promise this is a bold claim but I promise to be a least 1 in this whole mess that a debilitating use a little bit about me answers basically Mars may work better for the last 15 or so years started out as a my Western longer but still a long time AOL in a market research group in our in in production we used Berkeley DB yes really Berkeley DB informants Cyprus 2 versions of it's actually the my SQL post press and sequel server and we did most of those the last 4 we did with rails which was super exciting ideology of course i which is an asset tracking company that mostly works in the oil and gas space so lots of geodata really interesting stuff after so we suppose PostgreSQL prejudice and we used rather right now I'm Sarah labs have been there for 2 and a half years where an emerging technique you beta within Sarah which is a big company that I'm sure but you actually not but if you know the brand travelocity we just saw that so much that I knew that that were I we use post grass I when you GAS stuff are which JS usually reticent and occasionally might you 0 well all these things have in common was so great and
on the right I've seen a lot of data like I've seen a lot of really interesting there's a lot of it and generally play with it with 3 b not always but generally and then again generally have played with that with rails again now as you can do some really useful stuff with final the on this talk is have based on a series of blog posts that it a while back our and post press weekly the weekly the picked up a handful of and I'm super excited IDL I
the so active Record great handles most of the data access needs it's a great place to start and maintain it you know eventually outgrow it and maybe even at the rails at least in the monorail or what we call on our integrated the best conversation for later runtime time of that having said that the rails is still argue to tool in the lab especially the build a back end this is the talk about active record this is the talk about right not where I guilds arise QL by is cabin these 2 2 areas are you can build up complicated to you can do joins in record scopes you can change grouped together and that's great but also starts he can't be right and so I find myself rather than figuring out how all these things fit together and then you've got this thing from the scope it's reference in the so the scope and that's weird religious right query In the you get deeper database functionality Active Record maps you know all of your current functionality and that's great but how to use something in the database that's a little more robust a little more complex or unique to that database will dig into the word unless you up and also really hominid you purge js the RG agenda is is fantastic but I know question yes and those those functions and so I have a really will learn a new API for this sort of thing that I mean I use some but also requires so I know you're thinking bad do what about database portability right right here is offered doubt right but raise your
hand if you've ever change databases and established project repair testing have either last year you don't OK so some of you know it just
doesn't work like that there portability is a pipe dream and maybe I had a different experience than I have about a handful times and it really sucks even beyond just the basic query interface that Active Record gives you at the data types are different these are different databases there different data types and even if you have like that really cool eyes throughout a prosperous script you still have to do all the work to migrate that takes a lot of effort the and then a bike and a well we switched versions of Sybase so we were within the same database right even that was a lot of work we did all the migrated the test everything was great we still spend a month fighting fires in part because the VBA that's the story was just a month fighting fires Davis probability just doesn't work that so I say you chose the whatever the posters database for reasoning but use and that it may not have been you chose the best but let's just assume that you or that other person are not more uncomfortable that assumption today on the under some caveats carriers I the more work you put on the database the more that can become a single point of failure so pay attention to your connections attention to connection pool recurrent connections a hands out if you have a query that takes longer than 500 milliseconds like you do it wrong something is bad on it probably you don't want to spend even that much time the this half a 2nd is an insane amount time in the database pirate talk about how to do that and also point out you to be benchmarking your queries against production data because of there were a lot differently in your local that database with a couple hundred records and in doing a production database with a couple million records cool so let's go ahead sorry window functions this is going to be the foundation of our entire today as the money here play with when the functions of the stressor awful the a bigger right so if you've you know this is way cool stuff so the documentation says a window function forms a calculation across the table rows somehow related to the through the calculation crossing rose related to the current documentation goes the press Chris documentation I think is pretty good calculation across a set of rows related to the core it's not that I think can simplify yeah fold data from other writers in this room OK so there are probably a dozen anymore when the functions these are the 5 this is when of function the lead and lag 1st value last value and row number so let's take a look give Mensonge cheesy you're pretty sample data here rank these functions so the latter at 1st read take the lead ID and lack ID note that over open bring close friend that's redefine the window were not anything special and this figure case reducing the table is the window the sir and take the next ID in the previous ID for that given row so you see for the 1st row the next ideas to there is no previous ID so it's in all then on and on the last row there is no next ID so it's no pre-assigned these so we're doing calculation across some set of rows related to that and so this talk about how to define we the partition fruit
so here repetition by the value of fruit ever in order by the so all occurrences of the similar troops are going to be much together the Rich on the table of different so that 1st value of the ID across the fruits so for Apple we have for the 1st idea is 1 this 1 banana and he's saying in there 10 and we have 5 pairs 1st there happens at 4 if we were to look at the last value Apple would be a 5 banana still and here is you see those ideas bounce around but as a matter we rearrange the data but we can round out of the Melissa Five with row number so we see each rose position within its window you know each window function gets its own petition the fight the but can be different if you want to do that they they're not related at all so for each row compute across the rows that are within that partition so Apple or 2 3 4 1 banana 5 pairs once you from great now let's look at a practical example rule-making make a chat so here you see we have some messages we got a couple different rooms and I want to make this app behave collected message out on your phone were not gonna call out the person's name with each message so how we do that so we could pull out all the data and we can live through it do all the stuff in the client that's a lot of work so staring to just let the database tell us what's the next 90 right break it by room the that 2 rooms or any number of so for each room there were order by room ID which is a strictly necessary born on looking at the data it just makes it easier to look at the toward and they were ordered by messenger so this allows us to hide that name better for the 2nd and 3rd message because we see as the same user right so that some user the user is about to go to the airport and then they just landed In wow and somebody else comes in and says hey so the segment the message in there doesn't have the person's name in the meanwhile in some other rooms summarize way for a new the and as a bonus protein this does anything to do with when the functions which you can use positional field numbers for the order by inter group by this is especially handy if you have a calculated field like a when the function or count for something the so you had to re specify that the calculation
great so is the rest of how we do some this is how you execute that rise well inactive records you could also do find bias throughout but I just prefer a record based connection excuse the so we've got a calculated field In this file in this query that's not going to show up when you inspect the object because the class doesn't know about it but it's there not so
you really think it was a little weird that's Sulaiman uncomfortable what how anti-pattern ideas of have a conversation for that's slide you have to use Active record you can just use the PG Jim directly or if you're just using plain ordinary you can talk to a database supplant ordinary that is that it
might look something like this so whereas with Active record you got an array of message objects just directly with PGG and I use the an array of field right so you get an array of arrays with the data as a
reminder models don't have to be backed by active record you don't have to hear from you don't even have to persist the data in the database we can just put business logic In the than the classical model Estonia prominent right so now we can answer questions like when something change like when estimating the how long the each thing last how long were people in which places when you enter and exit events the you could totally pull that data down all that transactional data and you can mind you in your mind that's a lot of objects log Active Record objects or lot of stuff to some across the wire to you drudgery framework the but to do this to answer these questions we need more sophisticated queries as already subgroups you can separator filter in groups so you can wrap a query within operates a faltering group you so query for a field value that's really cool retirement there you can also do query and join were not it's about that but it is a thing you can do it so the sonar based query for a given phone call next matter feel looking at data the comes in from a Bluetooth began and I begin so do an indoor location stuff here this can help us see how film moves around in an area covered by the again I water by phone there is a strictly necessary but it just makes it easier to see so we're order at repetition the data by phone so we have a for each of the funds will have all the readings and the B in order by the really timestamp there is like something out were trying to answer the question of when something change so where
rap that query in the query and then we can say mean that minor in the next Minor were different so now I have a change and we name our sub queries so so it's all that as a beacon readings with united i'd
like race sub query integrate the inquiry library the so now we can see how people move around in the building where they move from into and how often do those things occur the daily sublime organization saint to digest that the great really this is what they looks like so I
used t 3 To make a core diagram to show the connections and show the intensity of those connections so this is the power of right to don't be afraid of OK as
we factor we're building up queries and messing
around and that's and and that's the making it unwieldy so we have an answer to
that common table expressions so here is our original query with the outer query we rewrite this with this also known as a with query because we have with were so many of query as all that stuff and then you see the bottom half we just select from that and it essentially becomes like a named variable query the now this case is it really on a spectacular I that's not really that different the power of common table expressions comes in something like this where you can build up bite-size pieces of query the specifics of this very really aren't that important it doesn't cool stuff on a lot were what we're doing here and all that stuff is we're just going at time series data were putting an hourly buckets were doing that
so we can look at this heat map we can see how long the people in which places again just using the 3 now you may
notice hate them so that this apparent report he had ever using this base gray all over the place was gong we should bear repeating ourselves yes sometimes you may have a greater gives a lot of a lot of different places that's we can fix the problem or we
conclude that the hearing do is called a view a store query in the database it's not a stored procedure just a query the saved in the database it looks like a read-only table so the create we you create or replace that or places optional but it's a super good idea i is this save you from if you were to rewrite it and then it will complain just right so we got this and we can
totally put this in Active Record migration which is a good idea if you do this in arouse that you wanna do this it's not a reversal migration obviously to have to find that out in the down that execute is had been there because I with an accurate a migration it understands that this active record face connection x the ridges pipe I here Doc into so that and that strip here Doc is a cool trick so we're looking at the log output it's not hanging out like way over on the right if you strip all the leading spaces following so at reasonable in your life output and then we dropped it you would drive you if exists again that if exists is optional but it'll save you from having error if you is something done so now we have this word just select
from that view is much cleaner but you look at it we can use that all over the place and I said that we get this is this logic in our database and that's a super bad idea right yeah know I've had their lives belts I find that religious battles I'm not saying but all the business logic in the database unless they start there I'm saying think about it talk it over with your co-workers reappears the point is you can decide where your code runs and why with a faster my queries to
slow I say 506 hundred milliseconds and unicorns a price the so let's do the heavy lifting a separate process we can do that With the materialized so whereas before we had of you where the crew stored in the database we yes you that it's going affecting the results for you in real time with materialized view where take the definition of the query and materialize the results from restore the results from the query ahead of time and it can then be index which is great to restore the query definition and the data in the data so where parameterized
you whenever and algorithm this may not be the canonical Postgres naming convention that's OK just have some convention be consistent select out of that you or your select the most recent 24 hours because in this particular case we have a reporting AP inwards show much there and then you can see we have indexes on this materialized views which is good again you can put this and migration I would recommend that then this is will drop for you automatically you have to specify a drop in as we got material you when you drop a table of just it's indexes for yeah cool so we've got
now some fresh hot data but you materialize that we materialize it you it's a one-shot thing you have to go updated it's television-based database update that so we have this
refresh materialized views and brand new impose this 9 . 4 which dropped in December we can do that concurrently which is good because it would basically table log necessary fast features like relatively which can or doesn't have to wait for all slots so we can update the data now and and it's great the if you're
hosting on her rocky the users or you can use rate that but if you're sonata at you use a simple rate task and you can update as often as every 10 minutes using the scheduler plugin if you need a bit more frequently than that hurricanes gotcha back on that also that's different solution for that the so we're refresh the materialized view In we've always got up-to-date reasonably up-to-date data now what we're talking about rate tasks and His test that I wrote to pull down production data and loaded into local database so the the I can benchmark my queries against real data I can run mapped against real data on this might be a security problem for you in your situation so maybe you can do that and that's fine but this up for you there's a just there and you in 42 chronic or whatever there's also on post breast her agree that Our and some were down and all that page they have uh law port where you can see your slow queries and and he puts it in order and shows you the on average about a good they are the but let's keep dancing with
data showing rest subgroup for field I
is a super cool so this query here
is do look ahead the integrated dual look ahead to see what is the reading IDE for basically an eggs and that when somebody cross the threshold that I call it the which is minor 0 the scene were selected from the materialized view twice and we're using that name query filter the integral you can do that so I just want the next occurrence where the miners is 0 but is greater than the current record and we're doing that filtering in there which is why I can do function so we can build up this summer shocked to
see all the places worst where people went and how often you know those past overlap again just a 3 cool so of
some other useful things well over here if you're because a person who likes to read documentation and see what call things exist then by all means check reviews innumerable lots of good stuff in there also look at proestrus data types there is an array data where you can take some other data types and put it in an array of things and that's really handy up to the right-hand side is also a date range and timestamp range data types so if your paper 1 of those earlier queries did some timeboxing where we want to look at something where we start date was whatever and then date was whatever you can use this data range type where it has a certain date as a single field in the range and has 0 it it has its own operate operator so you can ask give me all the records that exist within this at this point time and this range field understand and they have their own indexes so that's really handy this is jason data types and then 9 4 we got to some binary just stored in binary instead the sun index is a little more efficient by with the dates and the offset so if you're familiar with the age store data type this is like that but actually useful and you can do like a nested hierarchy and you can have very down into it and with posters you can index anything you can select from also so I guess said the Jason be more efficient and so you could pull out like all the values for a field as an array or you could pull out all the keys as an array and then job I revere jump character what every languages understand how they interact on collection so I don't the were UUID so if a story you ideas In a character character very film but maybe you only get this but it has its own index type and a you get for free validation that it's actually so that some if you think but so you see that the bomb out its cattle that it's guides rebound rails site I have some good information on how to use the colors In finite record and rails future you will thank you yeah good thanks so just kind general had a post rest but if you use your stand by you have post crescent sold on but I couldn't tell you what it is having to reduce and you can stop a and pushchairs with homebrew if you're run Linux I super-easy with you the young without repositories or you could just download workers PostgreSQL and that and you get post GIS and all dependencies for free there are a lot of indices for approaches i takes have a lot and stop the current version of post press is 9 . 4 within ideas 2 weeks has pressed 9 for dropping her appeal was already supporting which pretty amazing so database tools like a gooey interface there's a PGM in 3 the for us and I think the geometry is what platform is free the key bindings in this understand crazy but you a cat that got several different offerings I'm free trials I am command-line PSQ of and use all 3 of those and if you nothing where with the explain analyze command you need to be familiar with explain analyzed so you can run your query like against actual data and see where your hot spots or you can take that
and copied into that website and see your hot spots that's pretty cool the but so to recap it's OK to
write SQL really it's a cat recall so that when I refactor this is on the order I go and steps 0 is just given myself and unless something down my way into start and if I see that I started to come and mess things in a scandal crazy then maybe already format that with some common table expressions if I have something I'm using a lot of a that you if I need squeeze somewhat forwards out there and dematerialized these are just tools which you have a new that if he didn't know about I do again I'm not saying do everything in the database but it's not a terrible thing when you decided that's what you want to just expanded toolbox you insert the more you know she across the screen here post presses also I'm still finding stuff that is just amazing to me and I can't believe it's in there and so thank you we have here my my my
Bit
Betafunktion
Mustersprache
Server
Vererbungshierarchie
Versionsverwaltung
Gruppenkeim
Fortsetzung <Mathematik>
GRASS <Programm>
Analytische Menge
Biprodukt
Raum-Zeit
Lineares Funktional
Umsetzung <Informatik>
Datumsgrenze
Web log
Hyperbelverfahren
Datenhaltung
Reihe
Abfrage
Rechenzeit
Strömungsrichtung
Lineares Funktional
Quick-Sort
Computeranimation
Datenhaltung
Mapping <Computergraphik>
Datensatz
Funktion <Mathematik>
Flächeninhalt
Rechter Winkel
Front-End <Software>
Vererbungshierarchie
Wort <Informatik>
Mobiles Endgerät
Subtraktion
Punkt
Versionsverwaltung
Zahlenbereich
Bildschirmfenster
Rechenbuch
Computeranimation
Datenhaltung
Differenzengleichung
Datensatz
Bildschirmmaske
Datentyp
Bildschirmfenster
Stichprobenumfang
Skript <Programm>
Vorlesung/Konferenz
Strom <Mathematik>
Mobiles Endgerät
Figurierte Zahl
Schnittstelle
Softwaretest
Einfach zusammenhängender Raum
Lineares Funktional
Freier Ladungsträger
Datenhaltung
Softwarepiraterie
Einfache Genauigkeit
Abfrage
Rechnen
Biprodukt
Partitionsfunktion
Faltung <Mathematik>
Fensterfunktion
Funktion <Mathematik>
Menge
Rechter Winkel
Mereologie
Speicherabzug
Projektive Ebene
Tabelle <Informatik>
Partitionsfunktion
Subtraktion
Ortsoperator
App <Programm>
Gruppenkeim
Zahlenbereich
Computerunterstütztes Verfahren
E-Mail
Zählen
Message-Passing
Datensatz
Client
Gruppentheorie
Bildschirmfenster
Mixed Reality
Kontrollstruktur
Strom <Mathematik>
Ordnung <Mathematik>
App <Programm>
Lineares Funktional
Datenhaltung
Anwendungsspezifischer Prozessor
Computervirus
Rechnen
Partitionsfunktion
Datenfeld
Fensterfunktion
Rechter Winkel
Ordnung <Mathematik>
Message-Passing
Einfach zusammenhängender Raum
Objekt <Kategorie>
Umsetzung <Informatik>
Datenhaltung
Klasse <Mathematik>
Abfrage
Elektronische Publikation
Computeranimation
Sinusfunktion
Objekt <Kategorie>
Mustersprache
Message-Passing
Datensatz
Datenfeld
Körper <Physik>
Ordnung <Mathematik>
Partitionsfunktion
Retrievalsprache
Wasserdampftafel
Mathematisierung
Gruppenkeim
Mathematische Logik
Framework <Informatik>
Computeranimation
Zeitstempel
Datenhaltung
Untergruppe
Message-Passing
Datensatz
Informationsmodellierung
Passwort
Ordnung <Mathematik>
Ereignishorizont
Modul
Array <Informatik>
Trennungsaxiom
Datenhaltung
Klassische Physik
Abfrage
Systemaufruf
Digitalfilter
Ereignishorizont
Objekt <Kategorie>
System F
Datenfeld
Gruppenkeim
Flächeninhalt
Rechter Winkel
URL
Modelltheorie
Ordnung <Mathematik>
Message-Passing
Lesen <Datenverarbeitung>
Partitionsfunktion
Abfrage
Gruppenkeim
Selbst organisierendes System
Gebäude <Mathematik>
Mathematisierung
Zählen
Programmbibliothek
Abfrage
Ordnung <Mathematik>
Lesen <Datenverarbeitung>
Einfach zusammenhängender Raum
Diagramm
Teilbarkeit
Rechter Winkel
Gebäude <Mathematik>
Abfrage
Speicherabzug
Computeranimation
Leistung <Physik>
Umwandlungsenthalpie
Partitionsfunktion
Rundung
Abfrage
Computeranimation
Mapping <Computergraphik>
Arithmetischer Ausdruck
Zeitreihenanalyse
Gruppenkeim
Rechter Winkel
Zählen
Mehrrechnersystem
Ordnung <Mathematik>
Tabelle <Informatik>
Leistung <Physik>
Partitionsfunktion
Subtraktion
Sichtenkonzept
Datenhaltung
Abfrage
Wiederkehrender Zustand
Speicher <Informatik>
Sichtenkonzept
Algorithmische Programmiersprache
Computeranimation
Konfiguration <Informatik>
Abfrage
Ordnung <Mathematik>
Speicher <Informatik>
Verkehrsinformation
Tabelle <Informatik>
Partitionsfunktion
Mathematische Logik
Punkt
Mathematische Logik
Code
Raum-Zeit
Computeranimation
Datenhaltung
Datensatz
Reverse Engineering
Code
Migration <Informatik>
Vererbungshierarchie
Zählen
Ordnung <Mathematik>
Funktion <Mathematik>
Einfach zusammenhängender Raum
Videospiel
Tropfen
Sichtenkonzept
Datenhaltung
Abfrage
Migration <Informatik>
Sichtenkonzept
Gruppenkeim
Rechter Winkel
Wort <Informatik>
Fehlermeldung
Resultante
Retrievalsprache
Echtzeitsystem
Sichtenkonzept
Abfrage
Automatische Indexierung
Datenhaltung
Speicher <Informatik>
Abfrage
Sichtenkonzept
Computeranimation
Sichtenkonzept
ASCII
Automatische Indexierung
Datenhaltung
Migration <Informatik>
Default
Besprechung/Interview
Indexberechnung
Sichtenkonzept
Tabelle <Informatik>
Faserbündel
Stellenring
Bit
Reihenfolgeproblem
Gesetz <Physik>
Computeranimation
Datenhaltung
Homepage
Task
Physikalisches System
Last
Task
Reelle Zahl
Speicherabzug
Softwaretest
Sichtenkonzept
Computersicherheit
Datenhaltung
Raum-Zeit
Konfigurationsraum
Abfrage
Plug in
Programmierumgebung
Biprodukt
Bitrate
Sichtenkonzept
Scheduling
Ordnung <Mathematik>
Mini-Disc
Tabelle <Informatik>
Demoszene <Programmierung>
Schwellwertverfahren
Sichtenkonzept
Datenfeld
Abfrage
Ordnung <Mathematik>
Extrempunkt
Körpertheorie
Programmierumgebung
Computeranimation
Lesen <Datenverarbeitung>
Untergruppe
Web Site
Punkt
Formale Sprache
Versionsverwaltung
Hierarchische Struktur
Systemplattform
Räumliche Anordnung
Computeranimation
Datenhaltung
Shareware
Datensatz
Spannweite <Stochastik>
Wechselsprung
Prozess <Informatik>
Typentheorie
Datentyp
Zeitstempel
Elektronischer Programmführer
Indexberechnung
Speicher <Informatik>
Schnittstelle
Schnelltaste
Navigieren
Dokumentenserver
Finitismus
Datenhaltung
Güte der Anpassung
Validität
Abfrage
Strömungsrichtung
Visuelles System
Spannweite <Stochastik>
Datenfeld
Automatische Indexierung
Information
Kantenfärbung
Computerunterstützte Übersetzung
Versionsverwaltung
Schlüsselverwaltung
Tabelle <Informatik>
Arithmetischer Ausdruck
Wärmeausdehnung
Datenhaltung
Regulärer Ausdruck
Ordnung <Mathematik>
Computerunterstützte Übersetzung
Sichtenkonzept
Computeranimation
Datenhaltung
Touchscreen
Tabelle <Informatik>

Metadaten

Formale Metadaten

Titel Making Data Dance
Serientitel RailsConf 2015
Teil 71
Anzahl der Teile 94
Autor Clark, Barrett
Lizenz CC-Namensnennung - Weitergabe unter gleichen Bedingungen 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen und nicht-kommerziellen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen und das Werk bzw. diesen Inhalt auch in veränderter Form nur unter den Bedingungen dieser Lizenz weitergeben.
DOI 10.5446/30681
Herausgeber Confreaks, LLC
Erscheinungsjahr 2015
Sprache Englisch

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract Rails and the ActiveRecord gem are really handy tools to get work done, and do it quickly. They aren't a silver bullet, though. Sometimes the database is the best place to run a complicated query. Especially when you have big transactional or time-series data. Databases are also incredibly powerful, and can do remarkable things at scale. Why not take advantage of that power? This talk will discuss some of the things that you can do with PostgreSQL that will both blow your mind and also help you transform data into knowledge. Let's have some fun playing with data together!

Ähnliche Filme

Loading...