If you can't Beat 'em, Join 'em!
Formal Metadata
Title 
If you can't Beat 'em, Join 'em!

Subtitle 
Integration NoSQL data elements into a relational model

Alternative Title 
If you can't beat 'em, join 'em (... a pun)

Title of Series  
Number of Parts 
29

Author 

Contributors 

License 
CC Attribution  ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and noncommercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license. 
Identifiers 

Publisher 

Release Date 
2015

Language 
English

Production Place 
Ottawa, Canada

Content Metadata
Subject Area  
Abstract 
Why, when and how you can integrate documents and keyvalue pairs into your relational model There is a pitched battle going on between the relational, documentbased, keyvalue and other data models. PostgreSQL is uniquely capable of leveraging many of the strengths of multiple data models with JSON(b), HSTORE, XML, ltree data types, arrays and related functions. This presentation outlines the usecases, benefits and limitations of documentbased, keyvalue and hierarchical data models. It then presents practical advice and code snippets for incorporating them into PostgreSQL's relational framework. The presentation ends with SQL examples and code snippets for loading, accessing and modifying (where possible) JSON, HSTORE, XML, ltree and array data types. This presentation begins with a very quick review of the rationale, benefits and implications of the relational data model. It then does the same for documentbased models and hierarchical models. The balance of the presentation works with three publicly available data sets, worldwide airports, Wikipedia Inbox keyvalue pairs and Google address JSON objects, showing how they can be be incorporated into a simple relational model. The presentation also includes snippets of code for loading the files and accessing elements. The full SQL, and shell code will be available on the web site.

00:00
Presentation of a group
Sequel
Relational database
Mass
Type theory
Mixed reality
Business model
Integrated development environment
Hydraulic jump
Beat (acoustics)
Metropolitan area network
Stress (mechanics)
Information technology consulting
Bit
Basis <Mathematik>
Port scanner
Statistics
Element (mathematics)
Inclusion map
Process (computing)
Computer animation
Integrated development environment
Personal digital assistant
Business model
Quicksort
Curve fitting
01:33
Presentation of a group
Graph (mathematics)
Multiplication sign
Mereology
Subset
Expected value
Etext
Uniformer Raum
Curve
Theory of relativity
Relational database
Moment (mathematics)
Shared memory
Fitness function
Data storage device
Port scanner
Term (mathematics)
Flow separation
Order (biology)
Right angle
Data type
Resultant
Sequel
Software developer
Virtual machine
Maxima and minima
Division (mathematics)
Average
Discrete element method
Goodness of fit
Term (mathematics)
Business model
Analytic continuation
Electronic data processing
Focus (optics)
Graph (mathematics)
Server (computing)
Graph (mathematics)
Information technology consulting
Database
Density of states
Singleprecision floatingpoint format
Film editing
Computer animation
Search engine (computing)
Query language
Boom (sailing)
Revision control
Natural language
Key (cryptography)
Oracle
04:40
Mobile app
Table (information)
Software developer
Multiplication sign
Relational database
Power (physics)
Data model
Mathematics
Different (Kate Ryan album)
Touch typing
Business model
Data structure
Theory of relativity
Matching (graph theory)
Data storage device
Database transaction
Bit
Density of states
Plane (geometry)
Mathematics
Word
Computer animation
Different (Kate Ryan album)
Right angle
Table (information)
Row (database)
05:37
Ocean current
Presentation of a group
Sequel
State of matter
Multiplication sign
Relational database
1 (number)
Virtual machine
Dynamic randomaccess memory
Element (mathematics)
Power (physics)
Formal language
Revision control
Mathematics
Term (mathematics)
Googol
Set (mathematics)
Business model
Software framework
Gamma function
Selforganization
Metropolitan area network
Multiplication
Theory of relativity
Relational database
Software developer
Consistency
Interactive television
Database transaction
Database
Port scanner
Mathematics
Data model
Process (computing)
Computer animation
output
Right angle
Business model
Quicksort
08:42
Standard deviation
Service (economics)
Computer file
View (database)
Virtual machine
Mereology
Element (mathematics)
Mathematics
Positional notation
Different (Kate Ryan album)
Singleprecision floatingpoint format
Energy level
Objectrelational mapping
Standard deviation
Key (cryptography)
Server (computing)
Database transaction
Mereology
Port scanner
Open set
Singleprecision floatingpoint format
Positional notation
Computer animation
Energy level
Object (grammar)
10:20
Trail
Presentation of a group
Table (information)
Computer file
Link (knot theory)
Structural load
Real number
Similarity (geometry)
Field (computer science)
Strategy game
Information
Data type
Serial port
Link (knot theory)
Forcing (mathematics)
Structural load
Computer file
Projective plane
Sampling (statistics)
Data storage device
Ext functor
Database
Bit
Subject indexing
Computer animation
Website
Right angle
Key (cryptography)
Table (information)
Row (database)
12:29
Point (geometry)
Addition
Computer file
Link (knot theory)
Sequel
Information overload
Mountain pass
Interactive television
Maxima and minima
Directory service
Electronic mailing list
Limit (category theory)
Shape (magazine)
Mass
Mereology
Valueadded network
Order (biology)
Information
Statement (computer science)
Data structure
Gamma function
Text editor
Message passing
Metropolitan area network
Link (knot theory)
Graph (mathematics)
Computer file
Electronic mailing list
Ext functor
Peg solitaire
Directory service
Port scanner
Singleprecision floatingpoint format
Sign (mathematics)
Word
Computer animation
Personal digital assistant
output
Selforganization
Right angle
Energy level
Remote Access Service
Table (information)
14:44
Computer file
Interior (topology)
State of matter
Multiplication sign
Mathematical singularity
Cellular automaton
Limit (category theory)
Counting
Hand fan
Theory
Revision control
Order (biology)
Finite element method
Sign (mathematics)
Computer configuration
Operator (mathematics)
Aerodynamics
Message passing
Metropolitan area network
Forcing (mathematics)
Operator (mathematics)
Subject indexing
Computer animation
Statement (computer science)
Right angle
Energy level
Key (cryptography)
Procedural programming
Object (grammar)
Reading (process)
Row (database)
17:16
Presentation of a group
Multiplication sign
Real number
Mathematical singularity
Price index
Abstract syntax tree
Medical imaging
Order (biology)
Latent heat
Operator (mathematics)
Ring (mathematics)
Square number
Text editor
Arc (geometry)
Chisquared distribution
Metropolitan area network
Operator (mathematics)
Drop (liquid)
Ext functor
Port scanner
Term (mathematics)
Portable communications device
Element (mathematics)
Subject indexing
Process (computing)
Computer animation
String (computer science)
Revision control
Right angle
Key (cryptography)
Object (grammar)
Musical ensemble
Game theory
Block (periodic table)
Remote Access Service
Row (database)
19:34
Metropolitan area network
Table (information)
Key (cryptography)
Ext functor
Abstract syntax tree
Euler angles
Element (mathematics)
Element (mathematics)
Position operator
Plane (geometry)
Dynamic Host Configuration Protocol
Computer animation
Different (Kate Ryan album)
Operator (mathematics)
Software testing
Data structure
Gamma function
Chisquared distribution
20:21
Metropolitan area network
Presentation of a group
Key (cryptography)
Mathematical singularity
Maxima and minima
Operator (mathematics)
Cloud computing
Abstract syntax tree
Price index
Drop (liquid)
Ext functor
Euler angles
Arm
Element (mathematics)
Variance
Subject indexing
Order (biology)
Computer animation
Operator (mathematics)
Remote Access Service
21:09
Position operator
Metropolitan area network
Computer animation
Logic
Ext functor
Right angle
Euler angles
Element (mathematics)
Arm
Smith chart
21:51
View (database)
Multiplication sign
Water vapor
Element (mathematics)
Finitary relation
Ranking
Message passing
Form (programming)
Metropolitan area network
Theory of relativity
Trail
Key (cryptography)
Information
Structural load
Quark
Code
Grand Unified Theory
Machine code
Element (mathematics)
Length of stay
Word
Computer animation
Interface (computing)
Right angle
Table (information)
Integer
23:25
Trail
Table (information)
View (database)
Maxima and minima
Price index
Limit (category theory)
Icosahedron
Arm
Valueadded network
Tabu search
Ring (mathematics)
Gamma function
Message passing
Chisquared distribution
Metropolitan area network
Trail
Lucas sequence
Drop (liquid)
Portable communications device
Mathematics
Subject indexing
Length of stay
Computer animation
Right angle
Table (information)
24:08
Point (geometry)
Table (information)
Computer file
View (database)
Maxima and minima
Price index
Icosahedron
Regular graph
Field (computer science)
Smith chart
Tabu search
Mathematics
Array data structure
Escape character
Operator (mathematics)
Operating system
Data conversion
Extension (kinesiology)
Message passing
Hydraulic jump
Chisquared distribution
Metropolitan area network
Theory of relativity
Trail
Executive information system
Data storage device
Operator (mathematics)
Bit
Port scanner
Cartesian coordinate system
Element (mathematics)
Mathematics
Computer animation
Website
Right angle
Key (cryptography)
Quicksort
Table (information)
Data type
Form (programming)
Row (database)
Extension (kinesiology)
27:16
Metropolitan area network
Trail
Real number
Expression
Maxima and minima
Limit (category theory)
Euler angles
Hand fan
Array data structure
Sign (mathematics)
Computer animation
Gamma function
Table (information)
Message passing
Chisquared distribution
28:09
Metropolitan area network
Trail
Computer animation
Sequel
Information systems
View (database)
Limit (category theory)
Quicksort
Message passing
Valueadded network
Chisquared distribution
28:54
Context awareness
Latin square
Graph (mathematics)
Mathematical singularity
Sheaf (mathematics)
Price index
Inverse element
Turtle graphics
Order (biology)
Strategy game
Set (mathematics)
Circle
Chisquared distribution
Metropolitan area network
Trail
Information systems
Transport Layer Security
Latin square
Data storage device
Drop (liquid)
Process (computing)
Right angle
Quicksort
Remote Access Service
Row (database)
Frame problem
Trail
Functional (mathematics)
Real number
Maxima and minima
Limit (category theory)
Electronic mailing list
Automorphism
Operator (mathematics)
Subject indexing
Business model
Message passing
Ćquivalenzprinzip <Physik>
Mathematical optimization
Graph (mathematics)
Operator (mathematics)
Euler angles
Subject indexing
Word
Computer animation
Personal digital assistant
Video game
Musical ensemble
Integer
32:04
PC Card
Digital filter
Functional (mathematics)
Table (information)
Link (knot theory)
Graph (mathematics)
Interactive television
File format
Maxima and minima
Price index
Icosahedron
Storage area network
Valueadded network
Smith chart
Number
Uniformer Raum
Set (mathematics)
Gamma function
Arc (geometry)
Metropolitan area network
Link (knot theory)
Graph (mathematics)
Theory of relativity
Trail
File format
Data storage device
Database
Bit
Port scanner
Flow separation
Element (mathematics)
Degree (graph theory)
Computer animation
Query language
Video game
Right angle
Table (information)
Integer
Form (programming)
34:17
Point (geometry)
Trail
Group action
Table (information)
Quantum state
Link (knot theory)
Cloud computing
Price index
Limit (category theory)
Arm
Smith chart
Order (biology)
Term (mathematics)
Ranking
Gamma function
Chisquared distribution
Metropolitan area network
Link (knot theory)
Scaling (geometry)
Trail
Weight
Euler angles
Computer animation
Data Encryption Standard
Quicksort
Table (information)
Row (database)
35:11
Metropolitan area network
Computer icon
Link (knot theory)
Mountain pass
Multiplication sign
Grand Unified Theory
Subgroup
Computer animation
Query language
Cycle (graph theory)
Gamma function
Recursion
Message passing
36:19
Metropolitan area network
Link (knot theory)
Information systems
Data storage device
Code
Maxima and minima
Special unitary group
Control flow
Revision control
Medical imaging
Computer animation
Gamma function
Message passing
37:08
Point (geometry)
Graph (mathematics)
Multiplication sign
Interactive television
Maxima and minima
Function (mathematics)
Arm
Valueadded network
Subset
Tabu search
Web 2.0
Insertion loss
Mathematical optimization
Chisquared distribution
Metropolitan area network
Computer icon
Link (knot theory)
Theory of relativity
Information systems
Control flow
Arithmetic mean
Computer animation
Addressing mode
Software testing
Row (database)
38:19
Sequel
Graph (mathematics)
Parameter (computer programming)
Rule of inference
Area
Coefficient of determination
Type theory
Singleprecision floatingpoint format
Authorization
Metropolitan area network
Link (knot theory)
Electronic mailing list
Database
Bit
Hecke operator
Price index
Statistics
Type theory
Spring (hydrology)
Computer animation
Search engine (computing)
Software testing
Key (cryptography)
Quicksort
Remote Access Service
Task (computing)
Resultant
39:40
Presentation of a group
Code
INTEGRAL
Direction (geometry)
Source code
Relational database
1 (number)
Mereology
Disk readandwrite head
Computer configuration
Finitary relation
Office suite
Exception handling
Covering space
Normalform game
Real number
Structural load
Gradient
Fitness function
Data storage device
Database transaction
Port scanner
Element (mathematics)
Type theory
Arithmetic mean
Process (computing)
Selforganization
Energy level
Quicksort
Online chat
Spacetime
Sequel
Real number
Gene cluster
Virtual machine
Student's ttest
Rule of inference
Field (computer science)
Number
Element (mathematics)
Causality
Term (mathematics)
Boundary value problem
Data structure
Objectrelational mapping
Address space
Scaling (geometry)
Key (cryptography)
Server (computing)
Forcing (mathematics)
Interface (computing)
Projective plane
Affine space
Word
KeilfĆ¶rmige Anordnung
Film editing
Computer animation
Personal digital assistant
Logic
Business model
Object (grammar)
Table (information)
00:00
better literary I think hopefully get over the at the post lunch why they try to jazz this up a little bit it has been pointed out that the presentation isn't exactly as described on the PGA counts at all but world here so for the time of by doing Jimmy Hansen of this is if you can't beat join upon in integrating no sequel interposed pressure and you'll see you just jump
00:36
right we'll start with we have about a quarter of the presentation is unavoidably sort of theoretical and conceptual and the 1st is the the no sequel check on everything from a magic panacea to a hot mass up and I the basis for the rest the presentation is surrounding assertion that it doesn't matter it's it's it's a really useful in some situations and not in in others but it's here that that's really the promise this is argue for or against this is no sequel is and it's our job from what poststressed authorized to be able to use POST rests with no sequel so how can we thrive in this environment and and sort of armor searching I'm asserting that the answer is to know where our sweet spot is in the relational model and we have a lot of capabilities and accept the edge cases of 0 stress isn't run Google search but actually here how to right so rather than the hype
01:36
curve I did this is my nose stages and no sequel acceptance this is like finding the time so again no single is but it doesn't kill relational databases certainly doesn't kill poststressed we have more capabilities than other relational databases in the marketplace and I wanna show you ways that we can plot on top of right so on the trying to turn the traditional presentation or that the traditional focus on a dead thing that you ordinarily we given that rest as we are posters conference popular leverage no sequel and what I'm trying to do for least 40 minutes is to reverse which is to say given that knows he was out there in the business world on how can I leverage PostgreSQL to make it work
02:29
right now I'm much of the stuff Martin Fowler's presentation you to read the Catholic in his book is also pretty darn good he coined the term Polyglot Persistence which of he solid I bought it and I'm sharing it here so I actually so here's poverty in several be no sequel as I said can be all these other things were 444 the next 40 minutes it's just these 5 things I Document store databases were gonna look at Jason for us to be XML as well a wide column store or but we're not going do that but that didn't make the cut keyvalue pairs at 8 store data process on graph database that's really not a datatype but it's a recursive query going to go through 1 of those the end and the Apache solar serve as a search engine on put applied for and flying share of PG from PG continuity read had a talk on ranking generating ranked results with natural language queries with full text search but that's not in this presentation you away it really focus on document keyvalue pair and the graphs for right so part of it is and we gotta go double damages from moment here on basically we need to make sure we all know the 3 critiques against relational databases that we have to know what other technologies are saying about us in order to refute that the 1st is on yeah you know sequel because there's too much data for 1 or a few machines to process on that I know the smart people who disagree with this but from 1 for the rest of presentation if it doesn't fit on a few machines it's not a good candidate on for relational that's really an education for our purposes i put out for sharding and I know the charter can be done in relational model but it's very difficult and no sequel has for the expectation showing I realize there's a lot more to the story but for our purposes to conquered to our
04:41
2nd because relational model stores data in a little tiny pieces in lots and lots of different places it gives a transaction power this but it doesn't match oriented approach the example here is if I was building a desktop app I would like to have a method that says you get desktop or post desktop on but in a relational model you really got in touch the to the little pieces and that's a bit of a pain in the arse and and then
05:14
through the last 50 years because it is difficult over time as you data structure for your business needs change to keep it all in a relational model in other words you can't have 1 table this as as a no now i'm changing the structure but I wanna keep all the old records in the center of art you didn't come to poster press conference to hear bad stuff so that's the end of that just 1 of the right the other side there
05:39
is where strongest contender ourselves the 1st is where an incumbent we are everywhere Google has relational databases we rock the house we have an extremely powerful interaction language sequel but it's widely known you need expertise in all over the place is mature and it still has her development witnessed this parties we're transactions are real ones down at the data element that's pseudo which transaction like eventual consistency real low crap transaction making changes visible everywhere on and then this last 1 because I get to the cool kids play with pool size data but most of us really don't like you can handle comfortably multi terabytes data you really only need that and the petabyte side but and is not that many datasets that large around on and then this last piece is is that we're better at finding relationships between data elements in there are times when no crop you do want to stop that from the wonderful thing about a glorious flexible data model is you have to think about anything in Pollock and rapid which don't always actually want so anyway this is 1 of the but a framework of this is where the the the current state of relational version of right so where does all that stuff on Martin Fowler coined the term Polyglot Persistence and the short his argument and and I agree with is that we have relational databases and no sequel database in our job is to match the business needs plus the data to the technology and I'm not argue that a lot of that does fit very very well in the post and this presentation is is really about finding were integrated in demonstrating how to encode input how to integrate it and again this is this is really what it is I get 4 hits 2 in college when the privatize I need pose not survive I needed to try various selfishly very personally so that really means we have to make this no sequel stuff work so we wanna thrive not just survive in a world that includes the sequel question some so anyway here's all that
08:07
stuff in 1 shot so that amount of data size a business transactions basically PostgreSQL relational database model with the other 1 machine but not really suited if it goes beyond a few machines and we'll talk about transaction sizes of roughly document size important to point out no sequel really doesn't have the capability to change data elements if it's smaller than a document that I realize this is an evolving sort of technology but but it's not wellsuited that's not it's sweet spot so therefore we were down now
08:45
tho the technical part hopefully more fun part saw scenario is we would just given a million Jason files and we want to know what to do and and by we wanna know what to do with it that's the universal service do I really need that among the DB so just had 2 questions can dataset easily fit on 1 or a few machines have gas and then how large the business transactions and almost all universal answer to that is on and were perfect for poster at 1 of the business transactions because we get the object relational mapping in and you wanted you want a business transaction you know if you get started on a view of all the individual pieces of but most people don't know all that stuff and no single technology doesn't allow you to change individual pieces but PostgreSQL allows you to either update the whole document or the pieces to get more flexibility but so quick review and J. soundobjects JavaScript Object Notation of all it before basically we're looking at key value pairs Australian here you can have raised as as elements with Jason everything
10:03
were saying for would also apply to XML amount for conceptually they're just different standards documents but the XML is not nearly as cold so it's not in here hope that the that captures the the level of discussion that's amount of
10:23
so now the what's in our dataset where I wanted to do this presentation in real with real data because I wanted to force index usage wanted to show you a real data is is always inherently flawed is just the way it works so that's a good thing for us on the show you what it's like to load so there's a million song database out there that the public works projects like there the sample data set 1 % of roughly 10 thousand rows and we had a track I be an artist a song title and then there's 2 will convert store but the 2 are a field 1 similarity the like Pandora so this song this strategy is similar to the strategy with a certain way and then you have tags which is you know this song is a rock song value 88 this summer's alternative value 67 but will dive into that so that's basically that the corpus of documents are popular songs right so how do do this and by the way we removed through some see for example the realtime but all that's posted on the website so don't don't worry about any particular syntax based on create a table and is used by just per mole Jason file and we loaded with the copy command but that's invented here that is used to escape i have embedded trademark thing as an artist with an apostrophe yes so that's that's real datasets of anyway that gets around the problem from when these days and be a because we should and that the left right and so on when load the stuff there requires a little bit Linux work but not to that we just you know it's a zip file just unloaded and then our create symbolic links to underneath PG data I'll show you why it's convenient for me I'm interested in 1 that starts coming on me show you what I mean so we are seeing
12:34
this graph the last song in anyway it unzips to adjust at a hot massive it's a song is a song that a a b songs from this really nasty the nested case and then you just get that each piece of which adjacent files and I turn it into symbolic links here I did that because I want to propose rests on a lot of ways you could do this but I wanted to within inputs I use the PG Alistair command which from the from some postresidency gives you a listing of whatever files the directory and I use that to generate my copy can so in other words what I wanted the whole point of this is on estate I have a nasty hot mass of this complicated directory structure I want no part of 1 Linux command gives me all the symbolic links and a selfgenerating sequel command gives me overload right filter so that's that's I created a table find shape here's if you haven't used it Austin command it only works under PG data so we use this to documents C R O 4 is that the problem of anyway what I wanted to show you that all I did I didn't move files have been copy files someone handicraft another of the set up of files in a directory I couldn't see within PEG data I was another PG data don't move anything don't do anything just so note I am so that's little 1 of
14:45
those so if you'll see later but it it looks it's genocide up procedures and come to my father and Honduras have 3 at 93 hemocytes goal I so we don't have a lot of knowledge browse because the the the 1st answering the 1st question really when adjacent houses within the answer is generally act so we use Jason BIG keys that's all the outer layer tags in your final so it's a great way to explore the have on here is to operators to pull the top 1 with the 2 read and signs returns the text object the other 1 returns adjacent object you use the return date someone if you wanna national and this is the pragmatic so this statement says they give me all the records that have the article for watching this and C will be right so we don't 90 100 files so but so the 1st within a day some files theory those all the tags artist a similar is is is like the Pandora piece from it let's look at how many are to come song each artist so it is that this is our dataset it's randomly chosen so make no comment on that 1 but that you can see how many different songs from each piece um and let's see
16:23
what we can show on the show you some of the weighted index to make it to make it faster but to get fast version of star so this is how Amazon the force on February monkeys on 1 thing I want to point out that is the rubber mallet search technique which is converted to text and news and I like that but it does work function to other options some say here's where but if the value of the article that the artist had his art of monkeys therefore songs and then the same people were looking at right here I have adjacent had that's what I wanted to did it varies each time but this is now roughly notes so it's at the trust and this 1 adopted 83 ms and a dozen years and but that states I
17:18
wanted you know better faster so create an index of 1 the 1 point out but no thought involved and it's pretty easy 1 index the whole day some it and what I wanna show you is but trust me it does not speed up the text search but look what happens here when I use but remember so what we're doing in this square here is we have an index on adjacent objects I don't care which says it is the coordinates the holder and then and on show me with everything that has that tag their wicked it when there from roughly 80 something 2 . 8 on so hot 99 times faster no thought process are very easy to easy to maintain so we just play rental that will stroll for data on that question mark operator there is that a show me all records that have this tag are dataset they all have the same type of of not required it's easy to say the other records that have this tag images that 10 images from and that just as you know what are the right so let's look for a specific song right here well you can match by tag ID pop now we been
18:47
wrecked you knew we weren't going to get through music presentation about the sale of right yeah I can't imagine it was an accident but anyway nobody say from from a role on so let's look at that tags here let's explore that because we know it's 1 time but it's really about to up here so what we're seeing is written ghastly is 100 % 80's music can argue that come and because this is real data it means you get things that make node him sense but it definitely is not ruled that men metal but will say that is that is almost a game that the at the advantage of of real data is that these work so it let's blow
19:45
that up with on what I wanted to show you 1 operator I pull what had been an array of key value pairs pull them out road operated so here's all my difference as you can you can see it has will get to it later but it has the the structure of an 8 story and then erase perfectly valid to state from within
20:08
but now that we know the user erased we now have access to a whole bunch of array operator selects for seniors plot that this 0 for the 1st element in the 2nd 1 we need 1st the purpose of of
20:30
the of the of the fact that fact so when you have what you see in here so even if I say 0 here's the easy way to explore all the tags the outside but is not fast enough is easy weighted index enough on on that 1 I see 1 of those pieces is pretty complicated it's an array of key value pairs now I can put them 1 operator into the role and I can pull different pieces of on what getting and as you can presenters however you want to so here's another
21:11
logical piece from let's say I want all the rocks and that's
21:17
really or saying there is every song with the tag rock it's not that optimized on so it's a little slow what you're seeing here is so this is everything that had been the tags array a tagger rock all but only happy with that because some of these are in my judgment really write songs I what that's what changes what about really rocks and what that says is by songs that are at least twothirds right so the
21:52
2nd element has to be at least you know it needs a rock tag and then the 2nd element needs to be at least 2 thirds so these should be really right hard to understand so now you have whatever the value is high so we're deep into adjacent pulling out individual at pulling out comparing individual values of this array of key value pairs always equal rights and again the
22:31
syntax is all posts right but it's a it's huge enough water for working with high right now to start if mystical whatever I say the word and the instinctive reaction while you're there are some technologies in a crap load times of old codes that needs you to present this information in a relational form and we said we need to integrate it so let's explore that how would I take these J. some files by loaded pretty easily and make them look like they're just a normal relations on you can create a view also create a materialized view if you want to and then it will look to the outside world just like a table and the key piece in this 1 is are not going to do anything ahead of time and you can do is just by exploring data back so it is a simple
23:30
1 but tragedy artist and song 5 now the same
23:41
data of but it looks like a table again I didn't creating and presenting it so this allows me to integrate with my API with other tables whatever whatever I want that that each set right if 0 1 at index it or do other things a materialized view let's if I
24:02
had that I would definitely index on track idea because the intention would be I wanted to behave just like a traditional table
24:13
and that's all you need to work so you can also refresh the materialized view whenever you Jason files 1 or whatever your data involves changes so so let's look at this right here Clapton dancing in anyway my point on this 1 is we made a we can fast queries less than a millisecond and 10 thousand rows of data that I imported 10 minutes ago is Jason files didn't copy many where my operating systems and now it looks to all the rest of my application in the world just like a regular relation that table a well indexed all that stuff is there so jump a little bit farther from but wanna get on that 1 is not really giving up much by keeping in a relational another at the fact that you have posters we have the operators to explore the adjacent but you also have this extra stuff that you can't do on those by both jump back site so those those 2 field those 2 pieces of modulation had from and the similar so remember the similar as is where you want the depend which of those really look like keyvalue pairs don't think so let's walk down the H store past maybe I can presented this data is suppose instead of a traditional table relation only user h sort of so on to create a song on the same here I know that track the artist and title are all their so chromatids regular can't make sense but I haven't 8 store for here's all your similar and then a store for here's all the tax and I will use only Jason V. and H store operators to do the conversion so no Python no cold no PG PG PL SQL not just piercing book on or you do it remember we had the adjacent array elements you can pull out the pieces and you have in a store eights stores both the data type and an operator unfortunate but it is what it is and the R. S so we're gonna blossom all apart from the j with the array elements put putting together with array and and then make him as a change as a as he's an H 2 I contend that requires the extension H store retrieve there here's my table
27:14
and a a little bit a sequel through but not to from you see I
27:20
have use a common table expressions and then it's just a syntax piece that I couldn't use the I I had to put the breaking up of the pieces that array elements into are a common table expressions but anyway you'll see in to supply and none of this is optimized right so we have a 48 hundred but then I should point out that not all the songs are dataset have either of these 2 2 which is also a sign of real data it's not uniform so that's it is that we're trying to
28:06
do in 5 German and
28:16
Austrian real estate quite we have some syntactic that's a story this is extreme 50 hiphop sixties so that's exactly what we had hoped for and then to work and then we want to do the same thing with similar so on this sequel got a little weird on me for that 1 and I we're just overly complicated so I created a view and this is just a sort of an artifact to make the sequel simpler for through but show you where this goes in
28:54
and on of so I have another collect it anyway what I want is a so this is what the similar case says that track ID is similar to this strategy with that would make sense for traditional graph based on this last feature here brings it all together and then we will check out songs assumes that comes back from the lunch Our 1 1 here right so we have rows and not everything has similar song that we would expect from all the wonderful have tag so you can rows and we now build our same day we converted to N. H. store model right on which brings us into our next section gives a whole bunch of new operators and functions on right here is that the H store operator for everything with the title Latin of this is sort of the equivalent this as I what really Latin music which is a tag of Latin and a value of over 60 so that to those 1st thing is index is also simple in this case I would index track ID and the 2 H circle I'm showing the Turtle reasons to choose g inverse is just the complement here
30:45
for context the both were fun so let's look for just have really that's of less than a millisecond in this 1 by the way you know thought indexing other than I put in the index of my age to out no more thought process than that a less than a millisecond I get all my really Latin songs and an open dataset has idea you can see it's it's basically again I chose an arbitrary value it has to have that tag over piece of pretty powerful stuff that was pretty easy and again the optimisation piece the answer is I don't know I just open index on it and it seemed to work well which is what we're after simple simple simple right so here it is another another question what all the tags and all the songs by the artist monkeys well coming you'll you'll see I like this piece of the real data from extinctions have problems it's got but that that's real life for us in other words all the syntax examples from the publication of the work or too simple they don't anyway that's what I wanted to show you this is every 5 every tag but among right so that we just did
32:07
adjacent which is documents we propose an XML because it's easy to take them all down but actually all these pieces really apply was slightly different syntax to XML function are similar in and it is still a valid and useful piece we converted the same dataset internally over to a store number last piece of promise to show you was really a graph database so let's say we have similar so let's see if we can do that on neo for J is the market leader in this 1st segment of no single data and I wanted to point out for infinity of PostgreSQL and it's really relational data are suited for graph life and therefore J is really really powerful peaceful initially here and accepting their own little bit a crappy performance we'll find links and our graph that's 5 degrees of separation that would at least get us to Kevin Bacon onedimensional so anyway just for that provides a right so when you use the recursive query whether the recursive format I just took it right out of the book were Cinderella posters documentation on on the filter our dataset a little bit and the limited to only rock songs and limited to write songs that have a strong relationship between what you see in the tag they're related to every thing and and I really want to filter our dataset it to make it more manageable and so we're going to enter that burning burning question is there a path of related songs from lady gaga poker face to Justin Timberlake what goes around comes around no remember you don't get this kind of insight from see I just want to make sure I just I just want to make sure we know where where bread and butter knife
34:11
thank so I will watch clear I created table
34:18
strong rock links and what I'm saying is here to stay where the weight is greater than it had a greater than point 1 5 I just a different scale that the the link weight goes from 0 1 and where the tags rock by easy so I got 16 thousand rows um I don't know what
34:41
I need an index on but it seems like the idea in the link will work in terms of the action that used to promote and this is what that sort of table looks like so so Stevie Ray bonds crossfire is related to a song a track with this strategy and in this way make sense and it's there all related to themselves with a weight of 1 2 surrogate detail now I This can
35:18
run along Israel's day but this I put in the ending song right here lady gaga I I see that with that because I didn't wanna ask all relationships because along with it across join in and I did I follow the example in the book to make sure that it doesn't cycle of recursive queries can cycle on a lot of really smart people wrote documentation and tell you all about it I and I chose ahead of time I know it can have a depth of 5 they just reduces my my filters that because you can there are other relationships from lady I got that are less than 5
36:00
and what you can see it producers and war and when that when the query of the song by song a rotor subgroup to different query to say hey where do all the where all relationships come from and that is that this is
36:26
I think but they call awkward you have to trust me on this that may be overcome by looking at only with
36:36
surprise yes regret version
36:42
is expensive and anyway and this slowness on that's all on it is on POS presses isn't very very good J. Sun XML H store place it is an acceptable recursive place but the good thing is a lot of data to just acceptable so what we're going see is we have images come that work that the
37:15
output of the there are 3 what you
37:22
get is a bunch of it you hit the relations between the 5
37:29
steps that lead up to
37:31
various insightful business knowledge that Justin Timberlake is related with the value of point to a to the Black Eyed Peas who are in turn related with value . 1 2 Britney Spears who goes down the reality and all the way to live sound my point is this true real example is not an easy piece of data to get I mean it's it's onto the but that's 5 different layers in a couple attend hours in a row records that that kind of VM with no optimization didn't come pretty didn't come certainly web but it did culminating in an acceptable amount of time on makes sense so that's our
38:22
it did the dog that's but
38:24
anyway what what you get in this past that is a commadelimited it's an array of a commadelimited list of the passage walked and to get from 1 to the other I the spring
38:37
out from that a couple things 1 is the assertion of the presentation of takeitorleaveit but my faces there's a bit interested is 1 no single is here to stay but it is a pointless and ultimately sort selfdefeating exercise to say how much we hate no sequel and we wanted to go away and we see other contenders and this 1 is no different in relational slay this 1 2 on the lots of arguments to say no sequel is different it will be used but but the other side that is it's not indicators that we need to play nice verified which types of rule of no single databases on posters plays well with 4 of I didn't see this search engine piece we have really a heck of a capability with posters fulltext search really really nice ranked results authors of the other pieces we played really well with document keyvalue pair we played OK here and really well with certain summary to but you can
39:43
load no sequel data but it doesn't cut you off from the relational world does that make sense so you don't have to say the the adjacent or a table poststressed gives you both in 30 minutes nothing else does or almost no other technologies that has a officially nothing but but really that's a pretty remarkable pieces that means you have a whole lot of embedded code that you don't need to throw in a hand motion real no credit datasets fit you know you really have to go above a bunch of terabytes to make this impractical are sweet spot on data that fits well on 1 or maybe a few machines on transactions you the whole document workpiece within the document I you want to enforce some referential integrity on number in loading adjacent provided enforced any there that that's really a business to it but we have that option but anyone affine relationships within the data that that's a a powerful suite but on on the most organizations with these 4 criteria call real did you know I mean Google Yahoo sales force that come with great headlines wonderful pushing the boundaries of technologies but also age cases for most us posters plays nice in what at least in the majority of my office on any 2 asserting if you're buying it at least education to no sequel is not useful for post yesterday we can do that to to everything but is we can do so much so well by from and then I guess the last piece is the rules are more complicated now it's it's just it's too simplistic to continue down the thought process of of St. David's not 3rd normal form urinated and it's wrong but I don't know how much that is still out there I run into it on occasion I think the world has changed you agree we have to expand the and that we have more capabilities and poststressed than the other guys with leverage them from another implication is you can't do that unless you know your business needs and your data the data itself alone won't explain how to use you have to invest more in of I know it's not pretty it's not even an appealing answering a very technical conference but the the answer we're stuck with if you're the leverage posters of most people capabilities you gotta invest some how business users it what the structures of Our my objective again remember 2 kids in college 1 and the other guys will will this used to arrive if you just get by you have to get some of these other guys and that doesn't happen if I use a note we see a lot other contenders no sequel you know will blow away or a we can do everything postresidency 10 thousand the clusters and things like that you smart people may disagree I don't think that the sweet
42:48
on questions the rate of OK if you if you know what you want to know that you have a lot of 1 of the U I and the direction of motion of the the of the of the of the of this yeah but but but I like to work with but I like to hear that because a lot of real problems are manageable like that and if you know it seems lady gaga space trivializes the example but it's not so easy is inherent in any other questions and when using J. some news using the stuff that I would assert that next year's conference will see more and more on the head or or or you know you like the meaning of it all you need to think about what people were doing was more than all the word find their efforts on it and it all in 1 of the death of a lot of money yeah we're all of you or some of you to a multipart question fortune on my answer I don't know but in terms of updating adjacent elements by posters 9 5 is just around the corner clearly there's a huge still going on next month so you have to buy that now so I I I leave that to the 95 has a lot of that stuff that you talked about on I have not integrated with Mongo DB were playing with it home I know I have used up Python and Google's addressed providing and so when you look at an address it with Google search that give you Jason object and we're back and I used the that the Python interface in work grade I just plug my Jason logic natively it with the noted exception you have to every real piece of data has embedded field delimiters I didn't know about that that copies syntax so I had to work my way around it on but I have said from a JDBC type source was Python not Mongo alluded indirectly I'm happy to talk after I didn't use it wasn't in this conference sort or in this presentation serverside project on but Python talking to Google to give an address a pullback is adjacent object and storage in post it works no problem at all it it was a lot of yes no yes yeah my my assumption is that works of Python it'll work with with much but other things I did that for the obvious reason because it's easier and the answer the so perfect so that that's Polyglot Persistence that's saying Mongo can can Condor party but we're still apart because they're still pieces that PostgreSQL knew that the other ones can do any other stories from the head of the of the the next 1 of the all of the data is still being held on so they know you you're that that was the year that the critique of that that this object relational mapping that had the rash us hibernate work on its real and then some students having taught solution coming out of college and on and what to say about that 1 but all of can so all right so if you want 1 person speeds all idea for that is I only talked of use whenever I couldn't applications like that the only thing that is my object relational mapping but I am very disciplined I never ever talk directly to the table cause I'm always wrong when I designed in the beginning that's my methodology is to do a wrong on that's all I can I also learn from or apps I worked a lot with them and they also allow you to do it yes yesterday on also yeah this lecture we really going to have have but so that in this presentation is on the scale OK the sequel finally broken center I'll OK because you know this thank you out all all all talk to Dan amount has you need that and it took me a while so part of it is I'm going to get a good please download please tell your friends on it took me a while to get some the syntax is awkward in part because you know it's sequel doesn't cover this sort of stuff as so on her own wilderness but it is possible so shamelessly copied my code because I shamelessly copied other people's but it works and the other pieces going into those arrays PostgreSQL rocks array once you get remote that the tags in a similar pieces were home when get that so another duet with Jason was that hey tag that's just an array of key value pairs that's my good place I know how to work with the arrays in poststressed complied individual elements and and all that stuff like that this is not that that's good it helps put a fork as were done