If you can't Beat 'em, Join 'em!

Video in TIB AV-Portal: If you can't Beat 'em, Join 'em!

Formal Metadata

If you can't Beat 'em, Join 'em!
Integration NoSQL data elements into a relational model
Alternative Title
If you can't beat 'em, join 'em (... a pun)
Title of Series
Number of Parts
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Why, when and how you can integrate documents and key-value pairs into your relational model There is a pitched battle going on between the relational, document-based, key-value and other data models. PostgreSQL is uniquely capable of leveraging many of the strengths of multiple data models with JSON(b), HSTORE, XML, ltree data types, arrays and related functions. This presentation outlines the use-cases, benefits and limitations of document-based, key-value and hierarchical data models. It then presents practical advice and code snippets for incorporating them into PostgreSQL's relational framework. The presentation ends with SQL examples and code snippets for loading, accessing and modifying (where possible) JSON, HSTORE, XML, ltree and array data types. This presentation begins with a very quick review of the rationale, benefits and implications of the relational data model. It then does the same for document-based models and hierarchical models. The balance of the presentation works with three publicly available data sets, world-wide airports, Wikipedia Inbox key-value pairs and Google address JSON objects, showing how they can be be incorporated into a simple relational model. The presentation also includes snippets of code for loading the files and accessing elements. The full SQL, and shell code will be available on the web site.
Presentation of a group Sequel Relational database Mass Type theory Mixed reality Business model Integrated development environment Hydraulic jump Beat (acoustics) Metropolitan area network Stress (mechanics) Information technology consulting Bit Basis <Mathematik> Port scanner Statistics Element (mathematics) Inclusion map Process (computing) Computer animation Integrated development environment Personal digital assistant Business model Quicksort Curve fitting
Presentation of a group Graph (mathematics) Multiplication sign Mereology Subset Expected value E-text Uniformer Raum Curve Theory of relativity Relational database Moment (mathematics) Shared memory Fitness function Data storage device Port scanner Term (mathematics) Flow separation Order (biology) Right angle Data type Resultant Sequel Software developer Virtual machine Maxima and minima Division (mathematics) Average Discrete element method Goodness of fit Term (mathematics) Business model Analytic continuation Electronic data processing Focus (optics) Graph (mathematics) Server (computing) Graph (mathematics) Information technology consulting Database Density of states Single-precision floating-point format Film editing Computer animation Search engine (computing) Query language Boom (sailing) Revision control Natural language Key (cryptography) Oracle
Mobile app Table (information) Software developer Multiplication sign Relational database Power (physics) Data model Mathematics Different (Kate Ryan album) Touch typing Business model Data structure Theory of relativity Matching (graph theory) Data storage device Database transaction Bit Density of states Plane (geometry) Mathematics Word Computer animation Different (Kate Ryan album) Right angle Table (information) Row (database)
Ocean current Presentation of a group Sequel State of matter Multiplication sign Relational database 1 (number) Virtual machine Dynamic random-access memory Element (mathematics) Power (physics) Formal language Revision control Mathematics Term (mathematics) Googol Set (mathematics) Business model Software framework Gamma function Self-organization Metropolitan area network Multiplication Theory of relativity Relational database Software developer Consistency Interactive television Database transaction Database Port scanner Mathematics Data model Process (computing) Computer animation output Right angle Business model Quicksort
Standard deviation Service (economics) Computer file View (database) Virtual machine Mereology Element (mathematics) Mathematics Positional notation Different (Kate Ryan album) Single-precision floating-point format Energy level Object-relational mapping Standard deviation Key (cryptography) Server (computing) Database transaction Mereology Port scanner Open set Single-precision floating-point format Positional notation Computer animation Energy level Object (grammar)
Trail Presentation of a group Table (information) Computer file Link (knot theory) Structural load Real number Similarity (geometry) Field (computer science) Strategy game Information Data type Serial port Link (knot theory) Forcing (mathematics) Structural load Computer file Projective plane Sampling (statistics) Data storage device Ext functor Database Bit Subject indexing Computer animation Website Right angle Key (cryptography) Table (information) Row (database)
Point (geometry) Addition Computer file Link (knot theory) Sequel Information overload Mountain pass Interactive television Maxima and minima Directory service Electronic mailing list Limit (category theory) Shape (magazine) Mass Mereology Value-added network Order (biology) Information Statement (computer science) Data structure Gamma function Text editor Message passing Metropolitan area network Link (knot theory) Graph (mathematics) Computer file Electronic mailing list Ext functor Peg solitaire Directory service Port scanner Single-precision floating-point format Sign (mathematics) Word Computer animation Personal digital assistant output Self-organization Right angle Energy level Remote Access Service Table (information)
Computer file Interior (topology) State of matter Multiplication sign Mathematical singularity Cellular automaton Limit (category theory) Counting Hand fan Theory Revision control Order (biology) Finite element method Sign (mathematics) Computer configuration Operator (mathematics) Aerodynamics Message passing Metropolitan area network Forcing (mathematics) Operator (mathematics) Subject indexing Computer animation Statement (computer science) Right angle Energy level Key (cryptography) Procedural programming Object (grammar) Reading (process) Row (database)
Presentation of a group Multiplication sign Real number Mathematical singularity Price index Abstract syntax tree Medical imaging Order (biology) Latent heat Operator (mathematics) Ring (mathematics) Square number Text editor Arc (geometry) Chi-squared distribution Metropolitan area network Operator (mathematics) Drop (liquid) Ext functor Port scanner Term (mathematics) Portable communications device Element (mathematics) Subject indexing Process (computing) Computer animation String (computer science) Revision control Right angle Key (cryptography) Object (grammar) Musical ensemble Game theory Block (periodic table) Remote Access Service Row (database)
Metropolitan area network Table (information) Key (cryptography) Ext functor Abstract syntax tree Euler angles Element (mathematics) Element (mathematics) Position operator Plane (geometry) Dynamic Host Configuration Protocol Computer animation Different (Kate Ryan album) Operator (mathematics) Software testing Data structure Gamma function Chi-squared distribution
Metropolitan area network Presentation of a group Key (cryptography) Mathematical singularity Maxima and minima Operator (mathematics) Cloud computing Abstract syntax tree Price index Drop (liquid) Ext functor Euler angles Arm Element (mathematics) Variance Subject indexing Order (biology) Computer animation Operator (mathematics) Remote Access Service
Position operator Metropolitan area network Computer animation Logic Ext functor Right angle Euler angles Element (mathematics) Arm Smith chart
View (database) Multiplication sign Water vapor Element (mathematics) Finitary relation Ranking Message passing Form (programming) Metropolitan area network Theory of relativity Trail Key (cryptography) Information Structural load Quark Code Grand Unified Theory Machine code Element (mathematics) Length of stay Word Computer animation Interface (computing) Right angle Table (information) Integer
Trail Table (information) View (database) Maxima and minima Price index Limit (category theory) Icosahedron Arm Value-added network Tabu search Ring (mathematics) Gamma function Message passing Chi-squared distribution Metropolitan area network Trail Lucas sequence Drop (liquid) Portable communications device Mathematics Subject indexing Length of stay Computer animation Right angle Table (information)
Point (geometry) Table (information) Computer file View (database) Maxima and minima Price index Icosahedron Regular graph Field (computer science) Smith chart Tabu search Mathematics Array data structure Escape character Operator (mathematics) Operating system Data conversion Extension (kinesiology) Message passing Hydraulic jump Chi-squared distribution Metropolitan area network Theory of relativity Trail Executive information system Data storage device Operator (mathematics) Bit Port scanner Cartesian coordinate system Element (mathematics) Mathematics Computer animation Website Right angle Key (cryptography) Quicksort Table (information) Data type Form (programming) Row (database) Extension (kinesiology)
Metropolitan area network Trail Real number Expression Maxima and minima Limit (category theory) Euler angles Hand fan Array data structure Sign (mathematics) Computer animation Gamma function Table (information) Message passing Chi-squared distribution
Metropolitan area network Trail Computer animation Sequel Information systems View (database) Limit (category theory) Quicksort Message passing Value-added network Chi-squared distribution
Context awareness Latin square Graph (mathematics) Mathematical singularity Sheaf (mathematics) Price index Inverse element Turtle graphics Order (biology) Strategy game Set (mathematics) Circle Chi-squared distribution Metropolitan area network Trail Information systems Transport Layer Security Latin square Data storage device Drop (liquid) Process (computing) Right angle Quicksort Remote Access Service Row (database) Frame problem Trail Functional (mathematics) Real number Maxima and minima Limit (category theory) Electronic mailing list Automorphism Operator (mathematics) Subject indexing Business model Message passing Äquivalenzprinzip <Physik> Mathematical optimization Graph (mathematics) Operator (mathematics) Euler angles Subject indexing Word Computer animation Personal digital assistant Video game Musical ensemble Integer
PC Card Digital filter Functional (mathematics) Table (information) Link (knot theory) Graph (mathematics) Interactive television File format Maxima and minima Price index Icosahedron Storage area network Value-added network Smith chart Number Uniformer Raum Set (mathematics) Gamma function Arc (geometry) Metropolitan area network Link (knot theory) Graph (mathematics) Theory of relativity Trail File format Data storage device Database Bit Port scanner Flow separation Element (mathematics) Degree (graph theory) Computer animation Query language Video game Right angle Table (information) Integer Form (programming)
Point (geometry) Trail Group action Table (information) Quantum state Link (knot theory) Cloud computing Price index Limit (category theory) Arm Smith chart Order (biology) Term (mathematics) Ranking Gamma function Chi-squared distribution Metropolitan area network Link (knot theory) Scaling (geometry) Trail Weight Euler angles Computer animation Data Encryption Standard Quicksort Table (information) Row (database)
Metropolitan area network Computer icon Link (knot theory) Mountain pass Multiplication sign Grand Unified Theory Subgroup Computer animation Query language Cycle (graph theory) Gamma function Recursion Message passing
Metropolitan area network Link (knot theory) Information systems Data storage device Code Maxima and minima Special unitary group Control flow Revision control Medical imaging Computer animation Gamma function Message passing
Point (geometry) Graph (mathematics) Multiplication sign Interactive television Maxima and minima Function (mathematics) Arm Value-added network Subset Tabu search Web 2.0 Insertion loss Mathematical optimization Chi-squared distribution Metropolitan area network Computer icon Link (knot theory) Theory of relativity Information systems Control flow Arithmetic mean Computer animation Addressing mode Software testing Row (database)
Sequel Graph (mathematics) Parameter (computer programming) Rule of inference Area Coefficient of determination Type theory Single-precision floating-point format Authorization Metropolitan area network Link (knot theory) Electronic mailing list Database Bit Hecke operator Price index Statistics Type theory Spring (hydrology) Computer animation Search engine (computing) Software testing Key (cryptography) Quicksort Remote Access Service Task (computing) Resultant
Presentation of a group Code INTEGRAL Direction (geometry) Source code Relational database 1 (number) Mereology Disk read-and-write head Computer configuration Finitary relation Office suite Exception handling Covering space Normal-form game Real number Structural load Gradient Fitness function Data storage device Database transaction Port scanner Element (mathematics) Type theory Arithmetic mean Process (computing) Self-organization Energy level Quicksort Online chat Spacetime Sequel Real number Gene cluster Virtual machine Student's t-test Rule of inference Field (computer science) Number Element (mathematics) Causality Term (mathematics) Boundary value problem Data structure Object-relational mapping Address space Scaling (geometry) Key (cryptography) Server (computing) Forcing (mathematics) Interface (computing) Projective plane Affine space Word Keilförmige Anordnung Film editing Computer animation Personal digital assistant Logic Business model Object (grammar) Table (information)
better literary I think hopefully get over the at the post lunch why they try to jazz this up a little bit it has been pointed out that the presentation isn't exactly as described on the PGA counts at all but world here so for the time of by doing Jimmy Hansen of this is if you can't beat join upon in integrating no sequel interposed pressure and you'll see you just jump
right we'll start with we have about a quarter of the presentation is unavoidably sort of theoretical and conceptual and the 1st is the the no sequel check on everything from a magic panacea to a hot mass up and I the basis for the rest the presentation is surrounding assertion that it doesn't matter it's it's it's a really useful in some situations and not in in others but it's here that that's really the promise this is argue for or against this is no sequel is and it's our job from what post-stressed authorized to be able to use POST rests with no sequel so how can we thrive in this environment and and sort of armor searching I'm asserting that the answer is to know where our sweet spot is in the relational model and we have a lot of capabilities and accept the edge cases of 0 stress isn't run Google search but actually here how to right so rather than the hype
curve I did this is my nose stages and no sequel acceptance this is like finding the time so again no single is but it doesn't kill relational databases certainly doesn't kill post-stressed we have more capabilities than other relational databases in the marketplace and I wanna show you ways that we can plot on top of right so on the trying to turn the traditional presentation or that the traditional focus on a dead thing that you ordinarily we given that rest as we are posters conference popular leverage no sequel and what I'm trying to do for least 40 minutes is to reverse which is to say given that knows he was out there in the business world on how can I leverage PostgreSQL to make it work
right now I'm much of the stuff Martin Fowler's presentation you to read the Catholic in his book is also pretty darn good he coined the term Polyglot Persistence which of he solid I bought it and I'm sharing it here so I actually so here's poverty in several be no sequel as I said can be all these other things were 444 the next 40 minutes it's just these 5 things I Document store databases were gonna look at Jason for us to be XML as well a wide column store or but we're not going do that but that didn't make the cut key-value pairs at 8 store data process on graph database that's really not a datatype but it's a recursive query going to go through 1 of those the end and the Apache solar serve as a search engine on put applied for and flying share of PG from PG continuity read had a talk on ranking generating ranked results with natural language queries with full text search but that's not in this presentation you away it really focus on document key-value pair and the graphs for right so part of it is and we gotta go double damages from moment here on basically we need to make sure we all know the 3 critiques against relational databases that we have to know what other technologies are saying about us in order to refute that the 1st is on yeah you know sequel because there's too much data for 1 or a few machines to process on that I know the smart people who disagree with this but from 1 for the rest of presentation if it doesn't fit on a few machines it's not a good candidate on for relational that's really an education for our purposes i put out for sharding and I know the charter can be done in relational model but it's very difficult and no sequel has for the expectation showing I realize there's a lot more to the story but for our purposes to conquered to our
2nd because relational model stores data in a little tiny pieces in lots and lots of different places it gives a transaction power this but it doesn't match oriented approach the example here is if I was building a desktop app I would like to have a method that says you get desktop or post desktop on but in a relational model you really got in touch the to the little pieces and that's a bit of a pain in the arse and and then
through the last 50 years because it is difficult over time as you data structure for your business needs change to keep it all in a relational model in other words you can't have 1 table this as as a no now i'm changing the structure but I wanna keep all the old records in the center of art you didn't come to poster press conference to hear bad stuff so that's the end of that just 1 of the right the other side there
is where strongest contender ourselves the 1st is where an incumbent we are everywhere Google has relational databases we rock the house we have an extremely powerful interaction language sequel but it's widely known you need expertise in all over the place is mature and it still has her development witnessed this parties we're transactions are real ones down at the data element that's pseudo which transaction like eventual consistency real low crap transaction making changes visible everywhere on and then this last 1 because I get to the cool kids play with pool size data but most of us really don't like you can handle comfortably multi terabytes data you really only need that and the petabyte side but and is not that many datasets that large around on and then this last piece is is that we're better at finding relationships between data elements in there are times when no crop you do want to stop that from the wonderful thing about a glorious flexible data model is you have to think about anything in Pollock and rapid which don't always actually want so anyway this is 1 of the but a framework of this is where the the the current state of relational version of right so where does all that stuff on Martin Fowler coined the term Polyglot Persistence and the short his argument and and I agree with is that we have relational databases and no sequel database in our job is to match the business needs plus the data to the technology and I'm not argue that a lot of that does fit very very well in the post and this presentation is is really about finding were integrated in demonstrating how to encode input how to integrate it and again this is this is really what it is I get 4 hits 2 in college when the privatize I need pose not survive I needed to try various selfishly very personally so that really means we have to make this no sequel stuff work so we wanna thrive not just survive in a world that includes the sequel question some so anyway here's all that
stuff in 1 shot so that amount of data size a business transactions basically PostgreSQL relational database model with the other 1 machine but not really suited if it goes beyond a few machines and we'll talk about transaction sizes of roughly document size important to point out no sequel really doesn't have the capability to change data elements if it's smaller than a document that I realize this is an evolving sort of technology but but it's not well-suited that's not it's sweet spot so therefore we were down now
tho the technical part hopefully more fun part saw scenario is we would just given a million Jason files and we want to know what to do and and by we wanna know what to do with it that's the universal service do I really need that among the DB so just had 2 questions can dataset easily fit on 1 or a few machines have gas and then how large the business transactions and almost all universal answer to that is on and were perfect for poster at 1 of the business transactions because we get the object relational mapping in and you wanted you want a business transaction you know if you get started on a view of all the individual pieces of but most people don't know all that stuff and no single technology doesn't allow you to change individual pieces but PostgreSQL allows you to either update the whole document or the pieces to get more flexibility but so quick review and J. sound-objects JavaScript Object Notation of all it before basically we're looking at key value pairs Australian here you can have raised as as elements with Jason everything
were saying for would also apply to XML amount for conceptually they're just different standards documents but the XML is not nearly as cold so it's not in here hope that the that captures the the level of discussion that's amount of
so now the what's in our dataset where I wanted to do this presentation in real with real data because I wanted to force index usage wanted to show you a real data is is always inherently flawed is just the way it works so that's a good thing for us on the show you what it's like to load so there's a million song database out there that the public works projects like there the sample data set 1 % of roughly 10 thousand rows and we had a track I be an artist a song title and then there's 2 will convert store but the 2 are a field 1 similarity the like Pandora so this song this strategy is similar to the strategy with a certain way and then you have tags which is you know this song is a rock song value 88 this summer's alternative value 67 but will dive into that so that's basically that the corpus of documents are popular songs right so how do do this and by the way we removed through some see for example the real-time but all that's posted on the website so don't don't worry about any particular syntax based on create a table and is used by just per mole Jason file and we loaded with the copy command but that's invented here that is used to escape i have embedded trademark thing as an artist with an apostrophe yes so that's that's real datasets of anyway that gets around the problem from when these days and be a because we should and that the left right and so on when load the stuff there requires a little bit Linux work but not to that we just you know it's a zip file just unloaded and then our create symbolic links to underneath PG data I'll show you why it's convenient for me I'm interested in 1 that starts coming on me show you what I mean so we are seeing
this graph the last song in anyway it unzips to adjust at a hot massive it's a song is a song that a a b songs from this really nasty the nested case and then you just get that each piece of which adjacent files and I turn it into symbolic links here I did that because I want to propose rests on a lot of ways you could do this but I wanted to within inputs I use the PG Alistair command which from the from some post-residency gives you a listing of whatever files the directory and I use that to generate my copy can so in other words what I wanted the whole point of this is on estate I have a nasty hot mass of this complicated directory structure I want no part of 1 Linux command gives me all the symbolic links and a self-generating sequel command gives me overload right filter so that's that's I created a table find shape here's if you haven't used it Austin command it only works under PG data so we use this to documents C R O 4 is that the problem of anyway what I wanted to show you that all I did I didn't move files have been copy files someone handicraft another of the set up of files in a directory I couldn't see within PEG data I was another PG data don't move anything don't do anything just so note I am so that's little 1 of
those so if you'll see later but it it looks it's genocide up procedures and come to my father and Honduras have 3 at 93 hemocytes goal I so we don't have a lot of knowledge browse because the the the 1st answering the 1st question really when adjacent houses within the answer is generally act so we use Jason BIG keys that's all the outer layer tags in your final so it's a great way to explore the have on here is to operators to pull the top 1 with the 2 read and signs returns the text object the other 1 returns adjacent object you use the return date someone if you wanna national and this is the pragmatic so this statement says they give me all the records that have the article for watching this and C will be right so we don't 90 100 files so but so the 1st within a day some files theory those all the tags artist a similar is is is like the Pandora piece from it let's look at how many are to come song each artist so it is that this is our dataset it's randomly chosen so make no comment on that 1 but that you can see how many different songs from each piece um and let's see
what we can show on the show you some of the weighted index to make it to make it faster but to get fast version of star so this is how Amazon the force on February monkeys on 1 thing I want to point out that is the rubber mallet search technique which is converted to text and news and I like that but it does work function to other options some say here's where but if the value of the article that the artist had his art of monkeys therefore songs and then the same people were looking at right here I have adjacent had that's what I wanted to did it varies each time but this is now roughly notes so it's at the trust and this 1 adopted 83 ms and a dozen years and but that states I
wanted you know better faster so create an index of 1 the 1 point out but no thought involved and it's pretty easy 1 index the whole day some it and what I wanna show you is but trust me it does not speed up the text search but look what happens here when I use but remember so what we're doing in this square here is we have an index on adjacent objects I don't care which says it is the coordinates the holder and then and on show me with everything that has that tag their wicked it when there from roughly 80 something 2 . 8 on so hot 99 times faster no thought process are very easy to easy to maintain so we just play rental that will stroll for data on that question mark operator there is that a show me all records that have this tag are dataset they all have the same type of of not required it's easy to say the other records that have this tag images that 10 images from and that just as you know what are the right so let's look for a specific song right here well you can match by tag ID pop now we been
wrecked you knew we weren't going to get through music presentation about the sale of right yeah I can't imagine it was an accident but anyway nobody say from from a role on so let's look at that tags here let's explore that because we know it's 1 time but it's really about to up here so what we're seeing is written ghastly is 100 % 80's music can argue that come and because this is real data it means you get things that make node him sense but it definitely is not ruled that men metal but will say that is that is almost a game that the at the advantage of of real data is that these work so it let's blow
that up with on what I wanted to show you 1 operator I pull what had been an array of key value pairs pull them out road operated so here's all my difference as you can you can see it has will get to it later but it has the the structure of an 8 story and then erase perfectly valid to state from within
but now that we know the user erased we now have access to a whole bunch of array operator selects for seniors plot that this 0 for the 1st element in the 2nd 1 we need 1st the purpose of of
the of the of the fact that fact so when you have what you see in here so even if I say 0 here's the easy way to explore all the tags the outside but is not fast enough is easy weighted index enough on on that 1 I see 1 of those pieces is pretty complicated it's an array of key value pairs now I can put them 1 operator into the role and I can pull different pieces of on what getting and as you can presenters however you want to so here's another
logical piece from let's say I want all the rocks and that's
really or saying there is every song with the tag rock it's not that optimized on so it's a little slow what you're seeing here is so this is everything that had been the tags array a tagger rock all but only happy with that because some of these are in my judgment really write songs I what that's what changes what about really rocks and what that says is by songs that are at least two-thirds right so the
2nd element has to be at least you know it needs a rock tag and then the 2nd element needs to be at least 2 thirds so these should be really right hard to understand so now you have whatever the value is high so we're deep into adjacent pulling out individual at pulling out comparing individual values of this array of key value pairs always equal rights and again the
syntax is all posts right but it's a it's huge enough water for working with high right now to start if mystical whatever I say the word and the instinctive reaction while you're there are some technologies in a crap load times of old codes that needs you to present this information in a relational form and we said we need to integrate it so let's explore that how would I take these J. some files by loaded pretty easily and make them look like they're just a normal relations on you can create a view also create a materialized view if you want to and then it will look to the outside world just like a table and the key piece in this 1 is are not going to do anything ahead of time and you can do is just by exploring data back so it is a simple
1 but tragedy artist and song 5 now the same
data of but it looks like a table again I didn't creating and presenting it so this allows me to integrate with my API with other tables whatever whatever I want that that each set right if 0 1 at index it or do other things a materialized view let's if I
had that I would definitely index on track idea because the intention would be I wanted to behave just like a traditional table
and that's all you need to work so you can also refresh the materialized view whenever you Jason files 1 or whatever your data involves changes so so let's look at this right here Clapton dancing in anyway my point on this 1 is we made a we can fast queries less than a millisecond and 10 thousand rows of data that I imported 10 minutes ago is Jason files didn't copy many where my operating systems and now it looks to all the rest of my application in the world just like a regular relation that table a well indexed all that stuff is there so jump a little bit farther from but wanna get on that 1 is not really giving up much by keeping in a relational another at the fact that you have posters we have the operators to explore the adjacent but you also have this extra stuff that you can't do on those by both jump back site so those those 2 field those 2 pieces of modulation had from and the similar so remember the similar as is where you want the depend which of those really look like key-value pairs don't think so let's walk down the H store past maybe I can presented this data is suppose instead of a traditional table relation only user h sort of so on to create a song on the same here I know that track the artist and title are all their so chromatids regular can't make sense but I haven't 8 store for here's all your similar and then a store for here's all the tax and I will use only Jason V. and H store operators to do the conversion so no Python no cold no PG PG PL SQL not just piercing book on or you do it remember we had the adjacent array elements you can pull out the pieces and you have in a store eights stores both the data type and an operator unfortunate but it is what it is and the R. S so we're gonna blossom all apart from the j with the array elements put putting together with array and and then make him as a change as a as he's an H 2 I contend that requires the extension H store retrieve there here's my table
and a a little bit a sequel through but not to from you see I
have use a common table expressions and then it's just a syntax piece that I couldn't use the I I had to put the breaking up of the pieces that array elements into are a common table expressions but anyway you'll see in to supply and none of this is optimized right so we have a 48 hundred but then I should point out that not all the songs are dataset have either of these 2 2 which is also a sign of real data it's not uniform so that's it is that we're trying to
do in 5 German and
Austrian real estate quite we have some syntactic that's a story this is extreme 50 hip-hop sixties so that's exactly what we had hoped for and then to work and then we want to do the same thing with similar so on this sequel got a little weird on me for that 1 and I we're just overly complicated so I created a view and this is just a sort of an artifact to make the sequel simpler for through but show you where this goes in
and on of so I have another collect it anyway what I want is a so this is what the similar case says that track ID is similar to this strategy with that would make sense for traditional graph based on this last feature here brings it all together and then we will check out songs assumes that comes back from the lunch Our 1 1 here right so we have rows and not everything has similar song that we would expect from all the wonderful have tag so you can rows and we now build our same day we converted to N. H. store model right on which brings us into our next section gives a whole bunch of new operators and functions on right here is that the H store operator for everything with the title Latin of this is sort of the equivalent this as I what really Latin music which is a tag of Latin and a value of over 60 so that to those 1st thing is index is also simple in this case I would index track ID and the 2 H circle I'm showing the Turtle reasons to choose g inverse is just the complement here
for context the both were fun so let's look for just have really that's of less than a millisecond in this 1 by the way you know thought indexing other than I put in the index of my age to out no more thought process than that a less than a millisecond I get all my really Latin songs and an open dataset has idea you can see it's it's basically again I chose an arbitrary value it has to have that tag over piece of pretty powerful stuff that was pretty easy and again the optimisation piece the answer is I don't know I just open index on it and it seemed to work well which is what we're after simple simple simple right so here it is another another question what all the tags and all the songs by the artist monkeys well coming you'll you'll see I like this piece of the real data from extinctions have problems it's got but that that's real life for us in other words all the syntax examples from the publication of the work or too simple they don't anyway that's what I wanted to show you this is every 5 every tag but among right so that we just did
adjacent which is documents we propose an XML because it's easy to take them all down but actually all these pieces really apply was slightly different syntax to XML function are similar in and it is still a valid and useful piece we converted the same dataset internally over to a store number last piece of promise to show you was really a graph database so let's say we have similar so let's see if we can do that on neo for J is the market leader in this 1st segment of no single data and I wanted to point out for infinity of PostgreSQL and it's really relational data are suited for graph life and therefore J is really really powerful peaceful initially here and accepting their own little bit a crappy performance we'll find links and our graph that's 5 degrees of separation that would at least get us to Kevin Bacon one-dimensional so anyway just for that provides a right so when you use the recursive query whether the recursive format I just took it right out of the book were Cinderella posters documentation on on the filter our dataset a little bit and the limited to only rock songs and limited to write songs that have a strong relationship between what you see in the tag they're related to every thing and and I really want to filter our dataset it to make it more manageable and so we're going to enter that burning burning question is there a path of related songs from lady gaga poker face to Justin Timberlake what goes around comes around no remember you don't get this kind of insight from see I just want to make sure I just I just want to make sure we know where where bread and butter knife
thank so I will watch clear I created table
strong rock links and what I'm saying is here to stay where the weight is greater than it had a greater than point 1 5 I just a different scale that the the link weight goes from 0 1 and where the tags rock by easy so I got 16 thousand rows um I don't know what
I need an index on but it seems like the idea in the link will work in terms of the action that used to promote and this is what that sort of table looks like so so Stevie Ray bonds crossfire is related to a song a track with this strategy and in this way make sense and it's there all related to themselves with a weight of 1 2 surrogate detail now I This can
run along Israel's day but this I put in the ending song right here lady gaga I I see that with that because I didn't wanna ask all relationships because along with it across join in and I did I follow the example in the book to make sure that it doesn't cycle of recursive queries can cycle on a lot of really smart people wrote documentation and tell you all about it I and I chose ahead of time I know it can have a depth of 5 they just reduces my my filters that because you can there are other relationships from lady I got that are less than 5
and what you can see it producers and war and when that when the query of the song by song a rotor subgroup to different query to say hey where do all the where all relationships come from and that is that this is
I think but they call awkward you have to trust me on this that may be overcome by looking at only with
surprise yes regret version
is expensive and anyway and this slowness on that's all on it is on POS presses isn't very very good J. Sun XML H store place it is an acceptable recursive place but the good thing is a lot of data to just acceptable so what we're going see is we have images come that work that the
output of the there are 3 what you
get is a bunch of it you hit the relations between the 5
steps that lead up to
various insightful business knowledge that Justin Timberlake is related with the value of point to a to the Black Eyed Peas who are in turn related with value . 1 2 Britney Spears who goes down the reality and all the way to live sound my point is this true real example is not an easy piece of data to get I mean it's it's onto the but that's 5 different layers in a couple attend hours in a row records that that kind of VM with no optimization didn't come pretty didn't come certainly web but it did culminating in an acceptable amount of time on makes sense so that's our
it did the dog that's but
anyway what what you get in this past that is a comma-delimited it's an array of a comma-delimited list of the passage walked and to get from 1 to the other I the spring
out from that a couple things 1 is the assertion of the presentation of take-it-or-leave-it but my faces there's a bit interested is 1 no single is here to stay but it is a pointless and ultimately sort self-defeating exercise to say how much we hate no sequel and we wanted to go away and we see other contenders and this 1 is no different in relational slay this 1 2 on the lots of arguments to say no sequel is different it will be used but but the other side that is it's not indicators that we need to play nice verified which types of rule of no single databases on posters plays well with 4 of I didn't see this search engine piece we have really a heck of a capability with posters full-text search really really nice ranked results authors of the other pieces we played really well with document key-value pair we played OK here and really well with certain summary to but you can
load no sequel data but it doesn't cut you off from the relational world does that make sense so you don't have to say the the adjacent or a table post-stressed gives you both in 30 minutes nothing else does or almost no other technologies that has a officially nothing but but really that's a pretty remarkable pieces that means you have a whole lot of embedded code that you don't need to throw in a hand motion real no credit datasets fit you know you really have to go above a bunch of terabytes to make this impractical are sweet spot on data that fits well on 1 or maybe a few machines on transactions you the whole document workpiece within the document I you want to enforce some referential integrity on number in loading adjacent provided enforced any there that that's really a business to it but we have that option but anyone affine relationships within the data that that's a a powerful suite but on on the most organizations with these 4 criteria call real did you know I mean Google Yahoo sales force that come with great headlines wonderful pushing the boundaries of technologies but also age cases for most us posters plays nice in what at least in the majority of my office on any 2 asserting if you're buying it at least education to no sequel is not useful for post yesterday we can do that to to everything but is we can do so much so well by from and then I guess the last piece is the rules are more complicated now it's it's just it's too simplistic to continue down the thought process of of St. David's not 3rd normal form urinated and it's wrong but I don't know how much that is still out there I run into it on occasion I think the world has changed you agree we have to expand the and that we have more capabilities and post-stressed than the other guys with leverage them from another implication is you can't do that unless you know your business needs and your data the data itself alone won't explain how to use you have to invest more in of I know it's not pretty it's not even an appealing answering a very technical conference but the the answer we're stuck with if you're the leverage posters of most people capabilities you gotta invest some how business users it what the structures of Our my objective again remember 2 kids in college 1 and the other guys will will this used to arrive if you just get by you have to get some of these other guys and that doesn't happen if I use a note we see a lot other contenders no sequel you know will blow away or a we can do everything post-residency 10 thousand the clusters and things like that you smart people may disagree I don't think that the sweet
on questions the rate of OK if you if you know what you want to know that you have a lot of 1 of the U I and the direction of motion of the the of the of the of the of this yeah but but but I like to work with but I like to hear that because a lot of real problems are manageable like that and if you know it seems lady gaga space trivializes the example but it's not so easy is inherent in any other questions and when using J. some news using the stuff that I would assert that next year's conference will see more and more on the head or or or you know you like the meaning of it all you need to think about what people were doing was more than all the word find their efforts on it and it all in 1 of the death of a lot of money yeah we're all of you or some of you to a multipart question fortune on my answer I don't know but in terms of updating adjacent elements by posters 9 5 is just around the corner clearly there's a huge still going on next month so you have to buy that now so I I I leave that to the 95 has a lot of that stuff that you talked about on I have not integrated with Mongo DB were playing with it home I know I have used up Python and Google's addressed providing and so when you look at an address it with Google search that give you Jason object and we're back and I used the that the Python interface in work grade I just plug my Jason logic natively it with the noted exception you have to every real piece of data has embedded field delimiters I didn't know about that that copies syntax so I had to work my way around it on but I have said from a JDBC type source was Python not Mongo alluded indirectly I'm happy to talk after I didn't use it wasn't in this conference sort or in this presentation server-side project on but Python talking to Google to give an address a pullback is adjacent object and storage in post it works no problem at all it it was a lot of yes no yes yeah my my assumption is that works of Python it'll work with with much but other things I did that for the obvious reason because it's easier and the answer the so perfect so that that's Polyglot Persistence that's saying Mongo can can Condor party but we're still apart because they're still pieces that PostgreSQL knew that the other ones can do any other stories from the head of the of the the next 1 of the all of the data is still being held on so they know you you're that that was the year that the critique of that that this object relational mapping that had the rash us hibernate work on its real and then some students having taught solution coming out of college and on and what to say about that 1 but all of can so all right so if you want 1 person speeds all idea for that is I only talked of use whenever I couldn't applications like that the only thing that is my object relational mapping but I am very disciplined I never ever talk directly to the table cause I'm always wrong when I designed in the beginning that's my methodology is to do a wrong on that's all I can I also learn from or apps I worked a lot with them and they also allow you to do it yes yesterday on also yeah this lecture we really going to have have but so that in this presentation is on the scale OK the sequel finally broken center I'll OK because you know this thank you out all all all talk to Dan amount has you need that and it took me a while so part of it is I'm going to get a good please download please tell your friends on it took me a while to get some the syntax is awkward in part because you know it's sequel doesn't cover this sort of stuff as so on her own wilderness but it is possible so shamelessly copied my code because I shamelessly copied other people's but it works and the other pieces going into those arrays PostgreSQL rocks array once you get remote that the tags in a similar pieces were home when get that so another duet with Jason was that hey tag that's just an array of key value pairs that's my good place I know how to work with the arrays in post-stressed complied individual elements and and all that stuff like that this is not that that's good it helps put a fork as were done