Formal Metadata

Alternative Title
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Use the unstructured data type that's right for you. PostgreSQL 9.4 adds a new unstructured data type, JSONB, a variant of JSON optimized for storing objects. JSONB complements the existing XML and JSON document objects, as well as HSTORE. Which one is right for you? We'll take a tour of the features of each, and their advantages and disadvantages for various use cases, all illustrated with real-world examples. There has been a lot of work on the representation of unstructured data in PostgreSQL, culminating in the addition of the JSONB type in the forthcoming 9.4 release. JSONB complements the existing HSTORE, XML, and JSON types, not to mention arrays. With so many options, which do you use? As usual it depends on your use case. In this presentation, we'll review the unstructured data types in PostgreSQL, and look at their advantages and disadvantages for: Document storage Configuration management A "schemaless database" Object serialization Entity/Attribute/Value models Path queries
Software engineering Dependent and independent variables Mechanism design Personal digital assistant Multiplication sign Civil engineering Data type
Data Encryption Standard Data type Data type
Point (geometry) Slide rule Implementation Addition Icosahedron Wiki Sic Operator (mathematics) output Implementation Data type Pairwise comparison Theory of relativity Expression Civil engineering Operator (mathematics) Database Equivalence relation Subject indexing Query language output Quicksort Data type Conformal map Resultant Probability density function
Standard deviation Functional (mathematics) Electric generator Sequel Forcing (mathematics) Expression Civil engineering Electronic mailing list Database Generating function System call Element (mathematics) Element (mathematics) Number Goodness of fit Arithmetic mean Word Predicate (grammar) String (computer science) Right angle
Functional (mathematics) Beta function Table (information) Logarithm Parameter (computer programming) Counting Rule of inference 2 (number) Hexagon Office suite Metropolitan area network Graph (mathematics) Mapping Namespace Information systems Expression Civil engineering Electronic mailing list Counting Database Instance (computer science) Library catalog System call Word Query language Quicksort Table (information) Row (database)
Functional (mathematics) Run time (program lifecycle phase) Table (information) Expression Price index Counting Port scanner System call Element (mathematics) 2 (number) Subject indexing Bit rate Query language Scalar field Single-precision floating-point format Software testing Right angle Data structure Table (information)
Overhead (computing) Functional (mathematics) Overhead (computing) Table (information) Computer file Multiplication sign Civil engineering Data storage device Sound effect Database Data storage device Mereology Cartesian coordinate system Query language Core dump Configuration space Quicksort Data type
Inclusion map Subject indexing Key (cryptography) String (computer science) Operator (mathematics) String (computer science) Data storage device Object (grammar) Data structure Data type Electric generator Data type
Channel capacity Key (cryptography) Sequel Patch (Unix) Expression Data storage device Operator (mathematics) Equivalence relation Vector potential Subject indexing Hash function Operator (mathematics) Network topology String (computer science) Arrow of time Extension (kinesiology) Extension (kinesiology)
Functional (mathematics) Key (cryptography) Quantum state Expression Constructor (object-oriented programming) Data storage device Operator (mathematics) Density of states System call Number Array data structure Sparse matrix Message passing Different (Kate Ryan album) Operator (mathematics) Order (biology) Program slicing Right angle Arrow of time Key (cryptography) Data type Row (database)
Functional (mathematics) Theory of relativity Key (cryptography) Computer file Multiplication sign Data storage device Set (mathematics) Parameter (computer programming) Coma Berenices System call Equivalence relation Message passing Exclusive or Analog-to-digital converter Blog Set (mathematics) Energy level Row (database)
Group action Overhead (computing) Table (information) Structural load Length Decision theory Multiplication sign Counting Total S.A. 2 (number) Product (business) Number Order (biology) Average Single-precision floating-point format Software testing Extension (kinesiology) Computer file Data storage device Electronic mailing list Database Line (geometry) Query language Right angle Table (information) Data type Physical system Row (database) Directed graph Extension (kinesiology)
Point (geometry) Laptop Overhead (computing) Group action Regulärer Ausdruck <Textverarbeitung> Overhead (computing) Mathematical singularity Expression Price index Drop (liquid) Database Counting Complete metric space Product (business) Product (business) Subject indexing Order (biology) Query language Operator (mathematics) Touch typing Data structure Data type
Overhead (computing) Regulärer Ausdruck <Textverarbeitung> Overhead (computing) Key (cryptography) Multiplication sign Gender Expression Price index Counting Mereology 2 (number) Subject indexing Order (biology) Query language Resultant
Axiom of choice Overhead (computing) Price index Client (computing) Total S.A. 2 (number) Number Goodness of fit Operator (mathematics) String (computer science) Representation (politics) Utility software Physical system Form (programming) Parsing Key (cryptography) Suite (music) File format Data storage device Database Binary file Subject indexing String (computer science) Utility software output Right angle Data type Physical system Row (database)
Point (geometry) Server (computing) Functional (mathematics) Addition State of matter Codierung <Programmierung> Order (biology) Mathematics Operator (mathematics) String (computer science) Representation (politics) Integer Arrow of time Data structure output Extension (kinesiology) Error message Addition Standard deviation Validity (statistics) Key (cryptography) Server (computing) Data storage device Operator (mathematics) Database Doubling the cube Order (biology) output Key (cryptography) Object (grammar) Resultant Spacetime
Functional (mathematics) Building Sequel Parameter (computer programming) Storage area network Number Scalar field String (computer science) Hierarchy Integer Data structure Data type Boolean algebra Key (cryptography) Constructor (object-oriented programming) Data storage device Electronic mailing list Special unitary group Complete metric space Mixed reality Summierbarkeit Key (cryptography) Object (grammar) Data type Row (database)
Functional (mathematics) Statistics Overhead (computing) Table (information) Length Multiplication sign Set (mathematics) Data storage device Total S.A. Number Element (mathematics) 2 (number) Array data structure String (computer science) Set (mathematics) Representation (politics) Data structure Data conversion Boolean algebra Metropolitan area network Overhead (computing) Key (cryptography) File format Computer file Expression Data storage device Database Personal digital assistant Normal (geometry) Summierbarkeit Key (cryptography) Object (grammar) Quicksort Table (information) Data type Physical system Row (database)
Group action Regulärer Ausdruck <Textverarbeitung> Overhead (computing) Multiplication sign Price index Counting Mereology Product (business) 2 (number) Order (biology) Bit rate Military operation Operator (mathematics) Single-precision floating-point format Curve Parsing Key (cryptography) Cellular automaton Expression Data storage device Product (business) Subject indexing Query language Right angle Resultant Row (database)
Metropolitan area network Overhead (computing) Data storage device Function (mathematics) Counting Total S.A. Subject indexing Order (biology) Message passing Arithmetic mean Query language Function (mathematics) Order (biology) Physical system Row (database)
Axiom of choice Table (information) Mathematical singularity Data storage device Order (biology) Different (Kate Ryan album) Hacker (term) Scalar field Operator (mathematics) Set (mathematics) Overhead (computing) Key (cryptography) Validity (statistics) Expression Data storage device Density of states Port scanner Subject indexing Order (biology) Key (cryptography) Object (grammar) Table (information) Data type Spacetime Row (database)
Server (computing) Functional (mathematics) Codierung <Programmierung> View (database) Equaliser (mathematics) Data storage device Medical imaging Order (biology) Array data structure Operator (mathematics) Representation (politics) Integer Data structure Implementation God Addition Constraint (mathematics) Key (cryptography) Server (computing) Binary code Data storage device Operator (mathematics) Binary file Subject indexing Query language Order (biology) Object (grammar) Data type
Existence Table (information) Computer file Multiplication sign Virtual machine Water vapor Counting Total S.A. 2 (number) Product (business) Order (biology) Array data structure Bit rate Operator (mathematics) Software testing Data structure Metropolitan area network Overhead (computing) Key (cryptography) Data storage device Operator (mathematics) Bit Equivalence relation Product (business) Subject indexing Curvature Query language Order (biology) Object (grammar) Table (information) Data type Physical system Resultant
Laptop Metropolitan area network Overhead (computing) Greatest element Group action Overhead (computing) Key (cryptography) Multiplication sign Data storage device Operator (mathematics) Price index Database Bit rate Mass Computer Product (business) Subject indexing Order (biology) Query language Operator (mathematics) Right angle Office suite Object (grammar)
Context awareness Functional (mathematics) Regulärer Ausdruck <Textverarbeitung> Overhead (computing) Multiplication sign Price index Function (mathematics) Counting Mereology Total S.A. 2 (number) Power (physics) Number Order (biology) Operator (mathematics) Subject indexing Mathematical optimization Social class Graphics tablet Boolean algebra Context awareness Addition Default (computer science) Parsing Key (cryptography) File format Expression Data storage device Mathematical analysis Operator (mathematics) Database Special unitary group Bit Binary file Subject indexing Query language Function (mathematics) Network topology Order (biology) Statement (computer science) Key (cryptography) Object (grammar) Figurate number Data type Physical system
Key (cryptography) Multiplication sign Data storage device Variance Data storage device Event horizon Curvature Operator (mathematics) String (computer science) Key (cryptography) Object (grammar) Data structure Data type Error message Spacetime
Metropolitan area network Addition Internetworking Civil engineering
I couple all of the things yeah I think we have better uses of your time now gene and extract anomic mechanical agenda next on x amount how the 1st thing that 10 the and so on 1 exhaust Peter we need the traffic of that is used to OK thank you data tells me center starts I guess we will were we all know how that well subways run in Moscow so is gonna follow that example try to be as on-time responses my name is David Wheeler here talking about scary all upper case name data types now
there are several semi-structured or unstructured data types that you can use for your columns PostScript's probably all of you know about there's XML H Dorr Jason and 9 . 4 will do some B so you may be wondering yourself self which should I use when should I use it why so here now is your brief tour of my opinions this so it's not
xml was added to postgrad as in 8 . 2 it is my slides are posted by
the way there on speaker decade and PDF chicken downloads linked to from the talk page so you can take a screenshot this you know at least so
anyway it was added a point 2 is an XML types but it's was intended to support things like publishing and exporting database of values of design for sickle 2003 conformance and has an XPath query support built so the implementation and is basically the XML is validated on input and stored as text in the data so there are no comparison operators no equivalence XML there's no greater than or less than or any of those sorts of things so as a result of that or in relation to that there is no native indexing of XML datatypes you're putting a text blob they're essentially which again however you can cast at the text if you need those kinds of composite operators comparative operators are indexed but even better you can use in that XPath and expression indexes which I'll get back to that will later so I had never
actually met much with the the XML support and post present a list serve a private stock figuring I should probably be comprehensive and I was surprised that XML support itself and post grows pretty comprehensive so to start with you have a slew of XML generation functions like XML element XML attributes XML comment x not catch this XML force that like strings a bunch of stuff together into some XML thing here's an example where we can use nested calls functions you know here is working in element name name uh 0 yeah its name is through right this is the funky sequel standard thing right it's name is true when you have an active you'd xyz with the value bark no value at the bar with a value x y right so it ends up with here XML element the name ABC comments the elements element in that and you get back nicely formatted XML they're so pretty good generation tools in the database
itself the XML support also comes with the number of predicates there's the is document predicate which tells you if you have a XML document which I think means as doctor but there is the XML is well formed a function which is more generally useful it tell you if you century to valid well-formed there's the XML exists which uses an XPath expression to see if a value exists and in in an XML values and here's an example of that if we have were passing the XML towns with the town Portland town Ottawa we can use the XPath expressions of any town anywhere in the XML words the value of its text is Ottawa and so this returns true because they were not Ottawa you were our right
after last night some of you may not be sure it also has a custom XPath function which is probably a little more than syntax you might expect for was grows functions here you pass an XPath expression as the 1st argument and your x amount evaluate as 2nd argument and it returns and a text array of all the values that match the XPath expressions so here I'm saying all the a tags and p tag with only the text and so we're getting tested here and me and that comes out and here similarly there's the XPath exists which returns to a false if the expression here looking for exists within the but both of these functions support namespacing so if you have a name space requirements for XML maybe you have HTML embedded in your atom beta something you can it takes an optional 3rd argument with an array of namespace rules or assignments namespaces 1 interesting feature is that table mapping we have functions called table to XML and queried XML and there's something else that's this example 1 just call that have a simple query in which I'm getting data of the PG catalog and it shows you how dumps that such is schema called them words XML schema instance which I think is specific to databases and just give me a table 1 rows in each row has a common with values and extract relevant data so this is all built a goes can export XML right now today I did it yesterday now growing up i.e.
grabbed all the data from the US Patent Office for grants from this month of January which I think is about 26 thousand documents and look up to database to ransom XPath queries against that I want to see how well it could perform so here's how it looks like it's is very long x query but basically it is I want to see a list of all the examiners there last the text of the last name no matter what kind of examining the artist means any sort surveillance with that and say OK show me a count of how many of the documents in the graph databases have the 2 of the 2 reviewers examiners Anthony Brooks and rocks and so passionate queries returns the array as we saw before so we're just comparing and so it finds for documents reviewed by those 2 exam but it takes a little while to run this query this is 37 seconds so 1 thing we
can do to get around that is we can index is called or to the indexing expressionist call as I said you can't index of the XML itself but you can use XPath to extract values and use that expression index so here of running basically the same the same functions before casting it to a text array and indexing on that text or so that it plays a run this inquiry were now down to less than a millisecond performance which is pretty good but the however you can only
compare scalars the scalars in this in this way so we had in an array there and I had to compare the whole rates the whole rate if what I wanna know is how many documents or any of the examiners is Brooks we end up doing a table scan we're back to 40 seconds of runtime here because it it doesn't is unable to pull apart the index the array value compared to a single strand so yeah it gets to that test so if you have a very specific need when you create your XML say you know exactly 1 value you wanna get in any document has only 1 so it pays to understand the structure of documents you can index that ah instead so here instead of getting any examiners just primary example there should be only 1 primary examiner per document and so I'm indexing on that but so this is actually the XPath query answers getting the 1st element out of the array right because there's only going to be 1 and this allows me to get 12 back so they're 12 documents were the primary examine is Brooks and of course we can index that single expression here by creating an index on just a single element of the array and now we're back to pretty good performance of around 2nd for so you're you quite limited in the kinds of indexing you can do to speed up the queries that that happens to match the requirements of the way you want to look at your documents again watering going to point out that we were just getting a single text values here to index rather than the array of so when should use x
amount you should use a smell when XML is required for example you have existing documents are you need support so a something maybe have a bunch of xhtml you want to be able to have a new database that maybe you have a bunch of may our on configuration files God forbid that sort of thing it's it is a decent for document storage for hold it storing the whole document but and C a fast IOC getting data in getting data out pretty efficient because the effect essentially on the way in which is valid in XML and storing the text on the way out this has the dump text and happens also my love using XPath agree XML stuff on forced to use XML but it's going to be the best performing obviously when you have a scalar values that you know in advance you need to get well indexed otherwise you need to accept that you are in round-table stands in for your particular application perhaps that's that's book but do know that every XPath function called against an XML document is a new parts of that document and execution execution of the XPath query against encumber that overhead of repassing XML document every time you want to query whether Foley
so now let's talk about story H star was also added in 8 . 2 hot its structure is simply key value pairs where the keys and values are only allowed to be string uh there's no datatypes course there's also nested support so you can have a value be and each store object itself but it does have a slew of very useful operators requiring the uh the values in that in in H store and also supports just engine indexing Geste ingenio however anybody wants them in
9 . 0 0 and Rodman spur submitted a patch with uh that improve each for quite a lot including a slew of new operators and increasing the size of that the whole thing is like to be the capacity of each store so could store a lot more potential payday think of the size of whatever and the tree and hash indexing you actually have equivalents operation the
syntax well 1st you have to treat the extension this is a contrapuntal that ships with grows to you always need to create extensions for users create extension of course is added 9 1 think before that you need to the sequel fall directly so the syntax is inspired by Pearl hashes were basically does have key value pairs separated by fat arrow between keys values and and a comma between the pairs in simply cast it your text expression H so there is 1 of special values you can use in a store and that's no so you can have still strings and you can have no values although I don't believe you can have no keys yes there would be goofy so like I said it comes with
a slew of operators the arrow operator allows you to fetch the value for a key so here i'm fetching and the user key in getting back Fred you can also use the arrow keys with an array of values and will return array of is an array of keys and return all the values for those keys are it also has the containment operators and clean this 1 it says basically are all the values in each story right also indicate store on the left and that compares the full path of the expression of each so it's essentially a intersect operation so it not only does it have to have ID but it has to have ID where the value is 1 in order to match there are
many more there's a concatenation operator the question mark operator to see if a particular key exists in your age stored value question mark ampersand this way 0 this you pass an array right to see if any of the keys in your existing AIX store and all 0 right that's right this is all about so all the keys nearing must exist in this 1 this is do many keys exist that you specify and be shipment examples is also a minus operator which allows you to delete keys and any you can convert it to a ray in a couple different ways there are ways to get us slices of your age store so that there's there's a bunch of cool stuff you can do with it comes to the number of functions that there's a simple constructor read sparsity in evaluating get each store you can also pass composite values or over rows records to it to the aged are constructed and will turn those into key value pairs with the key is the call naming a composite type value costs value so pretty easy to go back and forth between records in each store you can also pass an array of key value pairs to get that to each story and there's also a nested 1 we capacitor array of 2 value arrays each with a key and a value to turn to to next a keys will
return an array of all the keys enriched or valve return array of all values you can convert H. store Jason simply bypass enriched for this function and returns adjacent equivalent note that all the values including you know the numeric 1 here come out as strings in adjacent again the old only special value is not their functions that we give you sets that you allow you to extract set values from your age store as cues returns the set of he's a s valves returns a set of vowels and each function returns a set of keys and the values as rows in relation which is great for iterating over a set of values and function for a long time I exclude exclusive use of H star was to pass key value pairs to a function call so I had name parameters so let's
talk about the performance of each store for a few minutes of size Stevie had a blog post earlier this year in which they published a bunch of reviews from Amazon . com from 1998 when the ice in Amazon was little more open about such that this is adjacent file so I converted it with nested values a converted flattened into any each store so it's just 1 level of keys and value so to
test it that this has almost 590 thousand rows at age 8 storage is on I created a database and group love the extension and has created a table with a single reviews common type store and then I copied the data from the reviews file into the and it took about 8 seconds to copy at all so this is a little under 70 thousand records the 2nd took to copy all those values and this is on my 80 air with no sink right but with this is the so the copy follows 233 megabytes invaded by size database size ended up with 256 megabytes so around a 10 % storage overhead which is pretty good as these things the so I
adopted adapted to me a query from that side of the people but basically it's a list of products so I was interested in what were the average reviews of all DVD reviews in the database I groups by relative the buckets with the relative lengths of the titles that super useful that kind kind so we have 6 buckets here of varying lengths of book titles and the average review for DVDs decisive books and DVDs of those lines and the number of reviews in each not so many reviews of longer titles so the running time was over 200 ms which is the same but it could be better
H star was the 1st uh semi-structured data type to supporting indexing for fast searches of so if we want to we can you create a gene index on the reviews and let it run for a while ago the coffee it took a little while to run on my laptop here but then I run almost the same query only instead of extracting the review where the product group was DVDs and using containment operator this is the operator that can take advantage of the gene index so they use the containment operator passing basically complete path structure he in value that I was interested in we got the query now down to just a little wonder 30 ms so this is a point 8 fold improvement so with this
database the size it with this index the size of the database is up from 255 megabytes to 3 and 1 megabyte so we have around 17 18 % overhead for the gene index on the touch however we can if if we know in advance like the XML that this very specific values that we wanna get we can instead create a B-tree index on a fixed on an expression so if I drop that index and I create a new 1 in which I'm just fetching a product group he that's region text for the prod prod through this resonate 2nd so it's substantially faster to create and maintain snapper databases
266 tutors 68 megabytes who around 5 per cent overhead for this index and the performance
of the run the query again and back to just extracting the value of the scalar value and it's down to around 20 seconds so it's it's it's a little faster than using the gender the gene index is going to be more useful if you don't know in advance what parts of the story might want Coreper values and it's basically in into index the entire thing but if you know like this 1 key we look at particularly All the timing is for filtering results it can be more efficient to use an expression on the text value in a B-tree index
but now let's talk about dumping in store internally each store has binary representation stored in the database so unlike XML which stored as text this is basically right down to binary representation so for input and output it must be passed in format this does make it much more efficient for operations such as extracting the values at each store but for input and output we have the overhead of passing form so is quite fast so my system for dumping 590 thousand records it took a late 1 . 4 seconds where no 1 should use store this question is the fast very fast and efficient key-value store of using binary representation but the bios can be a little slower than something that users just text representation if what's important to you is getting documents and in document out as efficiently as possible each store may not be the best choice but if you want to like it just to give actually useful particular keys it's pretty good choice and right with the gene index supported vast operation however its utility can be limited uh somewhat and a number of ways there is no nesting as I said there are no data is no datatypes support at all of the and strings and also and the format is custom so you would need to have specific formatting of passing support built and your clients into your database so this is the cost that you that you run into for that any questions about it so you will get
us talk about Jason was
added post grows 9 . 2 but all it did this is like the simplest thing we could possibly do it is simple validation of adjacent structure on input and store the results as text this is just like the XML standard and it uses a server encoding some this way it's slightly different from standard which mandates UTF-8 if using UTF-8 database you get standard Jason using a Big Five database you're going to get big 5 days on which is not technically Jason which should do the trick because it's text representation of preserves the order of the keys in you Jason value and that if you have any duplicate keys those are preserved as well and you pointed out to me earlier that it also preserves white space which means occurred because messing changes some of which can but on alleged Jason formatters but also to crap into adjacent so all that states in 9 . 3 0 Andrew added a bunch of functions and operators to make it much more useful those are actually also available as extensions from 9 point to just and their new building functions added 9 . 4 so how many of you
went to induce talk earlier know about a quarter that's pretty good at that there be more overlap that as non-speech through all this because you seen it already but only if you best be taken out that so like a store there's an error operation only cool thing here is you can pass an integer to it and it assumes that what you're trying to get at is an array indexed so return the value at that a random so that this 1 here and course you can also passes the string and it will get the value at that he position what now Jason these from these operators don't return text they returned Jason as you can see here this is jason object and adjacent object if what you want instead is not Jason text use the double arrow operator here passes a you know you can use an integer or a strength to give back the text representation In addition we have a path operation which you pass through this tube operator here is an array that represents the path to the value you want to get to the chase some value what yeah so here I wanna get under the a object that a key there is an object with the key and what the value of the keys to your aid to object to be key to see with the so if what you want is not again not attacked Jason representation but text representations that double arrow here with the prof operation it comes with a
couple of construction constructors J. Sun in PostgreSQL and in compliance of include the latest RFC allows you have pure scalar values that are neither raise nor objects but still Jason surpasses string here it computes converted to j some value complete with the quotation marks an internal quotation marks escape a similar to the was it rotates each store function I think you can just pass a composite object or record to the 2 Jason function and returned an object of a key value pairs represented by their value and of course where possible J. Song datatype preserves but compatible data types between sequel was grows equal and Jason so here on a passing integer and numeric an integer a Boolean and all in a string we get back in the numeric value here a boolean value here and 0 here and strength all of which are supported by Jason new 9 4 we
have these functions allow you to build more complicated Jason objects the Jason build a function you simply pass a list of values which may be of any data type including mixed data types and you'll get back the J sum of array with the datatypes preserved in here and J. Sun of build object also takes work that it takes any number of arguments as long as you have at least an even number of to build adjacent object and again datatypes are preserved where possible the cool thing about these in like the XML constructions functions you can nasty sectors to create more complicated hierarchical Jason structure so what I'm calling Jason build object I wanna food key with the number 1 and when a has an array with these 3 values and so that's make it through 1 array cancer king I don't know I can move back to the so
again like the age store support you uh can get sets of values at adjacent adjacent each function will return the keys and values for days on object and jason each text so here this is returning some values Jason each textual return text value the and of course if the values that you store you have sort of adjacent on numbers you can cast of from the text representation that would be like select key common value cast to integer or whatever and there are a whole bunch of other adjacent functions J. summary length is assumed you're passing adjacent array a structure to it and I will tell you how many values are object keys will return of set of keys for adjacent objects and jason array elements will return all the elements in the array as a set course the return values of this would be Jason values of what if you wanna text values you can tell it to use this function instead but some type of there's a handy function so as the type of a particular case of value which can be string number boolean normal object or a and there's adjacent to record so you had before so the 2 adjacent to which you pass a record we can also use to record to cast to convert Jason structure into record object and you call is like a tables you pass an as expression define the columns that you want you don't even have to define the composite datatype in advances and yeah lots lots more and there's been very busy the last 2 years adding all sorts of reasons that the center densities over here with the so let's
compare the performance of Jason datatypes from the stored in time here I'm creating a table called reviews this is exactly like what I did before for each store only the datatype is Jason and here I'm copying in the fully nested structure of adjacent not the flattened 1 like it for each store so we're preserving the original format provided by scientists of statistical around a little under 7 seconds to run so around 85 thousand records second and the copy follows 208 megabytes in the database and something about 240 megabytes so we have about a 15 % storage overhead for 4 days sum this city compared to a store this is faster than H store which took a little longer to load because it had the overhead of passing and converting into the binary representation but this takes upsetting more overhead th sure storage was more efficient although it might also be because it was flat it's hard to tell so let's
look at this bucket of books query again here I'm doing a path look up so this this is the path expression right I'm passing array of products and the product key the group key I want studied the DVD right and do the same thing here I'm getting the the title of the product that's being reviewed and the rating of the review by extracting paths from the Jason so we get the same results which is nice but performance curves socks so where is this query without an index on each store to grant 200 ms this takes around 17 hundred cells so it's 8 or 9 times slower or as long duration so
let's talk about indexing because the operations is slower than a store this is because of the overhead of having the parts the j some value for every row for every 1 of those operator expression now there is no general or just indexing support for Jason however we can use an expression index similarly to how we could have an expression index for its so we don't have the flexibility we would say that I put in inverted index in this whole thing in I don't have to worry in advance that will when acquiring but he noted dance you can add an expression index to make efficient which is right here because I know this specific they want filter on the product groups for DVD I just go ahead and index that single expression here and I'm using the the path query operation that returns text so that takes around 10 seconds to run the index pretty pretty quickly
overrun query again we're now down to 91 ms now again this is still a lot slower than a store eights store we got down to around 20 milliseconds for the same basic thing and I'm pretty sure that's because for every 1 of the rows are actually getting here this was a index lookup so is efficient but we're having to do a pass of the reviews Jason here in order to get this stuff out and revenue another 1 here in order to get the stuff you have the overhead of 2 passes of adjacent and to the path queries for every single row so you know you pay for that but
let's look at the output of the of adjacent values like XML are stored internally as text so this means there is no need to pass and reformat on output we get just dump it then in fact dumping is about 50 66 per cent faster than was for each store here it took about 6 tenths of a 2nd three-fifths of the so what is this meaning when
should you use J. Jason J. is useful when you are what you need is document storage 1 what's important to you is being able to get things in and out quickly if it's important you to preserve a duplicate keys or the order of the keys appearing adjacent objects or the white spaces and Jason J. Sun is adjacent type is your choice but now because of the text the storage and sufficient to get in and out quite quite fast and the operations you you can do on it with the the querying operator operators is pretty awesome thinking that he's convention has a indexes this is gonna work best indexed scalars of course or when you OK with tables can remember you they paid a price for every single expression have against Jason operations per row some familiar these reasons are almost all the same as with XML but 1 differences that it's not XML of questions about Jason J. Jason with with that yes yeah what actually some like XML Schema stuff that you can do with the XML support in here although it doesn't do schema validation thank you you know you can like store some skinny finger generates some skinny thing but it doesn't any knowledge of the the from the quality of the yes yes they are much easier to use than XPath expressions and XPath returns an array which I found a little annoying all of the
most all of the above and a user sends to adjacent don't know I am on does I was trying to find a decent quote for j some being ordered by was this you know kind of incredulous question on Hacker News from some random person you mean I can do this so was thought
Reggie it's new in 9 . 4 coming coming soon this is a full representation just like the Jason datatypes and just like the Jason datatype uses server encoding it however is inspired by uh the H store to experiment that's our friends only intended or worked on in the last year and which I wrote a lot of documentation for we will never use the idea that at all but like a story uses binary star so the idea of a store to was that it would have nested would have basic data like Jason and nested structure just like Jason and ended up being just like on only with the binary representation of images you have reached so in addition to the operators who did you have adjacent operators and functions you have a query H store style query operators and constrain operators you can use it against unlike the Jason datatype there's no key order or duplicate preservation view of duplicate keys and object the last 1 to appear in the object winds up and the order of the keys come back in any any way which if you used to using you know a dictionary or hashmap object should be but like store has very fast access operations the binary data type and has Jin indexing through a
so for operators we have all adjacent operators us equality yes you can compare that to j some objects are equal 1 note this demonstration of no order preservation is a and B being in the name but you have the same as God intended and like I a store we have the containment operation where again you pass of full path to the thing you're interested in to find out the values the and also a store you have a question mark operation to find out if key exists in the some I think you can also pass an integer return true if that the array index exists and adjacent array no more fully it will be welcome to 3 look at
In addition and unlike each store we have nested operations here so for example uh with equivalence we have to look also not not only do is do we have the old he's the object but we have the same or a and note here that this is equivalent here that they have the same all the same stuff and with the containment operation you basically specifying that the basic structure you want but water again doesn't matter so even know I'm looking for in a a key with an object with the CQ with the value of 3 of it doesn't matter that the sea he appears after the beaky and it doesn't matter of his honorary here but if this was an array index here have 3 and they could be in an arena with 1 2 3 and they would match yet his example that we have an array 1 2 3 does it contain 3 and 1 in and returns true even know of course the orders so water is not what matters the presence of the values what matters
like via a store data type have an existence operations do any of the values near a past exist as keys in this object or do all of the values in the rate pass exist as keys and now as with a store and J. Jason to low performance test with the same data created a table with the type j some and I copied although data and this took almost 10 seconds to load so the little slower around 60 thousand records second as saying 200 in 8 megabyte file for as for Jason but the internal storage is at 277 megabytes so we have about a 30 32 % uh overhead storage here again so for each story was in I think hands the machine yes 0 8 start but a lot faster and I'm going to go out on a limb and guess this is because it was flats where is this is a nested data structures and it only had 50 % over at so there's quite a bit more overhead for the storage here for their time and again I don't know how much that is because this is nested values as opposed to a flat however we go back to running a query the this getting the product review DVDs to get all the same results and were about 381 milliseconds so remember of adjacent this took of 1 . 7 seconds and with the age store it took 270 ms so it's a little slower than starboard again I'm going to guess this is because the nesting versus the flat data structure which right but it's in the same ballpark unlike J something this is an index
but also a gage story you can use a gene index on that day some value the genome that supports the containment existing exist and it exists or operators so I go ahead and create a review on here into my shock this was 10 times faster than index in each store value of I don't know if that you know maybe my computer was doing something else and doing so I don't have much time spent on the different than the last time at the bottom so there's a lot of talk the mass of the people world where the path officers patches I know it's a lot faster than store so when yeah so the database size used 341 megabytes afterward so we have about a 23 per cent overhead for the index so I think it's quite a bit smaller than 0 it's in the same ballpark as is reached of we execute the
same query again only now we're using containment operator instead of fetching and text value comparing it so I have to say in the product key with an object of the group keywords values DVD right we get the performance down to 35 ms and the same basic uh query with a story that was 28 ms so right around that that's probably within the do not grab my laptop as probably around the same thing we all right now in
addition to being able to just use the default J psi ops operator class with gene index there is a special 1 called J. Sun path ops now the advantage to this 1 is that it's only for the containment operator operator not for the exists operators but if you are only using operator as we are these examples it's a lot more efficient there's only 1 index tree in cheaper paths throughout the tree as opposed to the other 1 which indexes everything is an individual value so if I drop that index they create new 1 using Jason be ops have path thoughts it takes half as long to run in an index that so this is going to be a big winner for using his containment uh that's the only place we need to make sure you use an index so now the database sizes 220 323 megabytes so we have only a 17 per cent overhead for the index on the and if we
run the same query again we find that the performance is about the same but you have a smaller index going take a blessed remind us and this can scarcely memory-resident longer and be more efficient in general for lookups for the containment operator so it's it's a pretty big win I want to find the 1 compared dumping the outputs because is a binary representation like with a story must be passed on output format it has Jason text so dumping is actually quite a bit slower than with each yeah it took me a little under 2 seconds to run so when some beepers on the part of H star is not nested of it in addition to it a strapping nested the each store doesn't have any datatype yes I'm assuming it's the nesting and you know the numbers and Booleans and it probably has you know an analysis which statement of figure out the data types are I am I suspect there is some room for optimization on the text in a text out functions of thinks it all they can't ever told me that when they're working on each store to back in the day but this time I can live with this so when when is J. somebody you honey is J somebody when you 98 . 9 . 4 of course so in in your future when you don't care about the preservation of duplicate keys or the order of the keys or white whitespace appears outside of your values which you shouldn't it's great for object storage because the binary resentation and the fast operations and the gene indexing the operations are really off some fetched keys that pads are there all gene index aware that you can use expression indexes with B-tree indexes or indexes to get full powers that you guys had a expression index on Jan index the Jason array was amazing that that was really cool of
but do very mind that the I O is a little slower than it is going to be which is not so if what you have is adjacent documents and must immediate the whole document in the whole document down the probabilities Jason but if you want to be able to query the individual actually it's at the some object or value itself 1 he's is just so quick review as I'm running over time when you use XML only when somebody's making you you have all the advantage of XML with the jails of the data type that adjacent type in all the events they store with the design be digitized when he when he's store well you when is a star when you're OK with the flat data structure you going to get a little more performance out that when you are OK with having only strings and Knowles for your values and structure i and you require fast operations and the other thing I would say is that when you are on something earlier than 9 . 4 when should use the adjacent datatype basically when you want to document storage or preserve keys or duplicates or white spaces and when you J some the error variance be my recommendation reason so as most of my
thanks to all the Internet or additional amazing work to make such an amazing awesome side thanks to enter Dunstan and Peter Gatien for getting in and 9 . 4 I think this is really interact with the to thank thank you for that will you for an hour


  812 ms - page object


AV-Portal 3.20.1 (bea96f1033d39fbe77f82542458e108105398441)