Schemaless data in PostgreSQL
Formal Metadata
Title 
Schemaless data in PostgreSQL

Alternative Title 
CREATE INDEX ... USING VODKA

Title of Series  
Number of Parts 
31

Author 

Contributors 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2014

Language 
English

Production Place 
Ottawa, Canada

Content Metadata
Subject Area  
Abstract 
An efficient indexing of nested structures. We present a prototype of a new access method, heavily based on GIN and optimized for efficient indexing of nested structures like hstore and json(b). The introduction of nested hstore and jsonb in PostgreSQL brought a new challenge to developers, namely efficient indexing of hierarchical keys. Those keys consist of duplicated strings, which makes the index uselessly huge if key-value pairs are stored independently. We propose to replace the B-tree data structure, which GIN uses to index keys, with a digital tree. To do this the 'right way', we would like to experiment with a hybrid access method based on SP-GiST and GIN. This is a first step in making GIN more flexible to support a richer set of queries. In principle, one could use data structures other than a B-tree to index not just keys, but also the posting lists.

Keywords  Alexander Korotkov Oleg Bartunov Teodor Sigaev 
00:01
and after that we'll talk about some experiments with GIN index support for jsquery, and finally we come to the VODKA access method. So let's start. The first
00:16
slide is about the history of schemaless data in PostgreSQL. The first extension to work with semi-structured data was hstore, which we designed about ten years ago. Later we added a GIN index for hstore, and it became part of PostgreSQL. It was popular because applications could stay flexible without changing the schema [unclear], but people wanted it to be more flexible and richer, because hstore is a key-value model and they wanted a document-based model. Last year we started new work on nested hstore, with the support of [unclear], and we started to work on binary storage for hstore. We called this the next iteration of hstore. Then we needed to switch to JSON: the binary storage and the whole structure from nested hstore was used for jsonb. jsonb is committed and will be available in 9.4. A lot of community work was done to improve our work, and we want to say thanks to [unclear], who really pushed our work to production quality. I think it is in good shape, but it still needs some improving.
02:08
For people who don't know what jsonb is and how it differs from json, this simple example shows all the differences. json is textual storage, stored as is, so you can see the whitespace, you see the duplicate keys, everything is as is. In jsonb we have no whitespace, no duplicate keys, and all keys are sorted. So these are all the differences between jsonb and json. So why do this?
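The three differences named here (no whitespace, no duplicate keys, sorted keys) can be imitated outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation. The function name `normalize_like_jsonb` is hypothetical, and the key order used (shorter keys first, then bytewise) is an assumption based on how the PostgreSQL documentation describes jsonb key storage.

```python
import json


def normalize_like_jsonb(text):
    """Parse JSON text and re-serialize it roughly the way jsonb would store it."""

    def build(pairs):
        merged = {}
        for key, value in pairs:
            merged[key] = value  # a later duplicate overwrites an earlier one
        # Assumed order: shorter keys first, then bytewise comparison.
        ordered = sorted(
            merged.items(),
            key=lambda kv: (len(kv[0].encode()), kv[0].encode()),
        )
        return dict(ordered)

    # object_pairs_hook sees every key/value pair, including duplicates.
    parsed = json.loads(text, object_pairs_hook=build)
    return json.dumps(parsed)


raw = '{"bb":   1, "a": 2,\n "a": 3, "c": {"z": 0, "y": 0, "y": 1}}'
print(normalize_like_jsonb(raw))
# {"a": 3, "c": {"y": 1, "z": 0}, "bb": 1}
```

The json type would keep `raw` byte for byte; the normalized form is what makes lookups cheap later, at the cost of extra work on input.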
02:52
Because jsonb was designed for performance. Since json was already available since 9.2, many applications could already use features like duplicate keys, whitespace and so on, so it was decided to introduce a new type and call it jsonb. It is a new type [unclear], it is the binary one, and we really hope jsonb will give a push to PostgreSQL [unclear]. So here are simple experiments with the Delicious bookmarks: more than 1 million bookmarks from the Delicious online bookmark collection. We look at input performance, access performance and search performance. You can see a short sample of one record. It is not deeply nested, but it still contains things like arrays, scalars and so on.
04:14
Here it is seen that the storage overhead that jsonb introduces is not so big, just less than 4 percent. So our binary storage for jsonb and the text representation of json look about the same. And now the
04:39
input performance. I used COPY to load all the data into the database, and we see that the fastest is text, small surprise. json is second, because it needs to do some validation, and jsonb is the slowest, because it needs to parse the json and put it into the binary storage. Here is the access performance [unclear]. We see that jsonb is [unclear] times faster than json. That is just because we have binary storage, so we don't need to parse. The base number is just reading the data from the table [unclear]. If we subtract the base number, we see the clean numbers, and jsonb is about 20 times faster than json [unclear].
05:51
Next is search performance. json has no contains operator, so to get an equivalent we use this query, which checks the elements of the array one by one [unclear]. This takes about 10 seconds for json.
06:32
With jsonb, which has binary storage, we have the contains operator, and it is about 10 times faster, and this is the situation where no indexes are involved. But with jsonb, because of the binary storage, we can have a GIN index. This is the simple default opclass, and using the index we get a gain of more than 150 [the recording says "per cent"]. This looks nice, but we can get 10 times faster if we use another opclass for GIN, which uses only hashes, and we get a very nice number [unclear] for jsonb. And then of course
07:36
[to the audience] yes, yes, I will show all the numbers. And of course we could not skip the comparison with MongoDB. We used one of the latest versions of MongoDB. The first surprise was how slow MongoDB is at loading the data [unclear]. The sequential scan is about the same, about 1 second. If you remember, I said 2.6.0 final; there is a 2.6.1 now, but it doesn't matter. In search performance with an index MongoDB is fast, just 1 millisecond, but jsonb is faster. In principle it doesn't matter, so we can say that we have the same performance as MongoDB. Let me note that MongoDB uses an expression index, which is different from ours: our GIN index is a monolithic index which supports search for all keys, while in MongoDB you need to create a separate index for each key. So our indexes are more universal, and we have comparable performance.
09:10
This is a summary. As we see, json takes 10 seconds with a sequential scan, and jsonb with GIN takes milliseconds [exact figures unclear in the recording]. jsonb_path_ops, which is essentially hash-based indexing for GIN, gives the best performance, comparable to the MongoDB index. The index on jsonb with the default opclass is about 600 megabytes, and that is thanks to the work we have done on GIN, because before our compression it was 815 megabytes [unclear]. The jsonb_path_ops index is about 300 megabytes, and if we create an expression index with that opclass we get just 44 megabytes [unclear: comparison with the MongoDB expression index and data sizes, and a remark about MongoDB loading time that the speaker tried to optimize without success]. So this is the state of what we have in 9.4, and thanks to the company [unclear] for supporting this work. So this is what we have now. Currently we can
11:20
search json data using the contains operator, jsonb contains jsonb, and it has great index support with GIN. But the keys should be specified from the root. There are no wildcards for keys; you have to specify all the keys, so the full path has to be given. We also have the exists operator, so we can check for a key or for an array element, and it also has index support, but it works only on the top level: it is not evaluated against deep data, only top-level keys. And there is plain SQL, so we can use expressions over paths, but of course we understand that it doesn't look nice, and too many functional indexes are not a good idea. So we decided to continue our work.
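The two limits mentioned here (containment must be written from the root, and the exists check only looks at the top level) can be shown with a simplified model. This Python sketch is an illustration under assumptions, not PostgreSQL's code: the functions `contains` and `exists` are hypothetical stand-ins for the jsonb operators, the sample document is made up, and special cases of the real operators (for example an array containing a bare scalar) are ignored.

```python
def contains(doc, pattern):
    """Simplified containment: pattern must match starting at the root of doc."""
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        # Every pattern element must be contained in some element of the array.
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in pattern
        )
    return doc == pattern


def exists(doc, key):
    """Top-level check only: an object key or an array element."""
    if isinstance(doc, (dict, list)):
        return key in doc
    return False


# Made-up document in the spirit of a bookmark record.
bookmark = {"tags": [{"term": "NYC"}, {"term": "maps"}], "author": {"name": "x"}}

print(contains(bookmark, {"tags": [{"term": "NYC"}]}))  # full path from the root
print(contains(bookmark, {"term": "NYC"}))              # path not from the root
print(exists(bookmark, "tags"), exists(bookmark, "term"))
```

The second and the last checks fail even though the key `term` is present deeper in the document, which is the gap a richer query language with wildcards is meant to close.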
12:44
We wanted something like a JSON query language. We need a richer query language, with more operators on keys and values, and with index support, which is probably most of the work. So we created jsquery, the JSON query language, which actually consists of a data type and a match operator. Here jsquery means, well, I don't know how to pronounce the name in English [laughter] [unclear].
13:37
[unclear] So here is the BNF of
13:43
jsquery. It is just for reference; I don't want to explain everything here. You can download the presentation and look at it. So we will just
13:53
explain some important parts: what path expressions are and which wildcards we introduced. We introduced some signs. The number sign (#) means any element of an array [unclear]. The percent sign (%) means any key. The asterisk (*) means anything at all, any array element or key, and it can cover multiple levels. And the dollar sign ($) is the current element, so it is very convenient when you have complex expressions: you can use it just like a placeholder. [exchange with the audience, unclear] We don't require quotes, but in cases where you want to be sure that something is treated as a string we suggest double quotes [unclear]. There are other expressions too: for scalars there is something like the IN operator, and for arrays there are overlap, contains and contained-in [unclear]. [audience discussion, unclear] OK.
18:24
Here are some example queries. If you want to find products similar to this ID and with a sales rank in a given range, in jsquery you can write it like this. I don't know how to write this in MongoDB myself [unclear], so I just give what it looks like there [unclear]. So this is just an example. And here
19:13
is the same query written two ways: the first is [unclear] and the second uses jsquery. With jsquery it looks much clearer and shorter, and it is even faster [unclear], but [unclear] it is impossible to use the index [unclear]. So we have
19:48
some GIN opclasses for jsquery, in addition to jsonb_path_ops, and we need some suggestions from you about them [unclear]. The first one stores the value together with a Bloom filter over all the keys of the path. Because it is a Bloom filter over keys, it is good for key matching, so we have wildcard support, but it is not so good for range queries. The other opclass hashes the path, like jsonb_path_ops, and combines that with the value. Of course it has no wildcard support, but it has no problem with range queries. In the sizes we see that these new opclasses have about the same size, slightly larger than the old opclass, but they are designed to support a much richer set of operations, so in the rest of the work we will use these. And here are some numbers.
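The trade-off between the two strategies can be sketched in a few lines of Python. This is an illustration under assumptions, not the jsquery implementation: the function names are hypothetical, `zlib.crc32` stands in for whatever hash the opclasses use, the 32-bit Bloom signature with one bit per key is an arbitrary choice, and joining path steps with a dot ignores keys that contain dots.

```python
import zlib


def leaf_paths(doc, path=()):
    """Yield (path, scalar) pairs; array elements add no path step in this sketch."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from leaf_paths(value, path + (key,))
    elif isinstance(doc, list):
        for item in doc:
            yield from leaf_paths(item, path)
    else:
        yield path, doc


def path_value_key(path, value):
    # Hash of the whole path first, then the value: entries with the same path
    # sit together ordered by value (good for ranges), but a wildcard cannot
    # be matched against a hash.
    return (zlib.crc32(".".join(path).encode()), value)


def bloom(keys, bits=32):
    """Bloom-style signature: one bit per key."""
    mask = 0
    for key in keys:
        mask |= 1 << (zlib.crc32(key.encode()) % bits)
    return mask


def value_path_key(path, value):
    # Value first, then a signature of the path keys: any subset of keys can
    # be tested, so wildcards work, but equal paths are no longer adjacent.
    return (value, bloom(path))


def may_match(signature, required_keys, bits=32):
    """True if all required keys might be on the path (false positives possible)."""
    need = bloom(required_keys, bits)
    return signature & need == need


doc = {"a": {"b": {"c": 10}}}
for path, value in leaf_paths(doc):
    print(path, path_value_key(path, value), value_path_key(path, value))
```

A query such as "any path ending in c equals 10" can probe the second kind of key with `may_match(signature, ("c",))`, while the first kind needs the exact path `a.b.c` to compute the hash.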
21:45
For example, this query, which I have shown you several times already: here it again has good performance, like the one with [unclear]. And here we use an example with a wildcard [unclear], and the performance is still good [unclear].
22:35
But we found a bigger dataset [unclear], which is customer reviews, I don't remember the exact number of reviews [unclear], covering just a few years. It also has a richer structure, and it has numbers, so it allows us to work with numbers and range queries. Here are the number of records, the table size and the index sizes [unclear]. And we
23:17
ran several queries. Here, this one is very fast, just really fast. But here is one that is
23:33
slow, and this is the problem we will discuss at the unconference, because unfortunately jsquery has no statistics [unclear]. This expression, the one we just had, is very selective, and this range is not. In principle you should not use the index for the range condition [unclear], and now we have 129 [unclear] where it could be about 1 [unclear].
24:21
In principle we have the same problem as tsquery, mainly because the same optimization is missing [unclear].
24:51
MongoDB is faster just because it doesn't use the index on the sales rank. It uses an index only for this one key, and the index size is 400 megabytes just for these similar products. But it is much faster. So if we had planner support, we could
25:13
rewrite the query ourselves. Since jsquery has no planner support, we emulate it by hand, and we get good numbers, comparable to MongoDB [unclear], which means that jsquery has
25:34
the potential to be as fast as MongoDB [unclear]. So we just need statistics and planner support. This is the summary of future work, which we will discuss at the unconference with all the people who want to help us [unclear]. And a nice thing is that all this stuff is just a contrib module, so you can download it and use it with 9.4; it is not included in the release [unclear]. We need your feedback; especially we need some real sample data and queries [unclear].
26:40
[unclear] We always want something better. GIN is very efficient and effective [unclear], and we want to keep using it [unclear]. To begin with, when indexing we have a choice between a hash of the path and the path as is. We want to have the path as is, but storing paths requires a lot of storage. We did an experiment: from the whole bookmarks collection we extracted the unique paths and built a B-tree index on them. The size of the table is [unclear] hundred megabytes, and the size of the B-tree index is huge, so there is no way to work with it. So we need something like a radix tree, as in SP-GiST, which is a good candidate for storing keys with duplicated prefixes, and eventually we get an index of a good, small size [unclear]. Drawing from this picture, we need GIN and we want something like SP-GiST. So here is what we
28:51
need. We need to provide an interface to change the B-tree which GIN uses for its entry tree. And we may go even further: for example, we may replace the B-tree with an R-tree as the entry tree, and then we can get something like GIS-aware full text search. So we decided to create a new access method, which we called VODKA [unclear].
29:43
This work was done by Alexander Korotkov, so he will continue. [microphone handover, unclear]
30:36
[unclear] OK, so the idea is this. If you want to put
30:48
the full paths into GIN, the keys would be quite long, and the entry tree in GIN would be large. That's why we can replace at least the entry tree.
31:12
The entry tree becomes a radix tree, which stores prefixes, and then the posting lists don't contain the prefixes themselves. [unclear] It also lets us support prefix matching and range queries [unclear]. The rest of the structure is the same as in GIN: each tuple in the entry tree references a posting list or a posting tree.
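To make the structure concrete, here is a small Python sketch of an entry tree that shares prefixes and points to posting lists. It is only an illustration under assumptions: a plain trie over path steps stands in for the radix tree (a real radix tree works on characters and compresses chains), the class name `PathTrie` and its methods are hypothetical, and the posting lists are Python lists of made-up row ids rather than on-disk structures.

```python
class PathTrie:
    """Entry tree sketch: shared path prefixes, posting list at each node."""

    def __init__(self):
        self.children = {}
        self.postings = []  # ids of rows whose documents contain this exact path

    def insert(self, path, row_id):
        node = self
        for step in path:
            # A step that is already present is reused, so a prefix is stored once.
            node = node.children.setdefault(step, PathTrie())
        if row_id not in node.postings:
            node.postings.append(row_id)

    def _find(self, path):
        node = self
        for step in path:
            node = node.children.get(step)
            if node is None:
                return None
        return node

    def lookup(self, path):
        """Exact path match."""
        node = self._find(path)
        return list(node.postings) if node is not None else []

    def under_prefix(self, prefix):
        """Prefix match: row ids for every stored path that starts with prefix."""
        start = self._find(prefix)
        found = set()
        stack = [start] if start is not None else []
        while stack:
            node = stack.pop()
            found.update(node.postings)
            stack.extend(node.children.values())
        return sorted(found)

    def node_count(self):
        return sum(1 + child.node_count() for child in self.children.values())


tree = PathTrie()
tree.insert(("tags", "term"), 1)
tree.insert(("tags", "scheme"), 1)
tree.insert(("tags", "term"), 2)
tree.insert(("author",), 3)

print(tree.lookup(("tags", "term")))  # [1, 2]
print(tree.under_prefix(("tags",)))   # [1, 2]
print(tree.node_count())              # 4
```

Four insertions with five path steps in total produce four nodes, because `tags` is stored once. The prefix search is the operation that a hash of the full path cannot offer.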
32:19
This is the interface of a VODKA opclass [unclear]. The configure method specifies the opclass of the entry tree, which could be an SP-GiST opclass, a GiST opclass and so on. The compare method is used to detect duplicates [unclear]. extractValue and extractQuery are the same as in GIN, except that they return not just the entry itself but also the operator [unclear]. And the consistent and triConsistent methods are the same as in GIN.
33:43
The size of the index using VODKA is a little bit larger than with jsonb_path_ops [unclear], and probably it can still be improved, because of how SP-GiST works [unclear]. The construction time is also slower, but that is also because [unclear].
34:18
On simple queries it shows about the same performance as GIN [unclear].
34:38
and all Nemonix instead various similarity relational for size and construction time as this is not the same for a lot of money and the this is a little
34:59
upgraded user would call plus which is a best nearly as faster than
35:09
yesterday about this when we fail we arrange queries in several the same problem responding yes but also in the media can even if you could do this condition in invented chat uh evidence the game winner fast as this example
35:39
VODKA can use the same fast scan as GIN. GIN uses fast scan only when there is no partial match, but VODKA has no notion of partial match; it just sees that a condition gives you one entry or more than one entry [unclear]. In this particular case GIN falls back to partial match because of the range, but in VODKA only one entry matches this condition, and that's why VODKA can use fast scan [unclear], and for the first time it is about as fast as MongoDB in this case.
36:36
VODKA also needs some additional positional information, as GIN does [unclear]. You know that in this query these conditions are about the same element of the array, but in the index you can't distinguish different elements of the array. So you have to fetch a lot of results where different elements of the array match the conditions [unclear]. Positional information would help us a lot, because we spend most of the time in the recheck.
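The false positives described here are easy to reproduce with a toy inverted index. The following Python sketch is an assumption-laden illustration, not the GIN or VODKA code: the rows, the `index` layout and the helper names `candidates` and `recheck` are all made up. Each row holds an array of objects, and the query asks for a single array element with both `a = 1` and `b = 2`.

```python
from collections import defaultdict

# Made-up rows: each row id maps to an array of objects.
rows = {
    1: [{"a": 1, "b": 2}],
    2: [{"a": 1, "b": 9}, {"a": 7, "b": 2}],
}

# Inverted index without positional information:
# (key, value) -> ids of rows where the pair occurs in ANY array element.
index = defaultdict(set)
for row_id, elements in rows.items():
    for element in elements:
        for key, value in element.items():
            index[(key, value)].add(row_id)


def candidates(conditions):
    """Intersect posting lists; the index cannot tell array elements apart."""
    sets = [index.get(condition, set()) for condition in conditions]
    return sorted(set.intersection(*sets)) if sets else []


def recheck(row_id, conditions):
    """Read the row itself: all conditions must hold on the SAME element."""
    return any(
        all(element.get(key) == value for key, value in conditions)
        for element in rows[row_id]
    )


query = [("a", 1), ("b", 2)]
found = candidates(query)
print(found)                                     # [1, 2]
print([r for r in found if recheck(r, query)])   # [1]
```

Row 2 satisfies each condition in a different element, so the index returns it and only the recheck removes it. Storing the element position next to each row id would let the index reject such rows without reading them.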
37:27
There are also different kinds of VODKA: many useful opclasses can be implemented with it.
37:40
For example, there is a case from spatial indexing [unclear]. Imagine you have an object that is really large by nature, and you want to find whether this point is inside it. If you look carefully at the approach with a bounding rectangle, the rectangle is really large [unclear], and
38:13
then you will have to do almost a sequential scan, even if you have an index [unclear].
38:27
An example [unclear]: you will have a really large bounding rectangle. But if you fill the object with multiple boxes, you cover much less empty space [unclear]. So what VODKA lets you do in such an opclass is, in the extractValue method, represent one geometrical object as several bounding boxes.
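The benefit of several boxes over one bounding rectangle can be seen with a tiny example. This Python sketch is only an illustration under assumptions: the L-shaped object, its decomposition into two boxes, and the helper names `covers` and `bounding_box` are made up, and no particular decomposition algorithm is implied.

```python
def covers(box, point):
    """Axis-aligned box (x1, y1, x2, y2) contains the point (x, y)."""
    x1, y1, x2, y2 = box
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2


def bounding_box(boxes):
    """Single minimum bounding rectangle around all the boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# An L-shaped object described by two boxes instead of one.
parts = [(0, 0, 10, 2), (0, 0, 2, 10)]
single = bounding_box(parts)  # (0, 0, 10, 10)

inside = (1, 9)   # lies in the vertical arm of the L
outside = (8, 8)  # lies in the empty corner

for point in (inside, outside):
    by_single = covers(single, point)
    by_parts = any(covers(part, point) for part in parts)
    print(point, by_single, by_parts)
```

The point in the empty corner passes the single-rectangle filter and would have to be rechecked against the real geometry, while the two-box representation rejects it in the index.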
39:12
The entry tree would then be a basic GiST. We need some algorithm for splitting an object into boxes [unclear]; there is some research on that, several articles [unclear].
39:44
So, the problems. First, we need to somehow update the pointer to a posting list in the entry tree using the interface of access methods, because it can only insert new entries [unclear]; deleting and inserting is expensive, so probably we need a new method. Also, if you implement VODKA opclasses in an extension, right now it is quite hard to get the OIDs of the opclasses [unclear], but I think it is not a big problem. The current prototype stores the entry tree in a separate file, a separate relation [unclear]; this is just a prototype, and the question is what the standard infrastructure to handle this should be, so we need to discuss it at the unconference. Another question is what VODKA should be: it could be a complete replacement of GIN in which you use arbitrary access methods not only for the entry tree but also for the posting trees, but in this situation you would have to store multiple posting trees in the same file [unclear]. So there are a lot of questions.
41:37
This is what there is for 9.4: jsquery itself, the query language with GIN opclasses, and you can find it on my [unclear]. There is also the research prototype of VODKA; there are a lot of ways of improving its infrastructure [unclear]. And that's all.
42:07
need us in Monday's and many of whom you have met with them in the world will would be needed and what yes this was quite
42:33
lot you'll have throughout the year in and have this economy yes but we have to cut economy what comes to mind and then the foreign summary and the idea is this so questions about the animal and they're all the time and so only really good for the problem of poverty as will be available only you know some of the things that you don't years since possible lotteries I believe instructor will do just that just going much independent and which real have full what Jason I don't think that the life of the users click on that but with a lot of filters in the years got it because you know if you have for instance depleted Kissinger some but just created anyway in cancer and singers and so on and can distinguish the order of keys so Governor Gustarson of suggests that there exists in the world in the lab where you where you live your region level in the flooded with the right of public health and very very good in all the universe parameter all you have a visual query language for the full of so you have seen that have numbered in the language so you you're crossing the among the new language from you know like the shirt on there was also a kind of FIL uh so you want you can it based on more on all under no you're probably put that I just cost from dishonest mom did indeed exist I mean isn't just going through some parts also on the million again the answer once and try to capture the attention of all the the while we really will not really exist thank you for the 1st version of landmark and initial there is a lot what about the people you look the the good news is that we want to build and Of course there is 1 of the things do you use of the of I don't know where you would get really really is like exists at all but 1 just like the but not with the view of what we just call it the because we need to find a lot of time to of the 1st thing you that you would have the statement that there is of good points from government yeah you don't have this kind of the same what is In 
[unclear] So jsonb is in this form because it is optimized for fast access [unclear]. It stores keys sorted, so you can use binary search to find a particular key, so it is really fast. Let's discuss this at the unconference.