Merken

GIN in 9.4 and further

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Sprachtranskript
the you yeah OK it started on 2 minus taking environments and I'm going to talk about sturgeon improvements that were made in my point for most of the code is written by Alexandra growth cone of I did a lot of work on refactoring them and discussed and list on so the store of these factors but I also spend a lot of fun working on them so that's why I'm speaking I might lose my voice at some point I did last nite so if that happens I'm going to chew over to him so most
resources so that those who don't know I'm taking the work for the American click on my work and all kinds of back and stuff and posters have been doing that for 10 years on this is a lot of stuff I work from most format for anyway we did 2 major improvements through June another for the cycle 1 is that what we call compressed posting lists which basically means that in mind when foraging indexes are going to be much smaller than before and are generally smaller is better when you thought about databases because smaller means this faster and the scan can them fit more fit in RAM takes less space back out so offensive benefits when your data is smaller became so that's a simple significant thing we did is what we call the fast scan that list of the patters called I tried to avoid using that name anywhere in the comments or documentation because I hate that name of fast can't see anything on what will the feature is really all about is that when you have to search through search for multiple keywords I so that 1 of the key words just happens brokers were frequently and 1 of them is very rare we do have a lot smaller right now so that that's faster before would scan the posting lists of both items which means that you get the performance of you know you're bugged by the frequent item you have scanned items of all those things that happen all of them were now we just you know stand the rare items and verify that the frequent item access to I'll come back later that so part 1 of this presentation is going be compressed posting on just to give you an idea of what this does migrants simple table could numbers has 1 column and I am sure that a bunch of numbers that which goes from 0 to 9 on 1st of all I use the extension between Jennifer not familiar with that is that you should become familiar with that it's really cool 1 false you to use Union indexes on 1 images and text and all that kind of regular of this allows also to use gene which is as a replacement for between anyway once you do that I create a created of normal B-tree index on this and I the Gini index and
just to give you an idea of a table looks like of contents numbers from 0 to 9 and there's 1 million of each of these roles OK so there's a lot of duplicates that's the point of this margin images very at and duplicates that's what I'm trying to demonstrate here so there's only 10 distinct values in the table but then Israel's
on the outside 89 . 3 you can see that it's much better than a B-tree at storing the integers only 58 megabytes were B-tree indexes over 200 the table table size was what 350 megabytes of member all 9 . 4 it's only 11 megabytes for the gene index over there so that's that's that's a big improvement I think this is pretty much the best case scenario for for this fact so I wanted to show up but in any case never going to be worse than it was before it's always going to be the same size in the worst case but is used going be somewhat smaller or even a lot smaller happens to be effective on your so can explain how this
works to select basically this is what the gene index looks like 1 way to think about the gene index is that it's a normal but it's heavily optimized for storing duplicates are so we have a normal B-tree what we call an engine tree and whenever there's a unique value that means that the the thing that this be treated just like any other he works like a normal between index of but if the 1st through items with the 2 2 rows of the same value then instead of inserting the 2nd entry into the the military we just create a list of all the ones that have and if that least euros big so that it doesn't fit in the page anymore we create another page are just to hold hold the list of items that all have the same key OK and when we feel that creates 2 group we create another 3 of 4 pages that all content just duplicate items that all have the same key value that's that's what this looks like on this so each of these
items that is stored index is what we call internalize pointer by life consists of the block number which block on the the actual cable this role is and those a 2 byte offset within the table where where and that of where that faces us so we require a normally 6 bytes per every role in the store the pointer from the index on this is the user location we use a few of these key ideas from the system we use the notation we block number and of the problem and I use that efficient on so in a normal B-tree index we repeat the key for it in the world have 4 rows in the table and they all have a key value for bar and all of the tree is going to contain for also all had the same key for which is 1 pointing to 1 of the whole thing and also because the Gini index we just in store the key once and then we have a list of items which makes perfect sense and why would you store this on and actually i've been wondering for a long time why don't we do this for the normal between index that makes perfect sense but we don't know why but gene index does this and that's what makes it so much smaller but so for example in
this case where you have 4 roles in the heat that have the same key along the index when it's going to require 37 bytes to store the list of pointers are you have some fixed couple overhead and yet the key which is full and 24 bytes to store all of sudden interest and the form that
we use in i . 4 1st of all some guy and this is the list of item pointers sorted by the physical position the which becomes important because now we can take advantage of the fact that if you have on pointers that all point to the same page we can compress them very efficient because they will have similar values on base with reduced the store a list of pointers absolutely for use for the 1st buy them as usual but the others are just stored as delta lost from the previous 1 but the difference from the previous and that itself doesn't help or much but then well this they'll loss but
typically very small integers if for example if you have 100 couples on the same he page you can have 1 1 1 1 1 1 1 that's the difference 1 fact and those are small integers and there's a lot of algorithms on how to compress a list of small integers and the the the compression algorithm we chose something that's called for by encoding all it's basically means that there a number of smaller than 128 we have the 1st bit set 0 and the rest of is the number if it's more larger than that then you add the bytes with the 1st bits into 1 and so forth and you can encode encoding of these things but it just so happens that in this can coding this it just happens that these so that you can encode the full range of values that we need in 6 bytes even though they know every time you compress there's going to be something that for some values but it just happens that old format was so propose and the rows of free bits available that between never need more than 6 bytes and in and went you always need 6 by stories of this up so you're never going to lose with this compression mechanism could go compared to what we we had 3 which is very nice is that indexes are never going to be larger for so I
mentioned that if this list grows very large that doesn't fit in this page and more than we create multiple pages and each page is just a large list of items compress the way shown for all it's not actually just 1 list it's a splitting the multiple segments so that you have to update you don't need to read court all hold page just the part that you updated on we do this for both posting lists that are in the book lists and that can and should be 3 and also this all of which are about the compression
algorithm I spent quite a lot of fun reading the literature on on different algorithms that they're bobwhite encodings what Alexandre implemented originally and so that's there in the 1st version of but then I went and the fact that the code heavily to make it easier to experiment with different different algorithms and unattractive you once 1 of the ones that I tried was called Simple Minds not forget how it works but it it's supposedly faster to operate in a more compact than the war by encoding and I think it was somewhat more compact but it didn't really wasn't a big difference along interesting point here is that we could use any encoding for example you could use a bit mac and that DSS effectively give us a bit mapping discussed this looks exactly the same as a bitmap index dust except that the users worldwide encoding and other bit-mapped store them to put on so the input what I made earlier is that with the target which shows the so this is another point of with this algorithm whenever you remove an item from the middle of the list of all the total is never gets larger many other algorithms don't have this property which means that if you delete the role in the middle of the vacuum if you would have to make the list longer just encoding just works that way for example if you did run length encoding and you remove something in the middle of you have to split into 2 things that are to get a larger than what had origin which is kind of annoying because that means that when you vacuum you might need to do pay splits evacuation so forth but this out that have from this of the of the of the group OK so for example if you wanna let's say that you have they're facets 1 2 3 4 5 and you want compress those all 1 1 way to implement that would be to have a big mac or world world said encode that on all those in the UK central Brooklyn e-mail explaining he came up with a pretty good prove that that is the case but basically if you have both
sets of war but encoding I
probabilities work of let's pretend yet you can not be explained currently related to keep all of this is what we would like to know if he was 1 of the so I what did know what was going on all of the use of 1 on what they're going to get the fact so this was a very
nice property of this this property was really nice and that immediately ruled out many other algorithms I think simple line had this problem and the worldwide sold on this immediately ruled out many algorithms that we might have to on but if the code still kind of model and it's it would be produced 2 experiments deal with different algorithms and you can even use different algorithms depending on whether it's 80 or not but then it gets really difficult to ensure that you keep that property of a someone who was going to be yeah maybe quantity with the size of the of the whole the editor 1 idea I've had is that they could keep a separate bit-mapped of problems that have been deleted or something like that and you can get around that way that gets a lot more complicated so no even if you can guarantee that it can only increase by some value then as soon as it can increase of all you need is that you have to be able to speak during backing which kind of sorts because it means that during which might grow when you're trying to make room for example if you run out of disk space you might then backing to make some space and now you can just vacuum on things because there's no that's based on there are actually at some other code path through he held that can still happen for example the fast optical we did earlier which means that when you insert the gene tree we just put it in a list and then we go and merge it later at that actually has the problem that it's possible that the run of the space when you back you use you would have to require temporarily some more space but that's a different problem I avoided of so that's
part 1 of my presentation that this was the postings wasn't really just a summary of the gene images will be much smaller smaller their code can still read the old format which means that PC operates still works but obviously you're indexes not magically gets magically gonna get smaller renewed PT operate on there will be converted as you update them so when feel if you insert new new rows to the table we will convert the pages on the fly ash as necessary our blood really using gene exists and you upgrade I highly recommend that you read index sometime after the upgrade just to get this benefit the much smaller didn't have to if you don't want to really can do it we can draw some on the this compression makes updates Randall updates of anything in some was somewhat more expensive because though format we did was just plain list of flight and pointers and it was pretty simple to just insert something in the middle and make room on the new format on you have to be called to segment and the nascent with back again we could optimize that on so the orginal version in that election as submitted did something more clever at split the the compressed list insert 1 value in the middle and then form and back again on don't do that currently but we could if if it becomes a problem but in general Jimmy's isn't June isn't very fast in front of updates a laser on thing it's a huge problem and it's not that much slower and yes the all yeah from the dust is are going be much smaller now so you want to choose his genes in more cases than before in the world I couldn't give you any numbers but by tested it but I tried to watch made it as fast as it was before my gut feeling it's like 10 % lower in the worst cases and so not too bad but I tell if you have a new scale if you have a test case it would be great if you could guess that water still in beta and if there's a huge regression somewhere in some corner cases be it's still possible to fix it just sensitive the bolting all no don't think there's any difference there it's small at the beginning so if you have some below that's OK does not occur so that you have meeting in the gene you need indexes you can't you can't build a normal unique constraint on engine in it but you can do an exclusion constraints that is the same but if you want if it's a unique index this this stuff helps just the point of the optimization is to optimize the lists yeah the question is if you can do a unique indexes origin yeah but it wouldn't you wouldn't want to do that because in if you needed unique index than the B-tree is much chose gene is optimized for the non unique case so that kind of thing OK any more questions about the posting list compression before I move on to the best of this 1
and this level now so the 1 thing I'm going to come back to that the Gini index the B-tree just indexes 1 value per row can the gini index can index you can have can values for every row in the table for example you build text search window 1 to see that the so
Eric him out to continue come back to that so there are 3 fundamental things that the Gini index dust of 1st of all when you build the index for every row in a table is going somehow extract the index keys from the roles in normal BG you have a B-tree over an integer for example is that the images that index but the Gini index if you build a text search index for example and you have to start extract all the words and it's every 1 of those roles in the index so if you have 10 words you can have 10 entries in the index for every row in the table and that's why usually end up with a lot of that gets to so that's 1 thing that the gene doesn't extracts these keys the G 1 index then stores among and that's where the will of the compression posting optimization comes that it's much more efficient at doing that now I and then the 3rd thing that dust so once you when you search and you figured out which he's you're searching for it then combines of the matches and it tries to do that as efficiently as possible and that's where the 2nd path comes in OK
so here's an example if you have a like this on an an advanced PostgreSQL well open source database what is going to do it can extract these words and then it's going to do an index surgery searches for posters and it searches for advanced and open and source and database and then it combines the results so that it only returns the worlds that matched all of along so so yeah
that's that's so that's 1 difference between gene and animal between you have multiple but at the on this level there is no difference if you could be answered multiple I consider normal B-tree index and the code wouldn't care basically with our plenary do that for animals between that's all they point to the heat couple yes so
when you run a query like that so what we do is that we collect these lists for every 1 of these keywords by doing an index can induce catalog that we actually do 5 scans of the beta and X 1 form of words in your query and then we combine the results in this case the the query will look like this the
green means that the customer at all of these words for the role mass so the way it works get this lists of items for interest and they're nobody compressed and compact but and then we walk through all of this lists starting from the meaning mind and the smallest by which in this case is 0 . 1 and then we check dust dust dust that satisfy the query became 0 . 1 and only match 1 of these roles we call over over the classes of consistent function which in this case it's the same note this doesn't match because it only matters 1 of these of keywords then we will move on to the next 1 which is 0 . 2 and now we find OK that matches open-ended source and then we call again the consistent function and it says no it still doesn't matter it's the loan amounts to of this in what to the next 1 0 1 3 again only 1 match and a consistent return false and finally we get the whole thing that matches with 0 . 8 to find that OK we have a match for all of roles and in this case the constant function will say yes and the gini index will say greater than going return this to the clients and you finally get the result and the thing is pretty obvious that this is a this is a way to do it when you need to think it's
quite obvious more how it should work in retrospect so in . 4 fewer case like that where you have the you only have 2 matters
for the work posters and if you know that because all of these words it kind of makes sense that you should only matter you just stick those 2 roles why do you bother going through all these other lists and that is exactly what we did in point 4 1st of all the
order of these words of by an estimate of how many items there are so we begin with the word posters you take that 1st row and we check is there a match for all of these other ones and it turns out yes there is so we immediate return that a then you move on to the next next for over all that matters PostgreSQL and we check history maps for all of these other rows and we find that there is and then we need to give up and downstream country that's for explicitly know that there are no more matches and we return this is all there is a lot faster than going through all of this list some of which can have thousands of entries of so there's a trick here
I mentioned that the says it has to match all of these roles but Jin index doesn't actually know that it has to call the consistent function without combination of things that the match and you will return true false saying well was the matched doesn't automatically know that it has not all of these rules if this was an awkward custom at any of these words we obviously
couldn't do this we couldn't steep entries and can be would have this on so for that purpose
we introduce the new function that the over the class can defined as a sculptor try consistent I of try that no consistent value you just give it the true and false for it's either and this 1 you true false or maybe which means that the gene can ask the crime consistent function that OK I know that this match Pope stressed but I don't know if this rule matches any of these other things is it even possible that this row can match regardless of if it matches the lower and the consistent try consistent function can return yes this is definitely a match no matter what you have all these other keys within the downfall saying you know who there's no way this can match or it can be turned maybe it says I I don't know you need more information and we use that to determine
whether it's safe to do this keeping a of the simpler case which I
actually committed 1st because it just makes more sense this 1 doesn't require the consistent function if you do the crew like that we test the match if you use the and operator directly in your query and not hidden inside the T a search query the search for post-disco and open source database then the gene machinery knows immediately that it's in and caught has multiple things and are this you can do even without the try consistent function I that's
what can and questions yes that's a good question so how do we know
how many interest there on the spot all the these scans 1st so we have the sent the tree the entry tree and there immediately if if the if you immediately have there the list of items that matter and it fit on the same page 3 which just kind of know or already how many there are but if it's a longer list so that you end up with the earlier if it's if
you have so many entries with the same key like can pass and that's the end up with this post injury then what we do is that these go true the posting tree and traverse the bottom layer level and estimated OK if if it was 3 levels deep we use that as an estimate of how many many matches there are it's not totally accurate but it's it's good enough do basically trying to do is to separate the very rare items we have got to matches from the ones that match 10 thousand entries and its is accurate enough for the day all I can be it can be used for indexing search as it has been implemented or has had no way of thinking about what or we provide true but if I'm also somewhat full-text search now but used to be changing for example in it dust so that the overall know about so long as you want and you will not mind was there's a good point only supports bit mapping extensive and here you know what you want to know but aside from that problem that I don't see any if this was not possible in every case but there are cases where it would be possible like the example I used the earlier which is kind of silly just and it's of numbers but they do have those numbers in the index you could return them on J. just hasn't been implemented for the special case where it's possible for somewhat full-text search you can't get Peter yeah what this was the
much of this kind of works
like a marriage joined that is 1 way to put this on the problem with this method of right yes OK so I guess 1 way to put it is that the 9 . 3 way was like a merge join the scan through all of the lists and in order they are already in order to read and so on so it's like merge join and and the the as and fast scan mechanism is actually more like a nested loop changes you just take 1 of them and you then go check the others so that's as well as for this in general that is you can on the so
just 1 more thing you're along the way I did all of refactoring of just the existing code and making this readable and possible and 1 thing I also changed by Mary Jane the crash recovery works of splits and I did the same thing for B 3 and 9 . 4 so that the very internal this gearbox fixed to be honest but such a rare obscure Bach and back that that up until I'm just trying to say is that if you have workloads that you can test with please don't hold 9 . 4 be done and test judging indexes and you'll be trained actors on so yeah there's
some more stuff we do 1 thing imagine this like a joint and 1 thing i've been wondering for a long time is that gene is actually like a mini planner N-termini executor of it's all this kind of silly and so you would be nice to actually pull some of this fast scan staff and could just pull it in the exit main executor give you joined together with other indexes and all kinds of interesting started with 1st didn't they call of this into the gini index code on another thing where you can be couple things is you had all like and others talk about what kind axes and I don't know how many people understood what that was all about home if you were and what kind of it but if you how many understood what was a lot so long as the other 2 there is not that I expected
even less as so what what what guys above all 1 aspect of that is that they mentioned that this is a B tree and this is a the 3 but all of these stuff that we do there's no reason why it has to be a big victory it could could this upper layer into tree could be just index or it could be an SPG standards or any other index type you come come up with and we could still do this compression stuff with any other index and when gene extracts these keys like splits so full-text search into a words and index all of them you can do the same thing with index so what what guys all about this to separate these concepts so that you can build gene index over is produced origin index over something else combined it's different combinations it's interesting there's a lot more use cases and just gazing that they cover because like built to a very efficient 3 of points that's really good at storing duplicates if that makes sense I don't know if it was on there's all kinds of interesting stuff you tend to give the couple these things by values you have shown earlier how interesting have thought of that yet this practical in the in the in the middle of the 1 of the things you can do yes there's actually another point I want to make and I think I have a slide that on definitely
yeah my final gene yet this is are lot smaller than the 2 index indexes funeral of duplicates normally be normally people think of gene Hassan and access method for full-text search for kinds of beer stuff it's actually very useful just discipline replacement for the 3 you have people to because of same kinds of use cases that people use bin packing boxes for said area if we replace the compressed compressed posting lists with bit-mapped we would have exactly was a bit mapping at yeah you could make use of the considered the user-controllable there's also no fundamental reason why all you have to use the same mechanism for all of the entries so you could have a bit-mapped for stuff that Bovary contact and list of some other kind of stuff and the gene machinery could just to deal with that just fine yeah you can do that in the all and there is that even then there is no if you use it just as a replacement for tree in the upper enter trees just a B-tree it's just trans implementation of the trees so we just worked on that it could be there's no reason why it has to be smaller than 1 so that is 1 future direction we could go and enhance that and add all of those concurrency optimization and all of them are stuff that we work over the years from all regions that are the way to go would be to just throw these away and make the normal Beecher codes smarter 1 storing duplicates and there is no fundamental reason that you couldn't store multiple-item pointers for the same properly normal the tree index to be trained X code doesn't care that you could when you have multiple gets rid of the it's the normal B-tree code to then do this list stuff and then the compressed list stuff and you make a generic and again that is kind of when you end up with this what kind accessory have yet yes that's kind of yeah I few patterns more compactly than you have comparison with that so you don't want to so both your indexes just through this convention it but the and this is a good thing especially . 4 but even nine-country if you have those column like status that only has a few values on or off and then you should use the tree for that managing for that there's a not only 2 of the that leads to the for following the rate of growth for that that that world yeah so there's some more smarts in the B-tree could that be just hasn't been implemented in Jindigo because of the way that it will ya that's not the my wife we will be immersed in or we could become post these things and do what color rather that I'm open to ideas I I do looking forward to discuss this with William Alexander and interpret tomorrow what are we going to do what kind it is because it seems like this as a set the gene index is like a maniac executor and plenary itself and I'd really like to speed up the process really whereas changed my life or change in original independent yeah I think we I think we always do it so I don't think it's ever going to be it's quite significantly slower there's like this was a bad ontology it's not exactly it's like a merge join with keeping the no it's not the same algorithm and they had actually skip entries and not 1 of that and have all at the same as for a more normal merge join you have to stand both of the lists from beginning to end to find matches but even again this is 1 reason we must put this stuff in the numerical executed this is a trick the normal merge join but also do you have a 5 and 10 David immediately steep everything after 5 up to point 10 on the 1 side so if you're if 1 side of the emergence coming from an index indexical filled the index to reposition needed the tenants to call this fall in between no no that I guess the room generally and yeah I I guess the way to put it is that it's not really a nice little chance it's like merge joined with skipping that but the characters remember through anything you want just like other terms that would also like to add
by the way the here in this kind of thing and so is
that the roads were so I don't know if you want to hear what we will also
hold it and you know and also we will restore their uh how to hold a certain borders is very flexible but it's not very fast because of the use of full full stop the basic which is only a 3rd of the year here and we go out of focus which the soul of the union of the people in the following way so we must also go with the group all full-text search so we can compare all of the little and this is 1 of the
Greece I should go wrong and this resulted In this which is very fast the people so so he was a whole lot of musical and scholars would have us so if you were going this
book so we just isn't music and it was always the reason for all of June in moment you know comparison of the old approach and you will see that a little fast and then we give a
real what we this was the the over the years like queries that will be needed the thing is that although the rebels in which we really want to and know all the boarding it and the use real life we all around with all the all of of the University of all with you will see that that since the last 15 million queries and how would you June provide us with people who would have thought that it would be interesting to important because we use you don't move alterations in all of the search and experience in the region of
the world and because of so that it has all along with of of what they all these 2 of students and the case where there are several other questions still only for the about what we will call that and we all know and that's why I only invited by looking at the number of books like looks like very slower the building next now that true yeah yeah it's the just the of the rule of law mining and the 9 . 3 plus hatch sense that what can I didn't know or see that broken and the relative that had a historical review and here there was nobody interesting
because original called you know this last column is and what it up I will show you the institution like case that was the frequently cited the regret is at peace the is that well so we felt would be used in his of all of that we have to know what what the father and the father of the of so that occasional false so it's usually the user from medieval times glass you this procedure Oscar of goes beyond the standard usually heats up quickly and the will use this
is true examples of all that actually work with all the the and
now all of was that this is what it's
all over the world all the do you
want but just the slides feels that you have both like this doesn't look
right here have right because of the fact
that and technical problems can
just saying you want to hear those lies above 0 his but it goes out up work the full screen there are
the system in large the or with the
required to remove the word of the of the of involved in the almost of the we have to solve for like the and and
that is what the user wants to go with all the time the world will tell you or the of the genes that many of the of the use of what knowledge what will show that in a it's not a it's not what kinds of new rule I get the following you don't want to hear so it during the middle of a word is
just the of the goal of this also holds in the old age you know is that we can with the help of the of the of the weight of the wall at of the age of 18 years who was that there was a lot of of of the city and again and you
Punkt
Dualitätssatz
Vorlesung/Konferenz
Sprachsynthese
Speicher <Informatik>
Programmierumgebung
Teilbarkeit
Code
Drucksondierung
Randverteilung
Punkt
Mereologie
Zahlenbereich
Maßerweiterung
Kombinatorische Gruppentheorie
Raum-Zeit
Computeranimation
Multiplikation
Regulärer Graph
Zählen
Inhalt <Mathematik>
Ordnung <Mathematik>
Maßerweiterung
Bildgebendes Verfahren
Serviceorientierte Architektur
Tabelle <Informatik>
Automatische Indexierung
Datentyp
Schlüsselverwaltung
Datenhaltung
Singularität <Mathematik>
Programmverifikation
Mailing-Liste
Arithmetisches Mittel
Mereologie
Dreiecksfreier Graph
Dateiformat
Wort <Informatik>
Ext-Funktor
Tabelle <Informatik>
Netzwerktopologie
Datensatz
Ganze Zahl
Gruppenkeim
Speicher <Informatik>
Mailing-Liste
Computeranimation
Fitnessfunktion
Eins
Homepage
Tabelle <Informatik>
Videospiel
Mailing-Liste
Physikalisches System
p-Block
E-Mail
Zeiger <Informatik>
Computeranimation
Homepage
Netzwerktopologie
Bildschirmmaske
Datensatz
Zahlensystem
Overhead <Kommunikationstechnik>
URL
p-Block
Overhead <Kommunikationstechnik>
Speicher <Informatik>
Zeiger <Informatik>
Versionsverwaltung
Bezeichnungssystem
Schlüsselverwaltung
Tabelle <Informatik>
Subtraktion
Bit
Decodierung
Deltafunktion
Punkt
Ortsoperator
Freeware
Zahlenbereich
Computeranimation
Homepage
Datensatz
Spannweite <Stochastik>
Mailing-Liste
Ganze Zahl
Algorithmus
Speicher <Informatik>
Zeiger <Informatik>
Quellencodierung
Normalvektor
Kraftfahrzeugmechatroniker
Kolmogorov-Komplexität
Mailing-Liste
Ähnlichkeitsgeometrie
Zeiger <Informatik>
Variable
Einfache Genauigkeit
Ganze Zahl
Dateiformat
Decodierung
Versionsverwaltung
Zentralisator
Router
Subtraktion
Bit
Decodierung
Total <Mathematik>
Punkt
Gruppenkeim
PASS <Programm>
Code
Computeranimation
Homepage
Eins
Homepage
Mailing-Liste
Ganze Zahl
Algorithmus
Kompakter Raum
Speicher <Informatik>
E-Mail
Physikalischer Effekt
Dicke
Kolmogorov-Komplexität
Kategorie <Mathematik>
Mailing-Liste
Ausnahmebehandlung
Zeiger <Informatik>
Bitmap-Graphik
Ein-Ausgabe
Ausgleichsrechnung
Gerade
Mapping <Computergraphik>
Hochvakuum
Mereologie
Heegaard-Zerlegung
Decodierung
Mini-Disc
Fitnessfunktion
Mailing-Liste
Decodierung
Deltafunktion
Ganze Zahl
Menge
Vorlesung/Konferenz
Zeiger <Informatik>
Versionsverwaltung
Variable
Computeranimation
Einfache Genauigkeit
Normalvektor
Nebenbedingung
Einfügungsdämpfung
Subtraktion
Decodierung
Große Vereinheitlichung
Punkt
Wasserdampftafel
Minimierung
Versionsverwaltung
Gradient
Kombinatorische Gruppentheorie
Code
Raum-Zeit
Computeranimation
Homepage
Netzwerktopologie
Datensatz
Bildschirmmaske
Informationsmodellierung
Ganze Zahl
Algorithmus
Code
Lineare Regression
Mini-Disc
Kompakter Raum
Zeiger <Informatik>
Bildgebendes Verfahren
Gerade
Tabelle <Informatik>
Physikalischer Effekt
Softwaretest
Zentrische Streckung
Kolmogorov-Komplexität
Kategorie <Mathematik>
Betafunktion
Eindeutigkeit
Disjunktion <Logik>
Mailing-Liste
Quick-Sort
Portscanner
Texteditor
Verbandstheorie
Mereologie
Heegaard-Zerlegung
Dateiformat
Mini-Disc
Normalvektor
Tabelle <Informatik>
Physikalischer Effekt
Mereologie
Decodierung
Kolmogorov-Komplexität
Schlüsselverwaltung
Matching <Graphentheorie>
Minimierung
Speicher <Informatik>
Computeranimation
Übergang
Fundamentalsatz der Algebra
Datensatz
Ganze Zahl
Ganze Zahl
Bildschirmfenster
Kompakter Raum
Wort <Informatik>
Mini-Disc
Normalvektor
Schlüsselverwaltung
Bildgebendes Verfahren
Tabelle <Informatik>
Offene Menge
Subtraktion
Chirurgie <Mathematik>
Open Source
Datenhaltung
Wort <Informatik>
Quellcode
Code
Computeranimation
Übergang
Resultante
Offene Menge
Lineares Funktional
Matching <Graphentheorie>
Wort <Informatik>
Betafunktion
Klasse <Mathematik>
Abfrage
Ruhmasse
Mailing-Liste
Online-Katalog
Quellcode
Arithmetisches Mittel
Metropolitan area network
Client
Bildschirmmaske
Vorlesung/Konferenz
Wort <Informatik>
Gravitationsgesetz
Portscanner
Widerspruchsfreiheit
Schätzwert
Offene Menge
Punkt
Matching <Graphentheorie>
Wort <Informatik>
Mailing-Liste
Computeranimation
Eins
Mapping <Computergraphik>
Portscanner
Datensatz
Mailing-Liste
Vorlesung/Konferenz
Wort <Informatik>
Ordnung <Mathematik>
Ext-Funktor
Data Encryption Standard
Lineares Funktional
Offene Menge
Matching <Graphentheorie>
Wort <Informatik>
Schaltnetz
Klasse <Mathematik>
Schlussregel
Aggregatzustand
Nichtlinearer Operator
Knoten <Statik>
Computeranimation
Physikalisches System
Datensatz
Funktion <Mathematik>
Ein-Ausgabe
Wort <Informatik>
Information
Schlüsselverwaltung
Widerspruchsfreiheit
Offene Menge
Nichtlinearer Operator
Lineares Funktional
Matching <Graphentheorie>
Wort <Informatik>
Datenhaltung
Open Source
Abfrage
Vorlesung/Konferenz
Widerspruchsfreiheit
Computeranimation
Netzwerktopologie
Offene Menge
Mereologie
Wort <Informatik>
Code
Mailing-Liste
Addition
Systemzusammenbruch
Portscanner
Computeranimation
Wiederherstellung <Informatik>
Homepage
Schätzwert
Bit
Punkt
Schlüsselverwaltung
Matching <Graphentheorie>
Zahlenbereich
Speicher <Informatik>
Computeranimation
Eins
Übergang
Mapping <Computergraphik>
Metropolitan area network
Fundamentalsatz der Algebra
Minimum
Maßerweiterung
Schlüsselverwaltung
Portscanner
Offene Menge
Kraftfahrzeugmechatroniker
Loop
Wort <Informatik>
Mathematisierung
Vorlesung/Konferenz
Mailing-Liste
Ordnung <Mathematik>
Mereologie
Stab
Systemzusammenbruch
Oval
Patch <Software>
Information
Systemzusammenbruch
Code
Computeranimation
Wiederherstellung <Informatik>
Ganze Zahl
Bit
Code
Addition
Softwaretest
Automatische Indexierung
Dualitätssatz
Knoten <Statik>
Zeiger <Informatik>
Systemaufruf
Beanspruchung
System F
Funktion <Mathematik>
Ruhmasse
Wiederherstellung <Informatik>
Overhead <Kommunikationstechnik>
Refactoring
Bit
Prozess <Physik>
Punkt
Datenparallelität
Minimierung
Substitution
Computeranimation
Richtung
Netzwerktopologie
Algorithmus
Datenmanagement
Mustersprache
Interpretierer
Kraftfahrzeugmechatroniker
Automatische Indexierung
Singularität <Mathematik>
Gebäude <Mathematik>
Speicher <Informatik>
Bitrate
Dialekt
Rechenschieber
Schlüsselverwaltung
Standardabweichung
Subtraktion
Quader
Mathematisierung
Schaltnetz
Implementierung
Maßerweiterung
Term
Code
Mailing-Liste
Schweizerische Physikalische Gesellschaft
Binärdaten
Datentyp
Speicher <Informatik>
Zeiger <Informatik>
Videospiel
Kolmogorov-Komplexität
Ontologie <Wissensverarbeitung>
Matching <Graphentheorie>
Stochastische Abhängigkeit
Mailing-Liste
Paarvergleich
Chipkarte
Mapping <Computergraphik>
Flächeninhalt
Codierung
Wort <Informatik>
Binäre Relation
Kantenfärbung
Boolesche Algebra
Normalvektor
Automatische Indexierung
Desintegration <Mathematik>
Gruppenkeim
Dualitätssatz
Fokalpunkt
Atomarität <Informatik>
Computeranimation
Wiederherstellung <Informatik>
Gewicht <Mathematik>
Attributierte Grammatik
Overhead <Kommunikationstechnik>
Spider <Programm>
Drei
Ext-Funktor
Magnetbandlaufwerk
Data Encryption Standard
Schlüsselverwaltung
Momentenproblem
Dualitätssatz
Paarvergleich
ROM <Informatik>
Ranking
Computeranimation
Inverser Limes
Portscanner
Metropolitan area network
Gewicht <Mathematik>
Gruppe <Mathematik>
Ordnung <Mathematik>
Datenfluss
Tabelle <Informatik>
Automatische Indexierung
Videospiel
Schlüsselverwaltung
Sechsecknetz
Gebäude <Mathematik>
Dualitätssatz
t-Test
Abfrage
Zahlenbereich
Schlussregel
Patch <Software>
Nichtlinearer Operator
Dateiformat
Gesetz <Physik>
Computeranimation
Data Mining
Lesezeichen <Internet>
Gewicht <Mathematik>
Funktion <Mathematik>
Reelle Zahl
Gruppoid
Grundraum
Automatische Indexierung
Schlüsselverwaltung
Gewicht <Mathematik>
Stichprobe
Algorithmische Programmiersprache
Computeranimation
Sechsecknetz
Desintegration <Mathematik>
Ablöseblase
Patch <Software>
Information
Nichtlinearer Operator
Computeranimation
Metropolitan area network
Gewicht <Mathematik>
Gruppoid
Ordnung <Mathematik>
Gammafunktion
Touchscreen
Tabelle <Informatik>
Automatische Indexierung
Data Encryption Standard
Schlüsselverwaltung
Dualitätssatz
Varianz
Speicher <Informatik>
Dateiformat
Atomarität <Informatik>
Ranking
Inverser Limes
Portscanner
Rechenschieber
Lesezeichen <Internet>
Funktion <Mathematik>
Overhead <Kommunikationstechnik>
Spider <Programm>
p-Block
Lie-Gruppe
Binärcode
Data Encryption Standard
Video Genie
Landau-Theorie
Dualitätssatz
Schlussregel
Inverter <Schaltung>
Atomarität <Informatik>
Ranking
Computeranimation
Inverser Limes
Portscanner
Metropolitan area network
Datennetz
Attributierte Grammatik
Wort <Informatik>
Overhead <Kommunikationstechnik>
Ordnung <Mathematik>
Spider <Programm>
Datenfluss
Gammafunktion
Portscanner
Binärdaten
Metropolitan area network
Mereologie
Gewicht <Mathematik>
Typentheorie
Fächer <Mathematik>
Analogieschluss
Dualitätssatz
Nichtlinearer Operator
Einfügungsdämpfung
Hilfesystem
Computeranimation

Metadaten

Formale Metadaten

Titel GIN in 9.4 and further
Alternativer Titel GIN - Stronger than ever
Serientitel PGCon 2014
Anzahl der Teile 31
Autor Linnakangas, Heikki
Korotkov, Alexander
Bartunov, Oleg
Mitwirkende Crunchy Data Solutions (Support)
Lizenz CC-Namensnennung 3.0 Unported:
Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen.
DOI 10.5446/19101
Herausgeber PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
Erscheinungsjahr 2014
Sprache Englisch
Produktionsort Ottawa, Canada

Inhaltliche Metadaten

Fachgebiet Informatik
Abstract This talk presents set of GIN advances in PostgreSQL 9.4 and further which brings GIN to new level of performance and extendability. Most important advances are: posting lists compression, fast-scan algorithm, storing additional information and index-based ranking. This talk presents set of GIN advances: Compression posting lists. Indexes become 2 times smaller without any work in opclass. pg upgrade is supported, old indexes will be recompressed on the fly. Fast scan algorithm. Fast scan allows GIN to skip parts of large posting trees during index scan. It dramatically improve performance of hstore and json search operators as well as FTS "frequentterm & rareterm" case. In order to use this improvement three-state logic support required in "consistent" opclass method. Storing additional (opclass defined) information in posting lists. Usage of additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (search for tsqueries matching tsvector), inverse regex search (search for regexes matching string), better string similarity using positioned n-grams. Index based ranking. This improvement allows GIN to return results in opclass defined manner. Most important application is returning results in relevance order for FTS which dramatically reduces IO load. But there are other applications like returns arrays in similarity order. We present the results of benchmarks for FTS using several datasets (6 M and 15 M documents) and real-life load for PostgreSQL and Sphinx full-text search engines and demonstrate that improved PostgreSQL FTS (with all ACID overhead) outperforms the standalone Sphinx search engine.
Schlagwörter Alexander Korotkov
Oleg Bartunov

Ähnliche Filme

Loading...