TIB AV-Portal

Explaining the Postgres Query Optimizer


Formal Metadata

Title: Explaining the Postgres Query Optimizer
Title of Series:
Number of Parts:
License: CC Attribution 3.0 Unported. You are free to use, adapt, copy, distribute, and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date:
Production Place: Ottawa, Canada

Content Metadata

Subject Area
The optimizer is the "brain" of the database, interpreting SQL queries and determining the fastest method of execution. This talk uses the EXPLAIN command to show how the optimizer interprets queries and determines optimal execution. Examples include scan methods, index selection, join types, and how ANALYZE statistics influence their selection. The talk will assist developers and administrators in understanding how Postgres optimally executes their queries and what steps they can take to understand and perhaps improve its behavior.
[The talk begins; there are some problems with the projector and the edges of the slides are cut off.] OK, it looks like the text is getting cut off at the side there. I'm not sure what's causing it, but we'll live with it. Hi everyone. I've been a Postgres community member since 1996, and I enjoy giving this talk. I don't know about the rest of you, but in terms of the quality of the talks, PGCon is the best conference I attend. In previous years I have walked out of virtually every talk thinking it was very high quality and pointed at exactly what I was interested in. It's useful for the organizers to get that feedback, so please do give feedback on things we can do to improve; we're all interested in hearing it.
Today what I want to talk about is the PostgreSQL optimizer. This is a very interesting topic to me; in fact, it's the reason I got involved with Postgres. In 1996 I had been writing financial applications for seven years, and if you write financial applications for seven years, you're really not learning a whole lot anymore. As a professional I felt I wanted to learn more and understand more about what I was doing, and because I was writing database applications, occasionally I wanted to look inside the database: how is a query that I send to the server actually executed, how does it make its decisions, and how does it return those results as efficiently as possible? At the time I was using Informix, and I was not really able to see inside of how it made its decisions. I could see a little bit of the plan that came out, if you're familiar with those products, but I could not actually look at the code to understand the process it was using to make those decisions. One of the real attractions of Postgres, and one reason I started doing Internet development on it in 1996, was that openness: you can actually read the internal code of what goes on. Now, I'm not going to be talking about the internal code of PostgreSQL too much today, but what I am going to do is
kind of walk through how Postgres makes those decisions, to give you a little more appreciation of what is going on inside that server. A lot of people are keen on non-relational systems these days, looking at things that don't have an optimizer, and one of the things you're going to get out of this is understanding how much that optimizer helps you as an application developer: you don't have to worry about a lot of the intricacies you normally would have had to deal with before relational systems came around. So when you're writing an application, you usually have some kind of interface layer that sends the query across to the server and then sends the result back to the interface layer and up into the application. We'll basically be focusing on what happens inside that server, the part to the right.
Internally, Postgres effectively has an internal flow that it goes through to figure out how to execute a query, and the most important part of that flow is what we call the optimizer, or the planner. But that's not the first stage; several things happen first when a query comes in.
Coming in from the top, the first thing that happens is that the statement is parsed: it's broken down into words, and the words are treated as tokens, or lexemes, which are passed to a lexer that generates actions based on the pattern of tokens and keywords that come in. From there it goes to the parser, and then to a traffic cop inside Postgres that figures out whether it's a utility command or not. If it's not a utility command, it goes into something called the rewriter, which is where things like views and rules are handled. After the rewriter comes the optimizer, and effectively what it's doing is trying to figure out: how can I get the user's results as quickly as possible? What comes out of the optimizer is something called a plan. The plan is effectively sent to the executor, and that's the part that actually executes the query. The optimizer is sort of the mastermind, the one who lays out a plan of what we're going to do, and the executor is really just following whatever plan came out of the optimizer; it's just executing a recipe.
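The plan that comes out of the optimizer can be inspected with the EXPLAIN command, which is what the whole talk is built on. A minimal sketch; the exact cost numbers vary by version and configuration:

```sql
-- EXPLAIN shows the plan the optimizer chose, without running the query.
EXPLAIN SELECT 1;
--                 QUERY PLAN
-- ------------------------------------------
--  Result  (cost=0.00..0.01 rows=1 width=4)
```

Everything in the rest of the talk is read off output like this.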
So we're going to focus on that optimizer, which I think of as the brains of the system. What does the optimizer have to decide? There are three major areas it has to make decisions on. First, what scan method is going to be used to get at the data; I'll go into specifics about which scan methods are available in Postgres. Second, what join method to use; I'll show specific examples with diagrams of each one. And finally, the join order: you could join table A to B and then to C, or B to C and then to A; there are different orders of bringing things together, and I'll show specific examples. Now, you might not be using Postgres; you might be using some other relational system. These concepts are very generic: in most relational systems you will see the same concepts, the terms might be slightly different, and there might be a couple of extra things, but these are very generic relational concepts. So, the first topic is scan methods. There are basically three types of scan supported in Postgres, and deciding which one to use is something I'll show you in a couple of minutes. The first one is called, well, actually,
before we do that, let's set it up. First of all, we have to create a little temporary table so I can illustrate the things I'll talk about. Effectively, what I've done here is one little query that goes into the system catalogs and pulls out some of the relation names. Then I took each relation name and chopped off everything except the first letter, put a bunch of junk at the end, and created a temporary table called sample with those contents. That's all I did, and I'll use this example for probably the next ten slides, so it's worth getting an idea of how this is going to work. If you want to run this yourself, the part in pink is actual SQL: you can download it, run it in your terminal, and see the exact same output as in the slides, which are on my website.
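The setup might look roughly like this. The column names (letter, junk), the padding, and the index are my reconstruction, not necessarily the exact script from the talk:

```sql
-- A skewed test table: the first letter of every catalog relation
-- name, plus a junk payload column, and an index to give the
-- optimizer a choice of access methods.
CREATE TEMPORARY TABLE sample (letter, junk) AS
    SELECT substring(relname, 1, 1), repeat('x', 250)
    FROM   pg_class
    ORDER  BY random();               -- scatter the rows
CREATE INDEX i_sample ON sample (letter);
```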
I forgot to mention, and the reason I mention it now is that I realize the slides are chopped off at the bottom: if you go to this website right here, this presentation is there, along with my other presentations, also as PDFs. So if you miss the bottom of a slide, you can definitely check it out on that website. There's also a little function that I use to display things a little better; again, that's all in the downloadable file. So, what does this temporary table look like? I created the temporary table, and with the middle query I said: just show me what this table looks like.
Effectively, I have about 253 rows in this table. When I created it, 199 of them have just the letter 'p', plus the junk field; 9 of them are 's'; and down here I have one that is an 'i', one row. The reason I created it this way is that it gives me a nice distribution of values: one letter, 'p', which is really common (system catalog names usually begin with it), and others which are very rare. This is the type of case you have in real relational systems, and one of the areas where the optimizer does very well is that it can adjust things based on how selective your data is. You'll see how that happens in a minute. OK, so what do I mean by selective?
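The distribution described above can be seen with a simple aggregate; the counts here are the ones from the talk and will differ on your system:

```sql
SELECT letter, count(*)
FROM   sample
GROUP  BY letter
ORDER  BY count(*) DESC;
--  letter | count
-- --------+-------
--  p      |   199    <- very common
--  s      |     9
--  ...
--  i      |     1    <- very rare
```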
If I just run this query right here, give me the 'p' rows, what it's actually doing is something called an index scan. OK: it used an index for the very most common of my values. If I ask for an 's', which happens to be in the middle, that's also an index scan. And for my rarest value, the same thing. For all three values, my most popular value, my middle value, and my rarest value, it's doing an index scan and returning the same plan. Why is that?
The reason is that there are no statistics on the table yet, what we call optimizer statistics. The optimizer statistics tell the optimizer how common certain letters, or certain values, are within the table. Normally there's something called autovacuum, which runs ANALYZE for you, but it doesn't run every minute, so you might not want to wait for it; and in this case I'm using temporary tables, which autovacuum can't see, so that doesn't help me either. So I've got to run ANALYZE myself. As soon as I ran ANALYZE on the table, you see a change right away: the query that used to be an index scan is now a sequential scan.
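The before-and-after effect of ANALYZE, sketched with representative plans (your costs and plan details will differ):

```sql
EXPLAIN SELECT * FROM sample WHERE letter = 'p';
--  Index Scan using i_sample on sample ...    (no statistics yet)

ANALYZE sample;        -- gather optimizer statistics by hand

EXPLAIN SELECT * FROM sample WHERE letter = 'p';
--  Seq Scan on sample ...
--    Filter: (letter = 'p')    (statistics now show 'p' is very common)
```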
A sequential scan is effectively: start at the beginning of the table and read all the way to the end. Very straightforward. Why is a sequential scan better here? Because, as we showed up here, 78 percent of our rows were 'p's. If essentially everything is a 'p', it doesn't make any sense to do all sorts of index lookups just to eliminate 20 percent of the rows. We're better off just reading straight through the table; that's the way to go.
What happens when I query a letter with only four rows? It does something different, called a bitmap heap scan. That sounds complicated, but what it's effectively doing is going through the index and creating a bitmap that tells it where it can potentially find the matching rows in the heap: a one every place a match might be, a zero every place it won't look. So instead of doing four separate index-to-heap accesses, it uses the index to build a little bitmap and then walks through the heap in order. That's probably overkill for this case, but because there were four matches, it decided to create a bitmap and aggregate all the lookups into one ordered pass over the heap.
But if I query the value of which there is only one, I get a straight index scan, no bitmap, because the statistics tell the optimizer there's only one match: go through the index tree and find the one row. So this is the first example showing how the optimizer statistics effectively change how we go at our data. And again, as an application developer, you don't have to worry about whether the thing you're querying is very common or not. This happens all the time: if I have a firm in Ottawa and I query all my customers, probably almost all of them are local, so I get a sequential scan of all my customers; but if I have an index on the province column and I query for somebody in, say, Manitoba, where I have very few customers, it will use the index, because very few of my customers live there.
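All three scan methods, chosen purely from the statistics; the plans are illustrative:

```sql
EXPLAIN SELECT * FROM sample WHERE letter = 'p';   -- 199 rows
--  Seq Scan on sample                 (just read the whole table)

EXPLAIN SELECT * FROM sample WHERE letter = 'g';   -- a few rows
--  Bitmap Heap Scan on sample         (index -> bitmap -> ordered heap visits)
--    ->  Bitmap Index Scan on i_sample

EXPLAIN SELECT * FROM sample WHERE letter = 'i';   -- 1 row
--  Index Scan using i_sample on sample    (straight to the single match)
```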
Let's look at a better description of this. What I did was create a little query, using a small display function, that runs the same lookup for several different letters. When I query each letter in turn, I get this, and to me this is the quintessential slide of the presentation. What it illustrates is that as I go from the most common values at the top to the least common values at the bottom, the optimizer automatically changes the type of access method it uses to get at the data, with me not having to do anything at all to the query itself. This is all driven by the ANALYZE statistics created earlier. Are there any questions so far? [Audience question.] The question is why the sequential scan is used not only for 'p', with 199 rows, but all the way down to a value with 7 rows. That is a very common question, because most people think the index should be used as soon as the value is at all selective. You might say: I understand that 70-some percent of my rows are 'p's and the index shouldn't be used there, but for a value that's a third of the table, an index sounds pretty good. Then you realize that the index has a root page, and intermediate pages you have to read to get to anything, and then you have to randomly hit the heap pages; with a bitmap you can at least visit them in order, but the bottom line is that random I/O is really slow and sequential I/O is really fast. The dirty secret is that a condition usually has to match under something like 5 percent of the table before the index starts to be used. That is effectively by design, because of the way the costs are set up in Postgres: with an index we have a number of random reads to get through the index to the right rows, while with a sequential scan not only are we reading straight through, the kernel is prefetching ahead of us because it sees we're going sequentially. So 5 percent is a good rule of thumb; the number actually goes down as the table gets bigger, but usually 5 to 7 percent is about the breakeven point. Now, this example actually breaks even lower than that: there were 253 rows in the table, and the switch happened around 7 rows, which is roughly 3 percent. So the breakeven here is even lower than 5 percent, and it continues to surprise me how low that number really is. Now, if you think I'm making that up, that I'm faking it, I want to show you something.
What if I force it to do an index scan? I say: I don't believe you, optimizer, let's try it. With the top line I turn off sequential scans, and with the next one I turn off bitmap scans. In effect that forbids both the bitmap heap scan and the sequential scan, so it has to use the index.
What you see when you look at the costs is that the forced index scan's cost down here is around 80, and the other forced plans end up around 12 and 15, while the original output, with the optimizer free to choose, starts at 8 and only goes to 11 or 13. It really doesn't go anywhere, because the bitmap and sequential scans are so much cheaper. Now, if you think the optimizer is doing something wrong,
you can use these commands at the top and rerun your query to compare. Almost every time somebody on the mailing lists asks "why is my index not being used?", which is a very common question, I show them these commands: run it once with them, once without, look at what costs you get, and the question magically disappears. It's amazing: we put something in the FAQ, and within six months we stop getting the question. Overall the costing is pretty accurate; it's almost counter-intuitive how low that break-even point is. And you can adjust it: there's a setting called random_page_cost you can change if these numbers don't match your hardware. But in most cases it gets pretty close. Question over here?
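The session settings referred to here discourage the optimizer from choosing a plan type, so you can compare the cost estimates yourself:

```sql
SET enable_seqscan    = off;    -- strongly discourage sequential scans
SET enable_bitmapscan = off;    -- and bitmap scans

EXPLAIN SELECT * FROM sample WHERE letter = 'p';
--  Index Scan using i_sample on sample  (note the much higher cost)

RESET enable_seqscan;
RESET enable_bitmapscan;
```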
Right, that's what I was getting at. The cheating here is that these numbers are the costs the optimizer is estimating; they are not actual execution times. I could have used EXPLAIN ANALYZE and shown actual execution times. I didn't, partly because those numbers float around, and partly because of caching effects: once you run a query, things are in cache the next time, so it's complicated to get really accurate numbers. [Audience question.] The question is about the row-width estimate, which you can see over here: the optimizer keeps an estimate of how wide the rows are and adjusts based on that, since wider rows have to be handled differently. [Audience question.] Does the optimizer take a clustered index into account? Yes. When you issue the CLUSTER command, the next ANALYZE operation will sample the clustering ratio for the table. And because CLUSTER does not keep the table clustered (you would have to re-cluster as it degrades), ANALYZE re-samples that ratio every time it runs. It is stored in the statistics and used to determine how localized a lookup is: if you're looking up the 'p' rows, are they spread in five different places in the table, or maybe all on one page? So not only do we record it, we keep it updated, and the ratio will decrease as the clustering gets dispersed by modifications. Other questions?
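EXPLAIN ANALYZE, mentioned above, runs the query and reports the estimates next to the actual row counts and timings; running it twice shows the caching effect. A sketch, with the variable numbers left as placeholders:

```sql
EXPLAIN ANALYZE SELECT * FROM sample WHERE letter = 'p';
--  Seq Scan on sample  (cost=0.00..? rows=199 width=?)
--                      (actual time=?..? rows=199 loops=1)
--    Filter: (letter = 'p')
--  Total runtime: ? ms
```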
[Audience comment.] He's saying this example is kind of contrived, and I will admit that's true. Normally I would have taken a huge data set, but then the numbers are so huge that you can't really grasp them conceptually, and it's all already in memory here anyway. [Audience question.] Do the statistics cover combinations of values, as with multi-column indexes? ANALYZE actually gathers statistics on every column, whether it's indexed or not, and that's good: a lot of times you're going to say a = 1 AND b = 12, and even when b is not indexed, we need to know how selective that is to determine how many rows come out, which becomes very important later when we talk about joins. What we don't have, and this is one of the big flaws in the PostgreSQL optimizer, is information about correlations between columns. We've talked about it, but it's something we have not really figured out how to efficiently generate and use. Sometimes a = 1 and b = 12 together is a very rare combination, and because we look at each condition individually, we can get it wrong. Do you still benefit from multi-column indexes? Yes. The statistics are perhaps not as smart, but you can still look up and filter across more columns. In fact, if you have an index on (a, b, c) and you supply only a and c, we will still use the index for a and c, with some trickery I'm not going to go into.
The other thing to keep in mind is that none of these numbers is perfect, and they don't have to be. The estimates just have to be good enough to pick the plan that is probably closest to the best plan. Trying to compute perfect numbers would cost more than it's worth, and the more indexes you have, the more possible combinations there are to cost.
[Further audience discussion, partly inaudible.] I don't want to cut off the discussion, but we do want to finish. So, the second thing I want to talk about is join methods. Again, this is fairly common across relational systems: there are effectively three, or three and a half, different join methods you can use. The first is the nested loop, the second is the hash join, and the third is the merge join, and I want to show you examples and diagrams of each of those.
First we have to create another pair of sample tables. I create sample1, which pulls rows from pg_proc, and another table called sample2, which pulls the OIDs from pg_class. Nothing super exciting. Note that these have no indexes and no statistics, because I want to start from absolutely the worst case we can.
So if I run a query with a WHERE clause comparing sample1 to a constant and joining sample1 to sample2, I get what's called a nested loop, with a sequential scan on each side. It looks just like this, and it's as bad as it looks: it means compare every row with every other row. The only reason this can be a win is with very small tables, because there's no setup cost; very small joins will potentially use this because it's super fast to start.
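A sketch of the join tables and the first query; the table definitions are my reconstruction of what the talk describes, and the plans are illustrative:

```sql
-- Two plain tables, no indexes, no statistics: the worst case.
CREATE TABLE sample1 AS SELECT oid AS id FROM pg_proc;
CREATE TABLE sample2 AS SELECT oid AS id FROM pg_class;

EXPLAIN SELECT *
FROM   sample1, sample2
WHERE  sample1.id = 33
AND    sample1.id = sample2.id;
--  Nested Loop
--    ->  Seq Scan on sample1   Filter: (id = 33)
--    ->  Seq Scan on sample2   (each side read sequentially)
```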
What if I join with an inequality, sample1 greater than some number, along with the join clause? I get what's called a hash join. A hash join effectively says: take one side, hash it into memory, and then walk through the other table, probing the hash for each row. This is super, super popular; it's sort of the vanilla-bean ice cream of databases. We love doing this kind of join, and it works as long as one of the tables fits in memory.
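A restriction that still leaves many rows on both sides tends to produce the hash join described here (illustrative):

```sql
EXPLAIN SELECT *
FROM   sample1, sample2
WHERE  sample1.id > 33
AND    sample1.id = sample2.id;
--  Hash Join
--    Hash Cond: (sample1.id = sample2.id)
--    ->  Seq Scan on sample1           (probe side)
--    ->  Hash
--          ->  Seq Scan on sample2     (build side, hashed in memory)
```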
What if I join with no restriction at all? Then I get something called a merge join, which has to sort its inputs first. A merge join works like this: take the first row of the first table and compare it to the first row of the second; if it doesn't match, advance the side that is behind and compare again; when you find a match, output it and keep moving down. What you're effectively doing is walking the two sorted tables simultaneously, one scan through each side. This is great for large tables, because nothing has to fit in memory and you're not scanning anything multiple times as you would in a nested loop. So when you see a big join result, this is really useful.
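The unrestricted join on the un-analyzed tables produced the merge join (illustrative):

```sql
EXPLAIN SELECT *
FROM   sample1, sample2
WHERE  sample1.id = sample2.id;
--  Merge Join
--    Merge Cond: (sample1.id = sample2.id)
--    ->  Sort
--          ->  Seq Scan on sample1
--    ->  Sort
--          ->  Seq Scan on sample2   (both sides sorted, then walked once)
```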
that. Now, what if I change the order — before I joined sample1 to sample2, now sample2 to sample1? It doesn't matter. In most cases the optimizer puts the smaller, more restricted relation on the side it prefers, regardless of the order you wrote in the query. Now let's run ANALYZE, the same way, so we have statistics. We'd expect to see a change, because now the optimizer has some clue about what's going on — how many rows there are. Before, it kind of knew how big the table was, but it didn't know what the rows looked like, so it guessed at the costs. So, for example, here: this one used to be a merge join — the join with no WHERE clause was a merge join — but now, because the optimizer realizes these tables aren't that big and the hash is going to fit in memory, and it has the statistics to prove that, it switches to a hash join.
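The step just described — gathering statistics and watching the plan change — can be sketched as:

```sql
-- ANALYZE samples the tables and stores per-column statistics
-- (row counts, common values) for the planner to use.
ANALYZE sample1;
ANALYZE sample2;

-- Re-running EXPLAIN on the unrestricted join may now show a
-- Hash Join where a Merge Join was chosen before, because the
-- planner can prove the hash fits in memory.
EXPLAIN SELECT s1.name
FROM   sample1 s1 JOIN sample2 s2 ON s1.id = s2.id;
```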
Also: even if I do an outer join, which you might think couldn't use a hash join, we can still do something called a hash left join — which not a lot of databases support; we do it in Postgres, and you've probably never seen that before. And what about a cross join — joining every row to every other row? What possible join method can the optimizer use? It's got to be the nested loop, because that's the one that compares everything to everything else; nothing else can do a cross join. So you can even see that when you write something that's effectively a cross join, you in fact get a nested loop. Now let's
add some indexes. We create an index on the join field of both tables, and then we're going to get some different decisions from the optimizer that we didn't get before. So
here we have the case where we're comparing to a constant again. What now happens is we still get a nested loop, but the nested loop is much faster, because we're now able to look up the 33 with the index. Instead of having to compare every row to every other row, I can zoom right in to each table and just pull out the 33 from one and the matching 33 from the other. And it looks exactly like this: it basically uses the index to find all the 33s here, and then joins using an index lookup for the 33s on the other side. So again you see a nested loop, but with index scans inside it — and that's not bad at all, because you're not joining every row to every other row, just the rows that match the index on one side to the rows that match the index on the other. And again, this is with an index on both tables.
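A sketch of the indexed version (the index names are my own):

```sql
CREATE INDEX sample1_id_idx ON sample1 (id);
CREATE INDEX sample2_id_idx ON sample2 (id);

EXPLAIN SELECT s1.name
FROM   sample1 s1 JOIN sample2 s2 ON s1.id = s2.id
WHERE  s1.id = 33;

-- Illustrative plan shape:
-- Nested Loop
--   ->  Index Scan using sample1_id_idx on sample1 s1  Index Cond: (id = 33)
--   ->  Index Scan using sample2_id_idx on sample2 s2  Index Cond: (id = 33)
-- The loop now probes indexes instead of rescanning whole tables.
```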
In some cases you'll have a sequential scan on one side and an index scan on the other. The point is: if you only have one index, you want it on the inner side — the bottom of the plan — rather than the top, though it's better still to have both. If you don't have an index on the bottom, the nested loop has to scan that whole table for every outer row, and you can imagine how slow that gets for large tables. OK, but that's about it for join methods.
So, what about query restrictions? This gets a little more into the statistics. Remember way back when we started, thirty minutes ago, I put some junk in a column — a couple hundred x's? Let's go back to that original table. We created it back here — so these are the x's here, and the x's in the second column here.
Now I say: OK, give me a join matching 'a' — tell me how many a's there are. This has to return no rows; remember, I put x's in that column, not a's. The optimizer actually knows there are no a's in there, but it sort of takes my word for it and says, OK, I'll assume maybe one row matches here — the statistics say essentially zero, but it rounds up to one. So even for non-indexed columns it knows how selective a value is, and that affects what it ends up doing in terms of join method. Here it picks a plan that assumes the result will be small. But when
I do this with the x's, it says, OK, I'm going to use a hash join, because I know I'm going to get a bunch of rows back here — and you can see the estimated row counts are pretty significant. So again: it knows a lot of stuff.
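The statistics driving these estimates can be inspected directly. The pg_stats system view is real; the table name below is assumed from the demo:

```sql
-- After ANALYZE, 'x' should show up among the most common values,
-- while 'a' appears nowhere -- so the planner estimates ~1 row for
-- 'a' and many rows for 'x', and picks the join method accordingly.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'sample1';
```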
The last thing I want to cover before questions is the LIMIT clause. Many of you have probably seen LIMIT — it originally came from MySQL, and it's very common now. It effectively allows you to tell the optimizer how many rows you want. Before the LIMIT clause, people would effectively use cursors, if you've ever used cursors. But a cursor effectively holds a query open without really telling the optimizer how many rows you're going to fetch in the future, whereas with a LIMIT clause you're telling the optimizer up front, at the time you issue the query: I only want X number of rows. So here's the same query again, no WHERE clause — a hash join, because it knows it's going to return a lot of rows, 260 in this case. However, when I place a
LIMIT on that query — say I only want one row — all of a sudden, like magic, it can use an index that exists on the table to pull rows out in order, and instead of a hash join, now I'm doing a nested loop. With LIMIT 10, it's still a nested loop — it still thinks it's faster to do a nested loop with the index, pulling rows in the ORDER BY order. But what if I say LIMIT 100? Now we're back to where we were before: LIMIT 1 was a nested loop, 10 was a nested loop, and at 100 we're back to the hash join. So again, it's another example of how the optimizer uses those numbers to adjust what it's doing, to figure out how to get you your answer as quickly as possible.
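The LIMIT behavior described above can be sketched as follows (illustrative; the exact crossover point depends on the cost estimates):

```sql
-- LIMIT 1: the planner can walk an index in ORDER BY order and
-- stop almost immediately -- typically a nested loop.
EXPLAIN SELECT s1.name
FROM   sample1 s1 JOIN sample2 s2 ON s1.id = s2.id
ORDER  BY s1.id
LIMIT  1;

-- LIMIT 100: fetching many rows through the index is no longer
-- cheapest, so the plan may revert to a hash join plus a sort.
EXPLAIN SELECT s1.name
FROM   sample1 s1 JOIN sample2 s2 ON s1.id = s2.id
ORDER  BY s1.id
LIMIT  100;
```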
So that finishes my talk — I have plenty of time for questions, in fact four minutes. Can we turn the lights on? OK, questions?

The question is about having trouble getting the optimizer to properly process partitioned tables — a parent table with children. There have been some improvements in recent versions of Postgres. The problem is that you're often querying just the parent table, but the parent itself has no data and no statistics; it's the children underneath that actually hold the data and the statistics. We've made some improvements in figuring out selectivity across the children, but I'd say we're still not all the way there — many things still need improvement. What version of Postgres was this? OK, so you already have everything we've released; more improvements for partitioning should land over the course of the year. Note that EXPLAIN does show you the filtering on each child. Also, the query you see up here uses a little function I wrote that runs EXPLAIN inside a function and filters the rows coming out — you might want to create a custom function like that yourself; you can grab the code from my site and filter through it.
Yes — optimizer hints. We've had a long discussion over the value of hints. Some vendors support them, some don't, and some have supported them and now wish they hadn't, because hints can be useful in cases where the optimizer just doesn't understand enough about the data model, but they can also be badly misused and actually distort things. So we've always had a debate about that. We do have some ability to tune the statistics on individual columns, based on selectivity. Normally, when you're looking at a query that's not running right, the problem isn't the query — putting a hint in the query isn't the fix — the problem is really that the optimizer doesn't have proper data in its statistics. So we're trying to move to a model where you give the optimizer more detailed information about specific columns, rather than planting hints all over the application. We're still in the process of working through that, and we're really hesitant to add hints in a way that would be prone to misuse before we get to that place.

Time for one more question. Can you tell from EXPLAIN which indexes were considered but not used? That's a great question — I don't think you can, and that's actually an interesting idea: it could perhaps give you a message saying, hey, I didn't use this index. But that's kind of an open question for another time.

How do stored procedures change what's going on? When a stored procedure runs queries, the full optimizer is in place — every query inside the procedure gets fully optimized exactly by this process. The issue is that when a stored procedure or function returns a result set, we have very poor statistics about what actually comes out of it. When you create the function, you can specify an estimate of how many rows will come out, and that can be useful to the optimizer if you're having trouble with a query that calls a function and isn't being optimized properly. I think there's still a lot more we can do there, but that's what we have now. So — thank you very much, I appreciate it.
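The row-estimate feature mentioned here is the ROWS clause of CREATE FUNCTION; a minimal sketch (the function name and body are hypothetical):

```sql
-- ROWS tells the planner to assume ~10 rows come back from this
-- set-returning function, instead of its default guess.
CREATE FUNCTION recent_classes() RETURNS SETOF name AS $$
    SELECT relname FROM pg_class LIMIT 10;
$$ LANGUAGE sql ROWS 10;

SELECT * FROM recent_classes();
```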