Explaining the Postgres Query Optimizer
Formal metadata
Title: Explaining the Postgres Query Optimizer
Series: PGCon 2014
Number of parts: 31
Author: Bruce Momjian
License: CC Attribution 3.0 Unported: You may use, adapt, copy, distribute, and transmit the work or content in changed or unchanged form for any legal purpose, as long as you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/19076 (DOI)
Language: English
Production place: Ottawa, Canada
PGCon 2014, 6 / 31
Transcript: English (automatically generated)
00:22
Okay, the last talk slot of the day and of the conference, very exciting. It looks like I have a little bit of text cut off there. I'm not sure what's causing that. Did other people have this problem?
00:40
Yes? It's normal. That's going to get really ugly, I'm afraid. I don't think I have anything really at the bottom, so hopefully it'll be fine. Noah wants to make an announcement before we start, something about a drawing of some type, so. So many of you may have had an opportunity to look at the Enterprise DB table at your registration
01:03
and notice the sign in the jar to enter yourself for a free virtual training class to be announced here. So a couple of you put your business cards in and I'll go ahead and pick one out.
01:20
And if you're not here right now, you'll get an email. The winner is Faisal Akbar of VMware. So is Faisal here anywhere? Okay, well, he'll find out another way. Thank you. Great, great. Thank you. So hi, everyone. My name is Bruce Momjian.
01:41
I work for Enterprise DB. I'm one of the core team members. I've been working on Postgres since 1996. I'm excited to be giving this talk. And frankly, I don't know about any of you, but this is, in terms of quality of talks, this PGCon has been the best PGCon that I can remember pretty much, I don't know, forever,
02:03
maybe for the last four or five years. So I don't know, the quality, there have been some previous years where there's been, like there was one talk that just like knocked me out of my seat. This one, it was like virtually every talk I went to was very high quality.
02:20
Very pointed at exactly what I was interested in. I'm not sure, I talked to some other people, they sort of felt the same way. So it'd be interesting for the organizers to see if we get that feedback. So please do give us feedback. There's some things that we can do to improve. I know we're all interested in finding that out. So today, what I want to talk about is effectively the Postgres query optimizer.
02:43
This is a very interesting topic to me. In fact, this is the reason I got involved with Postgres, because in 1996, I had been writing financial applications for seven years. And you know, after you write financial applications for about seven years, you're really not learning a whole lot anymore.
03:01
And as a professional, I felt like I wanted to learn more, I wanted to understand more about what I was doing. And because I was writing database applications, occasionally I would get to look inside of the database to see how was a query that I sent to the server actually executed. How was it making its decisions,
03:21
and how was it returning those results as efficiently as possible. And at that time, I was using Informix and Ingres, and I was not able to see really inside of how they made those decisions. I could kind of see a little bit of the plan that came out,
03:40
if you're familiar with those products or other products that generate a plan. But I couldn't actually look at the code or understand the process it was using to make those decisions. And one of the real attractions of Postgres, and one of the reasons I got involved in sort of starting it in Internet development in 1996, was because of that openness of code, because I could actually understand it.
04:01
Now, I'm not going to actually be talking about the internal code of what goes on in Postgres too much today, but what I am going to do is kind of walk you through basically how Postgres makes those decisions and give you a little more appreciation of what is going on inside that server. I know a lot of people are sort of keen on, you know,
04:24
non-relational systems these days and sort of looking at things that don't have an optimizer, but I think one of the things you're going to get out is understanding how much of that optimizer helps you as an application developer not have to worry about a lot of the intricacies that you normally would have had to deal with
04:43
before relational systems came around. So, you know, simply when you're writing an application, as you would be here, usually you have some kind of interface layer that sends the query across to the server and then sends the result back to the interface layer
05:00
and then up into the application. We're going to basically be focusing on what happens inside that server, that part to the right. So internally, Postgres effectively has sort of an internal flow that it goes through to figure out how to execute a query, and in fact the most important part of that flow
05:22
is what we call the optimizer or the planner. Now that's not the first section that actually occurs. When a query comes in, the first thing that happens, again, coming in from the top here, the first thing that happens is the statement's parsed. It's broken down into words, and then the words are then treated as tokens or lexemes,
05:42
which are then passed to something called a lexer, which effectively generates actions based on the pattern of tokens or keywords that have come into the parser. There's then a traffic cop in Postgres, which figures out if it's a utility command or not. If it's not a utility command, it goes into something called a rewriter,
06:02
which is where we handle things like views and rules. But down after the rewriter is that optimizer. Effectively, what it's doing, it's trying to figure out how can I get that user the results as quickly as possible, and then what comes out of that optimizer is something called a plan.
06:21
The plan effectively is sent to the executor, and that's the part that actually executes the query. So this is sort of the map maker, the one who lays out the plan of what we're going to do, and that's effectively passed into the executor, which is really just something that follows whatever plan came out of the optimizer.
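You can ask the server for that plan yourself with EXPLAIN. A minimal sketch, with an illustrative table and index name rather than anything from the talk:

    EXPLAIN SELECT * FROM customers WHERE province = 'MB';
    --                              QUERY PLAN
    -- ----------------------------------------------------------------
    --  Index Scan using customers_province_idx on customers  (cost=...)
    --    Index Cond: (province = 'MB'::text)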
06:42
So it's really just executing a recipe that's come out. So we're going to focus on that optimizer, what I think of as the brains of the system. What things does that optimizer have to do? What things does it have to decide? There's three major areas that it has to make decisions on. Number one, what is the scan method
07:02
that's going to be used to get at the data? And I'm going to go in specifics about what scan methods are available in Postgres. Secondly, what join methods are we going to use? We're going to show you specific examples with diagrams and show you exactly what join methods are there. And then finally, join order, that's sort of figuring out,
07:21
do I join table A to B and then C, or do I join table B and C and then A? You know what I'm saying? Different orders of bringing things together. And I'm going to show you specific examples here. Now, you might not be using Postgres, or you might be using Postgres plus some other relational systems. These concepts are very generic. So in most relational systems, you will see the same concepts.
07:44
The terms might be slightly different. There might be a couple other things that aren't there. But these are very generic relational concepts. So first thing, scan methods. There's basically three types of scan methods supported in Postgres.
08:02
And again, deciding which one to use is something I'm going to show you in a couple minutes. The first one is called, but let's first, before we do that, let's set it up. So first what we have to do is we have to create a little like temporary table so I can illustrate all the things I'm going to talk about.
08:20
So effectively, what I've done here is I've just run a little query that goes into the system catalogs and just pulls out some of the relation names in the system. And then what I did next is I took the relation names and kept just the first letter. I chopped off everything else
08:41
after the first letter and just got rid of it. Okay? And then I created a temporary table with these contents. So effectively what I did was I took the system table, took just the first letter of the relation name, put a bunch of junk at the end, and created a table called sample
09:00
and I put an index on it. That's all I did. Okay? Because I'm going to use this example for the next probably ten slides and this can kind of give you an idea of how this is going to work. Now if you want to run this yourself, the part in pink here is actually, this is actually the SQL. So if you want to download this, you can run it and you can actually see it running
09:20
on your terminal and you can actually see the exact same output. There are a bunch of slides on my website. In fact, I'm sorry I forgot to mention it. And the reason I didn't mention it now that I realized it is because it's chopped off at the bottom. But effectively if you go to this website right here and get rid of that rest of the part there, this right here,
09:41
if you go under presentations, this presentation is there and then there's 30 other presentations also there. They're in PDFs and you know the bottom isn't chopped off. So if you're missing the bottom there, you can definitely check that out on that website. So this is the temporary table we're going to use. And then we're going to also use a function
10:01
that I used a little bit to display things a little better. Again, this is also in the SQL file. So let's see. What does this look like? I created a temporary table and I made a little query and I said just tell me what this looks like, this temporary table. So effectively I have,
10:20
I think there's about 253 rows in this table. I can tell when I created the temp table. 199 of them have a P, just the letter P in it, and then another field which has a bunch of junk on it. I have nine of them which is an S and then eight of them which is a C
10:42
and then down here I have one row which is an I and one row which is a K. And the reason I created this is because it gives me a nice distribution, a variable kind of case, where I've got one letter which is really common in the system, a couple of letters which are somewhat common, and then I've got
11:01
a bunch of letters which are very rare. And this is the type of case that you have in relational systems. This is one of the areas the optimizer does very well: it can adjust things based on how selective your data is. And I'm going to go into how that happens in a minute. Okay? So what do I mean by selective?
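A minimal sketch of how a demo table like the one just described could be built (the column and index names are made up; the exact SQL is in the file on the speaker's site):

    CREATE TEMPORARY TABLE sample (letter, junk) AS
        SELECT substring(relname, 1, 1), repeat('x', 250)
        FROM   pg_class;

    CREATE INDEX i_sample ON sample (letter);

    -- Check the skew: one very common letter, a few rare ones.
    SELECT letter, count(*) FROM sample GROUP BY letter ORDER BY count(*) DESC;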
11:22
Okay? So if I just do this query right here and I say give me a P, okay, what it actually ends up doing is something called an index scan. That's my most common value, right?
11:42
So what if I do a D, which happens to be kind of in the middle here? I've got four of those. And that's also an index scan. And what if I do a K? Well, that's my rarest value, and it's doing the same thing. I got an index scan for all three values,
12:01
for my most popular value and my least popular value. They're all returning the same thing. Now why is that? Well, the reason is because I don't have any statistics on the table. As for what statistics are: we have a special thing called optimizer statistics, and the optimizer statistics tell the optimizer how common
12:23
certain letters are or certain values are within a table. So normally there's something called autovacuum which will run an analyze for you, but that runs like every minute, so I might not want to wait for the minute for it to run. And in this case I'm using temporary tables,
12:40
which autovacuum can't even see, so that doesn't help me. So I've got to run the analyze myself. And I just went and I ran that analyze right there at the top. And as soon as I run that analyze you see a change right away. That P which used to be an index scan is now a sequential scan, okay? And a sequential scan is effectively saying start
13:00
at the beginning of the table, read all the rows to the end of the table. Very straightforward, very just you know read everything. This is the easiest one to understand. Now why choose a sequential scan? That's because we showed up here that 78% of our rows were Ps.
13:21
So you know if 78% of your rows are Ps it doesn't make any sense to do all sorts of index look ups to somehow eliminate 22% of your rows. It doesn't make sense. So we're just going to read straight through the table. That's the way to go, okay?
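A sketch of that step with the illustrative names from above; pg_stats is where the numbers behind the 78% figure end up:

    ANALYZE sample;

    SELECT most_common_vals, most_common_freqs
    FROM   pg_stats
    WHERE  tablename = 'sample' AND attname = 'letter';

    EXPLAIN SELECT * FROM sample WHERE letter = 'p';
    --  Seq Scan on sample  (cost=...)
    --    Filter: (letter = 'p'::text)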
13:41
What happens if I do a D? Now the D I have four of those, okay? And the D ends up giving me something different. It gives me what's called a bitmap heap scan. Sounds kind of weird. But what it effectively is doing is it's going through the index and it's creating a bitmap that tells me where can I potentially find matching rows in the heap?
14:05
And I'm going to put a one every place I can find it and a zero any place I'm not going to find it. So effectively what I do is, instead of doing four accesses, I'm going to go through the index once. It's going to make me a little bitmap, and then I'm going to walk through the bitmap. Now this is probably overkill for this case, but again
14:21
because there were four it decided to create a bitmap and then use that to kind of aggregate all the lookups into the heap. If I do the K which there's only one of, I actually get an index scan, straight index scan with no bitmap because there's no need for a bitmap because it knows there's only one
14:41
or at least the statistics are telling it that there's only one match. It's going to go through the B tree. It's going to find the row. We're done, okay? So this is the first example showing how the optimizer statistics are effectively changing how we go at our data. All right? And again, as an application developer you don't have
15:02
to worry about oh am I querying something that's very common or am I not? And this happens all the time. If you know if I, if I'm you know in Ottawa and I'm querying all my customers in Ottawa, well I may probably have a lot of customers in Ottawa. So I'm probably going to just do a sequential scan
15:21
of all my customers. But if I have an index on the province and I try and, you know, I try and query somebody in Manitoba, which I actually know is a province in Canada, amazingly from my education years ago. Am I right? It is one of them, yes. Don't hear about it very much.
15:41
I'm not sure why. But if I wanted to see how many customers I have in Manitoba and I've got an index on province, odds are I'm going to use an index there because there's probably only a couple because I don't think there's that many people in Manitoba. Do we have anybody from Manitoba here? Somebody, I'm disappointed. Anyway, somebody can probably give me a good history lesson
16:02
afterward and tell me what is there, because I'm always fascinated by Canada. But anyway, no, I think it's really a really great country. So let's take a look at a better description of this, okay? So what I did was I just did some random letters, right? I just did three letters and I kind of showed you.
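A sketch of those three probes against the illustrative table above; the query shape is identical, only the statistics differ:

    EXPLAIN SELECT * FROM sample WHERE letter = 'p';  -- ~78% of rows: Seq Scan
    EXPLAIN SELECT * FROM sample WHERE letter = 'd';  -- a handful of rows: Bitmap Heap Scan
    EXPLAIN SELECT * FROM sample WHERE letter = 'k';  -- one row: plain Index Scan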
16:21
So what I did was I created a little kind of query here and I'm actually using a little function that I used and I'm getting this output. But if I kind of queried a little differently, again, all these queries are in that SQL file, I get this. And this to me is the quintessential slide
16:40
for the presentation because what it's illustrating is that as I go from the most common values at the top to the least common values at the bottom, the optimizer is automatically changing the type of access method it uses to get at the data automatically with me not having to do anything
17:01
at all to the query itself, okay? And this is all driven by the analyzed statistics that I created early on. Are there any questions so far? Yes, sir?
17:23
So the question is why it's actually doing a sequential scan not only for the 199 but all the way down to the R, which is number seven, and below. That is a very common question, because most people think that the index should be used if it's sort of selective.
17:46
So they'll basically say, well, you know, I understand if 78 percent of my rows are Ps, I probably don't want to use an index. But you know, if it's 30 percent, I think that index sounds pretty good. But then when you realize that the index has a root page
18:04
and then different sub-pages to get to the thing, and then you realize that you're going to have to randomly go through, you're going to have to randomly hit the heap pages. Now, of course, if you use a bitmap scan, you can kind of do them in order but you're going
18:20
to be hopping between. The bottom line is that random I/O is really slow and sequential I/O is really fast. Okay? So the sort of dirty secret is that an index almost has to be under five percent selective before it starts to be used.
18:40
And that's effectively designed because of the way that the costs are set up in Postgres and the type of the way we know we have to do multiple reads to get through that index to get to the row. Whereas when we do a sequential, we know we're just like not only are we reading like straight through but the kernel is prefetching ahead of us
19:01
because it knows we're going sequentially. So is five percent a good rule of thumb at any scale? Not really. I mean the number actually goes down as the table gets bigger but usually five, seven percent is about the number we're going to see. Now that's kind of breaks pretty close
19:22
because the five, I said there were 250 rows in the table so 253. So a five is what? That's two percent? Yeah. So that's actually breaking I would say much lower than that. Now the seven is probably around four percent, three
19:41
and a half, something like that. So again, I think the number's breaking even a little lower than five percent; I might even be generous. And it continues to surprise me how low that number really is. Now if you think I'm making that up, which you could, okay, well actually the way I'm going to do it is to fake it,
20:02
but I want to show you something. What if I tell it, what if I force it to do an index scan? I say I don't believe you optimizer. I am going to turn off sequential scan with that top line. I'm going to turn off bitmap scan with the next line. And I am going to force an index scan.
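A sketch of that override; enable_seqscan and enable_bitmapscan are real planner settings, and the table name is the illustrative one from above:

    SET enable_seqscan    = off;
    SET enable_bitmapscan = off;

    EXPLAIN SELECT * FROM sample WHERE letter = 'p';
    -- forced onto an Index Scan, with a noticeably higher estimated cost

    RESET enable_seqscan;
    RESET enable_bitmapscan;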
20:21
I'm sorry. I'm going to turn off bitmap scan and sequential scan. And I'm going to run that again. And what you see is that when you look at the cost, the cost down here was around eight, which this is the same because these were index scans before anyway. But then as you get up it goes to 12, it goes to 15,
20:42
it goes to 19. Ends up when you get up here you're at 39. The original output, which is this one, again starts at eight, goes to 11 but it really doesn't go any, it goes up to 13, okay. Now if you think the optimizer is doing something wrong you can turn it off using these commands at the top
21:03
and you can rerun your query and time it and tell us are we wrong or not, okay. In almost every case when somebody actually, we have an FAQ item that even talks about why is my index not being used. This is a very common question and it actually shows you these commands how to turn it
21:22
off and it says run it with it, run it without it, tell us what you get and the questions magically disappear. It's amazing. We put something in the FAQ and then within six months we don't get the questions anymore. Is the system perfect all the time? No. But overall the costing is pretty accurate
21:42
and we don't seem to have too much, it's almost counterintuitive how low that break point is and you can adjust it if you have SSDs, you can adjust the random page cost. So there are some things you might need to adjust if these numbers, if these estimates in fact are wrong. But in most cases it gets it pretty close
22:02
and I saw a question over here. Mr. Haber. You know that's what I was getting to, that I'm actually cheating and the cheating is that these numbers are actually the costs that the optimizer is giving us. These are not actual execution times.
22:21
So I could have expanded this and kind of done the full execution time on this. I didn't do that. Partially because the numbers would float around and then also partially because you start to get caching effects when you have multiple,
22:41
because when you run it once it's not in cache and when you run it again it is in cache. So the numbers have a tendency to float around and it's kind of hard to get really accurate numbers in that case. Yes sir. So do we find if the percentages increase if the table gets much wider?
23:00
In fact there's, what you can't see over here and maybe I do need to change my resolution, I don't know, what resolution is this thing? We don't know. Okay. Anyway, what's over here is actually a width estimate. So the optimizer knows how wide the row is, an estimate of how wide the row is
23:21
and it will actually adjust it based on that. So a shorter row will have a tendency to be looked up differently. Other questions? Yes sir. Oh ma'am. Sorry. Does the optimizer take into account a clustered index? Yes it does. When you issue the cluster command
23:42
the next analyze operation will actually sample the cluster ratio for the table, and because the clustering is not maintained, you have to re-cluster the table yourself. Effectively, every time an analyze is run there is a special index that is marked
24:02
as the clustered index in the system tables, and it will actually randomly sample and update the clustering ratio in the statistics, and that will be used to determine how localized a lookup is,
24:23
particularly if you're looking up like a P or a D, where you wonder: is it spread across four or five different places in the table, or is it maybe all in one page? So in fact not only do we record it but we actually keep it updated, and the ratio will decrease as the clustering gets dispersed within the table.
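A sketch of that with the illustrative table from earlier; pg_stats exposes the per-column correlation that ANALYZE maintains:

    CLUSTER sample USING i_sample;   -- physically reorder the heap to match the index
    ANALYZE sample;

    SELECT attname, correlation
    FROM   pg_stats
    WHERE  tablename = 'sample';
    -- correlation near 1.0 (or -1.0) means the heap order tracks the column order;
    -- it drifts back toward 0 as later changes disperse the rows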
24:45
Other questions? Okay. Oh I'm sorry. Yes sir. I was just going to say that I think the table is so compact that the choice of scan doesn't really matter for this case. Yeah this is a contri- Everything fits on one page.
25:01
He's saying it's kind of contrived and I will admit yes that's true. Normally I would have taken a huge data set and run a huge thing but then the problem is the numbers are so huge that you can't really conceptually get at it. Yeah but it's already in memory so it doesn't matter.
25:21
Yes. So the statistics have combinations of values
25:41
but if you have a multi-column index does it look at the combo of that? Unfortunately, well fortunately, it actually analyzes every column. So whether the column is in an index or not in an index, it's going to gather statistics on them all, because a lot of times you're going to say a equals one, b equals 12, and that b, although it may not be indexed,
26:02
could be very important, and we need to know how selective that is to determine how many rows are going to come out, which is actually part of the later part where we talk about how we join the data. So we analyze all of them. We don't treat any of the columns specially. One of the big flaws in the Postgres optimizer is the lack of information
26:21
about correlations between multiple columns. We've talked about that. It is on the to-do list. It's something that we have not really understood how to efficiently generate such correlation and use. That is one of the flaws I think and one of the areas we have to improve is the correlation between columns.
26:41
Sometimes if you say this is one and this is 12, that may be a very rare case, and we're going to look at each one individually, which is unfortunate. No. So the question is, do we get limited benefit on multi-column indexes? You still get a big benefit from them. It's just that the statistics are perhaps not as smart,
27:01
but you're still going to be able to look up and filter across multiple columns. In fact if you have an index on A, B, and C and you give us A and C we'll still use the index on A and C in some trickery that we have to kind of span gaps and I'm not even going to go into it but multi-column indexes are certainly very useful.
27:23
Not as useful for planning but it probably still will. And the other thing that you get into in this case is that none of these numbers are perfect. They're all fudgy numbers
27:41
and in fact even if I had the perfect number today right now, five minutes from now it would be wrong. So the point is that these estimates they don't have to be perfect. But they're just going to be good enough to pick a plan that's probably closest to the best plan. So that's probably good enough. Trying to get perfect numbers wouldn't
28:01
A be efficient and B wouldn't be probably very useful. That's correct. The more indexes you have the more possible combinations it has to look for. You're right on C.
28:29
You decide it's useless C. But what you do instead is you create independent indexes on A, B, and C and then you're able to actually use the index on A and the index on C at the same time.
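A minimal sketch of that, with made-up table and column names; when both conditions are selective enough the planner can AND the two bitmaps together at run time:

    CREATE INDEX t_a_idx ON t (a);
    CREATE INDEX t_c_idx ON t (c);

    EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 12;
    --  Bitmap Heap Scan on t
    --    Recheck Cond: ((a = 1) AND (c = 12))
    --    ->  BitmapAnd
    --          ->  Bitmap Index Scan on t_a_idx
    --          ->  Bitmap Index Scan on t_c_idx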
28:41
You can, like this is showing it combined together. You can combine them together. So that's where you would generally want to, that's what you'd want to generally do I think. I mean there are specialized cases where you will want an ABC index, uniqueness constraints and things like that. Yes it can be useful for those factors in bitmaps,
29:02
you know the bitmap itself, but at the same time if you're not sure exactly what the query pattern is going to look like, you're not sure that you're always going to be querying with all three columns, you ought to be querying the index independently. Okay, now I'm going to stop the discussion there because we do want to finish the material. So the second thing that we want to talk about
29:21
is what we call the join method. And again this is fairly common for most of the relational systems. There are actually four different join methods, or three and a half different join methods, you can use. The first one is nested loop, with two types of that. The second one is hash join
29:41
and the third one is merge join. I'm going to show you examples of that and diagrams of each of those. Okay, so we're going to have to create another sample table. This time I'm going to pull some rows from pg_proc that happen to be numbers, and I'm going to create a sample one which pulls rows from pg_proc and then I'm going to create another table called sample two
30:00
which pulls some OIDs from pg_class. Again, nothing super exciting. You can, you know, this is in that SQL file. I'm going to start out with no indexes and no statistics. Okay, so I'm starting off with absolutely the worst case that we could do, right? So if I do this query
30:21
with a where clause that's comparing sample one to a constant and I'm joining sample one to sample two I get what is called a nested loop with a sequential scan and a sequential scan. It looks just like this. It's as bad as it looks. It means compare every row to every other row. Now the only reason
30:41
this could be a win is if you have very small tables and there's no setup required. Okay, so very small joins potentially will use this because it's super fast. Alright, and it looks like that. What if I join, what if I restrict sample two to be greater
31:01
than some number, not equal, greater? I get what's called a hash join. Okay, and a hash join effectively says take one side and hash it and then do look ups from the other table into that hash. This is super super popular. It's sort of the vanilla bean ice cream
31:21
of databases. They love doing this. Okay, middle of the road kind of join, and as long as one of the tables fits in memory you're good. Okay, it looks like that. But what if I do a join with no restriction at all? Then I get something called a merge join.
31:41
And I have to sort in fact in this case, this is what a merge join looks like. It says take the first row of the first table compared to the first row of the second table. Then if that matches keep the first row of the first table compared to the second row. If that doesn't match then you can move down and compare the second row to the second row and then the second row to the third row. If that matches keep going
32:01
second row to fourth row. If that doesn't match go down to the third row of the first table. You got a match. What you're effectively doing is you're walking the two tables simultaneously. Okay, so you're kind of advancing the one table and then advancing the other table kind of moving down. One scan through each side of the table. This is great for large tables
32:21
because it doesn't have to fit in memory and you're not multiple scanning it as you would you know in a nested loop. Okay, so this is really really useful and again you see it for big results and it looks like that in pseudocode. Now what if I change the order
32:41
because before I did sample one join sample two. Now I'm doing sample two join sample one. Doesn't matter. Does not matter. In most cases the smaller relation, the most restricted relation, is on the outer side and then you just run your merge join the same
33:00
way. Now if I add statistics, okay, I'm going to expect to see a change, because now the optimizer has some clue of what's going on. It has some clue of how many rows are there. I think it already kind of knew how big the table was; it didn't know how many rows. And it knows how selective some of the constants
33:20
are. So for example here now this one used to be a merge join. This join remember with no where clause this was a merge join. Now it's a hash join because it realizes you know these tables aren't that big. It's going to fit in memory. Now I got statistics to prove it. I'm going to do a hash join. Okay. And if I do even if I do an outer join which
33:42
you think couldn't use a hash join we can still do something called a hash left join which is kind of weird. Not a lot of databases support it. We do. New and Postgres not in one. Okay. What about cross join? You've probably never seen that before. A cross join is basically saying join every row to every other row
34:02
and what possible optimizer join method could this use? It's got to be the nested loop because that's the one that compares everything to everything else. There's nothing else that could do a cross join. So you can even see when you do something that you would think would use a cross join. In fact it does a cross join.
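A sketch of the demo setup and the joins just described (table and column names are illustrative; with no indexes or statistics yet, these are roughly the plans discussed above):

    CREATE TABLE sample1 AS SELECT oid AS id FROM pg_proc;
    CREATE TABLE sample2 AS SELECT oid AS id FROM pg_class;

    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) WHERE id = 33;  -- nested loop
    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) WHERE id > 33;  -- hash join
    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id);                -- merge join, sorting both sides
    EXPLAIN SELECT * FROM sample1 CROSS JOIN sample2;                     -- nested loop, by necessity
    EXPLAIN SELECT * FROM sample1 LEFT JOIN sample2 USING (id);           -- can become a hash left join once statistics exist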
34:22
Now let's layer something else on top. Let's create an index on both fields. So now we've created an index on both tables. I'm sorry on both tables. Both tables. And again we're going to get some different decisions from the optimizer that we didn't get before. So here we have
34:40
the case where we're comparing it to a constant. And what now happens is we still have a nested loop but the nested loop is much faster because we're now able to look up the 33 with the index. So instead of having to compare every row to every other row now I can zoom right into each table
35:02
and say just give me the 33's I don't care about the other rows. And it looks exactly like this. So it basically uses an index to find all the 33's here and then it joins it using an index look up to all the 33's on the other end. So again a lot of people see index nested loop and they're like
35:22
well if you got an index there nested loop isn't necessarily bad. Because you're not joining every row to every other row. You're just joining every row that matches the index to every other row that matches the index. And again this is a double index scan. You can get cases where you have a sequential scan here and an index
35:40
scan here. The point is you want the index scan on the inner side, on the bottom. Whether it's on the top or not eh, it's better to have it but you don't have to have it. But again if you don't have an index at the bottom then you're going to be scanning every table, every row to find a match and that can get slow particularly for large tables.
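A sketch of the indexed version, continuing with the illustrative tables above:

    CREATE INDEX sample1_idx ON sample1 (id);
    CREATE INDEX sample2_idx ON sample2 (id);
    ANALYZE sample1;
    ANALYZE sample2;

    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) WHERE id = 33;
    --  Nested Loop
    --    ->  Index Scan using sample1_idx on sample1  (id = 33)
    --    ->  Index Scan using sample2_idx on sample2  (id = 33)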
36:03
Okay. That's what that looks like. So, um what about a query restriction? Now this gets a little more into statistics. Do you remember way back when we started 25, 30 minutes ago I put some junk in the column. Do you remember that?
36:21
And I put like 200 x's in the original table design I did. I'm back to this one, right where I created it here. Just back up here. So this is 250 x's here and the second column is 250 x's here. So if I go in and I say, okay,
36:41
um, tell me how many a's. Give me a join and this has to be all a's. Now as you remember I put x's in there. And the optimizer actually knows that there are no a's there. But you know it kind of takes my word for it and it says okay I'm
37:01
gonna assume there's one a matching here and I'm gonna assume there's one a matching there. Statistics tell me zero but I'll assume one. Okay. So it even knows for a non-index column how selective that is. And it does affect what it ends up
37:22
doing in terms of joins and in terms of index lookups. Okay. If I do the same thing but I ask for x's, that's completely different. Remember, the nested loop is used when it knows I'm not gonna get many rows out of that; it's gonna be small. Right. But when I'm doing this with the x's then it says okay,
37:41
I'm gonna do a hash join and I'm gonna pull up my pants. This is gonna be you know we're gonna get some rows here. And you can see the row counts are pretty significant. Okay. So again it knows a lot of stuff and it adjusts for you. Last thing I want to cover
38:00
before I take questions is the idea of a limit clause. Many of you have probably seen limit clause. It originally came from MySQL. Very common now. It effectively allows you to tell the optimizer how many rows you want. Now before the limit clause people would use they would effectively use cursors if you've ever used cursors before.
38:22
The reason we don't recommend cursors is because cursors effectively tell the query to run but they don't really tell the optimizer how many rows in the future you're gonna ask for. But when you use a limit clause you're telling the optimizer you know I want you to
38:41
run that query and I really am telling you at the time you get the query, not some later time, at the time you get the query, I only want x number of rows. So here we're running the same query again, no WHERE clause, hash join; it knows it's gonna return a lot of rows, 260 in this case. However, what if I say limit one?
39:01
I say run that query, but optimizer, I really only want one row from that. All of a sudden the optimizer can use an index that exists on the table, can pull things up, and instead of a hash join now I'm doing a nested loop. What if I ask for 10 rows? Still a nested loop. It still thinks
39:21
it's faster to do a nested loop with an index scan, pull those rows based on the order by. Okay, what if I say 100? 100, okay, now we're
39:41
back to a hash join; now I think I'm returning too many rows. So again, another example of how the optimizer is sort of using those smarts to adjust what it's doing, to figure out basically how to get the answers to you as quickly as possible. Okay, so that does finish my talk.
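A sketch of that progression with the illustrative tables above (the exact crossover point depends on the costs):

    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) ORDER BY id;            -- hash join: many rows expected
    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) ORDER BY id LIMIT 1;    -- nested loop, driven by the index
    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) ORDER BY id LIMIT 10;   -- still a nested loop
    EXPLAIN SELECT * FROM sample1 JOIN sample2 USING (id) ORDER BY id LIMIT 100;  -- back to the hash join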
40:01
I have plenty of time for questions, in fact 4 minutes. I'm going to turn the lights on. I wish there was like a dimmer I could use on this, but it's kind of scary. But I know, I know, I know, I feel like I've just woken everyone from a nap, you know, my kids get up, we got to go to school. I can take any questions people
40:21
have. Yes, sir. Stephen, we haven't
40:40
even repeated the question yet. So, okay, so the question is, and I'm sure the people in the back did not hear it, is that he's had trouble with explain, or getting the optimizer to properly process partition tables where you've got a parent and you've got children. Now there have been some improvements in recent
41:00
versions of Postgres, because the problem is that you often are querying just the parent table, and the parent table doesn't have any statistics on it, whereas the actual children are the ones that have the data and have the statistics. So we've made some improvements in that, in trying to figure out like how selective
41:21
it is, but I would say we're still not all the way there, and I think there's still a need for improvement in there. So if you're having trouble, I understand it has to do with the way that we, the way the statistics work. Now what version of Postgres was this? Okay, so you have everything we've released. I don't think 9.4 has any
41:40
improvements for partition tables, so not in the way you're talking about, I think. So that's what I'm trying to get at, to clarify the question: is the problem that you're having that the explain looks like a huge execution, but it's actually running a lot faster, but it means that
42:02
us doing performance tuning, where we're trying to use explain repeatedly to get... Oh yeah, the explain is gonna, it shows you all the filters, yeah, yeah.
42:20
Actually, the query that you see up here, that little function I did, actually does explain in a function and then filters the rows coming out. So you might want to, like, create a custom function on your site where you actually run explain in a function, and again
42:41
you can grab the code and then you can filter through it. Mr. Hip? Yes, sure. Right, so we have had a long discussion over
43:02
the value of hints. Some vendors support them, some vendors don't, some vendors have supported them and then don't want to support them anymore. Because they can be used very efficiently in cases where the optimizer just doesn't understand enough about the data model, but there also are cases
43:21
where it can be very misused and actually contort the thing. So we've always had a debate about that. We do have some ability to mark columns in a table based on their selectivity. So normally, when you're looking at a query that's not running right, the problem is not the query,
43:41
so putting the hint in the query is not the problem. The problem is really that the optimizer doesn't have proper data in the statistics. So we're trying to move to a model where we basically are giving that detailed information that the analyzer can't pick up on the specific columns, versus
44:01
spreading them out all through the application. And I think we're still in the process of working through that, and we're really hesitant to add hints in a way that would be perhaps misused before we get that in place, I think, is the short answer. I probably have room for one more question, somebody who hasn't asked. Yes? Yeah.
44:24
So, when can you tell from the explain whether it's using statistics or not? That's a great question. I don't think you can, and that's actually an interesting concept, that you would perhaps give a message to say, hey, I don't have any.
44:44
Yeah, but that's, yeah, that's not, that's kind of goofy. Let me take the one question up top there, thank you. So how does doing a stored procedure that runs a function, so how does it
45:00
change what's going on? Effectively, when you're in a stored procedure and you run a query, are you worried about the statistics of the rows returned from the function? Yeah, yeah. So the procedure itself, when it's
45:22
running queries, the full optimizer is in place, so every query that's in that stored procedure is getting fully optimized, exactly with this process. If that stored procedure returns a result, there are very poor statistics
45:40
of what is actually returned from that function that's given to us as an optimizer. When you create the function you can actually specify an estimate of how many rows are going to come out of it, and that actually can be useful for the optimizer if you're having trouble with a query that has a function that isn't
46:02
being optimized properly. I think there's still a lot more we can do, but that's what we have now. So I want to thank you very much. I am unfortunately out of time. I appreciate it. Thank you for coming.
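A sketch of the ROWS estimate mentioned in that last answer; the function name and body are made up:

    -- Tell the planner to expect about 10 rows from this set-returning function.
    CREATE FUNCTION recent_ids() RETURNS SETOF integer
        LANGUAGE sql STABLE
        ROWS 10
    AS $$ SELECT id::integer FROM sample1 WHERE id > 33 $$;

    EXPLAIN SELECT * FROM recent_ids();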