Voila, Indexes! A Look at Some Simple Preventative Magick
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Part Number | 87 | |
Number of Parts | 94 | |
Author | ||
License | CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/30716 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
RailsConf 201587 / 94
1
4
7
8
9
10
11
13
14
16
17
19
21
24
25
29
30
33
34
35
36
37
39
40
42
47
48
49
50
51
53
54
55
58
59
61
62
64
65
66
67
68
70
71
77
79
81
82
85
86
88
92
94
00:00
PlanningSubject indexingQuery language2 (number)Human migrationMultiplication signEntire functionTable (information)Key (cryptography)Row (database)DatabaseStructural loadMatching (graph theory)Programmer (hardware)AlgorithmQuicksortInformationBasis <Mathematik>MaizeBlogBit rateMoment (mathematics)Sound effectEstimatorWordMereologyInternetworkingMathematicsGreatest elementAuthorizationControl flowGroup actionFunction (mathematics)Disk read-and-write headChainElectronic mailing listBitProduct (business)Library (computing)Spherical capVideo GenieFigurate numberResultantType theoryPort scannerArithmetic meanNumber8 (number)Marginal distributionGeometryRule of inferenceCASE <Informatik>Direction (geometry)Point (geometry)Stability theorySocial classSlide ruleState of matterOnline helpSystem callOrder (biology)WaveDifferent (Kate Ryan album)Inheritance (object-oriented programming)MultiplicationWater vaporAreaAnalytic continuationOperator (mathematics)Hill differential equationDecision theoryProcess (computing)Buffer solutionField (computer science)Goodness of fitTrailComputer clusterPower (physics)Musical ensembleSimilarity (geometry)RippingPlastikkarteFreewareCrash (computing)Condition numberRight angleSampling (statistics)Wave packetParameter (computer programming)Content (media)FinitismusBuildingPhase transitionSoftware testingComputer configurationTerm (mathematics)RankingKeyboard shortcutCartesian coordinate systemCentralizer and normalizerExecution unitLine (geometry)1 (number)Endliche ModelltheorieFormal languageMaterialization (paranormal)CircleLibrary catalog3 (number)WindowWeb pagePhysical systemSoftware developerOpen setFingerprintAngleAlphabet (computer science)Graph coloringThumbnailCountingWeightComputer fileVideo game consoleBoom (sailing)Discrete groupPointer (computer programming)Single-precision floating-point formatMiniDiscReading (process)MultilaterationComputer animationLecture/Conference
Transcript: English(auto-generated)
00:12
I'm going to tell you a story today about a wizard named Basil Smock-Whitner. He styles himself a gentleman as well, but his manservant kind
00:21
of takes issue sometimes, a little quirky. This is Basil on the left. Ignore the similarity to Mark Twain. The artist has issues. But that's Fabian there on the left. They're currently vacationing in the Sahara, trekking across the beautiful, rippling golden sands.
00:42
Over dune after dune, Basil's having a blast. Fabian's not so happy about being the one to carry the luggage. But when you're the manservant, that's kind of how it works out. Now, you know, they're doing all the touristy things. They're there to see Basil's old school friend Ptolemy. And so, you know, they're spending some time doing the stuff. They're doing the things tourists do.
01:00
They go to the pyramids and get their portraits painted. They've visited all the good restaurants. They've sampled the local cuisine, visited all the flea markets, found all the touristy baubles. And somewhere along the way, Basil discovers an old brass lamp, oil lamp. You know the type, right? Super tarnished, really ugly, battered, and beat up. But you buff it up a little bit, and it brings out a nice shine.
01:23
Not only that, but it brings out a nice genie. Now, that's a little bit cliche, but a cliche that offers you three wishes. Like, who's going to turn that down, right? Basil certainly wasn't. But when you're a wizard, you've got a lot of power. Like, what is a wizard going to wish for? That he can't already conjure.
01:41
So, you know, Basil thinks about this, and it's a tough decision. He spends a lot of time. He finally remembers, though, that when he was a kid, before he started dabbling in all this magic stuff, he realized that there was a time when he was passionate about librarians. He knew all of the great librarians by name. In fact, he had all of the trading cards. His Eratosthenes card was signed.
02:02
He was that hardcore. Now, the problem is that back then, there weren't a lot of libraries. There were few and far between. And so, if you want to be a librarian, what are your options? You've got to wait for a librarian to die. Or, you know, you could, you know, off them yourself to make room.
02:21
But you've got to be pretty bloodthirsty. I mean, honestly, let's face it, librarians are a cutthroat bunch. That's kind of the way it works. But Basil wasn't either patient enough to wait for one to die, nor bloodthirsty enough to kill one. And so reluctantly, reluctantly, he followed his father's advice. Traded in his dream and his collection
02:40
of cards for a career in magic. But now it all comes rushing back as he's staring this genie in the eye. And he realizes that the thing he wants more than anything else is a library. So he wishes for one. And not only that, I mean, being a wizard, he thinks big. So he's like, I don't just want any library. I want the biggest, the most magnificent,
03:04
like world wonder library that there is. It's going to have more content than anyone else's. Eratosthenes, his idol from childhood, no, Eratosthenes is going to envy this thing. He's serious. And so to make sure he has more content than any other library, he stipulates a very specific condition to the genie.
03:23
He says, genie, I want this library, but for every minute that the sun is up, I want one new scroll to appear in the library. So every minute during the day, a new scroll, and the genie's like, all right, whatever, snaps his fingers, and poof, there's a library.
03:40
Now Basil is stoked. He is so excited about this. Fabian has never seen him like this before. Basil is seriously jumping up and down and running in circles. He's so excited. He runs inside the library and immediately begins writing out a schema to describe how the data is all going to relate, like how the scrolls are going to relate to each other. And he's really excited.
04:01
And now because he's excited and in a hurry, he leaves out a few, you know, minor details. We've all done it, right? Things that, you know, we can take care of this later. It's not a big deal. But he rushes through it, gets his schema ready, and this is what he comes up with. You know, he's got a bunch of tables up there. He's got languages and nations, you know, people,
04:21
scrolls, obviously, and how they relate to subjects and then the materials for each scroll. He's really serious about this. He's obviously spent a lot of time thinking about libraries. But he's left something off. You caught it? No indexes. But who's going to need them, right? Not a big deal. In this little, you know, world of imagination here,
04:42
he can go and grab a scroll really easy off his shelf. It's not a big deal. Now, how many of us have ever shipped a production table without putting indexes on it? Yeah, I mean, we do it. We're not proud of it, but it happens. How many of you have lived to tell the tale? I mean, obviously, if you're here. Those of you who aren't here obviously did not live
05:02
to tell the tale. You are the fortunate ones. So Basil, to his chagrin, is going to learn soon why you don't do this. Because, though it starts small, the first couple of minutes, it's a few scrolls, big deal. But by lunchtime, he's got more than 200 scrolls on his shelf.
05:22
And by the time the sun sets that night, he's got more than 700 scrolls. Seven hundred scrolls? I mean, that's a lot of scrolls, let's face it. And the only way to satisfy the requests that are coming in, because people have already heard about this world-renowned library, even in the first few minutes. They're coming in and getting in line and saying,
05:40
I want this scroll. I want Aristotle. I want Plato, right? I want, I want, I want, I want. And so Basil and Fabian both are starting to have to answer these queries, and the only way to do it is to go scroll by scroll by scroll. Is it this one? No. Is it this one? No. Not a good situation to be in, because by the end of the next day, there's more than 1,500 scrolls.
06:00
By the end of the week, we're talking five, 6,000 scrolls on the shelves. And he has to scan the entire thing. Now, this is exactly what happens in our databases when we are foolish enough to leave off an index. Our poor, beleaguered database has to go row by row. When we say, give me any scroll by Aristotle, that poor database has to go, okay, is it this row?
06:23
Is it this row? Is it this row? Is it this row? Which is not a big deal when you're in development and you have 10 rows in your database. And when you first deploy your database to production before your users get their grubby little fingers all over your clean, pristine data, things run really fast, too. But as soon as those users are in, man,
06:41
things go south in a hurry. As you all, I'm sure, know, you're sitting there one day, doing just fine, and you start getting alerts that your system is slowing down. And you look at it, and you're like, your database is like working really hard. What in the world is going on? And people start to panic, and the requests come in, and you're working your brains out,
07:00
and it's just not enough. And pretty soon, the whole system goes down in a blaze of panic and flame, right? Ugly situation. This is the case here. It was just too big of a job for Basil and Fabian to handle alone. They couldn't do it. They couldn't keep up. Patrons were queuing up in line. They were starting to get restless. I mean, you think librarians are cutthroat? They're cutthroat for a reason, because their users are merciless.
07:23
These patrons are lining up. The line goes out the door, down the street, and over the next two sand dunes. And no one wants to wait in a line that long, at least of all the patrons. And so they've had enough. They torched the library. Basil and Fabian barely escape with their lives.
07:40
Now, you have to give it to Basil. He, you can't keep that guy down, especially when a genie owes him two more wishes. So singed and a little fire-shy, he's like, okay, what did we do wrong? Well, obviously, we forgot indexes, right? If we'd had indexes, it would have all been okay, because we would have been able to look up stuff. And so Basil, he's taking no chances.
08:01
He indexes all the things, right? If no indexes are bad, all the indexes have got to be better. So, I mean, he's got indexes on, like, the ink color, and he's got indexes on the word count of the scroll. He's taking no chances. He's like, this is going to work.
08:21
So eagerly, he jumps into his new library. Now, how many of you remember card catalogs? Legacy of other days. I'm glad to see so many hands going up. It makes me feel less old. Basically, these card catalogs were these great, big, enormous cabinets with these little tiny drawers filled with thousands and thousands of cards.
08:40
You'd go to the author cabinet, find, if you're looking for Aristotle, you'd pull open the A drawer, and you'd thumb through until you find A, Aristotle, in alphabetical order in the cards. And then it would tell you all the different books that Aristotle had written that existed in your collection. Pretty straightforward, right? Same for title. If you wanted to find a book by title, you'd go to the title collection. For subject, you'd go to the subject collection.
09:02
Really not much to it, but this is exactly what an index is. It's a sorted list of data that points to where something exists. That's all an index is. And so Basil's indexes were implemented as these card catalogs, and they were enormous because remember, he had to account for a new scroll coming in every minute, and he was planning to be in it for the long run.
09:21
Here's an example of what Basil's title index might look like. When a user came in, and, you know, they come in and say, you know, I'm looking to kill Medusa. I need something that lets me kill Medusa in 21 steps. And so Basil's like, okay, I have one right here, and it's over here on the shelf. And he pulls it off, and the query is answered in seconds. Super easy.
09:40
In fact, Basil and Fabian together, they have lots of downtime in the mornings, especially when things are slow. Scrolls come in, you know, they have to fill out a few dozen slips to file in all the different cabinets. But when it's slow, it's not a big deal. They're able to answer these queries so fast, Basil's like, yeah, totally nailed it this time. And he has this spare wish in the background. He's like, I'll have to think of what to do with that one later.
10:03
Sadly, load is not always constant. In the mornings, it's really slow. When people come in for lunch, though, people want to check out a book, do some light reading during lunch. After work, huge spike because people come in by the droves trying to get something to read for the evening. And when you've got this huge rush that lasts for an hour,
10:21
and you're working hard to answer those, I mean, even when it's a fast query, it still takes time. And people are lining up. They don't have time to deal with these scrolls that are coming in. That's lower priority. We're serving requests right now. And so these scrolls are piling up. At the end of an hour, there's 60 scrolls that have piled up, and each one has dozens of slips that need to be filled out for it.
10:41
So suddenly, they finish that rush. Now they've got this huge backlog of work, and they're working through it. Patrons are still coming in at a lower rate, but those patrons are now having to wait while things are indexed, and it's just not pretty. On top of that, patrons do ruin scrolls. I mean, sometimes they'll fall asleep in a carol and drool all over a scroll, and the ink runs, and Basil has to throw the scroll out,
11:01
and it's really tragic, and he's upset about it. But that's not even the worst part. When you ruin a scroll, you have to go through to all those indexes and pull the card, rip it up, and throw it away. It's a lot of work. And on top of that, the worst thing of all, and this drove Fabian nuts, how do you know what index is to use? Oh, my gosh. I mean, something like, I need a book named X. Well, okay, that's easy.
11:22
You go to the title index, and you've solved it. But what if someone comes up and says, you know, I'd like a book by a Greek author about linguistics published more than 200 years ago in blue ink on yellow parchment. And Fabian would just get the deer-in-the-headlights look. He's like, I have no idea where to even begin.
11:40
And Basil would like, it's okay, you go handle the other ones. I'll take care of this one. He's chugging through it, trying to find the indexes. That's a lot of indexes to choose from. You know, they did their best. Things were backing up. Basil did his best to work through the backlog. He summoned a small army of homunculi, these cute little things, you know, up to about your knee that would work like mad and could get a lot of stuff done, but holy cow, they have an appetite.
12:02
And they were eating Basil out of house and home. They were breaking the bank. He finally had to dismiss them before they totally ruined everything. But that bought them a little bit of time, but it was just a Band-Aid. In the end, they did not have enough power to process everything. Patrons were lining up. It went, again, down the road, up the hill, over the dunes, and again, they torched the library.
12:23
Now, Basil's getting a little despairing at this point. You can't blame him, right? What am I doing wrong? Now, Fabian's not despairing so much. He's used to Basil's escapades. Like, this is par for the course, pretty much. Being almost burned to death twice in a row, it's kind of a bummer. But he's like, you know, whatever, Basil, you figure it out
12:41
and let me know what we're doing next. But Basil's racking his brain. Okay, no indexes is bad. All the indexes is bad. What do we do? I mean, you think about it. If this is your database and you have indexes on everything, your database has to work really hard, too, right? Every time something comes in, your database has to go through and file something in every single one
13:02
of those indexes that correspond to it. If there's a deletion, it has to go through and pull them out. If there's an update, if the title of a scroll changes, your database has to go in and update every index that is affected by that. And every bit, I mean, work is a finite resource, right? We all know that.
13:20
Every bit of work that your database is doing not serving queries is work that it can't spend serving queries. That sounds obvious, but it's an important point. The more work your database has to do that's not related to serving information, the less work it can do serving information. And Basil learns this to his chagrin.
13:41
Now, as he's staring at the pile of ashes that was once his pride and joy, he realizes, okay, I have this list, like about four days' worth, you know, two days here, two days there, four days' worth of queries from previous patrons that he can analyze to figure out, you know, maybe all the indexes was a bad idea.
14:01
Let's index just the things we need. So he's like, okay, I can do that by looking through this data. But this is a small snapshot of data. Like, he didn't have the libraries open for very long before everything came crashing down. How do you know the window, the snapshot you have of your queries is sufficient to say what indexes you really need?
14:20
Who's to say that the next day a patron wouldn't have come in and queried something totally that wasn't in here? I mean, obviously, you've got, let's look up by title, let's look up by author, let's look up by nationality, you know, these different things. But what if there's something else? How do you know? That is the crux of the problem.
14:40
How do you know what indexes your application needs? It's not this magic wand kind of a problem. Basil had to rack his brain. And it turns out that the key is a foreign one. No, that's bad. The important thing here are foreign keys.
15:00
That's better. Foreign keys, what are foreign keys? Well, a foreign key is a column, is a field in a database that refers to, or field in a table, excuse me, that refers to a record in another table. In this case, we have scrolls that have an author ID, and we have people that are referenced by this ID.
15:20
So the scroll has an author, points to the other table, like foreign, right, external, outside of that table, foreign key. Foreign keys are a great place to start with your indexes. Now, this makes sense when you think of it in terms of Rails, because with Rails, belongs to and has many correspond to these foreign keys. You put belongs to on the child table, the one that has the foreign key in it, and you put has many
15:42
on the parent table, the one that is referenced by it. So a scroll belongs to an author, and a person has many scroll. Easy reciprocal relationship, and bingo, there's a foreign key for you. And you can just look at all of your belongs to and all of your has many queries and be able to infer a bunch
16:00
of your foreign keys. And that is a great place to start for your indexes. But should you index every foreign key? Well, it depends. Let's look at this example, where we have a person and a nation, two models, right? A person belongs to a nation, and a nation has many people. Easy enough. Person dot nation, that's just,
16:21
that's not really using a foreign key, right? It's taking the value in the nation ID and then doing a look up by ID in the nation's table. And since we get primary key indexes for free, right, indexes on the ID, that's not really helping us a whole lot. We've already got that one down. But the other one, if you're saying nation dot people,
16:42
give me all of the people in Germany. Give me all of the people in China. Yeah, you know, that's going to go through and that's going to look at the nation ID column in the people table. But is that a useful query? What are you going to do with millions of rows from a single query? You're going to crash is what you're going to do. That's not a very helpful thing.
17:02
And so that may not be the kind of query you want to do. Maybe you don't need to index nation ID in that case. Maybe. Okay? We'll see in a bit how maybe that's not the case. But consider the direction of your queries in order to determine which foreign keys you really need to index. Because you might not need to index them.
17:21
Now here's another one that Rails does for you. It has many through. Also has and belongs to many, right? This is the type of query that gives you two models with a join table in the middle. And it lets you do, you know, the many to many kind of relationships where a subject has many scrolls and a scroll has many subjects. All through that intermediate join table.
17:41
Now you look at this and you're saying, okay, so where is the foreign key in this one? With belongs to and with has many, the foreign key existed on the child table. But where's the child table in this case, right? It's not scroll and it's not subject. It's that join table in the middle. It has both of the foreign keys on it. It has the scroll ID and it has the subject ID on it.
18:02
All right, so you index both. Again, that depends. Let's take a look. A scroll has many classifications. That's that central table that connects subjects to scrolls. A scroll has many classifications. And it has many subjects through that classifications table. If you were going to say scroll.subjects, it's going
18:22
to go through that classifications table and find every classification where the scroll ID matches the ID of the scroll. That foreign key, that scroll ID is the one you'd want to index for a query going this direction. Now go in the other direction, it's the other one, right? This time we have a scroll with many class, or subject with many classifications and many scrolls
18:42
through that classifications table. And so if you're going to find classifications by subject, now you're looking at that subject ID column. That's the foreign key you're going to index that time. Now in Basil's case, he's looking at it. He's like, yeah, people are going to want to know what subject the scroll has, and people are going to want to know which scrolls belong to a particular subject.
19:01
So I'll index both of them. Totally reasonable, right? Fabian now, he thinks he's caught his master out. This never happens, so Fabian's pretty chuffed about it. He's thinking, okay, I finally have one upped this guy. He says, master, you know, you say that we need to cut
19:20
down the number of indexes, and then you go ahead and you create two indexes. Can't you just do one with two column? Isn't that the same thing? Can't you just put an index on scroll ID and subject ID and get the same result? And here we have to disappoint Fabian because, no, that's not the same thing at all. Remember, an index is a sorted list of data.
19:41
When you have more than one column, that data is sorted first by the first column, then by the second column, and then the third column, and the fourth, and however many columns you have in your index. So in this case, we had scroll ID and subject ID. You could totally look something up by scroll ID here, right? Because it's sorted. You could go in and say, okay, scroll number 1,027, bam, you can find that in a hurry, and you can find exactly
20:01
which subjects correspond to it. But if some patron were to come along and say, okay, I need everything corresponding to subject number seven, that second column is not sorted by subject ID. And so it's not going to help you one bit. You're going to have to resort to a full table scan to answer that without a specific column. So, sorry, Fabian, you have to index both of them
20:23
if you want the two-way query support. So Fabian's like, all right, then, well, what is the point of a multi-column index, right? If it's not saving you anything, why use it? I'm so glad you asked, Fabian, because that's the very next slide. Let's say, remember I said people dot nation was a bad
20:40
idea because it could return, or nation dot people was a bad idea because it could return millions and millions of rows? Well, you combine that index with another column and suddenly it becomes much more useful. The second column acts to refine the query. So you say nation first, it's going to be sorted by nation. You say, okay, I want to find all the Greek authors named Aristotle. Great, the database says.
21:01
It goes out and it says, okay, here we go, Greece, Greece, Greece, there it is, Greece, Aristotle, Aristotle, Aristotle, and it comes back with however many happen to be in there, and it's a very fast query. So that index that maybe wasn't useful by itself, now suddenly it's useful when we combine it with another column. Does the order matter?
21:20
Does it matter that we put nation first and first name second? Not particularly, but if we were to swap those, we might be able to get two useful indexes for the price of one. Because if we swap it, now we can answer questions like, I need all of the people named Plato, and that's Plato, the philosopher, and not the toy, right?
21:44
We can find all of them just like that, regardless of where they're from. Or we can say, I need all the people named Plato from Greece, and we can answer that query, too. So with a little bit of thought, you can engineer your indexes to satisfy multiple types of queries.
22:00
This kind of multi-column thing is especially useful when you're dealing with data that is continuous, as opposed to discrete. Discrete like IDs, one, two, three, four, five, right? Versus continuous, like a time value. You're not going to go in and say, okay, if we had a published at column on our scroll, you're never going to have someone come in and say, I need every scroll that was published on the 23rd
22:22
of April, 38 BC, at 942 AM. You're never going to have that, because it's too specific, and it's not the kind of information people keep in their heads about the kind of scroll they're interested in. However, you add author ID to that, sort things by author ID first, and then publish that, then you can answer queries like, I need every scroll published by, every scroll by this author
22:42
in order of publication date, and your index will give that to you, because your indexes are just sorted lists. In fact, when it uses an index like that, many times, I think it will even return the data in that order, whether or not you provide an order by, which is a nice, interesting side effect.
23:02
Another example of this, here we have a little query that's going to say, give me the 10 most recently added scrolls, the scrolls that were added most recently to the collection. Notice there's no criteria here. There's no where clause. We're not saying, give me all the scrolls where this is the case. We're just saying, give me the 10 most recent scrolls.
23:21
Now, without an index on added on, what is your database going to have to do? It's going to, without that index, it's going to have to basically reorder the entire table just so it can return the 10 rows that correspond to the most recent. Not a happy thing to do when you've got a million rows in your database, but if you add an index on that,
23:41
suddenly that query becomes trivial, because all the database has to do, in this case, we're sorting the sending, right? So all the database has to do is go to the very end of the index and return the 10 rows at the bottom, and you've got the 10 most recently added scrolls, just like that. So please, don't do like Basil did at the first, right, and forget your indexes, because indexes on sorted,
24:03
on sort column are as important as pretty much anything, too. Make sure that if you're doing sorts in the database, that you're indexing the column. So Fabian's like, okay, one more question, and Basil's like, dang it, we're so close, stop interrupting me. Fabian's like, well, but how do you know
24:21
that these indexes are actually going to be used? How can you know that the queries you're going to be getting are actually going to be satisfied by these indexes? Well, that's a head scratcher. Basil thinks about that for a minute, and he's like, you know, let me check my spell book. And so he flips through it a little bit, looks at the index in the back,
24:42
and he stumbles upon one of a special little incantation called explain. And explain is an incantation everyone in this room can and should learn. It's really not hard to master. Let's take a look at how it works. Here we've got a query that says, give me all of the scrolls by Greek authors, right?
25:00
It joins on the author and the nation table and finds all the nations named Greece and then returns all the matching scrolls. Not too bad, especially if you have an index, right? But now let's say we want to see what indexes, in fact how the database in general is going to satisfy this query. You ready for this? Boom. You just put dot explain on the end of it,
25:22
and Rails will automatically run the explain on it. Now, a little side effect of this, Rails will actually execute the query first and then return the explain, which I don't entirely understand, but I do understand that there's a rationale behind it. If that's a problem, like if your query is one that's going to take like three hours to run, then you might want to go directly to a database console and key
25:43
in the explain yourself. But this is a quick way to get familiar with how the explain works. If we were to run this explain without indexes, it would spit out something like this. Now, this is really arcane output. Very typical of the kind of thing Bazel breathed.
26:03
And this is MySQL output, MySQL output as well. It's going to be different depending on the database you're using. So let's talk through this quickly. On the left, well, first you see there's three rows in the output. I've kind of had to break the table in half because it was so long, so bear with me on that. The top is the left side of the table,
26:22
and the bottom is the right side of the table. So scrolls, the database is saying first thing I'm going to do is I'm going to look at the scroll table. And under the type column, you see that word all in all caps? That is bad news. That is MySQL saying I have no other recourse but to scan the entire table row by row. A bunch of nulls after that, we'll talk about those
26:42
in a minute, but it's saying under the rows column on the bottom there, it's saying I expect we're going to look at about 43,000 rows. But that number could be much, much higher if that's the number of rows in your table, right? If you've got 4 million scrolls in your database, that's going to mean that it's going to say, yeah, I estimate we're going to look at about 4 million rows. Not good.
27:00
So next to the database it says, okay, for each scroll that we see, we're next going to look at the people table. And this time I have a key to look at, because the primary key index is given for you for free. So it says, okay, I can use that one. I'm going to use the scrolls.author reference for foreign key, and I estimate that there's going to be one matching call, one matching record, which you'd expect because each scroll has one author.
27:21
And then it says, from that, then we're going to join on the nation's table. Again, we can use the primary key, and we're going to look at the people.nation ID and compare that with the nation ID on the nation's table. And I estimate there's going to be one row there. So for every single row in the scrolls column, the database is saying, okay, get the person.
27:40
Get the nation. Is the nation Greece? Nope. Throw it out. Let's do it again. Scroll, people, nation, yep, match, keep it over and over and again for the entire scrolls table. And that is what kills performance. So once we add indexes, though, suddenly the query plan changes. Notice it's not even starting with scrolls anymore. Now the database is saying, aha, if I start with the nation's table, this is going
28:02
to be a lot more efficient. We're going to, now it's suddenly saying, oh, there's possible keys to use here. I have a primary key, and I have this nation's name key. And I think we'll go with the nation's name key because that lets me use the name column. It estimates that there will be one matching row,
28:21
and it's using where and using index. Using index is a lovely sound because that means that the database is going to be looking at the index to solve it, and it can actually use just the index. It doesn't even have to go to the data on disk. Using where says it's actually comparing the information in the where clause to the index in order to come up with it.
28:42
So it finds every nation named Greece, and then for every nation it finds there, it's going to use the people table. And here it's using the people nation index because now it's going in the other direction, right? It can't use, it's not using the primary keys. Now it's looking up people by nation ID. And again, it says, I estimate we're going to find about one.
29:01
Sometimes it's just an estimate too. Using where, using index. And then finally scrolls, and we'll look up scrolls by author ID, and the result comes out, hey presto, I'm thinking about 107 rows we're going to have to look at to get this. And that is a whole lot better than 40,000.
29:20
So see how explain can help you answer these questions like why is this query going so slow? You can look at it and see, oh heck, it's doing a full table scan on this table that used to have just a few but now has thousands and thousands of rows. A useful little spell to know. So this is Basil's new schema.
29:42
He's ripped out a bunch, you know, dozens of indexes that he doesn't need. Significantly slimming down his schema, and he's ready to try again. He summons the genie, makes his wish, hey presto, there's a brand new library, and he gets to work.
30:02
Scrolls start coming in, one a minute. He's filling out just a few indexes now for each one. He and Fabian are answering the queries. A few hours go by. They're biting their nails, but it seems to be working. They're actually staying on top of it this time. The queries come in. They're handling it just fine, and every once in a while you get the weirdo that's like,
30:22
you know, I need every Egyptian scroll by, or about geometry that was written between the 5th and 8th century B.C., and that needs to be in blue ink on yellow parchment. And they're like, you go wait over there. I don't have time to deal with you right now. Next, and then they'll keep dealing with it.
30:41
Another person then goes wait, so we're in the corner, until they have time to deal with them, which is usually going to be after the library closes and the scrolls stop coming in, and things are calm, and they can go in and deal with these one-off obscure queries. The thing is there's only going to be like one, two, maybe three of them each day, so it's totally doable. Sadly, and who knew, but Alexandrians have this thing about libraries,
31:03
and so on general principle they torched it. Basil and Fabian, you know, Basil was pretty crushed. Fabian's like, whatever. I almost died again, but that's far from the course. And Basil, but Basil has learned that he can do this. He's found a system that's going to work, and he's going to go now someplace
31:21
where they will appreciate his libraries and not burn them down on general principle, and he is going to be happy. So now it comes back to you. How are your databases looking? What's the health of your database? How have you been treating your database? What indexes are you missing? What can you do to increase the speed
31:41
with which you can respond to the customer queries? Look at your foreign keys. Look at the belongs to, has many, has many through. Look at your sorting. See what columns you're sorting on. Use explain to figure out whether or not your queries are actually using indexes. It's a lot better to discover that your queries aren't using indexes
32:02
when the load is still low. Like, when your queries still seem fast, that's the time to be checking to see whether or not they're using indexes, because you don't want to wait until suddenly these queries are taking four or five seconds to do, and the requests are backing up, because what do you do to add an index? You have to do a migration, and a migration to add an index does a full table scan,
32:22
which is not going to help your situation one little bit. Take my advice. Add the indexes as soon as you can, and if that means adding them at the moment you create the migration for the table, please do. That is the way to go. So if you've enjoyed Basil and Fabian at all, last summer I wrote a little novella online
32:41
at blog.jmasdoc.org. I encourage you to go read it. It has nothing to do with databases. It's about them discovering pathfinding algorithm, and totally unrelated to this at all, but because I have the opportunity to do a shameless plug, I'm also writing a book about maze algorithms for programmers, mazes for programmers, and if that sounds interesting to you at all, you can go to pragprog.com slash book slash jbmaze,
33:04
and it's in beta, and I'd encourage all of you to go out right now and buy it, so thank you.