We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Optimizing your webapp by using django-debug-toolbar, select_related(), and prefetch_related()

00:00

Formal Metadata

Title
Optimizing your webapp by using django-debug-toolbar, select_related(), and prefetch_related()
Title of Series
Part Number
34
Number of Parts
44
Author
Contributors
License
CC Attribution - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This talk explains how to perform SQL query analysis and how to rewrite your views to reduce the number of queries Django uses in evaluating your model objects and their attributes. Special emphasis will be given to the powerful methods "select_related" and "prefetch_related." I will highlight the problem with a naive use of the ORM, how to target code for optimization, and the beneficial result.
XMLUML
Query languageExpected valueSet (mathematics)Projective planeLevel (video gaming)Cross-correlationSequelWrapper (data mining)Data managementTheory of relativitySubject indexingCASE <Informatik>Functional (mathematics)Condition numberMusical ensembleElectronic mailing list1 (number)GenderArithmetic meanInheritance (object-oriented programming)Multiplication signOnline helpOperator (mathematics)Symmetry (physics)CodeObject (grammar)ResultantCategory of beingEndliche ModelltheorieFilter <Stochastik>Model theoryPoint (geometry)BlogView (database)Structural loadServer (computing)Configuration spaceDatabaseAdditionImplementationParameter (computer programming)Information securityAbstractionGroup actionBoilerplate (text)Similarity (geometry)Pattern languageQuicksortTemplate (C++)WindowProduct (business)BitSoftware developerLibrary (computing)MomentumProof theoryKeyboard shortcutMoment (mathematics)CollisionTerm (mathematics)Fault-tolerant systemSoftware engineeringKey (cryptography)Web pageSampling (statistics)Mobile appInterface (computing)Computer programmingAttribute grammarCache (computing)Data dictionaryLatent heatPerformance appraisalTwitterRange (statistics)Lecture/Conference
Software developerDifferent (Kate Ryan album)Power (physics)Projective planeBlogInformationMeasurementBitElectronic visual displayView (database)Interface (computing)Term (mathematics)Query languageLetterpress printingFunctional (mathematics)SequelRight angleProcess (computing)Computer animationLecture/Conference
CuboidInformationWeb pageLevel (video gaming)Projective plane
Electronic mailing listBlogWeb pageLine (geometry)InformationCASE <Informatik>Multiplication signVisualization (computer graphics)Social classExpressionSequelObject (grammar)Dependent and independent variablesSet (mathematics)SoftwareFocus (optics)Overhead (computing)Process (computing)Event horizonRight angleAdditionElectric generatorEuler anglesMachine visionEndliche ModelltheorieQuery languageInjektivitätBit rateMathematical optimizationKey (cryptography)Statement (computer science)Term (mathematics)Theory of relativityCodeRoundness (object)Server (computing)AuthorizationDatabaseBitDebuggerComputer animationLecture/Conference
Field (computer science)Lattice (order)Object (grammar)BijectionKey (cryptography)InformationResultantTerm (mathematics)DatabaseQuery languageSoftwareMultiplicationBlogAuthorizationMultiplication signStatement (computer science)CASE <Informatik>Selectivity (electronic)Set (mathematics)Group actionEndliche ModelltheorieTheory of relativity3 (number)Likelihood functionLoop (music)Semiconductor memory
Mathematical optimizationNumbering schemeQuery languageWeb pageBlogBlock (periodic table)Reduction of orderProgrammschleifeComputer animation
BitInterpreter (computing)Query languageTheory of relativityKey (cryptography)Pattern languageGroup actionExistential quantificationCASE <Informatik>Symmetry (physics)IdentifiabilityElectric generatorComputer animationLecture/Conference
CodeObject (grammar)Key (cryptography)BijectionCASE <Informatik>Mathematical optimizationRevision controlTheory of relativityTable (information)Set (mathematics)Line (geometry)AdditionSelectivity (electronic)Operator (mathematics)Parameter (computer programming)Default (computer science)DatabaseMultiplication signResultantCartesian coordinate systemOnline helpEndliche ModelltheorieInformationStability theoryStudent's t-test
Query language
Query languageView (database)InjektivitätReverse engineeringTheory of relativityBridging (networking)Cycle (graph theory)Template (C++)Pattern languageCodeKey (cryptography)
Point (geometry)
Transcript: English(auto-generated)
So just an introduction as was already done. I'm a software engineer at Venmo. My name is Chris Adams. My Twitter GitHub is Adamc64. Interestingly, I'm not the Chris Adams who's also a developer, Python developer, Django developer, and Django contributor
who works at the Library of Congress, ACDHA. That's not me. And neither of us are the gentlemen Chris Adams, a 90s era professional wrestler who's here for disambiguation's sake. There he is. That's neither of us just in case you're wondering. Yeah, it's kind of interesting having a very common name
and seeing the collisions that can possibly happen. So we're here to talk about Django. Django is great. We all love using Django, I'm assuming. And Django, but Django is really a set of tools. And those tools work together in interesting ways
and helpful ways, but at the end of the day, what we're using when we're using Django, when we're writing Django code is a set of tools that work a certain way. So tools are great, but tools can be used in good or bad ways, okay? This, in one sense, is very obvious.
I mean, but when you're actually getting into things, it might not be so obvious to you when you're using them, especially for the first time. So the Django ORM is one of the tools that Django provides to us, object relational model, easy wrapper over SQL queries,
which used to be much more tedious to be writing. So one thing to remember is that whenever we're using a tool, especially a new tool, is we should manage our own expectations for tools. So many people approach a new tool with a broad set of expectations
as to what they think the tool will do for them. However, this may have little correlation with what the project actually has implemented. So this is just kind of a long way of saying that tools can be misused, especially if you're new to them. So, you know, as amazing as it would be, if they did, unicorns don't exist.
Tools are not really perfect. They never can be perfect in this imperfect world. So we have to really understand them and not deceive ourselves, especially when encountering new tools, that this tool is the one panacea
that's going to really get everything. It's going to understand me. It's going to know me. It's really going to, you know, I'm going to be understood by this tool that I'm using. Actually, that's not true. You responsibly understand the tool and to use it correctly. So Django ORM, among its other, among other things you could say about it, is an abstraction layer.
So not only is it a tool, but it's an abstraction layer. Abstraction layers are great because they take us away from messy details, but also risky for the same reason. They take us away from messy details. Same reason, okay? So we don't have to do many things that we used to have to do before
because we have the ORM, but we might forget or not understand that the ORM's doing things that we don't expect it to do. So don't forget you're far from the ground when using the ORM. You have the ORM API itself. You have Django's implementation. You have Python. You know, maybe you misunderstand some way
about how, you know, dictionaries or keyword arguments work. You have lots of layers when you're using the ORM. And then you have SQL itself, and you have your specific database backend, which might have its own quirks, might, you know, manage indexes in a particular way, might do things slightly differently,
load things from, return results from cache that you're not expecting, things, you know. And that depends on your configuration. So you have a lot of layers here. So one thing that I really like about Django is the query set, the programming interface
that query sets implicitly give us, and, explicitly give us. I mean, we can use, we can use the query, we can define query sets, we can do many things with them, we can tell them we want certain things to happen when we define query sets. And so query sets have two features
that's worth keeping in mind when you're thinking about them. One is that query sets are lazy. And the other is query sets are immutable. So this is a review for people who might know this already. So what do these terms mean? Lazy means that a query set doesn't evaluate until it needs to.
So many people, you know, don't get the, you know, really don't realize this. And when you do realize it, it's a great aha moment because you're realizing what's going on behind the hood. When you define a query set, you know, you know, model.objects.all, what you're doing is you're defining an instruction which is deferred.
It doesn't get executed until it needs to. And there's certain cases when it does execute and many cases where it doesn't execute. Query sets are also immutable, which means that if you chain them together, you say, you know, I have a query set and then I say .filter, what's returned from that operation is a brand-new query set. However, that new query set, in as much as possible,
inherits all the features and the properties that you wanted of your old query sets. You can chain filters together. You return a query set. You apply conditional, have another filter that only happens in the case of that conditional. You can do all sorts of things. And every time the query sets are returning
and you're getting a new one. So you don't have to worry about the references to the old query set. You can store one, you know, return one to another, send one to another function. So query sets are always new. And so, you know, the quick examples here,
you know, each of these operations returns a new query set and does not actually hit the database. So worth keeping in mind that what's not happening, and it's the same level as what is happening. These, however, are situations where query sets are evaluated.
There are a few of them. They're in the Django documentation. These are just a sample of three. You know, there are other ones also. Query sets are evaluated when you make a list out of them, when you use an index or range operator, and when you iterate over them.
But the interesting thing to think about here is that you can define a query set very, very early in your view function. It's only at the for loop, for example, in the third case here, that the query set is actually evaluated, the SQL is sent to the database server. So, anyway, things to keep in mind when dealing with query sets. Okay.
So I want to highlight some patterns which are problematic. And oftentimes people don't even know that the pattern is problematic, because they're really doing things that look very straightforward. I'm writing my view function.
Everything looks great. I mean, you know, everything looks very straightforward. It looks like I'm using the query sets the way I'm expected to. But actually, when I understand what's going on, certain things, it's easy to miss. So let me kind of point these out here.
So we have a sample app, which is a blog app. It's kind of boilerplate. And I've taken away, like, various other attributes to these models. These are just the relations between the models. And one of the features of Django model objects is that when I act...
Well, I'll just go through the examples so we can see. So we have basically a blog post and a comment on a post. Submitter, which is a foreign key to the user model. Post, which has a foreign key back to the blog, so you're posting on a blog.
And, you know, likers, many of the people who like that post. And comments on the post, which are done by a particular user. And, you know, the foreign key back to the post that they're the comment on. Pretty basic model.
Pretty straightforward. So here's a view, which can be used to just generate a list of the blogs that are on the site. Maybe this is a management view or something like that. Looks really simple. Here's the template that renders that view.
Looks really simple. We have a, you know, a reference to the blog detail page. We put the blog name there, and we say submitted by the person who submitted it, right? Very simple. This seems like just just home run, like, really easy thing to write, push it to production, get this thing working. And it generates something like this,
okay, as would be expected. All right, so, and I just want to talk about the detail view for a little bit. The detail view is a way of seeing a particular blog, and I, you know, get the blog or 404 if the blog
doesn't exist. I request the posts that are for that blog, and then I render the blog detail HTML. This is a little bit more complicated because there's a lot of different features that maybe my users want to see. They want to see who likes the blog.
They want to see the comments that are on the blog. And so this is a little bit more complex. So, and it generates something like this, okay, you know, I just used some sample, Lorem Ipsum generator. So, you know, we have comments, we have people who liked it. Okay, so this looks a little bit more complex
in terms of how this is generated. So, now, as general developers, we're left thinking, okay, I've done my job, right? I've gotten something out. I've completed, completed the work, right?
It works, right? But really at the end of the day, as I said, the GenGram is a tool, and the tool is functioning a certain way. One of the main interfaces that we have towards our data is through the SQL query language. And Django's generating,
the ORM is generating SQL. So how do we find out what SQL is being generated in a particular view? And the problem is if you can't measure it, you never know if there are problems, right? You know, how do you know? There's information lacking. And we can theorize
based on what we think it's doing, right, this magical, I know what the ORM's doing. Like, you know, it's just super easy. You know, I'm using .all, so it's doing my query there, right? No, it's not. There's lots of false assumptions we can make. So really, in general, and this goes not just for Django, it goes for most tools and most technology,
you need to measure and you need to get an actual printout or a display and transparency into how the tool's functioning and what it's doing. And as precise as possible. The more information you can get here, the more power you have as a developer. I think everyone would agree with me here.
So how do we get this information? You know, we can't be naive with our tools. How do we get this information? Well, one really cool project is the Django Debug Toolbar. If you're not using it, I highly recommend you use it or have some way of getting information that's equivalent to the kind of things
that this provides out of the box. So I'm not gonna go over how to install it. They have good documentation. You can do that. It's relatively straightforward. So let's use the Django Debug Toolbar to take a look at our pages. So first we'll take a look at the blog list page.
So here's our blog list, and notice that one of the cool things that the Django Debug Toolbar is it gives us a panel, which is... it basically injects HTML, JavaScript into the page, and also the data about the execution of the page. And it gives us these tabs that we can click on. So we need information about, you know,
the timing that it took to generate the HTML, details about the settings, the HTTP headers, the request object. Okay, these are... I can speak for a long time about each of them. I'm gonna focus for this talk on the SQL. And notice that there's 52 queries being generated.
And being executed to generate this... to actually generate this page, which seems like a lot for just a list, right? So, like, we did the thing, right? We used the OM. It looked really straightforward. We used it, and what happened here is
we actually... we didn't realize it. We generated quite a lot of queries. And this is a bad thing because there's a network round trip that has to happen. The server has to send the query. And also the... receive the response. And the database server has to process
each individual query. So there's a lot of extra overhead for having all these things happening. And, sorry, if we click on the SQL button, the page actually... this Django debug tool actually gives us a list of all the queries that happen,
which is kind of cool when you think about it. Here it is. Right here, visualization. It gives us a timeline. I cut off a little bit more of the things just for the sake of displaying, but you should definitely check it out. So, you know, now our next question is, great, we have the queries here. Where are they being generated? And actually you can click on the plus
and you can get that information. The plus to the left side will give us the full query and also the actual line of code where the query is generated. And I should say here, this is not a query set that's generating it. That's already happened. In this particular case,
these individual select statements are being executed through the foreign key, blog.submitter. So Django, the query sets are lazy, and also the model objects are lazy. The foreign key relations aren't gonna be accessed, aren't gonna be loaded, until they need to be or if they need to be. So that's why it saves in terms of an optimization.
Maybe you never reference the submitter object, okay? And so you don't need to have information about the auth user model object. So here it is. Blog.submitter is being referenced. That's generating these many, many, many individual queries, and the more objects you have, the more queries you have.
So this is actually a very bad situation, and there should be a way to do this much easier. So this is where select-related comes in. Select-related works by creating a SQL join and including the fields as a related object in the individual select statement for the model itself.
So the auth user is gonna be joined with the blog objects in this case. So SQL's gonna return back the blog multiple times as many auth users as there are. And you think, well, is this bad?
Well, it's actually much more efficient, because you're batching it. You're doing it in one shot, and so there's only one network back and forth to get that information. So it's actually much better, much fewer queries. So it gets a related object in the same database query. And, however, to avoid much larger results set that result from many relationships,
so like a many-to-many field as opposed to a foreign key, or maybe a reverse foreign key where you're getting the opposite side of the relation, select-related won't handle those cases. It'll just handle... Think of it in terms of foreign keys and one-to-one fields. It's helpful, and when you're gonna be iterating over them in a loop. So here we use select-related.
We tell it we want to select the submitter of the blogs, and you can have commas and have multiple fields, which it will join together and get all that information. And so now we do it, and notice two queries,
and there they are, okay? And notice the inner join on the query. So we've just completely reduced the number of queries, and even if we have much, much more users and much, much more blogs, this isn't gonna increase. It's just gonna be two queries. So this is really, really great optimization and a great tool,
especially when you're using loops. So, all right, the blog detail page now is a little bit more difficult. So now notice we start with 44 queries, and if we take a look at the queries in this case, it's a lot less straightforward, right? You don't just have a list. And what I usually do is I start from the top,
open my code, and I actually go down and see where these things are being invoked, and I just try to be intelligent, try to identify patterns, patterns are good. The same thing's happening over and over again. Might be a group of things happening over and over again. And in this case, I'm just gonna reveal it for you here. The patterns that you identify if you take a look at these things
are that there is a submitter foreign key being accessed on the comment, okay? So now we're talking about comments that are being generated and who submitted that comment. And there's also a query on the likers attribute, which is, if you remember, is a many-to-many relation. There's people who like the post. So we have comment.submitter and postLikers.
So prefetchRelated is an optimization that's used for this case. And instead of using an inner join, what Django's gonna do is it's gonna prefetch those relations, those many-to-many relations, and then it's gonna do the operation we expect of our model, and it's gonna join them in Python.
And that's actually more efficient than having a database do it for you. So there's a reason why selectRelated doesn't do this by default. PrefetchRelated is gonna prefetch those many-to-many relations, because it could just result in a massive, you know, result set. So it's much, much better to use it this way. So you remember, prefetchRelated is for a relation that has many-to-many
many-to-many related members. It does a separate lookup for each relationship, does a joining in Python, and this allows it to prefetch many-to-many and many-to-one objects in addition to the foreign key and one-to-one that is in selectRelated. That selectRelated helps you use,
helps you manage. And it also supports generic relation, generic foreign key. I don't know what version of Django that is, that it does support that. I would check that. But you can use it for those kinds of things. So we put it in here. You can use double underscore syntax. In this case, this is the most efficient
set of arguments that, you know, that will help us in our particular example. There might be more that I overlooked. I didn't spend that much time trying to find the optimal optimal. But this simple optimization, this one line of code is gonna save us.
So if you see it's gonna prefetch through the comments table and get the submitter, it's also gonna have the comments information, which is gonna be helpful for us. It's gonna implicitly do that. And it's gonna select the likers. And one line and we're done. Well, you know, we only have seven queries now.
I'm sure you can get this even smaller. But, you know, one line, just amazing amount of work that you can do. If you notice, you know, it prefetches certain things before others. So, you know, it's just really, really helpful pattern to think about and to bring into your own toolkit
as you're working to make your views better. So in summary, the query set API method select-related and prefetch-related automates some of these best practices to avoid extra queries in views and in templates. And you select-related for one-to-many
or one-to-one relations and prefetch-related for many-to-many or many-to-one relations, including the reverse foreign key relations. So, yeah, that's basically it. So thanks. This is pretty straightforward. I hope this will allow you to start using these tools more effectively.
I have a code on GitHub, so please clone it. And, yeah, thank you very much.