We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

I Didn't Know Querysets Could do That

00:00

Formal Metadata

Title
I Didn't Know Querysets Could do That
Title of Series
Part Number
9
Number of Parts
52
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
QuerySets and object Managers are a core part of Django, and can be extremely powerful. But I didn't always know about some of their more advanced capabilities. BASIC METHODS You have likely used filter(), exclude(), and order_by(). You've even probably used an aggregation method like Sum() or Count(). Less common, however, are query(), only()/defer(), and select_related(). F EXPRESSIONS / Q OBJECTS For some more complex queries, those basic functions and filters won't cut it. How do you construct a query that needs to check for field A or field B? What do you do if you need to multiply two fields together and then sum them? Look no further than F() and Q(). RAW SQL / THE EXTRA() METHOD As a last resort, it's entirely possible to use raw SQL queries to get the database results that you need. The sky's the limit, but there are definitely downsides to this approach; pitfalls include SQL injections and database backend portability issues. MANAGERS A talk on QuerySets would be incomplete without mentioning Managers, and how to leverage Manager customization to make your life easier. Writing methods on existing Managers, and creating custom ones can go a long way towards being DRY and reducing the potential for errors.
13
Thumbnail
42:32
Set (mathematics)Query languageWordWeb 2.0Object-oriented programmingOrder (biology)PRINCE2DatabaseEndliche ModelltheorieLetterpress printingRight angleSequelInstance (computer science)InformationMultiplication signIntelligent NetworkShared memory1 (number)Moment (mathematics)Point (geometry)Electronic mailing listReal-time operating systemLine (geometry)Reverse engineeringField (agriculture)Product (business)Water vaporExpressionCategory of beingMappingResultantParticle systemTheoryWeb pageNetwork topologySocial classExpert systemForm (programming)NumberParameter (computer programming)Black bodyMathematical optimizationTable (information)Key (cryptography)Functional (mathematics)AverageCountingBitCustomer relationship managementElectric fieldPhysical systemFitness functionMereologyMobile appDampingExecution unitSummierbarkeitStack (abstract data type)Buffer overflowRevision controlData dictionaryFunctional (mathematics)Arithmetic meanData structureLoop (music)Exception handlingType theoryIntrusion detection systemTemplate (C++)QuicksortStatement (computer science)UMLComputer animationLecture/Conference
Type theoryPurchasingElectronic mailing listField (agriculture)Right angleOrder (biology)Water vaporTotal S.A.Social class1 (number)Source codeQuery languageRelational databaseObject-oriented programmingMultiplication signProduct (business)UsabilityDatabaseForcing (mathematics)Functional (mathematics)SpacetimeSystem callDistanceCASE <Informatik>Set (mathematics)Customer relationship managementArray data structureLine (geometry)WordMappingElectric fieldDatabase transactionTerm (mathematics)Insertion lossExpressionEmailOperator (mathematics)Complex (psychology)TheoryConnected spaceInstance (computer science)Self-referenceTrailResultantSummierbarkeitCodeView (database)Inverter (logic gate)Odds ratioMetric systemProcess (computing)Atomic numberExecution unitCondition numberClosed setBoom (sailing)Single-precision floating-point formatFunctional (mathematics)Boss CorporationEndliche ModelltheorieLoop (music)Intrusion detection systemData conversionResponse time (technology)Computer animation
CASE <Informatik>DatabaseTheoryRevision controlRight angleNumberFunctional (mathematics)SequelSingle-precision floating-point formatCountingWrapper (data mining)Function (mathematics)Line (geometry)Equaliser (mathematics)ExpressionEndliche ModelltheorieOrder (biology)Functional (mathematics)Table (information)DataflowEmailSign (mathematics)Portable communications deviceQuery languageDifferent (Kate Ryan album)Object-oriented programmingCondition numberCausalityNeuroinformatikExtension (kinesiology)Selectivity (electronic)Game controllerFormal grammarParameter (computer programming)Social classBitGoodness of fitExecution unitField (agriculture)Electric fieldIterationScaling (geometry)Product (business)Attribute grammarTraffic reportingProjective planeCodeIntegerString (computer science)Multiplication signLatent heatNetwork topologyProgrammer (hardware)Electronic mailing listRelational databaseCustomer relationship managementPhysical systemBeat (acoustics)DampingRaw image formatView (database)Point (geometry)WritingValidity (statistics)InternetworkingResultantType theoryHeegaard splittingComputer configurationInjektivitätCorrespondence (mathematics)MereologyFront and back ends2 (number)ForceCore dumpComputer animationLecture/Conference
MathematicsDifferent (Kate Ryan album)Statement (computer science)Category of beingEntire functionQuery languageNumberMusical ensembleCodeEndliche ModelltheorieLogicInstance (computer science)MereologyPredicate (grammar)Right anglePattern languageProgram slicingSound effect1 (number)Noise (electronics)Multiplication signSelectivity (electronic)BitNeuroinformatikFrequencyQuicksortArtificial neural networkQueue (abstract data type)Disk read-and-write headDatabaseFunctional (mathematics)Set (mathematics)Roundness (object)Limit (category theory)Computer animation
Computer animationXML
Transcript: English(auto-generated)
come on hey everybody thanks for coming i didn't know query sets could do that is something that i've thought or said to myself a number of times
over the last few years while learning the ins and outs of django and i was hoping to share some of those moments with you guys today but first you know why is this important why should you care about the words coming out of my mouth if you're a devout reader of the documentation then to be honest this talk might not
be that interesting for you but if you're more like me in that you read just enough of the documentation and or stack overflow in order to solve the problem you're currently facing and then you put it back down then it might be worth a listen right a lot of the stuff that a lot of these solutions that i've encountered i've only encountered them because they
were relevant to what i was working on and simply knowing that i could have done something in native django would have gone a long way towards figuring out how to do it in native django and even if you're fairly advanced a lot of these concepts have only been fleshed out in the last few versions i know for a fact that at least a couple of these examples
could not have been done before 1.8 because in order to do them i was hacking around private internal functions i want to offer practical examples in the hopes that you can use them towards your own code and achieve if not best practices at least better practices that all having been said
let's dive in just as a sanity check for everybody query sets are the things that are returned when you do a model query in django right so every time you say article or user.objects.all or article.objects.filter you're working with query sets later on in the talk i'm going to run through some examples of lesser known
query set behavior and those revolve around a pretty basic e-commerce app right everybody likes making money just to be on the same page i'm going to take a look at some of the rough model references that we're going to be using we've got products with names prices and sellers orders right and each order has many
items so each item has two foreign keys a quantity and a float unit price hopefully that'll make sense now if you've done a django app that's even slightly more complex than a to-do list you've probably run into some of these methods right the ones on the left return query sets
meaning that you can chain them you can filter and then exclude and then reverse right the ones on the right return something else that could be a model instance that could be a boolean value that could be a dictionary full of stuff i don't want to spend too much time here but there are a couple of things i want to point out about annotate and aggregate
right they are somewhat similar they both accept an expression here we're using the count and sum classes to define the expression that we want to compute but annotate will compute that expression for each item in the set right and it returns each item with a new property storing the result
whereas aggregate will compute the expression across all of the items in the set and it returns a dictionary with the final value you know with annotate you can add some new value and then use it later on in the filter expression and if you provide a keyword argument then that becomes the name of the
property or dictionary key but if you don't provide one django will automatically generate one by using the name of the field double underscores and then the name of the function right kind of a subtle point to note is that actually in the top example we are counting all of the items
despite the fact that the product class if you recall didn't have an explicit foreign key for an item on it right which means that here django is implicitly traversing the reverse relationship counting up all the items and then adding that as a field on our final query set of products and of course if you aren't using these functions to count stuff to some stuff to average
stuff you absolutely should it varies a little bit based on your underlying database management system but they tend to be significantly faster than writing a for loop by hand or using even python's built-in sum function cool so beyond those more common ones there are some lesser known query set
methods right these generally revolve around optimizations or you know interacting with the underlying database table structure you may or may not know that when you access a foreign key on an object django goes back to the database there's an extra query in order to populate the foreign keys
information right so when we print the seller id on a product it takes not one but two queries in order to execute that print statement but and i should note the same thing happens you know with many-to-many and many-to-one reverse relationships right not just foreign keys but we can use select-related
to compress that down to one right select-related takes in a list of arguments that it will go ahead and join in the underlying sql allowing you to save time later when accessing it the big limitation of select-related is that it it's limited to foreign keys
because it's doing sort of the joining in the underlying sql but it is very similar to prefetch related right which allows you to do the same thing with reverse lookups and many-to-many fields right prefetch rather than doing it all in the sql it will do the joining in python but you get to do it up front once rather than many times ad hoc like you might do when rendering a template
but what if you have the opposite problem right what if you don't want more information from the database but less this could be for example because you have a model with hundreds of fields on it and you only need to access one or two then you can use only which as the name implies will
only fetch the explicitly listed fields if you know which ones you're need you can optimize your lookups by grabbing those obviously though if you access other fields outside of the ones that you ask for you kind of lose the performance benefit because you have to go back to the database anyway defer is the other side of this coin
it retrieves all of the fields on a model except for the given ones you might want to do this because you have a field type that's particularly expensive to convert to native python maybe it's some geospatial data maybe it's something even more custom than that and so defer lets you punt on
having to do that processing the last one of these methods i want to touch on is in bulk mostly because i recently discovered it and i think it's super cool you pass it a list of ids and it returns a dictionary mapping each id to the associated model instance to me this is interesting because there's a type of query that i run into a lot
right suppose you want to get all of the products ordered in a given month you'd probably start by filtering for all of the items ordered in a given month you would use values list right flat equals true and distinct in order to distill that down to a list of product ids and then you can pass those to in bulk in order to just rapidly
generate a mapping between the two the other way i've done this is using filter right you say filter product id in list that also works as far as i know these are kind of the cleanest ways to do this lookup but if anybody knows a better way to do it please come talk to me after the talk for whatever reason i end up writing this style of theory a lot and i would love to
reduce my lines of code now an interesting thing about the filter and exclude methods that we saw earlier is that when the underlying sql is generated it's always generated using and clauses right so if we look at this example the resulting sql is that we select all of the orders where the status
is shipped and it was ordered yesterday that begs the question what happens if i want to get all of the orders where the status is shipped or it was ordered yesterday right to answer that question i want to look at not searching orders but searching users suppose your customer support team
right says we really want a flexible search view where we can just type in a keyword term and it will return all of the users you know across first name last name email and we can use them for our ticket
right the brute force solution is to write three queries you know first name last name email convert them to lists and then concatenate them together right and this solution while it does work is not the most efficient um you're hitting the database not once but thrice
once for each field and then you are spending time not just converting each of those query sets to lists but then putting those lists all together and lastly you know your final object type is a list not a query set which means that you can't filter on it further you can't order it you can't remove duplicates easily
and this starts to break down pretty quickly right requests will start timing out without you know too many results that you're trying to return and like i said if you had extra constraints that just adds to the response time even more but fortunately we have q and q are these great objects they encapsulate
query constraints right and you can combine them with various operators like and or and not right and so if we were to rewrite that previous query we would do it by combining creating three q objects right one for each of our fields one for each of our constraints and then joining them with a bitwise or operator
right and you can see what the underlying sql is so you know we get a query set we can order it by last name we can you know ask for distinct and best of all it all reduces down to a single sql query boom theoretically you can create as complex of q expressions as you want as long as they can be combined with bitwise ors and
nots i suppose sales comes to you and they say we really want you to build us some dashboards right we need to track our metrics our kpis so you decide to write a function called item total that takes in a set of line items and returns the total right at first you might be saying oh
totally know how to do this easy peasy let's just use a sum aggregate across our query set right and then get the value out of the resulting dictionary but let's recall for a second how we define them the above approach totally works if
we had a total price field rather than a unit price but because we structured it this way we are not taking the quantity into account for each of our line items it might be tempting to just say screw it you know we'll write a total price method and then we'll use a list comprehension and a sum to get that total and this works right
it totally works but if we're taking our own advice about using annotate and aggregate where possible we can do better that's where f comes in while q represented a query constraint f represents an implicit reference on a model field or database column
if that if you don't know what that means that's fine it it took me forever to understand what that means but let's look at it concretely right for this example we're rewriting that product and sum as an expression right but rather than saying so for example if we did it with a for loop we would say for item in items
item sum plus equals item dot unit price plus item dot quantity or times item dot quantity excuse me but we can rewrite that using a sum class and using f in order to implicitly create this expression and without having to access any individual item instances compute the aggregate sum hopefully that makes sense and let's get
one more f example you know this time just to really illustrate that that self-referential aspect suppose now your manager says you know charlie sales are up everything's great but i want you to
add a field to our product so that we can keep track of how many times each one has been purchased right and at first you're like that's dumb why would you do that but it's your boss so you have to do it anyway and the for loop to do it right is extremely simple and to be honest we can't really do much better than this in terms of performance but while we can't save much time we
can save space and using f in this case can actually avoid a race condition right so what's happening is we're using the update method on a query set to update all of the values at once and then we're using f to implicitly refer to the purchases field on each of the
products incrementing it by one and putting it back this converts down to a single atomic database transaction right so in the above example it's it is possible if you had two processes trying to do this at once that one could clobber the results of the other but if we use update and f we avoid that race condition now as it
turns out f objects are just a single example of a much more general jango class called query expression you might know some of these from the aggregate functions right but they actually have a very close cousin in database functions so the ones on the
right you know while performing a query you can include and they will inline concatenate two fields together compared to and return the smaller convert to upper or lower case right and some of you might be thinking hey those look like a lot of sql functions that i know right that's because they are sql functions you can
subclass the database function class which simply takes a list of arguments and then the corresponding sql function to apply them to right from the top example without having to iterate over any products we're converting all of the names to lower case and adding that right to each individual product and in
the bottom example we're taking all of our users using the split part function in sql to break the email based on the at sign delimiter and just grab the second of that split hopefully that all makes sense the the value class that you're seeing there is simply a wrapper around raw python values right to help the function class make sense of everything that's
coming in so some more you know query expression subclasses f's aggregates funks and values we just saw expression wrappers we'll touch on in a minute and conditionals actually are a really powerful subclass that allow you to implement conditional logic inside of a
query right so you can say add this computation add this extension if a certain case is true otherwise add a second completely different computation and these are these are pretty powerful but if you find that you need to write your own query expression it's actually not that bad you only have to define four or so methods as sql
get lookup get transform and output view if you need to write if you need to customize the sql output for postgrease or mysql or sqlite you can define an as vendor name function right where you replace vendor name with the name of your desired backend since writing your own expression is a fairly advanced exercise
i'm not going to delve too deep into what exactly these entail but rather i'm going to point you to the excellent django documentation and leave it as an exercise for the listener it is at this point that i do have to admit one of our previous examples wasn't exactly correct that's my beat technically speaking this code will
absolutely work if unit price and quantity are the same type of right if they're both integers django can say okay an integer times manager we probably want an integer back out but because unit price is a float we have to explicitly tell django what field type we want to come back out
and as you can see we're invoking the expression wrapper class to do this right the final code ends up being a little bit more verbose but you get some more fine grained control over it we're specifying our output field as a float field in this case so that we can coerce it to a float now hopefully ideally you're convinced that you can execute
90 something percent of the queries that you need in django natively right but on the same chance that these haven't solved your problems you can roll your sleeves all the way up and start writing some bare bones sql as a side note i just want to say one quick thing if you're writing raw sql you should probably rethink what you're
about to do now this isn't to say that you should never write raw sql right there are plenty of valid use cases where it's the right thing to do and there are absolutely performance and convenience benefits to be had however there are probably different more maintainable ways of reaping those same benefits if you're
looking for speed increases consider refactoring your code to be more efficient right maybe using some of those query set tricks we just learned uh you know or introducing caching when it comes to convenience you could write raw sql or you could consider restructuring your models and your tables to take advantage of more than native functionality writing your own sql you know while
great in some situations means that you lose portability options when it comes to changing database management systems and of course if you're using dynamic values you start opening yourself up to the possibility of sql injection anyway now that my psa is over um the first way that you can start writing sql is the extra method
right it lets you inject specific clauses into a query sets generate sql you should be aware that this method while not currently deprecated is planned for deprecation so if you really need to use it make sure you file a ticket with the django project so that the core devs are aware of your use case and can ideally try to build some
native functionality to take care of it so in this example right we're introducing an is recent attribute into our select clause and the generated sql looks like that pretty straightforward there are a bunch of different clauses that you can work with using extra you've got select where table is ordered by and if you need to start
escaping dynamic parameters for the select clause you can use select params and for all the others you can use regular params i have no idea why there are two different keyword arguments for these but there are a number of django core devs floating around so i would suggest that you ask them
and if this really truly has not solved your problem you can just write some raw sql dot raw will just take a string pipe it out to what's beneath and send you back the result this example is very trivial and should not at all be you know done using raw sql for a couple
reasons one i was too lazy to come up with a more complex one and two i am employed as a python programmer and not a dba so let's recap right we had our basic methods where we can annotate and aggregate where possible our advanced methods where we can reduce joins with select and prefetch related and use only and
defer to partially fetch data our q objects which were encapsulated query constraints our f objects which were implicit field references our database functions for some average count you might know compact ready to lower you might not query expressions which are crazy powerful and see the docs if you need to
write your own and raw sql right with extra and raw my name is charlie you can find me around the internet usually at charlie r guo please note the r there is another charlie who does not use the r and he gets a lot of my email i write stuff this one time i wrote a book it's called unscalable
it's a collection of interviews with startup founders on the theme of do things that don't scale that's my shameless plug and thank you how am i doing on time yes so we have about five minutes for
questions if you have them please use the microphones at the ends of the aisles hi i had a quick question about a query that we had some trouble with something when we had refactored due to some changes that we were going through where we had to introduce an or statement into an already complex query so we had an or statement which then
we fed into a filter with a bunch of other ands and by the time we were done with this query it was so long running because the data sets were so large that we had to break it up into multiple smaller queries to be able to have it run effectively and not time out all the time would you have any suggestions for how we might be able to handle that and be able to do that in a single
query set yeah again gonna emphasize that i'm not employed as a dba but i would say that it is possible to do some things where you essentially like use the cues and then stack filters on them is that what you guys originally were doing or so we we took the cues and ordered them
into one larger queue essentially and then use that in a filter with other criteria as you know andy and all of that together and it was just massively long running so i don't know if like we did something wrong there if there was a way we could have reorganized that so it would have created a more efficient query
off the top of my head i'm not sure probably the way i approach it is to just for on each query set you can print i query and it will spit out what the signal is actually generating so figuring out from there what to refactor and what to break down thank you that was great um my question
is i ended up in a situation where i needed to query the same model imagine that different categories right but i wanted to have two slices let's say first one 10 items instances and and the other one five is that even possible i don't know if i understand the questions you need
one model imagine article and i did i i need 10 items from one category many too many right and five items from another category in one query set oh so you you have two queries and then you want to join them together
essentially or if you know so if you know for example the two that you generate if you can generate a 10 and then you can generate the five right if you have the code to do that you could put them both in the queues right and then use a queue to add them i think i was actually using the queue for the categories but
the situation was like once you take a slice you can't actually have another slice that was a problem i don't know if that is actually sliced it and then yeah i would figure out a way to do without the slice if you know which ids you need right like if there's a limit if you can
order by id and then just take the first 10 this would be a good place where those extra database functions would come in handy right because you could say um you know if this is greater than nine other ones right then stop but i would suggest i can take a look if you want if you want after the talk it just uh it seems to be
a bit weird that having two different questions for the same model you know just to get items to differ from different categories kind of thing so i didn't i ended up having two questions at the end thank you i just wanted to clarify on your last example there
um so when you do a raw query you can add any number of computed columns on and it'll just return them back as if they were part of the model uh if you i believe if you use an as yeah it just
sort of applies them as properties you'll just have new properties on your model that are sort of temporary properties just from your query i believe so but check the documentation okay uh thank you a big round of applause for charlie