We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

How to sort anything

00:00

Formal Metadata

Title
How to sort anything
Subtitle
Keeping your data organized with "sorted" and custom functions
Title of Series
Number of Parts
130
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Sorting is one of those things that we take for granted in Python. The built-in "sorted" function knows how to sort any iterable of objects that are themselves sortable. But hiding behind that simple description is a great deal of depth. In this talk, I'll go deep into what it means to sort, and how we can sort any collection of Python data. We'll see how you can use custom functions to sort built-in data structures in new and interesting ways. And we'll see how you can design your own custom classes such that they will sort in just the way you want. After watching this talk, you'll have a better understanding of sorting, built-in data structures, function objects, and how "magic methods" affect the our Python classes. Moreover, you'll be able to write clearer, shorter, and more easily understood code. Topics I'll address in this talk: - "sorted" and Timsort - Sorting a list of simple structures - Reversing the direction with "reverse" - Custom sorting with "key" - Stable sort - Sorting a list of dicts - Using "lambda" - Using operator.itemgetter - Sorting a list of named tuples - Sorting a list of objects - Making your object sortable - The functools.total_ordering decorator This is an intermediate-level talk; I'll assume that anyone attending knows how to write functions, classes, and methods.
Axiom of choiceGoodness of fit2 (number)Meeting/Interview
Multiplication signNewsletterRight angleE-learningAuthorizationMeeting/Interview
InformationVideoconferencingProduct (business)Electronic visual displayWorkstation <Musikinstrument>Electric currentQuicksortElectronic mailing listDefault (computer science)Sorting algorithmLetterpress printingFunction (mathematics)Source codeAlgorithmInsertion lossPairwise comparisonSequenceNumberOperator (mathematics)String (computer science)QuicksortElectronic mailing listDefault (computer science)n-TupelSequenceInsertion lossKey (cryptography)Data structureCASE <Informatik>Sorting algorithmJava appletString (computer science)NumberProgramming languageSelf-organizationReal numberLetterpress printingCombinational logicObject-oriented programmingVariable (mathematics)Computer fileFunction (mathematics)2 (number)Source codeWordMultiplication signNatural numberOrder (biology)Right angleLibrary (computing)1 (number)Front and back endsHeegaard splittingProduct (business)E-learningPurchasingVector potentialDifferent (Kate Ryan album)SpacetimeFunctional programmingWave packetNewsletterEqualiser (mathematics)TupleSemiconductor memoryTable (information)BitIterationMathematicsSoftware developerCovering spacePerspective (visual)Equals signPairwise comparisonComputer animation
String (computer science)CodeUnicodePairwise comparisonPrice indexEquals signTupleElectronic mailing listSequenceRevision controlAlgorithmType theorySlide ruleComputer fileModule (mathematics)LengthFunction (mathematics)Parameter (computer programming)Sorting algorithmAbsolute valueNumberAttribute grammarElement (mathematics)outputNumerical digitFunctional programmingOrder (biology)Absolute valueElectronic mailing listNumberRule of inferenceAttribute grammarFunctional programmingWordEqualiser (mathematics)CASE <Informatik>String (computer science)Subject indexing2 (number)Negative numberWell-formed formulaQuicksortType theoryError messageDifferent (Kate Ryan album)LengthDigitizingParameter (computer programming)outputKey (cryptography)CountingPoint (geometry)Letterpress printingElement (mathematics)BitResultantObject (grammar)CodeSummierbarkeitTupleArithmetic meanInstance (computer science)Goodness of fitTypprüfungPairwise comparisonSensitivity analysisBoolean algebraInterior (topology)NP-hardSocial classUnicodeDirection (geometry)ASCIIFunction (mathematics)Disk read-and-write headSequenceIntegerRight angleEvent horizonMultilaterationWeightComputer animation
Sorting algorithmNumerical digitNumberWordTotal S.A.StatisticsOpen setDialectFunction (mathematics)n-TupelLambda calculusOperator (mathematics)Object (grammar)Letterpress printingPairwise comparisonCodeSocial classExecution unitoutputCountingWordGame theoryElement (mathematics)Arithmetic meanNumberElectronic mailing listSorting algorithmFunctional programmingFunction (mathematics)WritingIterationKey (cryptography)Letterpress printingQuicksortPhysical systemSocial classSummierbarkeitObject-oriented programmingMathematicsComputer fileObject (grammar)Order (biology)n-TupelAverageAttribute grammarCodeDigitizingOperator (mathematics)LengthPoisson-KlammerModule (mathematics)CASE <Informatik>Lambda calculusEntire functionHecke operatorNumeral (linguistics)Normal (geometry)StatisticsLine (geometry)Data dictionaryInstance (computer science)Pairwise comparisonDefault (computer science)outputMultiplicationSquare numberDirectory serviceParameter (computer programming)BitError messageResultantOnline helpRight angleSoftware testingMoment (mathematics)Product (business)TupleSlide ruleOcean current
Object (grammar)Functional programmingNoise (electronics)Different (Kate Ryan album)DatabasePivot elementMultiplicationDuality (mathematics)Subject indexingComputer configurationSorting algorithmQuicksortFunctional programmingBitSocial classLine (geometry)Semiconductor memoryEquals signAttribute grammarMoment (mathematics)Frame problemRelational databaseData structureSpreadsheetKey (cryptography)Pascal's triangleObject-oriented programmingComputer animationMeeting/Interview
Transcript: English(auto-generated)
All right, I think we're ready to go. There we go. So we are live. Welcome back. I hope everyone's got a fresh cup of coffee or whatever your beverage of choice is. Welcome to the afternoon of the second day of EuroPython
2020. And good morning to everyone in the Americas, myself included, where it is currently 6 AM on the West Coast. And of course, good evening to everyone joining us from Asia. So we've got some awesome talks lined up for the next few hours. And one of these is Reuven Lerner,
who teaches Python, data science, and Git to companies around the world, offers many online courses, including weekly Python exercise, community-based way to improve your Python fluency over time. He writes two free weekly newsletters. He's the author of Python Workout,
co-host of the New Businesses Freelancing podcast. And he's going to be talking to us about something that I rather enjoy, sorting, especially sorting in Python. So this is going to be fun. Thank you, Reuven. Welcome. There you go. All right. I'll turn this over to Reuven. All right.
Well, hi, everyone, all over the world online. Welcome. Shane, we can't meet in person this year for EuroPython, but this is a, I must say, kudos to all the organizers. I've been supremely, supremely impressed by the conference and from the back end of it, from the speaker's perspective, really impressively organized. As I said, my name's Reuven Lerner.
My talk is How to Sort Anything, a few words Jason also said. So I do corporate training. I have all sorts of online courses. I just published my first book with Manning, Python Workout. I actually have it in hardcover. Remember hardcover, folks? That's right. I have my weekly Better Developers newsletter as well, which I think currently has about 18,000 subscribers with a new article about Python
each week. So sorting. Sorting is important. You might say it's sort of important. OK, get used to my humor, folks, for the next half hour at least. Sorting is important. Sorting is something that we do all the time for a whole lot of different reasons and in a whole lot of different ways.
So why do we want to sort? And mixed in together here, I have some more abstract ideas and some more concrete ones. So we might want to display our data nicely. Let's see all of our data in calendar order from January through December or order of years from earliest to latest. Maybe I want to make my messy data slightly less messy. I want to sort of be able to look at it
and understand what's going on. I also want to find the largest or the smallest value in a collection. What was the smallest amount I made in the last year or the largest amount I made from purchase in the last year? We see which products sold best or worst. Which supplier's proposal, assuming you're a company and you've got a bunch of potential suppliers, which proposal will cost you the most?
And then if you're using Waze or another sort of GPS system, you want to find the closest gas station. And do you want to find the closest one? Do you want to find the cheapest one? Do you want to find the one that will take you the least amount of time to get there? These are all things we do with sorting. Maybe you're watching Netflix because you haven't left the house in six months and you want to find a film similar to the one
you've just watched. Well, there you go, we can use sorting for that. Maybe you're on Amazon and you want to find or they want you to find products similar to the one you're looking at. All of these involve sorting. And Python actually makes sorting really, really easy. If you have a list, you can use the sort method. So basically I can define my list here. My list is 10.5 minus three, seven minus two, four.
And then I'm going to print out, I'm using the cool Python 3.8 F string syntax where I can have inside the curly braces variable name equals sign. And this will print out as variable name equals and then its value. So I'm going to print out the value before sorting. Then I'm going to run my list.sort and then I'm going to print out the value after sorting. And sure enough, I get, what do you know before I sorted?
The list is exactly as I defined it. And after I sorted, it's a numerical increasing order. Okay, so far so good. Well, there are a few things to realize about list.sort. It's a list method, so it's only going to work on lists. If you want to sort something else, tough luck. You got to turn it into a list before you can actually run sorting.
Its source from smallest to largest, that's by default. And we'll see how we can change that a little bit. And here is one of the most important things. It changes the list object itself. So what we see here is my list is again, same exact values, get used to them. We're going to see them quite a bit. And then I assign a second variable here, also my list to be equal to my list.
What this means is we have two variables referring to exactly the same list. That's fine, that's not a problem in Python. We can do that. And I'm going to print out what is the value of also my list, meaning the second variable, but I'm going to sort my list. So I'm going to sort via my list. What's going to happen to also my list? And the answer is, before also my list is,
they looked exactly the same. And after, they also looked exactly the same. That is to say, there's no difference between sorting my list and sorting also my list because you're not really sorting the variable. You're sorting the list that the variables refer to. And so this is one of the problems with using the sort method on lists,
that yes, it modifies the list, but yes, it modifies the list, which means all the references to it will see that. Another problem with list.sort is that it returns none. This is a common convention in Python, especially the standard library, that if you have a mutable data structure and you run a method on it that modifies that mutable data structure, then that method will actually return none.
It won't return the data structure itself. So I see this all the time, all the time in my courses. People do this because it feels sort of natural. So we define my list. We're gonna print what it is before. And then what am I gonna do here? So remember that in assignment, it's always right side before left side. So on the right side, I'm gonna say my list.sort. That will actually sort the list.
It will be sorted. But the return value for the method is none. And thus we've assigned none to my list. And so the output for this is gonna be before it's our list and after it is none. So if you're short on memory, this might be a great way to like solve those problems. Otherwise though, not a good thing.
So what can we do to avoid this? Well, we can avoid it altogether. We can ignore it altogether. We can use the sorted method. I'm sorry, sorted function. And sorted is a function that comes, it's built in. So it's in the same library as bill. You don't need to import anything to work with it. It works with all intervals, not just with lists. So you wanna sort a tuple, you wanna sort a dictionary, you wanna sort a file. You can use all of those with sorted.
Now sorted will always return a list and it'll always return it by default, sorted lowest to highest. And here's the greatest thing. It does not modify the source data at all. So if I wanna use it, what can I do? Well, let's start off with my list again. And I can just, I don't need to do a before and after. I can just print sort of my list. And then I'm gonna print my list afterwards
just to compare. And sure enough, the sorted of my list shows me these sorted values there. But afterwards, my list is still the original values. Meaning, sorted will return a new list and I haven't touched, I haven't changed my original list at all. This means that I can sort tuples. How can I sort tuples?
You can't change tuples, right? Because we're not changing anything here. We're just gonna need a new list back based on the tuple. If I sort a dictionary, I'm gonna get the keys back in sorted order, but I'm not touching the dictionary and so on and so forth. Now, how is all this being sorted? Well, because many people ask me like, is it using quicksort, is it using merge sort? Well, I'll give you a hint. The sort algorithm was actually created by Tim Peters
and that's why it's called TimSort. And in case you think that TimSort is this weird esoteric Python thing, it's not anymore. TimSort is now apparently the default sort method used in Java. And I know Java is like not going anywhere, but it does increase the fact that we've got a language here that's actually working,
sort that's working very well. And TimSort is this interesting combination of merge sort and insertion sort, where basically what's gonna do is it's gonna assume, well, it assumes in general that your data is not completely 100% random, that if it's coming from the real world, it's gonna have some natural runs. It's gonna have some sequences that are already sorted inside of it.
So it's gonna take advantage of that. It's gonna create a few runs, hopefully some of them being natural runs, they'll merge them together. What happens if there are no runs? Well, that's gonna create a run. How's it gonna do that? Insert and sort, insert, insert, merge, insert, insert, insert, merge. And that's how TimSort works and it works pretty well. Well, in order to do both the insertion and the merging,
it's gonna need to do some comparisons. And so TimSort needs to know, is A, given two items in our list, in our tuple or whatever, given these two items, is A less than B? Is A greater than B or is A the same as B? And so TimSort is gonna rely on this. Now, before we were sorting a list of numbers, well, list of numbers is gonna work fine because numbers know less than,
numbers know greater than, numbers know equal equal, trivial, that's great. But what if we wanna get a little more interesting? What if we wanna sort a list of strings? So here I'm creating a list of strings, I have my string, I run dot split on it and split of course returns a list of strings. Basically it breaks up our string along white space into a list.
So words here is a list, list of strings. And when I say print sorted words, I'm gonna get out the words sorted from lowest to highest. Well, lowest to highest, what does that mean? Well, we as humans would think of it as in alphabetical order, from the earliest of the dictionary to the last dictionary. So how is Python doing this? Well, it says, well, we can compare one character string.
Remember, there are no really characters in Python, it's all strings. So one character string can be compared with less than or greater than or equal equal. And the comparison is done based on the Unicode code point for that one character string. Now, in case that sounds super like fancy and difficult to follow, you can think of it if you're a dinosaur like me
as just ASCII values, that basically every character is a numeric value. So little a comes before little j comes before little z. So it will sort those alphabetically. Aha, but remember that in Unicode and in ASCII, capital letters come before lowercase letters. And we'll talk about this a little bit later in that.
So things that begin with a capital letter will come before things that begin with a lowercase letter. In any event, so Python first checks, given this by the first checks, the first character of the two strings. So we wanna see, I have two strings, word one and word two, which one comes first? So we're gonna check the first character and we're gonna apply this rule with a less than rule
or the greater than rule, right? Let's compare their code points, let's compare their ASCII values. And we'll know then which one comes first. That's fine if we have a one character string, we'd compare index zero. What happens if that's not obvious? What happens if they are the same? Then we go to index one. What if those are the same? We do index two. What if it's index three? Until we find a difference between them.
This is how we know then if two strings are equal. If they're equal, then all the characters are the same. And well, what do you know? Neither comes first. This also means though, if one is a substring of the other, we're gonna return the shorter string first. So if you have find and finding, so find is gonna come first because it's shorter.
And if this sounds familiar to you, this is how we look things up in a dictionary as I mentioned before. We first look at the first letter, second letter, third letter, we can alphabetize things in this way. The cool thing is that Python does this not just for words, not just for strings, but does this for all sequences. And so when I wanna sort a list of strings, it's gonna use this dictionary,
start index zero, then index one, then index two formula. What about list of lists? It'll also do that. Index zero, index one, index two, and tuples the same way. And lists and tuples implement less than, greater than, equal equal in exactly the same way as well, where they're looking first index zero, then index one, index two, and so forth. Let's just check this. So if I have two lists, list one and list two, and I have 10, 20, 30, and then 10, 20, 15.
So Python is gonna say, if we ask Python, is list one less than list two? It'll say, no, it's not. Is list one greater than list two? The answer is true. Why? Well, check index zero, they're the same. So let's check index one, they're the same. Check index two, up, list two comes before list one. Thus, list two always comes before list one, list one comes after list two.
What if I wanna sort a list containing different types? So here I have my list is 20, B, A, 10, 30, and let's print, sort it on my list. Fantastic, oh, no, it's not. We get an error. And the error we get is type error, less than is not supported between instances of str and int, meaning Python is saying to you, hey dummy,
I don't know, if you give me a string, you give me an int, which is supposed to come first? I have no way of understanding or knowing that. Now, those of you who have used Python two might know that this actually did work in Python two. You were able to use less than, greater than and equal equal in Python two between strings and ints. This was changed in Python three because it was such a hard thing for people to understand
because it wasn't comparing them in a numeric way. It was saying all strings come after all integers. It doesn't matter what their values are. So if you knew the Python internal sort order for different types, it was fine. But clearly no one wants to keep that in their heads. And so in Python three, they said, if the types don't know how to compare themselves
to each other, that's it. We're gonna give you a type error. So don't try to do this or try to do it if you know that it will not work. Okay, what if I want to reverse the direction? What if I don't want from lowest to highest? I want from highest to lowest. So I'm gonna say my list equals 20, 30, 10. And now I'm gonna pass a new parameter. I'm gonna say reverse equals true.
And reverse equals true basically just swaps the value, swaps the order. So basically instead of being lowest to highest will be highest to lowest. And indeed we see here that we get 30, 20, 10. So far so good. But what if I want to sort a list of words, not the words themselves, not alphabetically, but by their lengths? This is where things get really interesting and really cool.
We no longer want Tim sort to be saying, is word one less than word two? Rather we want to be saying, is the length of word one less than the length of word two? Right now we don't want to sort the lengths here. Sometimes when people see this or probably say, oh, what you want to do is use like a list comprehension, turn all the words into their lengths and then sort those. No, no, no. I still want to get back a list of words,
but I want to get back that list of words, that list of strings sorted by length, right? It's like lining up children in a school by birthday or by height or something, right? So how can I do that? Well, we have the key parameter, the sorted function. And by the way, the sort method as well in lists allows you to define a key parameter
and it takes a function. And if we then say key equals F it's gonna say, oh, I'm gonna compare not A and B, but F of A and F of B. And so if I want to sort the words by length, I can call sorted with key equals length. Let's see that. So here I have my words and I'm gonna say print sort of words, key equals length. Notice I'm passing a function as an argument
to the key parameter, right? And this is only possible because in Python, functions are nouns as well as verbs. Functions are objects just like all other objects, except they have this added bonus, you can run them. And sure enough, now we get our words in increasing order by size. So what else can be a key function?
What can I set it to be? Actually anything, anything can be a key function in Python so long as it takes a single argument and returns a comparable value, returns a value that could be compared with less than and greater than for that matter. So here's some examples. If I say sort of words, key equals length, just like we did, I can sort the words by length. I'm using length to sort the words.
I can also say sort of numbers, key equals ABS. ABS is the absolute value. So I'm gonna sort my numbers by absolute value, ignoring whether they're positive or negative. I can say sort of words, key equals str.lower. And here is that case sensitivity that I alluded to a little bit earlier. Basically, if I run str.lower in a string, I'm gonna get back a new string, which is the same as the old one, just all lowercase.
So this allows me to compare the words without taking case into consideration. Meaning it's a case insensitive sort. Now, notice we can pass a method by passing as the class attribute. So I'm not gonna say s.lower or word.lower, assuming that s or word would be the instance. I have to pass it according to the class.
Now, a really common mistake that I see people make is they put parentheses after the key function's name. They're so used to calling functions. So they do something like this, where I say sort of numbers, key equals ABS parentheses. Don't do this, don't do this. First of all, it won't work because ABS requires an argument. And here you're getting an error saying there's no argument.
But second of all, remember that key expects to get a function. It doesn't expect to get the result of a function. So you really wanna say sort of numbers, key equals ABS, just like that. And we will indeed get back the numbers sorted in that way. What if I wanna sort a list of lists? Well, let's say I wanna sort them by length, shortest sublist and then biggest sublist.
Sure enough, I could say key equals len. That's right, just like I did with strings because len works on strings and lists in the same way. What if I wanna sort by the sum of numbers? Well, I can say key equals sum. And then what's happening is we're gonna sum up each of these sublists and then we're gonna order them from lowest sum to highest sum. But I can do even cooler things
if I write my own custom key functions. As long as my function takes an argument, as long as my function's return value is sortable, comparable, then we'll be fine. And remember, the value does not need to be the same type as the input. So I can sort strings by their lengths because the len function returns an integer. So let's see a few examples of this.
So let's say I have a list here of numbers, 500, 2100, 130, 1040. And now I'm gonna say def by digit count of n. Now, this is going to be my key function. This is the function that I'm gonna run on each element and based on its output, I will know, or Tim sort will know how to order things. So what are we gonna do?
We're gonna take a number, turn to a string, get its length. And now I can say print sorted numbers, key equals by digit count. Now I should add something here, which is my personal convention. I invented myself, I think, I hope, is to always say by when it's a key function. So I can then say I'm sorting by this or sorting by that. You don't have to do that, but you'll be really cool if you do.
Obviously your friends will love you as a result. So what happens if I do this? Sure enough, I get 130, 4,500, 100, 2,000, 1,000. I've now sorted the numbers by their digit count. But you might be saying, wait a second, numbers are always sorted by their digit count. And these are not sorted numerically. What the heck is going on?
And the answer is that Tim sort is what's known as a stable sort. That does not mean it only sorts horses. A stable sort means that if two items in the original list, all right, if two items in the original list will have the same sort value, they will stay in that order in the output list. So because 500 came before 100 here,
it will be before 500 before 100 in the output list. And because, well, I guess 2,000 came before 1,000 here, it's also gonna be there as well. So you have to remember it's a stable sort. Normally it's a great thing, but you can be a little confused and surprised here. What if I wanna sort the sub list by their means, right? By the new mathematical averages there. So what we can do is we write key function by mean,
and it'll take one list. It's gonna return the sum divided by the length. And now I can print sorted numbers key equals by mean. And sure enough, we get them sorted by their means from the lowest mean to the highest mean, if you know what I mean. All right. What if I wanna do something even a little crazier? I wanna sort the number of vowels per word. So I'm gonna have here words equals this here
is a fascinating, sincelaying text. And what am I gonna do? I'm now going to just find a new key function by vowel count. And I'm gonna ignore the print for a moment, we'll talk about that in a second, total equals zero. We're gonna iterate over each letter in the word. We're gonna take the word through its lowercase,
or count up the vowels and return that. And sure enough, we can do this. But this print here, why did I put this print here? If I put the print there, then I will see, don't do this in production, right? Don't do this in a real system. But for debugging and testing, it's great because I can see exactly which words are being passed to by vowel count. And so if I run this now,
we will see print sorted words key equals by vowel count. Look what I get. It's gonna show me all the words that were checked and shows that each word was checked once and once only. And then basically, we'll see that the words are indeed sorted by the number of vowels in there. Well, that's sort of file names by file length. So I'm gonna get a little help here. I'm gonna use the glob module and the OS module.
And I'm now gonna get a bunch of, what? Oh, sorry. Okay, ah, all right, so, sorry about that, folks. So by file length to file name, all right? So what are we gonna do? We're gonna get a file name, and we're gonna run OS stat on that to get its size. Fantastic. And now I'm gonna get all the text files in the current directory.
I'm gonna sort them by file length. And you don't know these files, but I promise you this actually happens to be the case. It's from the smallest file to the biggest file. I can sort the file names by the file's file count, where I can use the same sort of thing as I did before. But now I'm gonna go through the entire file line by line, go through each character in each line one by one,
count the number of vowels, and then we'll see here that I can actually count the, sort the file names by the vowel count. Well, let's get even fancier. What if I have a list of dictionaries, right? So here are some people, I have my children and me, they enjoy appearing in my slides or so, I think. And we have first names, last names, and ages. So can I sort this list of dicts?
Nope, can't do that. Why? Same sort of values we saw before. Python does not know how to use less than between instances of dict. It doesn't know how to compare dictionaries in this way. But we can use a key function. So let's say I wanna sort these people by age. I'm gonna define a key function by age of D,
and then I'll just return the age value, the value sorted with the age key from there. And now if I say print sorted people, key equals by age, sure enough, I get the list of dictionaries back, and they're sorted by age. What if I wanna sort by multiple keys, like last name and then first name? Well, then what I'm gonna do is I'm gonna have the key function return tuple
because Python knows how to sort tuples. And so what I can do is I can say return by first last, return first the last name, and then the first name. And sure enough, if I now say print sorted people, key equals by last first, what am I gonna get? I'm gonna get the people sorted first by last name, then by first name. So you'll see here I come first because my last name is shorter,
and then my children all have the same last name, but then sorted by their first names. Pretty snazzy. Now, the thing is, why should I write a key function? I wanna be lazy, right? I can use Lambda. Lambda creates an anonymous function. And as long as you are passing a function object to the key parameter, you're fine. So here I'm gonna use Lambda,
and it's gonna do exactly what I did before. It's gonna take a dictionary, and it's gonna return a tuple with last and then first. So it will sort in exactly the same way. But you know, I can do it even better. A more Pythonic modern approach is to use operator.itemgetter. And it's a function that returns a function. So if I wanna sort people by age,
I can just say, hey, key equals operator.itemgetter of age. That will return a function that will indeed invoke square brackets age on each element and thus allow us to sort them that way. And we can even do better or differently. We can say key equals by first last to sort by last name and first name. Or I could say operator.itemgetter last and first, and then it's first gonna do last name
and then gonna do first name. Exactly what I want. But what if I have my own class? Well, if I wanna sort objects. So here's a class person, exactly the same data as before, just a class rather than a list of dictionaries. And now I have a list of instances of person. And if I wanna print sorted people, what am I gonna get? That's right, another error. Why?
Because less than doesn't work on this. How are we gonna change that? We're gonna implement less than. That's right, the dunder lt method allows me to do that. And now I might as well add eq, dunder eq to provide for equal equal as well. And if I do that, now actually I can sort. You wanna sort by last name and first name,
fine, we can do that too. But wait a second, what about if I wanna sort by, I wanna check greater than and equal? Because we've defined less than and we define eq. What if I try to say, people one is greater than equal people zero? No, I won't do that. Because maybe I've defined less than, but I've not defined greater than equal. What are you talking about?
This is where total ordering comes in, folks. I can use from func tools, the func tools module, I can use the total ordering decorator. And total ordering will use lt and use eq and build up all of the rest of the comparison methods automatically on your class. Pretty darn cool. So now if I say, hey, is people one greater than equal people two?
Oh yeah, that's fine, that's true. But why should I work so hard? In Python 3.7, we introduced data classes and they reduce our code for grading classes by a lot. And they automatically handle sorting. Watch this. From data classes, import data class. And now here's my class. At data class, order equals true, class person,
firster, laster, age int. So I need to import data classes and data class. I need to apply the decorator. I need to say order equals true. That doesn't happen by default. But then it's gonna sort according to these attributes. So how's it gonna sort them? Let's take a look. Here are my people and let's print them out.
And we're gonna get, that's not sorted at all. Or it is sorted, but how? Guess what? It's sorted by first name, then last name, then age. Why? Because that's the order I specified the attributes in. So let's redo this a little bit. Let's say I'm gonna have now the attributes be last, then first, then age.
And of course, because the order changed, I'm gonna need to change how I create my list of people. And now when I sorted, sure enough, it's gonna sort first by last name, then by first name, and then by age. So it works just great. So Python sorted function lets you sort anything. And the key, ha ha, is the key function.
Right, any function method that takes a single input returns a sortable output is fair game. You can write your own, you can use lambda, you can use operator item getter. Classes are sortable. All I have to do is define a few magic methods. And if you use data classes, it's even easier than that. Cool. We covered a lot of stuff. I'm very happy in my few remaining minutes to answer questions.
And let me know. Thank you so much. That was actually very insightful. I enjoyed that a lot. I wish I could get this, ah, here we go. Here we go, actually, if I hit this button. There we are. There we are. There we are. Oh, I thought there was noise on the line for a moment.
Thank you, that was very impressive. There we are. I actually do have a couple of questions here, and we'll see how many we can get through. So one person, Venecius says, is sorting on the database faster or more efficient? Oh my God, if you're using a relational database,
please use the database to do your sorting. It's gonna have, like, unless it's a really bad database, but like, it's gonna do things for two reasons. First of all, it's implemented in C almost certainly. So it's just gonna run faster. And second of all, it's probably got all sorts of like extra indexing thrown in there.
But the third reason is you don't wanna pull all the data from the database into memory in Python, into Python data structures, and then sort it. It's much better just to get it sorted in advance. Okay, Pascal says, awesome, Reuben. Do you know if key functions can be used to sort data frames as well?
Ooh, yes, but not in exactly the same way. So you're like, so pandas data frames, they're sort of like Excel spreadsheets, although soon Excel will be saying it's sort of like pandas, right? But basically, you can sort in all sorts of different ways. I know that you can apply functions there. I can't remember if the syntax is exactly the same. It's not exactly the same as key functions in sorted,
but it's really close and similar. All right, Israel asks, how would you put multiple sort options on a class or a data class? I don't think there's a good answer to that, unfortunately. I mean, you would need to really use a key function there
and sort it. You might be able to have special methods, right? Because you can define the main methods that you want in your class or even on a data class, but then you would have to specify it. Sort is gonna assume that whatever the objects less than, greater than, equal to. I mean, I guess if you really wanted to be sort of weird,
you could have like some sort of attribute that you could switch off how you want to do sorting, and then the LTE, EQ and so forth would switch off accordingly. But I don't know if I'd recommend that. Okay, and then actually one more I have from somewhere else. And if anyone else has any that they want,
if they want to ask anything else, there's the talk, Sorting Anything. And you can ask additional questions there and chat with Reuven. So the last question then is, and it's a little bit outside the realm of Python, but have you heard of dual pivot quicksort?
I have not, but now I'm gonna write it down and find out about it. Excellent. All right, so I think that's all we have. So once again, thank you so much, Reuven. Thanks, everyone.