From days to minutes, from minutes to milliseconds with SQLAlchemy
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Subtitle |
| |
Title of Series | ||
Number of Parts | 118 | |
Author | ||
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this | |
Identifiers | 10.5446/44840 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
| |
Keywords |
EuroPython 201931 / 118
2
7
8
15
20
30
33
36
39
40
45
49
51
55
57
58
63
64
66
68
69
71
74
77
78
80
82
96
98
105
107
108
110
113
115
00:00
Receiver operating characteristicGoogolPoint cloudSoftwareJava appletStack (abstract data type)Regulärer Ausdruck <Textverarbeitung>Formal languageObject-relational mappingRelational databaseObject (grammar)Axiom of choiceDecision theorySoftware frameworkCodeVolumeQuery languageData modelCASE <Informatik>Product (business)Expert systemMusical ensemblePhysical systemTable (information)CASE <Informatik>MereologyDatabaseGoodness of fitAttribute grammarSoftware frameworkInstance (computer science)Roundness (object)Social classNetwork socketQuery languageJava appletCodeLogical constantComputer configurationPattern languageRow (database)Object-relational mappingCartesian coordinate systemDecision theoryConnected spaceNumberFlow separationNormal (geometry)Endliche ModelltheorieAxiom of choiceCore dumpProgrammer (hardware)Information securityDirection (geometry)Mathematical optimizationDesign by contractService (economics)Traffic reportingConstructor (object-oriented programming)Bit1 (number)Pay televisionVolume (thermodynamics)Product (business)Formal languageMultiplication signProcess (computing)Arithmetic progressionAdventure gameFrame problemWebsiteParameter (computer programming)Scaling (geometry)Mathematical analysisLecture/ConferenceComputer animation
08:18
Product (business)Data modelQuery languageObject-relational mappingMultilaterationEndliche ModelltheorieNumberPay televisionElectronic mailing listAuthorizationPrime idealOperator (mathematics)Series (mathematics)CodeInheritance (object-oriented programming)BitSocial classComputer animation
09:56
Data modelQuery languageProduct (business)Digital filterObject-relational mappingIntegerCone penetration testSocial classOrder (biology)Network topologyNumberSeries (mathematics)Query languageProduct (business)Endliche ModelltheorieFilter <Stochastik>DatabaseKey (cryptography)Inheritance (object-oriented programming)Attribute grammarFunctional (mathematics)Object-relational mappingComputer animation
11:16
AerodynamicsIntegerSeries (mathematics)Product (business)Data modelQuery languageInformationKey (cryptography)Cache (computing)Mathematical optimizationDifferenz <Mathematik>Traffic reportingView (database)Series (mathematics)Social classLevel (video gaming)Parameter (computer programming)Attribute grammarPoint (geometry)Differenz <Mathematik>Table (information)Set (mathematics)DatabaseElectronic mailing listIntegerStandard deviationRegular graphInstance (computer science)Order (biology)Key (cryptography)Multiplication signObject (grammar)Declarative programmingCartesian coordinate systemQuery languageOperator (mathematics)Process (computing)MappingResultantDiscrepancy theoryGraph (mathematics)InformationCodeQuicksortDynamical systemComputer animation
17:02
ClefPrincipal idealRoundingSeries (mathematics)CalculationNetwork topologyTotal S.A.QuadrilateralFrequencyMiniDiscCategory of beingOpen setQuery languageDigital filterCountingCalculusAsynchronous Transfer ModeMoment (mathematics)Point (geometry)Filter <Stochastik>Instance (computer science)InformationDatabaseSummierbarkeitWordSocial classLoop (music)Object (grammar)FrequencyQuery languageCodeSeries (mathematics)SpacetimeAdventure gameMeasurementNumberRight angleWritingRow (database)Category of beingSystem callMusical ensembleWeb pageFunctional (mathematics)Computer animation
20:30
Query languageProduct (business)System callStatement (computer science)Cursor (computers)Parameter (computer programming)Physical systemEvent horizonSummierbarkeitDigital filterDecimalSimulationState diagramRandom numberInheritance (object-oriented programming)DatabaseError messageSummierbarkeitGroup actionFunctional (mathematics)Right angleMereologyOrder (biology)SequenceRepresentation (politics)Query languageRow (database)Filter <Stochastik>ExpressionNumberSystem callGraph coloringScalar fieldCASE <Informatik>Computer animation
23:31
FrequencyIntrusion detection systemDigital filterNetwork topologyDesign of experimentsTotal S.A.Query languagePermianOperator (mathematics)File formatSeries (mathematics)Principal idealIEC-BusQuadrilateralExpert systemRing (mathematics)PRINCE2Cache (computing)Pay televisionBulletin board systemData storage deviceExecution unitMilitary operationSpacetimeSource codeFinite element methodSimultaneous localization and mappingTimestampSummierbarkeitCalculationOperations researchProduct (business)Line (geometry)Row (database)NumberResultantEqualiser (mathematics)SummierbarkeitBitMultiplication signOperator (mathematics)Social classFilter <Stochastik>System callType theoryDatabaseDifferenz <Mathematik>Source codeEnvelope (mathematics)FrequencyObject (grammar)Computer animation
26:25
Product (business)Digital filterNetwork topologyPrincipal idealMilitary operationData modelQuery languageOperations researchCalculationSummierbarkeitTimestampFrequencyIEC-BusQuadrilateralDecimalElectronic data interchangeTable (information)Content (media)Personal identification numberPhase transitionCommon Language InfrastructureSimultaneous localization and mappingSeries (mathematics)Order (biology)Database transactionString (computer science)Pay televisionLoginCone penetration testSoftware testingIntegerEmulationLattice (order)State of matterMiniDiscSource codeData typeChi-squared distributionGUI widgetKernel (computing)Cellular automatonView (database)Menu (computing)Row (database)FrequencySeries (mathematics)Operator (mathematics)Product (business)Order (biology)DatabaseRight angleState of matterParameter (computer programming)AverageFactory (trading post)Query languageMathematical optimizationObject (grammar)Functional (mathematics)SummierbarkeitTransport Layer SecuritySystem callResultantRevision controlFilter <Stochastik>File formatKey (cryptography)Variable (mathematics)Group actionQueue (abstract data type)Multiplication signFunction (mathematics)Client (computing)Level (video gaming)Attribute grammarSelectivity (electronic)Special unitary groupParallel portGraph coloringIntegerDivisorComputer animation
34:04
TimestampMilitary operationPrincipal idealSeries (mathematics)Query languageEvent horizonException handlingProduct (business)Local ringThread (computing)Electric currentNetwork topologyCategory of beingAddress spaceProof theoryDigital filterPort scannerDiscounts and allowancesPoisson-KlammerFrequencyPRINCE2DecimalLogarithmThresholding (image processing)Cache (computing)File formatReal numberTotal S.A.InformationMaizePrice indexQuadrilateralExecution unitCountingTunisRAIDPay televisionLattice (order)Interior (topology)Local GroupOperations researchBlock (periodic table)Maß <Mathematik>Element (mathematics)INTEGRALTraffic reportingCountingSeries (mathematics)Multiplication signCache (computing)State of matterCalculationInsertion lossMereologyRound-off errorDatabaseRight angleRange (statistics)InformationNumberSelectivity (electronic)Physical systemLoop (music)Table (information)Parameter (computer programming)Query languageDifferent (Kate Ryan album)Operator (mathematics)Mathematical optimizationSet (mathematics)Dimensional analysisSummierbarkeitComputer animation
39:32
Category of beingIntegrated development environmentNetwork topologyElectric currentWindows RegistryAerodynamicsSeries (mathematics)Physical systemUser profileDigital filterMultiplication signMathematical optimizationWeb pageSeries (mathematics)DatabaseRow (database)Arithmetic meanProfil (magazine)Phase transitionMusical ensembleCategory of beingDynamical systemAuthorizationArtistic renderingMereologySystem administratorInformationBitCodeQuery languageComputer animation
41:30
Context awarenessNetwork topologyDigital filterException handlingAttribute grammarDefault (computer science)Parameter (computer programming)User profileInclusion mapCategory of beingComplete metric spaceEmailFinitary relationIntegerProxy serverAssociative propertyIntegral domainAuthenticationLemma (mathematics)Query languageCache (computing)Profil (magazine)DatabaseMereologyComputer fileElectronic mailing listMultiplication signQuery languageProgrammschleifeStructural loadLattice (order)AuthenticationComputer animation
42:57
AuthorizationMathematical optimizationFunction (mathematics)Regulärer Ausdruck <Textverarbeitung>Observational studyQuery languageExpressionCategory of beingFunctional (mathematics)Tracing (software)MetreDatabaseGroup actionFilter <Stochastik>BitMultiplication signWindowWeb pageCartesian coordinate systemSystem administratorComputer fileProgrammer (hardware)Selectivity (electronic)Context awarenessQuery languageData loggerObject-relational mappingSummierbarkeitReading (process)Computer animation
45:01
Lecture/Conference
Transcript: English(auto-generated)
00:07
Okay, so I'm Leo, I'm a tech lead at Jirou, which is Fintech in Brazil. It's a company that provides credit for people who needs a quick loan.
00:22
I'm no expert, not a particular expert in SQL or SQLAlchemy or Amazon in general, but I've learned a few lessons optimizing some code in my company last year that I would like to pass along.
00:40
So Jirou, as I said, is a Fintech in Brazil. Our stack is mostly Python, Pyramide, SQLAlchemy, Postgres. There is some MongoDB, some Celery, some Java somewhere. We have been carving out this monolith that we started it with in different services, and so some new services get written
01:02
in some different languages, but most of the time, we end up picking Python, Pyramide, SQLAlchemy, and Postgres. And I really like SQLAlchemy. Well, it has two aspects. It's the core, which is basically a DSL
01:22
for constructing SQL queries using Python constructs. And then on top of that, you have the ORM, which helps you map tables to classes and records of those tables to instances of those classes.
01:41
And for a programmer who mostly knows Python, it's obviously a lot more comfortable dealing with the instances of an ORM than it is juggling SQL. And we at Jiro use the ORM most of the time,
02:05
and we tend to use very little the SQLAlchemy core for writing SQL constructs directly. I think SQLAlchemy is awesome. However, whenever you choose a framework,
02:20
it's usually good if you know what the framework is doing behind your back, because frameworks still require you to make decisions about how to use them, and knowing the underlying patterns is essential if you're to make wise choices. So using the ORM is really comfortable. As I said, you declare your class,
02:43
you get instances from it, and then you're manipulating it with the attributes and the methods. Like, there is no SQL database behind it, and that's really comfortable. But it's also a problem, because the database is an external system.
03:01
Most of the time, you should actually treat it as if it were an API of a foreign system. Because it is, really. It is an external system you're talking through, through a TCP IP connection, or a Unix socket connection. And the API to that just happens to be SQL.
03:25
So, what happens then is that you're writing your Python code, and it looks like perfectly normal Python code, but ends up doing bad performance access to a database. It's noticeable in low volumes,
03:42
like when you're developing, or when you're just going to production, so you get lulled into this false sense of security that everything is going fine. But then after a while, your database starts to crawl, your application takes too long. The fix for that, of course,
04:01
is to let the database do its job. You need to be aware of the implicit queries that your ORM is doing whenever you access your instances in your classes. Especially when you have relationships between your instances that map to different tables in your database.
04:20
Those are the ones that tend to cause the most bad pattern of accesses. You should try to do only, approximately only one, or a constant number of queries to your database per HTTP request,
04:41
or API request, or background job. And you should avoid as much as possible looping through instances of your model in Python code. Because the database is a lot better at doing that. You should also be mindful of the amount of round trips you do.
05:01
You should try to do only a fixed number of queries per request, because every round trip costs time. But you also should be mindful of the amount of data you're pulling out of the DB when you do some requests. So I'm gonna talk about specific cases here that I've optimized.
05:22
We had this report that we ran, that we still run about once a month. And in the beginning it was really fast. As years passed, it was taking over 24 hours to run. So let's talk a little bit about JIRAU.
05:41
I don't know if it's readable, but JIRAU is a credit company. It provides loans for people who access our website. We do everything online. You snap your documents, and then after you're happy and do a credit analysis, we send money to your bank. And then you pay back your loan by paying bank slips,
06:03
and you never have to actually face, you can actually, we are designed so that you don't actually, you cannot actually visit us to talk about your cases. Everything is done online. And an early funding model for us to get money to lend to people
06:20
was that JIRAU created this funding company, separate funding company that issued debentures. Debentures is like an official loan from a company that the market can buy to give money to the company without becoming a partner, without it being a stock option. So the company would issue new debentures every six months,
06:46
and these debentures had a contract that said that whatever the loans, the borrowers would pay back, would be the payback for the people who purchased the debentures. So the company was never insolvent.
07:01
Of course, we still want to have a good credit model, otherwise people will not want to buy our debentures. So we issue debentures every six months, and debenture holders buy these debentures. They put money in the company. And so we grant loans with that money, and the borrowers pay back those loans.
07:22
And at the beginning of every month, we'd look at what we got from and pay back once a month the debenture holders. Of course, it's a lot more complicated than that, because part of the money is the amortization, which is the money that the debenture holder
07:42
actually lent us. And then on top of that, you have the premium, which is what you are paying on top of what the debenture holder put in so that they can be happy with their investment. And taxes affect only the premium, not the amortization. And taxes, you pay less taxes, the longer it takes you to pay amortization.
08:02
So we do this with the numbers, like we are only paying amortization for a while instead of paying premium. Then we save a little bit, because in Brazil you cannot only pay premiums, so we have to save some of the amortization, and then start paying back the premium, and then paying the amortization last,
08:21
so there's a whole list of numbers. But that only happened in later debenture issuances. The first debenture issuances, we were paying back from the principle of the loans that are granted to the amortization, and from the premium of the loans
08:41
that are granted, the premium of the debenture holders. And the entity relationship model looks somewhat like this. You have the debenture holder, the debenture belongs to the debenture holder and to the debenture series. The debenture series has an account.
09:00
All the operations from borrowing and the payout come from this account, so we have a bunch of operations. And if that sounds awfully complicated, don't worry, it gets a lot worse. So let's look a little bit at the code. How do we actually code that?
09:22
So first you have this base class from SQLAlchemy. We declare it by creating our own base class, which you called here ORM class. And we added a convenient class method here.
09:40
It's actually class property, yeah. So what we do here, we take the SQLAlchemy DB session and make it easily available inside the class so that you can say, oh, I have my model dot query, and then you can apply filters,
10:01
order by, joins with other classes, and things like that. So it's a little convenience attribute in the class. And so this base here, there's this method, this function from SQLAlchemy that makes this, that wraps this ORM class,
10:21
and you have this magical base class here, which we then have to use in all the models that we declare. So like we said, the debenture, which is the thing that we sell to the debenture holders, which is like the loan that debenture holder is granting to us. It has its database key.
10:41
It has a serial number, which is very different from the ID because the serial number repeats inside every debenture series, whereas the ID is constantly different for all the debentures, no matter what.
11:00
There's a sale price because if you buy at the beginning of the series, you buy it close to the official debenture price, but if you buy it later, it costs you more because it's the price of opportunity for being late to the party. And so we have a sale date.
11:24
Here we are continuing the same class. So those here were regular attributes of the class which get mapped to columns in the database in a table called debenture.
11:44
Here we are talking still about the same class, but here we see some relationships. So we have this holder ID, which is also a regular column, a regular integer column, but then we add a declaration that it's a foreign key into the ID column of the debenture holder table.
12:03
And on top of that, we declare the holder relationship, which has as foreign keys the holder ID that we just declared. So this way, I get a holder attribute on my debenture instance,
12:21
which fetches me the debenture holder instance transparently for me, but this transparently means it's doing a query to the database at least once. During the session, SQL Academy actually will cache that instance so that it doesn't fetch it again every time I access the attribute. But still, I need to be mindful.
12:41
The first time I access this holder attribute, there's gonna be an SQL query, unless I play some tricks on it. So as I showed in the graph, the debenture also has a relationship to the series, to the debenture series that is issuing this debenture.
13:04
And again, we have a series relationship mapping to this column. There's another interesting aspect here, which is this back ref. We saw it here at the holder declaration as well.
13:21
What this does is create a debentures attribute in the debenture holder class that points to an iterable, kind of a list, of debentures that I can conveniently access
13:40
from the point of view of the debenture holder. Same thing here in the debenture series. I get a debentures attribute there, which is an iterable of debentures that points to that debenture series. There's this lazy dynamic here.
14:01
What does it mean? That actually tells you what kind of iterable it is. If you don't say anything, it will create lazy attributes that the first time you access it will make a query to the database and bring back a Python list. But if you do like I did here and say dynamic,
14:21
then instead, the debentures attribute inside the debentures series will be a query object, pretty much like this query object here that allows me to apply filter, order by, and other things like that.
14:41
So by doing these dynamic relationships, I enable the application to lazily get a query object, apply other operations on top of it, like filtering, ordering, joining with other relationships, and only then,
15:04
only when I try to iterate over it, it goes to the database and fetches the data. So I had to debug an issue that was found by the financial people that we were not paying exactly what we should be paying the debenture holders.
15:22
There was some discrepancy, and I started debugging this hours-long report. It took hours because, well, the report itself took about five hours, but it also depended on another process that cached some information, and this other process also took about four hours,
15:41
and that was for each debenture series that we had, and by the point, we had about six debentures series, so the whole run of reports took more than a day. And I was starting debugging that and saying, well, this is taking too much time, so I enabled SQLAlchemy to do debugging for me.
16:03
If you take the standard Python logging, you take the SQLAlchemy engine logger and set the log level to info, it will log every query. If you set it to debug, it will log every query and the results, and when it logs the queries, it logs the parameters that are used by the query as well.
16:23
So I enabled the logging and started running the report, and suddenly, I'm seeing gobs and gobs of the same query repeated over and over and over. So, what I did,
16:40
let me find the diff here, so let's see some code. That's not readable, right? The people in the back, can you read it? Nope. How about now?
17:02
Nope. The people in the back can read it? Okay, now you should be able to read it, right? Okay, so let's see if I can show the things. I think I'm gonna switch to mirror mode.
17:32
So, this is gonna be kind of hard
17:52
to fit all the code in this space, but the kind of things that were being done here, so we have this total paging.
18:01
What does this method here do? It gets all the money that was paid in the debentures of that debenture series. This is a method of the debenture series class. And so, you can call it by specifying a period, if you want, and it gets the debentures that are owned by someone,
18:21
the debentures that were actually sold, and this is a property that brings the debentures related to this debenture series that actually have owners. And it's adding filters here, if you pass the start date or end date, and then it's doing some of the sale price
18:43
of all these debentures. The moment where the code actually goes to the database to fetch is the moment where iter is called into this object. And what this is doing here is taking a huge number of debentures,
19:03
pulling all their columns from the database just to sum their price. Now, when you look at this, this is perfectly reasonable Python code, right? You're summing the sale price of a bunch of debentures. That's exactly what you want. But here, you are pulling a huge amount of information
19:20
from the database just to get the sum of what is essentially a column. And if we had done something, if instead of sale price, we had done something like d.series.saleprice, then not only would I be fetching a bunch of queries,
19:43
a bunch of information with all its columns, for each point in the loop, I would be fetching another record in the database with all its columns and then summing it. This is called the n plus one select problem. So instead of doing that, what I did here, I created this function
20:02
called getColumnSumForQuery. What's the query? The query is debentures, so you see it's the same word here and here. But then I'm going into the class instead of the instance to get the sale price column. So I did not actually have to write SQL by hand
20:23
to have SQLAlchemy do a performant query for me. What does this getColumnSumForQuery here do? It's right here. First, it gets the query, it assumes the iterable is a query, it drops whatever ordering it has, I'll explain later why,
20:44
and replaces all that it was going to fetch with a single expression, which is this coalesceSum of the column. Why does coalesceSum? It's right here. It's using SQLAlchemy func sum,
21:01
which is a representation of the sum function of the database around the column. Now, this doesn't need to be an actual column. This could be a column in our expression or anything that feels like a column to SQLAlchemy. So it's using the sum from the, the sum aggregating function from the database,
21:22
and then it's calling coalesce with zero to it. Why is it doing that? Because in SQL, if you do a sum of a bunch of records, of a column of a bunch of records, but this sequence of records has no records in it, instead of getting a zero, you get a null.
21:42
Or if you have records, but all the columns in those records are null, you're gonna get a null back. But most cases, when I want to use a sum, I actually want a zero back in those cases, so I create this coalesceSum function to return a sum of a column or a zero if there is null in there.
22:02
Please ignore the filters and label for a while. We are going to go back to it later. So I replace all the entities in the query with just the coalesceSum of a specific column. And if you remember it, it was debentures.seo.price.
22:23
And now the reason why I drop whatever ordering the query had is because since I'm calling an aggregating function, either I have to have a group by clause on the SQL, or whatever ordering I have, whatever ordering I have needs to be part of this group by,
22:43
or I cannot have an order. So I make sure there is no order in this query, especially since I'm not using a group by here, in this case, which means that the database will return a single record, and in this case, a single record with a single column, which is the sum I'm asking.
23:02
Because of that, I'm calling the scholar method from SQL, Alchemy, which does exactly that. It gives me the value, not necessarily a number, but the value that is in the single column of the single record of the query that I just did. If the query is not a single record
23:20
with a single color, it raises an error. But this is a very convenient function when I want just a number, which is a sum of a column. So coming back here, instead of looping through a huge number of objects, with all their columns just to get the sale price,
23:42
I'm asking the database to bring me exactly the sum that I want. And I got that in the diff with a single line that's very readable. So in this calculate total paid out, which is everything that I have already paid
24:02
to the dementia holders, they have the same issue. But here, instead, they were looking at the payouts, which were the operations of payment operations. And then I replaced with the same thing. Going back a little, going under a little bit more, there was this whole method here.
24:25
It was called calculate total values involved, which was doing a huge amount of those things, getting the sum of a bunch of operations. These was taking a very long time. And the operations were all a result
24:41
of calling these methods here. Payback operations, earning operations, earning tax operations, and things like that. So when I looked at those methods,
25:02
let's search for one of these methods here, for example, earning operations. So what it was doing, it was looking at self.operations,
25:22
which is one of those query relationships that I showed. Filtering by specific tag, filtering it by some specific dates, and returning them. And all these operations, your fee operations, earning operations, earning taxes operations, were doing the same thing.
25:41
So what I did first was to factor out the data filters and create this class that's really just a record that's collecting filters. So a payback filter is this criteria, source type equals note,
26:00
tags equal note payback, a zero fee is this filter, earnings is this filters, et cetera, et cetera. And what I did with those was I refactored those other methods to get those operations, call these operations in period, which is where I figured out the data queries,
26:24
the data filtering of the operations, and replaced those with the filters applied. Why is that useful?
26:41
Because then I could also do this other method here called calculate operations summary, which returns a single record. What record is that? It takes all those operations in the same period,
27:02
ordered by none, replace all the entities with the summary columns. Why I call them summary? Because just like a bestiary is a collection of beasts, a summary is a collection of sums, and that's exactly what this method does. It takes a coalesce sum of the same columns
27:24
that we were looking at, giving them labels that are like the variables that were being collected in that method above, and returning those as the columns that I'm gonna put in the query.
27:43
And because this is a coalesce sum and I'm not doing a group by, the result of this query is gonna be a single record, which is why I call the one method that returns that single record. So when I use this method here,
28:04
I get back this operation summary, which is a record that has as attributes the labels that I pass to the columns. So when we get here,
28:20
that's why we have this label parameter here in coalesce sum so that I could give that specific label to my summing column. And what is this filter here? There is this characteristic in SQL where if you have an aggregation function in a column
28:43
like sum or average, you don't necessarily need to do that over the whole of the records, or the whole of the records of a group. You can actually apply filter filtering. What does that look like?
29:04
So that is unreadable, right? Is it readable now? Okay, one more, just to be sure.
29:23
So yeah, this is nice to show as well. I have some of those models that I declared. I created a dementia holder, added it to the database, flushed it so that it has its primary key.
29:41
And here I show what a query for a dementia looks like. In SQL alchemy, you can print a query and it gives you the stringified version with the parameters there in place. But it's nice to know what the parameter is gonna be,
30:02
so I created this function that formats and colorizes output and tells me what are the parameters. So if I do a debenture query, but filtered by created bigger than today, and holder equals a certain holder,
30:20
that variable that I created just above, it generates this query to the database. So we can see that it has created here and the dementia holder. And I can see what parameters is going to pass. So created is a date, time with this value.
30:43
And the parameter one here, which is the dementia holder, has this integer value here, which is the primary key of the dementia holder, that was doing it as the filter. Now, the nice thing about here is that I'm actually comparing the relationship object,
31:00
not the ID of the object here, but it translates me into comparing the ID in the query. So here I created the debenture, added to a series. The debenture series object is actually complicated for relationships, so I created a factor for it.
31:24
And here I can check that the holder ID, the debenture holder, is exactly the ID of the debenture holder that I created. So when I look at the debentures of the debenture holder, just like I said, it's a query that selects
31:41
all debentures that match that debenture holder. If I also filter by debenture state and sale date, then I get this other query here with all these parameters replaced. So going back to our optimization,
32:02
the debenture series operations is this query over operations here. And when I ask for giruffi operations, which was that method, it's the same query over operations, but with added filters in the where clause, right? That means, okay.
32:24
So here is what I did with the operation summary. I replaced all the columns in the operations with those columns, and since the formatter wasn't very good, I did my own formatting here.
32:42
Let me show that instead. So instead of, since I added those filters at the column level, instead of putting those filters at the where clause, it's actually putting those filters along with the sum.
33:00
This is very useful for things like, I want to know the percentage of clients that have a certain characteristic over all of the clients. I can do that by doing a sum filter and dividing that by the sum without the filter, and the database fetches that for me. I don't need to do two queries, one with the where clause and one without,
33:20
and then dividing it at the Python side. I can have the database do that for me at the database side. So here we have the operation summary. It creates a bunch of columns which have these filters and doing the respective sums for me.
33:41
So here we have payback operations, and here we have, so you have these tags here and these tags there here, so we have no paybacks and fees and earnings and things like that. So it's filtering them at the select clause.
34:04
In the where clause, it's just making sure it's selecting the operations with the right state belonging to the right account and in the right time range that I asked. So with a single hit to the database,
34:20
I selected subsections of all the operations, did a bunch of calculations for them and got exactly the information that I wanted. And with these kinds of things, I got completely rid of the cache and brought down the time of the report from nine hours per series down to four hours per series.
34:44
Why four hours? Because the rest of the time, it was not just a report. It was actually inserting the debenture payments into the database because next time I wanted to run this report, I wanted to run a difference and anything that was not collected correctly or any rounding errors should be paid in the next month.
35:05
So to optimize the insertion, I had the system, let me locate it here.
35:27
So the report already had some optimization done before
35:43
which was to calculate the insert query for each debenture payout manually. But it was still calculating one insert per debenture every month. Of course, most of these inserts
36:06
they look exactly the same because all the debentures bought by a debenture holder on a certain date have the same calculations. So they were caching the calculations
36:22
by sale date and holder ID, but still they were looping through all the debentures. And then creating those inserts one by one. Instead of that, what I did was to loop through only each integralization.
36:43
An integralization is a set of debentures bought by a holder in a specific day. So I created this criteria which is the integralization columns which is the holder ID and sale date. I'm adding a count of debentures
37:01
so that I can do the proper calculations. I'm joining the debenture holders and the series here so that it fetches everything at the same time. And I'm asking for a distinct query so that the database only fetches one debenture
37:20
per integralization. And because of the distinct, I need this ordered by the integralization columns and the debenture series number. What does the query look like? So because I asked for a distinct,
37:40
SQLAlchemy rendered this as this distinct on query here and then selected all the information from the debentures but only of one debenture per integralization. And then it joined that information, it's kind of hard to read here, but it joined that information
38:01
with the summary information I needed like the sum of the debentures paid and things like that. So it goes with a single hit to the database, fetches all the information and then I can loop and it's all calculated. But the most important part here is that instead of calculating the inserts by hand,
38:22
I get the debenture payout table and ask for an insert query. And then instead of a regular insert, I do an insert from select. And I select from the integralization debentures, replacing the entities with the columns
38:41
that I need to populate one debenture payout. What does that look like? So here I do the same query and what it does, it does an insert into the debenture payout,
39:01
all the columns that I selected and what it's inserting is a select of all those parameters. Some parameters are constant, some other parameters aren't from the select. And then it's mapping all those information. So instead of doing inserts one by one,
39:21
manually written in SQL from the Python side, I'm actually asking the DB to do the inserts for me. And that brought down the report time down to 15 minutes from nine hours per debenture series.
39:40
The last optimization I did, and this is the one that justifies the from minutes to milliseconds, is I optimize a page rendering based on how much time it was wasting authorizing the user. Let me find here.
40:05
So during the authorization phase, while rendering a specific page, it was checking if the user had permission to see certain bits of information.
40:22
This is server-side rendering, so it's not so fun to do these days. But it was doing user.admin.hasPermission, except that admin is actually a property that does a query to the database. So it was doing a query to the database every time and then asking if, and getting an ORM record
40:42
and asking if it has permission, which also goes to the database to fetch if that permission belongs to the user. It was doing that all the time. So just looking at this code, I already said, well, just pull that hasPermission out.
41:02
But the biggest part of the optimization actually happened. Well, I replaced the property by reify, which caches the property and fetches only once. But also, instead of having those profiles
41:26
be dynamic queries, I did the opposite. I pulled all profiles at once, also through our SQL document relationship. So we have this all profiles relationship here
41:41
that is not a dynamic query. It goes to the database once when it is accessed and pulls a list. And so we have a Python list every time you access it. And then I created a get matching profiles, which is a Python file that loops through this list. Now the thing is, this is a tiny list. The roles of our user is a tiny list.
42:02
So it actually makes sense here to pull the calculation out of the database into the Python side, because I'm minimizing the amount of data that I'm circulating. And the other big part here is that
42:24
while fetching the user for the authentication, no, it's not this, let me see, oh, it's here. While fetching the user for the authentication, I'm actually instructing SQLAlchemy
42:41
to join the load, all the profiles, and join load of those profiles, all the permissions. So it does a single query pulling the user, all its profiles and all their permissions in a single hit to the database. And now when I access the properties
43:00
of this linked data structure, I'm not going to the database anymore. And that brought down the page that took minutes to render down to milliseconds. So in conclusion, and yes, there is a conclusion, okay.
43:25
Yes, to figure out what was happening, I used this, there's this Python package called slowlog, which is good for WSGI applications. When a request is taking too long, it starts dumping stack traces into a log file.
43:40
It's perfect to see what is wasting time where without you having to be there when things go slow. You can just go to log file later and retrieve it. And then I saw that that function there with the admin was the slowest one. So in conclusion, ORMs are very nice
44:04
to get started if you're a programmer and not very familiar with SQL. It's a good way to get started. I'm not dissing on ORMs here, and I love using ORMs. But you should understand your SQL. Read the select documentation of your database and try to understand what every bit does there.
44:22
I figured out that you could apply filter where inside the sum columns by reading the select documentation. Understand group by and aggregations and how aggregations reduce the cardinality of your functions. Learn about aggregation functions with filters. Learn about distinct on and window expressions
44:41
that will help you write SQL that fetches things very efficiently. And then study SQL alchemy. Be aware of the underlying queries that it does. Push as much work as possible to the DB, but not too much because sometimes your query is gonna spend hours. And that's it. Thank you very much.
45:13
But I'll be around if you have any. Just come talk to me.