We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Boosting Performance of Order by Limit Queries

00:00

Formal Metadata

Title
Boosting Performance of Order by Limit Queries
Alternative Title
Knocking down the barriers of ORDER BY LIMIT queries with MariaDB 10.5
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The talk will start with a recap of how MariaDB(or MySQL) handles the ORDER BY LIMIT optimization and examples demonstrating why the current optimizer is not good enough. Further, the talk will describe how the optimizer in MariaDB 10.5 mostly solves the issue, the remaining unresolved issues and how DBAs can tackle them. FULL DESCRIPTION: For the first part of the talk, I will discuss the possible strategies by which ORDER BY LIMIT optimization is handled in MariaDB (or MySQL) The strategies are: 1) Using an ordered index (ref, range or index scan) 2) Using filesort on the first non-const table 3) Using filesort on the temporary table, that stores the output of the join Then I will discuss how the current MariaDB/MySQL optimizer makes the choice between the strategies and show the situations where it will never get a good query plan For the second part of the talk, I will describe how a new cost-based optimization in MariaDB 10.5 solves the above issue. The talk will contain details about how the costs were taken into account during the optimization phase. Further, with the help of examples I would demonstrate how the execution differs for this new optimization and how this leads to improved performance for ORDER BY LIMIT queries.
33
35
Thumbnail
23:38
52
Thumbnail
30:38
53
Thumbnail
16:18
65
71
Thumbnail
14:24
72
Thumbnail
18:02
75
Thumbnail
19:35
101
Thumbnail
12:59
106
123
Thumbnail
25:58
146
Thumbnail
47:36
157
Thumbnail
51:32
166
172
Thumbnail
22:49
182
Thumbnail
25:44
186
Thumbnail
40:18
190
195
225
Thumbnail
23:41
273
281
284
Thumbnail
09:08
285
289
Thumbnail
26:03
290
297
Thumbnail
19:29
328
Thumbnail
24:11
379
Thumbnail
20:10
385
Thumbnail
28:37
393
Thumbnail
09:10
430
438
Mathematical optimizationSoftware developerLimit (category theory)Order (biology)Query languageDifferent (Kate Ryan album)Order (biology)Limit (category theory)UsabilitySoftware developerMathematical optimizationQuery languageComputer animation
Limit (category theory)Order (biology)Query languagePrice indexComputer fileRange (statistics)Subject indexingRow (database)Table (information)Subject indexingOrder (biology)CASE <Informatik>Rule of inferenceClosed setKey (cryptography)Game theoryPort scannerReading (process)Cartesian coordinate systemMereologyLimit (category theory)Range (statistics)QuicksortComputer file2 (number)Query languageComputer animation
Order (biology)Subject indexingMiniDiscLimit (category theory)Port scannerPrice indexStreaming mediaCondition numberFunction (mathematics)Order (biology)Subject indexingRow (database)Table (information)Limit (category theory)Computer fileCASE <Informatik>Function (mathematics)Keyboard shortcutLogical constantQuicksortMereology2 (number)NumberControl flowBuffer solutionLattice (order)Condition numberRule of inferenceClosed setTelebankingComputer animation
Function (mathematics)Limit (category theory)Order (biology)Mathematical optimizationLattice (order)ZugangsverfahrenQuicksortElement (mathematics)Lattice (order)Table (information)Order (biology)CASE <Informatik>Subject indexingLimit (category theory)PlanningStrategy gamePropagatorRow (database)Type theoryMathematical optimizationEqualiser (mathematics)Query languageKeyboard shortcutComputer fileComputer animation
Order (biology)Lattice (order)Limit (category theory)Information managementPlanningSubject indexingOrder (biology)Table (information)Computer fileKeyboard shortcutMathematical optimizationRow (database)QuicksortCASE <Informatik>Strategy game2 (number)Run time (program lifecycle phase)Function (mathematics)Limit (category theory)NeuroinformatikComputer clusterRight anglePoint (geometry)Graph coloringLevel (video gaming)Multiplication signComputer animation
Mathematical optimizationSign (mathematics)Limit (category theory)Mathematical optimizationComputer animation
Mathematical optimizationKeyboard shortcutPartial derivativeFraction (mathematics)Limit (category theory)Order (biology)Military operationEstimationLattice (order)Computer configurationExtension (kinesiology)Materialization (paranormal)Table (information)Loop (music)Function (mathematics)Level (video gaming)Limit (category theory)WhiteboardOrder (biology)Equaliser (mathematics)Closed setFunction (mathematics)PropagatorThomas BayesPropositional formulaPlanningInstance (computer science)Table (information)Operator (mathematics)Subject indexingLattice (order)Row (database)Fraction (mathematics)NumberRing (mathematics)Identity managementWechselseitige InformationCondition numberGodEstimatorOcean currentWell-formed formulaCodeQuicksortKeyboard shortcutComputer file
Condition numberTable (information)Limit (category theory)Order (biology)Limit (category theory)Function (mathematics)Table (information)QuicksortMereologyLattice (order)Buffer solutionResultantRow (database)Computer fileCondition numberStability theoryEvent horizonWebsiteSet (mathematics)Computer animation
Order (biology)Limit (category theory)Convex hullTable (information)Key (cryptography)Lattice (order)Prime idealDiscrete element methodCondition numberHistogramSelectivity (electronic)Predicate (grammar)Table (information)Computer filePlanningRow (database)Condition numberKey (cryptography)Selectivity (electronic)Subject indexingPredicate (grammar)HistogramMereologyMathematical optimizationRun time (program lifecycle phase)Limit (category theory)Order (biology)Field (computer science)EstimatorQuicksortResultantCASE <Informatik>Query languagePhysical lawHypermediaWorkstation <Musikinstrument>BitMixed realityFault-tolerant systemWebsiteIdentity managementComputer animation
Multiplication signSubject indexingOrder (biology)CASE <Informatik>Table (information)Computer animation
Prime idealOrder (biology)Limit (category theory)Lattice (order)SummierbarkeitLipschitz-StetigkeitRule of inference1 (number)CASE <Informatik>Mathematical optimizationEstimatorCondition numberDecision theoryHypermediaOrder (biology)Right angleSelectivity (electronic)Table (information)Row (database)Computer animation
Point cloudOpen source
Transcript: English(auto-generated)
Hello everyone, I will be talking about how to improve the performance of order by limit queries in MariaDB and what is what has been the history of order by limit queries in both MariaDB and MySQL. I work as a optimizer developer in MariaDB corporation,
I have been involved in the company for the last 3 years almost. So, let us start. So, now this part would be the history and then I will talk what is new in MariaDB 10.5. So,
to handle order by limit queries what are the ways which we can do it. One thing is that we use an ordered index. The ordered index can be used in 3 ways, you could have a ref axis where you could have some key part equal to constant and then with the remaining key parts the rows
would be ordered. Something similar would also happen with range scan and with index scan you cannot have with index and you cannot have an equality. So, it would just be a normal index scan on the index. Second case would be file sort, file sort would have 2 ways how we have
it either you sort the first table and then do the join with the remaining tables or you put the entire join inside a temporary table and then sort the temporary table. So, this is the case the first case that we will use with an ordered index. So,
the records from this table would be ordered by the order by clause if the index can resolve the order by clause and then you perform the join with the remaining tables. The important thing is while doing the join you cannot use join buffering an example would be because
join buffering breaks the ordering. So, you have to make sure that join buffering is disabled. Second would be if you have a descending clause in the order by clause then you have to make sure that descending is for all the key parts of the index that is mentioned.
Also with this approach you can shortcut the join execution you do not need to compute the entire join you can stop as soon as the you get the number of records mentioned in the limit.
Now, this would be the second case where you sort where you apply file sort on the first non constant table. So, here you read from the table T 1 and pass it to file sort and then you read the output of file sort and then perform the remaining join. So, it is almost
similar to how we have with an ordered index that you can again you cannot use join buffering. Another important thing would be any condition that is dependent on the first table would be computed before sending the output to file sort. To see if this approach is working in explain
for example, you will see using file sort in the first column of the first row in explain. And again this is also another way in which you will be able to shortcut the execution with
limit because the rows would be ordered already. Now, this comes the third part where we try to perform the entire join first and then put the records inside the temporary table and then
pass the temporary table to file sort and then we get the ordered output. To apply the limit you cannot shortcut the join execution you have to compute the entire join and then only when you do file sort then you can shortcut the execution of file sort that you just as
soon as you get the limit records you use tell the file sort to stop and you are done. With this case there is no issue with using join buffering or any type of join order is not allowed, but the main drawback with for order by limit queries is that limit cannot
be applied before and you have to compute the entire join and only then you can apply the limit. So, this will this could be very inefficient for smaller limits and in explain the first row again shows using temporary using file sort which means that you are
using temporary table and then doing file sort. Now, what all we do we currently not have in the order by limit optimizer we currently do not take the cost of sorting limit is not taken into account and the only way in which we are able to shortcut the join execution is after
a join order is picked we check if a use of an index can be done to shortcut the join execution or if there is something like equality propagation you could do. So, that you could just file sort by the first table that you pick by the join order
here, but the join planner does not take limit into a consideration these are all the things done after the join planner has picked the join plan. So, here is an example we have a order by a particular column on which we have an index the join optimizer
picks a strategy where we use to compute the entire join and then do file sort which has an executions time of 25.2 seconds. Now, if I try to force a particular join
order using straight join then you can see that here I T fact is using an index on column 1 to do the join execution and the execution time is like 30 milliseconds. So, if the join optimizer had any idea that with this index we could shortcut the join execution
we could have come up with a better execution plan. Now, here is another example where you have again order by T 0 dot d, but you do not have an index on this column the join
optimizer again picked an approach where it was using temporary table plus file sort. Now, if I try to force the join order now in this case it is using file sort on the first table and again it is because it is using file sort on the first table the output is
ordered it can just go ahead and shortcut the execution as soon as it gets the limit records no need to compute the entire join again. Now, what is new in MariaDB 10.5 is we are coming with a cost based optimization where the join planner would take into
consideration both the cost of sorting and the limit. So, motivation wise both pushing the limit and cost of sorting and again the main thing that I have been emphasizing we want to shortcut the join execution for limit. So, to consider the limit we try to push
the limit now how the limit is pushed is we try to find a join prefix that can resolve the order by clause that is the prime condition that you can sort the prefix with the order
by clause and then you can just push the limit. So, inside the join planner code we try we apply the sort operation here and then we push the limit. So, how is push
the limit calculated? So, what exactly is meant by push the limit is you find a prefix by which ordering can be resolved and then the records in the prefix when you push the limit you will only read a fraction of those records and this fraction
is how this fraction is how we consider by taking limit into account. So, this is the prefix which can resolve the order by clause this is the total join cardinality and then you have limit. So, this is the number of this formula would give us the number of records that we would read from the prefix that can resolve the order by clause.
And so, to push the limit first you need the estimate of join cardinality. So, we run the join planner once to get the estimate of join cardinality currently. Also you have access methods like indexes that I was talking about that already have the
ordering in that have already all the records ordered. So, these are also considered during the join planner stage. Now, this will be a bit technical. So, for each join prefix you have
two conditions if that prefix can resolve the order by clause you can either push the limit at that very instance or you can extend the prefix by another table and then push the limit. Why do we do this is we want to consider all possible plans where we can push the limit.
Also, equality propagation earlier equality propagation is always done after the join planner has picked the plan. Now, equalities are propagated before. So, if you have a join prefix like this T 2 comma T 3 and you have an order by clause by T 1 dot a
and there is an equality. So, we would consider pushing the limit both after this plan and after this plan. So, the equalities of the where clause are propagated and considered by the join planner. Now, how does the join execution work? So, as soon as we have a prefix that can
resolve the order by clause we try to materialize the prefix and put the records inside a temporary table then we sort the materialized table with the order by clause and then from the output of file start we start reading one by one and then we join with the remaining tables and again
the execution again stops as soon as the records are found. So, again in by this we are not computing the entire join. So, this is how the execution path exactly looks you have
tables in the prefix, you materialize the result and put them inside the short nest. Join buffering is allowed for all these parts because these are coming before we are performing the order by clause plus conditions the conditions that are dependent only on these
tables are checked beforehand rest are attached to the later tables. So, the shortness you have you push send the records to file sort and then you start reading file sorts output and join
with the remaining tables. Again now it is similar to the example where we had we had file sort on the first table and then we were doing the join. So, again no join buffering is allowed over here and the execution will stop as soon as the limit records are reached.
So, here is an example where we will be using a short nest. So, you have an order by clause on two tables. So, if you see in the explain it shows a short nest here which is materialized
result of nation and customer table and so, this is and then you can see that we are applying file sort on the short nest. Another interesting part is if you can if you look at this condition for orders tables it will not know anything about the customer key.
So, it needs to refer to the fields of the short nest and not of the base table itself because the base tables are now no now not at all over there they are all referring to the shortness table fields. So, this was one of the examples that I was demonstrating
earlier where I was showing the limitation. So, we have again the same case where we have a key on column 1 and this was earlier using temporary table plus file sort, but after this optimization it is able to use an index on column 1 to do the joint execution
and it provides a 1900 x speed up for this case. Also here is another query where with the dbt 3 data set which I have created here.
So, again the execution time which was picking temporary table plus file sort earlier is now picking an index to do the joint execution and short cutting the joint execution as
soon as the limit records are reached and again providing a significant speed up. So, the most important part for the optimizer to pick any plan is the selectivity of the conditions. This approach that MariaDB has taken for order by limit optimization
heavily heavily depends on the selectivity of the conditions. So, if someone wants to use it I would suggest at least use histograms to provide the selectivities to the optimizer because otherwise the estimates can vary a lot and when we push the limit then that can give
us plans that are not that good. Also another limitation is that we have few predicates for which selectivity is not known. So, what we try to do is we predict
the selectivity as 100 percent, but we have discussed this internally in the team to may be add some penalty for such cases because if this condition is highly selective then using a shortness or using this new optimization can be harmful for us.
And again this part that the estimate of joint cardinality are very pessimistic we over predict everything. So, again when we push the limit then our estimates can be very optimistic instead of being pessimistic yeah that is all. Thank you questions.
. No, I did not get the complete question
can you repeat the question. .
No but the where clause would be evaluated beforehand before you do the order by clause if it is if the tables are inside the shortness that is you have it may be the case that the
conditions you have in the shortness more rows and then when you are joining with the remaining
ones you are not finding enough rows to match that may be an overestimate yes, but if you give the optimizer the right selectivity estimates I think so it can make a smart decision and may include the order table inside the shortness yeah if for such cases then I do not think
so this optimization would work that well yeah.