Deep Dive Into Query Performance
This is a modal window.
The media could not be loaded, either because the server or network failed or because the format is not supported.
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 542 | |
Author | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/61712 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
FOSDEM 2023516 / 542
2
5
10
14
15
16
22
24
27
29
31
36
43
48
56
63
74
78
83
87
89
95
96
99
104
106
107
117
119
121
122
125
126
128
130
132
134
135
136
141
143
146
148
152
155
157
159
161
165
166
168
170
173
176
180
181
185
191
194
196
197
198
199
206
207
209
210
211
212
216
219
220
227
228
229
231
232
233
236
250
252
256
258
260
263
264
267
271
273
275
276
278
282
286
292
293
298
299
300
302
312
316
321
322
324
339
341
342
343
344
351
352
354
355
356
357
359
369
370
372
373
376
378
379
380
382
383
387
390
394
395
401
405
406
410
411
413
415
416
421
426
430
437
438
440
441
443
444
445
446
448
449
450
451
458
464
468
472
475
476
479
481
493
494
498
499
502
509
513
516
517
520
522
524
525
531
534
535
537
538
541
00:00
Query languageFront and back endsBitDatabaseCartesian coordinate systemBridging (networking)Computer animation
00:47
DatabasePoint (geometry)Service (economics)Query languageError messageDependent and independent variablesMathematicsBuffer solutionResponse time (technology)Point (geometry)Query languageService (economics)Cartesian coordinate systemTable (information)Error messageResultantComputer animation
01:47
Dependent and independent variablesQuery languageSoftware developerResponse time (technology)Query languagePoint (geometry)DatabaseComputer animation
02:11
Query languageQuery languageResponse time (technology)Cartesian coordinate systemDatabaseIdentical particlesTerm (mathematics)CASE <Informatik>Point (geometry)MereologyDifferent (Kate Ryan album)Computer animation
03:34
Query languageMultiplication signAverageMetreComputer animation
04:06
Electronic data interchangeLibrary (computing)Latent heatDependent and independent variablesQuery languageDistribution (mathematics)Source code
04:32
Dependent and independent variablesQuery languageDatabaseQuery languageInteractive televisionNumberMehrplatzsystem
05:30
Error messageQuery languageNormal (geometry)Dependent and independent variablesResponse time (technology)WeightError messageTable (information)Computer animation
06:26
BackupCAN busDependent and independent variablesMultiplication signSoftware maintenanceDirection (geometry)Multiplication signQuery languageFrequencyResponse time (technology)Dependent and independent variablesComputer animation
07:32
Dew pointGamma functionMeta elementQuery languageResponse time (technology)Query languagePoint cloudDatabaseCartesian coordinate systemView (database)Self-balancing binary search treeDependent and independent variablesQuicksortE-bookProjective planeState observerCASE <Informatik>NumberTerm (mathematics)InformationSoftware developerHard disk driveCombinational logicFlagMeasurementParameter (computer programming)BitSoftwareMultiplication signAugmented realityStandard deviationComputer animation
10:17
Meta elementQuery languageSoftware testingInformationRevision controlFunction (mathematics)Revision controlFocus (optics)Uniform resource locatorIntegrated development environmentCore dumpSubsetQuery languageFunctional (mathematics)State observerComputer animation
11:07
Group actionIntegrated development environmentDatabaseProfil (magazine)Goodness of fitQuery languageCASE <Informatik>Cartesian coordinate systemWindowQuicksortBitVideo gameDifferent (Kate Ryan album)Parameter (computer programming)Computer animation
12:31
View (database)DatabaseCustomer relationship managementTable (information)MathematicsQuery languageDatabaseTable (information)CASE <Informatik>Different (Kate Ryan album)Subject indexingComputer animation
13:10
Service (economics)System identificationDirection (geometry)Goodness of fitInformation securityDatabaseService (economics)Multiplication sign
13:52
Integrated development environmentChemical equationPoint cloudCASE <Informatik>Query languageDatabaseSubject indexingInstance (computer science)Configuration spaceServer (computing)Web 2.0Different (Kate Ryan album)Cartesian coordinate systemComputer animation
14:38
Server (computing)Service (economics)Instance (computer science)CausalityInformation securityCodeConfiguration spaceRevision controlClient (computing)Cartesian coordinate systemInstance (computer science)Multiplication signClient (computing)Query languageLimit (category theory)DatabaseData storage deviceMathematicsComputer animation
15:59
Query languageDifferent (Kate Ryan album)Group actionMathematical optimizationCASE <Informatik>Query languagePlanningDifferent (Kate Ryan album)Computer animation
16:43
Dependent and independent variablesBefehlsprozessorComputer networkMiniDiscQuery languageBefehlsprozessorInheritance (object-oriented programming)Physical systemResponse time (technology)Latent heatMultiplication signComputer animation
17:30
VolumeQuery languageQuery languageSoftware developerJava appletComputer animation
18:10
Queue (abstract data type)Dependent and independent variablesQuery languagePhysical systemBackupPoint cloudQuery languageOperator (mathematics)Instance (computer science)DatabaseInteractive televisionGoodness of fit
19:31
Focus (optics)Query languageElectronic mailing listTable (information)Statement (computer science)InformationMultiplication signCASE <Informatik>Right angleVariety (linguistics)WeightSocial classQuery languageStatement (computer science)InformationElectronic mailing listPlanningInternet service providerSet (mathematics)Table (information)QuicksortParsingComputer animation
22:17
TwitterMultiplication signPlastikkarteSubject indexingStandard deviationDifferent (Kate Ryan album)Right angleMathematical optimizationINTEGRALDatabasePlanningVolume (thermodynamics)Query languageComputer animation
24:50
Program flowchart
Transcript: English(auto-generated)
00:05
Now, we already spoke here a little bit about developers, right, and especially the front-end developers, right? And one purpose of this talk for me is to really sort of bridge the gap, right, which
00:23
I often see between the people who really have databases at the center of their, at least professional life, right? And the people who are writing the application and, you know, databases are like a thing. It's like a toilet, right? You do your business and you move on with your life, right?
00:42
I mean, something like that, right? For those people, the database is typically like a black box, right? There is this black box, right? And what I want is I connect to the service point which is provided to me, right? I connect it quickly. I run my queries and that's all I care about.
01:04
And all that kind of, you know, change buffer combo, yeah, never heard about it, right? What about queries, right? What would you as a developer think about queries? Well, these are actually pretty simple things, right?
01:20
When you connect it to a service point, you add queries. You want them to complete with no errors, right? You want them to provide you correct results set because if you wouldn't, we could alter all the MySQL tables to black hole and get a fantastic performance, right? No errors too. And also you want them to make sure that they complete in that response time what
01:43
your application expects, right? And I think that is a very important thing to understand. If you look at from the individual developer standpoint, right, like write an application, hey, performance response time for my queries is all I care about. And how that internal database kitchen works, you know, somebody else's problem, right?
02:06
Now if you think about the response time from the database point of view, right, that is often seen like, well, I see that response time for a query is an average, right, or whatever distribution, we'll talk about that later.
02:22
But that is different from what business cares about, right? If you think about the business point of view, you think about, well, do my user have outstanding experience in terms of performance if all of the application interactions, right? That means, you know, like a search should work and place and order should work, right,
02:44
and whatever. And database is an important part of that, of course, but not is there a complete part. What is interesting in this case is what as database engineers, we often talk about those kind of different events, kind of like that performance and the downtime and
03:03
say, well, you know, no, no, we weren't down, it just was taken 50 minutes to run my very basic query. Well, from user standpoint, that performance, very bad performance, isn't distinguishable from downtime, because, A, we don't have patterns, right, then even if people are
03:22
very patient, then the browser, right, or some other timeouts will happen and nobody gives a shit about that query, which may still continue to be running. Another thing to understand about query performance is you do not want to focus on the averages, right, I like this kind of one saying, but there was one slave demand who tried to cross
03:46
the river in the average one meter deep, right, that is same applies to the query, right, if your average query time is x, well, that means pretty much nothing, right, you need to understand more about that.
04:05
And I like, in this case, to look at their, you know, percentiles, and even more, to make sure you can look at the specific distribution of your, you know, of a query, query response time, if you, that, that gives you a lot more insight.
04:26
Now, one thing to understand about the percentile, you may be looking and saying like, well, great, my queries have this decent 99 percentile, right, but, but that does not mean on a business
04:41
side what 99 percent of your users have a good or like acceptable experience, why is that? Well, because guess what, the single user interaction can correspond to a lot of queries sequentially, right, which all adapt and typically to through their journey user has a number of those interactions, right, so I would say like even 99 percentile that may,
05:05
well, depending on your application, only correspond to like 50 percent of user session, so if you really see the complicated large environments, right, they are really focused on
05:20
either relatively, you know, short SLA for or the rather high percentiles. Another thing I would encourage you to pay attention to is errors, right, and make sure you are measuring response time for those as well because errors actually can be of two kinds,
05:42
fast errors and slow errors. In certain cases, let's say if your table doesn't exist, right, you may be like get the response time straight away and if you put all your error queries and actually normal queries in the same bucket you may say oh my gosh my response times are doing
06:00
kind of so well, right, but on the other hand if your query is, for example, error is a lock weight timeout, then that is a slow error, right, it actually will have a higher response time, right, than the normal cases, right, that is why I always suggest to make sure we measure response time for normal queries and for queries with problems differently.
06:26
Another thing which is very important is looking at response time over time, right, because traffic changes, a lot of things are going on in the system, right, and just saying hey I'll have a response time of x over some long period of time, it's not very helpful.
06:44
Also what you would see in many cases, you still start with like a small performance problems, right, maybe so you know like SLA violations which are if unfixed they convert in the downtime, like for example in MySQL world, right, you may say well I have
07:06
forgotten this kind of runaway query and my history accumulates, right, it will slowly increase and increase your response time. If you're measuring that over time and say well something is not trending in the right directions you probably can fix it before
07:21
that will be seen as a downtime by your users, right, if you are not then not so much, right. And this is example what we have here, right, what you often may see something like this as well for where all the queries have like a spike in the response time which you often
07:44
make a response to something external happening in the environment, right, and I think here is what is very interesting especially for us running in the cloud we only have limited observability to environment, right, if there is some shit going on on an Amazon back end they're not going to tell us that, right, oh you know what we had let's say some you know
08:03
hard drives failed which back our your EBS and we had to you know some rebalance yada yada, right. Okay, the another question I would ask is where we want to measure response time, right, from queries. In my opinion both application view and database you are in
08:24
the combinations are very helpful because the application can see real thing, right, if your network for example is adding some latency or whatever it is you will see that from application not so much in the database, right, because it only sees from hey I got response to
08:43
then it's sent the data back but the database view allows you often to see a lot more insight about what has been going on inside there from application side we often can you know just capture the query time maybe some very basic additional parameters. So what we spoke
09:05
from our business view, right, well we already said what that all users have outstanding performance experience of all the application interactions, right, let's now try to break it down a little bit more, right, to what that may mean. In this case I want to introduce this
09:26
little project or flag from you this is SQL Commander project by Google, right, I mean which is pretty cool in terms of what it allows to pass you the metadata, right, which you understand as developer all the way to SQL query. They implemented that support
09:45
through a number of frameworks, right, and it's also supported in their Google cloud monitoring environment, right, and I would very much see that developed more, right, and for at
10:01
least kind of us come to some sort of shared standards between the databases, right, or wherever, how we can augment query information with sort of like tags, values, right, which users care about. So what are the possibilities which can be quite helpful in this regard? Well finding
10:21
for example who is our actual user tenant whose query corresponds, right, because we often may have, you know, different performance issues, right, finding the application like or some sort of like a subset of location functionality where many of them may be hitting the application, right, version information, maybe information about like their engineer of a team who is responsible
10:46
or not. I often see DBAs or SRAs team having problem like oh I see this nasty query which was shipped yesterday. I know because it was shipped today because I know it wasn't fair yesterday, right, but now having to find out who the hell introduced that stupid query may be problematic
11:04
in a large environment. Now a lot of focus and I think the core of a query-based observability may be about the query and by query I mean obviously like a query with sort of like it's
11:20
which are same except different parameters and that is very helpful because well obviously they have a different complexity, different expected SLA and so on and so forth. The next way also to break things down for me would be to look at the schema or database and why is
11:45
that interesting? Oh I just let us know just right now what it's being cut a bit. Yeah well anyway life is life. I'm just not going to be lucky in this room, right,
12:03
but well we can blame our windows, right, on this conference we can and should blame windows. Okay well why schema and database are also good because often we would separate in the multi-tenant applications different tenants by schema, right, and in that case
12:27
that gives us a good profiling for performance of their different schemas, right, like we can see here an example with the PMM tool. Another thing what I found is very helpful to
12:43
find a way to separate the data by different tables, right, in many cases you want to say hey you know what how our query is hitting given table is affected especially if you did some change which relates to the table. Hey you know what I changed the indexing on this table,
13:01
let me see how all the queries hitting this table is impacted. Very helpful because there may be some surprising differences. Database users, that is another thing which is quite helpful because that often allows us to identify the service or application, right, if you're following
13:21
good security practices you would not let all your applications, right, just use one username, you know, not a good idea, right, and also find human troublemakers, right, which are doing having direct access, right, and so many times you'll find somebody, you know, running the query, right, and say okay well yeah it's slow but wherever I'll go for lunch,
13:45
you know, I have time, well you may have time, but your database may not, right, so we also, like here's example how we, you know, provide that. I also mentioned database host and indexes in many instances, in many cases that is very helpful because
14:07
even if you may think oh my different database instance should perform the same, well world is a messy place and world in the cloud is even messier place, right, they may not exactly have the same performance due to, you know, some strange configuration differently,
14:25
having a bad day, right, or even maybe having a different load, right, and that is a good to be able to break it down, right, when you see some of your queries are not performing very well. I would also look at the same stuff from a web server or application server instance
14:42
because again if you have like maybe like 100 nodes, you deploy the same application, you may think hey they're all going to perform the same, hitting the database, well that is not always the case, right, I have seen changes from people saying one of them is misconfigurable for some reason, cannot connect the cache, so it's, you know, hitting, you know, 10 times more
15:04
queries right on database than it should be, or the application rollout didn't go well, where you've eliminated nasty query on 99 of application instances but not some others, right, it's very good to actually be able to validate that because what you would see or like again
15:21
from a DBA standpoint, you know, developers, sysadmins, storage people, they are going to tell you shit, right, but they are going to lie, right, they are going to lie, right, maybe not intentionally, maybe because of their ignorance and limitation of their tool, but as a DBA, a CD or something, you want to point them out to their shit and say look, please, I have evidence,
15:46
right, evidence is good, right, so clients costs, custom tags is very helpful if you can extend, that is what we spoke about the SQL commenters, something else which I find very
16:04
helpful which we cannot quite easily get with MySQL but being able to separate the query by the query plans, right, often you may have a query which looks the same but it may take different execution plans, right, and often that may be correlated to its performance.
16:23
In certain cases it is totally fine, right, very different situations, sometimes MySQL optimizer may get a little bit, you know, crazy just and have that optimizer plan drift for no good reason which may not be very easy to catch, right, and would be helpful to do. What I also would like to
16:46
highlight, right, is when you find the specific query and say hey this query has, you know, nasty performance, right, we often want to understand where that query response time comes from, right, and that is some of the things, right, where it can come from, certain of them are relatively easy
17:08
to find out, right, certain are not very well, right, for example, wherever query has waited on available CPU, right, because system was overly saturated, well, you can't really see on
17:21
query basics, right, you can only see those things, well, my CPU was kind of like a super packed, right, on a period of time. Okay, here are a couple of other things to consider when you're looking at the queries. One, you want to really look at separately at the bad queries, right, versus victims because sometimes you will see, oh, queries are getting slower but it's not
17:44
because of them, it's about some other nasty queries, right, maybe that is your Java developer who thought, well, you know, to solve my problems I will just launch this stuff with, you know, with 200 threads, right, and make sure I am good but everything else is kind of
18:04
slowed down, right, and that's maybe tricky. One thing is what you should not forget the running queries. In many cases, like if you look in performance schema, queries by dash, that gives you what happened in the past but believe me, if you started, you know, 50 instances
18:23
of some very bad query which continues to run, well, that may be the reason of your problem, not the past, right, and to connect to that, I think it is less problem in MySQL right now, right, if you're using query timeouts which is a very good practice, right, because if you say, you know what, for all my interactive queries, by default, I set the timeout of, let's say,
18:44
15 seconds, then you should not care too much about your past queries because, well, you know what, everything gets killed after 15 minutes. Also, 50 seconds, right, you should not ignore stuff which is invisible from a query standpoint, right, databases do a lot of shit in the
19:04
background, you may also do things like your operation teams like, well, backups or provisioning another node for cloning, right, or the cloud or wherever your VM system may need to do something in the background, it may not be directly visible but that can impact the query performance,
19:22
right, so sometimes, well, when you observe a query impact and you can't really see what is causing that, it's possible. I also would encourage to avoid what I would call like a biased something. I see people sometimes would say, hey, you know what, we will set long query time to one
19:41
second and only look at the queries which are more than one second in length, well, you may be only focusing on the outliers, right, and missing the possibility to optimize our queries, right, or actually even focusing on the queries which provide, which are responsible providing that experience, right, for your users. Okay, we find another thing like a last minute
20:09
I have or something, I wanted to say, hey, what I would like to see from MySQL to do better. Where is Kenny? No Kenny? Yes, he's always hiding, right, he probably wanted to
20:24
get another sandwich, damn it. Okay, so here are some things that I would like to see. One is better support of prepared statements, right, and right now, right, it's kind of, you know, not done in the same way, right, which is, I think is a problem,
20:44
right. Now I would say consider grouping data by time in certain cases. Right now you get like all the statements in one table, right, and you have a lot of statement variety that table tends to overflow, right, which is not really helpful, right, and if you have to kind of
21:03
reset your queries all the time, that is not very, you know, good practice in my opinion. Provide list of tables query touches, right, that is very helpful because, well, MySQL parser already knows it, right, it knows tables or query touches, but it's very hard to parse
21:25
it out from a query, especially if you consider views, right. I don't know by looking at the query alone wherever something is a table or a view, right, so in this case. Information about plan ID, right. I would like to see for the query, right, some sort of plan hash or something so I
21:45
know then query is using something like that and also what I would call like a top weight summary, right. Right now we have information about the weights in MySQL performance query and about query, but I cannot see and say oh that query was slow because it spent xyz amount of
22:05
weight on something or whatever, right, or at least kind of like some small class of queries, right. Yeah, I don't think that's convenient. Well, with that, that's all I had to say. Hope that will help you to avoid tuning your indexes by the credit card.
22:28
And, yes, oh, I have time for questions. You told me like, Peter, five minutes. Oh, to answer, I have time for questions. Yes, any questions? No? Oh, yeah.
22:45
What's the difference or advantages of this SQL commenter thing compared to what OpenTracing standard does because it starts kind of tracing the whole thing? What's the difference of SQL? Well, what I would say in this case, yes, I mean there is obviously OpenTracing framework, right, which you can use. This gets specifically to the database
23:07
and specifically in every query, right. If you look at the OpenTracing framework, I think, you know, getting every query, right, maybe a lot of volume out there, right. And again,
23:25
I also think, well, the good thing if also SQL commenter, right, is what that does it automatically, if you will, right. That does not require you to take an extra integration.
23:42
Okay, anybody else? Oh, yeah, I mean, it works with MariaDB as well. Yes. Well, there are practices,
24:11
there are no good practices, right. Like you can, there is a lot of optimizer hints you can use, right. So you can actually force the query to go like this particular stuff,
24:23
right. But that also prevents optimizer choosing different plan if better plan becomes available. Yeah. Okay, well, then thank you folks.