
Practical sysbench


Formal Metadata

Title
Practical sysbench
Subtitle
Benchmarking MySQL and IO subsystems
Series Title
Number of Parts
199
Author
License
CC Attribution 2.0 Belgium:
You may use, change, and reproduce, distribute, and make the work or its content publicly available in unchanged or changed form for any legal purpose, as long as you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
This session will be about benchmarking MySQL and disk IO subsystems with sysbench and interpreting the results. In our consulting company, I helped a reasonable number of customers with sysbench so I know the common caveats most people run into. This talk will cover benchmarking IO subsystems with fileio tests, as well as benchmarking MySQL.

Transcript: English (automatically generated)
So let's start. Hello, everyone. My name is Peter Borosz, and I work as a consultant at Percona. And we will talk about Sysbench and benchmarking and a bit about benchmarking in general.
This talk is beginner level, so you will benefit the most from it if you have never seen sysbench or have done very little with it. This is mostly based on my experience from helping people with benchmarks: that's what was useful for them, and it's kind of condensed into this talk.
So we're going to talk about benchmarking in general. We will talk about benchmarking IO subsystems with sysbench. We will talk about benchmarking MySQL with sysbench. And I will give you some tips on how I produced
the graphs I used, because last time I gave this talk, that was the first question. So what is a benchmark? A benchmark is a synthetic workload. It doesn't really represent a real-life workload, but it is really good for comparison.
You can measure your system with a benchmark, or you can simulate your actual workload. But a simulation of the actual workload is a lot less predictable than a benchmark,
while the benchmark is not necessarily close to the real workload of your application. So it's like everything in IT: you can choose which way your testing method makes that trade-off. Or, to put it better, an engineer's job is to make the right compromises.
Sysbench is pretty easy to use. I recommend always using the 0.5 branch of sysbench. This is how you can compile it, and there are also nice pre-built packages of it; I almost always use those.
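As an illustration, building the 0.5 branch from source follows the usual autotools flow; the branch location has moved over the years, so treat the URL below as an assumption rather than something taken from the slides:

```bash
# Fetch and build sysbench 0.5 from source (autotools-based build).
# The Launchpad bzr branch is an assumption; distribution packages may also exist.
bzr branch lp:sysbench sysbench-0.5
cd sysbench-0.5
./autogen.sh
./configure   # point --with-mysql-includes/--with-mysql-libs at MySQL if it is in a non-default place
make
sudo make install
```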
So let's do a fileio benchmark. This is how you create the set of files for fileio: we create 32 gigs of files here, and the number of files is 32, so each file is one gig. The prepare command will create the files
which the tests will be performed on. The benchmarks you will see the results of were made on my laptop, so they are easily reproducible; all of this was benchmarking my laptop's SSD. So there are some parameters which we are using in the benchmark.
First is the block size: we are using 16k, because that's the InnoDB page size. We are also setting the total size of the files and the number of files. The number of files can be important, because some file systems, like ext3 and ext4, can have problems with writing the inode tables.
So if you do it with only one file, you may hit that, and a database usually consists of more than one file as well. We will use direct IO. We will initialize the random number generator at the beginning of the benchmark,
so we won't be bottlenecked on random number generation. We will vary the number of threads. We will test synchronous and asynchronous IO. And we will test random reads, random writes, and random reads and writes mixed. We will also use the max-requests parameter
so that we don't limit the benchmark by the number of requests, but by the max-time parameter. And we will capture the metrics every second.
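For reference, a minimal sketch of what the prepare and run invocations with these settings might look like with sysbench 0.5; the option names (--file-test-mode, --file-io-mode, --file-extra-flags and so on) are reconstructed from memory rather than copied from the slides, so verify them against `sysbench --test=fileio help`:

```bash
# Create the files the fileio tests will run against: 32 files, 32G in total,
# so one gig per file (the sizes just mirror the numbers mentioned above).
sysbench --test=fileio --file-num=32 --file-total-size=32G prepare

# Random writes, 16k blocks (the InnoDB page size), direct IO, synchronous IO,
# RNG initialized up front, run for 60 seconds regardless of the number of
# requests, reporting metrics every second.
sysbench --test=fileio --file-num=32 --file-total-size=32G \
         --file-block-size=16384 --file-extra-flags=direct \
         --file-io-mode=sync --file-test-mode=rndwr \
         --rand-init=on --num-threads=4 \
         --max-requests=0 --max-time=60 --report-interval=1 run

# Remove the test files when done.
sysbench --test=fileio --file-num=32 --file-total-size=32G cleanup
```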
So this is how the output looks; this is a write test. It shows you, for each sample, the read throughput, the write throughput, and the response time for the operations, and the response time is the 95th percentile response time. And when I said that we vary the number of threads,
does this actually matter? We are benchmarking an SSD. An SSD can do 1000 IOPS or 2000 or whatever; it's a parameter of the SSD. Do we have to bother with threads?
Yes, probably. The key concept for this is "it depends", which happens to be the answer to 95% of database-related questions. So, in the case of synchronous IO, our first test, we will wait until the IO request completes
and only then have the result. In case of asynchronous IO, we don't wait until the completion of the IO request, but rather check back later and see how it goes. So in case of asynchronous IO,
adding more threads or removing them doesn't matter that much, because we are not blocking on each request. In the case of synchronous IO, the number of threads we use matters a lot. You will see this from the benchmark results.
I like to show stuff live, so I wrote a little script which just does the fileio runs for me.
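The script itself isn't shown in the transcript; a minimal sketch of such a wrapper, looping over IO modes, test modes and thread counts and saving every run's per-second output for later graphing (file names and the 60-second duration are my own choices):

```bash
#!/bin/bash
# Minimal fileio wrapper: loop over IO modes, test modes and thread counts,
# saving each run's per-second output for later graphing.
# Assumes the test files from the prepare step already exist.
SIZE=32G
for io_mode in sync async; do
  for test_mode in rndrd rndwr rndrw; do
    for threads in 1 4 8 16 32 64; do
      sysbench --test=fileio --file-num=32 --file-total-size=$SIZE \
               --file-block-size=16384 --file-extra-flags=direct \
               --file-io-mode=$io_mode --file-test-mode=$test_mode \
               --rand-init=on --num-threads=$threads \
               --max-requests=0 --max-time=60 --report-interval=1 run \
               > "fileio_${io_mode}_${test_mode}_${threads}.log"
    done
  done
done
```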
So my laptop can do about 44 megs per second in this test. If we check the command, then we see that this is synchronous IO on a single thread.
Does it matter if I change the block size here? I have an SSD in this laptop, so if I do smaller or larger blocks, will it matter? To put the question differently, does sequential or random IO make a difference on an SSD?
We are lucky that I have this laptop here, so we can just check. If I set the block size to, let's say, 1 meg, it will be a lot faster. So the nature of the IO,
whether it's sequential or random, still matters on flash storage, but not as much as on spindles; the boundaries are just in different places. Now I changed the block size to 1K, and for 1K blocks
performance will drop. It will be slower and slower as I lower the block size, but once I lower it under 4K, which is the physical block size of the device, the device never writes 1K, it always writes 4K at the physical level. So this is why,
once the block size goes under a certain size, it gets a lot slower.
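If you want to repeat this block-size experiment, it is just the same run command with a different --file-block-size; a sketch, with 1M and 1k mirroring the values tried in the demo:

```bash
# Same random-write run as before, once with 1M blocks and once with 1k blocks,
# to see how the block size changes throughput on this SSD.
for bs in 1048576 1024; do
  sysbench --test=fileio --file-num=32 --file-total-size=32G \
           --file-block-size=$bs --file-extra-flags=direct \
           --file-io-mode=sync --file-test-mode=rndwr --num-threads=1 \
           --max-requests=0 --max-time=30 --report-interval=1 run
done
```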
So this is a graph of the write throughput of my laptop in the case of asynchronous IO, and we decided that the number of threads doesn't matter there. And this is a graph of the response time in milliseconds. Let's stop a bit here. Is this graph good enough? Why not?
Because otherwise I wouldn't ask, right? What is visible from this graph is that the response time is between 10 and 20 milliseconds. But is it more like 10? Is it more like 20? What's the distribution?
That's why I like jitter plots, because from this plot we can see that 10 is more frequent, since there are more samples there, and we can see that it's more like an either-or relationship: sometimes it's 20,
but mostly it's 10. And the other good question is: can we measure the response time accurately in the case of asynchronous IO? We can't, because we don't wait for the completion of the actual IO;
we check back later whether the IO operation has completed, and we can record the time when we get the result from that check, but at that time the IO operation may have been completed for a while.
So this is the throughput again, and here it is visible that the read throughput of my SSD is much higher. So it can read much faster than it can write. And we can also see that the read performance is consistent.
For the jitter plots I like to use alpha channels for the individual dots, so if they are plotted on top of each other, the colour shows up stronger. And here you can see the read response time.
Then you can also do a mixed read/write test, and this is one rookie error that people usually make: OK, my storage can do 400 megs of reads, so it will be OK. But as soon as you mix reads and writes,
the characteristics of the read performance will change. So if I do reads and writes at the same time, I'm not able to do 400 megs of read throughput anymore, as you can see here. And here is the response time for this graph;
the red one is the read throughput and the blue one is the write. OK, so last November I started a MySQL instance, and it wrote that it is using Linux native asynchronous IO.
And I told you at the beginning that we will discuss results for synchronous IO, so do we have to do that at all, or can we stop here because it doesn't matter? It does matter.
Asynchronous IO is used for checkpointing, but the redo logs, for example, are written synchronously. And this is the synchronous write throughput; this graph doesn't have time on the x-axis, but rather the number of threads. And if you check
the write throughput at a single thread: why is this important? Because in MySQL 5.5, and in 5.6 if you have a single schema, replication is single-threaded, right? Which means that replication can't write faster than this.
The throughput of my storage at 16 threads is a lot better. How many CPUs?
Two. But the benchmark is not CPU-intensive, so I can use this many threads even though this laptop only has 2 CPU cores.
So if you graph the response time, you can see that up to a point, up to 16 threads, the response time somewhat increases, but after a while, if I keep adding threads, the throughput doesn't change too much, but the response time will skyrocket, right?
So, and this is another point to make: every system has an optimal degree of parallelism, and if you go over that, you will just get worse response times. And this is why it is a bad idea to set max_connections in MySQL to 10-20,000
or something like that, because if you actually start using all of them, this is what will happen. Similarly for reads: if I use more threads, the read throughput will increase, but again, similarly,
the response time will just go up after a while. And this is for reads. So similarly, my read performance in isolation was pretty good, but if I do this mixed benchmark, I won't be able to do 400 megs of reads anymore,
and the response time is pretty similar, so after a while, the throughput doesn't grow, but the response time starts to grow exponentially.
OK, so far this was fileio, benchmarking your IO subsystem; now let's talk about benchmarking MySQL. Here is how you can prepare a test database: you create a schema,
and you use the parallel_prepare.lua script to create your test tables in that schema in parallel. And here is how you do one benchmark iteration.
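A sketch of the prepare step; the script location under /usr/share/doc/sysbench/tests/db/ and the exact way parallel_prepare.lua is invoked can vary between builds, so treat the details below as assumptions (the per-iteration run command appears in the loop sketch a bit further down):

```bash
# Create the schema (database) the test tables will live in.
mysql -u root -e "CREATE DATABASE sbtest"

# Load the test tables in parallel with the sysbench 0.5 Lua scripts.
# Depending on the build, the scripts may live elsewhere, and parallel_prepare.lua
# may need "run" (as here) or "prepare"; check the script that ships with your copy.
sysbench --test=/usr/share/doc/sysbench/tests/db/parallel_prepare.lua \
         --mysql-user=root --mysql-db=sbtest \
         --oltp-tables-count=8 --oltp-table-size=1000000 \
         --num-threads=8 run
```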
One more thing about this: usually you write scripts like this, so that for all test modes, for all IO modes, for all thread configurations, you do the benchmark, and at the end you aggregate the results. Similarly, this is how a typical OLTP benchmark script looks:
you vary the number of threads and measure the throughput, and if you want to vary another parameter, you can do that as well. So I recommend you write basic scripts like this.
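A minimal sketch of such an OLTP driver script, sweeping the thread count and keeping the raw per-second output for later aggregation (paths, durations and file names are my own choices):

```bash
#!/bin/bash
# Sweep the number of threads for the OLTP workload and keep each run's output.
for threads in 1 4 8 16 32 64 128; do
  sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
           --mysql-user=root --mysql-db=sbtest \
           --oltp-tables-count=8 --oltp-table-size=1000000 \
           --num-threads=$threads --max-requests=0 --max-time=300 \
           --report-interval=1 run > "oltp_${threads}_threads.log"
done
```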
I think the most well-known benchmark in sysbench is OLTP. It has some parameters like the table size, and it has some parameters for the transactions as well, like what the range size should be, how many point selects should be in a transaction, and how many simple range queries should be in a transaction.
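For example, the transaction mix can be tuned with options along these lines; the values shown are the defaults as far as I remember, and the option names should be checked against your copy of the scripts:

```bash
# Tune the OLTP transaction mix: range size, point selects and simple range
# queries per transaction (option names as in the sysbench 0.5 OLTP Lua scripts).
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
         --mysql-user=root --mysql-db=sbtest \
         --oltp-table-size=1000000 --oltp-range-size=100 \
         --oltp-point-selects=10 --oltp-simple-ranges=1 \
         --num-threads=32 --max-requests=0 --max-time=60 --report-interval=1 run
```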
These are, by the way, in sysbench's common.lua, so you can find the definitions in common.lua. For the database benchmark, I chose to examine a workload which reconnects pretty frequently.
So we won't do the full OLTP benchmark now, but I would like to show you how you can use sysbench to reproduce an issue which you may see in a production environment, in kind of a simple way, by modifying the Lua scripts.
So let's examine, in my sysbench, the update_index.lua script,
which we will use for this. I modified this script a bit: in the event function, I do a db_disconnect, which means that after every transaction, sysbench will disconnect from the database. So what do I simulate here? Let's say a PHP application,
where the PHP interpreter only runs while the HTTP request is alive, and each time it will reconnect to MySQL.
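The exact change isn't visible in the transcript; a minimal sketch of the idea, assuming the stock update_index.lua and the db_connect()/db_disconnect() helpers that the sysbench 0.5 Lua scripts use:

```bash
# Work on a copy of the stock script (the install path is an assumption).
cp /usr/share/doc/sysbench/tests/db/update_index.lua ./update_index_reconnect.lua

# Then edit the copy so that each event uses a fresh connection, roughly
# (Lua, shown here as a comment):
#   function event(thread_id)
#      db_connect()
#      -- ... the original UPDATE from the stock script stays here ...
#      db_disconnect()   -- new: drop the connection after every transaction
#   end

# And run it the same way as the stock scripts:
sysbench --test=./update_index_reconnect.lua \
         --mysql-user=root --mysql-db=sbtest --oltp-tables-count=8 \
         --num-threads=32 --max-requests=0 --max-time=60 --report-interval=1 run
```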
So, if I do the benchmark without the disconnect, then it will be OK. This is the throughput again, for the index updates on 32 threads. The tables used in the benchmark
are rather small, so all of this is in memory. But if I add the disconnect back and do it again, then for a while it will be kind of OK.
You can also see that in MySQL the connections are cheap, so it doesn't matter that much whether you are reconnecting or not; it matters a little. And now MySQL is stuck, right?
Oh, it's stuck again. And again. What happened? Is MySQL not running? Yes, it is running. Oops, it's stuck again. What happened here?
No? All of it is in TIME_WAIT, yes. So, the issue here is that when you close a MySQL connection,
closing at the MySQL protocol level doesn't close the TCP/IP level connection. So if I check, all of it is in TIME_WAIT, and here I'm actually out of the local
port range for TCP/IP. How can I fix that? I can limit the number of connections at the kernel level that can be in TIME_WAIT.
If I limit this, then I can't have more than a thousand connections in TIME_WAIT; the kernel will simply kill the rest.
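As a sketch of the knobs involved: I believe the limit he is referring to is net.ipv4.tcp_max_tw_buckets, and the value of 1000 just mirrors the talk, so treat both as assumptions:

```bash
# How many sockets are currently stuck in TIME_WAIT?
ss -tan state time-wait | wc -l

# The local port range the client can draw from (this is what we run out of).
sysctl net.ipv4.ip_local_port_range

# Cap the number of sockets allowed in TIME_WAIT; anything above the cap is
# simply destroyed by the kernel, which may log a "time wait bucket table
# overflow" style warning.
sudo sysctl -w net.ipv4.tcp_max_tw_buckets=1000
```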
And my benchmark is fixed now, and this problem is gone for good, because I won't run out of the local port range anymore. So, if we do this again,
we keep roughly a thousand connections in TIME_WAIT, because the rest is killed by the kernel. If we check what happens here... no, it's not here, but if your traffic is high enough, the kernel can complain
in the logs about having to drop these TIME_WAIT connections. I tried to break MySQL connections with that and I wasn't successful, so it looks like these TCP/IP connections are there for basically no reason. When you close the MySQL level connection,
the server doesn't initiate the TCP/IP level connection close; it is initiated by the client later, so it is the client side that keeps the connection in TIME_WAIT. I don't know, probably it would be better for the server to initiate the TCP/IP connection close when
the client thread is closed. Okay.
One more thing, which I almost forgot: my colleagues and friends from Belgium are organizing the community dinner, and they asked me to show these slides. I haven't seen them myself, so I'll check them out with you and improvise. This is the first one. It has some nice
logos on it; I guess those are the sponsoring companies, right? Yes, okay. This slide has some text and a map. The text says which bus you should take to get to the community dinner, so this is the useful information part.
And here is the bus timetable, so you don't have to look it up yourselves.