
MySQL 8.0 Performance: InnoDB Re-Design


Formal Metadata

Title: MySQL 8.0 Performance: InnoDB Re-Design
Number of Parts: 644
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Transcript: English(auto-generated)
Hello, everybody. So let's start; I have only 20 minutes to tell you all the fun stuff. This is about me, very short: I'm just working full time on MySQL performance. Extremely fun stuff, and I can tell you much more later. Yes, I have only 20 minutes, and I'll try to do it in 15. So, what are my main problems?
In fact, if your problems are things like how to tune your query or how to use MySQL the right way, everything you can read from TFM, that's not my problem. My problem is what you cannot fix: what is broken by design or does not work by design, and so on. Many problems are known, many problems we are just discovering, and, well, it's work in progress; it's a complex process. Historically, from 5.5: who has already tested MySQL 8.0? Fine, guys.
Who has already moved at least to 5.7? So in 5.5, you know, we delivered just some fixes which were already known. In 5.6 we started to make some deep changes, which brought huge pain because there were many regressions, many differences, and so on. The most painful point in 5.6 was that writes were faster than reads. In fact, if you wanted to read faster, you had to send some writes: they would unlock your reads and you would read faster. It was completely odd; not what you want.
In 5.7, finally, we fixed reads: reads became faster, but we started to lose efficiency. There is always a cost for something; we are always balancing the code to get the most optimal result. So in 8.0 we are fighting, and I think it will be the main fight for many years now, for efficiency: to get the best possible performance on the same hardware. We are not racing for bigger hardware; you should be more efficient on the same hardware. You will see some flame graphs. There was huge progress in 5.7: you can see here, when we delivered that monstrous read-only result, we were so happy. Since then this big machine was upgraded from 72 cores to 96, so we are happy to get more than 2 million queries per second now. OK, these are real numbers; it's possible. But the difference between 8.0 and 5.7 is not really big.
And in fact we don't care about this anymore, because we already saw that we can do 2 million. So for read-only it's fine, we don't touch it; there will probably be some regression because we added new stuff. OK, but our main problem is everything which remained from 5.7.
So I will just show the potential fixes we have and speak about the problems. On read-only there are still block locks remaining, for example: if you constantly read the same rows, you will have contention, so you will just go slower. The workaround here is to use a query cache, meaning ProxySQL's query cache: there is no query cache anymore in 8.0, so ProxySQL is the best solution. Lookups on secondary indexes in InnoDB can be many times slower than on the primary key, so the main workaround is to use the primary key while we are working on another solution to speed all this up. There is also the adaptive hash index: you can enable it, but the problem is that as soon as you have writes, it can slow you down. These are hot topics for us here. And UTF-8 is much faster now in 8.0: before, in 5.7, you could be 10 times slower if you used UTF-8; in 8.0 it is only 10-20% slower than Latin-1. Huge progress here. Now for doublewrite; these are all read-write problems.
Doublewrite is expected to be fixed in 8.0, so I will not speak about it here. I will tell you about the redo log changes we are doing, transaction logging, and log management. I will speak about CATS, which is coming with 8.0. Transaction isolation is still work in progress; there is a huge potential fix here. Update performance is directly related to the redo log, and on insert performance we are working too: as long as you keep the B-tree cached, for example if you can use partitions or different tables, you can go very fast on insert; otherwise the B-tree impacts you. And for purge: if purge is lagging in your production, at least with 8.0 you can truncate undo, so you can reclaim the space left by your garbage collection. So, to be short,
we tried to select the most killing problems we have. The most killing one is the redo log: once you optimize everything else, you still cannot write to the redo log as fast as you want, so this is the final bottleneck you hit (well, unless you hit some other problem first). This one we attacked and fixed in 8.0. Another one is every IO-bound workload: in InnoDB we currently have a global lock, and as soon as you start to read something from disk, every IO operation takes that global lock.
So of course it cannot scale: storage gets faster and faster, but you cannot go faster with faster storage. This is related to file-system locking, and also to row locking, where we took a contribution from the University of Michigan;
I will speak about this later. Now, about the redo log: what happens inside? I suppose you know what happens on transaction commit and how to use the settings: you flush on every commit, or not, or once per second. Well, three years ago we discovered that we can go faster even if we flush on every commit. In fact, the storage is not the problem; the whole problem is the locking inside. All user threads are fighting to write to the redo log. This was the old model: you see the global lock, which blocks everybody.
In the new model we have a dedicated log writer thread, a log flusher thread, and notification threads around them; a simplified structure. User threads are not blocked anymore: they write directly to the log buffer, and in parallel we flush all this data to disk. So you are blocked only by your storage performance: the faster you can write, the faster your redo log will work. There is no more explicit grouping and we don't wait; there is natural grouping by your flush speed on the disk. That's all: you go as fast as your storage can go. This code is extremely well instrumented.
You can know exactly how many waits you have, what your threads are doing, what happens inside, and all configuration is dynamic: you can change whatever you want and observe it, even resize your buffers live (the log buffer, for example), or even stop the redo log entirely if you want.
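The new writer model just described (user threads appending to a shared log buffer while one dedicated writer thread drains and flushes it) can be sketched in miniature. This is a toy illustration under assumed names and structure, not MySQL code; a real implementation reserves buffer space lock-free and fsyncs in batches:

```python
import threading, queue

# Toy sketch: user threads append redo records to a shared log buffer;
# a single dedicated log-writer thread drains it and "flushes" batches
# to stable storage. User threads never contend on one global redo mutex;
# they only enqueue (the queue's short internal lock stands in for
# lock-free buffer reservation), so throughput is bounded by the
# writer's flush speed, not by threads fighting each other.

log_buffer = queue.Queue()          # stands in for the in-memory log buffer
flushed = []                        # stands in for the on-disk redo log

def user_thread(tid, n_records):
    for i in range(n_records):
        log_buffer.put((tid, i))    # "write to log buffer", no global lock

def log_writer():
    # Drain whatever is available and flush it as one batch: this is the
    # "natural grouping by flush speed"; the batch size adapts to how
    # long the previous flush took.
    while True:
        rec = log_buffer.get()
        if rec is None:             # shutdown marker
            break
        batch = [rec]
        while not log_buffer.empty():
            nxt = log_buffer.get()
            if nxt is None:
                flushed.extend(batch)
                return
            batch.append(nxt)
        flushed.extend(batch)       # an fsync would happen here

users = [threading.Thread(target=user_thread, args=(t, 1000)) for t in range(4)]
writer = threading.Thread(target=log_writer)
writer.start()
for u in users: u.start()
for u in users: u.join()
log_buffer.put(None)                # all producers done: signal shutdown
writer.join()
print(len(flushed))                 # → 4000
```

The point of the sketch is only the shape: producers no longer serialize on a global lock around the write itself; one writer absorbs everything as fast as "storage" allows.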
So this is a multi-threaded model now, but there is a trade-off: with multiple threads you can never be faster than a single thread working alone, which just does write, fsync, write, fsync, without any waits and without any synchronization. Of course that goes faster than the same writes coordinated between threads. So there is a trade-off, and, to our disappointment, an event-driven system cannot be as fast as spinning: spinning is the most efficient way, anyway. This is painful; probably later we will invent something more efficient, but currently only spinning lets us go as fast as a single user doing the same actions. So you can imagine what happens here. Just to give you an example: if we do nothing, if we don't touch the redo log, you can see where this is going; the red line is the current 8.0. Without these redo log changes, you see, we don't reach the levels shown; the highest level is what we get with spinning. But with spinning, as soon as you eat a lot of CPU, you go down. In fact you need a balance between spinning and event-driven waits, so what we decided is that it will be adaptive spinning.
You can tune it, at least, to say: as soon as I reach this level of CPU usage, I don't want to spin anymore; or as soon as my disk becomes slower than a given response time, I don't spin anymore. In the end it will be just auto-tuned: we will auto-discover what happens on your system and you won't have to touch anything.
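That adaptive decision might look like the following toy sketch. The thresholds and names are invented for illustration; they are not actual MySQL variables:

```python
# Toy sketch of an adaptive-spinning policy: spin-wait while CPU is cheap
# and the disk flushes fast; fall back to event-driven waits when spinning
# would only burn CPU for nothing.

def should_spin(cpu_usage_pct, disk_response_us,
                max_cpu_pct=50, max_disk_us=100):
    """Spin only while both resources make spinning worthwhile."""
    if cpu_usage_pct >= max_cpu_pct:     # CPU already saturated: yield instead
        return False
    if disk_response_us >= max_disk_us:  # flush is slow anyway: sleep on event
        return False
    return True

print(should_spin(20, 40))    # idle CPU, fast flash: spin
print(should_spin(90, 40))    # busy CPU: event-driven wait
print(should_spin(20, 5000))  # slow disk: event-driven wait
```

The auto-tuning described in the talk would amount to adjusting those two thresholds from observed CPU usage and flush latency instead of asking the DBA for them.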
But as a first release, well, we prefer to give you some tuning points here. So here is what we have: for the first time we are presenting results with flush-on-commit set to 1, because before it was never possible; as soon as you used 1, everything was slower. Now it becomes faster.
So 8.0 becomes faster than 5.7 and 5.6 in the pure OLTP read-write test, and much faster yet in pure update: when you bombard it with a heavy-update workload, you will see the difference. What is amazing here is that in 5.7 we got a huge regression: 5.7 is slower than 5.6. So we are two times faster than 5.7, fine, but we are also winning back the regression we picked up since 5.6, which was related to all that work on read-only improvements and so on.
So, we are still not scaling. Why? Because the redo log is just the first step, and we have the next layer: all this transaction logging, log management and so on. Just to give you an example of how it should be, this is our prototype, in the current development version. While 8.0 is not scaling on one CPU socket (that is the left side), here, in fact, on two CPU sockets we can reach 400,000 updates per second, which is an enormous number, never seen until now. But this work is in progress; I hope it will be fixed soon after we deliver 8.0. Now, what about IO workloads? Well, any IO-bound workloads
were blocked by storage until now, but now you have a huge game changer with flash storage, which has arrived and keeps getting faster and faster. In reality you have a maximum throughput on your flash storage, but your real performance today is driven by IO operations per second. The throughput is limited, but you can divide it up, almost like memory access. It's not like before, when we tried to do bigger IOs to read more at once: now you can do small IOs and still reach the same throughput, but you will have more operations per second. Especially if you need only a few records from a big block, smaller blocks will give you better performance. But what happens in InnoDB? By default we have a 16K page, OK? The compression topic is very popular,
so we say: OK, let's compress. Imagine we can compress four times: we will be able to read four times more, because a 16K page compressed to 4K means that with the same throughput we can read four times more pages. Great. The problem is that we have exactly the same buffer pool: your memory is not four times bigger, so once you uncompress your data, it's still the same amount of useful data. In fact you can read faster, but you cannot use your data faster, because before reading more you need to process what you already have; otherwise why did you read this data, right?
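The arithmetic of this trade-off can be made concrete. All numbers here are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope arithmetic for the compression trade-off above,
# and for the smaller-page alternative. Fixed storage throughput, fixed
# buffer pool; all sizes are assumptions chosen for round numbers.

THROUGHPUT = 1_000_000_000      # bytes/s the drive can sustain (assumed)
BUFFER_POOL = 4 * 2**30         # 4 GiB of memory for cached pages (assumed)

# Plain 16K pages:
pages_16k = THROUGHPUT // (16 * 1024)    # pages read per second
cached_16k = BUFFER_POOL // (16 * 1024)  # pages held in memory

# 16K pages compressed 4x: we read 4x more pages per second...
pages_compressed = THROUGHPUT // (4 * 1024)
# ...but each one uncompresses back to 16K, so the buffer pool still
# holds the same amount of useful data as before:
cached_compressed = BUFFER_POOL // (16 * 1024)

# Native 4K pages: 4x the read rate AND 4x more (smaller) pages cached,
# i.e. 4x more distinct hot pages in memory if you only need a few rows
# from each page.
pages_4k = THROUGHPUT // (4 * 1024)
cached_4k = BUFFER_POOL // (4 * 1024)

assert pages_compressed == pages_4k == 4 * pages_16k
assert cached_compressed == cached_16k   # compression: no memory win
assert cached_4k == 4 * cached_16k       # small pages: 4x useful data
```

That last line is exactly the point the talk makes next: shrinking the page itself, rather than compressing a big page, multiplies both the read rate and the useful cached data.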
And what happens if, instead of a 16K page, you just use a 4K page? In reality, for the same memory, you will have four times more useful data, and here you really will go fast. Of course, this works if you need only a few rows from each page, not if you need the whole page; otherwise, OK, it should be fine as well. But the whole story is not possible as long as we have this global lock mutex, a global lock for every IO operation: as soon as you start to read faster, this global lock blocks you, so you cannot read faster anymore. The good news is that in 8.0 we fixed this.
To validate these changes, we got the chance to use the latest Intel Optane drive, which alone is able to deliver, you can imagine, one gigabyte per second, 1,000 megabytes per second, of pure IO reads from one single thread. So, using two drives like this, in theory, with 4K pages, we should be able to do one million reads per second. But is it true in practice?
And in fact, yes: we are doing more than one million real IO-bound selects per second. It's huge; these are pure IO-bound point selects. And there is the same huge jump in updates, because an update is much more expensive: we may have to read the page, update the page, write the page, and so on, so we are constantly doing IO operations. Well, this is the progress we have expected for a long time, and finally everything is coming together at the same time: we have storage solutions coming with flash from many vendors,
and we have code which will work with them. For the last point, I wanted to tell you about CATS. In MySQL it's called CATS; initially it was called VATS. CATS stands for Contention-Aware Transaction Scheduling. It was invented at the University of Michigan and has been adopted in MySQL, available since MySQL 8.0. The idea looks pretty simple: in fact, not all transactions are equal. Some transactions lock more data, some less, so some objects are more important, some less important. This is a simple schema; I also put links here so you can read more. It is a very long scientific paper explaining all the logic, with many examples and so on. So how do we decide? Traditionally we just use FIFO: first come, first unlocked. But if you do it in a smarter way, unlocking not the transaction which came first but the transaction which is blocking the most others, you get better performance. But in fact the whole story was much more fun than this,
because it turned into a real detective story. There was a claim about a huge performance improvement, but as soon as I started to test it, no test of mine showed any difference: no gain, zero, or you were even going slower.
So we started a long investigation and discussion with the guys from Michigan, because the first impression was that they were just kidding, you know, that there was nothing real. But MariaDB had already applied the patch, so supposedly everything was working, and still we did not see any results. OK,
so we started to understand what they actually wanted to solve, and finally I found a way to build a scenario which shows the difference. That was a big step: finally we could see what could be improved. And as soon as we started to look at that workload, Sunny started discovering bug after bug in the patch,
and then it became a loop of remastering, fixing, retesting: do we gain again, did we lose again, another bug is opened, and so on and so on. It took probably some nine months of this, many loops, before everything was fixed,
and in the end it was still unclear in which case you should use which algorithm. And again, a DBA cannot sit there watching the workload and say: OK, for five minutes run this one, now I switch to the other model again. So finally we came up with a solution which detects the problem itself: it switches between FIFO and CATS according to what is better for your current workload, simply by discovering how many locks you have and deciding what is better. It helps everywhere you have row lock contention.
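The scheduling choice just described can be sketched as follows. This is a toy model; the data structures and weights are invented for illustration and are not InnoDB's:

```python
# Toy sketch of FIFO vs. contention-aware (CATS-style) lock scheduling.
# Each waiting transaction is (name, arrival_order, transactions_it_blocks).

waiting = [
    ("t1", 0, 1),   # arrived first, blocks 1 other transaction
    ("t2", 1, 5),   # arrived second, but blocks 5 others
    ("t3", 2, 0),   # arrived last, blocks nobody
]

def next_fifo(waiters):
    # Traditional policy: first come, first unlocked.
    return min(waiters, key=lambda w: w[1])[0]

def next_cats(waiters):
    # CATS-style policy: grant the lock to the transaction that is
    # blocking the most others, releasing the largest wait chain.
    return max(waiters, key=lambda w: w[2])[0]

print(next_fifo(waiting))   # t1
print(next_cats(waiting))   # t2
```

The auto-switching described above would amount to counting how many waiters exist and picking `next_fifo` when contention is low (FIFO is cheaper) and `next_cats` when lock waits pile up.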
How can you recognize it? Just follow your SHOW ENGINE INNODB MUTEX output, and you will see whether you have lock waits on your current workload or not. Well, is there somebody who never monitors production? Bad question: everybody monitors production, right? So you monitor this all the time,
and of course you will see your spikes, and you will see whether it's your case; if so, you will be happy with 8.0. Here is an example with a Pareto distribution, meaning you have artificial contention on the data: many threads, many users fight for the same data. With growing load you can see that without this algorithm you lose performance as the load grows; with it, at least you save something, OK? But you should also be realistic: if you write your application so that you lock everything all the time,
well, you're just a bad programmer, right? You try to avoid locking; it's by the design of your application that you avoid creating problems. And the main problems come mostly because in InnoDB we have REPEATABLE READ transaction isolation, OK? We want to move to READ COMMITTED.
The problem is that currently this creates even more problems because of transaction locking, but this is the way we will go. In fact, even today, in the cases where you can use READ COMMITTED, you don't see any difference: you use it or you don't, whatever, everything is fine. I think that once we deliver all the fixes we want, everything will be transparent and equal. You'll be even happier than before, with this solution as well. So, go and download the stuff,
and, the most important point, have fun; because otherwise our job is stupid. If you don't have fun, that is the point. Thank you; I'm on time, I see.
There's no time for a question? Yes; while I set up for Giuseppe, I can answer a question here. So, any questions? Yep, yep. Well, in fact the main problem with doublewrite, you see, is that currently you just have a small doublewrite buffer, which is very small, and on top of that there is locking inside. So, what is the problem with doublewrite?
It is the only protection we have today against corrupted pages, and what's amazing is that no storage vendor on Linux can guarantee you that you will not get corrupted pages; there is no such support. So you need to write every page twice, and the only problem then is that you need to write the two copies sequentially, so your write time becomes two times bigger. But as soon as you can write twice as much in parallel, everything is fine. The new solution simply allows you to write faster, in parallel: because you will have many writes concurrently in flight, going together, you will hide this latency, which is doubled. The total is not bigger; it is unblocked, in fact, so you are mostly blocked only by storage. Currently you are blocked by design; that is the problem. But as soon as NVRAM arrives, you just don't need to do this on storage anymore, because the other problem is that by writing twice you will wear out your flash device, for example, two times faster. As soon as you have NVRAM, which you can rewrite as many times as you want and which is battery-protected, you don't care anymore: it will just go to memory, and it's fine.
Yeah, but this is the doublewrite: the first write goes to memory, and it's protected, and then you write to disk, so at least one copy will be safe. So it's fine. Another question? Yeah?
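The latency argument in that doublewrite answer can be put into numbers. These figures are assumed for illustration only, not measurements:

```python
# Illustrative arithmetic for the doublewrite discussion: writing each
# page twice back-to-back doubles the time per page, but with enough
# writes in flight the doubled latency is overlapped, and throughput
# scales with the parallelism the device can absorb.

write_us = 50        # assumed time for one page write, in microseconds
in_flight = 8        # assumed concurrent writes the device can sustain

# Sequential doublewrite: each page costs two writes back-to-back.
latency_per_page = 2 * write_us                 # 100 us per page

# One stream at a time vs. 8 overlapping streams: per-page latency is
# unchanged, but completed pages per second scale with the parallelism.
pages_per_second_seq = 1_000_000 // latency_per_page
pages_per_second_par = in_flight * 1_000_000 // latency_per_page

assert latency_per_page == 100
assert pages_per_second_par == 8 * pages_per_second_seq
```

In other words, the doubled write is a fixed latency tax; the fix is to stop paying it serially, so only total storage bandwidth (and flash wear) remains as the cost.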
Well, we are working on a full redesign of transaction management. All the locking which happens there today may kill you; I have a detailed blog post explaining, for example, why it can be dangerous. As soon as that locking goes away, you will be just happy with READ COMMITTED. So, you know, just be patient: it's coming. No more questions? OK. Thank you.