Benchmarking Storage Systems with blkreplay
Formal metadata
Number of parts | 90
License | CC Attribution 2.0 Belgium: You may use the work or content for any legal purpose, modify it, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner specified by them.
Identifiers | 10.5446/40265 (DOI)
FOSDEM 2013, 74 / 90
Transcript: English (automatically generated)
00:01
Okay, thank you very much. The most important things I've said already. The topic I'll be talking about is a tool, and also our experience, which was essentially driven by the operations in our company, where we run a lot of different storage systems ranging from the highest-end
00:22
enterprise applications down to supposedly simple Linux boxes running NFS or iSCSI. And, well, maybe some of you share the experience we made in some places. I would like to start with a couple of examples.
00:45
First example: you run some kind of benchmark on your storage system, like Bonnie++, IOzone or whatever your preference is, and it shows a throughput of about 150 megabytes per second. Okay, so you are probably fine.
01:01
Afterwards you monitor what your real application is putting through, and it maxes out somewhere around 50 megabytes per second. That's only a third. So what did your benchmark tell you? Example two: a vendor of some system comes around and says, okay, our super high-end storage system delivers
01:25
40,000 IOPS no matter what. Afterwards you notice that in operations it's like 10% of that maximum, never more, although your applications would like to have more than that.
01:41
Or, third example: you get a new storage system from some vendor, and it performs excellently. You are really happy with it. Over time it fills with data, and the fuller it gets, the more your performance drops,
02:01
sometimes dramatically. And you start to think: what should I have done in terms of benchmarking beforehand, so that I can be sure I get a system that [unintelligible]? Okay.
02:20
So you ask: how can your storage system be benchmarked before you run into problems? While running operations, we developed a solution at 1&1 where we try to put a real application type of load onto the storage system.
02:41
There are a lot of applications which offer application benchmarks, but what we developed is a bit more generic. Take a Linux system, for example: the idea is to capture just the I/O on the block layer of the Linux kernel. Fortunately, in the Linux kernel there is an interface for that,
03:00
which is called blktrace ("block trace"), together with a tool set for recording the data that goes over the block layer interface. That tool records all the requests with precise kernel timing, the length of each request, the type of request (read or write), and the exact sector number where it starts. So you've got everything in place
03:26
to recreate a particular load from that particular sequence of requests. And that's what my colleague Dr. Thomas Schöbel-Theuer, our kernel developer, did: he developed the tool blkreplay ("block replay"),
03:42
which does this kind of replaying of loads. One short note: the blktrace interface is specific to Linux, but there are similar, sort of compatible interfaces available for Windows or Solaris as well.
04:01
So pick your favorite operating system and see where you can get the requests; a simple shell script should transcode that format into whatever blkreplay can read. Of course, there's a web page with lots more information, the source code, full documentation, and even a lot of data sets we recorded at 1&1 in our data center.
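As a minimal sketch of such a transcoding step, assuming `blkparse`'s default text output (where a queue event looks like `8,0  3  1  0.000000000  697  Q  R 223490 + 8 [kjournald]`) and a purely hypothetical target format `timestamp;sector;length;R|W` (blkreplay's real input format is described in its own documentation and may differ), the filter below keeps only queue (`Q`) events and emits one line per request. It is written in Python rather than shell for readability; the name `transcode` is illustrative.

```python
import sys
from typing import Iterable, Iterator


def transcode(lines: Iterable[str]) -> Iterator[str]:
    """Convert blkparse default-format lines into a HYPOTHETICAL
    'timestamp;sector;length;R|W' format (not blkreplay's documented format)."""
    for line in lines:
        fields = line.split()
        # Assumed default layout: dev cpu seq time pid action rwbs sector + nsectors [proc]
        if len(fields) < 10 or fields[5] != "Q" or fields[8] != "+":
            continue  # skip non-queue events, summary lines and malformed lines
        rwbs = fields[6]
        if "R" in rwbs:
            op = "R"
        elif "W" in rwbs:
            op = "W"
        else:
            continue  # e.g. pure flushes or discards carry no read/write
        timestamp = float(fields[3])
        sector = int(fields[7])
        length = int(fields[9])  # in 512-byte sectors
        yield f"{timestamp:.9f};{sector};{length};{op}"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as src:
        for out in transcode(src):
            print(out)
```

Usage, assuming the script is saved as `transcode.py`: `blkparse -i mytrace | python transcode.py /dev/stdin > load.txt`. Reading from a path argument rather than directly from stdin keeps the module importable without blocking.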
04:24
And best of all, everything, including the data sets, has been published under the GPL or a free documentation license, so you are really free to use that stuff, contribute if you like, or ask us questions.
04:40
And I think, as a company that's using lots of Linux, that's the least we can give back to the community. So how does blkreplay actually work? Essentially, it is about as simple in practice as the picture shows.
05:02
You feed whatever blktrace got you into the standard input of the main process. There are up to a thousand workers, which keep queues of the requests so that we can make sure all the requests are fired at the precise,
05:20
exact timing you want. There's a bit of housekeeping here, and the results are written out to standard output, so you can pipe data through it or use files, whatever you like. The result files are then processed by any tool you like.
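The dispatcher/worker idea can be sketched as follows. This is a hypothetical illustration, not blkreplay's implementation: a main thread hands requests round-robin to worker threads, each worker sleeps until a request's recorded offset, issues it through a caller-supplied `do_io` callback (an assumption standing in for real block I/O), and records how late it went out and how long it took. The names `Request`, `Result`, `replay` and `n_workers` are illustrative.

```python
import queue
import threading
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    t: float      # offset from the start of the trace, in seconds
    sector: int
    length: int   # in 512-byte sectors
    op: str       # "R" or "W"


@dataclass
class Result:
    request: Request
    issue_delay: float  # seconds the request went out later than recorded
    duration: float     # seconds the I/O call took


def replay(requests: List[Request], do_io: Callable[[Request], None],
           n_workers: int = 4) -> List[Result]:
    """Hypothetical timing-preserving replayer: round-robin dispatch to
    worker threads, each of which waits for a request's recorded offset."""
    queues = [queue.Queue() for _ in range(n_workers)]
    results: List[Result] = []
    lock = threading.Lock()
    start = time.monotonic()

    def worker(q: queue.Queue) -> None:
        while True:
            req = q.get()
            if req is None:  # sentinel: no more requests for this worker
                return
            wait = start + req.t - time.monotonic()
            if wait > 0:
                time.sleep(wait)  # hold the request until its recorded time
            issued = time.monotonic()
            do_io(req)
            done = time.monotonic()
            with lock:
                results.append(Result(req, issued - (start + req.t), done - issued))

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for th in threads:
        th.start()
    # A slow I/O delays the requests queued behind it on the same worker,
    # which then shows up as a positive issue_delay in the results.
    for i, req in enumerate(sorted(requests, key=lambda r: r.t)):
        queues[i % n_workers].put(req)
    for q in queues:
        q.put(None)
    for th in threads:
        th.join()
    return results
```

For a real device, `do_io` could call `os.pread` or `os.pwrite` on an opened block device at offset `sector * 512`; printing each `Result` as a line would give the kind of stdout stream described above.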
05:41
We deliver a set of scripts that use gnuplot, and you get a bunch of nice graphs out of that. So let me show you an example. This is a storage appliance which we had in our test lab, and
06:02
first I'll explain how to read this kind of graph. Along the x-axis is just the duration of our test sequence. On the y-axis you see the throughput in I/O operations per second. The green pattern is what we
06:22
originally recorded; the yellow is what the tested storage system made out of it. We intentionally put a huge peak at the beginning of that test so we can see how the system behaves in overload situations and how it recovers. And what you see is that
06:45
we get out something like 10,000 IOPS, which is quite acceptable for this particular kind of storage system. Let's compare that to a Linux box with a couple of disks,
07:02
which costs less than half the price of the commercial storage solution. This was essentially a plain Debian Linux box with an iSCSI target on top. So what do we get out?
07:21
It's the same pattern; only the time scale is a little bit different here. And you'll see the Linux box also delivers about [ten] thousand IOPS. So this is good news for open source and commodity hardware. But now something interesting on the next slide, back to our
07:44
storage appliance. This is a second run with exactly the same storage appliance I showed you two slides before, but this time we filled it up with data first. Remember, it's the block layer: no metadata, no file system, nothing, just the block layer.
08:04
And I can assure you that the Linux system doesn't care whether the disks have been written full or not, but evidently this appliance does. What we get out is less than 5,000 IOPS, less than half of what we get out of the empty system.
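The graphs described in the talk compare, over time, the request rate of the recorded trace (green) with what the system under test achieved (yellow). A minimal sketch of how such per-second curves can be derived, assuming you have one timestamp per recorded request and one completion timestamp per replayed request (both in seconds from the start of the run), is to count requests in one-second buckets; the function names are illustrative, not part of the blkreplay scripts.

```python
from collections import Counter
from typing import Dict, Iterable


def iops_per_second(timestamps: Iterable[float]) -> Dict[int, int]:
    """Count requests per one-second bucket (bucket = whole seconds elapsed)."""
    counts = Counter(int(t) for t in timestamps if t >= 0)
    if not counts:
        return {}
    last = max(counts)
    # Include empty seconds so that stalls show up as zero in a plot.
    return {s: counts.get(s, 0) for s in range(last + 1)}


def keep_up_ratio(achieved: Dict[int, int], recorded: Dict[int, int]) -> Dict[int, float]:
    """Per-second achieved/recorded ratio; 1.0 means the system kept up."""
    return {s: achieved.get(s, 0) / n for s, n in recorded.items() if n > 0}
```

Writing both dictionaries out as columns gives data that gnuplot can draw as two overlaid curves; a ratio that stays well below 1.0 after the initial peak indicates slow recovery from overload.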
08:24
And this is an effect we observe on many commercially available storage systems. Let me show you why we think that might happen. That's,
08:41
for us, the most likely explanation. These commercial things are essentially black boxes; most of the time there are actually SSDs inside, and they implement some kind of storage virtualization. What does that actually mean in terms of
09:02
physical effects? It translates the logical block addresses to the physical sectors on the disks, and evidently it does some kind of reordering. What happens is that benchmarks, including blkreplay, usually
09:21
touch just a tiny amount of the disk space. So that's a small working set, handled by probably any kind of benchmark. What these storage appliances do is this: if that upper band is the logical address space, and your benchmark touches a number of sectors
09:46
randomly spread across the address space, the appliance starts reordering them in terms of physical location and puts them into just the first couple of sectors of its
10:00
physical disk space, so accessing that data is significantly faster than addressing the entire physical space. So that's one explanation for the slowdown, depending on whether you pre-filled the entire physical address space or not.
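The speaker's hypothesis can be illustrated with a toy model. It rests entirely on assumptions: the mean distance between consecutive physical addresses stands in for seek cost, an empty appliance is modeled as compacting the working set into the lowest physical sectors in first-touch order, and a pre-filled appliance is modeled as leaving blocks spread over the whole disk. It is not a description of any real appliance's algorithm, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def mean_seek(addresses: List[int]) -> float:
    """Average absolute distance between consecutive physical addresses."""
    if len(addresses) < 2:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))
    return total / (len(addresses) - 1)


def simulate(disk_sectors: int, working_set: int, accesses: int,
             seed: int = 1) -> Tuple[float, float]:
    """Return (mean seek on an 'empty' device, mean seek on a 'full' device)."""
    rng = random.Random(seed)
    hot = rng.sample(range(disk_sectors), working_set)  # scattered logical blocks
    trace = [rng.choice(hot) for _ in range(accesses)]

    # Empty device (assumed model): the remapping layer packs every newly
    # touched block into the next free sector at the start of the disk.
    remap = {}
    for lba in trace:
        remap.setdefault(lba, len(remap))
    empty_device = [remap[lba] for lba in trace]

    # Pre-filled device (assumed model): physical space is already allocated,
    # so blocks stay spread across the whole address range.
    full_device = trace

    return mean_seek(empty_device), mean_seek(full_device)
```

With a working set of 1,000 sectors on a 1,000,000-sector disk, the compacted layout keeps every seek shorter than 1,000 sectors, while the spread-out layout averages distances on the order of the whole disk; this is the asymmetry that pre-filling the device before benchmarking exposes.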
10:21
As a side remark, I might also say that commercial systems have been shown, using blktrace and blkreplay, to consist of a huge number of different software layers doing storage virtualization, thin provisioning,
10:44
tiering and all that stuff. What we learned is that every single layer of complexity costs performance in the first place, and for some working sets, some behaviors, some use cases, some of these layers can give you an improvement,
11:02
but many use cases, especially the ones we have in our company, don't gain from these layers of complexity. So these layers of complexity in a commercial box just cost performance, and this is essentially the reason why, for some applications, the Linux box with nothing on top is the most efficient system.
11:28
So let me conclude, with about one and a half minutes left on the kitchen timer. Five minutes? Your kitchen timer shows two minutes left, or one and a half, or something.
11:43
That's to leave room for questions. Well, that leaves a bit of space for questions, and probably one or two of you have some. Okay, so what have we learned? Synthetic benchmarks like IOzone, Bonnie or whatever you use are not very predictive. Of course, in some dimensions they do have predictive value;
12:04
that's the reason why they exist. But compared to your real-world use case they are not very predictive, and a crucial factor that we identified for benchmarking is the structure of the workload you're running
12:21
on your production systems. What we can do with blkreplay is recreate precisely the loads which we recorded in real-life operations, so there's nothing synthetic about that. And the results of replaying them on a
12:41
number of test systems give us a detailed comparison of their behavior, like graceful degradation, graceful recovery and other things. What I can also tell you at this point is that we have a tool set which is able to modify your
13:02
real-life workloads into some kind of partly synthetic workloads with constant or rising request rates, so you can play with the loads you recorded in real life.
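One way to turn a recorded trace into such partly synthetic workloads is to keep the recorded sequence of sectors, lengths and operations but rewrite the timestamps. The sketch below is an assumption-based illustration, not the 1&1 tool set: requests are `(timestamp, sector, length, op)` tuples, and for a rising rate r(t) = `start_iops` + `slope` · t the i-th request is issued when the cumulative count start_iops · t + slope · t²/2 reaches i, which gives t = (−start_iops + √(start_iops² + 2 · slope · i)) / slope.

```python
from typing import List, Tuple

Request = Tuple[float, int, int, str]  # (timestamp, sector, length, op)


def constant_rate(trace: List[Request], iops: float) -> List[Request]:
    """Keep the recorded request order but issue requests at a fixed rate."""
    ordered = sorted(trace, key=lambda r: r[0])
    return [(i / iops, s, n, op) for i, (_, s, n, op) in enumerate(ordered)]


def rising_rate(trace: List[Request], start_iops: float, slope: float) -> List[Request]:
    """Issue requests at rate start_iops + slope * t (slope in IOPS per second).

    Assumes slope >= 0, and start_iops > 0 when slope == 0.
    """
    ordered = sorted(trace, key=lambda r: r[0])
    out = []
    for i, (_, s, n, op) in enumerate(ordered):
        if slope == 0:
            t = i / start_iops
        else:
            # Positive root of (slope / 2) * t**2 + start_iops * t - i = 0
            t = (-start_iops + (start_iops ** 2 + 2 * slope * i) ** 0.5) / slope
        out.append((t, s, n, op))
    return out
```

Because only the timestamps change, the access pattern (locality, read/write mix, request sizes) stays that of the real workload, while the load level becomes a controlled parameter.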
13:21
So, okay, that's the experience we made. So, perfect timing. Again, if this sounds interesting to you, don't hesitate to visit our website or contact me or the author of this awesome toolkit, as you like. And now that opens the
13:42
opportunity to ask questions. Q: Did you run the same test with the filled block device on Linux? A: Yes, actually we did. I didn't bring the graphs,
14:02
because they look exactly the same, pixel by pixel. But we actually ran it and made sure that this is true. By the way, on Linux you also have ways of introducing that kind of storage virtualization, like using LVM or
14:21
whatever you like to have as a block layer on top. There's a huge amount of software that can do that, and we observe the same kind of effects there. Q: So am I right in assuming that the behavior you show doesn't really depend on the actual data that you read from or write to the disk,
14:42
so it is enough to have the metadata and the access pattern, and you do not need to include the actual data in the dumps? A: That's right. Q: Okay, so the dumps do not get something like 30 terabytes big; you can have a small set of metadata and still recreate a 30-terabyte workload? A: That's exactly right. We have only the metadata of the requests, not the content of the blocks.
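The size difference is simple arithmetic: a metadata-only trace grows with the number of requests, not with the bytes they move. The figures below are assumptions chosen for illustration (30 TB of workload data, 64 KiB average request size, 32 bytes of metadata per request for timestamp, sector, length and read/write flag); blktrace's own binary records carry more fields, so real traces differ.

```python
def trace_size(data_bytes: int, avg_request_bytes: int, record_bytes: int):
    """Return (number of requests, size of a metadata-only trace in bytes)."""
    n_requests = data_bytes // avg_request_bytes
    return n_requests, n_requests * record_bytes


# Assumed example values: 30 TB moved in 64 KiB requests, 32 bytes per record.
n, size = trace_size(30 * 10**12, 64 * 1024, 32)
print(n, size)  # prints: 457763671 14648437472 (about 14.6 GB of metadata)
```

In this example the trace is 64 KiB / 32 B = 2048 times smaller than the data it describes; smaller average requests shrink that ratio proportionally.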
15:05
But beware: in the meantime there are SSDs on the market that compress the data. So it's important to look at what kind of hardware you have at the bottom of the stack. If you have a compressing SSD somewhere in the system, you should think about pre-filling
15:24
the data volume with some kind of random data, or with data whose compressibility you know. Zero blocks are very compressible, and compressing SSDs handle them very well; we've measured that. Q: Yeah, that was sort of my follow-up question: whether in blkreplay you just
15:41
pretend to read some random data instead of all-zero blocks. A: Yes. Nothing prevents you from putting real data on the disk beforehand, or whatever you feel fits your type of analysis. But on, you know, dumb disks,
16:02
just reading or writing zero blocks is okay. By the way, in terms of writing: when we have a write access to the blocks, we actually put a sequence number in there, so we can verify afterwards whether the sequence has been correctly replayed. There are test tools for that, too.
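Two points from this exchange can be sketched together: stamping each written block with a sequence number so the replay can be verified afterwards, and the difference in compressibility between zero-filled and random-filled payloads. The on-disk layout here (a 16-byte little-endian header holding sector and sequence number, followed by filler in a 4096-byte block) is a hypothetical choice for illustration, not blkreplay's format, and a dictionary stands in for the device.

```python
import os
import struct
import zlib
from typing import Dict, List

BLOCK = 4096                   # assumed block size for this sketch
HEADER = struct.Struct("<QQ")  # hypothetical header: (sector, sequence number)


def make_block(sector: int, seq: int, random_fill: bool = True) -> bytes:
    """Build a write payload stamped with its sector and sequence number.
    The filler is random (incompressible) or zeros (highly compressible)."""
    filler_len = BLOCK - HEADER.size
    filler = os.urandom(filler_len) if random_fill else bytes(filler_len)
    return HEADER.pack(sector, seq) + filler


def verify(device: Dict[int, bytes], last_write: Dict[int, int]) -> List[int]:
    """Return sectors whose stored stamp differs from the last write the
    trace issued to them (missing, stale or misplaced blocks)."""
    bad = []
    for sector, expected_seq in last_write.items():
        data = device.get(sector)
        if data is None:
            bad.append(sector)
            continue
        stored_sector, stored_seq = HEADER.unpack_from(data)
        if stored_sector != sector or stored_seq != expected_seq:
            bad.append(sector)
    return bad
```

Comparing `len(zlib.compress(make_block(0, 1, random_fill=False)))` with `len(zlib.compress(make_block(0, 1)))` shows why zero blocks can flatter a compressing SSD: the zero-filled block shrinks to a few dozen bytes, while the random one stays at roughly full size.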
16:28
Thank you for the talk.