Benchmarking Storage Systems with blkreplay


Formal Metadata

Title: Benchmarking Storage Systems with blkreplay
Number of parts: 90
License: CC Attribution 2.0 Belgium:
You may use, modify and reproduce the work or its content for any legal purpose, and distribute and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner specified by them.

Content Metadata

Abstract
How to reliably evaluate storage systems. The blkreplay toolkit replays I/O loads on storage systems at the block level. This allows measurements and comparisons of storage systems under close-to-real-life conditions, and with this tool deep insights into workloads and storage systems can be obtained easily. The blkreplay toolkit was developed by 1&1 Internet AG in order to reproduce natural loads, recorded via blktrace in our data centers. It automates large laboratory projects, e.g. benchmarking comparisons of a wide variety of storage hardware across its multi-dimensional parameter space. The presentation explains how the blkreplay toolkit works and discusses the block-layer storage loads that have been recorded on various servers of a large ISP. The available loads cover a wide range of applications, including for example web hosting, databases and backup servers. Applying these loads to commercial and open-source/commodity storage systems has led to deep insights into their behaviour, some of which are surprising and eye-opening. Finally, it is shown that open-source-based hardware and software systems can compete with commercial ones in many areas, provided that certain conditions are met.
Transcript: English (automatically generated)
Okay, thank you very much. The most important things have been said already, as has the topic I'll be talking about. It's about a tool, and also about the experience we made, essentially driven by operations in our company, where we run a lot of different storage systems, ranging from high-end enterprise appliances down to supposedly simple Linux boxes running NFS or iSCSI. Maybe some of you share the experience we made in some places. I would like to start with a couple of examples.
First example: you run some kind of benchmark on your storage system, like Bonnie++, IOzone or whatever your preference is, and it shows you get a throughput of, say, 150 megabytes per second. Okay, so you are probably fine. Afterwards you monitor what your real application is actually putting through, and it maxes out somewhere around 50 megabytes. That's only a third. So what did your benchmark tell you? Second example: a vendor of some system comes around and says, okay, our super high-end storage system delivers 40,000 IOPS, no matter what. After that you notice that in operations it's more like 10% of that maximum, never more, although your applications would like to have more than that. Or a third example: you get a new storage system from some vendor, and it performs excellently. You are really happy with it. Over time it fills with data, and the fuller it gets, the more your performance drops, sometimes dramatically. And you start to think: what should I have done in terms of benchmarking beforehand, so that I could be sure of getting a system that actually copes with my load? Okay.
So you ask: how can your storage system be benchmarked before you run into problems? While running operations at 1&1, we developed a solution where we try to put a real, application-type load onto the storage system. There are a lot of applications which offer application-level benchmarks, but what we developed is a bit more generic. Take a Linux system, for example. The idea is to capture just the I/O on the block layer of the Linux kernel. Fortunately, the Linux kernel has an interface for that, called blktrace, and a tool set for recording the data that goes over the block-layer interface. That tool records all the requests with precise kernel timing: the timing of each request, its length, its type (whether it's a read or a write), and the exact sector number where it starts. So you've got everything in place to recreate that particular load, that particular sequence of requests. And that's what my colleague, Dr. Thomas Schöbel-Theuer, one of our kernel developers, did: he developed the tool blkreplay.
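A minimal sketch of the recording side looks like this; the device name and the 60-second duration are placeholders, and blktrace requires root:

```shell
# Record 60 seconds of block-layer I/O on /dev/sdb (run as root;
# device name and duration are placeholders).
# blktrace writes per-CPU binary files named trace.blktrace.<n>.
blktrace -d /dev/sdb -o trace -w 60

# Render the binary trace human-readable; each event carries the
# timestamp, sector, request size and read/write direction.
blkparse -i trace | less
```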
It does this kind of replaying of loads. One short note: the blktrace interface is specific to Linux, but similar, sort-of-compatible interfaces are also available for Windows or Solaris. So pick your favorite operating system and see where you can get the requests; a simple shell script can then transcode that format into whatever blkreplay can read. Of course, there's a web page with lots more information, the source code, full documentation, and even a lot of data sets we recorded at 1&1 in our data center. And best of all, everything, including the data sets, has been published under the GPL or a free documentation license, so you are really free to use that stuff, to contribute if you like, or to ask us questions.
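Such a transcoding script can be as small as one blkparse format string plus a little awk. Note that the output column order shown here (time, sector, length, R/W) is an illustrative assumption; check the blkreplay documentation for the exact input format it expects:

```shell
# Convert a blktrace recording into a plain-text event list.
# %T.%t = seconds.nanoseconds, %S = start sector, %n = size in
# sectors, %d = direction (RWBS string such as R, W or WS).
# NOTE: the output column order (time, sector, length, R/W) is an
# illustrative assumption -- consult the blkreplay documentation
# for the exact input format it expects.
blkparse -i trace -q -f "%T.%t %S %n %d\n" |
  awk '$4 ~ /R/ { print $1, $2, $3, "R"; next }
       $4 ~ /W/ { print $1, $2, $3, "W" }' > recorded.load
```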
And I think, as a company that's using lots of Linux, that's the least we can give back to the community. So how does blkreplay actually work? Essentially, the picture is sort of as simple in practice as it looks. You put whatever blktrace got you into the standard input of the main process. There are up to a thousand worker threads, which keep queues of the requests, so that blkreplay can make sure all the requests are fired at the precise, exact timing you asked for. There's a bit of housekeeping, and the results are written out on standard output, so you can pipe data through it or use files, whatever you like; the result files can then be processed by any tool you like.
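In practice that pipe shape might look roughly like this; everything beyond "load on stdin, results on stdout" is an assumption here, so consult the blkreplay documentation for the real command-line interface:

```shell
# blkreplay reads the recorded requests on standard input and writes
# its result records to standard output, so it composes with pipes.
# CAUTION: replaying write requests destroys data on the target
# device. The exact invocation (device as argument, load on stdin)
# is an assumption for illustration -- check the blkreplay docs.
blkreplay /dev/sdb < recorded.load > replay.out

# Post-process the results with the bundled gnuplot-based scripts to
# get throughput graphs (the script name here is a placeholder).
./make-graphs.sh replay.out
```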
We deliver a set of scripts that use gnuplot, and you get a bunch of nice graphs out of that. So let me show you an example. This is a storage appliance which we had in our test lab. First, let me explain how to read this kind of graph. The x-axis is simply the duration of our test sequence; on the y-axis you see the throughput in I/O operations per second. The green pattern is what we originally recorded; the yellow is what the tested storage system made out of it. We intentionally put a huge peak at the beginning of the test, so we can see how the system would behave in overload situations and how it would recover.
We get out something like 10,000 IOPS, which is well acceptable for this particular kind of storage system. Let's compare that to a Linux box with a couple of disks, which costs less than half the price of the commercial storage solution. This has been, essentially, a plain Debian Linux box with an iSCSI target on top of it. So what do we get out? It's the same pattern; only the timing is a little different here, because the time scale is different. And you see the Linux box also delivers about 10,000 IOPS. So this is good news for open source and commodity hardware. Now to something more interesting on the next slide: back to our
storage appliance. This is a second run with the exact same storage appliance I showed you two slides before, but this time we filled it up with data. Remember, it's the block layer: no metadata stuff, no file system, nothing, just the block layer. And I can assure you that the Linux system doesn't care whether the disks have been written full or not. But evidently this appliance does: what we get out is less than 5,000 IOPS, less than half of what we get out of the empty system. And this is an effect we observe on many, many commercially available storage systems. I'll show you why we think that might happen; so that's,
for us, the most likely explanation. These commercial appliances are essentially black boxes; most of the time there is actually an SSD inside, and they implement some kind of storage virtualization. What does that actually mean in terms of physical effects? It translates the logical block addresses to the physical sectors on the disks, and evidently it does some kind of reordering. Now, benchmarks, including blkreplay, usually touch only a tiny fraction of the disk space, so there's a small working set being handled, probably by any kind of benchmark. What these storage appliances do is this: if the upper band is the logical address space, and your benchmark touches a number of sectors randomly spread across that address space, the appliance starts reordering those in terms of physical location and puts them into just the first couple of sectors of its physical disk space. Accessing that data is significantly faster than addressing the entire physical space. So that's one of the explanations for the slowdown, depending on whether you pre-filled the entire physical address space or not.
As a side remark, I might also say that commercial systems have been shown, using blktrace and blkreplay, to evidently consist of a huge number of different software layers doing storage virtualization, thin provisioning, tiering and all that stuff. And what we learned is that every single layer of complexity, in the first place, costs performance. For some working sets, for some behaviors, for some use cases, some of these layers can give you an improvement. But many use cases, especially the ones we have in our company, don't gain from these layers of complexity; in a commercial box they just cost performance. And this is essentially the reason why, for some applications, the Linux box with nothing on top is the most efficient system.
So let me conclude; my kitchen timer shows about one and a half minutes left, which leaves a bit of space for questions, and probably one or two of you have some. Okay, so what have we learned? Synthetic benchmarks like IOzone, Bonnie or whatever you use are not very predictive. Of course they have some predictiveness along some dimensions, which is the reason they exist, but compared to your real-world use case they are not very predictive. A crucial factor that we identified for benchmarking is the structure of the workload you actually run on your productive systems. What we can do with blkreplay is recreate precisely the loads which we recorded in real-life operations, so there's nothing synthetic about it, and the results of replaying them on a number of test systems give us a detailed comparison of behavior such as graceful degradation and graceful recovery. What I can also tell you at this point is that we have a tool set which can modify your real-life workloads into partly synthetic workloads with constant or rising request rates, so you can play with the loads you recorded in real life. Okay, perfect timing. Again, if that sounds interesting to you, don't hesitate to visit our website or contact me or the author of that awesome toolkit. And now that opens the
opportunity to ask questions.

Q: Did you run the same test with the filled block device on Linux?

A: Yes, actually we did. I didn't bring the graphs, because they look exactly the same, pixel by pixel, but we ran it and made sure that this is true. By the way, on Linux you also have ways of introducing that kind of storage virtualization, for example by using LVM or another block layer on top; there's a huge amount of software that can do that, and we observe the same kind of effects there.

Q: Am I right in assuming that the behavior you showed doesn't really depend on the actual data you read from or write to the disk? So it is enough to have the metadata and the access pattern, and you do not need to include the actual data in the dumps?

A: That's right.

Q: So the dumps do not get, like, 30 terabytes big; you can have a small set of metadata and still recreate a workload of 30 terabytes?

A: That's exactly right. We have only the metadata of the requests, not the content of the blocks.
But beware: in the meantime there are SSDs on the market that compress the data. So it's important to look at what kind of hardware sits at the bottom of your system. If you have a compressing SSD somewhere in there, you should think about pre-seeding the data volume with some kind of random data, because, as you know, zero blocks are very well compressible, and compressing SSDs handle them very well; we've measured that.

Q: That was sort of my follow-up question: whether in blkreplay you just pretend to read some random data instead of all-zero blocks.

A: Nothing prevents you from putting real data on the disk beforehand, whatever you feel fits your type of analysis. But on dumb disks, just reading and writing zero blocks is okay. By the way, in terms of writing: when we have a write access to a block, we actually put a sequence number in it, so we can do verification afterwards to see whether the sequence has been correctly replayed. There are test tools for that, too.
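A pre-seeding step like the one described can be done with plain dd from /dev/urandom. It is sketched here against a small scratch file; in a real run you would point TARGET at the block device under test (which destroys its contents):

```shell
# Pre-seed the target with incompressible data so a compressing SSD
# cannot shortcut the benchmark on all-zero blocks.
# Shown against a 4 MiB scratch file; point TARGET at the real
# block device instead (this overwrites its contents!).
TARGET=scratch.img
dd if=/dev/urandom of="$TARGET" bs=1M count=4 conv=fsync 2>/dev/null
```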
Thank you for the talk