Visualizing benchmark data with R and ggplot2

Video in TIB AV-Portal: Visualizing benchmark data with R and ggplot2

Formal Metadata

Alternative Title
MySQL and Friends - Benchmark R ggplot2
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
Graph (mathematics)
So, let's get started. This talk will be only loosely related to MySQL. I do a lot of benchmarking, and people often ask me how I make the graphs, so this talk will be about
how to make that graph in 17 easy steps. That's an OLTP benchmark comparing multiple storage devices, in both the read-only and the read-write cases. As for why visualization is important, I have some examples here. The raw data for this looks like that: it's a lot of sysbench output, and we can look at one. It has per-second granularity, so it consists of 60 time series. If I wanted to graph all of that, that's 60 graphs, which is far too much to show in a 15-minute talk. So we need something that can visualize this better, and R and ggplot2 are good for that. In order to use them, we have to preprocess the data into something R can read, and for that the data needs to be in this format. You can see from the column names what each column means. This row means that in the first second of the benchmark, on this storage device, it was a read-only test with this many threads, the metric is sysbench transactions per second, and the value is a bit more than 2,000. In a more realistic scenario you have lots of data: for each second you will have the iostat data, and probably all of the MySQL status variables, so in the R code you can choose what is interesting. For the sake of this example, I created a parser for this. I usually have a script that runs the benchmark (I'm not showing the benchmark code today). It just parses the data, prints it like that, and I redirect the output to a file. Then I read that into R and do the graphs from it. So first we create the data frame with read.table, which reads the table from a file; this is the file, which is built
with the given script. It's one line per second per metric. Then we name the columns, and for this graph we take a subset. For example, here this is the sysbench TPS data frame: we give the subset command the conditions that threads will be 256, the test is read-write, and the storage device is this one, and we convert the value to numbers. In these examples that conversion is unnecessary. But as soon as you also have mysqladmin output, for example, which has some non-number metrics, you will see something odd when you make the graph: each value is repeated as text along the y-axis, and that doesn't make sense. This function here only creates some aggregates, which I usually like to look at. R also has a nice IDE called RStudio. This is the first graph, which I'm going to show you soon, but here is the aggregate data frame. It's not very visible, I guess. For each of these benchmarks it calculates the standard deviation of the throughput, the mean throughput, the 95th percentile and the maximum, so for each case we have tabular data as well.
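The preprocessing described above can be sketched in R as follows: read.table, naming the columns, subsetting, converting the values to numbers, and per-run aggregates. This is a minimal sketch on made-up data, not the speaker's script. The column names `sec`, `storage`, `test`, `threads`, `metric` and `value`, the `ro`/`rw` labels, the device names and the file layout are all assumptions. Base R's `aggregate()` is just one way to compute the summary table.

```r
# Sketch only: hypothetical data and column names, not the talk's files.

# 1. Write a small input file in the assumed format:
#    one whitespace-separated row per second per metric.
set.seed(1)
raw <- expand.grid(sec = 1:60, storage = c("dev_a", "dev_b"),
                   test = c("ro", "rw"), threads = c(64, 256),
                   metric = "tps", stringsAsFactors = FALSE)
raw$value <- sprintf("%.2f", rnorm(nrow(raw), mean = 2000, sd = 150))
# A single non-numeric metric is enough to make read.table
# treat the whole value column as text.
raw <- rbind(raw, data.frame(sec = 1, storage = "dev_a", test = "ro",
                             threads = 64, metric = "state",
                             value = "running", stringsAsFactors = FALSE))
path <- tempfile(fileext = ".txt")
write.table(raw, path, row.names = FALSE, col.names = FALSE, quote = FALSE)

# 2. Read the table back into a data frame and name the columns.
bench <- read.table(path, header = FALSE, stringsAsFactors = FALSE)
names(bench) <- c("sec", "storage", "test", "threads", "metric", "value")

# 3. Subset one benchmark configuration and convert the values to numbers.
tps <- subset(bench, threads == 256 & test == "rw" & metric == "tps")
tps$value <- as.numeric(tps$value)

# 4. Per-configuration summary: mean, standard deviation,
#    95th percentile and maximum of the throughput.
num <- subset(bench, metric == "tps")
num$value <- as.numeric(num$value)
stats <- aggregate(value ~ storage + test + threads, data = num,
                   FUN = function(v) c(mean = mean(v), sd = sd(v),
                                       p95 = unname(quantile(v, 0.95)),
                                       max = max(v)))
stats <- do.call(data.frame, stats)  # flatten the matrix column
print(stats)
```

The `do.call(data.frame, stats)` step is there because `aggregate()` stores the four statistics as one matrix column. Flattening it gives ordinary columns such as `value.p95`.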
The initial graph: the output for that is this (the output grows during the presentation). At first we take just the 256-thread read-write benchmarks. In order to create a graph with ggplot2, you have to create a plot object with the ggplot function. You also have to add at least the aesthetics and at least one geometry. In the aesthetics we say that the x-axis is the time and the y-axis is the value of the metric, and I plotted it as lines. If you load that example into RStudio, it will look like this. You can change the geometry: for example, geom_text would just plot the values as text. OK, that fails because I don't have a label for the text, I guess, but we can change it to geom_jitter, which is a jitter plot.

The issue with the line plot is that the individual values are always joined, so in this black area we have absolutely no idea what is happening. It's quite a high-variance workload, so it does matter whether it's 6,000 or 6,500. So we can do jitter plots, but even with jitter we still have some black areas, so we can use the alpha channel feature of the jitter plot. Now we are plotting transparent dots. Where the color (black, in this case) is more intense, we have more samples in that area, because the dots are plotted on top of each other; where it is lighter, there is only one sample. The next thing is that this seems inconsistent at first, but it only looks inconsistent because of the automatic scaling of the axis. So you can use the expand_limits function to set the start of the y-axis to 0, which really helps this graph. You can also specify the color in the aesthetics; here I specified that it will be the storage column from the data frame. It doesn't matter what you specify as the color: if something is a different text or a different number, it is assigned a different color, and that's it.
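A minimal ggplot2 sketch of these first steps follows. It starts with a line plot of value against time. It then switches to jittered, semi-transparent points with the y-axis starting at 0 and one color per storage device. The data is hypothetical: the device names, the 300-second run length and the values are made up, and the alpha of 0.2 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical data: one 256-thread read-write run per storage device,
# 300 seconds each, with made-up throughput values.
set.seed(2)
tps <- expand.grid(sec = 1:300, storage = c("dev_a", "dev_b", "dev_c"),
                   stringsAsFactors = FALSE)
tps$value <- rnorm(nrow(tps), mean = 6000, sd = 400)

# Minimal plot: aesthetics (x = time, y = metric value) plus one geometry.
p_line <- ggplot(subset(tps, storage == "dev_a"), aes(x = sec, y = value)) +
  geom_line()

# geom_text would additionally need a label aesthetic, e.g.
# aes(x = sec, y = value, label = round(value)).

# Jittered, semi-transparent points: overlapping samples add up to darker
# areas; the y-axis is forced to include 0; one colour per storage device.
p_jitter <- ggplot(tps, aes(x = sec, y = value, colour = storage)) +
  geom_jitter(alpha = 0.2) +
  expand_limits(y = 0)
```

Evaluating `p_jitter` in RStudio draws it in the plot pane.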
This automatically generates the legend. The legend is not that nice here, right? So we add to the plot. From here on the full code would be too much to show, so I only show the additions. We can put the legend here, like that, and I like this because it shows the flexibility. The legend is quite faint because we're using the alpha channel for the dots, so the color coding in the legend is very light. We can override the legend's aesthetics to have an alpha channel of 1, so we get the regular colors. We can add a label to the y-axis and a label to the x-axis, which is a fairly common thing. You can use more vibrant colors, like that. The legend was still a bit hard to read because of the background, so we can set it to white, and now it looks quite seamless. This is still a single benchmark run: one time series, one hour of sysbench.

In the next example, what we change is the x-axis. Instead of putting the time there, we extend the data frame so that we are not restricted to a single thread configuration but examine all the thread counts. On the x-axis we put factor(threads). That results in a plot like this: the first column here is the single-threaded run, the second is 2 threads, then 4 threads, and so on. I think this kind of graph is a good way to see how something scales and where the optimal degree of concurrency is. If you specify the x-axis like that, there is actually a hidden x-axis within each column, which is time, so the points are still ordered by time. OK, the other feature I like in R is faceting, because with faceting you can make easy comparisons. In this example I added all the runs for this storage device, and I'm faceting based on the read-only/read-write column.
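In ggplot2, the legend tweaks, the `factor(threads)` x-axis and the one-row faceting might look like this. The data is made up. The specific choices are assumptions for illustration, not necessarily the settings used in the talk: the legend at the bottom, the `Set1` Brewer palette, the axis titles and the `ro`/`rw` labels.

```r
library(ggplot2)

# Hypothetical data: all thread counts, both test types, three devices.
set.seed(3)
threads <- c(1, 2, 4, 8, 16, 32, 64, 128, 256)
bench <- expand.grid(sec = 1:60, storage = c("dev_a", "dev_b", "dev_c"),
                     test = c("ro", "rw"), threads = threads,
                     stringsAsFactors = FALSE)
bench$value <- rnorm(nrow(bench), mean = 500 * log2(bench$threads + 1), sd = 150)

p <- ggplot(bench, aes(x = factor(threads), y = value, colour = storage)) +
  geom_jitter(alpha = 0.2) +
  expand_limits(y = 0) +
  # Opaque legend keys, although the points themselves are transparent.
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  xlab("Threads") +
  ylab("Transactions per second") +
  scale_colour_brewer(palette = "Set1") +             # more saturated colours
  theme(legend.position = "bottom",                    # legend placement
        legend.key = element_rect(fill = "white")) +   # white key background
  facet_grid(. ~ test)  # no row facets; one column per test type
```

Because `threads` is wrapped in `factor()`, each thread count gets its own slot on the x-axis. Within a slot, the jittered points come from the whole time series.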
So here I have two graphs: the first is read-only, the second is read-write, and they have matching scales. That happens automatically if you're using faceting, so it's pretty easy to compare them. The read-write value is slightly lower, but that's because of how sysbench works: in the read-only test, some statements of the sysbench transaction, the ones that would modify the database, are simply not run. There is more to faceting: you can do more elaborate faceting here. The first argument of this is the vertical one, and the dot means that I don't facet vertically; this is the horizontal one. Here I'm faceting using both the read-only/read-write column and the storage device. So in the first row we have the read-only data and in the second row the read-write data, for all three storage devices, and the read-only and read-write performance is directly comparable.

And why is this type of visualization important? We could say that a device can do some number of transactions per second, or that it has a certain response time. But a benchmark with a single number as its result doesn't tell you much, right? It's important how consistent that performance is. From these plots you can see, for example, that in the read-write case the performance becomes more inconsistent after 256 threads. In the read-only case it becomes more inconsistent as well, and there we have some spikes: most of the samples are in the lower region, but some of them are out in the upper region. OK, then we have something to set the graph title, which will appear here, and the only thing that remains is the axis labels: because we have so many
columns, these numbers are kind of pressed together. We can fix that by setting the properties of the x-axis: here we write them at 45 degrees. In order to save the plot, you need to use the ggsave function; by default, while you are working in RStudio, the plot is only shown inside RStudio. Let's go to the last example. This graph actually takes some time to generate, as you can see, because there are so many dots: the data frame is huge, and it has to plot all of them. So while you are working with this, you'll see the plot here, and you can save it with the ggsave function. In the end, the R code that generates the graph is just this, step by step.
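Below is a sketch of the final plot. It is a grid with one row per test type and one column per storage device. It also has a title and x-axis labels rotated by 45 degrees, and `ggsave()` writes the result to a file. The data, the title text, the output file name and the 12 by 6 inch size are hypothetical. Saving as PDF is an assumption chosen because the PDF device is always available.

```r
library(ggplot2)

# Hypothetical data: 11 thread counts, two test types, three devices.
set.seed(4)
bench <- expand.grid(sec = 1:60, storage = c("dev_a", "dev_b", "dev_c"),
                     test = c("ro", "rw"),
                     threads = c(1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024),
                     stringsAsFactors = FALSE)
bench$value <- rnorm(nrow(bench), mean = 500 * log2(bench$threads + 1), sd = 150)

p <- ggplot(bench, aes(x = factor(threads), y = value, colour = storage)) +
  geom_jitter(alpha = 0.2) +
  expand_limits(y = 0) +
  facet_grid(test ~ storage) +   # rows: test type, columns: device
  ggtitle("sysbench throughput (hypothetical data)") +
  xlab("Threads") + ylab("Transactions per second") +
  # Rotate the crowded thread-count labels by 45 degrees.
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Write the plot to a file instead of only viewing it in RStudio.
out <- file.path(tempdir(), "bench.pdf")
ggsave(out, plot = p, width = 12, height = 6)
```

In the `facet_grid()` formula, the variable left of `~` picks the rows and the one on the right picks the columns. A `.` on either side means no faceting in that direction.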
I put all the examples online before the talk, along with the sample data. The sample data is from a recent benchmark, and the customer was kind enough to let me use it in this talk. And that's
it. Thank you. Are there any questions?
Question: What is the notion of factor in that expression?
So factor just means discrete values. Let's try removing the factor here and just use threads. The issue we see here is that the axis is linear, right? So this is 1, and this is close to 1,000. The factor means that I'm supplying discrete values, and each discrete value should have its own, equally wide area on the graph; that's what factor is about. If we don't use factor, it looks like this. And yes, faceting is multiple graphs next to each other; we did that with matching scales.
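The difference described in this answer can be reproduced with two plots of the same made-up data. With a numeric x aesthetic, thread counts are placed proportionally on a linear axis. `factor(threads)` instead gives every thread count its own equally wide slot. The thread counts up to 1024 are an assumption.

```r
library(ggplot2)

# Hypothetical data: thread counts from 1 to 1024 (made-up values).
set.seed(5)
bench <- expand.grid(sec = 1:30,
                     threads = c(1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024))
bench$value <- rnorm(nrow(bench), mean = 2000, sd = 100)

# Numeric x: positions are proportional to the thread count, so the
# small thread counts are squeezed into the left edge of the axis.
p_num <- ggplot(bench, aes(x = threads, y = value)) +
  geom_jitter(alpha = 0.2)

# factor(): each thread count becomes a discrete category and gets
# an equally wide slot on the axis.
p_fac <- ggplot(bench, aes(x = factor(threads), y = value)) +
  geom_jitter(alpha = 0.2)
```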