
The Art of Performance Evaluation


Formal Metadata

Title
The Art of Performance Evaluation
Number of Parts
29
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content, including in adapted form, is shared only under the conditions of this license.
Production Place
Ottawa, Canada

Content Metadata

Abstract
"Contrary to common belief, performance evaluation is an art." (Raj Jain, 1991) Successful performance evaluation may not be achieved with merely executing common benchmarking tools. This talk presents fundamental principles of performance evaluation and how you can put them into practice. Do you understand what exactly "pgbench" does? Is it appropriate workload for your performance evaluation goal? Common benchmarking tools like "pgbench" are handy for just comparing system A and system B, but if you intend to deeply understand the performance of your system, answers to these questions are critical. In order to conduct a meaningful performance evaluation, the methodology should be elegantly designed to meet the goal of the evaluation: choose metrics for the goal, and choose observation techniques for the metrics. Each step requires careful consideration and deep knowledge about the target system. It cannot be done mechanically. This is why performance evaluation is an art. This talk presents principles of designing performance evaluations and shows how you can put them into practice by introducing the speaker's experiences of performance evaluations with PostgreSQL.
Transcript: English (auto-generated)
So today I'm going to talk about the art of performance evaluation. I'm a university researcher at the University of Tokyo, doing database and storage system research. This talk is going to be a little bit conceptual and opinionated, so I hope you enjoy it, and I'm happy if I can hear your opinions on this topic. So let's get started with this question: what is the role of information technology?
My answer is that the role of information technology is transforming computing power into business or social value, or you can just say user value. This is a tabulating machine, the first automatic data processing machine in the world, and it was built for the national population census in the United States. Population is very fundamental information for a nation, and this tabulating machine converted machine power into valuable information.
And performance is the key criterion of this transformation efficiency. Computing power is not necessarily a machine: actually, humans are a kind of computing power, and the national population census was historically done by humans. The tabulating machine drastically improved the performance of data processing. So the evolution of information technology is actually the history of improving this transformation.
Today's technology trends are still basically about transforming computing power into user value. Big data is a very good example: it is an attempt to extract valuable information from a massive amount of data. The Internet of Things, or cyber-physical systems, are an attempt to capture the real world with many sensors and do something interesting with it. So today, data is the source of value,
and the performance of data processing is becoming more and more important. From a technical perspective, the database system is a very important foundation for transforming data processing power into user value. So database engineers are very important today, because they know how this works: their deep knowledge of database systems is directly connected to the performance of the system.
And the performance study of database systems is really end-to-end. You may have to know how disks work, how SSDs work, how processors work, how memory works, how networks work. And of course, you have to know how the application works and even how the business works. So it's really end-to-end, from the physics of hardware to the logic of the application. Every component technology is changing and growing very rapidly in this field of IT, and sometimes the architecture of computing systems also changes drastically. So learning every component technology is necessary but not sufficient. The important point is to acquire the way of learning how this transformation works: how a database system can work together with everything else and achieve performance. That is performance evaluation. Performance evaluation is the key process to understand the performance of systems. And the important point is that performance evaluation must be user-value oriented. So every performance evaluation starts with setting a goal, a goal which contributes to the user's value. And then the evaluation procedure is designed toward that goal.
OK. When talking about performance evaluation, maybe many people think: performance evaluation? Oh, it's benchmarking. I can do it with UnixBench or pgbench or DBT-3. Yeah, that's all. You may recall some blog posts, "our fresh new NoSQL is faster than your old-fashioned classic SQL database," something like that. Frankly speaking, these are not performance evaluation. Yes, benchmarking is an important skill, and it's actually part of performance evaluation. But it is just a means of performance evaluation. There are many individual skills that are necessary for performance evaluation, but they are not its core. Performance evaluation is a goal-oriented process, and the philosophy of orchestrating the individual skills is the core part of performance evaluation.
Let's take an example story. If your boss says, "we need faster storage for our Postgres database," your goal might be to find the best price-per-performance SSDs. One possible approach is: list all SSDs available on the market, buy all SSDs available on the market, and try all SSDs with pgbench. Then you can find the best SSD. Good? This brute-force style makes sense if you have much, much money and much, much time. But if you have a philosophy of performance evaluation, you can do it smarter. First, characterize the workload and model the performance with some performance metrics. Then estimate the performance: if you have a model, you can estimate performance from the specs on data sheets. Then you can validate your model with measurements on several SSDs. At this point you actually measure the performance with actual SSDs, and that validation confirms the model. You now have a model with good precision. Now narrow down the candidate SSDs, and finally, confirm the detailed performance on the candidate SSDs. And you get the final result. The good point of this approach is that you do not have to try all SSDs on the market, and the acquired model can be used in future evaluations. Yeah, that's good.
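As a hedged illustration of the estimate-from-datasheets step (the numbers and the linear model are mine, not the speaker's): if the workload has been characterized as I/O-bound random reads, a first-cut estimate for each candidate drive is

```latex
\mathrm{TPS}_{\text{est}} \approx \frac{\mathrm{IOPS}_{\text{datasheet}}}{\text{I/Os per transaction}},
\qquad
\text{value} = \frac{\mathrm{TPS}_{\text{est}}}{\text{price}}
```

Measuring a handful of drives then tells you how far datasheet IOPS are from your workload's reality, which is exactly the validation step described above.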
Another important aspect of performance evaluation is that the goal can change. Because performance evaluation is the process of understanding performance, its progress may deepen the insight into the target system, and sometimes it identifies more important goals, so the goal will change. The key point is to keep considering what the important goal for your user's value is, and when the goal changes, orchestrate the individual skills toward the new goal. This is the philosophy of performance evaluation, and it enables you to do meaningful performance evaluation. This philosophy can be called the art of performance evaluation. There is a very famous book about performance evaluation, and it says performance evaluation is an art: it cannot be done mechanically, and it requires intimate knowledge of the target system and careful consideration in the selection of methods and tools.
So how can we develop our own philosophy? This is not easy, honestly speaking. The cornerstone of a philosophy is to experience a lot. But the important thing is the theory behind the experiences: theory forms the foundation of experiences, and experiences help deepen the understanding of the theory. Keep iterating this cycle; this is the way to develop your own philosophy.
As a database researcher, I have experienced various performance evaluations, so let me introduce some examples. This one is a measurement of the latency of the processor cache and memory system. In a NUMA architecture, each processor has local memory, and the processors are connected with an interconnect like QPI, so access to remote memory takes extra CPU cycles. Also, a modern processor has a multi-level cache, where the smaller cache is faster, like this. It is a little bit difficult to measure these latencies, but it is possible, and maybe this is an interesting graph for you.
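The measurement method itself is not shown in the talk, but a classic way to measure these latencies is a pointer-chasing microbenchmark. Below is a minimal sketch in C (my own illustration, not the speaker's code): it builds a random dependent-load chain, so the hardware prefetcher cannot hide the latency, and reports the average time per load. Varying the working-set size sweeps the cache levels; pinning memory to a remote NUMA node, e.g. with numactl --cpunodebind=0 --membind=1, exposes the interconnect latency.

```c
/* Minimal pointer-chasing latency sketch (illustrative only). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STEPS (1L << 24)

static unsigned long long rng = 88172645463325252ULL;
static size_t xrand(size_t bound)          /* xorshift64; good enough here */
{
    rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
    return (size_t)(rng % bound);
}

int main(void)
{
    size_t n = (64UL << 20) / sizeof(void *);     /* 64 MiB working set */
    void **cells = malloc(n * sizeof(void *));
    size_t *idx = malloc(n * sizeof(size_t));

    /* Random cyclic permutation: each load depends on the previous one. */
    for (size_t i = 0; i < n; i++) idx[i] = i;
    for (size_t i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
        size_t j = xrand(i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        cells[idx[i]] = &cells[idx[(i + 1) % n]];

    struct timespec t0, t1;
    void **p = &cells[idx[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long s = 0; s < STEPS; s++)
        p = (void **)*p;                          /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load latency: %.1f ns (sink=%p)\n", ns / STEPS, (void *)p);
    free(cells); free(idx);
    return 0;
}
```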
Another example is the case of a traditional disk array. This disk array is midrange-class enterprise storage with 160 hard disk drives. When I started my master's course, I did many measurements on this machine: for example, IOPS versus response time, like that. And a more interesting example is the performance
of PCI Express flash storage. PCI Express flash storage I/O latency is very low relative to existing storage devices, so in this case the latency of the processor interconnect is clearly visible as a portion of the I/O latency, and you can measure that. Another study is power consumption and energy saving of database systems; this was my master's thesis.
A modern processor has the functionality of changing its operating frequency at runtime, and by injecting application performance information into the frequency controller, aggressive energy saving can be achieved while maintaining some service levels.
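As a hedged sketch of the mechanism involved (not the controller from the thesis): on Linux, the operating frequency can be changed at runtime through the cpufreq sysfs interface, assuming the driver offers the "userspace" governor, which not all do.

```c
/* Pin cpu0 to a fixed frequency via cpufreq sysfs (requires root and
   the "userspace" governor; illustrative sketch only). */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s\n", value);
    return fclose(f);
}

int main(void)
{
    write_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                "userspace");
    /* Frequency in kHz; a real controller would pick this based on the
       application's measured performance versus its service level. */
    write_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed",
                "1200000");
    return 0;
}
```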
And experiments sometimes require a lot of hard work. This is a graph of power consumption fluctuation: even if the system is idle, the power consumption fluctuates due to many factors, like the temperature or the total power load in the building, and so on. So in order to measure power consumption precisely, I had to run experiments during a period when the power was very stable. This graph shows that the power consumption is relatively stable from 0 AM to 6 AM. So when I was doing this experiment, I showed up at the laboratory every midnight, started the experiment at 0 AM, finished at 6 AM, and went to bed when everyone else was waking up. This is just a side story.

In the remaining part of this talk, I present the fundamentals of performance evaluation: the principles and the basic techniques. This is just the basic theory, because I don't have much time; if you want to learn more detail, you can try this book. And in this talk, I'm going to explain the theory through my own practice and experiences.
OK, let's move on to principles. The principles are very simple. As I repeated, define a goal first: performance evaluation must be user-value oriented, so define a goal which contributes to user value. This is usually a very difficult process. The initial question of a performance problem tends to be very vague and usually subjective, and your job is to elaborate it and set a clear and objective goal. If you can set a goal, maybe you have already finished 50% or 60% of your job. The remaining process is to keep it goal-oriented and design the procedure toward the goal using appropriate techniques. OK, so let's move on to the basic techniques
of performance evaluation. There are three basic techniques of performance evaluation, and the first one is modeling. Modeling is expressing the performance in a quantitative form, usually a mathematical form. The real system is usually too complex to analyze in every single detail, so approximation is the key: understand the mechanism of the target system, find the important variables, and formulate the performance in a mathematical form. This is modeling, and it is the core of the scientific approach.

OK, let me give you an example. Maybe you know how a hard disk works: a hard disk consists of rotating platters and a seek head, a magnetic head that reads the data on the platters. The throughput of a hard disk drive can be modeled by an easy mathematical formula: it is simply how many bits are swept by the head in a second.
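A hedged reconstruction of that formula (the slide is not in the transcript, so the notation is mine): if B(r) is the number of bits on the track at radius r, then

```latex
\text{throughput}(r) = B(r) \cdot \frac{\mathrm{RPM}}{60} \quad [\text{bits/s}]
```

that is, the bits on one track swept past the head per rotation, times rotations per second.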
From this model, you know that the throughput is proportional to the density of the cylinder, and the density of the outermost cylinder is the largest, while inner cylinders have lower density. And you can confirm that with measurement. The measurement clearly depicts this: the outermost region of the platter has the largest throughput,
and the innermost region has lower throughput. Another example is the latency of a hard disk drive. The latency of a hard disk drive consists of seek latency and rotational latency. If you are lucky, the rotational latency is almost zero, but in the most unlucky case, you have to wait for the platter to go all the way around. So the latency can be expressed in this form.
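Again a hedged reconstruction in my own notation: for a seek of distance d,

```latex
t_{\text{access}} = t_{\text{seek}}(d) + t_{\text{rot}}, \qquad 0 \le t_{\text{rot}} \le \frac{60}{\mathrm{RPM}}
```

At 10,000 RPM the rotational term ranges from 0 to 6 ms, which is why the measured latency band below has exactly that width.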
And you can confirm the model by measurement. This graph shows the disk latency: the x-axis shows the seek distance and the y-axis shows the latency. As you can see, a larger seek distance increases the latency, and the width of the latency band is exactly equal to the rotational latency of this hard disk drive, which is 10k RPM.
Another popular modeling technique is the queuing model. A queuing model consists of a server and a waiting queue, and its behavior is described by the arrival rate of customers and the average service time. When you consider CPU performance, you can model the CPU this way: the customers are CPU instructions and the server is an execution unit in the CPU. When you consider a storage system, the customers are I/O requests and the server is the storage controller or the individual disks. And when you consider a database system, the customer is a query and the server is a query executor process. The simplest and most useful model is M/M/1.
I won't explain it in too much detail, but it consists of one server and one infinite queue, with some conditions on the arrival rate and the service-time distribution. If you do the simple math, the average response time can be calculated with a simple formula, and it is not so difficult to extend this to multiple servers.
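The standard M/M/1 result, written in my notation since the slide formula is not in the transcript: with arrival rate \(\lambda\), service rate \(\mu\), and utilization \(\rho = \lambda/\mu < 1\), the average response time is

```latex
\bar{R} = \frac{1}{\mu - \lambda} = \frac{1/\mu}{1 - \rho}
```

Note how \(\bar{R}\) blows up as \(\rho\) approaches 1; this is the hockey-stick shape seen in most response-time-versus-throughput plots.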
Using this model, we can do some interesting analysis. Here, I present a performance model of an OLTP system with a variable processor operating frequency. As the frequency goes down, the throughput decreases and the response time increases. So you can model the response time with a multi-server
queuing model, and by parameterizing the processor frequency, I can formulate the average response time in this way.
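One plausible parameterization (my own sketch, not necessarily the thesis formulation): if the service rate scales roughly linearly with frequency f relative to a baseline f_0, so that \(\mu(f) = \mu_0 f / f_0\), the M/M/1 approximation gives

```latex
\bar{R}(f) = \frac{1}{\mu_0 \frac{f}{f_0} - \lambda}
```

which captures the qualitative trend: lowering f shrinks the service rate and pushes the response time up at any given load.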
This graph shows the modeled average response time: the y-axis is the response time and the x-axis is the throughput of the system. And this is the measurement result. As you can see, the model clearly matches the trend of the actual measurement. So even with this simple model, you can evaluate the performance impact of frequency, and this measurement was the basis of the application-performance-aware energy saving I explained on the earlier slide.

OK. Modeling is a very important step in understanding a system. A good model provides you with a good view and enables you to predict performance and scaling trends. It also helps you notice incorrect measured values caused by implementation bugs, because the model tells you the expected trend of the performance.
But systems are usually very complex, and sometimes modeling results in a bad model. This is not bad news: a bad model is also informative, because it indicates that you are missing something important in the modeling or the measurements, so you can try to improve your model. It may also indicate that the system is too complex; in that case, this is a sign that you should change your approach toward the goal. OK. The next technique is measurement.
When doing performance evaluation, many people tend to do measurement first. But measurement without modeling is totally pointless, because you cannot understand the result. If you have a model, you can validate the result against the model, so you can understand: oh, this is working. And if there is a difference between the measurement and the model, you can notice that something is wrong in the model or the measurement. So do not start with measurement; start with modeling. And here is a guide for meaningful measurement. Measurement is not just running a benchmarking tool; there are many things to be considered:
the workload and metrics, the measurement environment, and the measurement methods and tools. Then conduct the measurement and analyze the results. Here is a very simplified example of the story. Assume that I want a storage system that can support some transaction throughput in TPS, and the workload is random I/O from many clients. In this case, the metric is the IOPS of the storage system. Before measurement, model the performance: in this case, the I/O size and the concurrency of I/O might be important for performance, and the I/O queue depth at various layers in the system might also be important.
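A hedged back-of-envelope illustration of that modeling step (the numbers are mine, not the speaker's): if the goal is 1,000 TPS and each transaction issues on average 8 random I/Os that miss the buffer cache, the storage system must sustain roughly

```latex
1000\ \tfrac{\text{tx}}{\text{s}} \times 8\ \tfrac{\text{I/O}}{\text{tx}} = 8000\ \text{IOPS}
```

Knowing such a target up front tells you which region of the parameter space the measurements have to explore.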
And carefully design the measurement environment before conducting the measurement. When you want to measure the IOPS of the storage system, ensure that the I/O requests really reach the storage system and are not served from the main-memory cache. Performance measurement is monitoring the flow of resources in the system, so designing the environment is designing the flow of resources in the system. And then pick appropriate methods and tools. I repeat that performance evaluation
is a very goal-oriented process: do not start with your familiar tool, but pick the appropriate tool for the goal. And be careful about which metric is really being measured. For example, iostat is a very famous tool for measuring I/O performance. It can measure the number of I/O requests issued by the operating system, but it cannot measure the number of I/O requests actually issued to the individual disks behind the storage controller, or the number of read/write system calls made by the application.
So be careful about what you are really watching. OK, we are all set, so run the measurement. In a simple case, you can run the measurement with exhaustive parameter values, but when you are doing measurements with multiple variables, that may be unrealistic. So plan your experiments with minimal steps; if you have a good model, the model will tell you where you should measure next. And after the measurement, do not forget to analyze the results. If your results match the model, that is very good: it means you have a correct understanding of your system. But if the results do not match your model, something is wrong.
So keep trying to improve the model or the measurement method. Let's move on to simulation. If you do not have the target system, or if it does not exist in the real world yet, simulation is a very effective way to evaluate performance. Simulation is also used for detailed behavior analysis. Modeling every single detail of a real system is usually difficult, so modeling is done with some approximation; but simulation can go into much more detail. In terms of database systems, one interesting simulation technique is I/O replay. In I/O replay, you first trace the I/O pattern during the actual execution of the workload, and then replay that I/O pattern on hypothetical devices. In this way, you can evaluate the performance of various devices with a realistic workload.
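In practice, a tool such as fio can replay a captured trace (its read_iolog option), and blktrace can capture one. As a hedged sketch of the replay half, here is a minimal C illustration (my own, not the speaker's tool) that re-issues "offset length" pairs from a trace file against a target device:

```c
/* Replay "offset length" read requests against a device, bypassing the
   page cache with O_DIRECT so requests really reach the device.
   Assumes offsets/lengths in the trace are 4 KiB-aligned. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE (1 << 20)

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <device> <trace-file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    FILE *trace = fopen(argv[2], "r");
    if (fd < 0 || !trace) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, BUF_SIZE))  /* O_DIRECT needs alignment */
        return 1;

    unsigned long long off, len;
    while (fscanf(trace, "%llu %llu", &off, &len) == 2) {
        if (len > BUF_SIZE) len = BUF_SIZE;
        if (pread(fd, buf, len, (off_t)off) < 0)
            perror("pread");
    }
    fclose(trace);
    close(fd);
    return 0;
}
```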
This graph shows the results of I/O replay on various devices, with the TPC-C benchmark in this case: replaying the I/O workload on various hypothetical systems like a hard disk, a SATA SSD, and a PCIe SSD. You can observe how the I/O is handled in each system in great detail. Simulation is a kind of what-if analysis based on some assumptions, and an unrealistic assumption can easily lead to nonsense results, so be careful about assumptions. Simulation without the other techniques is a bit dangerous: keep in mind that assumptions must be validated with other techniques
like modeling or measurement. OK, now I have covered the three basic techniques: modeling, measurement, and simulation. Each technique is not so useful by itself; in order to deeply understand a system's performance, the key is using multiple techniques and validating them against each other.

So I'm wrapping up this talk. Maybe I have spoken a little bit faster than I expected. In this talk, I talked about the art of performance evaluation. The art of performance evaluation is a very fundamental ability, or capacity, of a database engineer, because it is the key process for understanding how a system works and how a system achieves performance, end to end. Because performance evaluation should contribute to user value, it is a very goal-oriented process, and the procedure of a performance evaluation should be designed with the basic techniques: modeling, measurement, and simulation. This is the basic theory of performance evaluation, and based on this theory, your experiences may develop into your own philosophy of performance evaluation. OK, that's all for my talk. Thank you for listening.
Do you have any questions? What do you think is the best way to go about presenting? Could you explain? Presenting?
When you have proven your model to yourself, you establish your model, you test it yourself, you prove it to yourself, but most of that information is very technically based, while the audience may be business people. Do you have any tips for how to convince someone in that position who is not necessarily technically savvy? That's a very important problem. On that point, I'm not so experienced,
because I'm a university guy, so maybe all of you are more experienced in communicating with business people. But the important point is simplifying things as much as possible. If it is possible, try to calculate the actual economic impact of introducing the new storage system, or something like that; keep it simple. Try to translate the technical jargon into the business person's language. Does that answer your question? Also, in my experience, if you get it proven and you get the results there, so that somebody else technical can also verify it,
take that and keep it. And then you rephrase it in terms that are actually useful to the business unit, such as, like he was suggesting, the economic impact, the impact on users, or, importantly, disaster recovery scenarios and business continuity. Phrase it in those kinds of terms, because business people, in their day to day, don't understand percentiles and percentages and stuff like that. So talk about how it impacts either the bottom line or the business continuity or whatever, and then have it backed up by saying, we've done the research and this is the best way to go. So it sounds like it's also more of an art. It is an art. It is a conversation. And yeah, don't overload them with technical jargon. And sometimes it helps if you can say, Google is doing it.
Performance has inherent response-time constraints in its nature. The human brain gets distracted when a simple web request takes longer than two seconds: your brain starts looking for alternative things it can do. So if your computer's response time goes beyond certain thresholds, it is no longer tolerable, because it impacts the human's performance. And a lot of business people don't see that. They see a cost-benefit trade-off between response times and how much that costs in hardware, but how much productivity is lost when the response time exceeds certain thresholds is important to factor into the cost-benefit analysis too, as one example. Any questions?
No questions? OK. OK, so thank you so much.