
The guide to measure what matters.


Formal Metadata

Title
The guide to measure what matters.
Alternative Title
Knowledge is power! The guide to measure what matters.
Title of Series
Number of Parts
163
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content, also in adapted form, is shared only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
How do you monitor the key performance indicators of your application? Do you know if signups are decreasing versus last week? Have you adopted agile principles but have a hard time monitoring the improvements of your continuous deployments? In this talk we will briefly discuss multiple measuring solutions before diving into the nitty-gritty details of measuring with the help of StatsD. We will implement a few counters and timers and graph these so we can start to make sense of the data. Then we will use powerful functions to analyse the data and spot trends before your users do. After this talk you will be empowered to create your own metrics with the help of StatsD and have basic knowledge of how to plot these metrics into meaningful graphs. Be empowered! Code examples will be in C#, but the technology demonstrated is not limited to it.
Transcript (English, auto-generated)
Okay, welcome everybody. I would like to thank you all for joining me during one of the last sessions of NDC 2015. I'm really glad you're all here. My name is Marc van Straaten and I live in The Hague, the Netherlands. This talk is intended for engineers who have never, or barely, used any measuring solutions.
I will be talking about how to instrument your app with metrics, and how that helps you stay agile and make more and better-informed decisions. I work at a project agency called Q42; we are 60 people, of whom 55 are engineers.
And this means that we were doing DevOps before it was called that way. And all these engineers have the same passion. We all like to create exciting new stuff, which leaves a lasting impression.
And you might have heard about us from one of our more globally visible projects. For instance, the British Design Museum, which we created, built and host on the Azure cloud, and which won a Webby Award this year.
A little older, but still very relevant, is Philips Hue. We are all very into the Internet of Things nowadays, and we teamed up with the hardware specialists of Philips to create the first truly internet-connected light bulbs. And we built the complete server infrastructure for this project on Google Cloud.
It is large-scale Internet of Things in production, because every day hundreds of thousands of light bulbs are turned on and off via mobile phones, iPads or laptops. Or the light switch, of course, which is still available.
As you can see, we at Q42 develop custom one-off solutions, and the excitement of creating something new is what keeps us going. This, and the fact that we are all engineers, has big implications for how we manage our company and how we focus our development efforts.
As such, we are always on the lookout to optimize our work process and to stay agile. One of the big questions we keep asking ourselves is: how do we make maintenance cheap, fast and fun? We are primarily software engineers, and we like to focus on developing software.
We do not like to focus on updating servers with security patches, machine reboots, removing old log files, or fixing stuff after a Patch Tuesday. Those, for us, are big time, energy and money drains.
Luckily, there has been this shift away from traditional IT, and we now use more and more platform-as-a-service. This was a big game changer for us. It was really good because it lowered our maintenance costs: we had little to no server upkeep. But it also had drawbacks, because our original tools were obsolete and needed replacement.
And in case of fire, we had no access anymore: we could not remote desktop into a machine, poke into the log files, and see what was going on. We were not in control anymore; we had less control. So I am going to talk about the replacement of one of the tools in our tool chain for measuring and monitoring.
It is called StatsD. We will talk a little bit about why I think you all should use it, how to use it and what to measure.
I would like to know a little bit more about you. Who of you is already using StatsD? You can raise your hand. Great, I have exactly the right audience in front of me. Is anybody already using a different measuring solution, for instance Application Insights or Kibana?
I see a few more hands. That is nice; then in half an hour you can see the differences and the overlap between those solutions. For me, this started at the end of 2013, so it is a little while back.
I was the lead developer of the website denhaag.com, which is the portal for tourists who are planning to visit, or are currently visiting, The Hague. It contains all the events happening in our sparkling city and its suburbs, and it was built using Scrum on the Azure platform.
It consists of multiple websites: we have data suppliers which provide us with the events, a content management system for enriching these events and making custom pages, the website itself, and an API for our business partners so that they can display these events on their own websites or mobile applications.
During development, we had a really big urge to validate whether the new features we were building were adding enough value. Because the customer is a government agency, the budget is really moderate, so we need to be really focused on that.
We also had a big need to gain insight into the state of the various business processes running in the background of this website. For instance, the time it takes for content to be pushed by our vendors and become visible on our website, because we have a big pipeline: the data needs to be transformed and translated before it becomes visible.
Or the number of views we serve in the correct translation, because it's a tourist portal and we want to supply the tourists with the best information possible. Or the freshness of our parking information, because we show the number of parking spots available when you want to go to the beach.
We found out that Google Analytics is a great tool for websites, web analytics and e-commerce, but not for custom metrics like this. And Application Insights did not exist in 2013, and still is not on par with StatsD when you want large amounts of custom metrics.
So we started using StatsD, and we are still using it to answer these questions and more. You might wonder why we chose StatsD. It's really easy to learn; that's a very important part.
It's simple to implement. And because we are a DevOps-oriented organization, it gave my fellow engineers and me a lot of power: we just define measurements together with the customer, write them in code, and it's done. It can be hosted on-premise if required; I would certainly recommend that if you're in production.
But you can also have a hosted solution. And it's free and open source. StatsD does one thing and does it really well, because in essence it's just a metrics aggregator: a Node.js daemon service to which you submit your measurements. It buffers those measurements for a certain amount of time, aggregates the values you have submitted, and then, once the flush interval has elapsed, ships the aggregated values to a configurable backend service or storage system.
It was originally written in 2008 by Cal Henderson, a name you might know because he's now the co-founder of Slack. In 2011 there was a big rewrite to Node.js by Etsy, and since then it has become really popular, because Etsy is all about measuring everything they do.
They are really fond of it, speaking and blogging a lot about it, and being really open about what they do. StatsD does not dictate what you need to measure, because it's not an analytics tool with a specific target, like New Relic or Google Analytics.
It's up to you to determine what you want to measure. You can measure your deployments. You can measure the amount of coffee brewed at the office, if you can find a correlation. You can measure the revenue of your product, or application crashes.
As an engineer you are in control. There is no overhead. Simply send the metric that you need and it will be measured. In our office, for instance, we are actively measuring when people are entering the building and leaving the building.
Using StatsD, the data comes from within your application instead of from the outside layer, and this gives you many more options to supply information, because Google Analytics, for instance, can only monitor what is visible to the users, not what's happening in the background.
StatsD chooses speed over consistency: the measurements from your application are transmitted, by default, as UDP packets. And this is really fast, because UDP, versus TCP, has no handshakes and no acknowledgements; it's literally fire and forget.
Due to these decisions it's really lightweight, and it decouples your app from the instrumentation: if StatsD for whatever reason does not answer, your main application will not be disrupted.
And this is an important feature. Because it's sending these UDP packets, it's really fast, so your users shouldn't have to wait while you measure everything you need to measure. And because it's so fast, you are not limited in the number of things you measure; you can measure everything you want.
It's also fast in the sense that the aggregation buffer defaults to 10 seconds: after 10 seconds, the values are submitted to the backend and are ready to be displayed on dashboards, so you're seeing near real-time values. When we say StatsD, we are actually talking about a family of tools which all work together.
This graph displays the default stack you will encounter when searching for StatsD. Your application submits metrics via UDP to StatsD, which buffers and aggregates the values.
It then submits them to the default backend, which is Carbon. Carbon stores them in Whisper, a specialized file-based time-series database, which is then used by Graphite for composing your graphs and rendering them, using functions for better display.
And because it's also modular, you can swap out any parts you don't like. For instance, if you are running in production, I would recommend swapping out the Whisper database for InfluxDB, because it has a much higher write performance to disk.
It's also possible that it's not your application itself writing to StatsD, but that you're using StatsD in combination with the Kibana stack, so that Logstash parses your log messages and emits measurements from the log lines to StatsD.
Then you have log information in Kibana and separate measurements in StatsD. And it is really easy to send measurements to StatsD.
That's because it uses UDP and a really trivial format for sending these values. The specification is simple: it's just a key with a value and a type. I will go deeper into these types later.
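To illustrate, here is what the plain-text UDP datagrams look like, following Etsy's published message specification of `<key>:<value>|<type>` with an optional `|@<sample rate>` suffix. The key names are made up, and the `#` annotations are mine, not part of the protocol:

```
site.views:1|c            # counter: increment site.views by 1
site.views:1|c|@0.1       # counter sampled at 10% of requests
render.time:320|ms        # timer: one observation of 320 ms
parking.spots:182|g       # gauge: set the current value to 182
parking.spots:-3|g        # gauge: delta, subtract 3
```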
There are lots of client libraries for StatsD, because while the protocol is really simple, not everybody wants to write their own implementation in their own language. In C#, for instance, we have a StatsD client which you can download using NuGet.
It helps you with some language-specific constructs. In this example, I'm showing that it gives you a timer via a using block, so you can time the actions performed within the using; this helps you as an engineer to really easily time parts of your code.
Or, for instance, just a counter. It's that simple: you add one line of code and you have your own measurement in place somewhere in your system. These libraries exist for many languages; I've named a few: C#, Scala, Java, Ruby, Python. They are there.
And if you cannot find anything that suits you, or something is missing features you require, it's not really hard to write your own, because it's just a protocol of UDP messages, and the specification of those messages is available on Etsy's GitHub profile.
We have different types of measurements that you can emit from your system. The first and most basic one is a counter. When you send counters, they end up in the system as two different values:
you get the actual count you emitted, aggregated in your system, and you get the rate of the messages you emitted, so you can use both in your graphs. You can answer questions like the number of views in the correct language, or the number of requests your website is getting.
Or image-scaling requests, if you are implementing your own CDN, or user logins which fail or succeed. These are typical use cases which you can cover with a simple counter implemented in your own code. If you render it, it might look something like this.
Something easily seen in this image is that the lines are disconnected, and this means something really important: in the time that there was no line, there were no values. It's a null value.
You could alter how your graph is displayed, but this is really up to your interpretation of the values, because an absence of a value might also signal that something is wrong; it might signal, for instance, that your server is offline.
In that case, it might make more sense to render these as a bar chart, so that you can see that a null value is just zero. The second most important type is a timer. The timer is not 100% adequately named, because it's not purely used for timings.
The most obvious use case is of course timing parts of your code, but it can also be used for things like the total revenue of a shopping basket. That might not make sense yet, but when you use a timer, what happens in the background is that
StatsD calculates the mean, the sum and the count of your measurements, but also the upper and lower bounds and percentiles. By default it calculates the 95th percentile of your measurements, and this helps you by removing the really extreme values,
giving you a more realistic average of the value being emitted. So you can use this for the shopping basket, or for the total messages sent by your customers, if you're implementing a Twilio server or something like that.
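The per-flush aggregation described above can be sketched as follows. This is a simplified re-implementation in Python, not StatsD's actual code; real StatsD computes a few more statistics, and its percentile threshold is configurable.

```python
def aggregate_timer(samples, pct=95):
    """Sketch of StatsD's per-flush timer aggregation (simplified)."""
    xs = sorted(samples)
    n = len(xs)
    # Keep the fastest pct% of samples to compute the percentile stats.
    keep = round(n * pct / 100)
    trimmed = xs[:keep] if keep else xs[:1]
    return {
        "count": n,
        "sum": sum(xs),
        "mean": sum(xs) / n,
        "lower": xs[0],
        "upper": xs[-1],
        f"upper_{pct}": trimmed[-1],
        f"mean_{pct}": sum(trimmed) / len(trimmed),
    }

stats = aggregate_timer(
    [12, 14, 11, 13, 15, 12, 14, 13, 12, 11,
     13, 12, 14, 15, 11, 12, 13, 14, 12, 320]
)
# The single 320 ms outlier inflates "mean" (~28 ms) but not
# "mean_95" (~12.8 ms), which is the point the talk makes.
```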
Or the duration of page requests, et cetera. And the graph will look the same as for a counter, because it's the same kind of representation. There is a big difference, though, with the third type, which is a gauge: a gauge retains its previous value if no new values are submitted.
So the lines will not get disconnected. This is really handy if you have something like a total-documents metric: when we create a new document, we don't want to recount all the documents; we just increment the gauge.
And that is something a gauge supports: you can set the current value, or you can submit delta values to increment or decrement it. In the demo on the slide it's just setting a value, but you can also say plus X or minus X.
It will then be subtracted from or added to the value currently emitted to the backend. Rendered, it might look something like this: the current value keeps being emitted until a new value is received,
and it is then immediately set to the new value. I talked a little bit about how you as a developer can use this and how easy it looks in the network protocol. And I also told you that there is a whole stack of components you need to get this up and running.
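The set-versus-delta behavior of a gauge described above can be sketched like this (a simplified Python illustration, not StatsD's actual code): a leading `+` or `-` in the datagram value means a relative change, anything else replaces the value. This is also why, in real StatsD, setting a gauge to a negative value requires setting it to 0 first.

```python
def apply_gauge(current, raw_value):
    """Apply one gauge datagram value to the current gauge reading.

    A leading '+' or '-' means a relative change; anything else
    replaces the value outright.
    """
    if raw_value.startswith(("+", "-")):
        return current + float(raw_value)  # delta: adjust previous value
    return float(raw_value)                # absolute: replace previous value

# Set to 10, add 5, subtract 3, then set to 7.
g = 0.0
for v in ["10", "+5", "-3", "7"]:
    g = apply_gauge(g, v)
# g ends at 7.0
```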
And that might seem like a burden to get started with this technology, but it's actually quite easy, because there are multiple solutions to get up and running really fast. For instance, there are Docker containers containing all the required components, and Vagrant scripts to get your box initialized with all the components.
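As one concrete way to do this (my suggestion, not from the talk): the community-maintained `graphiteapp/graphite-statsd` Docker image bundles StatsD, Carbon, Whisper and Graphite in a single container; the port mapping below follows its commonly documented layout.

```shell
# Graphite web UI on 80, Carbon plaintext receiver on 2003,
# StatsD metrics on 8125/udp, StatsD admin interface on 8126.
docker run -d --name graphite \
  -p 80:80 -p 2003:2003 -p 8125:8125/udp -p 8126:8126 \
  graphiteapp/graphite-statsd
```

After it starts, UDP datagrams sent to localhost:8125 should show up in the Graphite UI within the flush interval.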
And there are also very good hosted solutions, where you can just sign up and be up and running in five minutes. One really important thing to say, of course: if you choose to use a hosted solution,
which I would not recommend for production, do not emit sensitive data as keys, because these are plain UDP packets over the network; Troy Hunt would kill me. Now that we are familiar with the basic setup of StatsD, we are going to talk about the hardest part of StatsD.
It's almost as Phil Karlton said: there are only two hard things in computer science, cache invalidation and naming things. And when it comes to naming your measurements, that's quite true;
it's quite hard to do properly. So we start with the basics: know what you want to measure. You as an engineer, know what you want to measure.
Because you are developing new features, you know exactly what your code is going to do. You know what is interesting. And always keep talking to your customers. Because they have a business they are running and they have business needs. They know what kind of metrics they would really like to have. And if you are doing agile development, make your user stories measurable if possible.
After this story is deployed, we should be able to see an increase of users. Longer retention or more product engagement. These are really big key performance indicators for your software.
Make your metrics really actionable. You could measure things like lines of code, the number of commits, or downloads of your latest PDF. These are really nice to see, but they are nothing more.
They are called vanity metrics: you cannot really correlate why they have the value they have, and they are easily faked or manipulated. Actionable metrics, on the other hand, are metrics that can trigger you to do something.
These might be, for instance, the maximum CPU load on your server, the number of bad logins on your site, or the number of errors in your application; or the errors right after a deployment, because there might be a really big correlation between those two. Or the errors that are gone after a deployment, which might also be a big correlation.
All these actionable metrics help you build business intelligence and prevent incidents. They can help you do a root-cause analysis when problems arise. And in general, all these actionable metrics help you improve your product.
Like I said, a measurement is a key with a value and a type. And it's really important to know that keys can contain path separators. And in this case, it might be obvious, but the dot is a path separator.
This makes it possible to create groupings of your metrics within parent scopes, like a tree. And this also makes it later on possible to have wildcard matches on parts of your keys when you are creating graphs. But it can also lead to madness.
So I'll show you a few of my personal failures, because I failed and I have learned, and that is important to share. On the left side is the default graph explorer from Graphite, the tree view it renders from the metrics you are emitting.
Graphite reads from your database which values are available. On the left side, I had created timers for every page on my site by converting the slashes to dots. It seemed really good at the time, but it actually makes the measurements impossible to use,
because I have measurements at all different levels of the tree. Because of that, I cannot do any wildcard selection, since the query language will not understand it. And it's not actionable, because I'm just measuring something without being able to see what's going wrong, why, or how.
It was a disaster; I had to throw it away. So I tried again on a different occasion. This time I wanted to track the usage of the site per user, so I created a per-user key, containing a unique key for the user, at a very high level in the key hierarchy.
What I ended up with was a very long list, four or five pages long, of users. And if I had taken the time to explore all my keys, I would have
found that some users interact with the site a lot and some almost never. This made it really hard to find out what was going on. Also, because all your keys are stored on disk, this really increased the storage size and made rendering graphs almost impossible, due to the sheer number of files which need to be gathered,
data points opened and aggregated, and results returned and rendered. So that was also not usable. Both attempts ended in chaos and useless metrics. And what I have learned from this, and what I want to pass on to you, is: use a convention, something like this.
So currently I'm employing this convention, and it's serving me pretty well: app, then my namespace (which can be several segments deep), then my target, then an action. In this, I like to have a noun as the target and a verb as the action, because this helps me get actionable metrics.
So, for instance, I have the translations which are found as a key.
And of course, sometimes there are parts of a metric which you would like to be dynamic. For instance, you would like to know which language was missing in the translations. If that's the case, try to keep the dynamic part of your key as deep as possible, and on the same level.
This will let you drill down into the measurements and, in the end, use a wildcard selection to aggregate these values back into the original totals and dig deeper if needed.
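The convention above can be captured in a tiny helper. This is a hypothetical Python sketch; the namespace and key names are invented for illustration.

```python
def metric_key(namespace, target, action, dynamic=None):
    """Build 'app.<namespace>.<target>.<action>[.<dynamic>]' keys.

    Keep any dynamic part as the deepest segment, so wildcard
    queries like 'app.site.translations.missed.*' can aggregate
    all languages back into one series.
    """
    parts = ["app", namespace, target, action]
    if dynamic is not None:
        # Dots are path separators in StatsD keys, so a dynamic
        # value must not introduce accidental extra tree levels.
        parts.append(str(dynamic).replace(".", "_"))
    return ".".join(parts)

metric_key("site", "translations", "missed", "nl")
# -> "app.site.translations.missed.nl"
```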
And also, if you are creating dynamic parts, ask yourself: is this dynamic part actually usable? We have lots of one-time events; would they actually be usable as a dynamic part?
Or am I just going to put this on the dashboard as an aggregated value anyway? It might not be as usable as you thought beforehand. We've talked a lot about how to gather data, what kinds of data we can gather, and how to measure it.
But until now I've not really talked about the rendering of the graphs themselves. And data by itself has no value to anyone, except maybe the NSA. So I think it's a good time to start talking about how to render these measurements into graphs and put them on dashboards.
The default stack comes with Graphite, but I would like to suggest that you all use an additional tool, called Grafana. As you can see in this diagram, we still use Graphite, because Graphite is really good at what it does:
it's a graph composer with really strong functionality for transforming values and rendering graphs. But we put Grafana in front of it, and that is what we interact with, because Grafana is primarily a dashboard-centric application rather than a graph composer.
And it's really visually pleasing, which helps you get faster adoption within your company; you can sell this to your boss or to your customer. Grafana has built-in access control, so you can restrict who can view the graphs.
And the dashboards are shareable, and you can also share separate graphs. So if you have an incident which is really intriguing, you can create a shareable resource for it and share only that part of the data.
You're not sharing your complete metric system; you're only sharing a five-minute time slice, for instance. And by putting Grafana in front, you can firewall your Graphite composer to decrease security risks.
And Grafana looks really visually pleasing because it started out as a fork of Kibana, which is also why you might see so many similarities between the two. To please the demo gods, I put a really small animation on this slide so that we would have no issues.
And here you see in action that a measurement is split on the dots. You can select individual parts when you are composing a line of your graph, or you can use wildcard matching to get multiple lines, because you are matching multiple measurements.
And we can use functional composition to enhance these metrics, these measurements. In this example, I'm using a sum series to aggregate multiple lines back together to one line.
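As a sketch in Graphite's target syntax (the series names are hypothetical), such a functional composition might look like:

```
# Three per-instance series merged back into one line with sumSeries:
sumSeries(myapp.instance-*.requests.count)

# Functions nest, so you can keep composing transformations:
alias(sumSeries(myapp.instance-*.requests.count), "total requests")
```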
So by doing this, we can modify how our data will be displayed to ourselves or to others who will be using these dashboards. And you get multiple ways of visualizing this data: you can have regular line charts, but you can also have bar charts,
which, as I said at the beginning of the talk, might make more sense when having no values is a good thing. But lines and bars by themselves might not be enough to make sense of the data.
So for making correlations, there are additional functions within Grafana and Graphite which you can use. For instance, you could use the timeShift function and transpose the values of yesterday over the values of today,
so that you can determine whether there is a change for the better or for the worse between the two. And in this graph, you can see that yesterday we had some interesting spikes, while today it's really quiet, really average.
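A hedged example of that overlay, using Graphite's timeShift and alias functions on a made-up series name:

```
# Today's values as-is:
alias(myapp.requests.count, "today")

# The same series shifted back one day, transposed over today for comparison:
alias(timeShift(myapp.requests.count, "1d"), "yesterday")
```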
Another big feature for correlating behavior is annotations.
Because some actions have implications for how your graphs will show up. For instance, a deployment of your code might result in large
amounts of errors, in instances spinning up, or in more data usage. It can cause all kinds of problems, and it can also solve all kinds of problems. So it would be really nice if you could make the correlation between the two, and with Grafana this is easily done. You can create annotations, which is
the red line in this image, from ordinary measurements which you have made yourself. And the reasoning behind it is really simple: if a value exists for the given measurement, render a vertical bar; otherwise render nothing.
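In Graphite's target syntax, rendering a vertical bar wherever a datapoint exists is typically done with the drawAsInfinite function; a sketch with a hypothetical deploy measurement:

```
# Every datapoint in myapp.deploys becomes a full-height vertical line,
# which Grafana can then use as an annotation source:
drawAsInfinite(myapp.deploys)
```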
When you are using StatsD in combination with Grafana and you have determined these measurements for yourself, you might end up with something like this. For NACH.com, this is what we ended up with.
It gave us insight into the requests, into the updates and deletes our vendors are sending us, into the processing of the background processes, and into our translation concerns, and this helped us greatly to decide what to do with our application from the inside out.
And to emphasize: this is not necessarily what it will end up looking like for you, because everybody's product is different. Yours is unique from mine. These graphs are what matters for us now, but they are evolving together with our product, so they might look different in four weeks if we shift the focus of the product.
So this is evolving. I have time for a very small demo if the gods allow it.
Here I have a File, New Web Application in Visual Studio, so it's very simple, and I have installed the StatsD client in it. In the initializer of the application, we configure it as I showed earlier.
And I'm creating an instance-started event, to simulate in the demo what it would look like if you were spinning up multiple instances, or if a deployment had been done, so that we can have an annotation for it later. And I'm starting a timer at the beginning of the request which will be submitted at the end of the request.
And I have instrumented my account controller: in the login functionality there is a counter for when the login succeeds and a counter for when the login fails, so that later on I can see if something fishy is going on.
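The demo itself is in C#, but the StatsD wire protocol is language-agnostic: each measurement is a small text datagram sent over UDP. A minimal Python sketch of the same idea (the client class and bucket names are made up for illustration; real clients, like the C# one in the demo, offer the same operations):

```python
import socket
import time

def format_counter(bucket, value=1):
    # Counter datagram, e.g. "users.login.failed:1|c"
    return f"{bucket}:{value}|c"

def format_timer(bucket, millis):
    # Timer datagram in milliseconds, e.g. "request.time:320|ms"
    return f"{bucket}:{millis}|ms"

class StatsdClient:
    """Hypothetical fire-and-forget client for a StatsD daemon."""

    def __init__(self, host="127.0.0.1", port=8125):
        self.addr = (host, port)
        # UDP never blocks the application: if the daemon is down,
        # the datagrams are simply lost and the app keeps running.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def increment(self, bucket):
        self.sock.sendto(format_counter(bucket).encode("ascii"), self.addr)

    def timing(self, bucket, millis):
        self.sock.sendto(format_timer(bucket, millis).encode("ascii"), self.addr)

# Usage mirroring the demo: a counter per login outcome, a timer per request.
client = StatsdClient()
start = time.monotonic()
client.increment("ndcdemo.users.login.failed")
client.timing("ndcdemo.request.time", int((time.monotonic() - start) * 1000))
```

Because the sends are plain UDP datagrams, this instrumentation cannot hamper the production code path even when the daemon is unreachable.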
For instance, if the amount of failed logins is increasing while the amount of successful logins stays stable, that might indicate that somebody is trying to get in. So if I load up this application, it performs like a normal application.
And we have the login functionality which still is working as it's supposed to be.
So in this case, I'm not getting in. But in the background, my measurements are being submitted to the StatsD daemon which I'm running in my Vagrant machine. And I've got Grafana open on the other tab.
And in this one, I have created two different graphs of the last five or fifteen minutes. What you can see here is that our application started up a few minutes ago and I created a few responses. The timings were not that good, because my laptop is a little bit slow and it was still booting up while I was talking.
And we see that the amount of failed login attempts was steadily increasing. To give you an idea of how simple it is to manipulate these graphs, I can just edit this one. And here you see exactly the same values that we had in the demo on the slides.
And I can just manipulate this. And I can also make a secondary query by selecting parts of the key. So NDC demo, users, login succeeded.
And what you now see is that the keys which I have used are visible in the legend, which is not really visually pleasing. So I can append an alias and just say "failure", and now my legend looks really nice.
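The alias trick from the demo, sketched as a Graphite target (the key is hypothetical):

```
# Raw target: the full key shows up in the legend:
ndcdemo.users.login.failed.count

# Wrapped in alias(), the legend reads "failure" instead:
alias(ndcdemo.users.login.failed.count, "failure")
```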
So this is just to give you a really small demo of how fast these measurements are getting into the system, because it's near real time. And how easy it is to render one of these graphs on a Grafana dashboard.
So to have some takeaways from this session, you as an engineer know what to measure. So it's your job. You are empowered to do this. I've shown you the tools. They are easy. You can easily do this. And DevOps is real. So if you're not doing this already, it's time for you as an engineer to instrument your code with these measurements.
Work together in your team, work with your customer, make it happen. Because this information is really valuable for everyone. StatsD will never break your production code. This is really important to emphasize, because you never want to have your additional processes hamper your production environment.
And you can be enlightened in five minutes. My demo was a little bit shorter than five minutes, but if I were to deploy my code to Azure within five minutes, I would already have some information.
And a hosted solution might not be the perfect solution, but if you are having issues in production right now, just getting a trial account with a hosted solution and embedding these measurements in your code might give you the information you need to fix what's going on, in five minutes.
So this is not a solution which requires many hours of investment; it's a small investment with a large return. You can literally get started in five minutes. So I hope that there are some questions, maybe. And otherwise I would like to say: have a nice day, have a safe trip home, and don't forget to vote at the exit.
Are there any questions? Okay, then I would have to say, have a safe trip home and see you around.