A New Kind of Analytics

Transcript
Hello everybody. I'm Paola Moretto, co-founder of a company called Nouvola; you can find me on Twitter at @paolamoretto3. A little about me: I'm a developer turned entrepreneur. I've been in the high-tech industry for a long time, I love solving hard technical problems, and I come originally from Italy, though I've been in the US for 20 years. When I'm not writing code, you'll usually find me outdoors, hiking.

This talk is about performance, and we've heard it loud and clear here at RailsConf: faster is better. We all know what performance is, but it's good to understand the impact of low performance. When I talk about performance here I really mean speed and responsiveness: the speed of response that your application delivers to your users.

There is a famous quote from Larry Page: speed is product feature number one. You really need to focus not only on your functional requirements but also on your non-functional requirements, and speed is paramount for any web application today. There is a lot of research and data that backs this up and shows the impact of low performance. It impacts visibility: it directly affects your SEO ranking. It impacts your conversion rates. It impacts your brand, the perception people have of it, your brand loyalty, your brand advocacy. It impacts your costs and resources, because the usual reaction to low performance is to overprovision, and that's rarely the right answer. So speed for a web application today is paramount.

And if you have a DevOps model, if you have moved to a combined engineering model where development, QA, and sysadmin roles are merged, and if you have adopted continuous delivery and agile methodologies, which are the state of the art for web development today, then it becomes even more critical. In the cloud we have fully programmable, elastic infrastructure, and if you're adopting continuous delivery, you need to be able to bless every build and make sure that it not only works but works at the right speed.
So how do we tackle this problem? Well, the first thing is: you need data. This is a quote I stole, with pride, from a talk yesterday, and I love it: in God we trust; everybody else, bring data. "Deploy and hope for the best" is not a good model; it makes your users essentially your QA department. I know a company running an e-commerce application that says, "we know when we have a slowdown because our users complain on Facebook." That's not usually the best way to do it. You need data, and you need a lot of data. So let's get started.

There are different types of data. On the right-hand side you have your production site, where you're deployed and where your live traffic is; that usually goes under the big umbrella of monitoring, with all sorts of monitoring data and techniques. On the left-hand side you have your testing environments; usually people have a preproduction or staging environment, and sometimes you can also test in production. There you have synthetic traffic: you simulate users and do performance testing. These are the two most typical sources of data today.

Let's start with monitoring. There are many kinds of monitoring: you monitor your stack, you monitor your infrastructure, you do some sort of log aggregation, you monitor what users are doing with your application, their most typical behaviors and their corner cases, and then you have streaming analytics, high-frequency metrics, with solutions delivered as platform services. There are many examples of solutions that exist today; we're not associated with any of them, but they give you an idea of the wide spectrum of monitoring and data-instrumentation options you can find. All of these complement each other; there is no one piece that fits all, and it all depends on your application. And there is an interesting problem: you get all of these nice dashboards, but how do you correlate all of this data? Still, as they say, the first step in monitoring is to instrument first and ask questions later.

However, monitoring is not enough, and here's why. First of all, your live traffic is noisy. Your live users do all sorts of things, so it's very hard to troubleshoot: if there is a scenario you're interested in, and it's perhaps problematic, at the same time you have other users doing other things, and the system responds in unexpected ways. The other problem with monitoring is that it's after the fact. Monitoring doesn't help you predict, and it doesn't help you prevent the problems that might occur with your application. As a friend of mine says, monitoring is like calling AAA after the accident: it's useful, but usually you'd rather prevent the accident instead.

So monitoring is the first line of defense, the first thing you've got to do. Then what? We're going to pair performance testing with monitoring; the two complement each other really well. Here's why. Let's look at the left-hand side of our data sources, at the synthetic traffic. You have the ability to create your own traffic and do performance testing, on a preproduction or a staging environment. You usually don't want to mix synthetic traffic with your live traffic, and you don't want the synthetic load to have an impact on your real users; that's why you test on preproduction. But you could also test in production, for specific applications or at specific times of day.

With performance testing, the traffic is absolutely real, and you have total control over the amount of traffic and over the user scenarios, the workflows, because that's how you design your tests. Troubleshooting is simplified, because you have an easy way to reproduce the specific scenarios you thought were problematic, and, in the divide-and-conquer spirit of typical troubleshooting, you have already controlled two variables: the amount of traffic and what the users are doing. The other advantage of performance testing is that you get end-to-end user metrics: you're measuring exactly what the user is experiencing. This is not about server metrics, database metrics, or application metrics; it's the end-to-end time. We've seen cases with a factor of 7 between the end-to-end user metrics and the server metrics: the server appeared not to be suffering, but the users were not getting good performance at all. To get the complete picture, you really need end-to-end user metrics.

The other thing is to create realistic scenarios, as close as possible to what users are actually going to do. The goal is to figure out problems in advance, before they happen. Again, one of the problems with monitoring is that it's after the fact; here we're acting before things happen, so you have time to optimize, and you can't optimize unless and until you measure. So: use realistic scenarios; if you have mobile traffic on top of your web application, it's absolutely critical to test your mobile traffic as well; if yours is a global application, test from different geos; and measure end to end, with KPIs that reflect the end-user experience.

A lot of this data is around time. There is a variety of ways of saying it, response time or latency, but essentially it's the time to complete transactions, the time to complete specific requests, as averages and distributions. You can also get throughput, the number of successful requests per specific time interval, and error rates: if the server side starts suffering, you start seeing errors. And again, the goal is to resolve issues before you deploy.
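These timing numbers can be summarized with a few simple statistics. Here is a minimal sketch of that kind of summary; the helper name and the sample numbers are invented for illustration, not taken from any tool shown in the talk:

```python
def summarize(response_times_ms, interval_s, errors=0):
    """Summarize one test interval: average latency, p95 latency,
    throughput, and error rate."""
    n = len(response_times_ms)
    ordered = sorted(response_times_ms)
    p95 = ordered[max(0, int(0.95 * n) - 1)]  # crude percentile for a sketch
    return {
        "avg_ms": sum(response_times_ms) / n,
        "p95_ms": p95,
        "throughput_rps": n / interval_s,      # successful requests per second
        "error_rate": errors / (n + errors),   # failed / total requests
    }

stats = summarize([120, 130, 150, 400, 2500], interval_s=10, errors=1)
```

A real load-testing tool would compute these per time bucket across the whole ramp; the shape of the summary is the point here.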
Why test? Software changes all the time, and it's important to understand whether a specific change is going to impact how your users interact with your platform. It's not just important that the software does what it's expected to do; it must do it at the right speed. The other point is that even if you don't change anything, things change around you. Applications today are spidery: they have hundreds of possible optimization points, they include plug-ins, they sit on cloud infrastructure. This is a complex problem, and the only way around it is to test often. Test for every change; test if you're going into peak traffic, because you don't want to go in blind; test if there is any infrastructure change to your deployment. There is a very good example: several years ago Heroku changed something in its routing system, and that change was not openly publicized. It impacted only a specific set of applications, but it impacted them greatly, and people only realized because they started taking measurements and saw different numbers. The applications did not change; in that specific example the cloud provider made the change, and the only way to identify these kinds of things is to measure.

So is performance testing, by itself, enough? Well, you can get results like this, where you say: wow, I apply a linear ramp of traffic, the green bars, and I get errors; my response time increases dramatically, and then at some point decreases, because the server simply stops responding to requests. Or you could get results like: my tests are telling me that at 10,000 concurrent users my response time deteriorates from 400 milliseconds to 2.5 seconds. OK, so your tests are telling you that your system is slow, or will be slow, under specific traffic and scenarios, but that's still not actionable. You still don't know what to do; you just know that you're going to have problems. It's almost like telling you: when you have 50,000 users on your platform, you're going to have a fever, but there is no medicine. What if we could extract some more information from this data and find a medicine?
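A linear ramp simply means adding concurrent users at a constant rate until a target is reached. A toy schedule generator, under that assumption, might look like this (the names are illustrative, not from any particular tool):

```python
def linear_ramp(max_users, duration_s, step_s=1):
    """Yield (time_s, concurrent_users) pairs for a linear ramp-up
    from 0 to max_users over duration_s seconds."""
    steps = duration_s // step_s
    for i in range(steps + 1):
        t = i * step_s
        users = round(max_users * t / duration_s)
        yield t, users

# e.g. grow from 0 to 1,000 concurrent users over ten minutes,
# adjusting the load once per minute
schedule = list(linear_ramp(max_users=1000, duration_s=600, step_s=60))
```

A load generator would then hold each concurrency level for its step and record response times and errors at every level, producing exactly the kind of ramp charts described above.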
Stay with me. If you look at the typical performance-troubleshooting process, the majority of the time is spent, number one, reproducing the issue with the right data, and number two, isolating the issue. Once you've done that, actually fixing the problem is relatively straightforward. On reproducing, I have a very good example: a company I know had a client, a big bank in India, with a performance problem in their application. Between the time differences and the engineers sitting on two different continents, it took two weeks, with a whole team in a room on constant conference calls, just to reproduce the problem and have the data. Reproducing is partially addressed by performance testing, but then you're left with isolating the problem, which usually takes a lot of time and effort: developers are left doing a lot of correlation by hand, and it turns out to be a menial, highly time-consuming process. Once you're done with isolating, the fixing becomes relatively straightforward.

So what we want is the ability to go from the far left, before testing, where you don't even know you're going to have a problem, to "yes, we know we're going to have a problem; I found out I'll have a fever at 2,000 users," and then to somehow localize the bottlenecks, because we know localizing is what takes a long time. After that we can fix, and that leads to happiness.

Let's take a first recap here. We talked about monitoring, and all the data instrumentation you can use to extract data from your application under live traffic. We talked about performance testing, and how you can use synthetic traffic to create the load you want and see how the application responds. Now we're going to extract another layer of information from our data to help us localize the problem. What we want are leading indicators of performance issues: again, we don't want the after-the-fact view; we want to figure out these problems beforehand, so you have the time to fix and optimize and deliver the performance you want. We have found that if we localize, if we are able to pinpoint where in the application the problem resides, then we can accelerate a troubleshooting process that is otherwise quite painful. And we want it to be actionable.

In order to do that, we add something else. In the middle we have our monitoring, with the live traffic, all the monitoring data, and the data instrumentation, and we already talked about how it pairs really well with performance testing; the two go together. Now we add another layer: data mining and machine learning, to extract another layer of information from this data and help us localize.

This is how we do it; this is an example from our prototype. You apply a linear ramp of traffic, your synthetic testing. At the same time, you use the data instrumentation that you would usually use for your live traffic, but in this case over your synthetic traffic, possibly in a test environment. Then we mix it all together: the synthetic-test history for that application and the instrumentation data. A real-time analysis then makes an attempt at clustering, identifying statistically meaningful variations in all of these timings, and determining whether those statistically meaningful variations are clustered around a specific component of the application.

So this is essentially how it works. First you run a performance test. If your response time is good and you don't have any slowdown, there's no problem at all. But if you have a slowdown, like the example we saw with the rising response times and errors, then you're left with the problem of figuring out how to fix it. The first thing we do is remove what we call network and external effects: we check whether there is any correlation with data such as network time, DNS time, or SSL time, data external to your stack, and if we don't find correlations there, those are excluded from the analysis. Then, assuming there is no correlation there, we look into the dataset: the data analysis identifies statistically meaningful differences using clustering and longitudinal analysis, and identifies whether these variations cluster around a specific sector. Then the results are displayed.
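As a rough stand-in for the clustering and longitudinal analysis just described, one can correlate each instrumented timing with the applied load and flag the sectors whose timings grow strongly with it. This sketch uses plain Pearson correlation and invented sector names; it illustrates the idea, not the actual algorithm:

```python
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    if vx == 0 or vy == 0:      # a flat series carries no signal
        return 0.0
    return cov / (vx * vy)

def suspect_sectors(load, timings_by_sector, threshold=0.8):
    """Flag sectors whose timings grow strongly with the applied load."""
    flagged = {}
    for sector, series in timings_by_sector.items():
        r = correlation(load, series)
        if r >= threshold:
            flagged[sector] = round(r, 3)
    return flagged

load = [100, 200, 300, 400, 500]            # concurrent users per interval
flags = suspect_sectors(load, {
    "browser":  [210, 260, 480, 900, 1500],  # degrades with load
    "database": [35, 34, 36, 35, 34],        # flat, healthy
})
```

External effects such as DNS or SSL time could be screened the same way first, dropping them from the analysis when they show no correlation with the slowdown.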
The whole point is that out of the thousands and thousands of available metrics, we look at variations in real time and attempt a clustering across specific groups, what we call sectors, which map to components of the application.

This all uses data-analysis techniques that go under the umbrella of machine learning, unsupervised machine learning, and data mining. Again, it's not just one technique; we use a lot of clustering and longitudinal analysis.

Ready to see some real data in a real-life example? Here are a couple of examples. This is a typical web application, a real application, not a test application. First we ran some performance tests with a linear ramp up to 1,000 users; that's 1,000 concurrent users. We usually say there is a factor of about a thousand between concurrent users and monthly visits, so this corresponds to the kind of traffic you could expect for an application with a substantial number of monthly visits. We ran the performance tests, and we see that as we apply the linear ramp, the response time deteriorates: it's actually three times as high under traffic as without traffic. So this is definitely a case worth investigating. Then we go to the data instrumentation.

The beauty of this model is that you can apply these methods to pretty much any data instrumentation you have or want to use; it's not married to one specific method or approach. In this case we used a specific APM source, but again, you could use anything. The way we look at the data is that metrics are categorized under sectors, the various components; for each sector you have categories, then classes, then metrics. So you actually have a lot of data coming out of each one of these sectors. While the test is running, an agent feeds this data continuously into our algorithm, and the algorithm works in real time to do this clustering analysis.
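The sector, category, class, metric hierarchy can be pictured as nested keys in an instrumentation sample. All names here are invented purely for illustration:

```python
# one instrumentation sample, streamed to the analysis while the test runs
sample = {
    "timestamp": 1430000000,
    "sectors": {
        "browser": {
            "rendering": {                        # category
                "paint": {                        # class
                    "first_paint_ms": 180,        # metric (leaf value)
                },
            },
        },
        "app_stack": {
            "controllers": {
                "checkout": {"action_time_ms": 95},
            },
        },
    },
}

def metric_count(node):
    """Count leaf metrics in the nested sector tree."""
    if not isinstance(node, dict):
        return 1
    return sum(metric_count(v) for v in node.values())

n = metric_count(sample["sectors"])
```

With thousands of such leaves arriving per interval, grouping them by their top-level sector is what lets the analysis report "the browser is suffering" rather than a flat list of metric names.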
At the end you get the clustering result. This is a bit of an eyesore, but basically you see the identified methods that show variations in timing at the same time as the response time is increasing, which correlates well with the performance-testing results and with the end-to-end user metrics.

And this is the end result. As a reminder, what you see on the left are the sectors; each sector groups a large set of data, and you can actually dig down into it and see exactly which component of the group created the problem. What we see here, for example, is that although the test ran successfully without errors, with a load of 1,000 concurrent users at the end, the browser sector, everything that goes under the browser component, starts suffering before 200 users. It starts degrading at the very beginning; then it enters the yellow zone, what we call the transition zone, where it's deteriorating but not too badly; and then it enters the red zone, which is way over where you would expect it to be. The next one that drops is the app stack, essentially what's happening at your web back end; it starts deteriorating right around 300 concurrent users and enters the red zone later. So even though at 1,000 users you see a tripled response time, things start deteriorating a lot sooner. Another very critical data point here is which component degrades first, because sometimes you have a chain reaction: if one piece slows down, the others slow down as well. So which is the first component that starts slowing down and dragging the system with it? In this specific example, it's the browser.
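The green, yellow (transition), and red zones can be thought of as thresholds on how far a timing has drifted from its no-load baseline. The thresholds below are made up for illustration, not taken from the actual product:

```python
def zone(baseline_ms, observed_ms, warn=1.5, critical=3.0):
    """Classify a timing against its no-load baseline.
    green: near baseline; yellow: the transition zone;
    red: well over what you'd expect."""
    ratio = observed_ms / baseline_ms
    if ratio < warn:
        return "green"
    if ratio < critical:
        return "yellow"
    return "red"

# a browser timing during the ramp, against a 400 ms baseline:
early = zone(400, 450)    # near baseline
mid = zone(400, 900)      # transition zone
late = zone(400, 2500)    # well over expectations
```

Sweeping this classification across the ramp yields, per sector, the user count at which it first leaves green, which is exactly the "starts suffering before 200 users" reading above.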
The browser sector, again, is a set of data, and underneath it you have all the underlying data points, so from here you can actually see at a glance exactly which components within the browser caused the slowdown. So, once more, the objective is to identify proactively, without having to actually get a thousand live users on your platform, what is going to happen under a specific workflow or scenario, and which components of your application are the root cause of the problem.

Here's another example. This is another application; the categories are the same, because we look at the same data, and you could dig down into the raw data to find the methods that actually caused the slowdown. Here you have an interesting perspective: you still have the browser, with the app stack following closely, but then you have what we call server and software, which goes straight from green to red. It doesn't even enter the transition zone; it's almost like a step function, where the metrics go from really good to really bad.

So, in summary, what we covered today: speed is product feature number one; performance is paramount; faster is better. How do we tackle that? We start with monitoring: monitoring is a good start, the first line of defense, but it's not enough. Performance testing complements monitoring well, but that's still not enough, because what you want is help localizing the problem. So here we have performance testing plus data instrumentation plus machine learning, another layer we can extract from our data, which we have called predictive performance analytics, and we got to see it in action in a couple of examples. Thank you; I think I can take some questions now. You can find me on Twitter at @paolamoretto3, and I'm happy to hear your questions and feedback.

Metadata

Formal Metadata

Title A New Kind of Analytics
Subtitle Actionable Performance Analysis
Series Title RailsConf 2015
Part 79
Number of Parts 94
Author Moretto, Paola
License CC Attribution - Share Alike 3.0 Unported:
You may use, modify, and reproduce the work or its content for any legal and non-commercial purpose, and distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
DOI 10.5446/30635
Publisher Confreaks, LLC
Release Year 2015
Language English

Content Metadata

Subject Area Computer Science
Abstract Applications today are spidery and include thousands of possible optimization points. No matter how deep performance testing data are, developers are still at a loss when asked to derive meaningful and actionable data that pinpoint to bottlenecks in the application. You know things are slow, but you are left with the challenge of figuring out where to optimize. This presentation describes a new kind of analytics, called performance analytics, that provide tangible ways to root cause performance problems in today’s applications and clearly identify where and what to optimize.
