
What’s my App *Really* doing in production?


Automated Media Analysis

OK, well, welcome everyone, and thanks for coming. You know, this is the end of a really long day, the last day of the conference. We're all tired, and we do have to save your
energy for Aaron's keynote. [unintelligible] But I'm glad you're here. We do have some good things, some important things, to talk about.
So, to start off: who is this person up here? My name is Daniel. I've been a Ruby developer for about twelve years, and I've used other languages for longer than that. I spent most of that time in early-stage startups or really small companies. Currently, well, I work at a different kind of company. You might call it a not-so-small company, so it's a bit careful about its messaging. So, some quick notes. I'm not speaking as a representative. This is my talk, and my views are my own, not my employer's. But that said, my
employer does own my code, and for whatever reason that actually includes examples on slides, so I should mention that the code samples in this talk are copyright Google and licensed under Apache 2.0. So that's out of the way. Now, I have to
say that I was really impressed by David's opening keynote. I liked what he was saying about the importance of belief systems, of those underlying values that serve as an anchor for our decision making, and the importance of recognizing what those values are and how they define us. So yeah, I really appreciated his message, and one thing it made me think about was the values that have rubbed off on me through all the various small companies, and not-so-small companies, I've worked for. One of the common themes I can think of across all these organizations, big or small, especially the good ones, has been the importance of
measurement, of data. I remember my first real startup. It was back in the beginning of 2007 that we launched our first real site. In 2007 we were on Rails 1.0, so, the good old days. I remember, a few months after we launched, our CEO was on national TV, live on national TV, for something totally unrelated to our company. He was on this interview, and he happened to accidentally drop the name of our site on national TV. It was totally unplanned; he didn't intend to do it. But it was on national TV, and the show was actually pretty popular at the time, so as you might guess, within a few minutes I got a page from our monitoring system. Our site started to get really sluggish. I took a look at our logs, and we were seeing a traffic spike of multiple orders of magnitude.
This is of course both a good thing and a not-so-good thing, right? But it was 2007, and we didn't have these nice auto-scaling cloud services that we do now. We were on physical servers in a colo at the time. So I spent the next couple of hours logged into those servers, trying everything I could think of to increase our capacity. I was tweaking load balancers, I was disabling features, I was spinning up more Mongrels per machine. We were running under Mongrel 1.0 at the time; that was state of the art back then. Nothing really worked. We were just extremely laggy. Eventually, after a few hours, traffic settled down a bit and the site recovered, but in the meantime we were flailing around in the dark. Later, when we did our
postmortem investigation, we determined, among other things,
that our Mongrels were actually memory-constrained. We had these in-memory caches within each Mongrel process, and as traffic went up, those caches would get squeezed and start thrashing, and the Mongrels would just start falling over. So my attempt to fix the problem by spinning up more Mongrels was just squeezing memory more, and ended up making the problem even worse. The worst
part of this was that I couldn't even tell this was happening. We had a caching strategy that we thought would work, but I didn't really have good data on its actual behavior in production, and as a result we weren't really able to respond well when we ran into a production issue. So it's crucial to have real data on what your app is actually doing in production, when it's under production load. This is true whether you're just starting out, as we were, or whether you're one of the largest and most successful companies and have some of the largest products in the industry. You have to know what's going on. So at the small company that I work for now, we measure everything. And if you're just starting off, if you just launched your first app, you also need to measure everything. So how do you do that? Well, to begin with the obvious, there are a number of excellent services out there for system monitoring and application monitoring. This screenshot here shows Stackdriver, which, full disclosure, is actively developed by my current small employer, but there are a number of other companies and a number of other products that are very, very good. If you spent some time down the hall at the exhibits, you probably saw some of them: performance monitoring services, error monitoring. Who here uses some kind of monitoring service in production? OK, that's good. That seems like a majority. These services generally do a really good job of collecting data and providing visualization and analysis tools. But that said, there will be times when you do need to customize, when you need to measure something that a general-purpose tool won't give you out of the box. So this afternoon, what we're going to do is take a peek under the hood at some of the techniques that these monitoring services use to instrument your application, and you'll see how you can use some of these
techniques to perform your own monitoring, or to customize the monitoring services that you're using to fit your application's needs. So here's what we'll cover. We'll learn techniques for instrumenting your app. We'll look at how to gather data without disrupting your running app's behavior in production. And then finally, we'll discuss what sort of things you should be measuring. So let's talk about instrumentation.
This is an instrument. It's an old electroretinograph machine. It's used to diagnose a variety of retinal problems. Modern versions of this instrument are a lot smaller and maybe a little bit less scary, but old or modern, these machines do have one thing in common: they have electrodes, and those electrodes need to be in direct contact with your eyes in order to measure things. So generally the patient is given anesthetic eye drops. So, a little bit scary, but necessary. Similarly, when you're collecting data from a running application, your measurements need to be in direct contact with the code that's involved. That's the job of an instrumentation API: to collect data from a running application. It gives you the ability to inject measuring code at key points in your application, to put that measurement code in direct contact with the code that's being run. Well, in Rails you
can use an instrumentation API called ActiveSupport::Notifications. To see how this API works, let's take a look at an example. So, remember
my in-memory cache? How would you know if your cache is working the way that you expect? In my case, our caches
were running out of space, and when that happens you'll probably see a lot of cache misses. Indeed, your cache hit rate is a good indicator of the health of your cache in general. So let's measure it, and we'll do this by using notifications to count the cache hits and the cache misses. Here's what that would look like. Whenever Rails reads from a cache, it calls this method, ActiveSupport::Notifications.instrument. This call takes a measurement of the cache read code: it notes that the cache was read, it records how long that took, and it records other information such as the cache key that was read and whether it resulted in a cache hit or a cache miss. It gives all that measurement data a name, in this case cache_read.active_support. Now, Rails is already doing all this for us; it already instruments its cache code. So in your app, what you can do is subscribe to this measurement. You do this by calling ActiveSupport::Notifications.subscribe. You give it the name of the measurement that you're interested in, and you give it a block, and whenever that measurement is taken, notifications will call your subscriber block and give you a chance to do something with that measurement data. Now, I won't go into all the details of the API; that's something you can look up on your own. In this example, all we're doing is taking whether the cache hit or missed and logging it. So now we have a log that might look something like this. Afterwards, we can run simple tools like grep and word count on our log and analyze that data to get useful statistics. Now, Rails actually already instruments a number of things for you in addition to cache reads, so for many measurements all you have to do is subscribe to them. But you can also call instrument yourself, to instrument your own code. This is particularly important when you're writing your own Rails plug-ins.
It's a good idea to instrument your plugin as well, so the applications that use your code can measure its activity and performance. All right, so far we're receiving notifications of all cache hits and cache misses, and we're logging them, so we have an overall measurement of the cache hit rate. This is already interesting data, and as you can see it's very easy to get, just a couple of lines of code. But to make it more useful, we sometimes need to collect a bit of context. So let's take a look
at another old medical instrument. This is a vintage X-ray machine, from about 1900. At that time, X-ray machines were little more than a tube of radioactive material that the doctor would position over the patient. Obviously, many of the early experimenters with X-ray imagery weren't really aware of some of the hazards of radiation exposure, and so there were some illnesses and some deaths among both patients and the doctors and researchers around this time. Nowadays, of course, when we X-ray, we're very careful and very specific about targeting exactly the part of the body that we need to measure. This is something you'll also want to do when you're instrumenting your app. Most apps have a
number of controllers and actions. I was just talking to someone a few days ago whose company had a monorail with around 400 controllers. Most of us don't have apps that large, but still, you'll often want to focus your measurements a little more than measuring the entire application: on one particular controller, or maybe even one particular action. So what if I wanted to measure cache hit rate for just one particular action that was interesting? The first option that might come to mind
is, OK, let's just turn notifications on at the start of the action and turn them off at the end. The trouble is, notifications are global. They apply to all threads at once, and that includes threads that might be running other requests. A lot of us are probably on multithreaded web servers at this point, so this won't work in those cases. So what do we do? Here's a technique
that you can use. Let's start with the existing cache subscriber. Right now it's logging every cache read, regardless of which action is being executed. Now, we can determine the action by subscribing to a different measurement, in this case the start_processing event. This is the measurement that's taken at the start of processing a request by Action Controller. It captures various information, such as which controller and which action is going to be executed, so we can determine here whether to take a cache measurement. Now we need to communicate that information to our cache read block, and we need to do that on a per-thread basis, so we can't use a global variable for that.
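The per-thread hand-off between the two subscribers can be sketched in plain Ruby. This is an illustration under stated assumptions, not the code from the slides: the `RequestContext` module, the `on_*` handler names, and `ProductsController#show` are all hypothetical, and the handlers are called directly here instead of being registered as notification subscribers. It uses Ruby's built-in `thread_variable_get` and `thread_variable_set` for the thread-local storage.

```ruby
# Hypothetical per-thread holder for the action currently being processed.
module RequestContext
  KEY = :request_context_action

  def self.action
    Thread.current.thread_variable_get(KEY)
  end

  def self.action=(value)
    Thread.current.thread_variable_set(KEY, value)
  end
end

MEASURED_ACTION = "ProductsController#show" # hypothetical action of interest
CACHE_LOG = Queue.new                       # thread-safe sink standing in for a logger

# Stand-in for a start_processing subscriber: remember the action on this thread.
def on_start_processing(controller, action)
  RequestContext.action = "#{controller}##{action}"
end

# Stand-in for a cache read subscriber: log only inside the measured action.
def on_cache_read(hit)
  return unless RequestContext.action == MEASURED_ACTION

  CACHE_LOG << (hit ? "hit" : "miss")
end

# Reset at the end of the request so stale state never leaks into a reused thread.
def on_request_finished
  RequestContext.action = nil
end

# Simulate two concurrent requests on separate threads.
threads = [
  Thread.new do
    on_start_processing("ProductsController", "show")
    on_cache_read(true)
    on_cache_read(false)
    on_request_finished
  end,
  Thread.new do
    on_start_processing("OrdersController", "index")
    on_cache_read(true) # ignored: not the measured action
    on_request_finished
  end
]
threads.each(&:join)

logged = Array.new(CACHE_LOG.size) { CACHE_LOG.pop }
puts logged.inspect
```

Only the reads made on the thread handling the measured action reach the log, even though both threads run the same handlers.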
ActiveSupport provides per-thread module attributes. You might have seen module attributes like mattr_reader and mattr_accessor. Normal module attributes are basically just global variables that are attached to a module, but there's also this version that can have a different value per thread. It's actually just a convenience wrapper around thread-local variables, which you may be familiar with. Using this pattern, we can now communicate between our subscriber blocks on a per-request basis. Obviously there are some caveats. These are still globals, even if they're thread-scoped, so use them with care. For example, a number of servers, I think [unintelligible] is one of them, reuse threads across multiple requests, so make sure that this kind of data is actually cleaned up or reset between requests. It's not a perfect solution, but it's good enough, I think, for this purpose. So now we have a technique
for measuring caching data, and for doing so for a particular action. Let's take it a step further. Here is another interesting
looking instrument. This one's from about 1960. It's hard to see here in this photo, but the subject is wearing contact lenses, and they have miniature lamps connected to them. It was actually able to capture eye movements and eye reflexes. But at the same time, the whole contraption moves, and the motion causes visual illusions. So the device is actually gathering multiple different sources of data, combining eye measurements and machine motion, and using that to study some of the mechanics of visual perception. Combining information from multiple sources, correlating that information, is something that you need to do quite often when you're instrumenting your application.
It's helpful to know your cache hit rate, but it would also be good to know how much the cache actually buys us. Is the request latency actually any different if you get a cache hit?
Here's the code that we were just looking at. We're determining which action is running, then measuring whether there was a cache hit or a cache miss, and we log that information. We can get the request latency in a number of different ways. In this case, we can subscribe to yet another event, the process_action event. This measurement is taken at the end of an action by Action Controller, so it can provide information about what happened with the action, such as the result or the latency. In this example, I've added a subscriber to the process_action event, and we're logging the request latency. So we have two log lines now: one logging the cache hit or miss, and one logging the latency. To make this more useful, we should combine these two pieces of data, cache hit or cache miss and latency, and log them together. Once again, we do that by using a per-thread attribute to communicate between subscribers. So now those two pieces of data are logged together, and when we run this in production, we might get log data that looks something like this. You can see there's kind of a clear latency win when we get a cache hit, but it looks like our cache hit rate might be a little bit lower than we would hope. So now we have actually useful information that we can apply towards improving our cache behavior, or, as my PM friends at my small company like to say, we have actionable data. So we've gone through an example using notifications. There are several other instrumentation APIs that might be useful too. One of the
simplest ways to instrument an app is to use controller filters. Here's a simple around filter that measures the latency of the action. This is the easiest way to get really simple information about a request as a whole. You can also write a Rack middleware. This is useful if you want to measure the behavior or latency of other middleware, because you can insert your middleware any place in the middleware stack. You can also use this to install instrumentation code in other frameworks: Sinatra, Padrino, Hanami, or any non-Rails framework that might not use ActiveSupport. Finally, there's TracePoint. I think of TracePoint as kind of the sledgehammer of instrumentation APIs. It's a bit different: it's not part of a web framework, it's actually part of the Ruby virtual machine. It works similarly, in that you provide code that gets run when certain events happen, in this case at the language level: events like an exception was thrown, or a method was called, or a return from a method, or even a move to the next line in the source code. So it's a very powerful API, but a specialized one, more commonly used for debuggers than for monitoring. At my small company we actually use it pretty extensively to build a cloud-based debugger product, because it lets us instrument down at the source code level. It's probably not something you'll use that often for monitoring, but it is extremely powerful. However, that brings up an important issue.
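Of the options mentioned here, the Rack middleware approach can be sketched without any gems, since a middleware is just an object with a `call(env)` method that wraps another app. The `LatencyLogger` class name, its `log:` parameter, and the log format are hypothetical, and the lambda at the end stands in for the rest of the middleware stack.

```ruby
require "stringio"

# Hypothetical Rack middleware that logs how long the rest of the stack takes.
class LatencyLogger
  def initialize(app, log: $stdout)
    @app = app
    @log = log
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    [status, headers, body]
  ensure
    # Runs on success and on exceptions; status is nil if the app raised.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
    @log.puts format("%s %s status=%s latency=%.1fms",
                     env["REQUEST_METHOD"], env["PATH_INFO"], status || "error", elapsed_ms)
  end
end

# Stand-in for the downstream app (assumption: any Rack-compatible callable works).
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
log = StringIO.new
middleware = LatencyLogger.new(app, log: log)

response = middleware.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/products" })
puts log.string
```

In a Rails app a class like this would be registered with `config.middleware.use`, or positioned with `config.middleware.insert_before` or `insert_after` to measure a specific slice of the stack. Note that this sketch times only the `call`, not the streaming of the response body.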
This device is from 1940. It actually measures brain waves. I think in this case it was measuring the brain of someone who had PTSD due to wartime experiences. When we look at images like this, some of them are a little bit scary, right? We wonder if these machines are safe. We wonder if they can observe our brain, or reprogram our brain. This is an important question when instrumenting in production. It's critical that we can take measurements in production against real traffic, but it's also critical that we don't change the behavior of our app in the process, that we don't reprogram our app because we're taking measurements. Safety is
incredibly important when instrumenting, so we'll talk about that a little bit. One major component of safety is keeping the latency effects to a minimum. Here are some tips for going about that. First of all, as we said, isolate and spotlight the interesting use cases. Oftentimes you don't need data from the entire app, but maybe just a few particular controllers or a few particular actions of interest, so isolate those. I encourage you to experiment with instrumenting new things and gathering new pieces of data. However, always circle back and re-evaluate. If it turns out that some measurement is not really giving you interesting information after all, don't be afraid to delete it. I tend to treat instrumentation like tests. Many of them should live on indefinitely because they're monitoring critical systems, but there are some that really are only useful temporarily, maybe because they're part of a one-time investigation that you did, or maybe because you put them in and it turns out they're not actually as useful as you thought they would be. If you leave them around too long, they'll slow you down, just like your tests might take longer than you really want. So practice making good judgment calls. Don't be afraid to spin up new instrumentation, but don't be afraid to delete it as well. Sample your data if you can. More often than not, you can get away with measuring only one in a hundred instances, or one in a thousand. TracePoint can be particularly dangerous, because many of its events can fire extremely often. Remember "move to the next line in the source code"? So use it only when you have no other choice. But if you do need to use TracePoint, here's a pro tip. It's global by default. Just like we saw with ActiveSupport notifications, it applies to all threads at once. However, there is an alternative TracePoint API that lets you instrument just a single thread at a
time. That's only available, as far as I know, the last time I checked, as a C API; you can't call it directly from Ruby. So it's harder to use, but it is available, and it's worth investigating if you want to use TracePoint temporarily for specific requests. Finally, of course, pay attention to how you're recording your measurements. If you're just logging to the file system, that's usually pretty fast. But as your app grows, you might want to start sending data to a remote analytics service, or to the application monitoring service that you're using. When you do that, make sure you don't block on those calls. Usually the API gems for the monitoring services will have a non-blocking version of the client that you can use. But if you have to use straight HTTP calls, be careful of Net::HTTP post, because it is blocking: it waits for the response to come back. You don't want to do that, so use an asynchronous client, or spin up a background thread. Send your data in batches if that's allowed by your API. In general, just be very careful about how you're reporting your measurements. Another element of safety is avoiding side effects, those changes to your app's behavior. On one hand, it's kind of obvious: don't modify your application state in your subscribers. But side effects can take many forms. A database query, in addition to potentially adding latency, should also be considered a side effect. It might not change the state of your application, but it does change its behavior, because you're invoking part of your system that you otherwise wouldn't. So be careful about those sorts of things. In particular, be careful about calling methods on Active Record models; some of them might initiate additional database calls. So we've talked a lot about how to measure. We have a few more minutes, so let's spend those on what to measure.
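The advice on sampling, background threads, and batching can be combined in one small sketch using only Ruby's standard `Queue` and `Thread`. Everything here is an assumption made for illustration: the `BackgroundReporter` class, its parameters, and the `sender` block, which stands in for whatever call actually ships a batch to a monitoring service.

```ruby
# Hypothetical non-blocking reporter: the request path only pushes onto an
# in-memory queue, and a background thread ships measurements in batches.
class BackgroundReporter
  def initialize(sample_rate: 0.01, batch_size: 50, &sender)
    @sample_rate = sample_rate
    @batch_size = batch_size
    @sender = sender
    @queue = Queue.new
    @worker = Thread.new { drain }
  end

  # Called from request code. Keeps roughly sample_rate of the measurements
  # and never waits on network I/O. Measurements must not be nil or false.
  def record(measurement)
    return unless rand < @sample_rate

    @queue << measurement
  end

  # Flush whatever is left and stop the worker, for example at shutdown.
  def close
    @queue.close
    @worker.join
  end

  private

  def drain
    batch = []
    # Queue#pop returns nil once the queue is closed and empty.
    while (item = @queue.pop)
      batch << item
      next if batch.size < @batch_size

      @sender.call(batch)
      batch = []
    end
    @sender.call(batch) unless batch.empty?
  end
end

# Usage: sample everything and use tiny batches so the behavior is easy to see.
batches = []
reporter = BackgroundReporter.new(sample_rate: 1.0, batch_size: 2) { |batch| batches << batch }
5.times { |i| reporter.record({ latency_ms: i }) }
reporter.close
puts batches.inspect
```

In production the sample rate would be much lower and the sender block would make the slow remote call, which is acceptable there because it runs off the request path. A real implementation would also want a bound on the queue size so a stalled sender cannot grow memory without limit.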
This is an early ECG machine, an electrocardiograph machine. I think some people say EKG. This is from 1901. It was actually the first machine that was sensitive enough to be practical for medical use. I think scientists had experimented with similar technology since the late 1700s, but it wasn't until about 1901 that it became practical. Again, it's measuring electrical activity. The electrodes here are actually in these metal basins next to the patient, so you would have one foot and both hands immersed in salt solution in these basins. ECG is of course still widely used today, not just to diagnose heart conditions but also to monitor patients who are critically ill or undergoing general anesthesia. This is of course because the heart is an indicator, right? When something is going wrong, it often shows up in the heart's behavior. So what are the
indicators for your application? What are those things that can tell us when things are not healthy, or when something unexpected is happening, or something has gone critically wrong?
So here are some of the things that you can measure. First, results: the responses that you're sending back from your requests. Make sure they're in line with what you expect. One indicator that's sometimes really important is how big your responses are. Are the sizes what you expect? Also pay special attention to error responses. Obviously, if you see 500s, then there are problems going on, but don't ignore your 400-level errors either. When I was younger and less experienced, I tended to make the mistake of ignoring my 404 error rates. I was thinking, that's just client errors, not server errors, it's not my problem. You might not necessarily want to get paged if you see a 404 spike, but you do want to know that it's going on, because it could mean that there's a broken link someplace, or it could mean someone's trying to hack your site. It's an indicator that something is going on, and you should at least know about it. Now, your final responses are important indicators, but not the only ones. Sometimes you have intermediate results that can be useful. Another thing: pay attention to rendering. Rails templates are really not the fastest thing in the world, as I've come to realize over my years of using them. Rendering can have a significant impact on your app's latency, so measure it. Not too long ago I was working on an API that occasionally, just occasionally, ran really slowly. As we dug into this, it turned out that it was actually the JSON serialization that was taking up the bulk of the latency: some weird interaction between what my data looked like and some issues with my JSON library. If you have something like that going on, you need to be able to tell. Of course, measure interfaces with external systems. That means your database, external APIs and internal APIs, microservices, your caches, the file system, shelling out. Out-of-the-box monitoring products will
capture many of these things for you, obviously, but not everything. They typically focus on the performance of these external dependencies, like how long queries tend to take. What's often missing is your usage: is your application using the external system in the way that you expect? Are you hitting your cache as often as you think you should be? And finally, errors and exceptions are important. Don't throw any errors away. That includes expected errors; it includes errors that you handle internally in your system and don't bubble up to your users. They can still be indicators. An important example that often gets overlooked is retries. Often when you call an external API, you implement retry logic, right? Because the network and the API and various things may be flaky. But if you retry, don't throw that information away. Instrument your retry code. Make sure that information shows up on your monitoring dashboards. Your retry rate is an indicator: if you get a retry spike, something's going on, and you want to know about it. So we covered a lot here, and really we just scratched the surface on a number of things, but I hope I communicated the importance of measuring. Again, it's something that at my small company we just do all the time, out of habit. Getting started is really simple: just subscribe to an ActiveSupport notification. As we saw, with just a few lines of code you can measure something and log it. Or, many of the commercial products that are out there have free trials; go check one out. However you do it, start measuring. If nothing else, it'll be interesting to look at, but it'll also save you a big headache in the future. So that's all I have.
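The suggestion to instrument retry code can be sketched as a small wrapper that counts every retry before trying again. The helper name `with_instrumented_retries`, the `RETRY_COUNTS` tally, and the `inventory_api` label are hypothetical. A real app would report the count to its monitoring system, and would need a thread-safe counter on a multithreaded server.

```ruby
# Illustrative tally of retries per dependency (not thread-safe as written).
RETRY_COUNTS = Hash.new(0)

# Hypothetical helper: runs the block, retrying on the given error class,
# and records each retry instead of silently swallowing it.
def with_instrumented_retries(name, attempts: 3, on: StandardError)
  tries = 0
  begin
    tries += 1
    yield
  rescue on => error
    raise if tries >= attempts # out of attempts: let the error surface

    RETRY_COUNTS[name] += 1
    warn "retry name=#{name} attempt=#{tries} error=#{error.class}"
    retry
  end
end

# Usage: a simulated flaky call that fails twice and then succeeds.
calls = 0
result = with_instrumented_retries("inventory_api") do
  calls += 1
  raise IOError, "connection reset" if calls < 3

  "ok"
end

puts "result=#{result} retries=#{RETRY_COUNTS["inventory_api"]}"
```

The caller still gets a successful result, but the two retries remain visible, which is what makes a retry spike observable on a dashboard.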


Formal Metadata

Title What’s my App *Really* doing in production?
Series title RailsConf 2017
Part 51
Number of parts 86
Author Azuma, Daniel
License CC Attribution - ShareAlike 3.0 Unported:
You may use, modify, and reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or its content, including in modified form, only under the terms of this license.
DOI 10.5446/31301
Publisher Confreaks, LLC
Year of publication 2017
Language English

Content Metadata

Subject area Computer Science
Abstract When your Rails app begins serving public traffic, your users will make it behave in mysterious ways and find code paths you never knew existed. Understanding, measuring, and troubleshooting its behavior in production is a tough but crucial part of running a successful Rails app. In this talk, you’ll learn how to instrument, debug, and profile your app, using the capabilities of the Rails framework and the Ruby VM. You'll also study techniques for safely instrumenting a live running system, keeping latency to a minimum and avoiding side effects.
