## Statistics 101 for System Administrators

Video in TIB AV-Portal: Statistics 101 for System Administrators

* Title: Statistics 101 for System Administrators
* Title of Series: EuroPython 2014
* Part Number: 101
* Number of Parts: 120
* License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
* Identifiers: 10.5446/20026 (DOI)
* Release Date: 2014
* Language: English
* Production Place: Berlin

* Subject Area: Computer Science
* Keywords: EuroPython Conference, EP 2014, EuroPython 2014

Abstract: Roberto Polli - Statistics 101 for System Administrators

Python allows every sysadmin to run (and learn) basic statistics on system data, replacing sed, awk, bc and gnuplot with a unique, reusable and interactive framework. The talk is a case study where Python allowed us to highlight some network performance points in minutes using itertools, scipy and matplotlib. The presentation includes code snippets and a brief plot discussion.

Agenda

* A latency issue
* Data distribution
* 30 seconds correlation with pearsonr
* Combining data
* Plotting and the power of color

A use case

* Network latency issues
* Correlate latency with other events
Welcome to the next session. It's Roberto Polli, talking about Statistics 101 for System Administrators, focusing on Python rather than awk, bc and so on. Enjoy.

Hi everybody.
I am Roberto Polli and I work at Babel, which is a proud sponsor of this talk. I hope that today you will see how to use basic elements of statistics. This is not a statistics course, but a Python one. Before starting, I would also like to apologize for my English; I hope the English-speaking people here can forgive me.

The first thing is a latency issue that affected one of our customers. In a few minutes we were able to understand what was happening and what was not happening. We understood all of those things using correlation and combining data. Then we produced a lot of nice plots that allowed our customer to see what was happening and what was not. Everything was done with SciPy and matplotlib.

Then, the problem. The customer's problem was network latency. We had log traces with the message size, the number of peers in the communication, the number of retransmissions and the errors in the network. The customer was asking us: do we need to scale? Are the latency issues related to some other condition?
We found the answers using Python. Why? Because Python provides basic statistics like the mean, which we all know, and the standard deviation, which is an indicator of whether the mean is a good descriptor of our data series. If the standard deviation is low, the mean is a good descriptor; if the standard deviation is high, the mean is not a good indicator.

That variable contains the structure of our data: there is a timestamp, a latency indicator in seconds, the number of peers, and other indicators such as the message size and the number of retransmissions. You can see that a basic description of all of those fields is really just one line, because Python provides max and min, while mean and standard deviation come with SciPy.

Now, the distribution. The second thing that you do is to create a distribution: on the x axis you have some time slots. For example, this one is a ping round-trip distribution.
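The one-line description of each field mentioned above can be sketched as follows. This is only an illustration: the `table` contents and field names are made up, and the standard library's `statistics` module stands in for the SciPy helpers so that the snippet runs without extra packages. `pstdev` is the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical sample data; field names and values are assumptions.
table = {
    "latency": [0.12, 0.15, 0.11, 0.95, 0.13],   # seconds
    "size": [512, 640, 480, 700, 530],           # bytes
    "retries": [0, 0, 1, 3, 0],
}


def describe(values):
    """Return (min, max, mean, standard deviation) of a series."""
    return min(values), max(values), mean(values), pstdev(values)


# The "one line" that describes every field of the table.
summary = {name: describe(values) for name, values in table.items()}

for name, (lo, hi, avg, std) in summary.items():
    print(f"{name:8} min={lo} max={hi} mean={avg:.3f} std={std:.3f}")
```

In the made-up latency series a single slow sample pushes the standard deviation well above the mean, which is exactly the case where the mean alone is a poor descriptor.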
It says we have three pings that returned between 158 and 159 milliseconds, four pings that returned between 159 and 160, and so on. The fastest way to create a distribution with Python is using matplotlib, which is a plotting library. When we plot a histogram, for example a histogram of latency (the ping round-trip time is actually a latency), we get an output besides the plot. The output is a triple; the interesting values in it are the frequencies, that is how many pings returned (three, four and so on), and the bins. The bins are just the buckets on the x axis: 158 to 159 and so on. To get the distribution, just use zip, which ties the two arrays together.
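A minimal sketch of the distribution step, under stated assumptions: `matplotlib.pyplot.hist` returns a triple `(n, bins, patches)`, and the snippet below only mimics the first two elements with the standard library so it runs without a plotting backend. The `rtt_ms` values and the `histogram` helper are invented for the illustration.

```python
from bisect import bisect_right

# Hypothetical ping round-trip times in milliseconds.
rtt_ms = [158.2, 158.5, 158.9, 159.1, 159.3, 159.4, 159.8, 160.2, 160.7]


def histogram(values, edges):
    """Count how many values fall in each [edges[i], edges[i+1]) bucket.

    The last bucket also includes its right edge; values outside
    the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


bins = [158, 159, 160, 161]      # bucket edges on the x axis
n = histogram(rtt_ms, bins)      # frequencies per bucket

# zip ties each bucket's left edge to its frequency; the extra
# right-most edge is dropped because zip stops at the shorter sequence.
distribution = list(zip(bins, n))
print(distribution)
```

With matplotlib installed, the same pairs would come from `n, bins, patches = plt.hist(rtt_ms, bins=bins)` followed by `zip(bins, n)`.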
Now, correlation. We have a good description of our data, but now we want to know whether the series are related. Is there a relation between the number of retries and the latency, or whatever? We start from the delta, which is just the difference between an item in the series and the mean of the series. Pearson, who was a statistician, came up with this formula. It may look scary if you think back to your high school times, but it is actually quite easy: it just checks whether the values move together on the same line. For example, if x and y move together, those differences start with negative values, so the product is positive. Then they move on, and if they reach the mean together they will be zero together; and if they keep moving together, the product will again be positive. If you try it in a Python console with some datasets, you will find that this formula is quite reasonable. So rho tells whether the values move together on the same line. But anyway, you must plot.
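The formula itself does not survive in the transcript. Assuming it is the standard Pearson correlation coefficient, it can be written in terms of the differences from the mean described above:

$$
\rho_{xy} = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2}\,\sqrt{\sum_{i} (y_i - \bar{y})^2}}
$$

Each term of the numerator is a product of two deltas: it is positive when $x_i$ and $y_i$ lie on the same side of their means and negative otherwise. The denominator only rescales the result so that it stays between $-1$ and $1$.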
These are scatter plots, each with its Pearson value. On the first line the correlation value is 1; then the data begin to be unrelated and the value goes to 0; then it becomes negative when the relation is not direct but inverse, that is when one dataset grows and the other decreases. But even in the cases where we have a 0 correlation, we could actually find that the data are related in some other way. So you should always plot.

SciPy provides a correlation function, pearsonr. This function returns two values. The first one is the correlation coefficient that we just described: its value is -1 when one dataset grows and the other decreases, and +1 when both datasets grow together. The other value is the probability. Its definition is quite tricky, but let's say that this value tells us how likely it is that such data are produced by an uncorrelated system. If the probability is high, the system is probably uncorrelated; if the probability is low, those values are unlikely to be produced by an uncorrelated system.

If you have a Python shell, you can just try it and check what you get. If the values lie on a straight line, you get a correlation of 1 and a probability of 0: it is unlikely that random data can produce a straight line. Taking two random datasets instead, we can see that the correlation (I don't care if it's positive or negative) has a low absolute value, while the probability that the data are unrelated is quite high, about 70%.
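The experiment described here can be tried without SciPy. The sketch below computes the coefficient by hand and estimates the probability by shuffling one series many times and counting how often unrelated data correlate at least as strongly. That shuffling (a permutation test) is a substitute technique chosen for illustration: `scipy.stats.pearsonr` returns the `(coefficient, probability)` pair directly and derives the probability from a formula. The names `pearson` and `shuffle_probability` and all data are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den


def shuffle_probability(x, y, rounds=2000, seed=0):
    """Fraction of random shuffles of y that correlate with x at least
    as strongly (in absolute value) as the original pairing does."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / rounds


x = list(range(10))
line = [2 * v + 1 for v in x]          # a perfect straight line
rng = random.Random(42)
noise = [rng.random() for _ in x]      # values unrelated to x

r_line, p_line = pearson(x, line), shuffle_probability(x, line)
r_noise, p_noise = pearson(x, noise), shuffle_probability(x, noise)
print("line :", r_line, p_line)
print("noise:", r_noise, p_noise)
```

For the straight line the coefficient is 1 and almost no shuffle matches it, so the estimated probability is close to 0. The values for the random series depend on the seed.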
Now, combinations. Let's go back to our original problem: we have various datasets and we want to understand which of them are related, and where we should look. To do this kind of analysis, the itertools module is a good place to look. Combinations are quite an intuitive concept: they are all the possible ways in which you can pick a set of items without repetition. We use them to combine the whole table, so we combine latency with errors, peers with the message size, and so on. This is how we get our results: we use combinations to compute the correlation and probability values between each series and all the others.
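A minimal sketch of that loop, assuming a `table` dictionary that maps series names to lists of values. The data, the `pearson` helper and the `THRESHOLD` cut-off are invented for the example, and only the coefficient is computed here; with SciPy installed, `scipy.stats.pearsonr` would also return the probability.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den


# Hypothetical table; names and values are assumptions.
table = {
    "latency": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],
    "peers": [3, 5, 7, 9, 11, 13],            # grows with latency
    "size": [700, 300, 900, 200, 800, 400],
    "retries": [1, 0, 2, 0, 1, 3],
}

# combinations(..., 2) yields every unordered pair of series exactly once.
results = {}
for (name_a, a), (name_b, b) in combinations(table.items(), 2):
    results[(name_a, name_b)] = pearson(a, b)

THRESHOLD = 0.7   # arbitrary cut-off chosen for this sketch
for pair, r in results.items():
    if abs(r) >= THRESHOLD:
        print(pair, round(r, 3))
```

Four series give six pairs, so the number of comparisons grows quickly with the number of columns, which is why filtering on the coefficient is useful before looking at anything by hand.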
If the correlation is high, or the probability is low, we print those values. This is just a starting point, but our customer wanted to know something quickly, so we started by concentrating on what was more likely to have a relation with latency: is the relation between latency and peers high or not? Remember, as I said before, that linear correlation is not everything: we should use our eyes, and matplotlib allows us to save the plots. So what we do is, again for all the possible combinations of our data, put on the plots all the possible information: the correlation indicator, the probability and the series names. That could produce a total of 40 graphs, but we can just browse them with any image viewer, and you can easily spot the plots that tell you something.
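One possible shape for that plotting loop is sketched below. It is not the code from the talk: the helper names, the title layout and the file naming are assumptions. The SciPy and matplotlib imports live inside the function, so the small helpers can be tried even where those packages are not installed.

```python
from itertools import combinations


def plot_title(name_a, name_b, r, p):
    """Text placed on each plot: series names, correlation, probability."""
    return f"{name_a} vs {name_b}, r={r:.2f}, p={p:.3f}"


def plot_filename(name_a, name_b):
    """File name for the scatter plot of one pair of series."""
    return f"{name_a}_{name_b}.png"


def save_all_scatter_plots(table, outdir="."):
    """Save one scatter plot per pair of series in `table`.

    Requires scipy and matplotlib; returns the list of written paths.
    """
    import os
    import matplotlib
    matplotlib.use("Agg")               # render to files, no display needed
    import matplotlib.pyplot as plt
    from scipy.stats import pearsonr

    saved = []
    for (name_a, a), (name_b, b) in combinations(table.items(), 2):
        r, p = pearsonr(a, b)           # coefficient and probability
        plt.figure()
        plt.scatter(a, b)
        plt.xlabel(name_a)
        plt.ylabel(name_b)
        plt.title(plot_title(name_a, name_b, r, p))
        path = os.path.join(outdir, plot_filename(name_a, name_b))
        plt.savefig(path)
        plt.close()
        saved.append(path)
    return saved
```

Calling `save_all_scatter_plots(table, "plots")` on a dictionary of series would write one PNG per pair into an existing `plots` directory, ready to be flipped through in an image viewer.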
This is an example plot: the buffer size and the CPU wait. In this case there is a high correlation indicator and a zero probability, so the data are probably related. We can see that when the CPU wait is low the buffer is constant, but when it is high the buffer increases. So there is actually a relation, but is that relation a straight line? Or is it rather that the buffer size stays constant and then, once the CPU wait stays at 30 or 40% for three or four seconds, the buffer starts to grow? This is a further step of analysis, but if you are searching for something, this kind of plot is a good starting point.

Then there is time. In the previous graph we lost the time, so I actually don't know whether the right side is the starting point and the left side is the end point (for example because a buffer flush happened afterwards), or whether the left part is the starting point and the right part the end point. To understand what is happening there I can use colors.

Again a simple case: I just split the data into morning, afternoon and night, with the morning in red, the afternoon in green and the night in blue. I used list comprehensions to split the datasets into three chunks, and then I plot the first one, the morning, in red, with its label. I could even add the Pearson coefficient and the probability of each single chunk, but I was going fast.
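The colour split can be sketched like this. The sample data, the hour boundaries of each part of the day and the function name `plot_chunks` are assumptions made for the illustration; only the red/green/blue assignment and the use of list comprehensions come from the talk.

```python
# Hypothetical samples: (hour of day, latency in seconds, throughput).
samples = [
    (7, 0.2, 900), (9, 0.4, 850), (11, 0.3, 950),
    (13, 1.8, 400), (15, 3.2, 150), (18, 2.5, 300),
    (22, 0.1, 200), (2, 0.1, 100), (4, 0.2, 120),
]

# Assumed boundaries for the three parts of the day.
morning = [s for s in samples if 6 <= s[0] < 12]
afternoon = [s for s in samples if 12 <= s[0] < 20]
night = [s for s in samples if s[0] < 6 or s[0] >= 20]

chunks = [
    ("morning", "red", morning),
    ("afternoon", "green", afternoon),
    ("night", "blue", night),
]


def plot_chunks(chunks, path="latency_throughput.png"):
    """Scatter latency against throughput, one colour per part of the day.

    Requires matplotlib; writes the figure to `path`.
    """
    import matplotlib
    matplotlib.use("Agg")               # render to a file, no display needed
    import matplotlib.pyplot as plt

    plt.figure()
    for label, color, chunk in chunks:
        latency = [s[1] for s in chunk]
        throughput = [s[2] for s in chunk]
        plt.scatter(latency, throughput, color=color, label=label)
    plt.xlabel("latency (s)")
    plt.ylabel("throughput")
    plt.legend()
    plt.savefig(path)
    plt.close()
```

Colour adds the time dimension that a plain scatter plot loses: points that overlap in position can still be told apart by when they were recorded.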
So this is one simple plot: we have latency on the x axis and throughput on the y axis, and the color denotes the time of the day. We can see clearly that the high latency, above three seconds, does not come with high throughput: the high latency comes with lower throughput. Moreover, throughput is not really an indicator of the speed of the system: if we look at just the first time slot, between 0 and 1 second, there is actually an influence of throughput on latency, but this influence ends after one and a half seconds. [unclear] Also, the red points in the plot show a high throughput in the same part of the day, so we have a precise part of the day to check.

Another correlation, another scatter plot: the size of the packets and the retries. We can see that there is no relation.
The latency problem was not related to the size of the packets, as we have seen. Moreover, the highest sizes correspond to a lower number of retries, so large packet sizes are not a problem; the retries are instead concentrated in one part of the day. So we can check whether in that part of the day there could have been some problem on the network, for example, or whether the problem originates from a part of our clients. And all those plots were produced in 20 seconds.

So once you have the data parsed, those plots tell you almost everything. To wrap up: latency wasn't related to packet size, and errors weren't related to packet size either. We understood more about the system throughput from the plots. [unclear] The rest of the time went into parsing logs, which was the hardest part of the problem. Plot, and continue to collect results.
OK, thank you, I hope you enjoyed it. I don't know if you have questions.

We have some time for questions. There are microphones.

Question: What I didn't understand is why you are using combinations. Can you give, like, three examples of what pairs you were combining?

Answer: The first thing to do was to combine all our datasets. Using combinations returns latency and peers, peers and errors, and so on. Maybe on the screen it is clearer. Let's imagine that instead of a, b, c we have got latency, time and so on. Combinations lets me pair every possible series with every other one. Say a is latency, b is throughput, c is [unclear]: I look at all the possible pairs, and then I can compute the correlation values, or make plots, for every possible pair. And peers was the number of computers in the network.
Another question?

Question: A quick question. With these permutations you get a lot of different combinations. Did you have any problem with spurious correlations at high significance levels? Could you run into false correlations?

Answer: Not really, because it's just one number. The Pearson indicator just tells you whether, plotting the data, you get something like a straight line. For this reason I always say that you should not rely on it alone: you should plot, and learn how the system works. I needed something at the very start to tell me where to look, and for that it was quite good.

Thank you very much.