Deep Learning your Broadband Network @HOME

Video in TIB AV-Portal: Deep Learning your Broadband Network @HOME

Formal Metadata

Deep Learning your Broadband Network @HOME
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Deep Learning your Broadband Network @HOME [EuroPython 2017 - Talk - 2017-07-14 - Anfiteatro 1] [Rimini, Italy] Most of us have broadband internet service at home. Sometimes it does not work well, and we visit a speed test page to check the internet speed for ourselves, or call the cable company to report the service failure. As a Python programmer, have you ever tried to automate the internet speed test on a regular basis? Have you ever thought about logging the data and analyzing the time series? In this talk, we will go through the whole process of data mining and knowledge discovery. First we write a script to run the speed test periodically and log the metrics. Then we parse the log data, convert it into a time series, and visualize the data for a certain period. Next we conduct some data analysis: finding trends, forecasting, and detecting anomalous data. Several statistical and deep learning techniques are used for the analysis: ARIMA (Autoregressive Integrated Moving Average) and LSTM (Long Short-Term Memory). The goal is to provide a basic idea of how to run a speed test and collect metrics with an automated Python script. I will also give a high-level overview of the methodologies for analyzing time series data, and I would like to motivate Python people to try this at home. This session is designed to be accessible to everyone, including anyone with no expertise in mathematics or computer science. An understanding of basic concepts of machine learning and of some Python tools bringing such concepts into practice might be helpful, but is not necessary.
Thank you. I am from South Korea, and this is who I am, but I have to skip the introduction because I have a lot of things to talk about, so let's move on. Today I will share my pet project: logging metrics of a home network, analyzing the data, doing some forecasting and detecting anomalies. Here is the outline of the whole process: the data collection, followed by the time series analysis, followed by the forecasting, and then the modeling for anomaly detection. We will go through all these items under each step, as long as time allows, but instead of completing everything for each stage, I will give a brief overview at the surface first and gradually get deeper into each process by iterating over the steps. So you will see a lot of figures at the beginning and some text and code later; there will be almost no math equations, as we will not get that deep. To start, a naive approach to anomaly detection: let me show you how this project started. At the very beginning, in the place where I had lived for more than two years, one day the Internet started to fail continuously. So I made a call to the service provider, and an engineer came and tested the network with his own device, but at that time the network was just normal, and I could not reproduce the failure.
From the next day, I installed a speed test app on my smartphone and started to capture the test result every time the network went down. Then I called the engineer again and showed him the captured images of the failures. This time he said a wireless device is just not reliable, so he asked me to test with a wired device. I was just pissed off, and at that time the only wired device I had with a LAN port was a Raspberry Pi. So I ran the test on a regular basis and collected logs for a few days before the engineer's next visit.
This is the graph I showed the engineer at that time, in 2015. As you can see in the graph, the disconnection is repeated several times a day; in the upper graph there are red crosses at the bottom, and those are the disconnections. At last the engineer believed me, we replaced the modem, and the Internet has behaved well since. In this case the anomalies were disconnections, but there are other types of anomaly in time series data, as we will see in the next slides. There was actually no forecasting in this kind of analysis; it is not even real anomaly detection, because there is no forecast: I just waited for an expected failure to be repeated, so it is only a naive approach. Before we go deeper, let's define the problem and consider what we should care about. The problem is detecting anomalous states of a home network or, put more generally, anomaly detection for time series. So what is a time series? Time series data is a set of observations of a value at different times, and such observations have to be collected at regular time intervals. As for anomalies, there are several types of anomalous patterns in time series; let's take a look at them one by one. First, additive outliers, which are unexpected spikes and drops; the disconnections we just saw are a typical example of this type. Next, temporal changes: unusually low or high observations for some short period of time. Next, the level shift: in this case the metric does not change its shape, but the overall value of the period changes, that is, a statistical characteristic of the series has changed. There may be many things to be done after detecting such anomalies, and the level shift is a very important type of anomaly to deal with.
Let's go to the next step, the second round, starting with the data collection.
I used speedtest-cli, which is a command line tool written in Python. Running a speed test simply gives you the metrics: the response time (ping), the download speed and the upload speed, and you can see the results right away. I ran the test with cron every 5 minutes, and I collected almost 20 thousand observations over about four months.
This is what the log output looks like.
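The logging script itself is not shown in the transcript; below is a minimal sketch of what it might look like. The `>>>` delimiter comes from the log format described next, and the `--simple` flag of speedtest-cli is real, but the `format_log_entry` helper and the `speedtest.log` filename are my own assumptions, not the speaker's code.

```python
import subprocess
from datetime import datetime

def format_log_entry(timestamp, output):
    """Prefix each test result with a '>>>' delimiter line and a timestamp."""
    return ">>> {}\n{}\n".format(timestamp.isoformat(), output)

def run_speedtest(logfile="speedtest.log"):
    # speedtest-cli --simple prints three lines: Ping, Download, Upload.
    result = subprocess.run(["speedtest-cli", "--simple"],
                            capture_output=True, text=True)
    with open(logfile, "a") as f:
        f.write(format_log_entry(datetime.now(), result.stdout))

if __name__ == "__main__":
    run_speedtest()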
Each test is separated from the next by a delimiter: three right-angle-bracket symbols in a row. Some of you may have noticed that the tests did not start at the exact scheduled time. I found many cases of the test starting one or a few seconds late, but it does not make a huge difference and can easily be corrected later, as we will see. The snippet iterates over the lines of the log; whenever the delimiter appears, it parses and stores the metrics and the timestamps, and builds a DataFrame with pandas. I make a list of records by parsing the log lines, and then build the DataFrame with a datetime index. Here is how I dealt with the incorrect starting times: by explicitly setting zero seconds and microseconds for each timestamp. The index is very important, because by definition a time series has to be sampled at regular intervals.
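The parsing code is only described, not reproduced, in the transcript; here is a minimal sketch along the lines described (delimiter detection, zeroing seconds and microseconds, a pandas DatetimeIndex). The exact log layout and field names are my assumptions.

```python
import pandas as pd

def parse_log(lines):
    """Parse '>>>'-delimited speedtest log lines into a DataFrame."""
    records, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">>>"):
            if current is not None:
                records.append(current)
            # Tests that started a few seconds late get zero seconds and
            # microseconds, so the index stays on a regular 5-minute grid.
            ts = pd.Timestamp(line[3:].strip()).replace(second=0, microsecond=0)
            current = {"timestamp": ts}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip().lower()] = float(value.split()[0])
    if current is not None:
        records.append(current)
    return pd.DataFrame(records).set_index("timestamp")

df = parse_log([
    ">>> 2017-07-14 10:00:03.512",
    "Ping: 21.3 ms", "Download: 95.2 Mbit/s", "Upload: 41.7 Mbit/s",
    ">>> 2017-07-14 10:05:02.007",
    "Ping: 22.1 ms", "Download: 93.8 Mbit/s", "Upload: 40.9 Mbit/s",
])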
Here is the chart showing the overall data: the upper blue one is the ping test, the orange one is the download speed, and the green one is the upload speed. I actually had to handle some missing data. Handling missing data in data science is very important: sometimes it raises unexpected errors in your code, and it can possibly lead to incorrect results, which is even worse. We obviously see some accidental missing parts for a few days: the first gap was a failure of the Pi, and for the second one, I don't know, the provider was just not responsive. In this case I cannot just fill up the missing data with arbitrary values; the gaps are too huge. So I split the data instead: the first part is plenty enough for training a model, I used the second part as the validation data, and the last part as the test data.
In the remaining data there are a few more cases of missing values that we can hardly notice in such visualizations, so we have to examine the missing data carefully, like this. By using the code on the first line we can check whether there is any missing data in the DataFrame, and I handled it by propagating the last valid observation forward to the missing values, which is one typical way to do it. Here is how I handled the pandas DataFrame with the datetime index. There was actually a talk yesterday about pandas indexing, which was really enjoyable to me, and handling time series with pandas is super convenient: I can chop up the time series, resample it, group it by a certain period and do some aggregations, and these are a few examples I used. Frankly speaking, a few years ago, when I did not know much about pandas, I was avoiding it because it gave me too much confusion. At that time I used to put the datetime string or datetime object in an individual column, reset the DataFrame to get a plain numbered index, and then query it again. It was ridiculous, but I actually did that; so do not be scared of pandas, the more you know, the less pain you get. Now let's have a look into the data. This is the daily plot for each day from Monday to Sunday for a week: 24 hours from midnight on the x-axis, and the y-axis shows the download speed in megabits per second. As you see, there is no specific pattern repeating each day, but maybe you can notice that there is less fluctuation at night time, on the right side of this chart, and the test capacity remains high. Next I drew the box plot for each day, and we can find a pattern within a week. (The mouse pointer does not show up there, but this one is Sunday.) Focusing on the orange line, which is the median download speed for each day, it shows a regular oscillation.
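The missing-data check, the forward fill, and the resampling tricks mentioned above look like this in pandas; the toy series and its values are my own, for illustration only.

```python
import numpy as np
import pandas as pd

# A 5-minute download-speed series with two missing observations.
idx = pd.date_range("2017-07-14 10:00", periods=6, freq="5min")
s = pd.Series([95.2, np.nan, 93.8, np.nan, 94.1, 95.0], index=idx)

assert s.isnull().any()          # first, check whether anything is missing

filled = s.ffill()               # propagate the last valid observation forward

# Resampling/aggregation example: the hourly median of the filled series.
hourly_median = filled.resample("60min").median()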
The medians of Saturday and Sunday are higher than those of the weekdays, so this shows a clear pattern. We can categorize such repeating patterns constituting the time series data: a time series can be decomposed into three components.
The trend exists when there is an increasing or decreasing direction in the series; the trend component does not have to be linear, it could be exponential, and it can grow or decay slowly. The seasonal pattern exists when the series is influenced by seasonal factors. And lastly the random noise: this component is what remains of the time series after the other components have been removed. It is not completely random; it has zero mean and constant variance, and it plays a very important role for anomaly detection, as we will see later. A time series can be formally defined with an additive model or a multiplicative model. We will deal with these components more later, but for now let's just try to decompose the components with Python and see whether there is a trend and seasonality in our time series.
Here I tried to decompose the daily download time series for a week, from Monday to Sunday, into a seasonal component and a trend component, using the seasonal_decompose function in the statsmodels package. You can see that there exists a seasonal pattern and a clear trend, even though they were not clear when visualizing the original data on the left side. Now it's time to build a model. But before we go deeper into modeling, we need to think about how the modeling process for time series differs from the ordinary machine learning process on a time-invariant dataset. There, we divide the dataset into a training set and a testing set, use the training set to fit the model, and generate a prediction for each element of the test set; that is the general way to train and validate a model. Say we divide the data into parts A, B and C: we train a model with parts A and B and validate it with part C, then repeat the same process, this time with B and C as training data and part A as test data. This is the typical process called cross-validation, and anyone with expertise in machine learning will be familiar with it. However, cross-validation cannot be used for time series data because of the time dependency: part A has nothing to do with parts B and C, so it is unreasonable to test the model on part A after training it on parts B and C. Moreover, a model trained on all the old data matters less than one trained on recent data, so we have to re-create the ARIMA model after each new observation is received. This is the so-called rolling forecast.
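The talk calls statsmodels' `seasonal_decompose` for the decomposition step; to make the idea concrete, here is a library-free sketch of what additive decomposition does under the hood (a centred moving average for the trend, per-phase means of the detrended series for the seasonality). This is my own illustration, not the speaker's code, and it skips the edge handling a real implementation needs.

```python
import numpy as np

def decompose_additive(y, period):
    """Naive additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = period // 2
    trend = np.full(n, np.nan)
    for i in range(half, n - half):
        if period % 2:                      # odd period: plain centred mean
            trend[i] = y[i - half:i + half + 1].mean()
        else:                               # even period: 2 x period moving average
            trend[i] = (y[i - half:i + half].mean()
                        + y[i - half + 1:i + half + 1].mean()) / 2
    detrended = y - trend
    # Seasonal component: mean of the detrended values at each phase, centred.
    seasonal = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    seasonal -= seasonal.mean()
    seasonal_full = np.tile(seasonal, n // period + 1)[:n]
    residual = y - trend - seasonal_full
    return trend, seasonal_full, residual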
Here is the piece of code running the rolling forecast. We keep track of all observations in a list, history, which is seeded with the training data initially; later, the observations are appended at each iteration. We iterate over each new observation in the test set, build and update a model with the previous observations, forecast one step ahead for time t with the updated model, and store the forecast value in a list. Lastly, the history is updated with the new observation at time t. This is how we run the rolling forecast.
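The slide's version refits a statsmodels ARIMA model at every step; the transcript does not reproduce it, so here is just the loop skeleton, with a trivial moving-average-of-history forecaster standing in for the ARIMA fit (an assumption, to keep the sketch dependency-free).

```python
import numpy as np

def rolling_forecast(train, test, fit_predict):
    """Re-fit the model on all past observations before every one-step forecast."""
    history = list(train)                # seeded with the training data
    predictions = []
    for obs in test:
        predictions.append(fit_predict(history))   # forecast for time t
        history.append(obs)                        # then reveal the true value
    return np.array(predictions)

# Stand-in model: forecast the mean of the last three observations.
forecaster = lambda h: float(np.mean(h[-3:]))
preds = rolling_forecast([1.0, 2.0, 3.0], [4.0, 5.0], forecaster)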
On the left side is the testing result: the blue line represents the original data we saw before, and the orange line shows our predictions, starting from the middle of the week. The more important point here is the residuals on the right side. The code computes the residuals and plots the residual distribution; the residuals are the differences between the actual observation at time t and the predicted value at time t. If they follow a normal distribution, you see the bell curve, and it means the residual is just white noise. This is very important, as I mentioned in the decomposition section, because the residuals from a rolling forecast model can then be used for anomaly detection.
So now we have residuals that are Gaussian random noise, and outlier detection on the residuals can be done in several ways: using the interquartile range, the standard deviation, or the median absolute deviation. The interquartile range is quite popular: by sorting the data, the median is in the middle, and the first and third quartiles sit at the lower 25 percent and at 75 percent respectively. If a data point falls in the red area, it is considered too far from the central value to be reasonable and is counted as an outlier.
We can implement it like this with NumPy or pandas.
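The slide's implementation is not in the transcript; a NumPy version of the interquartile-range rule might look like this. The Tukey factor k=1.5 is my assumption — the slide may use a different fence.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

mask = iqr_outliers([10, 11, 9, 10, 12, 10, 11, 50])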
With the standard deviation, if a value is a certain number of standard deviations away from the mean, the data point is identified as an outlier, and this number of standard deviations is called the threshold; usually we use three standard deviations. The standard deviation method is the most common one.
We can obtain the outliers like this with NumPy or pandas.
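Again the slide itself is not transcribed; the three-sigma rule in NumPy is a one-liner, sketched here for completeness.

```python
import numpy as np

def sigma_outliers(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) > threshold * x.std()

mask = sigma_outliers(np.concatenate([np.zeros(99), [100.0]]))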
The median absolute deviation is the most robust approach. Say we have a univariate dataset: the MAD is defined as the median of the absolute deviations from the data's median. That is, we take the median first, then take the residual for each data point, and the MAD is the median of the absolute values of those residuals; it is clearer as an equation, MAD = median(|x_i − median(X)|). If a value is a certain number of median absolute deviations (the threshold) away from the median, that value is classified as an outlier. There is a short paper, "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median", published in 2013; as I remember it is just four pages or so, and it gives a super clear idea of why we should use MAD rather than the other methods. I highly recommend reading it if you are interested.
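A NumPy sketch of the MAD rule, using the usual modified z-score; the 0.6745 consistency constant and the default threshold of 3 are conventional choices, not necessarily what the slide used.

```python
import numpy as np

def mad_outliers(x, threshold=3.0):
    """Modified z-score using the median absolute deviation (MAD)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))          # MAD = median(|x_i - median(x)|)
    # 0.6745 makes the score comparable to z-scores under a normal distribution.
    modified_z = 0.6745 * (x - med) / mad
    return np.abs(modified_z) > threshold

mask = mad_outliers([10, 11, 9, 10, 12, 10, 11, 50])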
As the next step we go to ARIMA, a very nice class of statistical models. It is just a classic, developed maybe 60 years ago, but it is very powerful and can be used for modeling, analyzing and forecasting time series data. ARIMA performs well with stationary time series, so we need to understand the meaning of a stationary time series and how to transform non-stationary data into stationary data.
To understand stationary data, here are the three strict criteria of stationarity: the mean, the variance and the covariance of the series should be time-invariant. First, the mean of the series should not be a function of time: the left-hand graph satisfies the condition, whereas the graph on the right side, in red, has a time-dependent mean, and the mean value continues to increase as time goes on. Next, the variance of the series should not be a function of time: in the chart, the blue graph is stationary, and we can notice the varying spread of the distribution in the right-hand graph, which is non-stationary. And lastly, the covariance of the series should not be a function of time: in the following graph you will notice that the spread becomes tighter as time increases, so the covariance is not constant. We can test the stationarity of a time series with a Python library: in statistics we have the Dickey-Fuller test, and the statsmodels package has an implementation of it. When the test statistic goes below the 1% critical value, we can consider the time series stationary. But what if the time series is non-stationary? The main problem with real time series data is that it is usually non-stationary, so we have to make it stationary somehow. When the data is non-stationary, statistical properties like the mean, the variance and the maximum or minimum values change over time. In general, a series which is stationary after being differenced d times is said to be integrated of order d, denoted I(d); differencing is the subtraction of the value at the previous time step from the value at time t, applied d times.
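Differencing and the Dickey-Fuller test mentioned above look like this in code; `adfuller` is statsmodels' implementation of the test, while the random-walk series is my own toy example, not the talk's data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.RandomState(42)
walk = pd.Series(rng.randn(500).cumsum())   # a random walk is non-stationary

diffed = walk.diff().dropna()               # first-order differencing, d = 1

# adfuller returns (test statistic, p-value, lags used, nobs, critical values, ...)
stat, pvalue = adfuller(diffed)[:2]
# A very small p-value (statistic below the critical values) => stationary.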
The "integrated" is what the character I in the middle of ARIMA stands for. AR is the order of the autoregression; to simplify, the autoregression is just a linear regression of the series on itself over the previous p time steps, that is, p lagged values ("auto" means "self" in ancient Greek). A linear regression normally has several features, but in an autoregression there is no feature but the time series itself: it is regressed on itself over time. The moving average works in a similar way: the moving average is itself a linear regression, not on the actual observations but on the residual errors of the previous q time steps. Putting it all together to summarize our model and its required parameters: we need a value for p, the number of lag observations included in the model, the autoregression order; d, the degree of differencing, the number of times that the raw observations are differenced, or integrated; and lastly q, the size of the moving average window. Actually, it is a bit hard to understand those concepts, so maybe it is enough to study how to identify those parameters, which is not simple either. We have the autocorrelation function and the partial autocorrelation function, which tell us how many lags we should consider. Basically, the correlation of a time series is calculated against values of the same series at previous time steps.
That is why we call it autocorrelation. The autocorrelation function is the correlation between the current time step and the previous time steps, and the partial autocorrelation function is the same, except that it removes the correlations at the intermediate lags between the current time t and time t−k. Plotting the ACF and PACF sometimes gives us a hint for selecting the ARIMA parameters. This is a simplified guideline for selecting p and q by plotting the ACF and PACF; in the references there is an article which is more precise — this is the super-summarized version of a long story, but I recommend reading it if you want to study ARIMA more. I will give you a simple example which is an easy case for identifying the parameters. This data is not from my own project, but it gives you a clear idea: the autocorrelation function cuts off after lag 2, while the partial autocorrelation function tails off. The first value is the series against itself, so the correlation should be 1: the series at lag zero is exactly the same as the current one. And since the ACF cuts off at lag 2, it is better to use a moving average than an autoregression, so we can parameterize with zero for p and 2 for q. But it does not always go that simply: this next plot comes from the data we saw previously, and it is more complicated, so I just used grid search to find the parameters. Do you know what grid search is? Grid search is the finding of optimal parameters: first we take a certain range of parameters and conduct an exhaustive search until we get the best result. We can measure the best result with an arbitrary measurement like the mean squared error or the Bayesian information criterion, and it is quite effective for searching the optimal parameters for an ARIMA model.
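The ACF plots in the talk would typically come from statsmodels' `plot_acf`/`plot_pacf`; a small NumPy implementation of the sample autocorrelation shows what those plots are based on (my own sketch, with white noise as a sanity check).

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

# Sanity check: the ACF of white noise is 1 at lag 0 and ~0 elsewhere.
rng = np.random.RandomState(0)
r = acf(rng.randn(2000), nlags=5)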
Now say we have two residual series, from forecasting the download speed and the upload speed separately with the ARIMA model, as two univariate data sets, and this time we do anomaly detection again. Sometimes the naive approaches I introduced before do not work well; it depends on the data distribution, because in practice highly skewed data is more common than the normal distribution. However, if the residuals are distributed according to a Gaussian, we can get more robust results. One option is parameter estimation. Say the blue graph is the distribution of the download-speed residuals and the other one is the distribution for the upload speed; to be more precise, they are the residuals after forecasting download and upload speed. We can estimate mu, the mean, and the variance of each distribution, so we have a probability density function for each, and by multiplying them we get a model. Then, when a new observation comes in, we can test it by cutting off at a threshold. However, this method has a problem when the data points covary and scatter along a certain direction, say the diagonal you see in the graph. Then the upper-left and bottom-right data points should be anomalies while the upper-right and bottom-left ones are just normal, yet they are basically the same distance from the middle. How can we deal with this? We can solve the problem with a multivariate Gaussian distribution: this time we estimate the mean vector and the covariance matrix sigma, and then with a formula we get the probability density function and do the same threshold test.
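The naive per-feature model described above, one Gaussian per residual series with the densities multiplied together, can be sketched like this (an illustration with made-up numbers, not the speaker's code); it is exactly the version that breaks down when the features covary:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate normal probability density."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# pretend these are residuals from the download- and upload-speed forecasts
rng = np.random.default_rng(0)
down = rng.normal(0.0, 1.0, 1000)
up = rng.normal(0.0, 2.0, 1000)

# parameter estimation: one (mu, sigma^2) pair per feature
mu_d, var_d = down.mean(), down.var()
mu_u, var_u = up.mean(), up.var()

def score(d, u):
    """Independence assumption: joint density = product of the marginals."""
    return gaussian_pdf(d, mu_d, var_d) * gaussian_pdf(u, mu_u, var_u)

eps = 1e-4                           # threshold (picking it well is its own problem)
is_anomaly = score(8.0, 9.0) < eps   # a point far from both means gets flagged
```

Because the score ignores covariance, a point on the "wrong side" of a strong diagonal correlation can still get a high score, which is the failure case the multivariate Gaussian fixes.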
OK, the code may be simpler than the math. This is the multivariate Gaussian anomaly detection. As you see, with the SciPy package we can estimate the Gaussian parameters, the mean and sigma, then compute the multivariate Gaussian probability density function, and then find the anomalies by conditioning on a threshold. Choosing the threshold well is another problem on its own, so it is not covered in this talk.
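A sketch of what that code slide likely looks like (a reconstruction under the stated assumptions, not the speaker's exact code), using `scipy.stats.multivariate_normal` for the density:

```python
import numpy as np
from scipy.stats import multivariate_normal

# stand-in for the two residual series (download, upload), deliberately correlated
rng = np.random.default_rng(1)
resid = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

# estimate the mean vector and the full covariance matrix sigma
mu = resid.mean(axis=0)
sigma = np.cov(resid, rowvar=False)

# multivariate Gaussian probability density of every observation
p = multivariate_normal(mean=mu, cov=sigma).pdf(resid)

# flag observations whose density falls below the threshold
eps = 1e-3
anomalies = resid[p < eps]
```

Unlike the product of marginals, the covariance term makes points lying off the correlation diagonal score low even when each coordinate alone looks ordinary, which is exactly the upper-left/bottom-right case in the scatter plot.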
Since I am going faster than I expected: we can also replace the model with something else. There are many ways to model time series, but one trendy technology is long short-term memory (LSTM), which is a deep learning technique. LSTM is useful for sequence learning, which enables it to learn long-range dependencies, and it outperforms other methods in applications such as language modeling and speech recognition. As you see in the figure, the blue boxes at the bottom are the time series inputs, the green boxes in the middle are the LSTM cells themselves, and the yellow boxes represent the cell outputs, which are propagated to the next cell, so the network carries a memory of the previous time steps. Finally, the red box is the predicted output: we feed a series of time steps from t minus n up to t minus 1 to predict the target value at time t. The beauty of LSTM is that each element of the time series can be a vector with multiple features, so we can train on and predict the download speed, upload speed and response time all at once, and then run the multivariate Gaussian test efficiently. Before, with the ARIMA model, we had to do the forecasting and take the residuals separately for each feature, for download, upload and response time; with LSTM it is basically all done at once.
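To make the memory idea concrete, here is a single LSTM cell step written out in plain NumPy. This shows only the cell mechanics, it is a toy and not the training code from the slides; in practice you would use a library such as Keras:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of an LSTM cell.

    x      : input vector at time t (can hold several features at once,
             e.g. download speed, upload speed and response time)
    h_prev : previous hidden state (the output passed along the chain)
    c_prev : previous cell state (the long-term memory)
    W, U, b: stacked weights for the four gates, shapes (4n, m), (4n, n), (4n,)
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*n:1*n])        # input gate: how much new info to write
    f = sigmoid(z[1*n:2*n])        # forget gate: how much old memory to keep
    o = sigmoid(z[2*n:3*n])        # output gate: how much memory to expose
    g = np.tanh(z[3*n:4*n])        # candidate values to write
    c = f * c_prev + i * g         # updated long-term memory
    h = o * np.tanh(c)             # new hidden state / output
    return h, c

# run a short sequence through the cell with small random weights
rng = np.random.default_rng(0)
m, n = 3, 4                        # 3 input features, hidden size 4
W = rng.normal(0, 0.1, (4 * n, m))
U = rng.normal(0, 0.1, (4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, m)):  # 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is the piece that lets information survive many steps; the gates decide, per step, what to keep, what to add, and what to emit.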
Here is the code, just one simple block, but on its own this is a bit meaningless, because there are so many variations; there is a lot about LSTM to study and understand before you get robust results out of it. Actually, I could not get a robust result with LSTM myself, and I have not seen anybody who has. I have seen good results claimed in several papers, where they succeed at forecasting or anomaly detection on time series, but it is not reproducible, because they did not describe openly how they trained the model, or sometimes they do not describe how they found and tuned the hyperparameters; so all we have is their claim that they succeeded. Actually, this
is ongoing research: building an LSTM model for time series requires a lot of work, but it allows you to model sophisticated and seasonal dependencies in the series as well. As I mentioned, it can be very helpful, but there are still challenges. It can take a long time to run, so it can be very expensive to do it all online: whenever a new observation comes in you have to update the model, and when that costs too much you cannot keep up with the observations. It also often requires more data to train than other models, and it has a lot of hyperparameters to tune. To wrap up: be prepared before calling the engineers about a service failure. Python has a lot of powerful tools for all of this, but you also need to understand a few concepts before using the tools, and that is the most difficult part, the one we need to study. Deep learning for forecasting time series is still ongoing research. And most importantly: do try this at home. Here is my contact information; I am not familiar with social networks, so please contact me by e-mail. I have a few more minutes for questions.
[Question] Thanks for the talk. Did you consider that other traffic on your home network may have interfered with the data you were generating? In other words, if you were watching videos, for example, that may have contended with the download speed you measured. [Answer] It is really hard to separate that out, and it is a very good question.
When I first got this plot I was curious about exactly that: while I am downloading or doing heavy stuff on my network, would it affect the speed test? And yes, it does: when I am downloading or doing heavy things on my own network, the measurements go down. But I also found something interesting. On the last two days, Saturday and Sunday, I was not at home, I was away travelling, but there are still fluctuations in the daytime. So my assumption is that, more than my personal use, the bigger factor affecting my available bandwidth is my neighbours, who share the backbone with me. From my point of view my neighbours' usage is essentially random, yet recurring daytime patterns still appear; if the measurements were only affected by my own usage, such patterns could hardly show up while I am away. So this kind of study can make sense, because the link is affected more by my neighbours than by me. That is what I wanted to mention.
[Question] In the end, did you fix your internet connection? [Answer] In the end, yes, I finally got the connection problem managed, but actually there was no severe connection problem this time; I did this for fun. Maybe I can do some more things, though: if I get a more robust result, then I can predict when the connection will fail in the future. Or we could collect the data from many different houses and gather some collective intelligence about our lines; that could be interesting. [Host] OK, well, thank you a lot for showing us what is happening in our connections. We have the next talk coming up right after this one. Thank you.