Deep Learning your Broadband Network @HOME
Speech transcript
00:05
Thank you. My name is Hongjoo Lee, and I'm from South Korea. This slide says who I am, but I have to skip it because I have a lot of things to talk about.
00:16
So, just moving on.
00:20
Today I'll share my pet project, which is logging metrics of my home network, analyzing the data, and doing some forecasts for detecting anomalies. Here's the outline of the whole process: data collection, followed by time series analysis, followed by forecasting, and then modeling for anomaly detection. We are going to go through all the items under each step, as long as time allows. But instead of completing everything for each stage, I will give a brief overview of the surface first and gradually get deeper into each process by iterating over the steps. So you will see a lot of figures at the beginning, and then some text and code later. There will be almost no math equations (there are some, but very few), as we won't go that deep.
To start, the naive approach to anomaly detection. Let me show you how this project started. At the very beginning I was living in Korea, in a place where I had lived for more than 2 years, and one day the Internet started to fail continuously. So I called the service provider, and an engineer came and tested the network with his own device. But at that time nothing was wrong; it was just normal, and I could not reproduce the failure.
01:56
From the next day, I installed a speed test app on my smartphone and started to capture the test result every time the network went down. Then I called the engineer again and showed him the captured images of the failures. This time he said wireless devices are just not reliable, so he asked me to test with a wired device. I was just pissed off. At that time the only wired device I had was a Raspberry Pi with a LAN port, so I ran the test on it on a regular basis and kept logging for a few days before the engineer's next visit.
02:37
This is the graph I showed the engineer at that time, in 2015. As you see in the graph, the disconnection is repeated several times a day. In the upper graph there are red crosses at the bottom; those are the disconnections. At last the engineer replaced the modem, and after that the Internet was fine. So in this case the disconnections are the anomaly, but there are other types of anomalies in time series data, which we will see in the next slides. There was actually a flaw in this kind of analysis: it is not really analysis, because there is no forecast. I just waited for the expected failures to be repeated. Therefore it is just a naive approach.
Before we go deeper, let's define the problem and consider what we should care about. The problem is detecting abnormal states of a home network. In a more general way, we can say anomaly detection for time series. So what is a time series? Time series data is a set of observations of a value at different times, and such observations have to be collected at regular time intervals.
As for anomalies, there are several types of anomalous patterns in time series. Let's take a look one by one. First, additive outliers, which are unexpected spikes and drops. The disconnections that we just saw are a typical example of this type of anomaly. Next is temporal changes: unusually low or high observations for some short period of time. Next is the level shift. In this case the metric doesn't change its shape, but the overall value of the period changes, as a statistical characteristic has been changed by the shift. There may be many things to be set up again after detecting such anomalies, so level shifts are a very important type of anomaly we have to deal with.
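The three anomaly types can be pictured on made-up data. The following is a minimal sketch rather than anything shown in the talk; the helper names and the flat 50 Mbit/s baseline are assumptions chosen only for illustration.

```python
# Illustrative only: a flat, made-up baseline of 50 Mbit/s download speed.
baseline = [50.0] * 300


def add_additive_outlier(series, at, size):
    """Single unexpected spike (size > 0) or drop (size < 0) at one index."""
    out = list(series)
    out[at] += size
    return out


def add_temporal_change(series, start, length, size):
    """Unusually low or high values for a short period, then back to normal."""
    out = list(series)
    for i in range(start, min(start + length, len(out))):
        out[i] += size
    return out


def add_level_shift(series, start, size):
    """Same shape, but every value from `start` onward moves to a new level."""
    return [v + size if i >= start else v for i, v in enumerate(series)]


disconnection = add_additive_outlier(baseline, at=100, size=-50.0)
slow_period = add_temporal_change(baseline, start=150, length=12, size=-20.0)
new_level = add_level_shift(baseline, start=200, size=-10.0)
```

Plotting the three lists next to the baseline gives one isolated drop, one short dip, and one permanent step, which is roughly how the three types differ.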
05:10
Let's go to the next step, the second round, starting with the data collection.
05:20
I used speedtest-cli, which is a command-line tool written in Python. Running a speed test simply gives you the metrics: the response time (the ping test), the download speed, and the upload speed. Here you can see the results. I ran the test using cron every 5 minutes, and I collected almost 20 thousand observations over some months.
05:48
This is what the log output looks like.
05:51
Each test is separable from the next test by a delimiter, which is three '>' symbols in series. Some of you may have noticed that the tests did not start at the exact time.
06:04
I found there are many cases where the test started one or a few seconds late, but it does not make a huge difference and can easily be corrected later, as we'll see. This is an iterator class which reads the log stream until the next delimiter appears, and parses and stores the metrics and the timestamp. Then it's time to build a time-indexed DataFrame with pandas. I make a list of speed test objects by passing in the log stream, and then build the DatetimeIndex for the data frame. Here is how I dealt with the incorrect starting times: by explicitly setting zero seconds and microseconds for each datetime object. The index is very important for time series data, as I mentioned before: by definition the observations have to be at regular time intervals.
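The parsing step described above can be sketched as follows. This is not the speaker's class: the log layout (a `>>>` delimiter line, a timestamp line, then `Name: value unit` lines in the style of `speedtest-cli --simple`), the class name `SpeedTestLog`, and the sample values are all assumptions, and plain tuples are used instead of a pandas DataFrame.

```python
from datetime import datetime

# Hypothetical log excerpt with made-up values.
SAMPLE_LOG = """\
>>>
2016-05-01 00:05:02
Ping: 23.4 ms
Download: 91.2 Mbit/s
Upload: 38.7 Mbit/s
>>>
2016-05-01 00:10:01
Ping: 25.0 ms
Download: 88.9 Mbit/s
Upload: 39.1 Mbit/s
"""


class SpeedTestLog:
    """Iterate over a log, yielding one (timestamp, metrics) pair per test."""

    def __init__(self, lines, delimiter=">>>"):
        self.lines = lines
        self.delimiter = delimiter

    def __iter__(self):
        record = None
        for raw in self.lines:
            line = raw.strip()
            if line == self.delimiter:
                if record:  # a complete record precedes this delimiter
                    yield self._parse(record)
                record = []
            elif line and record is not None:
                record.append(line)
        if record:  # last record has no trailing delimiter
            yield self._parse(record)

    @staticmethod
    def _parse(record):
        stamp = datetime.strptime(record[0], "%Y-%m-%d %H:%M:%S")
        # Snap late starts back onto the regular grid.
        stamp = stamp.replace(second=0, microsecond=0)
        metrics = {}
        for line in record[1:]:
            name, value = line.split(":", 1)
            metrics[name.strip().lower()] = float(value.split()[0])
        return stamp, metrics


records = list(SpeedTestLog(SAMPLE_LOG.splitlines()))
```

Zeroing the seconds works here because the tests are assumed to start only a second or two late; a start delayed past the minute boundary would need rounding instead.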
07:17
So here is the chart showing the overall data. The upper blue one is the ping test, the orange one is the download speed, and the green one is the upload speed. We actually have to handle some missing data. Handling missing data is very important in data science: sometimes it raises unexpected errors in your code, and sometimes it leads to incorrect results, which is even worse. We can obviously see some accidental missing parts lasting a few days. Actually the first one was a failure of the Pi, and for the second one, I don't know, the server was just not responsive. In this case I cannot just fill in the missing data with arbitrary values, because the gaps are too huge. I think the first part is plenty for training the model, so I use the second part for validation and the last part as test data.
08:35
In the other parts, there are a few cases of missing values that we can hardly notice in the visualization, but we have to examine missing data like this carefully. By using the code on the first line, we can check whether there is any missing data in the data frame. I handled it by propagating the last valid observation forward to the missing ones, which is one typical way to do it.
Here is how I handled the pandas DataFrame with the DatetimeIndex. Actually, yesterday there was a talk about pandas indexing, which was really enjoyable to me. Handling time series with pandas is super convenient: I can chop up the time series, resample it, group it by a certain period, and do some aggregations. These are a few examples I used. Frankly speaking, a few years ago, when I didn't know much about the index, I was actually avoiding it because it gave me too much confusion. At that time I used to put the datetime string or datetime object in an individual column, then reset the index of the data frame to get the numbered index, and then query again. It was ridiculous, but I actually did that. So don't be scared of it: the more you know, the less pain you get.
Now let's have a look into the data. This is the daily plot for each day, from Monday to Sunday, for a week. The x-axis shows 24 hours, starting from zero o'clock, and the y-axis shows the download speed in megabits per second. As you see, there is no specific pattern repeating each day, but maybe you can notice that there is less fluctuation at night time, on the right side of the chart, and the tested speed remains high. Next I drew a box plot for each day, where we can find a pattern within a week. This is Sunday (the mouse pointer doesn't go there), so you can see every Sunday. Focusing on the orange line, which is the median value for each day, it shows a regular oscillation, and the median on Saturday and Sunday is higher than on weekdays. So this shows a clear pattern. With this kind of repeating pattern, we can categorize the patterns that make up time series data. In general, a time series can be decomposed into 3 components.
11:58
A trend exists when there is an increasing or decreasing direction in the series. Such a trend component does not have to be linear: it could be exponential, or it can increase or decrease logarithmically.
12:11
A seasonal pattern exists when a series is influenced by seasonal factors. Lastly, the random noise: this is the component of the time series obtained after the other components have been removed. It is not completely random: it has zero mean and constant variance, which plays a very important role in anomaly detection, as we'll see later. A time series can be formally defined with an additive model or a multiplicative model. We will deal with these components more later, but for now let's just try to decompose the components with Python and see if there is a trend and seasonality in our time series.
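To make the additive model concrete (series = trend + seasonal + residual), here is a rough sketch of a classical decomposition using only the standard library. It is a simplification and not the statsmodels routine used in the talk; the function name, the odd-period restriction, and the synthetic weekly data are assumptions.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (additive model).

    Simplified sketch: assumes an odd `period` so that the centred moving
    average is symmetric. Edge positions without a full window are None.
    """
    half = period // 2
    n = len(series)
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half:t + half + 1])

    detrended = [None if trend[t] is None else series[t] - trend[t]
                 for t in range(n)]

    # Average the detrended values that share a position in the cycle.
    pattern = []
    for k in range(period):
        same_phase = [d for t, d in enumerate(detrended)
                      if d is not None and t % period == k]
        pattern.append(mean(same_phase))
    offset = mean(pattern)  # centre the pattern so that it sums to zero
    pattern = [p - offset for p in pattern]

    seasonal = [pattern[t % period] for t in range(n)]
    residual = [None if trend[t] is None
                else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Synthetic example: linear trend plus a weekly pattern, no noise.
weekly = [-3, -2, -1, 0, 1, 2, 3]
series = [50 + 0.5 * t + weekly[t % 7] for t in range(28)]
trend, seasonal, residual = decompose_additive(series, period=7)
```

On this noise-free input the residual is zero wherever the trend is defined; on real measurements the residual is the part later used for anomaly detection.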
13:09
Here I tried to decompose the daily download time series for a week, from Monday to Sunday, into a seasonal component and a trend component. I used the seasonal_decompose function in the statsmodels package. You can see that there exists a seasonal pattern and a clear trend, even though it was not clear when visualizing the original data on the left side.
Now it's time to build a model. But before we go deeper into the modeling itself, we need to think about how the modeling process for time series differs from the ordinary machine learning process with a time-invariant dataset. Usually we divide the dataset into a training set and a testing set, use the training set to fit the model, and generate a prediction for each element in the test set. This is one general way to train and validate a model. Say we divided the dataset into parts A, B, and C. We train a model with parts A and B and validate it with part C, or repeat the same process but this time with B and C as training data and part A as test data. This is the typical process called cross-validation; anyone who has expertise in machine learning should be familiar with it. However, cross-validation cannot be used for time series data because of the time dependency. Part A comes before parts B and C, so it is unreasonable to test the model with part A as the test set after training the model with parts B and C. Also, old data is less relevant to the model than recent data, so we have to recreate the ARIMA model after each new observation is received. This is the so-called rolling forecast.
15:32
Here's the piece of code running the rolling forecast. We keep track of all observations in a list, history, which is seeded with the training data initially, and later the observations are appended at each iteration. We walk over each new observation in the test set, build an updated model with the previous observations, and with the updated model forecast one step ahead, for time t. Then we store the forecast value in a list. Lastly, the history is updated with the new observation at time t. This is how we do the rolling forecast.
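The loop described above can be sketched as below. To stay self-contained, this sketch swaps the ARIMA model for a hand-rolled least-squares AR(1) fit, so `fit_ar1` and the sample numbers are assumptions for illustration, not the code shown on the slide.

```python
def fit_ar1(history):
    """Least-squares fit of y[t] = c + phi * y[t-1] (stand-in for ARIMA)."""
    x, y = history[:-1], history[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def rolling_forecast(train, test):
    """Refit on all data seen so far, forecast one step, then reveal the truth."""
    history = list(train)              # seeded with the training data
    predictions = []
    for observation in test:
        c, phi = fit_ar1(history)      # rebuild the model with the history
        predictions.append(c + phi * history[-1])  # one-step-ahead forecast
        history.append(observation)    # update history with the real value
    return predictions


# Made-up series that follows y[t] = 2 + 0.5 * y[t-1] exactly.
data = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875, 4.09375]
train, test = data[:4], data[4:]
predictions = rolling_forecast(train, test)
residuals = [obs - pred for obs, pred in zip(test, predictions)]
```

Refitting at every step is what makes the rolling forecast expensive; the same loop structure applies when the fit step is a full ARIMA estimation.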
16:24
On the left side is the testing result. The blue line represents the original data we saw before, and the orange line shows our predictions, starting from the middle of the week. The more important point here is the residuals on the right side, with the code for calculating the residuals and plotting the residual distribution. Residuals are the difference between the actual observation at time t and the predicted value at time t. If they follow a normal distribution, you see the bell curve, meaning they are just white noise. This is very important for anomaly detection, as I mentioned before: the detection can be done on the residuals obtained from a robust forecasting model.
17:26
So now we have residuals with Gaussian random noise. With residuals, outlier detection can be done in several ways: by using the interquartile range, the standard deviation, or the median absolute deviation. The interquartile range is quite popular. When the data is sorted, the median is in the middle, and the first and third quartiles are positioned at the lower 25 percent and at 75 percent respectively. If a data point is in the red area, it is considered to be too far from the central value to be reasonable, and so it is considered an outlier.
18:20
I can implement it like this with NumPy or pandas.
18:28
With the standard deviation, if a value is a certain number of standard deviations away from the mean, the data point is identified as an outlier. This number of standard deviations is called the threshold. Usually we use 3 standard deviations. The standard deviation is the most common approach.
18:53
We can obtain the outliers like this with NumPy or pandas. OK.
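Both rules can be sketched in a few lines. This version uses Python's standard `statistics` module instead of the NumPy or pandas code from the slides; the function names, the conventional 1.5 multiplier for the IQR fences, and the residual values are assumptions.

```python
from statistics import mean, quantiles, stdev


def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def sigma_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations away from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Made-up residuals: small forecast errors plus one large miss.
residuals = [0.1, -0.2, 0.3, 0.0, -0.1] * 4 + [5.0]
flagged_by_iqr = iqr_outliers(residuals)
flagged_by_sigma = sigma_outliers(residuals)
```

Note that the outlier itself inflates the mean and the standard deviation, so in small samples the 3-sigma rule can fail to flag it.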
19:02
As for the median absolute deviation, it is the most powerful of the naive approaches. Say we have a univariate dataset. The MAD is defined as the median of the absolute deviations from the data's median. That is, we get the median first, then take the deviation of each data point from it, and the median absolute deviation is the median of the absolute values of those deviations. It is clearer with equations. If a value is a certain number of MADs away from the median of the residuals, that value is classified as an outlier. There is a short paper, "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median", published in 2013. As I remember it is just 4 pages, and it gives a super clear idea of why we should use the MAD rather than the other ways. I highly recommend reading it if you are interested.
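A minimal sketch of the MAD rule described above, again with the standard library. The function name, the threshold of 3, and the sample residuals are assumptions; 1.4826 is the usual consistency constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median


def mad_outliers(values, threshold=3.0, scale=1.4826):
    """Values more than `threshold` scaled MADs away from the median."""
    center = median(values)
    absolute_deviations = [abs(v - center) for v in values]
    mad = scale * median(absolute_deviations)
    return [v for v in values if abs(v - center) > threshold * mad]


# Made-up residuals: small forecast errors plus one large miss.
residuals = [0.1, -0.2, 0.3, 0.0, -0.1] * 4 + [5.0]
flagged = mad_outliers(residuals)
```

Because both the centre and the spread are medians, a single extreme value barely moves them, which is the robustness argument for this rule. If more than half of the values are identical the MAD is zero, and every other value is flagged.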
20:32
In the next step we go through ARIMA, a very nice class of statistical models. It is a classic, developed maybe 60 years ago, but it is still very powerful and can be used for modeling, analyzing, and forecasting time series data. ARIMA performs well with stationary time series, so we need to understand the meaning of a stationary time series and how to transform nonstationary data into stationary data.
21:10
To understand stationary data, here are the three criteria of stationarity: the mean, variance, and covariance of the series should be time-invariant. First, the mean of the series should not be a function of time. The left-hand graph satisfies the condition, whereas the graph on the right side, in red, has a time-dependent mean: the mean value continues to increase as time goes on. Next, the variance of the series should not be a function of time. As you can see in the following chart, the blue graph is stationary, and we can notice the varying spread of the distribution in the right-hand graph, which is nonstationary. Lastly, the covariance between terms of the series at different times should not be a function of time. In the following graph you will notice that the spread becomes closer as time increases, so the covariance is not constant.
We can test the stationarity of a time series with a Python library. In statistics we have the Dickey-Fuller test for testing stationarity, and the statsmodels package has an implementation of the test. When the test statistic goes below the 1% critical value, we can consider the time series stationary. But what if the time series is nonstationary? The main problem with real time series data is that most of it is nonstationary, so we have to make it stationary by doing something. When the data is nonstationary, its statistical properties, like mean, variance, and maximum or minimum values, change over time. Nonstationary data can be made stationary by differencing the values to a certain order, where differencing is the subtraction of y at time t minus 1 from y at time t. In general, a series which is stationary after being differenced d times is said to be integrated of order d, denoted I(d).
So "integrated" is what the character I in the middle of ARIMA stands for. AR stands for autoregression, and p is its order. To simplify, autoregression is just a linear regression of the series on itself over p time steps of lag. "Auto" means "self" in ancient Greek. A linear regression usually has several features, but in autoregression there is no feature other than the time series itself, regressing on itself over time. The moving average works in a similar way: the moving average is itself a linear regression, not on actual observations but on a number of residual errors from previous time steps.
Putting it all together, to summarize the ARIMA model and its required parameters: we need a value for p, the number of lag observations included in the model, or the order of the autoregression; d, the degree of differencing, the number of times the raw observations are differenced (integrated); and lastly q, the size of the moving average window. Actually it's a bit hard to understand those concepts, so maybe it's enough just to study how to identify these parameters, which is not simple either. But we have the autocorrelation function and the partial autocorrelation function, which tell us how many lags we should consider for forecasting.
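Differencing and the sample autocorrelation are both short to write down. The sketch below uses plain Python rather than pandas or statsmodels; the function names and the toy inputs are assumptions for illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing y[t] - y[t-1] a total of d times."""
    out = list(series)
    for _ in range(d):
        out = [current - previous for previous, current in zip(out, out[1:])]
    return out


def autocorrelation(series, lag):
    """Sample autocorrelation between the series and itself `lag` steps back."""
    n = len(series)
    mu = sum(series) / n
    denominator = sum((v - mu) ** 2 for v in series)
    numerator = sum((series[t] - mu) * (series[t - lag] - mu)
                    for t in range(lag, n))
    return numerator / denominator


# A quadratic trend becomes constant after differencing twice (d = 2).
squares = [1, 4, 9, 16, 25]
twice_differenced = difference(squares, d=2)

# A series that flips sign every step is strongly anti-correlated at lag 1.
alternating = [1, -1] * 4
acf_lag0 = autocorrelation(alternating, 0)
acf_lag1 = autocorrelation(alternating, 1)
```

Lag 0 always gives 1 because it compares the series with itself, which is why the first bar of an ACF plot is always at full height.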
26:40
Basically, in the autocorrelation of a time series, the observations are correlated with values of the same series at previous times; that is why we call it autocorrelation. The autocorrelation function is the correlation between the current time step and previous time steps. The partial autocorrelation function is the same as the autocorrelation function, but this time it removes the correlation of the intermediate time lags between the current time t and the previous time t minus k. Plotting the ACF and PACF gives us hints for selecting the ARIMA parameters. These are simplified guidelines for selecting p and q by plotting the ACF and PACF. The article in the reference is more precise; this is a super summary of a long story, but I recommend you read it if you want to study ARIMA orders.
I'll give you a simple example, which is an easy case for identifying the parameters. This data is not from my own project, but it gives you a clear idea. The upper plot is the autocorrelation function, which tails off, and the bottom one is the partial autocorrelation function, which cuts off after lag 2. You see, the third bar is lag 2: the first one is the series itself, so the correlation should be 1, because the series is exactly the same as the current one. So the value at lag 0 should be 1, and it cuts off at lag 2, which means it's better to use a moving average than an autoregression, so we can parameterize it as 0 for p and 2 for q.
But it does not always go that simply. This plot comes from our data, which we saw previously, and it is more complicated, so I just used grid search to find the parameters. Do you know what grid search is? OK. Grid search is a way of finding optimal parameters. First we take a certain range of parameters and conduct an exhaustive search until we get the best result. We can measure the best result by an arbitrary measurement, like mean squared error or the Bayesian information criterion. It is quite effective for searching for optimal parameters for ARIMA as well.
Now say we have two sets of residuals, from forecasting the download speed and the upload speed separately with ARIMA models on the two univariate series. It's time to do anomaly detection again. Sometimes the naive approaches I introduced before do not work well, depending on the data distribution, because highly skewed data is usually more common than normally distributed data. However, with residuals distributed according to a Gaussian, we can get more robust results. One may use parameter estimation. Say the blue graph is the distribution of the download speed and the orange one is the Gaussian distribution of the upload speed; to be more precise, they are actually the residuals from forecasting the download and upload speed. Then we can estimate mu, the mean of each distribution, and the variance of each distribution, so we can have a probability density function for each. By multiplying them we have a model, and when a new observation comes, we can test it by cutting off at a threshold.
However, this method has a problem when the data points covary and scatter around a certain trend, say the diagonal line you see in the graph. Then the upper-left and bottom-right data points should be anomalies, while the upper-right and bottom-left ones are just normal, but basically they are all at the same distance from the middle. So how can we deal with this? We can solve this problem with the multivariate Gaussian distribution. This time we estimate the mean and the covariance matrix sigma, and then with some formula we can get the probability density function and do the same test.
32:12
OK, the code is maybe simpler. This is multivariate Gaussian anomaly detection. With the SciPy package you can estimate the Gaussian parameters, the mean and the sigma.
32:38
Then we can calculate the multivariate Gaussian's probability density function and find the anomalies by conditioning on the threshold. Finding the threshold is another level, so it's not covered in this talk.
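For two features, the multivariate Gaussian test can be written out by hand. The sketch below is an assumption-laden stand-in for the SciPy-based code on the slide: it handles only the 2D case, the residual pairs are made up, and the threshold `EPSILON` is an arbitrary illustrative value, since choosing it is outside the scope of the talk.

```python
import math


def fit_gaussian_2d(points):
    """Estimate the mean vector and covariance entries of 2D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return (mean_x, mean_y), (var_x, cov_xy, var_y)


def density(point, mean, cov):
    """Bivariate Gaussian probability density at `point`."""
    var_x, cov_xy, var_y = cov
    det = var_x * var_y - cov_xy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T Sigma^-1 d, with the 2x2 inverse written out.
    q = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


# Made-up (download, upload) residual pairs scattered along the diagonal.
normal_points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9),
                 (2.0, 2.1), (-1.5, -1.4), (1.5, 1.6), (0.5, 0.3),
                 (-0.5, -0.6), (0.0, -0.1)]
mean, cov = fit_gaussian_2d(normal_points)

EPSILON = 1e-3  # hypothetical threshold


def is_anomaly(point):
    return density(point, mean, cov) < EPSILON


on_diagonal = (1.5, 1.5)    # follows the usual download/upload relation
off_diagonal = (1.5, -1.5)  # about as far from the mean, but against it
```

The two test points are roughly equally far from the mean, yet only the off-diagonal one gets a very low density, because the covariance term encodes that the two features usually move together.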
33:05
I have almost finished, faster than I expected. We can replace the ARIMA model with others, such as LSTM. There are many ways to forecast time series, but one trendy technology is long short-term memory, which is one of the deep learning techniques.
33:32
LSTM is useful for sequence learning, as it is able to learn long-term dependencies, and it outperforms other methods in applications such as language modeling and speech recognition. As you see in the figure, the blue boxes at the bottom are the time series inputs, the green boxes in the middle are the LSTM cells, and the yellow boxes represent the cells' outputs, which are propagated to the next cell, so it has a memory of the previous time steps. Finally, the red box is the predicted output. We feed a series of T time steps, from 0 to T minus 1, to predict the target value at time T, in the red box. The beauty of LSTM is that each element in the time series can be a vector with multiple features, so we can train and predict the download speed, upload speed, and response time at once and do the multivariate Gaussian detection efficiently. Before, when we were using the ARIMA model, we had to do the forecasting and take the residuals for each feature, for download, upload, and ping test, but with LSTM we basically just do it all at once.
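The input layout described above, T consecutive steps in and the following step as the target, with each step a vector of features, can be sketched without any deep learning library. The function name `make_windows` and the sample rows are assumptions; building and training the LSTM itself is not shown.

```python
def make_windows(rows, steps):
    """Turn a multivariate series into (input window, next-step target) pairs.

    Each input is `steps` consecutive rows (times 0 .. steps-1 of the window)
    and the target is the row that immediately follows it.
    """
    inputs, targets = [], []
    for start in range(len(rows) - steps):
        inputs.append(rows[start:start + steps])
        targets.append(rows[start + steps])
    return inputs, targets


# Made-up observations: (ping ms, download Mbit/s, upload Mbit/s).
rows = [
    (23.0, 91.0, 38.0),
    (25.0, 89.0, 39.0),
    (24.0, 90.0, 37.0),
    (30.0, 70.0, 35.0),
]
inputs, targets = make_windows(rows, steps=2)
```

With N rows and windows of T steps this yields N minus T samples, each of shape (T, number of features), which is the usual layout sequence models expect.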
35:06
This code block is just one simple example, but actually it is meaningless, because there are so many variations, meaning there are a lot of things to study and understand to get a robust result out of LSTM. Actually, I could not get a robust result with LSTM, and honestly I haven't seen anybody who has done so. I have seen good results reported in several papers; they say they succeeded in forecasting with robust results on time series, but it is not reproducible, because they didn't open up how they trained the model, or sometimes they don't describe how to find the hyperparameters for tuning. So they are just insisting that they succeeded.
36:25
So actually this is ongoing research. It requires a lot of work to build a model for time series, but it will allow us to model sophisticated and seasonal dependencies in time series as well. As I mentioned, it is very helpful with multiple time series. Still, there are challenges. It can take a long time to run, so it can be very expensive to do a rolling forecast, because whenever a new observation comes you have to update the model, and when that costs a lot we cannot keep up with the observations. It often requires more data to train than other models, and it has a lot of input parameters to tune.
To wrap up: be prepared before calling engineers about a service failure. The Python ecosystem has a lot of powerful tools for all this, but you also have to understand a few concepts before using the tools, and that is the most difficult part we need to study. Deep learning for forecasting time series is still ongoing research. And most importantly, do try this at home. OK, here are my contacts. I'm not familiar with social networks, so there is only this email; contact me by email. I have a few more minutes to take some questions.
38:14
A question from the audience: Thanks for the talk. Did you consider that other traffic on your home network may have interfered with the data that you generated? In other words, if you were watching videos, for example, that may have contended with the download speed that you measured. That's a very good question.
39:10
When I got this plot, I was curious about whether downloading or doing heavy stuff on my network would affect the speed test. And yes, it does: when I'm downloading or doing heavy stuff on my network, the measurements go down. But actually I found some interesting things. On the last 2 days, Saturday and Sunday, I was not at home (I was away on a trip), but still there are fluctuations in the daytime. So my assumption is that, more than my personal use, the bigger factor is my neighbors who share the backbone. The usage pattern of my neighbors is just random, and therefore we can have such patterns. If it were affected only by my own usage, it could not be random. So this study can make sense, because the measurements are affected more by my neighbors. That's what I wanted to mention.
41:01
In the end, did you
41:04
fix your Internet connection? Yes, in the end I could finally manage the connection problem. But actually there were no such severe connection problems this time; I did this for fun. But maybe I can do some more things with forecasting.
41:38
If I get more robust results, I can predict when it will fail in the future. Or maybe we can collect the data from different houses and gather some collective intelligence; that could be interesting. Any other questions? OK. Well, thank you a lot for showing us what's happening in our connections.
42:12
We have the next talk coming up.
42:20
Thank you.
00:00
Server
Punkt
Prozess <Physik>
Gemeinsamer Speicher
Gleichungssystem
Maschinelles Lernen
Dienst <Informatik>
EMail
Wald <Graphentheorie>
Analysis
Service provider
Computeranimation
Internetworking
Intel
Physikalisches System
Informationsmodellierung
Softwaretest
Zeitreihenanalyse
Font
Flächentheorie
Datennetz
Interprozesskommunikation
Figurierte Zahl
Datennetz
Ruhmasse
Systemaufruf
Gleitendes Mittel
Optimierung
Programmfehler
Reihe
Dienst <Informatik>
Software
Programmfehler
COM
Login
Codierung
Projektive Ebene
Software Engineering
01:54
Resultante
Subtraktion
Mathematisierung
Aggregatzustand
Computeranimation
Übergang
Informationsmodellierung
Ausreißer <Statistik>
Regulärer Graph
Zeitreihenanalyse
Typentheorie
Datennetz
Datentyp
Minimum
Mustersprache
LuenbergerBeobachter
Tropfen
Bildgebendes Verfahren
Einfach zusammenhängender Raum
Softwaretest
Verschiebungsoperator
Shape <Informatik>
Datennetz
Graph
Linienelement
Übergang
Frequenz
Programmfehler
Rechenschieber
Reihe
Ausreißer <Statistik>
Menge
Basisvektor
Charakteristisches Polynom
Smartphone
Aggregatzustand
05:08
Mittelwert
Softwaretest
Resultante
Unrundheit
Gleitendes Mittel
Wald <Graphentheorie>
Analysis
Computeranimation
Reihe
Softwaretest
Forcing
Login
LuenbergerBeobachter
ResponseZeit
Schlüsselverwaltung
05:45
Softwaretest
Subtraktion
Einfügungsdämpfung
Wellenpaket
Lochstreifen
Rahmenproblem
Linienelement
Zwei
Klasse <Mathematik>
Parser
Iteration
MailingListe
Symboltabelle
Computeranimation
Gefangenendilemma
Funktion <Mathematik>
Flächeninhalt
Rechter Winkel
Automatische Indexierung
Login
Mereologie
Aggregatzustand
Funktion <Mathematik>
07:13
Resultante
Quader
Rahmenproblem
Kartesische Koordinaten
Code
Computeranimation
Streaming <Kommunikationstechnik>
Informationsmodellierung
Zeitreihenanalyse
Mustersprache
Visualisierung
LuenbergerBeobachter
Plot <Graphische Darstellung>
Zusammenhängender Graph
Zeiger <Informatik>
Gerade
Softwaretest
URN
Graph
Fluktuation <Physik>
Validität
Indexberechnung
Kanalkapazität
Plot <Graphische Darstellung>
Frequenz
Frequenz
Medianwert
Objekt <Kategorie>
Mustersprache
Reihe
Dienst <Informatik>
Automatische Indexierung
Zahlenbereich
Mereologie
Pendelschwingung
Aggregatzustand
Fehlermeldung
11:56
Addition
TVDVerfahren
Reihe
Frequenz
Rauschen
Gesetz <Physik>
Teilbarkeit
Computeranimation
Richtung
Konstante
Arithmetisches Mittel
Reihe
Multiplikation
Informationsmodellierung
Zufallszahlen
Twitter <Softwareplattform>
Zeitreihenanalyse
ATM
TVDVerfahren
Mustersprache
Modem
Zusammenhängender Graph
Schlüsselverwaltung
13:06
Process <physics>
Wave packet
Mortality rate
Iteration
Element <mathematics>
Forest <graph theory>
Code
Computer animation
Virtual machine
Information modeling
Path <topology>
Forecasting method
Time series analysis
Pattern language
Luenberger observer
Plot <graphical representation>
Connected graph
Algorithmic learning theory
Moving average
Software testing
Linear functional
Data model
Mailing list
Moving average
Curve fitting
Cross-validation
Helmholtz decomposition
Twitter <software platform>
Set
Mereology
Traffic information
16:21
Codebook
Distribution theory
Subtraction
Point
Absolute value <mathematics>
Computer animation
Median
Information modeling
Range <stochastics>
Forecasting method
Standard deviation
White noise
Luenberger observer
Straight line
Software testing
Noise
Median
Residual
Range <stochastics>
Outlier <statistics>
Normal distribution
Absolute value <mathematics>
Area
Residual
Mereology
Sheaf theory
Standard deviation
18:17
Range <stochastics>
Thresholding
Point
Standard deviation
Hypermedia
Number range
Computer animation
Standard deviation
18:50
Covariance function
Number range
Absolute value <mathematics>
System of equations
Median
Term
Computer animation
Homepage
Arithmetic mean
Median
Outlier <statistics>
Outlier <statistics>
Absolute value <mathematics>
Standard deviation
Residual
Mereology
Standard deviation
Power <physics>
20:27
Distribution theory
Covariance function
Bit
Extreme point
Disintegration <mathematics>
Partial differentiation
Undirected graph
Forest <graph theory>
Analysis
Computer animation
Software testing
Linear regression
Screen window
Inference
Timestamp
Correlation function
Functor
Software testing
Parameter system
Linear functional
Statistics
Category <mathematics>
Abstraction level
Series
Similarity geometry
Partial differentiation
Moving average
Variable
Residual
Invariant
Arithmetic mean
Series
Raw data
Function <mathematics>
Right angle
Autocorrelation
Physical theory
Condition number
Input/output
Separation bubble
Identifiability
Order <mathematics>
Error message
Linear map
Subtraction
Class <mathematics>
Mathematization
Cellular automaton
Number range
Implementation
Acoustic noise
RFID
Information modeling
Time series analysis
Subtraction
Software library
Luenberger observer
Variance
Linear regression
Graph
Data model
Mathematization
Frequency
Covariance function
Minimum degree
Residual
Parameter system
Mereology
Core dump
Edge coloring
Model theory
26:38
Resultant
Distribution theory
Point
Minimization
Partial differentiation
Control unit
Computer animation
Linear regression
Minimum
Inference
Greedy algorithm
Influencing variable
Correlation function
Bayesian network
Moisture conduction
Software testing
Distribution theory
Linear functional
Parameter system
Series
Local ring
Density <stochastics>
Flow direction
Partial differentiation
Selection procedure
Divisibility
Arithmetic mean
Normal distribution
Function <mathematics>
Autocorrelation
Right angle
Physical theory
Projective plane
Information
Order <mathematics>
Diagonal <geometry>
Multivariate analysis
Thermal conductivity
Error message
PACS
Code
Expression <logic>
Information modeling
Range <stochastics>
Inheritance hierarchy
Luenberger observer
Distance
Variance
Estimate
Discrete probability distribution
Graph
Line element
Dispersion
Data model
Software bug
Square number
Residual
Parameter system
Core dump
Control unit
Interior point
Oracle <computer science>
32:10
Distribution theory
Linear functional
Web Site
Thresholding
Computer animation
Software bug
Transition
Arithmetic mean
Multivariate analysis
Condition number
Thresholding
Sigma-algebra
Multivariate analysis
33:02
Sequence <mathematics>
Cuboid
Formal language
Cellular automaton
Cartesian coordinates
Element <mathematics>
ROM <computer science>
Forest <graph theory>
Analysis
Computer animation
Information modeling
Multiplication
Software testing
Time series analysis
Data type
Minimum
Response time
Figurate number
Function <mathematics>
Software testing
Series
Vector space
Input/output
Moving average
Series
Residual
ROM <computer science>
Term
35:05
Software testing
Resultant
TVD scheme
Parameter system
Wave packet
Mathematization
Data model
t-test
p-block
Context-aware system
Input/output
ROM <computer science>
Computer animation
Series
Information modeling
Service <computer science>
Time series analysis
Mereology
Luenberger observer
Arrow of time
Email
Term
38:12
Data network
Meeting/Interview
Data model
Word <computer science>
Moving average
Curve fitting
Forest <graph theory>
Computer animation
Video conference
39:06
Software testing
Observational study
Pattern language
Series
Data network
Right angle
Fluctuation <physics>
MiniDisc
Pattern language
Plot <graphical representation>
Divisibility
Influencing variable
41:03
Resultant
Simply connected space
Pattern language
Series
Separation bubble
Swarm intelligence
Computer animation
42:09
Mean
Simply connected space
Shift operator
Transition
Moving average
Forest <graph theory>
Analysis
Computer animation
One
Series
Software testing
Right angle
Login
Type theory
Data network
Metadata
Formal metadata
Title  Deep Learning your Broadband Network @HOME 
Series title  EuroPython 2017 
Author 
Lee, Hongjoo

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license. 
DOI  10.5446/33720 
Publisher  EuroPython 
Release year  2017 
Language  English 
Content metadata
Subject area  Computer science 
Abstract  Deep Learning your Broadband Network @HOME [EuroPython 2017, Talk, 2017-07-14, Anfiteatro 1] [Rimini, Italy] Most of us have broadband internet service at home. Sometimes it does not work well, and we visit a speed test page to check the internet speed ourselves, or call the cable company to report the service failure. As a Python programmer, have you ever tried to automate the internet speed test on a regular basis? Have you ever thought about logging the data and analyzing the time series? In this talk, we will go through the whole process of data mining and knowledge discovery. First, we write a script to run a speed test periodically and log the metrics. Then we parse the log data, convert it into a time series, and visualize the data for a certain period. Next, we conduct some data analysis: finding trends, forecasting, and detecting anomalous data. Several statistical and deep learning techniques are used for the analysis, including ARIMA (Autoregressive Integrated Moving Average) and LSTM (Long Short-Term Memory). The goal is to give a basic idea of how to run a speed test and collect metrics with an automated Python script. I will also give a high-level overview of the methodologies for analyzing time series data, and I would like to motivate Python people to try this at home. This session is designed to be accessible to everyone, including anyone with no expertise in mathematics or computer science. An understanding of basic machine learning concepts and of some Python tools that put such concepts into practice might be helpful, but is not necessary for the audience.
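The parse-then-detect steps named in the abstract can be illustrated with a minimal sketch. It is not the speaker's code: the log format, the field name `download_mbps`, and the helper names `parse_log` and `flag_anomalies` are all assumptions made for illustration. The talk itself uses ARIMA and LSTM for detection; a simpler median-absolute-deviation rule is swapped in here so the example runs on the standard library alone.

```python
from datetime import datetime
from statistics import median

# Hypothetical log format (an assumption, not taken from the talk):
# "<ISO timestamp> download_mbps=<float>"
SAMPLE_LOG = """\
2017-07-14T10:00:00 download_mbps=93.1
2017-07-14T10:05:00 download_mbps=94.0
2017-07-14T10:10:00 download_mbps=92.5
2017-07-14T10:15:00 download_mbps=12.3
2017-07-14T10:20:00 download_mbps=93.7
"""


def parse_log(text):
    """Turn log lines into a time-ordered list of (timestamp, value) pairs."""
    series = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, field = line.split()
        _, value = field.split("=")
        series.append((datetime.fromisoformat(stamp), float(value)))
    return sorted(series)


def flag_anomalies(series, threshold=3.5):
    """Return points whose modified z-score exceeds the threshold.

    The score uses the median absolute deviation (MAD). The constant
    0.6745 and the default threshold 3.5 are conventional choices,
    assumed here rather than taken from the talk.
    """
    values = [v for _, v in series]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    return [(t, v) for t, v in series
            if 0.6745 * abs(v - med) / mad > threshold]


series = parse_log(SAMPLE_LOG)
anomalies = flag_anomalies(series)
```

With the sample data, only the 12.3 Mbps reading is flagged. A median-based rule is used because a single extreme reading barely moves the median, whereas it would inflate a mean and standard deviation and could mask itself.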