How can machine learning help to predict changes in size of Atlantic herring?

Zitierlink des Filmsegments
Embed Code

Automatisierte Medienanalyse

Beta
Erkannte Entitäten
Speech transcript
Welcome everyone. My name is Olga Lyashevska and I work as a post-doc in Ireland. I am going to show you how machine learning can be applied in the sciences. If you were here for the previous talk, that was a nice introduction to all kinds of ensemble methods; here I am going to show you one specific case using gradient boosting.

So, the background of the problem: over the past 60 years we have observed a decline in the size of fish by about 4 centimetres on average. Think about herring, which is about 20 centimetres long; 4 centimetres is a lot of reduction. We would like to find out why this is happening, and we are going to use machine learning to answer the question. Why is this a problem? Herring is a very important species for consumption, and we know that if its size decreases there are consequences for future production: there will be less fish for us to consume in the future. We do not know what caused the decline, but we presume there is an interactive effect of various factors, such as sea surface temperature, zooplankton abundance, fish abundance, and fishing pressure.
To answer this question I am going to use data covering the past 60 years, from 1959 to 2012, with samples spread throughout each year. The data were collected from commercial vessels, taking 50 to 100 samples at a time, giving a total sample size of about 50 thousand individual fish with 15 features; so imagine a dataset of 50 thousand rows. The study area, where the data comes from, is called the Celtic Sea: it lies just south of Ireland, bounded by St George's Channel and the English Channel, so you can imagine where it is relative to where we are now.
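As a rough sketch of what loading such a dataset looks like in pandas — the file name and column names here are hypothetical, only the shape of the data is taken from the talk:

import pandas as pd

# Hypothetical file; the talk only tells us the shape of the data:
# roughly 50,000 rows (individual fish) sampled 1959-2012, ~15 features.
df = pd.read_csv("celtic_sea_herring.csv")
print(df.shape)                  # about (50000, 16) including the target
print(df["length"].describe())   # fish length in cm is the response variable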
Given the study area and the data, the objective is to identify the factors which underlie this problem. To answer this question I am going to use gradient boosting regression trees, which is one of the ensemble algorithms available in scikit-learn. Why an ensemble? Because we do not have one single tree, we have a collection of trees, and the final model is improved because the trees are interlinked. In this case, as opposed to other methods such as bagging or random forests, where the trees are independent, all trees depend on each other: the unexplained part of one model, the residuals, is used as the input for the next tree. So we have a sequence of interconnected trees, which is a nice feature: it allows us to reduce variance and to reduce bias. The only problem is that, because the trees are built sequentially, we cannot parallelise the algorithm; they all depend on each other.
The advantages of gradient boosting regression trees are basically more or less the same as those of other ensemble methods. To mention just a few: we can detect nonlinear feature interactions, thanks to the feature selection going on inside the algorithm. It is resistant to the inclusion of irrelevant features, which means we can include as many variables as we like; irrelevant ones simply will not be selected, so we do not care. It deals well with data on different scales, and you do not have to standardise the data; you may wish to, but you do not have to, whereas if you used, for instance, an ordinary linear regression, the model would explode, so in this case it is a real advantage. It is also robust to outliers, so data points which do not fit, whether they are mistakes or something special, do not hurt at all. It is accurate, and we can use different loss functions, for instance least squares or others, which are nicely implemented for gradient boosting regression trees. The disadvantages: it requires careful tuning, and it takes a lot of time to train the models; it is slow to train but fast to predict. After the first part of my talk I will show you the implementation in a notebook.
Now a little bit of equations, the formal specification of the model. It is an additive model: we have a sequence of trees, each of them weighted, all combined through the gamma weights you can see here; each individual tree is one term of the equation. We build the additive model in a forward stagewise fashion: at each stage a tree is added sequentially, scaled by a parameter epsilon which controls the learning rate. The learning rate allows us to control how fast we descend along the gradient. Finally, at each stage the weak learner is chosen to minimise some loss function; in my case I took least squares, because it is a natural choice, but it can be any other function you can differentiate. Each new tree is fitted to the negative gradient of the loss; I will not go into the details, but it is a simple idea.
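The slide with the formulas is not captured in the transcript; as a sketch, the standard formulation (as in the scikit-learn documentation) matching this description is

F_M(x) = \sum_{m=1}^{M} \gamma_m h_m(x),

built in a forward stagewise fashion, with each new tree h_m shrunk by the learning rate \varepsilon:

F_m(x) = F_{m-1}(x) + \varepsilon \, \gamma_m h_m(x),

where at each stage h_m is fitted to the negative gradient of the loss L, which for least squares is simply the residual y - F_{m-1}(x):

h_m(x) \approx -\left[ \frac{\partial L(y, F(x))}{\partial F(x)} \right]_{F = F_{m-1}}.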
OK, so the parameters which I finally selected: in my case I needed about 500 iterations and a learning rate of about 0.05. These parameters are referred to as the regularisation parameters, and they affect the degree of fit. They also affect each other, which is a bit complicated: if I increase the number of iterations, say by a factor of 10, it does not mean the learning rate decreases by the same factor; it is not proportional, which is the difficulty. You may increase the iterations, but the learning rate might need to decrease by a different proportion, and that is what makes tuning tricky. The next parameter is the maximum tree depth, which in my case is 6. For this particular algorithm it is known from theory and from various simulation studies that shallow trees, even stumps, often perform best, so you do not need deep trees; in some cases you may need to go from 4 to 6, with a maximum around 8. In my case a depth of 6 means that my model can accommodate up to five-way interactions between features. The next parameter is the subsample fraction, in my case 75 per cent. It is optional: if you specify anything smaller than one, you get a stochastic model, so we introduce some randomness. That can be nice because it helps to reduce variance, and in practice I found that it gave a better result, therefore I used it. So basically my model is a stochastic gradient boosted regression trees model.
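In scikit-learn, which is what the demo later uses, these choices map directly onto the estimator's parameters. A minimal sketch, with the values quoted in the talk:

from sklearn.ensemble import GradientBoostingRegressor

# Values as quoted in the talk; loss="ls" is least squares
# (newer scikit-learn versions spell it loss="squared_error").
model = GradientBoostingRegressor(
    n_estimators=500,    # boosting iterations, i.e. number of trees
    learning_rate=0.05,  # shrinkage: how fast we descend along the gradient
    max_depth=6,         # shallow trees; depth 6 allows high-order interactions
    subsample=0.75,      # < 1.0 turns this into *stochastic* gradient boosting
    loss="ls",           # least squares, the natural first choice
    random_state=42,     # fixed seed for reproducibility (the talk fixes a seed too)
)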
The loss function is least squares; as I mentioned, it is a natural choice, nice to start with, and easy to interpret, but it can be any of the other loss functions, which are nicely implemented in scikit-learn and very easy to swap. To evaluate the model, in this case I split my data into three parts; if I have enough time I will also show you how I did a two-way split, and the results are very similar, which is nice, because it shows the model is robust. Here I used 50 per cent for training, 25 for testing and 25 for validation. There is no particular reason for those numbers; because I have 50 thousand rows, I can afford it. If you have less data you may choose, for example, cross-validation or other methods which are more suited to smaller datasets, but I have a big dataset. You can see I monitor the mean squared error, the deviance: after some iterations the curve flattens out, there is no further change in deviance, which means that I have enough iterations. The R squared tells me the proportion of variance which is explained by the model; for the training set it is slightly higher, which may indicate a bit of overfitting, but it is not a big gap between them, so I am satisfied. The training and test curves follow each other very closely, which means the model is doing a good job.
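A sketch of the 50/25/25 split and the accuracy check described here, reusing df and model from the sketches above (the target column name is an assumption):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

X, y = df.drop(columns="length"), df["length"]

# 50% train, then split the remaining half into 25% validation / 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, train_size=0.5, random_state=42)

model.fit(X_train, y_train)
print("MSE (test): ", mean_squared_error(y_test, model.predict(X_test)))
print("R2 (train): ", r2_score(y_train, model.predict(X_train)))  # slightly higher than test;
print("R2 (test):  ", r2_score(y_test, model.predict(X_test)))    # a large gap would mean overfitting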
Now a little bit of results. If I plot the length of the fish on this axis, you can see the observed values run from about 20 to 30 centimetres, while my model predicts from about 22 to 28. If every point fell on the straight line here, every value would be predicted exactly; the predicted range is somewhat narrower than the observed one. The R squared of about 50 per cent is what this graph reflects.
If you want to find out which variables play a role in the model, which is what I wanted to find out, this is how it is computed: the more often a variable is used to split the trees, the more important it is; if we count the number of times each one is used, we can rank them. I have colour coding here, and the most important variable is basically the time trend; you can see it scaled to 100 per cent, it has been used the most. After that we have sea surface temperature; I will show you on the next graph how it acts, but there is basically some relationship. Then I have food availability, which is interesting to see, and then the abundance of fish, so how large the population is, and so on. The most important message to remember here is that the trend is the important one, and after that come sea surface temperature and food.
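The ranking shown on the slide comes straight off the fitted model; a sketch, reusing the names from the earlier sketches:

import numpy as np
import matplotlib.pyplot as plt

# feature_importances_ sums how much each feature contributes to the splits,
# normalised to 1; the talk reports the top feature rescaled to 100%.
importances = model.feature_importances_
order = np.argsort(importances)

plt.barh(np.array(X.columns)[order], importances[order])
plt.xlabel("relative importance")
plt.tight_layout()
plt.show()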
If you go further and visualise the variables with partial dependence plots — the first row here shows the one-way partial dependence plots, where basically each feature is plotted against the dependent variable, the length of the fish — we can see that in some panels it is hard to make out a particular relationship; there is a relationship, but it shows a high degree of interaction in the way the features depend on each other. We do not really see a clean dependence here, but we do see one here, so I have highlighted these two areas with circles. If you can see, it is at about 14 degrees: when sea surface temperature is below 14 degrees there is a positive relationship, the fish gets larger, it likes the temperature up to that point; and when it gets too warm there is a negative relationship. So it definitely shows some kind of dependence between the length of the fish and the temperature. I do not want to talk about climate change here, because it is a very debatable issue, but you can imagine that with global warming, if temperature goes up, that may have an effect on the fish, and eventually on us, because we will not be able to consume fish as we do now.
this is an interesting message and the final
layer here is this is 1 of the food sources say in this particular case phytoplankton is what the sheets if you focus on this area uh while worldwide focus here not focus here because uh my most of my data is concentrated over here is you see because this little Dixon deciles so it's where is
concentrated on making the goal up to here just because I
have some lie but I don't care because I know my model is reversed so just idle upper part so if I look at this part I don't see any dependence I think it's just
because in this case uh it's not a limiting factor obviously they have less of food it effect but in case of Celtic see there is a lot of phytoplankton so if he doesn't is not dependent on the OK and then the 2nd here we have a 2 way interaction plots is plot each feature against each
Then, in the second row, we have the two-way interaction plots, where each feature is plotted against another, just to see if I can pick up any interaction between them. We can see basically the same story: sea surface temperature at about 14 degrees, you see that something is happening there. What this analysis tells me is: I know that these are the important features, but I cannot really say why; it cannot separate cause from effect. It tells me that the trend is important, and that I might need to go on and use, say, time-series modelling to find out why. That is the way it is: machine learning can pick the important features out of a batch of other features on big datasets, and that is as far as it goes; there are limitations to how you can apply it.
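scikit-learn can draw both kinds of plots directly from the fitted model. A sketch using the current API (the 2016 code would have used the older plot_partial_dependence helper; the feature names are the assumed ones):

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One-way panels for the top features, plus one two-way interaction panel;
# "trend", "sst" and "phytoplankton" are assumed column names.
PartialDependenceDisplay.from_estimator(
    model,
    X_train,
    features=["trend", "sst", "phytoplankton", ("trend", "sst")],
)
plt.show()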
So, to conclude: in the Celtic Sea there are three important features, as I just said, the time trend, sea surface temperature and food availability. Something is going on with temperature, clearly at about 14 degrees, and there is a high degree of interaction between these features. Remember that with this method we cannot establish cause-effect relationships, but we do get the relative importance of the variables: from a bunch of variables I picked out the ones which matter more, and I can take them with me into the next type of analysis.
OK, so that was the first part of my talk. Now, depending on how much time I have, I would like to show you a little bit of how it has been implemented; I have about 5 minutes. The first part of this notebook is what I have shown you in my presentation, the three-way split of my dataset. Let me make it large enough so you can see.
I am sure this is all familiar to you: the usual libraries, and I set the random seed to be reproducible, because working in science I need to be able to run it again and get the same results. Then I read in the data, about 50 thousand rows and about 15 features in my case. I have not discussed this yet, but I also check multicollinearity, which means checking whether there are two features which are related, which are dependent. For an ordinary regression, if I had one variable highly related to another, it would for sure blow up the model, and you cannot allow that into your model. For ensemble methods, for this particular algorithm, it does not matter, but it is still nice to detect it analytically and know which variables are related. This is how I did it here: I construct the correlation matrix and plot it, and I can find out which variables are more collinear; the more intense the colour, the stronger the relation. There is no strict rule, but a correlation above about 0.8 may indicate multicollinearity, so I check which cells are red; those variables I simply dropped from my model, this one and this one as well. Then I do the split, 50, 25 and 25 per cent for each part, and I fit my model.
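A sketch of that collinearity check: build the correlation matrix, colour it, and flag pairs above the rule-of-thumb cut-off of about 0.8 (the recording does not say which correlation method she used; Spearman is assumed here):

import matplotlib.pyplot as plt

# Pairwise correlations between features; |r| above ~0.8 flags multicollinearity.
corr = X.corr(method="spearman")

plt.matshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns, rotation=90)
plt.yticks(range(len(corr)), corr.columns)
plt.show()

# List the offending pairs, then drop one variable from each pair.
high = [(a, b) for a in corr.columns for b in corr.columns
        if a < b and abs(corr.loc[a, b]) > 0.8]
print(high)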
These are the final parameters, but it took me quite a few iterations, for sure, to be satisfied. How did I find out how many estimators I need? The usual rule is to set the learning rate as low as possible and the number of estimators, the number of trees, as high as possible. If you do that, your model runs forever, but you should end up with something feasible, and then you can start playing around by reducing them. How I found that 500 is enough: I applied what is called early stopping; that is in the next notebook, it comes a little bit later. So I train the model — I just push the button — and we get the same graph again, which you have seen before.
Now, what is interesting to show quickly is the other part, the second notebook. Earlier I mentioned early stopping: that is how, in the three-way split, I found the number of iterations, using the validation set, because tuning on the same data you test on is, in my opinion, wrong. In this second notebook I split only two ways: I have train and test, and no separate validation set.
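The early stopping can be sketched with staged_predict, which replays the predictions after each boosting iteration: train a deliberately long model once, then pick the iteration where the held-out error bottoms out. This is a sketch, not her exact code:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Deliberately too many trees at the chosen learning rate...
big = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.05,
                                max_depth=6, subsample=0.75, random_state=42)
big.fit(X_train, y_train)

# ...then score every intermediate model on the held-out validation set.
val_mse = [mean_squared_error(y_val, pred)
           for pred in big.staged_predict(X_val)]
best_n = int(np.argmin(val_mse)) + 1
print("stop after", best_n, "trees")   # about 500 in the talk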
To identify the parameters for this part I used a grid search. You specify a range of parameters; you can specify as many as you like, but I only tuned the two regularisation parameters, because those are the most difficult ones. So I specify the max depth here: I know I had 6, so I go a little lower as well, and I know from theory it should not be higher than 8, so I do not go there. And for the learning rate I have 0.05, and I want to stay in that vicinity and see how it works. What happens is that we get a matrix of different combinations of parameters, and for each combination the model is fitted and run; eventually the one which gives the highest accuracy is chosen, and it tells me which parameters are best. You can see in the output the best hyper-parameters: it says that the learning rate should be 0.1 instead of 0.05, and that the trees can be a bit more shallow, but it is very close.
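A sketch of that grid search over the two regularisation parameters, matching the ranges she describes (the fold count is an assumption; the talk does not say how the grid was cross-validated):

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

param_grid = {
    "max_depth": [4, 5, 6, 7, 8],        # no deeper than ~8, per theory
    "learning_rate": [0.01, 0.05, 0.1],  # around the hand-tuned 0.05
}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=500, subsample=0.75, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,  # assumed
)
search.fit(X_train, y_train)
print(search.best_params_)  # in the talk: learning_rate 0.1 and slightly shallower trees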
If I fit with those parameters, keeping all the other parameters the same, what I get is a very similar result: again you can see an R squared of about 50 per cent for the training and test data, which is good. So again we see the same graph, which is good: it means I have the same algorithm applied with two different types of data partitioning. One time I did a three-way split and used early stopping to find the number of iterations; the other time I split into two parts and used the grid search to find the best parameters. I changed the parameters and still — I think I have to finish — my model gives similar results, which is good; it means my model is robust. I will stop here because I am running out of time. Thank you very much for your attention. If you have questions...
Q: Did you compare the results with a random forest?

A: I ran the same data through a random forest, and the mean squared error is slightly higher, but I did not show it because I was a bit stressed getting all of this together. I also ran an ordinary least squares regression, and again it shows that this model does better.
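For that comparison, a sketch (the talk gives no settings for the forest, so defaults plus a matching tree count are assumed):

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf = RandomForestRegressor(n_estimators=500, random_state=42)
rf.fit(X_train, y_train)
print("RF test MSE:", mean_squared_error(y_test, rf.predict(X_test)))
# In the talk this came out slightly worse than the boosted model.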
Q: Is the data available?

A: Yes, you can see it — [the rest of the answer is inaudible in the recording].
Q: When you say there is a link between the temperature and the fish, does that help you? Can you then go and do more research on it?

A: Yes, basically that was the idea; I was very interested to find that out. I have a time series of 60 years here, which is very unique in science, similar to a long-term collection. I wanted to find out which variables are more important, so that we can reduce our dataset to the more relevant ones, and then I can take it into some kind of time-series analysis. [The final exchange is inaudible.]

Metadata

Formal metadata

Title How can machine learning help to predict changes in size of Atlantic herring?
Series title EuroPython 2016
Part 160
Number of parts 169
Author Lyashevska, Olga
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute and make the work or content publicly available, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them and pass on the work or content, including in adapted form, only under the terms of this license
DOI 10.5446/21202
Publisher EuroPython
Publication year 2016
Language English

Content metadata

Subject Computer science
Abstract Olga Lyashevska - How can machine learning help to predict changes in size of Atlantic herring? This talk is a case study of how Python (Pandas, NumPy, SciKit-learn) can be used to identify the influence of the potential drivers of a decline in size of Atlantic herring populations using Gradient Boosting Regression Trees. ----- A decline in size and weight of Atlantic herring in the Celtic Sea has been observed since the mid-1980s. The cause of the decline remains largely unexplained but is likely to be driven by the interactive effect of various endogenous and exogenous factors. The goal of this study is to interrogate a long time series of biological data obtained from commercial fisheries from 1959 to 2012. We use gradient boosting regression trees to identify important variables underlying changes in growth from various potential drivers, such as: - Atlantic multidecadal oscillation; - sea surface temperature; - salinity; - wind; - zooplankton abundance; - fishing pressure. This learning algorithm makes it possible to quantify the influence of the potential drivers of change, with a lower test error than other supervised learning techniques. The predictor-variable importance spectrum (feature importance) helps to identify the underlying patterns and potential tipping points while resolving the external mechanisms underlying observed changes in size and weight of herring. This analysis is a useful case study of how Python can be used in academia. The outputs of the analysis are of relevance to conservation efforts and sustainable fisheries management, which promotes species resistance and resilience.
