How can machine learning help to predict changes in size of Atlantic herring?
Speech transcript
00:01
Welcome everyone. My name is Olga Lyashevska. I work as a postdoc in
00:05
Ireland, so I am going to show you how machine learning can be applied in the sciences. The previous talk, if you were here, was a nice introduction to all kinds of ensemble methods, so here I'm going to show you one specific case of gradient boosting. OK, so the background of the problem: over the past 60 years we have observed a decline in the size of the fish by about 4
00:28
centimeters on average. So think about a herring, which is about 20 centimeters long: 4 centimeters is a lot of reduction. So we would like to find out what the problem is, why it is happening, and
00:38
we're going to use machine learning to answer this question. So why is it a problem? Because herring is a very important species for consumption, and we know that if its size decreases, it has consequences for stock production: it means there will be less fish in the future that we can consume. We don't know what
00:56
caused the decline, but we presume there is an interactive effect of various factors: the sea surface
01:03
temperature may change,
01:06
the biomass of their food may change,
01:11
the fish biomass may change, or the fishing pressure. OK, so to answer
01:19
this question I'm going to use data for the
01:23
past 60 years, from 1959 to
01:26
2012, and the data is spread throughout the year.
01:45
So I'm going to use this data, and the way the data has been
01:49
collected is that it has been
01:52
collected from commercial vessels, 50 to 100 samples at a time, and the total sample size is about 50 thousand individual fish. So imagine a dataset of 50 thousand rows. OK, so the study area: this is where the data comes
02:12
from. It's called the Celtic Sea, just to the south of Ireland, and it is bounded by St George's Channel and the English Channel, so you can imagine where we are now. That's about the study
02:22
area. The objective is to identify the important factors which underlie this problem, and to answer this
02:30
question I'm going to use gradient boosting regression trees, which is one of the ensemble algorithms available in scikit-learn. Why an ensemble? Because we don't have
02:43
one tree, but a collection of trees, and the final model is improved because we have a collection of interlinked trees. In this case, as opposed to other methods such as bagging or random forests, where the trees are independent, all trees are dependent, in the way that
03:06
the unexplained part of one tree is used as the input to the next tree. So we have a sequence of interconnected trees, which is a nice feature: it allows us to reduce variance and to reduce bias. The only
03:19
problem is that, because the trees are built sequentially, it is difficult to parallelize the algorithm: they all depend on each other.
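The idea of each tree taking the unexplained part of the previous ones as its input can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the algorithm as implemented in scikit-learn: it uses one-split trees on a single feature, a squared-error criterion, and hypothetical helper names (`fit_stump`, `boost`, `predict`).

```python
# Hand-rolled boosting with one-split trees (stumps) on a single feature.
# Illustrative only: real libraries use deeper trees and faster split search.

def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean of the target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # The unexplained part of the current model...
        residuals = [y - p for y, p in zip(ys, preds)]
        # ...is the target of the next tree.
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps

def predict(model, x, learning_rate=0.1):
    base, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

xs = [i / 10 for i in range(40)]
ys = [(x - 2.0) ** 2 for x in xs]  # toy nonlinear target
model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_mean = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_mean, mse_boost)
```

Because every stump is fitted to residuals that depend on all earlier stumps, the loop cannot simply be run in parallel, which is the limitation mentioned above.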
03:29
The advantages of gradient boosting regression trees are more or less the same as those of other ensemble methods. To mention a few: we can detect nonlinear feature interactions, simply because of the underlying feature selection which is going on in the algorithm. It is
03:49
resistant to the inclusion of irrelevant features, which means we can include as many variables as we like, and if they are irrelevant they won't be selected, so we don't care. OK, which is nice. It is good at dealing
04:01
with data on different scales, so you don't have to standardize the data. You may wish to standardize, but you don't have to, because it is robust. If, for instance, you
04:11
use an ordinary linear regression, the model will explode, so in this case it is a real advantage.
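The point about scales can be checked with a small experiment. The sketch below is an assumption-laden illustration on synthetic data (not the herring data): it fits scikit-learn's `GradientBoostingRegressor` once on raw features and once on the same features multiplied by large constants, and compares the predictions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 3))
y = 20 + 5 * X[:, 0] - 3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# Put the columns on very different scales (hypothetical unit changes).
# Powers of two are used so that the rescaling is exact in floating point.
scales = np.array([1.0, 1024.0, 64.0])
X_rescaled = X * scales

params = dict(n_estimators=50, max_depth=3, random_state=0)
pred_raw = GradientBoostingRegressor(**params).fit(X, y).predict(X)
pred_rescaled = (
    GradientBoostingRegressor(**params).fit(X_rescaled, y).predict(X_rescaled)
)

# Tree splits depend only on the ordering of feature values, so the two
# models should partition the data identically.
max_diff = float(np.max(np.abs(pred_raw - pred_rescaled)))
print(max_diff)
```

The trees only ask whether a value lies above or below a threshold, so stretching a column moves the thresholds but leaves the fitted groups unchanged.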
04:18
It is also robust to outliers: any data points which do not fit the data, maybe because of a mistake or because they are something special, we don't care about at all. It's more accurate, and we can use different loss functions, for instance least squares or others, which
04:32
are implemented for gradient boosting regression trees, which is nice. OK, disadvantages: it requires careful tuning, and it takes a lot of time to get there with the models. It is slow to train, but fast to predict. After I finish the first part of my talk I'll show you the implementation in the notebook. OK, so a little bit of equations. As a formal specification of the model: it is an additive model, so we have a sequence
05:00
of trees, and each tree is weighted. So we get an ensemble of trees, all combined through the gamma weights you can see here, and each
05:12
individual tree is shown as the h part of the equation. Then we build the additive model in a forward stagewise fashion, and at each stage we introduce a parameter epsilon, which controls what is known as the learning rate. The learning rate allows us to control how fast we descend along the gradient. And finally,
05:37
at each stage the weak learner is chosen to minimize some
05:40
loss function. In my case I took least squares because it's a natural choice, but it can be any other function which you can differentiate. This part of the model is evaluated by negative gradient descent; I won't go into that. OK, so the parameters which I finally selected: in my case I needed about 500 iterations and
06:06
a learning rate of about 0.05. These parameters are referred to as the regularization parameters, and they affect the degree of fit. They also affect each other, which is a bit complicated: if I increase the number of iterations, let's say by a factor of 10, it doesn't mean that the learning rate decreases by a factor of 10; it's not proportional. That is the difficulty: you may increase the iterations, but the learning rate might need to be decreased by a different proportion, and that's why it gets tricky. OK, so the next
06:37
parameter is the maximum tree depth, which in my case is
06:41
6. For this particular algorithm it is known from theory and from different simulation studies that tree stumps, meaning trees with a single split, perform best, which is nice, so you don't need any deep trees. But in some cases you may need to go from 4 to 6, up to a maximum of 8. The depth
07:05
in my case is 6, which means that my model can accommodate up to 5-way interactions. OK, next is
07:14
subsample, in my case 75 percent. It's optional; if you specify it, it means that you get a stochastic model, so we introduce some randomness. It can be nice because it allows us to reduce
07:28
variance and reduce bias, and in practice I found that it gave a better result, so I introduced it. So basically my model is a stochastic gradient boosted regression tree.
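The parameter values quoted above map directly onto scikit-learn's `GradientBoostingRegressor`. The following is a minimal sketch rather than the notebook from the talk: the data is a synthetic stand-in, and the seed and column count are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Regularization parameters quoted in the talk. The loss is left at the
# library default, which is least squares.
model = GradientBoostingRegressor(
    n_estimators=500,    # number of boosting iterations (trees)
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_depth=6,         # maximum depth of each individual tree
    subsample=0.75,      # a value below 1.0 gives stochastic gradient boosting
    random_state=42,     # assumed seed, for reproducibility
)

# Synthetic stand-in data (hypothetical): 3 features, noisy nonlinear target.
rng = np.random.RandomState(42)
X = rng.uniform(size=(500, 3))
y = (25 + 3 * np.sin(6 * X[:, 0]) + X[:, 1] * X[:, 2]
     + rng.normal(scale=0.5, size=500))

model.fit(X, y)
train_r2 = model.score(X, y)  # R squared on the training data
print(len(model.estimators_), round(train_r2, 3))
```

With `subsample=0.75`, each tree is fitted on a random 75 percent of the training rows, which is where the randomness comes from.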
07:40
OK, and the loss function is least squares. As I mentioned, it's a natural choice, nice to start with, and easy to interpret, but it can be any
07:46
other loss function; they're nicely implemented in scikit-learn and it's very easy to change. OK, so to validate our model, in this case I split my data into three parts. If I have enough time I'll show you how I did a split into two parts; I have results for that and they're very similar, which is nice and shows the robustness of my model. But in this case I used 50 percent of the data for training, 25 for testing and 25 for validation. There is no particular reason why; because I have 50 thousand rows, I can
08:17
just afford it. If you have less data, you may opt for cross-validation or some other methods which are more suited to smaller datasets, but I have a big dataset. And you can see here the mean squared error, which is, I would
08:34
say, low, so I'm happy enough with my model, and I can see that after some iterations my model
08:42
flattens out, so there is no further change in the error, which means that I have enough iterations. R squared tells me the proportion of variance which is explained by the model, and for the training set it is slightly higher, which may indicate a bit of overfitting, but there is not a big gap between them, so I'm satisfied. The curves follow each other very closely, which means my model is doing a good job. And if I reduce the variability in the data, I see that R squared goes up.
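A 50/25/25 split and the two reported metrics can be sketched as follows. This assumes scikit-learn's `train_test_split`, `mean_squared_error` and `r2_score`; the data is synthetic, and the smaller model settings are chosen only so the example runs quickly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.uniform(size=(2000, 4))
y = 25 + 2 * np.sin(5 * X[:, 0]) + X[:, 1] + rng.normal(scale=1.0, size=2000)

# 50% for training, then the remaining half split evenly into test and
# validation sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.5, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3,
    subsample=0.75, random_state=0)
model.fit(X_train, y_train)

scores = {}
for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = model.predict(Xs)
    scores[name] = (mean_squared_error(ys, pred), r2_score(ys, pred))
    print(name, scores[name])
```

A training R squared that is only slightly above the test R squared is the pattern described above as acceptable, mild overfitting.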
09:18
So, a little bit of results. If I plot here the length of
09:25
the fish on the x axis, you can see that it ranges from about 20 to 30 centimeters, and my model predicts fish from 22 to 28. So basically what it says is that on average it predicts a correct value, but if you have extremes, smaller or bigger fish, they won't be predicted correctly. OK, so that's the 50 percent of the R
09:49
squared, which is what is reflected in this graph. OK, and if you want to find out which variables play
09:55
a role in my model, which is what I wanted to find out, this is the way it's done: the more important a variable is, the more often it is used to split the tree. If we count the times it's used, we can say, OK, that means it's more important. In this case I have color
10:15
coding here. The first is the trend, which is basically months.
10:19
OK, so we know that something can be attributed to it; you can see that it has been used in 100% of
10:25
the cases. After that we have sea surface temperature, for which
10:31
I'll show you in the next graph how it is affected, but there is basically some relationship. The other things I have are
10:38
food availability, which is what the fish eat, and then abundance of fish, so how large the population is, et cetera. The most important message to remember here is that trend is the important one, and after that we have sea surface temperature and food.
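A relative importance ranking of this kind can be read off a fitted scikit-learn model through `feature_importances_`. The sketch below uses synthetic data and hypothetical column names. Note that scikit-learn's measure is based on impurity reduction, which is related to, but not the same as, counting how often a variable is used for splitting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(1)
# Hypothetical column names, chosen to echo the variables in the talk.
names = ["trend", "sea_surface_temperature", "food", "noise"]
X = rng.uniform(size=(1000, 4))
# Synthetic target in which "trend" matters most and "noise" not at all.
y = (10 * X[:, 0] + 3 * X[:, 1] + 2 * X[:, 2]
     + rng.normal(scale=0.5, size=1000))

model = GradientBoostingRegressor(
    n_estimators=100, max_depth=3, random_state=1).fit(X, y)

# feature_importances_ sums to 1; rescale so the top feature reads as 100%.
relative = 100 * model.feature_importances_ / model.feature_importances_.max()
ranking = sorted(zip(names, relative), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name:25s} {score:6.1f}")
```

Rescaling to the largest value is what makes the top variable show up as 100 percent in a plot of relative importance.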
10:56
OK, so if we visualize the variables in partial dependence plots: the first row here shows the partial dependence plots, where we basically plot each feature against our dependent variable, the length of the fish. We can see that for the trend we don't really see any particular relationship. There is a relationship, but it shows a high degree of interaction in the way
11:23
it is dependent. So we
11:27
don't really see a dependence here, but we do see one here. I highlighted these two areas with circles. You can see that at about 14 degrees, so if the sea surface temperature is below 14 degrees, there is a positive relationship, so the fish get larger; they like the temperature going up. And
11:50
in this case, if it gets too warm, there is a
11:54
negative relationship. So it definitely
11:57
shows some kind of dependence between the length of the fish and the temperature. I don't want to talk about climate change here, because it's a very debatable issue, but you can imagine, with global warming, if the temperature goes up, that may have an effect on the fish, and on us eventually, because we consume the fish. OK, so
12:17
this is an interesting message. And the final
12:20
plot here is one of the food sources, in this particular case phytoplankton, which is what the fish eat. Focus on this area, not on that one, because most of my data is concentrated over here, as you can see from these little ticks, the deciles. So that's where it is
12:39
concentrated. The line goes up to here just because I
12:44
have some outliers, but I don't care, because I know my model is robust, so just ignore the upper part. If I look at this part I don't see any dependence. I think it's just
12:54
because in this case it's not a limiting factor. Obviously, if they have less food it has an effect, but in the Celtic Sea there is a lot of phytoplankton, so the fish size is not dependent on it. OK, and in the second row we have two-way interaction plots, where I plot the features against each
13:12
other, just to see if I can pick up any interaction between them.
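What a one-way partial dependence curve computes can be written out by hand. The sketch below is an illustration on synthetic data built to have a turning point at 14 degrees. It is not the herring data, and `partial_dependence_1d` is a hypothetical helper rather than the scikit-learn plotting function used for the figures.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(2)
temperature = rng.uniform(10, 18, size=1500)  # hypothetical degrees Celsius
food = rng.uniform(0, 1, size=1500)           # hypothetical food index
# Synthetic length: rises with temperature up to 14 degrees, falls above it.
length = (26 - 0.5 * np.abs(temperature - 14)
          + rng.normal(scale=0.3, size=1500))
X = np.column_stack([temperature, food])

model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, random_state=2).fit(X, length)

def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

grid = np.linspace(10, 18, 17)
pd_temp = partial_dependence_1d(model, X, feature=0, grid=grid)
peak = float(grid[np.argmax(pd_temp)])
print(peak)
```

A two-way plot follows the same recipe with two features overwritten at once over a two-dimensional grid.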
13:17
OK, so we can see it is basically the same story: for sea surface temperature, at about 14 degrees here, you see that something is happening. So what
13:25
this analysis tells me is: I know which are the important features, but I can't really say why it is so, what is the cause and the effect. That trend is important tells me that I might need to go on and use maybe time series modelling to find out in what way it depends. Machine learning can't answer these questions. What machine learning can do is pick out the important features from a bunch of other features in big datasets, and that's as far as it goes, so there are limitations to how you can apply it. So, to conclude the
13:56
session: there are three important features, which are time trend, sea
14:01
surface temperature and food availability. Something is going on with the temperature, clearly at about 14 degrees, and there is a high degree of interaction between these features. And remember that with this method we can't find a cause-effect relationship, but we get the relative importance of the variables. So from a bunch of variables I picked out the ones which are more important, and I can take them forward to the next type of analysis.
14:26
OK, so this is the first part of my talk, and I'm not
14:30
sure how much time I have. I would like to show you a little bit of how it has been implemented. I have 5
14:39
minutes. So this is basically the first part, what I've shown you in my presentation: a three-way split of my dataset.
14:49
I'll make it a bit bigger; is it large enough? OK, so
14:54
I'm sure it's all familiar to you: I import the usual libraries, and I set a seed to be reproducible, because I work in science and, if I want to run it again, I need to get the same results. Then I read in the data, and I have about 50 thousand rows and about 15 features in my case. I haven't discussed this, but I do check for multicollinearity, which means features which are related, which are dependent. For normal
15:21
regression, like linear regression, multicollinearity will for sure blow up your model and you can't allow it in your model. For ensemble methods, for this particular algorithm, it doesn't matter, but you can still detect it and see which variables are correlated.
15:39
So this is basically how I did it here: I construct a matrix of Pearson product-moment correlations, that is just what it's called, and I plot it here, and I can find out which variables are correlated. The higher the multicollinearity, the more intense the color. There is no rule, but a coefficient above 0.8 may indicate multicollinearity, so I look at whether the squares are red or not. Those variables I just drop from my model, and this one as well. OK, so I do the transformations and I do the split: 50 percent, 25 and 25 for each part.
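A correlation screen of this kind can be sketched with pandas. The column names and data below are hypothetical stand-ins. The 0.8 cutoff is the rule of thumb mentioned above, and dropping the later column of each highly correlated pair is one possible convention, not necessarily the one used in the talk.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(3)
n = 500
df = pd.DataFrame({
    "sst": rng.normal(size=n),
    "salinity": rng.normal(size=n),
    "wind": rng.normal(size=n),
})
# A deliberately redundant column, almost a copy of "sst".
df["sst_lagged"] = df["sst"] + rng.normal(scale=0.1, size=n)

# Absolute Pearson product-moment correlations between all column pairs.
corr = df.corr(method="pearson").abs()
# Keep only the upper triangle so that each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]
reduced = df.drop(columns=to_drop)
print(to_drop, list(reduced.columns))
```

The `corr` matrix is also what a heatmap of the kind described, with more intense colors for higher correlations, would be drawn from.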
16:24
Then I fit my model. OK, so these are the final parameters, but it took me quite a few iterations to be satisfied. And how did I find out how many estimators I need here? The usual rule
16:40
is to set the learning rate as low as possible and the number of estimators, the number of trees, as high as possible. If you do
16:47
that, your model will run forever, so you should end up with something feasible, and you can start playing around by reducing the number. How I found out that
16:56
500 is enough: I
16:58
applied a method which is called early stopping, on the validation set. It comes a little bit
17:06
later on.
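One way to choose the number of iterations from a validation set is to track the validation error after every added tree and keep the best stage. The sketch below assumes this reading of "early stopping" and uses scikit-learn's `staged_predict` on synthetic data; the exact procedure in the talk's notebook is not shown in the transcript.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(4)
X = rng.uniform(size=(1500, 4))
y = 25 + 2 * np.sin(5 * X[:, 0]) + X[:, 1] + rng.normal(scale=1.0, size=1500)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=4)

model = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.05, max_depth=3,
    subsample=0.75, random_state=4)
model.fit(X_train, y_train)

# staged_predict yields the prediction after 1, 2, ..., n_estimators trees.
val_errors = [mean_squared_error(y_val, pred)
              for pred in model.staged_predict(X_val)]
best_n_estimators = int(np.argmin(val_errors)) + 1
print(best_n_estimators, val_errors[best_n_estimators - 1])
```

scikit-learn also offers built-in early stopping through the `n_iter_no_change` and `validation_fraction` parameters, which stops training itself rather than selecting a stage afterwards.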
17:11
OK, so yes, train and predict: I just push a button. OK, and the same graph again; you've seen it before. And
17:38
what's interesting to show quickly is the other part. This is the early stopping which I mentioned earlier, which was part of the three-way split.
17:46
A two-way split is something which is, in my opinion, more common. Here,
17:53
in the two-way split, I only have train and test sets; I don't have a validation set. And to identify
18:00
the parameters for this part I used a grid search. You specify a range of parameters; you can specify as many as you like, but I only did the regularization parameters, because those are the most difficult ones. So I specify here the max depth, for which I know I had 6, so I want to go a bit to the right and to the
18:17
left, and I know from theory that it shouldn't be higher than 8, so I don't go there. OK, and the learning rate: I had 0.05, and I want to include some other values and see how
18:28
it works. What happens is that you get a matrix of different combinations of parameters, and each time the model is fitted, over and over, and eventually the combination which gives the highest accuracy is chosen, and it tells me which parameters are the best.
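A grid search over the two regularization parameters can be sketched with scikit-learn's `GridSearchCV`. The grid values, the number of folds and the scoring choice below are assumptions for illustration, and the data is synthetic. For a regression model, the "accuracy" being maximized is a score such as the negative mean squared error.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.RandomState(5)
X = rng.uniform(size=(400, 4))
y = 25 + 2 * np.sin(5 * X[:, 0]) + X[:, 1] + rng.normal(scale=1.0, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=5)

param_grid = {
    "max_depth": [4, 6, 8],              # around the value of 6, capped at 8
    "learning_rate": [0.01, 0.05, 0.1],  # around the value of 0.05
}
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, subsample=0.75, random_state=5),
    param_grid,
    cv=3,                                # assumed number of folds
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)  # fits every combination on every fold
print(search.best_params_, search.score(X_test, y_test))
```

After fitting, `search.best_params_` holds the winning combination and `search.best_estimator_` is the model refitted with it on the whole training set.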
18:46
You can see the output is the best hyperparameters: it says that the learning rate should be about 0.1 instead of 0.05, and the max depth can be a bit more shallow, but it's very
18:57
close. If I fit the model with those parameters and keep all other parameters
19:04
the same, what I
19:07
get is very similar results. Again, here you can see
19:10
it's about 50 percent for the training and test data, which is good. OK, so again we see the same graph, which is good. It means
19:19
I have the same algorithm, but I applied it to different types of data partitioning. One time I did a three-way split and used early stopping to find the number of iterations, and the other time I split into two parts and used grid search to find the best parameters. I changed the parameters and still my model does, I think I have to finish, my model does give similar
19:43
results, which is good. OK, I think I'll stop here, because it's not doing well. So thank you very much for your attention. If you have questions,
20:01
yes? Did you compare
20:09
the results with a normal random forest
20:11
or trees? I
20:13
ran some results with just a normal random forest, and the mean squared error is slightly higher, but I didn't show it because I was a bit stressed with all this. I also did an ordinary least squares regression, and again it shows that that model does less well. So yes. OK. And do you
20:35
have the data online? Yes, you can see it; you
20:40
put my ears on them right so
20:42
it's a this thing to person number of
20:51
So when you said there is a link between the temperature and the fish size, does that help you, then, to decide where more research is needed? Yes, and basically this was a
21:03
kind of pilot idea. I was very interested to find out, because this is time series data for 60 years, which is very unique in science, a long-term collection. So I wanted to find out which variables are more important, so that we can reduce our dataset to the more relevant ones, and then I could take it into some kind of time series analysis.
21:27
object and you have some sort of 4 quality of research efforts and those you that was blamed machines
Metadata
Formal metadata
Title  How can machine learning help to predict changes in size of Atlantic herring?
Series title  EuroPython 2016
Part  160
Number of parts  169
Author 
Lyashevska, Olga

License 
CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use and adapt the work or its content for any legal and non-commercial purpose, and copy, distribute and make it publicly available in unchanged or adapted form, provided that you credit the author/rights holder in the manner they specify and pass on the work or its content, including in adapted form, only under the terms of this license.
DOI  10.5446/21202
Publisher  EuroPython
Release year  2016
Language  English
Content metadata
Subject area  Computer science
Abstract  Olga Lyashevska: How can machine learning help to predict changes in size of Atlantic herring? This talk is a case study of how Python (Pandas, NumPy, scikit-learn) can be used to identify the influence of the potential drivers of a decline in size of Atlantic herring populations using Gradient Boosting Regression Trees. A decline in size and weight of Atlantic herring in the Celtic Sea has been observed since the mid-1980s. The cause of the decline remains largely unexplained but is likely to be driven by the interactive effect of various endogenous and exogenous factors. The goal of this study is to interrogate a long time series of biological data obtained from commercial fisheries from 1959 to 2012. We use gradient boosting regression trees to identify important variables underlying changes in growth from various potential drivers, such as: Atlantic multidecadal oscillation; sea surface temperature; salinity; wind; zooplankton abundance; fishing pressure. This learning algorithm allows us to quantify the influence of the potential drivers of change, with a lower test error than other supervised learning techniques. The predictor variable importance spectrum (feature importance) helps to identify the underlying patterns and potential tipping points while resolving the external mechanisms underlying observed changes in size and weight of herring. This analysis is a useful case study of how Python can be used in academia. The outputs of the analysis are of relevance to conservation efforts and sustainable fisheries management which promotes species resistance and resilience.