Predicting Titanic Survivors with Machine Learning

Video in TIB AV-Portal: Predicting Titanic Survivors with Machine Learning

Formal Metadata

Predicting Titanic Survivors with Machine Learning
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
What's a better way to understand machine learning than a practical example? And who hasn't watched the 1997 classic with Jack and Rose? In this talk we will first take a look at some real historical data of the event. Then we will use amazing Python libraries to live code several of the most well known algorithms. This will help us understand some fundamental concepts of how machine learning works. When we're done, you should have a good mental framework to make sense of it in the modern world.
[Opening unclear.] I'm super happy to be here. I've been using Ruby on Rails for ages and it feels great to be here. As someone said yesterday, you should follow [unclear]; I don't know, I think you can just follow me just as well. So yes, I'm actually Italian: I can speak Italian, I can cook Italian dishes, and I think that makes me pretty much Italian. I started with Ruby in 2010, and in the last couple of years I've been doing mostly [unclear]. I got really interested in machine learning, and I climb a lot. I'm not a good climber, I just climb a lot. I think we're trying to organize a climbing session tomorrow, so if anyone is interested, just follow me on Twitter. OK, cool.
Right now I'm working at a company in London called Erlang Solutions, and they were very kind to send me here, so I'm going to tell the world about how awesome they are, because they were the first company to actually believe in Erlang as a key technology, in 1999, which seems like a century ago. And of course they love Elixir a lot, and all sorts of technologies which run on the BEAM, and all this sort of stuff. But OK, that was my business plug.
OK, so my goal today is to do this live, so let me just first ask you something: is this crystal clear? Can you guys in the back see this? Good. OK. As I go through this, if you can't see it, let me know.
But before I start I just want to show you my desktop background. It's really nice, because there's this Photoshop job in the front, with such a high render quality, and then you have these guys in the back who aren't even looking at each other, like, what's the point? I don't know. I hope you're familiar with the movie; if you're not familiar with the movie, raise your hand. Right. Anyway, what do we want to do today? First of all, there's this amazing wallpaper which I showed you before, but there's also this CSV file, and this is actually
real historical data of the Titanic passengers. There are 892 lines, minus one for the header, so that's 891 passengers, and for each one of these passengers we have all this information: the passenger ID, whether they survived, the passenger class, their name, their sex, their age, the number of siblings and spouses, the number of parents and children, the ticket they bought, the fare (so how much they paid for it), the cabin, and where they embarked on the ship.
What we're going to do today is, well, I have to admit the main goal is just to make sure the movie didn't lie to us. That was my main scientific goal when I started on this adventure. To do that, we're going to use some Python libraries. I think the speaker before me already explained why Python is the way to go in machine learning, and I think that's right: there are some really amazing libraries, and I hope I can show you just how good they are.
So now, to start, I'm going to create a new Python file, and we're going to use this library called pandas, which is an amazing name because everyone loves pandas, and it's one of the best libraries I've ever seen for dealing with, you know, CSV files for example. I'm also going to import matplotlib.pyplot, which is a visualization library. Pandas has this concept of a DataFrame: whenever you load your CSV, it just gets converted into one of these DataFrames under the hood, which I'll call df. I use pandas just to read the CSV file, and it parses it automatically. So, for example, if you want to see a little bit more about the distribution of the survival rate, you print df.Survived, an attribute which is automatically created by the library, and then just call value_counts on it. Again, feel free to interrupt me if anything is unclear.
If I just run this, you'll see that, of all these lines we had, we only have 342 survivors, and the rest unfortunately did not survive. But it would be nice if you could actually see this, right? To do that we just call plot and say what kind of graph we want, which is going to be a bar, and pass a bit of alpha. And actually I should create the figure first, so I'm just going to call plt.figure and pass a figsize of 10 by 6, I think. At the end I'm just going to call plt.show. So hopefully this is going to work... and there it is.
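The steps just described can be sketched roughly as follows. This is a minimal sketch, not the speaker's actual file: the rows are made up so that the snippet runs without the real CSV, and the file name `train.csv` and the column name `Survived` are assumptions based on what is said in the talk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the real CSV (assumption: a "Survived" column of 0/1).
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,3,male,35
5,0,2,male,54
"""
df = pd.read_csv(io.StringIO(csv_text))  # with the real data: pd.read_csv("train.csv")

counts = df.Survived.value_counts()  # number of rows for each value of Survived
print(counts)

fig = plt.figure(figsize=(10, 6))
counts.plot(kind="bar", alpha=0.5)
plt.close(fig)  # in an interactive session, call plt.show() here instead
```

pandas exposes each column as an attribute, which is why `df.Survived` works without any extra setup.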
With just two lines we basically transformed this information into this graph. Of course, as humans we can't reason too well with these raw numbers, so I want to see the percentages, and in order to do that I just have to write normalize=True here. If I do that and run Python, you'll see this is what happens. So we can see that in our dataset about 40 per cent of the people survived and 60 per cent unfortunately did not.
But what's better than a single graph is to have loads of graphs, and in order to do that we're going to use subplot2grid, which basically creates a grid of subplots. Another thing is just to set the title for this graph, and we're going to say "Survived". What this structure means is that this is a rectangle with 2 rows and 3 columns, and this is the first cell. So if I run this, you'll see that it basically just puts the graph up there. I really hate the fact that the graphs are not full screen, so I have some really nice code, two lines of code, which fixes that, so I'm just going to paste them in and run it again.
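A minimal sketch of the percentage view and the grid placement described above, on made-up values. Only `value_counts(normalize=True)` and `plt.subplot2grid` are taken from the talk; the two lines that make the window full screen are not shown in the recording, so they are left out here.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; only the column name follows the talk.
df = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})

fig = plt.figure(figsize=(10, 6))

# Treat the figure as a 2 x 3 grid and draw into its first cell (row 0, column 0).
ax = plt.subplot2grid((2, 3), (0, 0))
rates = df.Survived.value_counts(normalize=True)  # fractions instead of raw counts
rates.plot(kind="bar", alpha=0.5, ax=ax)
ax.set_title("Survived")

print(rates)
plt.close(fig)  # in an interactive session, call plt.show() here instead
```

Every further graph is then one more `subplot2grid` call with a different cell position.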
So now we can keep making more graphs with the good old copy-paste. What we want to do now is see if there is a relationship between the survival rate and age. There is this tool called a scatter plot which will show you this, and the only thing we need to do is replace these two lines: we take the survival rate and compare it to the age, and I'm going to pass a little bit of alpha here as well, just because otherwise the dots would just clump together. And here the title is "Age with regards to the survival rate". OK, and if I run this we'll see that, and this was quite unexpected for me when I started on this, there's no apparent connection between age and the survival rate. You can see that the main clumps of people are between 20 and 40, both on the left-hand side and on the right-hand side. As I said before, 1 means survived and 0 means passed away. If we look more closely, we can see that the older people have mostly passed away, and there are younger people here who might have survived, but other than that the two distributions are too alike to allow us to draw any conclusions. OK, so something else we might want to take a look
at is the distribution of the passenger classes. So here I'm just going to change this to the next cell, and here, instead of Survived, I just enter the passenger class, and the same here. If I run this, we'll see that most of the passengers were in the 3rd class, then in the 1st class, and then in the 2nd class. I think this is pretty much what we expected, because there were more people who couldn't afford the more expensive tickets.
Something else we might want to look at is the relationship between the age of the passengers and the class of ticket they were able to buy. In order to do that I'm going to do something which will probably make you very, very afraid if you worked with HTML tables at some point: colspan. Anyway, we're going to use this feature of Python which is a list comprehension, so I'm just going to say, basically, 1, 2, 3, and for each one of those we want to display the age when the passenger class is equal to that number. What's in the square brackets is just a filter: you want to extract the ages, but only for the rows whose passenger class is x. For each one of those I'm going to create a new graph, in this case what is called a kernel density estimation. You can look up on Wikipedia afterwards what that actually means, but it looks pretty, so that's why I picked it. Here the title is going to be "Age with regard to the passenger class", and I'm just going to add a little legend to the graph so that you can actually tell what is going on: 1st, 2nd, 3rd. OK.
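The scatter plot and the per-class density curves described above might look roughly like this. It is a sketch on made-up rows: the column names `Age` and `Pclass`, the titles and the grid positions are assumptions, a plain `for` loop stands in for whatever comprehension was typed on screen, and pandas' `kind="kde"` plot relies on SciPy being installed.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; the column names are assumed to match the CSV.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1, 2, 2, 1],
    "Age":      [22, 38, 26, 35, 54, 58, 20, 14, 47],
})

fig = plt.figure(figsize=(10, 6))

# Scatter plot: survival (0 or 1) on the x axis, age on the y axis.
plt.subplot2grid((2, 3), (0, 1))
plt.scatter(df.Survived, df.Age, alpha=0.1)  # low alpha keeps overlapping dots readable
plt.title("Age wrt Survived")

# One density curve per class; colspan=2 stretches the cell over two grid columns.
ax = plt.subplot2grid((2, 3), (1, 0), colspan=2)
for passenger_class in [1, 2, 3]:
    # The boolean mask in square brackets keeps only the ages of that class.
    df.Age[df.Pclass == passenger_class].plot(kind="kde", ax=ax)
ax.set_title("Age wrt Pclass")
ax.legend(("1st", "2nd", "3rd"))

plt.close(fig)  # in an interactive session, call plt.show() here instead
```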
If I run this, you see this really nice graph, right? We can see that the 3rd class passengers are way younger, with an average age of around twenty years old; the 2nd class passengers are around 30 years old, and the 1st class passengers around 40. So as they get older they get richer, and therefore they can buy more expensive tickets. Looking at the data was also really interesting for understanding more about the movie, because there is a glaring historical mistake, which is that, as I discovered, the ship actually made two stops. It started in Southampton, but it made two other stops, and we can see that by taking a look at the Embarked column. I'll just put it in cell (1, 2), I think that's OK, and if I run this, you see that 70 per cent of our dataset embarked in Southampton in England, but then the ship made a stop in France, in Cherbourg, and then another stop in Ireland, in Queenstown.
I think it's quite cool. I think it's quite nice that with these 15 lines of Python code you can actually drill down into the dataset, try to put some of these values in correlation and just see what's going on. Instead of building all of this yourself, which would take you at least a week or something, using these libraries just takes five minutes. I think that's pretty powerful. Something which is missing from these graphs is the gender of the passengers, and the reason it's missing is that I think it's quite important, so I think it
deserves its own dashboard. So I'm going to create the next file, gender.py, copy a couple of things over, and actually I don't really need most of these graphs, so I'm going to comment this stuff out and just leave this one. What I want to do now is take a look at the difference in survival rates between men and women. I want to make more graphs, so I'm just going to do this quickly. Here I want to show the survival rate, but only when the sex is male, and here I write "Men survived". Then I'll do the same thing here, and I'll just say sex is female, and here it will be "Women survived". To make things a little bit nicer, I'll just create a color for female, just to make it something different, and pass it as the color. I think this looks good.
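A sketch of the two filtered graphs described above, on made-up rows. The variable name `female_color`, its value and the grid layout are assumptions; the talk only says that a separate color is created for the women's graph.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; the column names are assumed to match the CSV.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    "Sex": ["male", "female", "female", "male", "male",
            "female", "male", "male", "female", "male"],
})

female_color = "#FA0000"  # hypothetical color, only there to tell the graphs apart

fig = plt.figure(figsize=(10, 6))

# Survival fractions for men only: the mask df.Sex == "male" selects their rows.
ax_men = plt.subplot2grid((1, 2), (0, 0))
men = df.Survived[df.Sex == "male"].value_counts(normalize=True)
men.plot(kind="bar", alpha=0.5, ax=ax_men)
ax_men.set_title("Men survived")

# Same graph for women, drawn in the other cell with its own color.
ax_women = plt.subplot2grid((1, 2), (0, 1))
women = df.Survived[df.Sex == "female"].value_counts(normalize=True)
women.plot(kind="bar", alpha=0.5, color=female_color, ax=ax_women)
ax_women.set_title("Women survived")

plt.close(fig)  # in an interactive session, call plt.show() here instead
```

`value_counts` sorts the bars by frequency, so the first bar is not always the value 0. That is why the labels under the bars matter when the graphs are read.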
In the graphs, the bars have pretty much the same shape: in total about 40 per cent of the people survived, the men have around a 20 per cent survival rate, and the women's graph looks quite similar, unless you look at the numbers underneath: here this is 1 and this is 0, not the other way round. So actually 70 per cent of the women survived, at least in this dataset. But by itself this information doesn't mean that much, because if you think of a room with a hundred men and one woman, this information by itself is not really significant. So maybe we can take a look at whether that's the case or not. To do that, we just take the same code we used here and say: just show me the sex. Run it again and... I think I made a mistake somewhere, and I should have updated this to 3. There, and we can actually see that, OK, there are more men, but the difference is not that big: it's 35 per cent versus 65 per cent, so the dataset is quite balanced in that way, and what we saw is not an inconsistency of the dataset. So there must be something else, right? Before, we looked at the passenger class distribution correlated with the age, and we can do the same thing, correlating the passenger class with the survival rate. So I'm just going to
uncomment this code we had before, and here, instead of the age, I'm going to take a look at the survival rate. I think the rest can remain the same, apart from this: I'm just going to make the colspan larger. OK, cool.
And here we have the graph. You can see on the left that the passengers in the 3rd class had a much lower survival rate, while the passengers in the 1st class have a good survival rate. I think this is quite cool, because in the first row we can see a difference between genders, and here there's a difference between passenger classes. So maybe what we can do is try to combine the first two rows and see whether there's some striking feature of the data.
In order to do that I'm just going to copy this. Don't worry about the code, I'll put it online; I think it's already online, actually. What I want is to take a look at the survival rate of all the men, but I also want to add another condition, so I'm just going to use this & and say that the passenger class is equal to 1, and here the title is "1st class men survived". Then I'm going to do the same thing, but I'm just going to move this to the next cell and change this to 3, and this is going to be "3rd class men survived".
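Combining two conditions with `&`, as described above, can be sketched like this on made-up rows. The column names are assumptions, and the plotting calls are left out to keep the filter itself in focus.

```python
import pandas as pd

# Made-up rows; the column names are assumed to match the CSV.
df = pd.DataFrame({
    "Survived": [1, 0, 0, 0, 0, 1, 1, 1],
    "Pclass":   [1, 1, 3, 3, 3, 1, 3, 3],
    "Sex": ["male", "male", "male", "male", "male", "female", "female", "male"],
})

# Each comparison needs its own parentheses, because & binds tighter than ==.
first_class_men = df.Survived[(df.Sex == "male") & (df.Pclass == 1)]
third_class_men = df.Survived[(df.Sex == "male") & (df.Pclass == 3)]

first_rates = first_class_men.value_counts(normalize=True)
third_rates = third_class_men.value_counts(normalize=True)
print(first_rates)
print(third_rates)
```

The same pattern covers the women's graphs by swapping `"male"` for `"female"`.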
If I run this, you see that it pretty much confirms our suspicions from before: the 1st class men have a 35 per cent survival rate and the 3rd class men have a little over 10 per cent. If instead we want to do the same thing for women, I'm just going to copy this. There's a lot of copy-pasting in this talk, it's insane, but one could say the same thing about programming in general. So there's the color, which is the female color, and here I'm just going to say "1st class women survived", and the last one is going to be female and 3rd class, "3rd class women survived".
And I think this is quite striking, because of what you can see for the 1st class women. I had to check the dataset when I first ran this, because I couldn't believe it: there are [unclear] 1st class women and [unclear] of them survived in the data. For the 3rd class women the distribution is more even, like 50/50. But especially with regard to my initial scientific goal of this exploration, I can say that the movie is actually confirmed, so there is some historical accuracy in the movie, because this 3rd class man would be Jack and this 1st class woman would be Rose. So it's not really a surprise that Jack perished and Rose survived; it's not only a fictional mechanism, it's actually how things went.
It's super nice to be able to do this data analysis and visualize these things in this way, but what would be even cooler is to try to run some predictions, right? We all want to be wizards one day. So what we're going to do is create the most basic heuristic that we can, which is just to predict that if you're a woman you're going to survive the Titanic, and I'm sorry for all the male audience in the room. So we're just going to
create a new file, predict.py, and import pandas as pd. Here I'm going to call this DataFrame train, because in machine learning the training set is the initial data that you use to build a model. So here we read the training CSV with pd.read_csv. What I want to do is create a new column, and I'm going to do that just by doing this: in this way it creates a new column for our hypothesis, and I initialize the whole column to 0, except that when some condition applies I want that column to be 1. So I use this function called loc. It's not really a function, but anyway, this thing takes two arguments: the first one is the condition and the second one is the column you want to update. The condition is Sex == "female", the column I want to update is the hypothesis, and I want to set that to 1. In this way we basically created our first guess of how to predict the outcome that we want to predict.
But now we want to track how accurate our prediction is, so I'm going to do something quite similar: I'm going to create a new column called result, and inside result I'm going to track the survival against our hypothesis, and here I set the result to 1. Now we can do the same thing we did before: we just extract the result column, run value_counts on it and print it. The name df is not defined; I think it's because I called it train, sorry. Right, and we can see that this thing was correct 701 times and wrong 190 times.
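The heuristic and its bookkeeping could be sketched as follows, again on made-up rows. The column names `Hyp` and `Result` are assumptions; the talk only says that one new column holds the hypothesis and another one records whether it matched.

```python
import pandas as pd

# Made-up rows standing in for the training CSV.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    "Sex": ["male", "female", "female", "male", "male",
            "female", "male", "male", "female", "male"],
})

# Hypothesis: every woman survives, every man does not.
train["Hyp"] = 0
train.loc[train.Sex == "female", "Hyp"] = 1  # .loc[row condition, column] = new value

# Result is 1 where the hypothesis agrees with what actually happened.
train["Result"] = 0
train.loc[train.Survived == train["Hyp"], "Result"] = 1

print(train["Result"].value_counts(normalize=True))
```

With the counts quoted in the talk, the same ratio is 701 / (701 + 190), which is roughly 0.79.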
As a percentage that is something which I can't remember exactly; I think it's around 78 per cent. So if you consider that when you're only guessing you have an accuracy of 50 per cent, you just improved your heuristic by 28 percentage points with a very simple guess. And you got to that guess because you took a look at the data to try to understand a little bit of how the data works. Of course, if you use more advanced algorithms you will be able to improve that number, but I think it's quite cool just to see the huge difference between having a basic understanding of the data and not having one.
So I think this is cool, but something which is cooler is to let the machine do the whole thing, right? Luckily there is an ensemble of libraries which is super well known, called scikit-learn, and inside scikit-learn there are basically loads of machine learning models. What we're going to do is use a model which is called a linear model, so this file is going to be linear.py. I just want to quickly explain to you how the model works.
So if you see a graph like this, where these are the data points of our dataset, as humans we can guess: OK, the data is exhibiting a sort of trend, right? And I think it's quite simple; you could ask my cousin, who is five years old, and he would get it. But the problem is that real-life datasets have way more features than two. For example, if you're doing image recognition, every pixel of the image is a different feature, so if you're analyzing images which are 50 by 50 pixels you're facing a problem which has 2,500 dimensions. If I were to ask you to solve the same problem in 2,500 dimensions, I don't really know who would be able to do that. But the advantage of having a numerical approach which does the same thing is that, apart from performance of course, the numerical approach doesn't care: it just solves the problem all the same. I think that's why most people think that machine learning is quite a scary thing, because it tries to condense a sort of human knowledge into some numbers somewhere, and there's not really a good explanation for us of what that means. But no, I don't really want to scare you with this
topic, so I'm just going to import pandas as pd again, and train is going to be pd.read_csv of the training CSV. There are some helper functions I wrote before, because I don't really want to bore you with these details, but basically, of course, one of the things you always have to do with data is clean it up. For example, in this case some rows don't have the fare information and some rows don't have the age information, so I'm just filling them back up with the average value. Another thing is that most of these numerical approaches work really well with numbers but they can't work with strings, so I'm just converting the sex to a number: if you're male you're going to be 0 and if you're female you're going to be 1. The same thing applies to the embarked information. So here I'm just going to import utils and call its clean_data function on the data.
How most of these algorithms work is that you basically tell the algorithm: these are all the inputs, and there is one output. The inputs are called features, so what we have to do is extract these features. I want to use the passenger class, the age, the sex and, let's say, the fare. I want to extract those values, and then I want the target, which is the survived information, and I'm going to extract that as well. Since this algorithm is going to try to decide whether a row goes into the survived bucket or the deceased bucket, it's usually called a classifier. Here I'm going to use scikit-learn: from sklearn I import linear_model, and this linear model has this little thing called logistic regression, which is basically the same thing I described before, just trying to figure out what is a good line to separate the data. It's super simple: you take this classifier and say, OK, fit these features against these targets, and then print me the score of these features against this target. And if I run that, it's about 79 per cent.
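A minimal sketch of the cleaning and fitting steps described above. The `clean_data` helper here is a hypothetical stand-in for the speaker's own helper, which is not shown in the recording; the rows are made up, the feature list follows what can be made out from the talk, and `max_iter=1000` is only added so that the solver converges quietly on unscaled values.

```python
import pandas as pd
from sklearn import linear_model

# Made-up rows with a few missing values, standing in for the training CSV.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1, 0, 1],
    "Pclass":   [3, 1, 3, 3, 2, 1, 3, 2, 1, 2],
    "Sex": ["male", "female", "female", "male", "male",
            "female", "male", "female", "male", "female"],
    "Age":  [22, None, 26, 35, 54, 58, 20, 14, 47, None],
    "Fare": [7.25, 71.28, 7.92, 8.05, 13.0, 26.55, None, 30.07, 52.0, 16.0],
})


def clean_data(data):
    """Hypothetical helper: fill gaps with the column mean, turn Sex into 0/1."""
    data = data.copy()
    data["Fare"] = data["Fare"].fillna(data["Fare"].mean())
    data["Age"] = data["Age"].fillna(data["Age"].mean())
    data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
    return data


train = clean_data(train)

target = train["Survived"].values                           # the single output
features = train[["Pclass", "Age", "Sex", "Fare"]].values   # the inputs

classifier = linear_model.LogisticRegression(max_iter=1000)
classifier.fit(features, target)

# Accuracy on the same rows the model was fitted on.
print(classifier.score(features, target))
```

Note that `score` is computed here on the training rows themselves, so it says how well the model fits the data it has seen, not how it would do on new passengers.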
Notice that I didn't tell the algorithm anything at all. It just figured out: OK, these are the inputs, this is the output that this guy asked me to figure out, I'm just going to do my best. And the model that it constructed was a little bit better than our naive intuition. Of course, a linear model isn't always the right answer, because we know that in real life there are problems which are nonlinear. If I asked you to describe this data, you wouldn't say it's straight, right? I mean, unless you have a thing for straight lines; then maybe you'd do something like this, which is not too bad, I'm not judging. But usually you'd go for something like this.
Luckily, machine learning experts recognized this limit of straight lines as well, and they created this module called preprocessing, where you can basically manipulate your features and transform them with a sort of polynomial transformation. So you can make them quadratic if you want to, and it will just take the data, combine the columns by multiplying them together, and create new columns for you. What I'm going to do is create one of these transformers for polynomial features, and you have to pass the degree, so in this case I'm just going to try out the quadratic. Now we can transform our existing features using fit_transform. So basically we had these features, and in this way we transform them into a sort of quadratic version. And now we do the exact same thing, which is to say: OK, fit these poly features against the target, and then print me the score of this classifier for the poly features and the target. And it already improved a little bit. I think the mental model is that now the algorithm is trying to match against data which behaves in a quadratic way.
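The polynomial step might be sketched like this. `PolynomialFeatures` with `degree=2` and `fit_transform` are taken from the talk; the feature values are made up, and the small `demo` array is only there to show which columns the transformer generates.

```python
import numpy as np
from sklearn import linear_model, preprocessing

# What the transformer produces for two features a=2, b=3:
# the columns are 1, a, b, a*a, a*b, b*b.
demo = preprocessing.PolynomialFeatures(degree=2).fit_transform(np.array([[2, 3]]))
print(demo)

# Made-up numeric features (class, age, sex, fare) and made-up targets.
features = np.array([
    [3, 22.0, 0, 7.25],
    [1, 38.0, 1, 71.28],
    [3, 26.0, 1, 7.92],
    [3, 35.0, 0, 8.05],
    [2, 54.0, 0, 13.0],
    [1, 58.0, 1, 26.55],
    [3, 20.0, 0, 7.9],
    [2, 14.0, 1, 30.07],
])
target = np.array([0, 1, 1, 0, 0, 1, 0, 1])

poly = preprocessing.PolynomialFeatures(degree=2)
poly_features = poly.fit_transform(features)
print(features.shape, "->", poly_features.shape)

# Same classifier as before, now fitted on the expanded columns.
classifier = linear_model.LogisticRegression(max_iter=1000)
classifier.fit(poly_features, target)
print(classifier.score(poly_features, target))
```

The model itself is still linear; it only looks quadratic because it is fitted on squared and multiplied columns.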
usually if you add like more information like the algorithm will behave better so for example I can add the number of potential trend in the number of working spouses and I think that should be enough and try to run again and you can see that even more information to the algorithm the algorithm is able to like you know make better predictions again on and all this in like
20 lines of code and it's pretty cool arm coverage time have left 6 minutes identical so I just want to show you 1 more thing that which is a or circle decision trees so basically that these organisms which which are going to build this decision trees so for each row in our society is going to run through a series of questions and then depend on the outcome of these questions were going to classify are you wrote so our limit justice and good
things I predict a tree pi and no I don't think my taste worked but let me try again yes
Instead of linear_model I'm going to import tree. I think we don't need this, so I'm just going to change this to a DecisionTreeClassifier. All the rest remains unchanged. This way the algorithm is going to build a tree and try to match the data with that.
If you're a normal person and you see these numbers, you're like, wow, this is so cool. But if you're a software developer, you know there's something wrong, because we all know there is no such thing as a free lunch. This looks way too good to be true.
What the algorithm did in this case is a phenomenon which in machine learning is called overfitting. Basically there are patterns in the data, and the algorithm is trying to find a very complex solution which matches all the data points. That's why, when we afterwards ask it about those same data points, everything comes out great.
Luckily, data scientists recognize this problem as well, and it is quite a big problem in machine learning. So we have to figure out ways to make the algorithm learn the general properties of the system, not the specific ones. There is this little thing called model_selection, and if you use it you can basically say: OK, take this model and hide some data from the algorithm, so that it builds a model which is more generic. I'm going to use a function called cross_val_score; you can explore later what that actually means. I'm just going to pass the features and the target. You also have to pass a scoring methodology, which I'm just going to set to accuracy. This is not good, don't do it, but I think it's simple enough. And I'm going to say: OK, run this process 50 times. So this thing is going to randomly subsample the dataset, hide part of it from the algorithm, run it again and see how it behaves, right? So if I print the scores and their average,
you see that, if I hide data from the algorithm, the actual average of the scores is quite a bit lower. That is to be expected, because we were building a solution which is too specific, so it can't really work.
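A minimal sketch of that comparison, assuming synthetic stand-in data rather than the passenger data from the talk. With an integer cv, cross_val_score splits the rows into that many folds, hides each fold from the model once, and scores on the hidden fold.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (NOT the real passenger list): five numeric
# features and a noisy target that only partly depends on them.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 5))
noise = rng.normal(scale=1.5, size=500)
target = (features[:, 0] + 0.5 * features[:, 1] + noise > 0).astype(int)

# An unconstrained tree keeps splitting until it matches every training row.
classifier = tree.DecisionTreeClassifier(random_state=0)
classifier.fit(features, target)
train_score = classifier.score(features, target)
print("score on the training data:", train_score)

# cv=50 makes 50 folds; each fold is held out once and used only for scoring.
scores = cross_val_score(classifier, features, target, scoring="accuracy", cv=50)
print("mean cross-validated score:", scores.mean())
```

The gap between the two printed numbers is the overfitting: the first one only measures how well the tree memorized rows it has already seen.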
Now, the good news is that there is a way to fix this, and it's basically
to tell the algorithm: well, don't be too smart. What I'm going to do is first of all pass random_state to the algorithm, and then specify a max_depth. So if you imagine this as a tree, the tree will never go more than 6 levels deep. Then I'm going to pass this other little thing called min_samples_split, which basically controls how many elements have to get into a branch before the tree decides to split it. It's not super important, but I think it's nice. If we run the same code again, you'll see the difference: the initial performance wasn't as great, but the general average is much better. In this way we create a model which is more resilient. We were cheating before, because we were scoring on the same data we trained on. In real life you train the model and then score it on new data, so it has to work well for data it has never seen before.
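Here is a rough sketch of constraining the tree, again on synthetic stand-in data. max_depth=6 is the value mentioned in the talk; min_samples_split=20 and the helper evaluate are assumptions made for this example.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (NOT the real passenger list), noisy on purpose.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 5))
noise = rng.normal(scale=1.5, size=500)
target = (features[:, 0] + 0.5 * features[:, 1] + noise > 0).astype(int)


def evaluate(classifier):
    """Return (training accuracy, mean 50-fold cross-validated accuracy)."""
    train_score = classifier.fit(features, target).score(features, target)
    cv_scores = cross_val_score(
        classifier, features, target, scoring="accuracy", cv=50
    )
    return train_score, cv_scores.mean()


unconstrained = tree.DecisionTreeClassifier(random_state=0)
# min_samples_split=20 is an assumed value, chosen only for illustration.
constrained = tree.DecisionTreeClassifier(
    random_state=0, max_depth=6, min_samples_split=20
)

free_train, free_cv = evaluate(unconstrained)
tight_train, tight_cv = evaluate(constrained)
print("unconstrained: train", free_train, "cross-validated", free_cv)
print("constrained:   train", tight_train, "cross-validated", tight_cv)
```

The constrained tree can no longer memorize every row, so its training score drops; what matters is how close the cross-validated score stays to it.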
But I still think this is quite abstract. Luckily there is a really nice utility inside the tree library which lets you export the tree to a Graphviz file, export_graphviz. You pass it the tree and the feature names, which I don't have yet, so I'm going to have to add that, and the out_file, tree.dot. This should work.
If I list the files, there is a file called tree.dot. The format is a little weird, but I can convert it to a PNG file with dot, into tree.png. If I open this file, you'll see that this is a really
nice graph. This is the actual tree that the algorithm built, just visualized, and these are all the decisions that the tree takes in order to classify the rows. You can see that the top-level decisions are the most important decisions, right? It's not a coincidence that the first decision the algorithm makes is: is the passenger a man or a woman? Then, on one branch, is it a child, that is, is the age of the passenger less than 6.5? And the other immediate decision is: what is the passenger class? So basically the algorithm figured out a lot of the things which we figured out on our own. But as I said, this thing can do it in, you know, four thousand dimensions, so, you know, that will
be and there's a plane
at rules under room I
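The export step from the last part of the talk could look roughly like this. The column names and the data are hypothetical stand-ins (the labels are random), so the printed tree says nothing about real passengers; export_text is an addition that was not used in the talk.

```python
import numpy as np
from sklearn import tree

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
feature_names = ["pclass", "sex", "age", "sibsp", "parch"]
features = np.column_stack([
    rng.integers(1, 4, size=300),   # pclass: 1, 2 or 3
    rng.integers(0, 2, size=300),   # sex encoded as 0 or 1
    rng.uniform(0, 80, size=300),   # age in years
    rng.integers(0, 4, size=300),   # sibsp
    rng.integers(0, 4, size=300),   # parch
])
target = rng.integers(0, 2, size=300)  # random labels, for illustration only

classifier = tree.DecisionTreeClassifier(random_state=0, max_depth=3)
classifier.fit(features, target)

# Write the fitted tree in Graphviz DOT format.
tree.export_graphviz(classifier, out_file="tree.dot", feature_names=feature_names)

# The same structure as plain text, handy when Graphviz is not installed.
print(tree.export_text(classifier, feature_names=feature_names))
```

With Graphviz installed, running dot -Tpng tree.dot -o tree.png on the command line turns the DOT file into an image.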