
Everyone can do Data Science in Python

Speech transcript
Hello everyone, and welcome to our last session before lunch. Please join me in welcoming Ignacio Elola, who is going to talk about how everyone can do data science in Python.
Hi everyone, thanks for being here. My name is Ignacio Elola, and I'm going to talk a bit about how to do data science in Python and about what data science means to me. A quick summary of what we're going to do: first the why and the what, so why I'm here, a bit of an overview of what data science means to me and what flavour of data science I do; then we'll do a quick overview of the whole data science cycle, with some examples in Python, going from data collection, cleaning and processing all the way to using the data to predict some stuff. So first, that's me, and why I am not a typical software developer.
I studied physics actually, so I came from somewhere else entirely. From there I did some research in systems biology and complex systems; I was always very interested in how things interact with each other, and that drew my attention to big and small data not so long ago. I started coding in Python around three years ago, and bear in mind that my only previous coding experience was doing Fortran 77 at university. Not kidding, and it was not so long ago; they are probably still teaching Fortran 77 in physics. I obviously fell in love with Python very easily, and I became engaged in the start-up world of data science and those kinds of things. I'm also a huge advocate of pragmatism and simplicity, and you will see that in everything I'm talking about today. That's why this talk is also pretty much a beginner's talk on data science, because I believe you can start very easily and actually go a long way. For sure there are still problems that will need very clever people working on them for a long time, but most of the everyday stuff can actually be solved quite quickly right now. Somewhat contrary to that pragmatism, for the very first time I did all my slides in an IPython notebook, because I thought, you know, it's a Python conference, I'll give it a go and do all my slides using Python. It was not very pragmatic, but I'm actually quite happy with the result, even if it doesn't look as polished as if I had used PowerPoint or whatever. One more thing: I'm most of what stands between you and lunch,
so I will try to be a bit fast, because I'm also looking forward to the food and to all the interaction after the talks today. I also work at import.io. This is relevant because of some of the stuff I will be talking about, and also because of the kind of data I have and the kind of data science I do there. import.io is a platform that offers two different things. On one side there is a set of free tools for people to get data from the web, so you can do web scraping without having to code: you interact with a UI, and it lets you do a lot of things, build extraction models, and get data from the web, even crawlers and things like that, and thousands of people use it. On the other side there is an enterprise platform for just getting data, where we use that tool, among other things, to build very, very big datasets that we sell. I've been working there for a couple of years, first as a data scientist and more recently on the data services side, helping build the services that put those datasets together and deliver them to customers. Now, getting into the topic: what do we mean when we talk about data science? There's a lot of hype around data science, which obviously comes with good things and bad things. Among the good things, there are a lot of jobs, and you can get very well paid to do it. But there are also some bad connotations: the roles are usually ill-defined, so under the same job title you can find things that are really, really different, and the hype sometimes makes it actually quite hard to pin down what data science
is, and to define what we mean by it. So I'm going to talk about what the cycle of data science looks like for me, much as one could talk about any development cycle, and we'll see on the go what I mean by data science. I'm going to structure it around this nice picture, the hero's journey, which I took from Wikipedia. I'm not even sure of the original context of this image, whether it was about movies or books or whatever, but it's a very nice metaphor for most development cycles, and it works very well for data science. The stage labelled "call to adventure" in the diagram is what I call the problem to solve, or the business question. Everything has to start there: every piece of work we do in data science needs to start with a business question, with a problem we need to solve; otherwise you're just doing things for the sake of it. I will come back to this same point two or three times during the talk, because it kind of obsesses me; I see a lot of projects fail on it. That's where my pragmatic side comes in: that's always the starting point. Then the threshold between the known and the unknown is when we actually start collecting and cleaning data to try to solve the problem or answer those questions. Next we need to do exploratory analysis, which is usually what drives us to some kind of revelation, where we can actually start to find insights and learn what we can and cannot do within the framework of the business we are working on. Then it comes down to predictions and machine learning, trying to use all that data to make some predictions. And the last step, but nonetheless important: we need to answer the questions that we set out to solve, or at least act on what we found. We also need to remember this is a cycle. When you arrive at your first model, it is just a first step towards making it better, a first step towards actually solving the problem. You might then realize that the model is not the correct one to use, or that you need to change the kind of queries you were doing; but as long as you have learned something from the first iteration of the cycle, you're going in the right direction. One thing I want to mention is that when we talk about data science, especially in blogs and books, we mostly focus on the models and algorithms, which is fine, because it's a lot of fun. People who came from a mathematical background, or from a
programming one, will get really deep into this kind of stuff, because we find it fun; myself included, I find it fun to play with algorithms and models and stuff like that. But actually, most of what we do in data science is not playing with those kinds of things. Many other tasks, like data cleaning or exploratory analysis, usually take much longer than playing with algorithms or tuning them, and a lot of the pitfalls are there too. Speaking of pitfalls, I think this is a very nice list of sentences, and I agree with most of them. I will just highlight a few: data is never clean; most of the time tasks will not require machine learning; and things that look really cool can actually be done with very simple techniques, as we will see. This is basically a list of things I believe. I didn't write it, and I hope I credited whoever did, but it's very pragmatic and I like it a lot; I think there's a lot of truth about data science in it. So, going back to the
cycle, here are some examples; let's try some stuff and see how far it goes. The cycle basically says: get data, process data, use data. That's like a mantra, but we have to be a bit careful with it, because if you follow it blindly you can be biased by the data you happen to have ("I have this kind of data, so I'm going to predict these kinds of things, because that's what I can do"), or biased by whatever algorithm you really want to play with right now. Those kinds of things happen all the time. Actually, you should be biased towards the business: saying, OK, I'm trying to solve this issue, I'm trying to predict this thing, so what data do I need for that, and what kind of model do I need to make the prediction? That's the right approach. Sometimes you might end up using that cool neural network; other times you might be doing a very simple regression, or just counting things, and that's fine. The goal, always, is that once you're done, something is going to change: something in the business, in how people use your product, in how you use your own product, or whatever. There has to be an action; if it's just a model for the sake of it, something is going wrong and you need to fix it. So, getting data. This is a very important part and I'm not going to stop long on it, but it matters because we can also be biased in how we get data, and few people talk about this. We can get data from an internal data store, maybe a MySQL database, and getting data then means running some SQL queries and pulling the result into your Python code, or into a file that you are then going to process and make predictions on. This is important because usually, once you're into the machine learning and doing the fun stuff, you don't think again about how you got the data; if you made a mistake there, or there's some kind of bias in how you got it, you will be conditioned for the whole rest of the cycle by that very first step. So you need to be sure that you're doing it right, or, if you're doing something questionable, write those question marks down so you know where to go back to if you ever need to review it. As I was saying, we can get data from internal sources, like a database with data about your customers, or from external sources, which for me (and I admit my bias here, because I work on these) can be things like web data you get from crawling. The next step is to process the data, and by processing I mean everything that takes us from the data as we get it, from a simple query or a crawl, to data that is actually ready to be used in a Python tool to make a prediction or a plot. That is when it is ready, and there are steps in between where things can go wrong or simply take time.
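(Editor's note: the notebook code is not captured in the transcript. As a concrete illustration of that first step, here is a minimal sketch of pulling data from an internal MySQL database into pandas; the connection string, table and column names are all hypothetical.)

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection to an internal MySQL database.
engine = create_engine("mysql+pymysql://user:password@localhost/analytics")

# Keep the query itself around (ideally in version control) so you can
# review later exactly how the data was obtained; any bias introduced
# here propagates through the whole rest of the cycle.
query = "SELECT user_id, signup_date, plan, monthly_queries FROM users"
users = pd.read_sql(query, engine)

# Or dump it to a file once and work from that.
users.to_csv("users.csv", index=False)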
So let's go to a very simple example. This is a website called Speakerpedia, which I found by pure coincidence, and it's basically like a Wikipedia of speakers, or at least of the speakers who are around: the past topics you can book them for, and how much they cost if you want to put them in your conference. Some entries just say "Please Inquire", but some people do list a fee. So I crawled the whole site and made a database of all of these speakers, to do some analysis and find some fun stuff, or insights, about that strange world of people who receive money for speaking. I've done that with import.io; crawling this site is pretty easy, and if someone is interested I can show it to you, it takes like ten minutes or so to set up. So let's do some data analysis. To quickly look at the data, you can see here I'm just loading it; this is the output of my crawl. We got around more than 70,000 speakers, with a lot of information, and I'm only showing some of the columns here, like the speaker name, the fee, the location, and so on. There's a lot to clean here, which is very common when getting data from the web. In some cases you can choose to do the cleaning while you extract the data; it's the same whether you are building the extractors or crawling. If I had used the right regexes, I could have turned those fees into a number that would arrive here as a float and not a string, as is the case. But I crawled it in a very naive way, partly to showcase how these kinds of things happen and how we need to deal with them. The same thing happens for other fields, where we have a list inside a list, and many other issues; I'm actually showing only a few columns here, but there are many others. So, as I mentioned, we can clean, for
example, the fee column, because we're going to do something simple with it. The very first thing I would like to see is how much people charge for speaking, how many people actually charge, and things like that. So I can replace the "Please Inquire" cases with zeros in the string, and then reload that column of the data frame as a float. We then have the fee ready to be used, to be consumed, and that's what I call processing the data: getting it ready for use.
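(Editor's note: a minimal pandas sketch of that cleaning step, assuming a CSV dump of the crawl; the file name, column name and fee format are assumptions.)

import pandas as pd

# Load the raw crawl output (hypothetical file and column names).
speakers = pd.read_csv("speakerpedia_crawl.csv")

# The fee arrives as a string such as "$12,000" or "Please Inquire".
# Replace the "Please Inquire" cases with zeros, strip the currency
# formatting, and convert the column to float so it can be analysed.
fee = speakers["fee"].replace("Please Inquire", "0")
fee = fee.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
speakers["fee"] = fee.astype(float)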
There are a lot more things you can do while using the data, before going into making predictions with it; we'll see examples later. One of those things is what is called exploratory analysis, which is basically getting to know the dataset you have. We have the dataset, that was cool, so now we need to make something out of it, and we need to know where to start. I'm breaking my own rule here: I know I have no business context or question in this project, OK, this is just for fun, to play around and find insights. We will see other examples later where I do have that objective and it's more business-like; this is not that case, but the exploratory analysis is pretty similar anyway: you need to see what your data looks like. If I want to see what my data looks like, in this example I can print the mean, the median and the mode of the fee column of the dataset, and we see something curious: we have an average fee of more than, what is it, twelve thousand dollars, but the mode is zero, because the fee most people actually charge is zero, so that average is pretty meaningless in that sense. We all love boxplots, so let's do a boxplot to actually see something. But this is not a box, it's just a line, because there are so many values close to zero, and also because we have like three outliers here at the top, with really crazy numbers. So crazy that I think they're probably not true; maybe it's a bug in how Speakerpedia lists them, and we could go back to the source and check. This is why we need to think about this kind of stuff: if speakers edit their own pages, Wikipedia-style, somebody could put in something crazy like ten million or whatever. And even if it were true, it skews things a lot: with seventy thousand people here, just those three guys are going to change my numbers. So what I want to do is remove those outliers from any further analysis. One more thing to comment here: I really like boxplots, they're one of the most important plots you can think of. If I could choose only a few kinds of plot to work with for the rest of my life, it would be like only three or four, and I could do everything with those: the scatter plot, the box plot, the line plot and the histogram; maybe bar charts too, for the journalists, but really not many more. Now, after seeing this, I'm not going to use something as skewed as this average to make a point. We can go deeper and look at the histogram, avoiding those crazy guys, to see how the fee is actually distributed. It looks like something we would expect after seeing the median, the mean and the mode: the average is much lower, but we still see that a lot of people charge zero. And there are a lot of people on Speakerpedia, because it's a listing where you browse people by location and by topic and things like that.
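(Editor's note: a hedged sketch of those exploration steps, reusing the speakers frame from the previous sketch; the outlier threshold is an assumption.)

import matplotlib.pyplot as plt

# Mean, median and mode tell very different stories when most values
# are zero and a few are huge.
print(speakers["fee"].mean(), speakers["fee"].median(), speakers["fee"].mode())

# The boxplot collapses to a line: almost everything sits near zero and
# a handful of extreme outliers stretch the axis.
speakers["fee"].plot(kind="box")
plt.show()

# Drop the crazy outliers (the threshold is an assumption) and look at
# the distribution with a histogram instead.
reasonable = speakers[speakers["fee"] < 100000]
reasonable["fee"].plot(kind="hist", bins=50)
plt.show()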
So it makes more sense to see something like this, where I'm looking at how many people do not charge anything and how many people do charge, and what the average is for those who do, which is around 20,000 dollars. We also see that only about one in four people on Speakerpedia charges. Now, this brings me back to my earlier point of always knowing your data sources and how you are biased from the very beginning, because the right conclusion here is "25 per cent of the speakers on Speakerpedia charge, with an average of 20,000 dollars", not "25 per cent of speakers charge", full stop. Most of the people who speak for free, or for charity, are not on Speakerpedia; I'm not. That's a very important point. It's kind of obvious in this case, and maybe not so obvious when you are working with your own database, but it's actually the same, and you need to keep all of this in mind.
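(Editor's note: that split is a couple of lines of pandas, again a sketch reusing the cleaned frame from above.)

# Fraction of speakers who charge at all, and the average fee among them.
charges = speakers["fee"] > 0
print(charges.mean())                       # roughly 0.25 in the talk
print(speakers.loc[charges, "fee"].mean())  # around $20,000 in the talk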
What else could we do here? We're not going to, but we could repeat this kind of analysis per speaker topic and see how different topics charge differently, or maybe what the ratio is between people who charge and people who don't for each topic, which is something very easy with a group-by on the topic (see the sketch below). We could then do the same with location, and see how fees correlate with where the speaker is based; all this kind of stuff is very interesting. Basically, in exploratory analysis we want to do that kind of thing: know the median, the mean, the mode; plot and eyeball the data to see what it actually looks like and which values it has; and see which variables correlate with which others. And speaking of correlation, I'm going to show you this one comic about it, which I think makes the point better than the whole books that have been written on correlation versus causation. OK, so that was an example of very quick and dirty exploratory data analysis. One other thing, before we go into predictions, is KPIs, key performance indicators: what are the right metrics for the thing you're trying to solve or measure? Sometimes just monitoring the right metrics can save your business, and very simple things can have a huge impact.
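(Editor's note: the per-topic comparison mentioned above could look like this; a sketch, and the "topic" column is an assumption about the crawl.)

# Per topic: what fraction of speakers charge, and the mean fee among
# those who do (assumes the crawl captured a "topic" column).
by_topic = speakers.groupby("topic")["fee"]
print(by_topic.apply(lambda fees: (fees > 0).mean()))
print(speakers[speakers["fee"] > 0].groupby("topic")["fee"].mean())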
So we shouldn't be afraid of using simple tools for simple jobs; every tool is right for some job. And we shouldn't be afraid of things like Excel, you know: the fact that we can code doesn't mean that sometimes a spreadsheet isn't the right tool. I'm saying this because spreadsheets are actually how most people consume data; most of the people who are going to read your data use them, so a lot of the time the output of an analysis, or of a report or whatever, is going to be an Excel spreadsheet, and it's important that we know how to work with those tools and how to make good use of them; it is not so difficult. There is a whole book by John Foreman, called Data Smart, about how to do data science only in Excel, and it has a lot of stuff on modelling and machine learning in it. And when I say Excel here I'm really talking about anything that can give you a graphical interface for working with data, not specifically about Microsoft Excel; if you put the stereotypes aside, what you can do with these tools is kind of amazing. OK, we're going now into actually making predictions, which means machine learning and modelling. I'm going to do super simple stuff here, but with a few different examples and a whole bunch of different algorithms.
The first step, when we get to this stage, is to separate the data into what is called a train set and a test set. This is fundamental in data science, because it is basically how you will be able to prove whether your predictions are correct. It means that the data we prepared is split into two pieces: one is going to be used to train the machine learning model, and the other one we will use only to test the results. The test set is the one we run the model on to see whether it was right or not, because we know the answers for that one; so we can compare what the algorithm answers against the truth and get some kind of accuracy for our predictions. Even then, it is very easy to get biased: if your dataset is not representative enough, if you have a sample that is not varied enough for the problem you're trying to solve, then you split it, train the model, test it on the test set, get, say, 90 per cent accuracy, and when you go to a real dataset outside your original big dataset, the accuracy is completely wrong. I've seen that happen a couple of times; it's a very big problem. So we need to be doing this all the time: the train set and test set are what tell us how good the model is, but remember the result is still biased by what your first dataset was and where you got it. After doing that split, there is basically one question to answer, in my very simplistic approach: do I want to predict a category, or do I want to predict a number? If I want to predict a category, I'm in a classification problem; if I want to predict a number, it's a regression. So there are basically two buckets. I'm being simplistic and leaving edge cases aside, but you can put almost everything into those two buckets, which are very different, and which one you're in depends on whether the output is going to be a number or a category. Let's start with the regressors, and with the simplest thing, which is what everybody has done at some point in high school: least squares. A least-squares fit makes predictions for new data points using some training data.
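(Editor's note: in scikit-learn the split is one call; a minimal sketch with placeholder data.)

import numpy as np
from sklearn.model_selection import train_test_split

# X is the feature matrix and y the target values; random placeholders
# here, just to show the mechanics of the split.
X = np.random.rand(100, 3)
y = np.random.rand(100)

# Hold out 30% of the rows purely for testing; the model never sees
# them during training, so the test score is an honest estimate
# (honest, that is, only insofar as the dataset is representative).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)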
And all the fancier things, support vector regression for example, are, like least squares, basically machine learning algorithms, and all the regressions we do are basically the same in theory. What changes, most of the time, is how we define the distance between the data points and our "perfect" line or curve; how you define that distance, this specific way or that way or any other crazy way, is what makes the difference between a very simple algorithm and a more complex one. In the end we may be doing it for 20 dimensions instead of 2, and there may be a whole bunch of other complications, but in essence it's the same thing we're doing. So I'm going to do another example here. The data I'm going to use now is more business-oriented: hard drive prices, scraped from an internet site, with the features of the hard drives and their prices. I can basically run a linear regression, which is
shown here. After dividing my data into test and train sets, I can see what the variance score is for that regression, and see how it looks.
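(Editor's note: a hedged sketch of that step; the file and column names are assumptions.)

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical scraped dataset of hard drive features and prices.
drives = pd.read_csv("hard_drives.csv")
X = drives[["capacity_gb", "rpm", "cache_mb"]]
y = drives["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
# score() returns R^2, the variance score mentioned in the talk.
print(model.score(X_test, y_test))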
And we can very easily switch, using scikit-learn, to a more complex and more complicated regressor, support vector regression: it's just two lines, one to train it and one to print the score, and probably another twenty lines to make a plot. In the end this is very easy to do, and we get some results; we see that the results here are not much better than the ones from least squares, just a 5 per cent improvement or so, which might mean a lot in a business context but is actually not much. Now, very quickly, some classification-style problems. Let's try clustering as an example, to get insights into how people are using a platform. Here I'm dealing with a real-world problem: I'm trying to get to know the users of import.io's free platform better, and how they use the product. I'm going to look at how much of their usage goes through the UI versus through the API, and at the volume of queries they do, and I can try to divide them into clusters, learn something I didn't know about that dataset, and hopefully become better at predicting things in the future. So again we load some stuff from scikit-learn and pandas, and we do a quick model using k-means, which is one way to do clustering. I don't like how it looks, because of the very different scales of the axes: basically the only clustering that has been done is along one of the axes, which makes no sense, right? And if I tell it to give me four clusters instead, it also doesn't look meaningful. So this is just wrong, and the lesson, which is very obvious for anybody who has done some clustering before but not for real beginners, is that you cannot work with one axis that goes from 0 to 1 and another that goes from 0 to millions; that is never going to work, especially in clustering. So we need to normalize the data, and I'm going to do it very simply, basically just normalizing the two variables that we were trying to plot. Then we repeat the same thing, now with both axes going from 0 to 1, and we actually get some kind of clustering that makes more sense. And when I go back to the data, if I now use these clusters to see which real users fall where, I see that it makes a lot of sense: one of these clusters can be the power user who uses Python, builds applications on our API and does millions of queries, versus the guy who is using the UI to do crawling without even knowing what a crawler is. Making that distinction can be very valuable, because you can implement it into your CRM or your help desk system, and customer support, or whoever in your company, can know ahead of time whether the person they're about to help is very technical or not, and what kind of usage they have. That will improve the experience for the user and the support they get, and also the life of your support team.
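(Editor's note: the SVR swap really is two lines, SVR().fit(X_train, y_train) and its score() in place of LinearRegression in the previous sketch. For the clustering step, a minimal sketch with hypothetical column names, including the normalization fix described above.)

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical per-user usage data: share of queries made through the
# UI versus the API, and total query volume.
usage = pd.read_csv("platform_usage.csv")
features = usage[["ui_query_fraction", "query_volume"]]

# Without this step one axis runs 0..1 and the other 0..millions, so
# k-means effectively clusters along a single axis. Rescale both to 0..1.
scaled = MinMaxScaler().fit_transform(features)

kmeans = KMeans(n_clusters=4, random_state=0)
usage["cluster"] = kmeans.fit_predict(scaled)
print(usage.groupby("cluster").size())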
The last thing I'm going to talk about, very briefly because I'm running out of time, is a webpage classifier using a decision tree, which is another way to classify things. In this case the context is that I'm trying to know which kind of website a website is just by looking at very simple features of it, classifying by its content: OK, this is an e-commerce website, or this is a job-listings site, or this is some sort of events site, and things like that. For
that, again with scikit-learn, it's two or three lines to make a decision tree and a couple more to plot it. We plot the thing here and, again, we have a very nice mistake: a decision tree is supposed to be simple to read, simple to interpret, simple to understand, so when you see something as big as this, it's because you're doing something very wrong. You are overfitting the whole dataset into a lot of very tiny conditions that drop into this huge list of categories and decisions used to make the classification. We can change that with several parameters, but the simplest one is to just set the maximum number of leaf nodes you want to allow. Then you get a much simpler decision tree, which you can read and sanity-check to see if it makes sense, and with which you can make a prediction, in only one line, on your test data, and see how it all works out.
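(Editor's note: a sketch of those few lines; the dataset, the assumption that the feature columns are numeric, and the column names are all hypothetical.)

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical table of simple numeric page features plus a site_type
# label (e-commerce, jobs, events, ...).
pages = pd.read_csv("websites.csv")
X = pages.drop(columns=["site_type"])
y = pages["site_type"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Capping the number of leaves keeps the tree small enough to read and
# guards against the overfitted monster tree shown on the slide.
tree = DecisionTreeClassifier(max_leaf_nodes=10, random_state=0)
tree.fit(X_train, y_train)

# The promised one-line prediction on the held-out data.
print(tree.predict(X_test))
print(tree.score(X_test, y_test))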
So, to recap: always know what problem you're trying to solve; clean your data; be aware of very common problems like overfitting, which I tried to show an example of, or normalization of your data, which I also tried to show; and always try to have an output that is something actionable, something where you say, OK, given this analysis, now we need to change this in the business, or in how we do support for people, or in how we're doing this in the product, or in how we are dealing with this data. If there is no action, basically the whole thing falls flat, and you need to learn from that, go into the cycle again, into the loop, and make it better. One last thing: we're hiring a lot at import.io, for a lot of different positions, DevOps, front-end, QA, even Python roles with a lot of data in them. So if anyone wants to talk about that, or about data science, or about Python, I'll be around, and I will very happily engage in conversations. Thanks for your
attention.

Question: You just seemed to jump over the "abyss" part of that slide. Is there something in data science that corresponds to the abyss in the hero's cycle? ... Sorry, what was the question, about the cycle? ... On this slide, the hero's journey diagram; the "abyss" and "revelation" part of it.

Answer: Ah, right. That's precisely that moment. I actually have a mapping for all the parts of the diagram, but I went over it quickly. The abyss is basically the moment of revelation where you work out what kind of problem you're really trying to solve from a mathematical point of view, and therefore what algorithm is going to work. When we are just doing exploratory analysis, for a complex problem we might not even know at that point whether we're dealing with a regression or a classification, and even less what kind of algorithm is better for that classification or regression problem. The revelation is basically when you think you have an idea of how to solve it, and then you just need to apply it, which is much easier.

Question: What is your experience with scikit-learn? When you were at the beginning, did you have to know all its internals, or did you just play around with different parameters until you got a result?

Answer: It's very easy to use scikit-learn. There is even a chart on how to approach it: depending on what kind of problem you have, it tells you what you need to use, which is a great way into machine learning with it. Once you know what you're going to use, which is usually just a few lines of code to put in there, there is still knowing which parameters you need, and if you are rigorous about it that is a very hard problem; a big part of the whole discipline is how you fit those parameters. From a simplistic point of view, though, it's not so hard: you can just use some defaults, or you can make a loop over different parameters and see how the results look. You always need to have an output from your model, a score, or a prediction, or even better both, so you can say, OK, I put in these parameters, this is my output, do I like it or not, and tune the parameters towards something that makes sense. That would be a simplistic approach to tuning the parameters and feeding the right things in when using scikit-learn.

Host: Sorry, is there one last question? No; then I think that's all. Thank you very much for the talk.

Metadata

Formal metadata

Title Everyone can do Data Science in Python
Series title EuroPython 2015
Part 114
Number of parts 173
Author Elola, Ignacio
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, change and reproduce the work or content for any legal and non-commercial purpose, and distribute and make it publicly available in unchanged or changed form, provided that you credit the author/rights holder in the manner specified by them and pass on the work or this content, including in changed form, only under the terms of this license
DOI 10.5446/20129
Publisher EuroPython
Publication year 2015
Language English
Production location Bilbao, Euskadi, Spain

Content metadata

Subject area Computer Science
Abstract Ignacio Elola - Everyone can do Data Science in Python. Data Science is a hot topic, and most data scientists use either Python or R as their main scripting language to do their jobs. Having been import.io's data scientist for the last 2 years, all of them using Python, I've come across many different problems and needs in how to wrangle data, clean data, report on it and make predictions. In this talk I will cover the main analytics and data science needs of a start-up using Python, numpy, pandas, and sklearn. For every use case I will show snippets of code using IPython notebooks and run some of them as live demos.
Keywords EuroPython Conference
EP 2015
EuroPython 2015
