
A Gentle Introduction to Data Science

Formal Metadata

Title
A Gentle Introduction to Data Science
Number of Parts
160
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Subject Area
Genre
Abstract
A Gentle Introduction to Data Science [EuroPython 2017 - Talk - 2017-07-12 - Anfiteatro 1] [Rimini, Italy] This introductory talk will cover the basics of data science: from the influence of artificial intelligence, and the quest to replicate a human mind, to a practical demo of how to build a hello-world machine learning model in Python. The talk will try to answer questions such as: What do we understand by data science? What do we know about the human mind that can be an inspiration for our programs? Which problems can we solve with data science? What tools are available to do data science in Python?
Transcript: English(auto-generated)
Hello everyone, thanks for coming to my talk, even though it was not in the app, so I guess it was a bit challenging for you to know what was going on here. As the title says, this wants to be a gentle introduction to data science, just to give an intuition. There are some formulas and some concepts, so I'm not sure how gentle it will really be. The goal is not that you understand everything, or all the concepts; it's more to get an intuition of, first, where data science comes from, or where artificial intelligence comes from, because they are quite related and it's my area, and second, to see some applications of how it can be used. It's not really technical, but there are some parts that are a bit more about formulas and understanding some concepts. So, who am I? I'm currently working at a dating site, Badoo.
I am a data scientist there. I have a master's in artificial intelligence; to give some context, this talk is about data science, but I'll also be talking about the human brain and about things that are considered artificial intelligence more than data science. I'm also the NumFOCUS ambassador here at EuroPython. NumFOCUS is a foundation that is sponsoring many of the PyData projects, or all of them; even the PyData brand is a NumFOCUS thing, and NumFOCUS also organizes the PyData events. You can find me most of the time at the NumFOCUS booth later on, if you are interested in some of the projects to do data science in Python, or even not in Python; some of them are in other technologies like Julia or Stan.

So, let's talk a bit about the human brain. I would like to do a kind of small experiment, just to make you visualize a bit, to really be able to show what I want to tell you. It's kind of a silly example; it will probably sound to you more like a meditation exercise than like artificial intelligence. What I want you to think is that your own eyes are kind of like two cameras. Let's imagine that they are 20 by 20 pixel cameras, and that this image on the screen is just the number five. There is nothing magic about it, nothing that fancy: it's from a famous dataset named MNIST, commonly used as a benchmark in artificial intelligence, in deep learning and other techniques.
And what I want you to do is just to imagine that your eyes are cameras, and that you perceive this image, these 20 by 20 pixels. What arrives at your eyes, at your retinas, is just this 20 by 20 grid of values; let's say zero for white and one for black. If you imagine this, then this whole information goes into your brain. After it arrives at the eyes, these spheres you can see there, there are these channels of neurons (we'll talk a bit about neurons in a second) that just propagate these 20 by 20 pixels, so imagine 20 by 20 neurons, carrying this to the back of the brain, where the visual cortex is. The visual cortex is the part that, as the name says, is mostly focused on the understanding and recognition of images. So, going back to the example: we've got this number five, just zeros and ones, so to say, that get to your eyes and then, at the end, to the visual cortex.

Some years ago people started to think about the brain like this, and I think it's something quite obvious but still worth mentioning: the human brain is just a network of neurons, and neurons basically are cells that transmit electric information from one side to the other. The tricky part is that the whole human brain, the whole human mind, everything we know or think we know, comes from these neurons just interacting among themselves at a massive scale, and all these interactions are what is really making all our thoughts, all our perceptions, everything. It's probably not very intuitive to think of a single neuron as having all this power, but when you combine thousands of them, it really is like that. It's like a switch: a switch is actually nothing intelligent, but a microprocessor in a computer is just switches, just zeros and ones activating some electrical signals, and at the end what you can do with a computer is so powerful. It's exactly the same thing.

Hubel and Wiesel were two researchers, quite advanced for their period, who did an experiment with this poor cat. It made an amazing contribution, but I don't think it was really happy about that. They basically connected some sensors to its brain, to the initial part of the visual cortex. As I was saying in my example, you are getting this information, zeros and ones, perceived by your eyes as if they were a camera; then it goes to the visual cortex, and what they were studying is this first layer of the visual cortex: what happens when these neurons, instead of being just a channel that propagates the information, start to mix with one another and build these networks. The idea was: okay, probably this cat, in its visual cortex, is recognizing circles like the ones in the experiment, probably recognizing squares, recognizing different shapes. So they took the cat and had it looking at shapes for many hours, looking at circles and waiting for neurons to activate. The funny part of this story (I don't know if it can be seen on the screen) is that they used those old slides, transparent paper with the shapes on it, and the neurons of the first layer of the cat's visual cortex actually activated not because of the circle, not because of the triangle, not because of any of the shapes, but because of the edge, the border of the paper, this diagonal line you can see over here. That was what really made those neurons activate. The takeaway is that the first layer of these interconnected neurons activates for very, very small patterns, so to say: just edges in different rotations, and so on.

Another important experiment, by Donald Hebb, is the basis of Hebbian theory: the connections in the brain, besides the ones you already have when you are born, are created just because they get activated. So if you keep watching this edge we've seen before many, many times, what this theory says (the theory itself is a bit complex and I don't expect you to read it all; you have it in the slides) is that if you keep having an impulse, if you see this face one time and another time, and once and once more, then at some point the neurons that activate when you see it become stronger and stronger, and they are kind of hard-coding the information in the brain. Something you've never seen will not activate many neurons, and something you see often will activate many more. That's very interesting, because at the end, memory and learning basically all start from this: everything is based on seeing something, having it hard-coded because some connections between the neurons are there, and then being able to reproduce it.

So this was a bit of the history of the human brain and what's in it; now we'll go to the counterpart: how computers are kind of cloning all those things. I'll start with this: it's just a linear regression, and I assume most of you know what a linear regression is. I've got some input; let's say you tell me your age and your weight, and I want to predict your height. So I just multiply each of the values by a weight and then I add them together.
That's a linear regression: at some point, if there is a correlation between these input features and the target, you find it. This is something quite simple, but if you think of it as neurons, it's a way that every input can have a weight, so every input will have more or less importance. At the end you pack it all together and you have a single value. And what's very important here is this activation function: what it does is binarize the result, so the output is a zero or a one.
This is very similar to a logistic regression, if you've ever heard about that in data science, but it's already quite powerful if you start to do this. How is it binarized? I won't talk much about that, but it's usually not done by just taking the positive values to one and the negative values to zero.
What is usually used instead are these sigmoid functions, just as a reference: they do kind of the same exact thing, but you can take their derivative. And that is very key, because in optimization problems, when you want to build these networks and you need to see which weights are the good ones, being able to differentiate is really, really useful; it's how you really optimize. Now I'm going to talk about Hopfield networks, and I'll cover this briefly. Hopfield networks are just networks of these artificial neurons where everything is connected with everything.
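The neuron just described, a weighted sum of the inputs squashed by a differentiable sigmoid, can be sketched in a few lines. This is just an illustration; the input values and weights below are invented:

```python
import math

def sigmoid(x):
    # Squashes any real value into (0, 1), like the binarization step,
    # but smooth, so it can be differentiated.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # The well-known identity: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

def neuron(inputs, weights):
    # The "linear regression" part: weighted sum of the inputs ...
    total = sum(i * w for i, w in zip(inputs, weights))
    # ... followed by the sigmoid activation.
    return sigmoid(total)

# Two invented inputs (say, scaled age and weight) and two weights.
output = neuron([0.5, -1.2], [0.8, 0.3])
```

The derivative is the part that matters for optimization: it tells you how the output changes when a weight changes.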
And they have a nice property; it's something super simple, nothing really complex. As we've seen, each unit is just a linear regression, you can implement it in a single line of code, and then you just connect them among themselves. The nice thing is that this network has a property: it works as what's called an associative memory.
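A toy version of such an associative memory can be sketched with Hebbian-style weights: patterns that are shown together strengthen the connections, and a corrupted pattern settles back into the stored one. The tiny bipolar (+1/-1) pattern below is invented; real networks store images with many more neurons:

```python
import numpy as np

def train_hopfield(patterns):
    # Hebbian learning: units that are active together get a stronger
    # connection. Each row of `patterns` is a bipolar (+1/-1) vector.
    n = patterns.shape[1]
    w = np.zeros((n, n))
    for p in patterns:
        w += np.outer(p, p)
    np.fill_diagonal(w, 0)  # no self-connections
    return w / len(patterns)

def recall(w, state, steps=10):
    # Repeatedly update every unit from the weighted sum of the others;
    # the state settles into the closest stored pattern.
    for _ in range(steps):
        state = np.where(w @ state >= 0, 1, -1)
    return state

pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
w = train_hopfield(pattern[None, :])
corrupted = pattern.copy()
corrupted[0] = -corrupted[0]  # flip one "pixel"
restored = recall(w, corrupted)
```

The corrupted input comes back as the original pattern, which is the "recover a corrupted image" behaviour described next.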
What it does: as I said before with the number five, you keep showing images, and the network remembers the images in the weights, the same weights as in the linear regression; it's learning these weights. In this example, you have the original image on the left, the image that you've shown to the network, and then at some point you show an image that is similar to it. The network kind of remembers, and it optimizes in a way that, after several iterations, what you get back is the original image again. So it can be used, for example, to recover corrupted documents or corrupted images, because whatever you've shown to the network before will, at some point, come back to you. That's kind of the point. If you show all this to a Hopfield network (and of course you need a number of neurons proportional to the number of pixels, so they are able to store all this information), and you then check the weights and represent them in a matrix, you get something like this. It's kind of subtle, but you can see the shape of an average number: each pixel holds the average of the digits that have values at that pixel. For example, if you look at the top right corner, you'll see the sevens and fives that go in that direction, while in the top left you'll see the three, because the number three actually ends up in that region. It's kind of a trivial thing, but it really shows how the weights of the linear regressions I was talking about are keeping information about these images, and are able to recover it later.

After Hopfield networks come Boltzmann machines, which start to be the basis of deep learning, if you've heard about that, and I guess you have. Boltzmann machines are the same, with the difference that in the previous example the number of neurons matched the pixels you were getting: if you are given 20 by 20 pixels, you have that many neurons, and that is exactly what you have to store the information. In these cases you also have hidden neurons that don't have a direct exposure: they won't get the initial perception, only an indirect one. And these are actually much more powerful, because this works as a generative network. A generative network is something similar to what we've seen before, but the idea is that you've got a model that learns some parameters, in this case the weights of the neurons. Think of a Gaussian: imagine that I want to generate data that comes from the distribution of the heights of the people in this room, or their ages.
So at some point I can fit a Gaussian distribution and say: okay, the mean age of the people in the room is 30, say, and the standard deviation is five. Then I can start generating samples, and whatever samples I get would be like real samples, as if they were real people whose ages I knew, the same as if I got the data from people entering the room. What is quite important is that if you do this with image recognition, these models, these networks, start to learn patterns. There is an experiment by Google, I think it's named Deep Dream, where they started to show images to one of these networks, and then they said: okay, now don't show me the data you have already seen; generate new data, as I was saying with the ages. And this is what they got. I'm not sure exactly what they used to train this network, but this is something generated by a computer that just saw some images. I guess it saw some animals, from what you see here; these capsules, I don't know exactly what they are; it looks like it also saw people, maybe a temple. I don't know exactly what it saw, but it can actually show it back, based on the images it has seen. Again, just to emphasize the example: if you tell me your ages, I assume the distribution, and then I start saying numbers, like, a normal age would be, I don't know, 32, or 28; this is basically what it's doing, but with images, which I think is quite an interesting thing.

The problem with artificial intelligence usually is that you find these models, they are super cool, quite amazing, but at some point you realize they are NP-complete, which means that a computer would take, say, the time from the Big Bang to the present day to compute all this and get the optimal solution. So in practice what is used are restricted Boltzmann machines: they don't have all the connections, so they don't really have these properties in the same way that was proved for Boltzmann machines, but they keep mostly everything. It's an approximation, but a good one, and these can actually be trained; there are techniques to optimize and train them easily, which is quite nice to make things work in practice. Also, in practice you don't have just one of these networks; what you do is have layers of networks, one connected to the other. As I said before with the cat, the first layer is just an edge; you can see this on the left. These very small patterns, by themselves, are not that useful, but when you start to combine them in the neurons of the next layer, you start to have an eye, a nose, you start to have things. It's a sequential, cascading thing: you start with no pattern at all, then small patterns; at some point you have an eye, then you combine eyes and you have a face, and you can construct anything you want if you have enough layers.

I'll talk very briefly about how this can be done, just to give you an intuition. What you need in data science is to have the error.
So you have some data, and you know the answer; let's say you know this is an eye, or whatever you are looking for, so you compare the answer with the prediction of your network. Whatever is mismatching: if you are trying to predict a cat and you don't get this cat image, you compute the difference between the cat you wanted and what you got. For a real network it would be more complex than a linear regression, but imagine you just have that: you can compare the images that your model is generating with the images that you want to see. And based on this error, you just optimize. I was talking before about the derivative: these functions can be differentiated, because they are just linear regressions with this sigmoid function that can be differentiated. And the derivative shows you how to get to the minimum point, the point with the least error: you start up on this hill, you differentiate the function you are using to estimate, and you see that it goes down in that direction.
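This derivative-following idea can be sketched with a one-weight model. A minimal illustration with made-up data, not the full backpropagation used in real networks:

```python
# Fit a single weight w so that the prediction w * x matches the
# target y, by repeatedly stepping against the derivative of the
# squared error. The data points are invented: y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0              # start somewhere on the "hill"
learning_rate = 0.05

for _ in range(200):
    # Squared error E = sum((w*x - y)^2); its derivative dE/dw
    # tells us which direction is downhill.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= learning_rate * grad  # one step in the downhill direction

# After enough steps, w settles very close to 2, the true slope.
```

In deep learning the same loop runs over millions of weights at once, layer by layer, but the principle is this one.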
So you take a step in that direction, and you keep iterating; at some point you end up in a place where the error is minimal. If the error is minimal, it means that your model is performing as expected, being able to reproduce cats or whatever it should reproduce. And you do this by layers: you always start in the last layer, you check the error, and then you propagate it backwards in a recursive way.

This is actually from a Google experiment: they took, I think it was 40,000 CPUs, and trained these sorts of networks with frames from YouTube videos for a whole weekend, to see what internal representation these models were learning. And what they found, or what they say they found, in one of the neurons was this cat; it looks like that's what it learned, and it probably makes sense, because YouTube is full of cats.

So at the end, if you think about how the brain is learning, with the neurons changing their transformations based on the information we see, you can see the brain as something that is really just a set of patterns that have been learned, like these shapes that, if you have children, you can probably recognize. At some point you have these edges, these triangles, these faces, and you are only able to recognize whatever you've seen before. Your brain has been trained to understand certain patterns, and only these patterns are the ones you can really understand; that's exactly what your brain is doing in terms of recognition.

This is a very interesting experiment. You have this picture from Van Gogh, the small one in the middle, and they trained a neural network with this information. What happened is that the neural network got a representation of the world in which the neurons were trained in a way that they could only see Van Gogh pictures. After all the training, after all these neurons had been coded to activate at certain stages, they showed it this picture; I'm not sure if it's Amsterdam, whatever this place is. When you ask it "show me this picture I just showed you", since the reality of the model, its internal representation, is just about Van Gogh, what it returns is the same exact picture, but with the patterns from Van Gogh. There is no way it can paint a red roof, because the network has never been trained for that. I think it's really, really interesting how this works.
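Going back to the simplest generative example from before: fit a Gaussian to some observed ages, then draw brand-new, plausible ages from it. A minimal sketch; the observed ages below are invented:

```python
import random
import statistics

# Ages we "observed" (a made-up sample of people in the room).
observed = [28, 31, 34, 27, 30, 29, 33, 32, 26, 30]

# "Training" the model: for a Gaussian, learning just means
# estimating the mean and standard deviation from the data.
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

# Generating: draw brand-new samples from the fitted distribution.
random.seed(0)
generated = [random.gauss(mu, sigma) for _ in range(5)]
```

A generative image network does the same thing conceptually, except that the "parameters" are the weights of the neurons rather than a mean and a standard deviation.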
Now, I wanted to give an intuition of what can be achieved, of the direction of this whole world of artificial intelligence and where the state of the art is, but I want to talk now about practical applications, what's really being done as a data scientist in a day-to-day job. The first one is classification. I would say it's the most common one, usually named supervised learning: you have some data. I work for this dating site, Badoo. We have many problems with spammers, people trying to abuse the system, people trying to cheat our users, and we need to detect them and block their accounts. So at some point, as a data scientist, you get a dataset where many people reviewed profiles and said: this is a spammer, this is not a spammer, and so on, and you have certain information. You can imagine here that the y-axis is the age of the profile and the x-axis is whatever feature you can imagine, I don't know, it could be the time the user has been registered on Badoo, and you start to see that there are some patterns: the kind of profile you're looking for, the kind of data you're looking for, falls more on one of the sides.
So you just need to plot a line, a separator, so you can identify them. The classical example of this is spam: whenever you go to your inbox and see some spam that has been classified, they're using the same technique: they have some features, they represent them, they fit a line or whatever, and at some point they are able to distinguish the good ones from the bad ones.
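A minimal illustration of learning a separator between labelled points. The data and feature names here are invented, and this uses a simple nearest-centroid rule (whose decision boundary is a straight line), not whatever a production spam system would use:

```python
import numpy as np

# Invented training data: each row is (account age, activity score),
# reviewed by humans and labelled "spammer" or "legitimate user".
spammers = np.array([[1.0, 9.0], [2.0, 8.0], [1.5, 9.5]])
legit = np.array([[8.0, 2.0], [9.0, 1.0], [7.5, 2.5]])

# Learn the "typical" point of each class; the boundary between the
# two centroids is a straight line, like the separator on the slide.
c_spam = spammers.mean(axis=0)
c_legit = legit.mean(axis=0)

def classify(point):
    point = np.asarray(point, dtype=float)
    d_spam = np.linalg.norm(point - c_spam)
    d_legit = np.linalg.norm(point - c_legit)
    return "spammer" if d_spam < d_legit else "legit"
```

A new, unreviewed profile is then classified by which side of the separator it falls on, replicating the human reviewers' decisions.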
This is also done in cancer prevention, for example: they analyze images of tumors and decide, based on these models, whether it is a malignant cancer or not. Or for credit payments: whether someone is going to return their credit. In this sort of project, which is the most common and, I would say, the easiest to start with, what you need is someone who labeled the data, or some way to add labels to the data; you know that certain users didn't pay, things like that, and what you do is try to replicate the same decisions that were being made. Regression is quite similar, as you probably know: let's say I want to know the price of this building based on the square metres it has. You can plot all these dots, and then you fit the linear regression. We can also do it in robotics.
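The price-versus-square-metres example can be sketched with an ordinary least-squares fit. The numbers below are invented:

```python
import numpy as np

# Invented data: square metres of a building vs price (in thousands).
sqm = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
price = np.array([150.0, 210.0, 270.0, 330.0, 390.0])

# Fit a straight line price = slope * sqm + intercept by least squares.
slope, intercept = np.polyfit(sqm, price, 1)

def predict_price(square_metres):
    # Predict the price of a building we have never seen.
    return slope * square_metres + intercept
```

Once the line is fitted, predicting the price of a new building is just evaluating the line at its square metres.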
I'll quickly show a video.
You get the idea. This is the state of the art in robotics. It's a different technique, usually named reinforcement learning, a way in which the robot keeps learning based on experience: at the beginning, in the training stage, it keeps falling, and then, even if you kick it, it doesn't fall. One of the concerns in artificial intelligence is whether robots are going to take control and make us their slaves at some point; I think it would make sense, after seeing how they treat this poor robot. More applications: recommendation systems. Whenever you go to Amazon, they are using all these data science and artificial intelligence techniques to see what everybody else who is buying this product is also buying, so you find these patterns, these associations. Then clustering is similar to what I was mentioning before about classification, but in this case the whole point is that you don't know the labels; you just know that you have two different types of users.
Let me talk about dating sites again. Say I have the users who are interested mostly in sex, and the users who are interested mostly in sex but in a more subtle way, and I really want to identify them, but I don't have labels; nobody is going to review every profile and tag it. So you have these techniques to train models, and in the end these models are able to split the data by themselves, by finding the distances between groups: there is one group whose members are quite similar to each other, and another group that is also internally similar but very different from the first. It's about finding these patterns.

Network analysis is another quite interesting topic. For example, for some variants of the flu, researchers were able to predict, based on the network of flight connections, when the first cases would appear in every single country, and with great accuracy. It is also used in the banking system: they check the transactions among banks, and they can say that if this bank defaults, this other bank is going to default too, because they make so many transactions together.

Image recognition is a quite challenging topic in many cases. Here the idea is to distinguish dogs (I think these are Chihuahuas, I'm not sure) from muffins, and deep learning is managing to identify which is a muffin and which is a dog, probably even better than a person, because in some cases it becomes quite tricky. Self-driving cars are also the state of the art; you have probably heard about them. In several places they are already driving autonomously: they have cameras and sensors, and even in cities they are able to stop when a pedestrian wants to cross, and all that.

Finally, as I mentioned before, I'll just briefly list some of the many Python projects you can use. The most famous ones you probably know already: Pandas, and NumPy, which in the end is the internal array representation behind most of this software. You also have Jupyter, which you probably know as well; it is a web application, like a Python terminal but web-based, and it is very useful in data science because you usually plot things. There is PyMC3 for Bayesian and probabilistic models, quite an advanced one. Gensim is a pretty cool one too: it is used for topic modeling, so you can really find the topics of conversations, which is what I was mentioning before.

That's it. Just a quick mention: I think you're organizing something related to data on Saturday.
It will probably be partly a Pandas sprint, for people who want to contribute to Pandas, but if people are new to Pandas or new to data science, we can also do a tutorial and things like that, so please get in touch with me if you're interested. I'll be around on Saturday in case anyone wants to contribute to these projects, get a tutorial, or just discuss. That's it; thank you very much. And we have a couple of minutes for questions, if there are any.
I guess no one really understood anything; it wasn't as gentle as I intended. We can maybe get you both in if you're quick.

Hi, I don't know if my question is really related, since it's not so much about computers but about the other side, the human brain. You said that when there is an image or a pattern the brain has already seen, more neurons are activated than for a new image. So my question is: why do we feel much more tired when we learn new things? My guess would be that if more neurons are activated, one should be more tired. Why is it the opposite?
I didn't quite get the question. Why does the number of neurons activated change?

My question is: I was expecting that you get more tired, at least intellectually, when more neurons are activated, okay? But at the same time, I would guess that when I'm learning new things, I get much more tired. So why is that not the case?

Well, I think there are two different steps in this process, so to say. One is training. I would say that when you feel tired, it is because you are seeing something new; you go to your new day job and, of course, you feel much more tired because everything is new. Your neurons are in this training stage, so new connections need to be created and activated: neurons that were not connected among themselves are getting activated together for the first time. I would say it's related to that. When you are doing something repetitive, the neurons you are using are already wired, so the information just flows. I'm not a neuroscientist, and I think this is really a question for one, but yeah, I think that would probably be the reason.

And I have another question: do you use any languages other than Python, like languages designed for machine learning, such as Prolog?

I don't use many languages other than Python, actually.
In a way, everybody is using C, because NumPy is mostly programmed in C, and there is also Cython, which is quite the standard for these things. So I would only use something that is not Python for performance reasons; for coding, I love Python, and I would never really want to have to use another language. In some cases there is, for example, TensorFlow or Theano; I think Theano is partly written in Python, but TensorFlow is not. But usually you have Python bindings. In the end, I think Python is very good at the interaction with the user, with the programmer; the most beautiful code you can read is in Python, more than in C or anything else.
So yeah, I think the trend is moving there. You have Stan, for example, for probabilistic programming, and you have R, which is quite good and quite used in this world, but in my case I'm not using anything else, and I think you can really do a lot of data science just with Python.

That's great, thanks, and that's all the time we have for this talk, so thanks again.