Introduction to TensorFlow
Automated Media Analysis
The TIB AV-Portal applies the following automated video analyses:
Scene recognition – shot boundary detection segments the video based on visual features. A visual table of contents generated from this gives a quick overview of the video's content and offers targeted access.
Text recognition – intelligent character recognition captures, indexes, and makes written language (for example, text on slides) searchable.
Speech recognition – speech-to-text records the spoken language in the video as a searchable transcript.
Image recognition – visual concept detection indexes the moving image with subject-specific and interdisciplinary visual concepts (for example, landscape, facade detail, technical drawing, computer animation, or lecture).
Keyword tagging – named entity recognition describes the individual video segments with semantically linked subject terms. Synonyms or narrower terms of an entered search term can thereby be included in the search automatically, which broadens the result set.
Speech Transcript
00:05
Hello! Lots of people, thank you so much.
00:11
First of all, apologies for the problems at the start: my laptop is broken. So, this talk is an introduction to TensorFlow.
00:22
OK, let's start with a question: what is this? It's a cat.
00:31
That was an easy question, but is it that easy for a computer? You know, a computer is a machine that makes computations, that deals with mathematical operations. So the real question is: is there a mathematical relationship between this input, the image, and the target, the class cat? And the answer is yes, but it's very complex, and we're going to learn this complex relationship using tons of examples.
01:12
And this, learning a very complex relationship using tons of examples, is pretty much the definition of deep learning.
01:23
We're here at EuroPython 2017, we love Python, so we want to do deep learning with Python. And what is the best tool for that? It's TensorFlow. So what is TensorFlow? TensorFlow is an open-source library for deep learning, mainly used with Python. It was released by the Google Brain project two years ago, but the 1.0 version was not launched until February of this year. Installation, quickly: the best practice is to download Anaconda, then create a new environment with the classic data-science libraries, and then pip-install TensorFlow. For Windows it's the same method. But now we enter the most important part of the talk: the core concepts of TensorFlow. TensorFlow can be a difficult tool for beginners if you don't understand the basic concepts of deep learning and how TensorFlow works. So, let's recall.
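The installation steps just described can be sketched as shell commands (the environment name and library list here are illustrative; `tensorflow` is the actual PyPI package name):

```shell
# After downloading and installing Anaconda:
conda create -n tf-intro python numpy pandas matplotlib jupyter  # new env with the classic data-science libraries
conda activate tf-intro   # on older conda versions: "source activate tf-intro"
pip install tensorflow    # the same command works on Windows
```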
02:53
We have the cat image as input and we have the class cat as target, and we want to find the mathematical relationship between these two. That mathematical relationship is what we call the model.
03:06
This model is going to make predictions given the input. At first these are going to be random predictions: sometimes it says cat, sometimes non-cat. So we have the input: the image of a cat.
03:27
We have the model that is going to make predictions given the input, and we have the target, which is the correct class for that image. Does the prediction match the target? It doesn't. So what we're going to do is change the model so that it gets better and the predictions match the targets. The first step is to compute the difference between the prediction and the target, and this is done by the cost function, or loss function. The cost function produces an error, and this error is basically how far we are from having the right model: the greater the error, the more we have to change the model. So we're going to learn from errors. That's like life: learning from errors; sometimes you win and sometimes you learn. And finally, the guy in charge of changing the model, of training the model, is the optimizer. OK, so this is the basic structure of the learning process in deep learning, and this is what TensorFlow calls the graph.
05:03
The graph is just a layout which contains both the model and the learning process. OK.
05:16
The graph is totally independent from the data, but there really is a connection with the data, because the graph is nothing without it: we couldn't have predictions without the input data, and we couldn't have learning without the targets. So we set two gates for the data to come in: one gate for the inputs and another for the targets. These gates let the data in, but not all types of data, only the data that we want: in this case, images for inputs and classes for targets. These gates are called placeholders.
06:08
A quick summary of what the graph contains. There is one key word there: variables. The model is just a set of variables, and we're going to change these variables to make the model better.
06:30
So we have the graph with the placeholders, and we want the data to come into the graph. What we do is open a session.
06:47
When we're inside that session, we say we're feeding the graph with data.
06:54
OK, so for example: we have this cat; the cat goes through the model, and the model says it's a non-cat, which is not correct. So the cost function says we have an error of 100. The optimizer receives this error and says we need to train the model, so we train the model. Now we feed the cat again, the model says cat, the cost is 0, and the optimizer does nothing.
07:28
That's the main part. In the rest of the talk we're going to see several use cases, so hopefully the concepts will settle in. The first thing we're going to do is a Hello World. Well, not really a Hello World, because we're not going to print 'Hello World': we're going to add two integers. The first thing is to import TensorFlow; the convention is to import it as tf.
08:05
And this is the graph we're going to build: we have two placeholders, one for one integer and the other for the other integer, and an addition operation.
08:20
In the code, we set the placeholders, which expect integers, and we have the addition operation, which is like a TensorFlow function. That's the graph. Independently, there is the data: number one is 3 and number two is 8. And then there is the session. The session is something that you open and you close, so we use the with keyword, and we run the sum operation. We feed the graph with a dictionary that links each placeholder with its data.
09:11
So this is the output in the notebook, and we see how it works: 3 and 8 make 11. Perfect. But this is kind of boring, because we're not learning anything. How can we make this more interesting?
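A runnable sketch of the Hello World just described, in the TensorFlow 1.x graph style used in the talk (the compat shim at the top is an addition for newer TensorFlow 2.x installs; variable names are illustrative):

```python
import tensorflow as tf

# The talk targets the TensorFlow 1.x graph API; on TensorFlow 2.x the same
# style lives under tf.compat.v1, so fall back to it when present.
v1 = getattr(getattr(tf, "compat", tf), "v1", tf)
if hasattr(v1, "disable_eager_execution"):
    v1.disable_eager_execution()

# The graph: two integer placeholders plus an addition operation.
a = v1.placeholder(tf.int32)
b = v1.placeholder(tf.int32)
total = tf.add(a, b)

# The data lives independently of the graph.
data_a, data_b = 3, 8

# The session is opened and closed via the "with" keyword; feed_dict is the
# dictionary linking each placeholder with its data.
with v1.Session() as sess:
    result = sess.run(total, feed_dict={a: data_a, b: data_b})

print(result)  # 11
```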
09:30
The next case is going to be a regression problem. You know you're in a regression problem when your outputs are not classes like cat or fish but numbers, like 2, minus 3, 6.7, pi, or the square root of 2, that kind of thing. Our case is to learn how to sum. So we take two inputs and one output, and we're going to learn the mathematical relationship using 10,000 examples. These are the examples, and we can see the relationship clearly: for instance, the first pair sums up to 13. This is kind of silly, since we already know how to sum, but in any regression problem you always have the same philosophy. So we're now learning how to sum.
10:42
Let's say we are in a self-driving car: the first input could be the image taken from the camera, the second input could be the distance measured by a laser in front of the vehicle, and the output is the angle you need to steer the vehicle so you don't crash or drift out of the lane. But we're going to keep our toy example.
11:12
What we're going to assume is that the relationship between the output and the inputs is a linear function, that is, a multiplication and an addition. These are the variables we're going to learn: we'll keep changing these variables so the model gets better. In this case we're OK with a linear function, but if we wanted to learn a more complex relationship, we would just add another linear function, another layer. To make it even more complex, we stack another layer, and to make it more complex still, we add nonlinear functions, or activation functions. That's a neural network.
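The layer-stacking idea can be sketched in plain Python (the weights below are arbitrary illustrative numbers, not learned ones):

```python
def linear(inputs, weights, bias):
    """One linear layer: a multiplication (weighted sum) plus an addition."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(x):
    """A common activation (nonlinear) function."""
    return max(0.0, x)

def tiny_network(x1, x2):
    """Two stacked linear layers with a nonlinearity in between."""
    h1 = relu(linear([x1, x2], [0.5, -0.3], 0.1))  # hidden unit 1
    h2 = relu(linear([x1, x2], [0.2, 0.8], -0.2))  # hidden unit 2
    return linear([h1, h2], [1.0, 1.0], 0.0)       # output layer

print(tiny_network(1.0, 2.0))
```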
12:10
This is the code. We have the placeholders expecting floats: two numbers for the inputs and one for the output. The None is there because we don't know how many examples we're going to receive, so we don't restrict that number. OK, the model is just two variables, initialized randomly, and then we build the linear function with the multiplication and the addition.
12:50
So we have the placeholders and we have the model. The next thing is the cost function.
12:55
The cost function computes the difference between the prediction and the target. The most intuitive thing is to take the difference; then, to get rid of negative numbers, we square it all, and then we sum the errors from all the examples and reduce them to one number. This is the mean squared error: you take the difference, square it, and reduce it to one number.
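As a plain-Python sketch of that cost function (TensorFlow's `tf.reduce_mean(tf.square(...))` computes the same quantity over tensors):

```python
def mean_squared_error(predictions, targets):
    # difference -> square (drops the sign) -> reduce all examples to one number
    errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

print(mean_squared_error([11.0, 4.0], [13.0, 4.0]))  # ((11-13)^2 + 0^2) / 2 = 2.0
```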
13:37
Let's say that our cost function can be plotted like this: the higher up in the plot, the more error, and down there is the minimum we want to reach. So let's say we are at the top and we want to get down to the bottom: we take the direction of maximum steepness, which is the gradient, and we follow that direction again and again until we reach the minimum.
14:17
This is gradient descent. TensorFlow gives you a gradient descent optimizer, and there is a hyperparameter there, the learning rate, which is like the size of the arrow, of each step.
14:27
If you have a big arrow, you reach the minimum faster, but maybe you step past the minimum and oscillate around it, even becoming unstable. And if the arrow is smaller, maybe you never reach the minimum because it's too slow. OK.
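A minimal plain-Python sketch of that learning-rate trade-off, using the one-dimensional toy cost C(w) = (w - 3)^2 (an invented example, not the cost from the talk):

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient (downhill)."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Cost C(w) = (w - 3)^2 has its minimum at w = 3; its gradient is 2*(w - 3).
grad = lambda w: 2.0 * (w - 3.0)

small = gradient_descent(grad, start=0.0, learning_rate=0.1, steps=100)
big = gradient_descent(grad, start=0.0, learning_rate=1.1, steps=5)
print(small)  # converges close to 3
print(big)    # steps past the minimum every time and diverges
```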
14:57
We minimize the cost, and we have it.
15:00
All that we need now is the data.
15:03
What we're going to do with the data is typical machine learning.
15:06
We split the data into one part for training and one for testing, and the training part is what we're going to use here.
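The split can be sketched in plain Python (the 80/20 fraction and the helper name are illustrative; the speaker's actual helpers live in his repository):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into a training and a testing part."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# 100 "learning how to sum" examples: (a, b, a + b)
examples = [(a, b, a + b) for a in range(10) for b in range(10)]
train, test = train_test_split(examples)
print(len(train), len(test))  # 80 20
```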
15:15
I built helper functions for this, so we don't have to bother about the data handling for the rest of this talk.
15:24
OK, and in the session we're going to feed the graph with the training data.
15:32
Let's do it. The first thing: we run the initializer of the variables, and then we run the optimizer, because by running the optimizer you get the learning. We feed it with the training data, and we run through all the training data a lot of times. When you run through all the data one time, you call that an epoch. So we have this for loop that's going to train the neural network a number of epochs.
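The epoch loop can be sketched without TensorFlow: here plain-Python stochastic gradient descent learns the sum as a linear model (the learning rate, epoch count, and data ranges are all illustrative choices):

```python
import random

random.seed(0)

# Training data: examples of "a, b -> a + b", with inputs in [0, 1].
data = [(random.random(), random.random()) for _ in range(200)]
targets = [a + b for a, b in data]

# Model: a linear function with randomly initialized variables.
w1, w2, bias = random.random(), random.random(), random.random()

learning_rate = 0.1
for epoch in range(200):                  # one epoch = one pass over all the training data
    for (a, b), t in zip(data, targets):
        prediction = w1 * a + w2 * b + bias
        error = prediction - t            # gradient of the squared error, up to a factor of 2
        w1 -= learning_rate * error * a
        w2 -= learning_rate * error * b
        bias -= learning_rate * error

print(round(w1, 2), round(w2, 2), round(bias, 2))  # a true sum would be weights 1, 1 and bias 0
```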
16:12
We see that we have an accuracy of 95 percent, and the sum of 5 plus 7 is almost 12. If we take a look at the weights, that's not exactly a sum: a true sum is just weights of 1 and 1 and a bias of 0. That means that we have overfit the neural network to make good summations for our data.
16:41
OK, now a classification problem. In a classification problem you're not going to use numbers, you're going to use classes.
16:51
We have a cat, which could be the class cat, and we have another thing, which could be the class non-cat. Cat and non-cat are words, and we don't work with words, so we transform them so that each class is a component of an array: in this case, cat is the second component of the array and non-cat is the first. This is called one-hot encoding. These 0-1 and 1-0 arrays are just the targets; but what would our predictions be?
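One-hot encoding can be sketched in a few lines of plain Python (the class list mirrors the talk's cat / non-cat example):

```python
CLASSES = ["non-cat", "cat"]  # first component: non-cat, second component: cat

def one_hot(label, classes=CLASSES):
    """Encode a class label as an array with a single 1 at its component."""
    return [1 if c == label else 0 for c in classes]

print(one_hot("cat"))      # [0, 1]
print(one_hot("non-cat"))  # [1, 0]
```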
17:33
Predictions are going to be probabilities. Sometimes the model is going to be very sure that the thing belongs to a certain class: in this case we have the cat, and the model is 82 percent sure that it's a cat. But sometimes it's not going to be that sure.
18:06
Well, our case now is going to be this: we have two integers, we sum them, and we classify whether the sum is greater than ten or less than ten. If it's greater, we say it belongs to the second class; otherwise it belongs to the first class. It's a silly example, but it works and it's good for learning. This relationship is more complex than before, so we need one more layer. Interestingly, the first layer is going to compute the sum and the second layer is going to classify that sum into greater than ten or less than ten. This happens in all classification problems: the first layers extract more basic features, more basic information, and the next layers work with that basic information to produce ever more complex information. And it happens that we use a softmax as the nonlinear function at the end, because we want the output probabilities to sum to one.
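The softmax mentioned above can be sketched in plain Python; it squashes raw scores into probabilities that sum to one (the scores 2.0 and 0.5 are invented, chosen to land near the 82-percent example):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5])
print(probs)  # roughly [0.82, 0.18]: the model is about 82% sure of the first class
```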
19:37
So OK, we build the model, and in this case for the cost function we're not going to use the previous one; we use the cross-entropy cost function, which is beyond this talk. And instead of the previous optimizer, a better optimizer is the Adam optimizer. It works a lot better because it keeps changing the learning rate: at first the learning rate is big, and later it becomes small. These are the results: we have an accuracy of 100 percent, so either we have done something bad or it just works. For 5 plus 3, the model is 89 percent sure that it is less than ten, and for 7 plus 6 it is 87 percent sure that it is greater than ten. If we look at the weights, we see how the first layer is like a sum, but with a negative sign in one of them, and the second layer has the same numbers in the weights but with the negative in one of them; that's what classifies the output of the first layer.
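The cross-entropy cost can be sketched in plain Python for a single example with a one-hot target (the probability values are invented for illustration):

```python
import math

def cross_entropy(prediction, target):
    """Cost for a probabilistic prediction against a one-hot target.

    Small when the predicted probability of the true class is high."""
    return -sum(t * math.log(p) for p, t in zip(prediction, target) if t)

good = cross_entropy([0.82, 0.18], [1, 0])  # confident and correct -> small cost
bad = cross_entropy([0.18, 0.82], [1, 0])   # confident and wrong -> large cost
print(good, bad)
```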
21:09
That's all. If you want to know more, I recommend you follow the Neural Networks and Deep Learning book by Michael Nielsen, and the Stanford CS231n course; you can learn a lot there. For TensorFlow there are a lot of tutorials, but I felt they miss the basics; that's why I did this talk. It's the talk I would have liked to attend a year ago, when I knew nothing about this, so start with this and keep improving. And that's where all the code is; it's a work in progress on learning, because I'm a robotics engineer and my goal is to combine robotics with artificial intelligence. So if you're into robotics, talk to me and we will be best friends. My favorite combination is self-driving cars.
22:19
I hope that you will be able to build self-driving cars with TensorFlow: using the basics I've given you, you can improve and build something. Thank you. Now it's time for your questions. Question: could you scroll back to the previous slide, the one with the repositories?
23:05
Question: thanks for the nice talk, and I hope I didn't miss it, but training really depends on what data you have and how much of it. Is there a clear answer for how much data you need before the training process actually produces something useful rather than just a toy? Answer: there isn't an exact answer; it depends on the application. But there is this classic graph in which you see that classic machine learning algorithms work better than deep learning algorithms when you have little data, but when you have tons of data, like terabytes of images or terabytes of sound, then deep learning is way better than classic machine learning. Question: thanks for the incredible talk. I want to ask whether you have any comments comparing TensorFlow to Theano, for example; that's one question. And the next one: there are so many libraries built on top of TensorFlow, like Keras, which provide a friendlier interface. Would you comment on when to use TensorFlow versus such a library like Keras?
25:08
Could you please speak into the microphone? I didn't hear the first part, sorry.
25:13
Answer: the first one was how I compare TensorFlow to Theano. I haven't worked with Theano, but they say that TensorFlow is more low-level, so you can be more creative and more flexible, while Keras is not as low-level and is maybe better suited for data scientists. Question: so what made you make this presentation an introduction to TensorFlow rather than an introduction to something else? What made you choose TensorFlow over the other options? Answer: well, because it's cool, it's from Google, and it's maintained as a great project. Thank you.
Metadata
Formal Metadata
Title: Introduction to TensorFlow
Series title: EuroPython 2017
Author: Solano, Alejandro
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, modify, and reproduce the work or its contents in unchanged or modified form for any legal, non-commercial purpose, and may distribute and make them publicly accessible, provided you credit the author/rights holder in the manner they specify and pass on the work or its contents, including in modified form, only under the terms of this license.
DOI: 10.5446/33815
Publisher: EuroPython
Publication year: 2017
Language: English
Content Metadata
Subject area: Computer Science
Abstract: Introduction to TensorFlow [EuroPython 2017 - Talk - 2017-07-14 - Anfiteatro 1] [Rimini, Italy] Deep learning is at its peak, with scholars and startups releasing amazing new applications every other week, and TensorFlow is the main tool to work with it. However, TensorFlow is not an easy-access library for beginners in the field. In this talk, we will cover the core concepts of deep learning and TensorFlow totally from scratch, using simple examples and friendly visualizations. The talk will go through the following topics: • Why deep learning and what is it? • The main tool for deep learning: TensorFlow • Installation of TensorFlow • Core concepts of TensorFlow: graph and session • Hello world! • Step-by-step example: learning how to sum • Core concepts of deep learning: neural network • Core concepts of deep learning: loss function and gradient descent By the end of this talk, the hope is that you will have gained the basic concepts involved in deep learning and that you can build and run your own neural networks using TensorFlow