Deep Learning with Python & TensorFlow
Formal Metadata
Title: Deep Learning with Python & TensorFlow
Part Number: 148
Number of Parts: 169
Author: Ian Lewis
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/21151 (DOI)
Transcript: English (auto-generated)
00:02
So this is the Python and TensorFlow talk. Thank you very much for coming to this hour long session right after lunch. I know many of you are gonna start getting sleepy right around the 40 minute mark, 30 minute mark. So do your best to stay awake. I'll do my best to keep you awake.
00:22
So let's kind of work together to get through the talk. So if the slides will advance. Just to introduce myself, my name is Ian Lewis. I'm a developer advocate at, excuse me, at Google.
00:44
I work on the Google Cloud Platform team so that kind of encompasses all of Google Cloud Platform so if you people are, not you people, but if you guys are familiar with things like App Engine or Compute Engine or that sort of thing,
01:00
that's what Google Cloud Platform is. And so I'm on Twitter at Ian M. Lewis. I've been tweeting like sort of throughout the conference so you should be able to find me fairly easily on Twitter. And just a little bit more of background about myself.
01:21
I'm based in Tokyo, Japan, so I've lived in Japan for about 10 years and I've been kind of active in the Python community there as well. So I'm one of the four people who kind of founded the PyCon JP conference, which is about a 600-person conference, just to give you
01:43
an idea of the size. And we're gonna be having the conference in September, in the third week of September I believe, it's from the 20th to the 24th I think. And if you look up PyCon JP, you can find out how to register. I think there are, as of now,
02:01
something like 20 slots left so hurry up. I'm also pretty enthusiastic about other kind of communities so the Go community as well as the other open source kind of projects coming out of Google like Kubernetes
02:22
and Docker and those type of containerization type of things. So that's the type of thing that you can expect to hear from me if you follow me on Twitter. So first, just as a kind of a background, I want to go over kind of what deep learning is
02:41
and kind of give a very high level, not necessarily high level, but sort of a quick overview of what that is. Like how many of you guys went to the talks earlier in the day about the deep learning? So quite a few of you. I'm gonna try as my best to kind of build on that,
03:00
but there may be a little bit of overlap. So what are we talking about when we talk about deep learning? So we're talking, in terms of deep learning, we're talking about a specific type of machine learning, which is using neural networks. And neural networks are a way of doing machine learning
03:20
where you build this kind of network of nodes that are interconnected. So you essentially give something like this cat picture. You change the pixels into a kind of numerical representation. You pass that through as the input layer into the network,
03:42
and each of these internal nodes will take the values from your input and do some operation on them and then eventually give you the output. So these are typically organized in layers. So you can see this blue one is the input layer.
04:01
The orange one there is what's called a hidden layer. So if you think of a neural network as kind of a black box, the hidden layers are the layers that are actually inside that do the operations. And so each one of these little nodes does some sort of operation on the input,
04:22
and that's called basically an activation function. And then each of these are kind of linked together using weighted connections. So each of these little lines connecting the layers will be weighted to indicate a strength
04:40
between each of the layers. So what are neural networks good for? So neural networks are essentially good for kind of classification and regression problems. So these are very wide class of problems that you can apply machine learning to. So classification is basically putting things into buckets.
05:01
So you can have a bunch of predefined buckets like A, B, C, and then you get some input and you say which bucket does it go in, and then you basically put it through the network and you get a probability that it goes in A, B, or C. And regression is a little bit more, somewhat more complicated in that you get, instead of a probability that it goes
05:21
into a single bucket, like between zero and one, you get kind of a scalar output. So say you get some values into your neural network and the output you want is, say, a temperature. So like from say zero kelvin to some value or some temperature, that's more like a scalar
05:43
or could be solved by a regression problem. I'm gonna be talking mostly about classification problems but regression is also something that neural networks are pretty good at. So what does that actually look like? So here's a little demo that's available
06:05
at playground.tensorflow.org. This is like a little demo that allows you to kind of look into a neural network and kind of get an idea of what's going on. So here we have some input features. So these are some values that you add, they do input into the network
06:22
and then you have some hidden layers in the middle and then you get some sort of output. So if you went to some of the earlier presentations, you saw something similar to this where you have say, my example I'm gonna use is like say that you have like the weight and height of a person and then you have two different categories here,
06:41
like these orange ones are maybe children and these blue ones are say adults. So if you want to classify a new piece of data coming into your network, you could say, you know, train a network to do this but this is really, you know, very easy to do, like this is essentially a linear classification problem
07:01
or a linear regression type of classification problem where you can just draw a line in between the two and get a way of predicting between the two. But let's say you have something a little bit more complicated. Let's look at one where the one category is completely encircled by the others.
07:20
So if we were to do something like this and then try to train using just some, you know, x and y inputs, this would actually never, basically never converge. It would never figure out how to do this properly. So we can do things like add a hidden input layer
07:40
that will essentially do the, this kind of linear classification multiple times. So you can say, let's do this one time. We'll see that it basically creates one line here. So everything on this side, it will classify as orange and this side it will classify as blue. But then when we add like new layers, whoops,
08:02
not layers but new nodes, we can actually see that it gets a little bit more complicated. So we could say it now figures out two lines or uses two lines and then aggregates the result together. So you can see in the one node, it's done one kind of linear regression and one node, it's done another.
08:21
And then when you combine that together, it kind of makes this band here. And then, whoops, not this one, that one. Now if we do it with three, we can actually combine the result three times and we get kind of like triangular type of structure. So like as we add these kind of nodes and hidden layers,
08:40
we can do things that are more and more complicated. Okay, so that's great. But how do we classify something that looks more like this? So this is kind of a spiral looking thing and this spiral is blue and this spiral is orange. This is something that's quite a bit more difficult
09:01
and we can't necessarily classify this using something like just x and y inputs. So you can imagine maybe sine would be good or x squared. But even still, like we don't really get very good output just by having a very shallow kind of network with just three nodes here.
09:22
So in order to actually make this a little bit more complicated, or to solve these more complicated problems, we need a much more complex network. So for something like this, this may or may not actually converge. But it's getting there.
09:45
So right now, this is actually not terribly stable, but it will stabilize. So if you have these kind of more complicated networks that you can kind of put together, you can actually start solving more and more complex problems. And so I'll talk a little bit about
10:01
why that's important a little bit later. But you can see that each of these individual nodes makes its own kind of addition to the final output. And then each of these little lines here shows the weight. So these blue ones are positive and these orange ones are negative. So that's gonna show us that the negative ones
10:22
are actually an inverse relationship. So with these inverse relationships, you can essentially just reverse the orange and the blue in order to get the right type of output. But essentially, you can have these kinds of positive and negative weighted connections between the different nodes.
10:42
So let's turn this off so it doesn't burn my CPU. And then let's go back here. By the way, that demo is basically just a way of getting to understand neural networks. It's not actually using TensorFlow under the hood. It's like all done in JavaScript in the browser. But it's essentially a way of like kind of getting
11:03
more familiar with neural networks. So what is a neural network? So a neural network is essentially, when you break it down, is essentially a pipeline of basically taking something like a matrix, what is essentially called a tensor,
11:23
and putting it through like this pipeline of operations. And so you can imagine that each of these is say like a matrix multiplication type of problem or type of function, where you take one matrix, multiply it by another matrix, multiply it by another matrix,
11:40
and another and another and another and another, and then eventually you get out a tensor that represents the output for your particular problem. And this is basically very loosely modeled after how the brain works, and how the individual nodes have a kind of strength
12:03
or a weight in between, like the neurons in your brain have a certain weight between them. But from a practical point of view, you're essentially doing matrix operations a bunch of times in order to do some sort of prediction.
12:21
So I mentioned tensors, and this is where TensorFlow gets its name, but a tensor is not something that people necessarily think about very often or encounter too often, unless you're a machine learning type of person. But most people are familiar with things like vectors and matrices,
12:42
and a tensor is essentially a generalized version of that. So you can imagine this kind of like 2D, like Euclidean space, or 3D space, and then you have some sort of value out here in the space. And so for something like a vector,
13:00
you would have a 2D type of vector, and that could be, say, represented by a single array in a programming language, or a matrix, which is essentially a two-dimensional version of that. But a tensor is essentially a generalized version where you have this n-dimensional type of vector.
13:21
So you could have like any number of dimensions. So this could be one dimension for each type of feature that you're actually adding into the network. And you can essentially do the same sort of operations on a tensor as you would on say a matrix.
13:43
So like matrix multiplication, matrix addition, that sort of thing. So how a basic neural network works is that you would have these kind of connected nodes. So this is our input vector or input tensor with x1, x2, x3.
14:02
Then we have a weights tensor that is represented, that you can then multiply against. And then finally we add the result biases in the form of a tensor, and then softmax it to get the output.
14:22
So this is a very, very basic one-layer network. But you can think of it this way: these individual weights are not each separate matrix multiplications, but essentially this matrix times this matrix
14:41
makes this kind of interconnected pattern. And so, if you went to some of the earlier talks, most of the operations are formed in this way, where you have the input x times W, which is the weights,
15:01
plus b, which is the biases. And then you can do that multiple times for each layer in the network. And so these are basically just multiplications and additions. And then we have this kind of softmax thing at the end. This softmax is essentially just a form, a way of normalizing the data once it comes out.
15:22
So you'll typically see these at the very end of a network. So what happens is after you've gone through this network, these outputs, what it would be at this level is you would have some sort of value, like say 50. This one's like 50, this one's like 20, this one's like .32. And so you don't really get an idea of what that actually means.
15:42
That's kind of, these values are kind of a relative value for your actual network. So when you put this through the softmax function, this will actually normalize it to a value between one and zero. So you can get essentially a prediction output. And then these individual values would represent
16:00
the probability, say, that that particular value goes into a particular bucket. Say this is a cat, this is a dog, and this is a human. And we put in an image. The output might be that it's 99% certain that it's a cat and 1% certain it's a dog
16:20
and 0.1% certain that it's a human, which essentially means that it's a cat. So that's great. So that's how we actually do predictions: we take some input, we go through all these operations, and we get some sort of predictive output, right?
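As a rough illustration of what's being described here (not the speaker's notebook code), a single-layer forward pass with a softmax at the end can be written in a few lines of NumPy; the names and numbers are purely illustrative:

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability, then normalize so values sum to 1.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    x = np.array([0.2, 0.7, 0.1])        # input features, e.g. pixel values
    W = np.random.randn(3, 3) * 0.01     # weights: one column per output class
    b = np.zeros(3)                      # biases: one per class

    logits = x @ W + b                   # the "x times W plus b" step
    probs = softmax(logits)              # normalized probabilities, e.g. cat/dog/human
    print(probs, probs.sum())            # the probabilities sum to 1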
16:40
But how do we actually train a model? So a model is trained in this way where you use a method called backpropagation, which was talked about in some of the earlier talks. But essentially what you have is you have this. This here is the neural network as we've been talking about it before. So here's like, say, one layer. Here's the second layer.
17:01
Here's our softmax and here's our output. And we actually go through here and we do the prediction. But then what we do is we use test data as we put it through our network. So we have some test data that says, here's the actual data, here's the cat picture, this is a cat.
17:21
So you have the actual value or the actual output, the expected output, and the actual test data associated with each other. So you know which ones are cat pictures, which ones are dog pictures. And so what we do is we put this, say that cat picture through here and then it comes out with a result.
17:42
And what we do is we take that result and the expected value and then we find essentially the difference between those two values. So say if it came out that it was 86% certain that it was a cat, but we know that it's 100% certain that it's a cat,
18:01
we want to be able to nudge our network in the direction of actually determining with 100% accuracy that it's a cat. So we'll take this output and use what's called a loss function to find the difference. So a typical loss function might be cross entropy, but there are a number of other loss functions
18:21
that you can use depending on the situation. And then you go through these other, you kind of optimize the results by using something like gradient descent. Those were also talked about a little bit earlier. I'll talk a little bit more in general about these kind of optimization or especially gradient descent.
18:43
But essentially what you do is you put it through this optimization function and then backpropagate all the values into the weights and biases for each individual layer. So this weight one, bias one, weight two, and bias two are actually the weights
19:01
and biases that are used in the network here. And so what you're doing here is you're essentially back propagating all these values and updating the weights and biases and kind of nudging the network in the direction of actually giving you the proper output.
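To make the "nudging" concrete, here is a toy gradient-descent loop in NumPy that fits a single weight; this is purely illustrative and not the network from the slides:

    import numpy as np

    # Toy example: fit y = w * x with gradient descent on a squared-error loss.
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])            # the true relationship is w = 2
    w = 0.0                                   # start with a bad guess
    learning_rate = 0.05

    for step in range(100):
        y_pred = w * x
        loss = np.mean((y_pred - y) ** 2)     # how wrong the current weight is
        grad = np.mean(2 * (y_pred - y) * x)  # d(loss)/d(w)
        w -= learning_rate * grad             # nudge w in the direction that lowers the loss

    print(w)  # close to 2.0 after enough updates

The real network does exactly this, just for every weight and bias at once, with the gradients coming out of backpropagation.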
19:22
And then you do this essentially many, many, many times, training it over and over and over and over again, and it eventually nudges into the direction of being a very accurate network. At least that's the theory. So this doesn't always work,
19:41
but in general that's the idea behind it. So that's kind of like a relative overview of how the, what neural networks are like. So why are we actually talking about this? So one of the earlier talks mentioned things like
20:05
that ImageNet, which is a famous open data set for machine learning, would get something like a 25% error rate around 2011. But essentially the reason why we're talking
20:22
about these kind of deep neural networks all of a sudden is because people have started to get very much better at creating these neural networks. And this is because of a number of kind of breakthroughs in terms of training these networks to do things that are actually practically useful.
20:43
So you can think of the quality of a neural network kind of like this. So traditional learning algorithms would kind of, as you give them more data, increase in performance, but they would kind of level off very quickly.
21:02
And then you would have like small neural networks which would also kind of level off quickly. And so essentially what people did was they would, you know, train the amount of data to about here, or give it a certain amount of data to about here, and then they would basically, you know, they wouldn't, adding more data
21:20
wouldn't actually make it much better. So they would essentially be able to stop right here. But we've since found these kind of, these, you know, neural network methods that allow us to scale the learning much better. So as we throw more data at the problem, they actually get quite a bit more sophisticated and have quite a bit better performance.
21:42
So we've been able to create these large, deep neural networks that will continually get better as we give it more and more data. And with that comes like other problems, which I'll talk about in a second, but essentially these like medium
22:02
and large neural networks have become possible recently. And so here is a model, this is the GoogLeNet network that was used,
22:20
this is essentially the Inception model, which is also called GoogLeNet. And so what this is essentially doing is labeling pictures, or labeling images. You can think of each one of these as being, say, a matrix multiplication or some sort of operation on a matrix. And then it goes through several different kinds of layers,
22:44
and then eventually gives you an output tensor that tells you the labels. So this is what we mean when we talk about deep neural networks: networks that essentially have many, many, many layers before they actually give you this output. And by adding these layers,
23:01
we actually can start getting more and more complicated, you know, solving more and more complicated problems, and actually getting pretty good results with them. But this gives us a problem where, you know, you can imagine that each one of these is a matrix multiplication, and these tensors might be, you know,
23:21
a large image like a megabyte or something, and you're changing that into a tensor, and then doing a matrix multiplication on it, you can imagine how many actual, you know, operations you have to do in order to actually train this or to do even prediction even just once. So you have to do this many, many, many, many times over in order to actually train a network,
23:40
and so what people do is they use GPUs, and these GPUs are very good and high powered, but still you're essentially waiting for like weeks or sometimes even months for the results of actually one single training run. So what people start to do is like they use like supercomputers in order to train models faster,
24:03
but still this is a problem because not everybody has access to a supercomputer. How many of you guys have access to a supercomputer? Somebody does. That's the most I've ever seen. That was like three or four, I think. So how much do you pay for that, by the way?
24:21
So those are, supercomputers are something that you have to lease time on, so they're like the mainframes of old, you know, where you had to like lease some time, you know, 7.30 to 8.30, and you know, like in the middle of the night or something like that, and you pay tons of money for them, so they're not exactly the easiest or the best way,
24:40
and we wanted like, you know, the ideal thing is to be able, for everybody to be able to do machine learning. So what you need is kind of distributed kind of training, and so like at Google we've been able to do that, and so we use it for a lot of practical applications, things like Google Photos, and like detecting text in street view images.
25:02
So there's a lot of kind of exciting things that are going on, and essentially recently we've, these kind of breakthroughs allowed quite a lot of activity at Google, so this is a number of projects internally at Google that use machine learning.
25:21
This is just a number of directories that contain a model description file. But you can see from around 2012, we've got this kind of hockey stick growth. And yeah, by distributing it, we've been able to train much, much faster, et cetera, et cetera. Okay. So now I'd like to talk about TensorFlow itself.
25:41
So TensorFlow is an open source library. It's a generic or general purpose machine learning library, particularly for doing neural networks. We are also kind of expanding it to encompass other types of machine learning. But it was open sourced in November 2015,
26:01
and it's used by, internally at Google, for a lot of our internal projects. So it supports a number of things like, this kind of flexible and intuitive construction. Basically be able to do a lot of things
26:20
in an automated way. And you can, it supports training on things like CPUs and GPUs, et cetera. But one of the nice things is that you define these kind of networks in Python. So before I kind of dive into looking at what TensorFlow looks like, some of the core concepts is that you have a graph.
26:41
So the name of TensorFlow comes from the idea of taking tensors and then having them flow through a flow graph, or a directed data flow type of graph. So a graph is a representation of that. The actual nodes are the operations that you do. And then the tensors are the data
27:00
that's actually passed through the network. And then we have other types of structures. So we have the idea of these constants, which can be something that doesn't change. But then you have things like placeholders. These are basically inputs into our network.
27:23
These variables, so variables are things that can actually change during the training. So these are the things that you usually use for your weights and biases, et cetera. And then session is something that actually encapsulates the overall connection between TensorFlow's core
27:43
and the models that you define. So I should mention that TensorFlow is a library that is based on the same sort of concepts as many other libraries, kind of scientific libraries, where you have a Python interface or an API,
28:04
and then it has a kind of a C++ core that enables you to do these kind of very fast operations. So when you're actually doing training, you're not actually going through the Python VM. So these are a non-exclusive list
28:24
of all of the operations you can do with TensorFlow. So things like math: addition, subtraction, multiplication, division of these tensors, matrix operations, stateful kind of operations, et cetera, et cetera.
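Putting the core concepts above together, a tiny graph with a constant, a placeholder, a variable, and a session might look like this in the TF 1.x-style API of that era; the names and values are illustrative, not from the talk:

    import tensorflow as tf  # TF 1.x-style API

    c = tf.constant(3.0)                       # a constant: never changes
    x = tf.placeholder(tf.float32, shape=[2])  # a placeholder: fed in at run time
    v = tf.Variable([1.0, 2.0])                # a variable: updated during training

    y = v * x + c                              # nodes in the data flow graph

    with tf.Session() as sess:                 # the session runs the graph in the C++ core
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: [4.0, 5.0]}))  # [7.0, 13.0]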
28:41
So let's actually look at what this looks like. So I'm gonna run through. So this is a Jupyter notebook. So how many people have heard of Jupyter, used Jupyter, et cetera? Okay, how many people have been asked
29:03
that more than five times at this conference? Yeah, that's what I thought. So I'm gonna just assume that you guys know Jupyter and just kind of go from there. But let me actually just restart this kernel here.
29:23
Yes, this is a Python 2 one because TensorFlow also supports Python 3, if I remember right, but this particular example is Python 2, yeah. So TensorFlow is pretty easy to get started.
29:41
There's like, this is just using the kind of MNIST example, so one of the, so Mr. Rashid was talking about the MNIST example earlier today. But it's essentially a bunch of images that are kind of handwritten numbers, and then you just OCR on those to determine which type of,
30:02
which number is actually present in the image. So the training images look something like this, where you have 55,000 images, all in one big, huge, long array, and each one has 784 pixels. And they're basically monochrome,
30:21
so like they're just black and white, but the, and so if you look at the shape of that, you know, it's a 55,000 size array with the 784 pixels. But if you look at the images, they're essentially, each value in it is the,
30:43
this is each one of the images in a kind of a two-dimensional array, and each of these values is a value from zero to one of essentially how dark that particular pixel is. So some of these are like 0.23, which is kind of a light gray, all the way up to one.
31:03
So that's essentially what the data looks like. So that's how we've actually represented here, like if you had a color image, you would need to represent it a little bit differently, but that's essentially how we're doing it in this case. And then this is just using, this is just showing an example value,
31:22
so using matplotlib. So this is just one of the input images. So that's essentially what the training data looks like, but then we have these training labels that are associated with each image that says, that's basically a 10, you know,
31:41
an array or a vector of size 10, with, you know, a bunch of zeros in it, and a one in the right location that indicates the number for that particular image. So for this image, we have an eight here. So if we look at the shape of the training labels, it's 55,000 by 10.
32:01
And then if we look at this, the particular one for this eight, we can see that the one is in the, what is it, this is like zero to nine or something, in this particular column. So in the eighth column, I think this is like the zero is actually for zero. So it's from zero to nine.
32:22
So that's essentially what this is. These are actually called one-hot vectors, where you have zero in all of the values except for one. And this is used often in training data, but the data that you'll get out of it is actually similar to this, except for it will be a bunch of values from zero to one, essentially a probability.
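For reference, the shapes being described can be inspected with the MNIST helper that shipped with TensorFlow at the time; the module path below is the one from that era and may have moved in later versions:

    from tensorflow.examples.tutorials.mnist import input_data

    # Downloads MNIST and reads it with one-hot labels, as in the notebook.
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    print(mnist.train.images.shape)  # (55000, 784): 55,000 images, 784 pixels each
    print(mnist.train.labels.shape)  # (55000, 10): one-hot vectors of length 10
    print(mnist.train.labels[0])     # mostly zeros, with a single 1 marking the digit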
32:44
And so here are some of the images that I've kind of shown earlier, but once you're training it, you can train it to
33:00
set these weights and biases so that like individual pixels will indicate whether it's a particular number. So in this case, we're actually using a very simple neural network, which will kind of, with just one layer, which will work this way. But essentially, like if you see pixels
33:21
in these blue areas, that's probably a zero. And if there's any pixels in here, in the red area, then it's probably not a zero. And then it basically aggregates the probabilities in order to decide whether this is a zero or not. And you can kind of see that in many of the other ones. So like, this one's a one, so if you see pixels in this area,
33:40
with a two, it's like in this area, and three in this area. So they look similar to the actual values, the numbers you're looking for. So, did I actually execute all these? Okay, good. So the next one, this is actually us defining our network.
34:00
So here we're importing TensorFlow, and we're using the placeholder that I talked about earlier. This is our input into the neural network. And we have, it is a size of 784. So this is the size of the number of pixels. And then we have these weights and biases as variables, which can be updated as we train the model.
34:23
And then here's actually where we define our network. So this is just a single-layer network. And you can define it very similarly in Python to the way that you would do it in mathematics. So here we can say that we're doing a matrix multiplication of the input times the weights,
34:42
and adding it to the bias variable, and then doing a softmax on it at the end. And then TensorFlow internally will take these and build our data flow diagrams, or our data flow kind of model representation.
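A sketch of that single-layer definition, in the TF 1.x style the talk is using; the variable names here are my own and may not match the notebook exactly:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])  # input: any number of 784-pixel images
    W = tf.Variable(tf.zeros([784, 10]))         # weights: 784 pixels -> 10 digit classes
    b = tf.Variable(tf.zeros([10]))              # biases: one per class

    y = tf.nn.softmax(tf.matmul(x, W) + b)       # matmul, add bias, then softmax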
35:01
So once we have that out, we can then use that to then train our model. So this is actually our neural network. And then we have a placeholder for the output with this Y prime. And we've defined a cross-entropy function. Here's our loss function. And then we can basically put it
35:22
into this gradient descent optimizer, and optimize it using cross-entropy. And this will then create our kind of training step. So this encompasses the entire neural network, plus the training that we need to do.
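Continuing the sketch above, the loss and training step would look roughly like this; the 0.5 learning rate is just a plausible value, not necessarily the notebook's:

    y_ = tf.placeholder(tf.float32, [None, 10])  # placeholder for the expected labels

    # Cross-entropy loss between the prediction y and the expected output y_.
    cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(y_ * tf.log(y), axis=1))

    # One training step: nudge W and b in the direction that lowers the loss.
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)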
35:41
And like some of the other explanations in some of the other talks, gradient descent is essentially a way of kind of nudging our neural network in the direction that we want it to. So I think one of the talks talked about using, going down a mountain, using a single little flashlight or a torch,
36:02
and then kind of just going a little bit at a time down the mountain. But essentially that's the idea. You're essentially going down, moving it in the direction towards a minimum, to actually minimize the loss. So say this altitude would be the error
36:23
or the loss generated by the loss function. And then we basically nudge it in a direction where the loss gets minimized. And then essentially do that just over and over and over again. So each one of these would be a training epoch as we're going down the gradient descent optimizer. So here what we're gonna do
36:40
is we're going to train 1,000 times on a particular piece of data. And so what's great is that we can also do this kind of mini batch training, which is a way of, you basically pick just a small subset of the total training data.
37:00
So we're not actually training every single time on every single piece of the training data. We're actually training on a randomly selected batch of, in this case, 100, I think, or yeah, 100 elements. And we're doing that essentially 1,000 times. This is taking way longer than it usually does.
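Continuing the same sketch, the mini-batch training loop is typically written like this, assuming the mnist object and train_step from the earlier snippets:

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    for _ in range(1000):                                  # 1,000 training steps
        batch_xs, batch_ys = mnist.train.next_batch(100)   # random mini-batch of 100 examples
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})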
37:20
And so what's good about that is that you don't have to train on the entire training data. You can essentially do something that's, you basically pick a randomized subset of the training data. And that's essentially the same thing you do when you do, say, a statistical survey, when you ask a bunch of people something. You basically get a significantly,
37:45
or a representative sample of the data. Okay, so this is actually done, by the way, okay. So this, I've actually run through the training. And then at the end, we can actually check
38:00
the accuracy of our neural network. So in this case, we've actually got about 90%, which is pretty bad. But this is a very simple one-layer neural network. So that's essentially kind of how you can use TensorFlow. You can basically create these steps to run through it.
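Accuracy for this kind of MNIST model is usually checked like this, again continuing the earlier sketch rather than quoting the notebook:

    # Compare the predicted digit (argmax of y) with the labeled digit (argmax of y_).
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))  # roughly 0.9 for this model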
38:23
But all of the actual computation is done under the hood, not in Python, but in the C++ core. And that's also mapped to devices. So if you have GPUs or CPUs available,
38:42
it will actually map the operations to those particular devices. So in this case, I'm running this on, like, say, I think a 32-core machine. So it'll actually map that. So I'll talk a little bit about TensorBoard later. Let me go back to, can I go back?
39:09
Let me see if I can find the back button. It's not the back button, it's a whole new thing. Okay.
39:21
So here I'm going to look at a little bit more, a little bit more complicated example where we can get a little bit better accuracy. So here we're training, or we're using the same exact data that we did before, but we're gonna actually use, build what's called a convolutional network. And this was talked a little bit about earlier.
39:44
And so this basically allows you to look at the image, like kind of in parts, and basically pick the specific features from each part of the image.
40:01
And this helps with things like, say if you write the way that I had it earlier, you know, you had the image, and you had, like, if you saw pixels in a certain location, then that would indicate what number it was. But what happens if you write the zero or whatever, but you actually translated it slightly over a little bit?
40:21
That would actually change the way that the, you know, that particular network wouldn't be very good at figuring out that I just moved the zero over a few pixels instead. So stuff like that, that's kind of convolutional, looking at it, helps a lot. But in this case, what we're gonna do
40:40
is we're gonna initialize the weights and biases a little bit different. I think this one is just doing this kind of, what was it doing? I think this one was picking, like, kind of random weights to begin with. But here's the kind of convolution part. So essentially what we're doing
41:01
is we're going over the image, and we're picking a particular, these are what are called kernels, I guess, over the image, and then we're kind of building this other value, or this other kind of tensor that indicates, that has a particular value
41:20
for each of the picked kernels over the image. And then we can actually work on these. So this is just picking kind of features of each individual part of the image, rather than looking at the whole image, or the image as a whole. Then we can take that, those type of things,
41:43
and use what's called pooling. Pooling is another kind of a method that you use to basically kind of, the most common example is max pooling, where you take the individual values from a part of the tensor, and you pick the maximum value. So this kind of gives you somewhat of a representation
42:02
of a particular part of the image as well. And then kind of put that all together into a layer, and then you can do that, like create several layers that look like that. So here we're doing a set of full, our first convolutional layer,
42:22
by building these bases, building the weights and biases, and then building our layer here. And then the second convolutional layer takes the inputs from the, or the outputs from the previous layer, and does the same, basically the same sort of thing.
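The convolution-plus-pooling layers being described usually look something like this in TF 1.x; this is a sketch in the spirit of the classic MNIST tutorial, and the kernel sizes and feature-map counts are assumptions, not taken from the slides:

    import tensorflow as tf

    def weight_variable(shape):
        # Small random initial weights, as described above.
        return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

    def bias_variable(shape):
        return tf.Variable(tf.constant(0.1, shape=shape))

    x = tf.placeholder(tf.float32, [None, 784])
    x_image = tf.reshape(x, [-1, 28, 28, 1])     # back to 28x28 single-channel images

    # First convolutional layer: 5x5 kernels, 32 feature maps, then 2x2 max pooling.
    W_conv1 = weight_variable([5, 5, 1, 32])
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1,
                                      strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME')

    # Second convolutional layer takes the pooled output of the first.
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2,
                                      strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
    h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1],
                             strides=[1, 2, 2, 1], padding='SAME')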
42:42
And then at the end we create like a, this is a densely connected layer, so as the, some of the previous talks talked about, like the convolutional layer's not 100% connected between the values, because they're actually using this kind of translated kernel over the image. But the final output layer is kind of a
43:03
densely connected layer, which allows you to kind of just do basically the exact same thing that we did in our previous layer, but without the convolutional part. And this will allow us to, you know,
43:21
get a much better kind of output. I'm not gonna really talk about dropout, but then we have basically the readout layer, and this is essentially just doing the softmax on the output from the previous layer.
43:42
And then you can kind of train and execute the model. In this particular one, we're using the same kind of cross entropy, but we're using an Adam optimizer instead of the regular gradient descent optimizer. And then with those kinds of optimizations, you can kind of get a much better output,
44:03
or a much better performance. So here, we're actually doing a lot more training on this particular one, because it's a deeper network, and we can train it, or it scales a lot better. Our previous one, if we continued to train it more and more, it probably wouldn't get, it wouldn't get very much better than 90%. But in this case, we can train it quite a lot more times
44:23
in order to improve the accuracy. So we actually train it about 20,000 times on many batches of 50. And so, then we'll go through, this is actually me doing this, because this takes about five minutes or something to actually run through. But you can see from the output
44:41
that we get about 99.2% accurate, which is a good deal better than 90%, right? So instead of one in 10, you know, it's like around one in 100 is classified incorrectly. So, you can do things from very simple networks
45:04
to much more complex networks. So let me go back to my, so one of the other things that you can do, because TensorFlow has this kind of internal representation or knowledge
45:21
about how the graph and everything is working together, is you can essentially write log output files as you're doing training. And these can then be read by an application called TensorBoard. So we were obviously very nice with the names,
45:41
you know, TensorFlow, TensorBoard, you get the idea. But what's really cool about this is that, where is it, can I make it, where do I make this bigger, there it is, is that you can look at the things like the accuracy, the values of the loss functions,
46:04
and look at these kind of graphs as you are training and going over the data to kind of see how your network is performing. So in this case, we're seeing the actual accuracy as we're training it. So this is one of the, this is data, I think, on the simple version.
46:21
So once we get up to about 90%, we get there pretty quickly, but we don't really get very much better as we train the data. But you can look at things like the accuracy, but you can also look at the actual loss function. So this is cross entropy, looking at the cross entropy value, and that kind of goes down and down and down and down.
46:41
So this should actually be close to the inverse of the accuracy. But you can also look at many of the other values, and these basically correspond to the variables or the individual parts of your,
47:01
basically the values that you have. So here, cross entropy was an actual Python object that was defined, and then you can get this kind of log output data. So other things like the max and the min and stuff like that are all kind of part of that.
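The log files TensorBoard reads are produced with summary ops; here is a sketch using the TF 1.x summary API (older releases used tf.scalar_summary and tf.train.SummaryWriter instead), continuing the earlier snippets, with an arbitrary log directory:

    # Record scalar values for TensorBoard.
    tf.summary.scalar('accuracy', accuracy)
    tf.summary.scalar('cross_entropy', cross_entropy)
    merged = tf.summary.merge_all()

    writer = tf.summary.FileWriter('/tmp/mnist_logs', sess.graph)  # log directory is arbitrary

    for step in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        _, summary = sess.run([train_step, merged],
                              feed_dict={x: batch_xs, y_: batch_ys})
        writer.add_summary(summary, step)

    # Then view it with:  tensorboard --logdir=/tmp/mnist_logs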
47:22
Let's see, so these are kind of input images that you can look at. But one of the other cool things is you can actually look at the graph of the data itself or of the model itself that you're building. So here we have a two-layer network.
47:40
So if we have a two-layer network, we can actually just kind of zoom in and look at the individual pieces of the network, like the weights and the biases and things like that for individual parts of the network. We can look at things like the dropout values, the loss function as well. So this basically gives you,
48:01
from the Python code that we wrote, a full kind of graph representation of the network. So in the case of, say, something very complicated, like the ImageNet model that I was showing you earlier, you would see this huge, huge graph that was generated by that.
48:22
But this is really cool because it helps you visualize your neural network that you've defined. So let's go back here. I have about 10 minutes or so left or something, right?
48:43
Yeah, well, we'll get there. So the main difference between distributed training or between TensorFlow and many of the other libraries that are out there is that TensorFlow is built from day one with distributed training in mind.
49:06
So essentially, TensorFlow is built in such a way that we want to be able to productionize or to actually do practical work with our network, with the library and with our networks. So we want to be able to train things faster,
49:23
and based on the kind of hardware kind of breakthroughs and improvements that we've made in the past, we want to be able to utilize those to be able to train models faster. So TensorFlow supports multiple different types of parallelism, so model parallelism,
49:41
which is essentially breaking up your model. So each one of these machines takes a different part of the model, and you basically feed it through here and basically break up the work that way. It also supports what's called data parallelism, which is hopefully on one of these slides.
50:02
No, it disappeared. So data parallelism is the opposite, where you break up the data instead, but each machine has a full copy of the model. So you're splitting up, say, records one through 100 and sending them to one machine, and records 101 through 200
50:22
to a different machine, and breaking up the work that way. There are a number of trade-offs between these, whether you use a full-graph or subgraph type of model parallelism, or synchronous or asynchronous data parallelism.
50:41
There are pluses and minuses to each of these, so there's no silver bullet here, but I do know that at Google we use data parallelism pretty much exclusively.
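To make the model-parallelism idea a bit more concrete, here is a hedged sketch assuming a machine with two GPUs (the layer sizes and device placement are made up for illustration): each device holds a different piece of the model, and the activations flow from one to the other.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])

# First part of the model pinned to one device...
with tf.device('/gpu:0'):
    W1 = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1))
    h1 = tf.nn.relu(tf.matmul(x, W1))

# ...second part pinned to another; TensorFlow ships the activations
# h1 between the devices for us.
with tf.device('/gpu:1'):
    W2 = tf.Variable(tf.truncated_normal([256, 10], stddev=0.1))
    logits = tf.matmul(h1, W2)
```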
51:04
So TensorFlow supports these different types of model parallelism and data parallelism in a number of ways. Data parallelism is this one, okay. This is where you take the data and split it up,
51:20
and each one of these replicas has the entire model. Then, once you've done some training, you pass the updates to the parameter server, which is the thing that holds all the weights and the biases. As these are updated, it pushes them back out to the model replicas. And then there are asynchronous and synchronous versions of data parallelism,
51:43
where you're either updating the weights and biases in parallel, or updating them synchronously for each iteration. Asynchronous is much faster, but can add some noise to your model,
52:02
because the parameters can be changed midway through a run. Whereas in the synchronous case, you split up the data and then wait until all of the replicas have finished a particular epoch before you go on to the next one.
52:21
That reduces the noise, but it makes the training a little bit slower. So this is an example of how that would run with TensorFlow: you have a bunch of workers doing the parallel work, you have some parameter servers, and in between, the processes use gRPC to communicate.
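As a very condensed, hedged sketch of what that setup looks like in code (the hostnames, ports, and task index below are placeholders, and in a real deployment you would launch one such process per worker and per parameter-server task):

```python
import tensorflow as tf

# Describe the cluster: one parameter server and two workers
# (placeholder hostnames and ports).
cluster = tf.train.ClusterSpec({
    'ps':     ['ps0.example.com:2222'],
    'worker': ['worker0.example.com:2222', 'worker1.example.com:2222'],
})

# Each process starts a server for its own job and task.
server = tf.train.Server(cluster, job_name='worker', task_index=0)

# replica_device_setter pins the variables (weights, biases) onto the
# parameter server while the compute ops stay on this worker.
with tf.device(tf.train.replica_device_setter(
        worker_device='/job:worker/task:0', cluster=cluster)):
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    # ... build the rest of the model and the training op here,
    # feeding this worker its own shard of the data ...
```

For the synchronous variant, TensorFlow also provides a tf.train.SyncReplicasOptimizer wrapper that aggregates the gradients from all replicas before applying an update, which matches the "wait for everyone" behaviour described above.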
52:44
So why is this kind of data parallelism important? Okay, so let's say that instead of getting a cat out of our neural network when we show it a cat, we got a dog, and we want to improve our network. We want to make it better. So what do we fix in order to actually make it better?
53:02
I don't know, maybe this. This is probably a good idea. I don't know. So we make our tweak and we run it again and we're like, okay, yeah, this is right, nice, it's running on my GPU and it's going fine. And then the result comes out and it doesn't make things better. And you're like, oh crap, what do I do now?
53:22
You've got to go back and start over again. So in order to run these kinds of experiments, you want to be able to run them over and over again very quickly. You don't want to have to wait a week to figure out whether your tweak went well or not. And this is a problem even for people
53:41
who are experts in machine learning. You have experience and literature that you can use to narrow down which things you might want to tweak, but in the end, you need to be able to run it and test whether the tweak you made actually improved things or not.
54:02
So you essentially have to test it, and this takes time, and that's why it's very important to be able to do this kind of distributed training. But one of the problems is that as you scale up the number of nodes, the number of connections between your parameter servers and your workers grows very quickly, since every worker talks to every parameter server.
54:23
So this doesn't really scale. You bottleneck on the network, because these machines are talking over TCP, and you get on the order of milliseconds of latency between them.
54:41
So you essentially need a dedicated network, or dedicated network hardware; a lot of people use things like InfiniBand to make this go faster. But this is actually a really big problem at the moment. So one of the things that we did at Google
55:01
is that we're releasing Cloud ML. Internally, we do our distributed training over a dedicated network that doesn't use TCP/IP; it basically skips the whole TCP/IP stack and lets the communication between the machines run on the order
55:20
of nanoseconds or microseconds instead of milliseconds. This is something we're planning on making public as what's called Cloud ML, which will allow you to run TensorFlow graphs inside of Google data centers.
55:44
We're also planning on exposing, as part of the API, the dedicated hardware that we're using for this. So instead of GPUs, you can use what are called tensor processing units, I think is what they're calling them. But essentially, they're dedicated hardware
56:01
for doing tensor operations. So we're going to be able to expose those to other people as well, so that they can use that kind of dedicated hardware to run more experiments and things like that. So I think that's all I had.
56:23
So I want to thank you for coming and spending the last hour here. How many of you are still awake, raise your hand. So about like 70% of you. But thanks a lot for coming.
56:40
Definitely check out the tensorflow.org website. There are tons of really good examples; if you go here, and then up here, there are tutorials and documentation. It's actually really, really good and has lots of examples of how to use TensorFlow. There are different ones
57:02
for different levels of experience, as well as material on how to use TensorFlow Serving to move towards actually productionizing your models. So thanks a lot for coming, and I hope you enjoyed the presentation.
57:27
Yeah, two-ish questions I think. We have like two minutes, so that works. Yeah, so your question has to be about 15 seconds
57:41
and then maybe 30 seconds for me to answer it. Is there anything like profiling for these kinds of models, to get an overview of how many multiplications
58:00
and how many parameters each block of a flow graph needs? So you're talking about the actual time it took to run. I don't know if TensorBoard gives you that; I think it probably should if it doesn't. I don't know offhand whether it actually does,
58:24
but I think that could be something you could visualize with TensorBoard: you basically log it as a value that you can view in TensorBoard and then see how each part of your graph performed.
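As a hedged sketch of that "log it as a value" idea (this is not a built-in profiler; writer, train_step, and feed are assumed to come from a training loop like the earlier one), you could time each step yourself and write it out as a scalar summary:

```python
import time
import tensorflow as tf

for step in range(1000):
    start = time.time()
    sess.run(train_step, feed_dict=feed)   # one training step
    elapsed = time.time() - start

    # Wrap the measured wall-clock time in a Summary proto so that
    # TensorBoard can plot it like any other scalar.
    summary = tf.Summary(value=[
        tf.Summary.Value(tag='step_time_seconds', simple_value=elapsed)])
    writer.add_summary(summary, step)
```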
58:40
Other question? We got one right there behind you. So the previous talks today mentioned that you typically have to do some feature extraction before you can actually apply neural networks. Will TensorFlow help me speed up
59:03
my more manually designed feature extraction or is it designed only to do neural network stuff? So at the moment it's mostly geared towards neural networks. I mean obviously you can do like feature extraction using a separate neural network.
59:20
So you could do like a neural network that does feature extraction and another one that does the actual like classification and stuff. But there is some work going on. There's like, I forget what it's called. It's like TensorFlow wide or something like that
59:41
I think it's called. The idea is essentially that instead of deep neural networks, you have these more standard types of machine learning algorithms, and there's work going on there to incorporate them, so you can do that sort of feature extraction beforehand and things like that.
01:00:00
That's ongoing work; you might try searching for TensorFlow Wide. I haven't played with it personally, so I can't really give you details. Yeah, thanks a lot.