EuroSciPy 2017: Keras
Formal Metadata
Title: Keras
Title of Series: EuroSciPy 2017
Number of Parts: 43
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/38189 (DOI)
Production Place: Erlangen, Germany
Transcript: English(auto-generated)
00:08
In the meantime, you get the material. I will just give you a few words about this tutorial. Oh, sorry. What's the idea I have in mind with this tutorial?
00:26
You should have a list of notebooks here. The original title of this tutorial was 10 Steps to Keras. The idea was to introduce the Keras framework into 10-ish steps,
00:41
and these steps are going to be notebooks. You have a preamble notebook here, where you can see all the instructions. In the back, can you read that? Is it fine for you? No, probably it's a bit tiny. Do you want me to switch? Better? Good.
01:05
The goal for this tutorial is to introduce the main features of Keras, and to learn how you can actually implement deep networks using Keras. One thing I should definitely want you to learn and to know
01:21
is that Keras, at the end of the day, is very easy to use, but it is not for easy tasks. You can actually do complicated stuff using Keras. This is the outline for this tutorial, more or less. These are the topics we're going to go through in the next hour and half.
01:42
Multilayer fully connected networks is to introduce the very basic features, and then we go through each layer's features, embeddings, convolutional networks, hyperparameter tuning, how you implement custom layers in Keras, and how you actually use deep neural networks, in particular convolutional networks
02:03
and residual networks, transfer learning and fine-tuning, and if we have time, recurrent neural networks and autoencoders, and in the end, multimodal networks. There's a lot of stuff, as you can see. All the materials are online. You can actually play with it. All the code has been designed to run on your laptops,
02:25
so I actually simplified some data examples. You should be able to run it even if you don't have GPUs in your machines, but of course, if you have it, it's definitely better. It just runs faster.
02:42
I'll try to go fast on some topics. At any time, if you have questions, please feel free to interrupt me. There's no problem at all. You will find the requirements. Basically, it's going to be Python 3. 3.4 should work as well, and also Python 2, I don't really know.
03:06
Just a quick note about setting everything up. A friend of yours, some people from the audience, told me that it may actually be a bit cumbersome to set up the Keras folder on a Windows machine.
03:26
If you try to Google it, what I was saying here, the .keras folder, where the actual Keras configuration file should stay, even on Windows machines, should be placed in your home folder.
03:41
So, Googling it a few minutes ago, it turned out that the home folder is what you get in return from this Python function here. Typically, something like C colon double backslash users,
04:01
double backslash and your username. This is the place where you should create your .keras folder. I don't know if you experienced the same problem, but this should fix the problem. So when I say .home folder here, I assume even on Windows machine
04:22
that that is the folder. So you should have, for this notebook, for this series of notebooks, these configuration files. I will tell you later on the meanings of these configurations, but they're quite straightforward. So, I assume you already tried and verified everything.
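For reference, a small sketch of how to locate that folder from Python; the keras.json contents shown in the comment are the defaults assumed in this tutorial, not necessarily what is on your machine:

    # Find the home folder where the .keras/ directory should live,
    # on Windows as well as on Linux/macOS.
    import os

    home_folder = os.path.expanduser("~")          # e.g. C:\Users\<username> on Windows
    keras_config = os.path.join(home_folder, ".keras", "keras.json")
    print(keras_config)

    # A typical keras.json for this tutorial might look like:
    # {
    #     "epsilon": 1e-07,
    #     "floatx": "float32",
    #     "image_data_format": "channels_last",
    #     "backend": "tensorflow"
    # }
    #
    # The backend can also be overridden with the KERAS_BACKEND environment variable.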
04:46
That's nice. Good shot. After that, let's start with the first notebook, if you don't mind, which is going to be number one, multi-layer fully connected networks. The main goal of this notebook is to provide you an example
05:06
on how you can actually deal with Keras and how you can build networks using Keras. I freely borrowed this payoff sentence from the Django project,
05:22
saying that Keras is a deep learning library for perfectionists with deadlines. This means that, as with Django, dealing with Keras objects and creating networks with Keras is very easy.
05:40
In fact, one of the most important features of Keras is summarized in this sentence from the official documentation, when they say that being able to go from idea to result with the least possible delay is key to doing good research.
06:03
In a few words, in case you don't know, Keras is a high-level neural network API written in Python. It's pure Python, so you can actually take a look at the code and even modify it, and it is capable of running on top of different deep learning frameworks,
06:23
which Keras calls backends. So far, the officially supported backends are TensorFlow, CNTK, which is the deep learning library from Microsoft, and this has been the latest addition. I think it's been there for three or four versions now,
06:44
and Theano. Historically, Keras supported Theano and TensorFlow, now also CNTK, and just for you to know, there's also support for an MXNet backend.
07:00
I'm saying it's not in the master branch of the Keras code. That's because, for some reason, the developers of the MXNet backend are still supporting only the Keras 1 API, but now we're using the Keras 2 API,
07:22
so there's been an API change some time ago, and for some reason, MXNet is still stuck on the Keras 1 API, so that's why the MXNet is not yet officially supported, but if you use Keras 1 version API, you can also have MXNet as a backend.
07:41
Having multiple backends is a feature that you will appreciate very soon in the next example. Starting with this very simple example, it's a Kaggle challenge data.
08:05
I'll just read very briefly the problem. So the Otto Group is one of the world's biggest e-commerce companies, and the consistent analysis of the performance of products is crucial in this company. However, due to its diverse global infrastructure, many identical products get classified differently.
08:22
For this competition, we have provided a dataset with 93 features for more than 200,000 products. Of course, for the sake of this tutorial, I drastically reduced the number of samples. In our data, each row corresponds to a single product, and so we have a total of 93 numerical features, which represent the counts of different events.
08:43
The data should be placed in your Kaggle Otto Group folder. For the rest of these examples, we're going to play a bit with a very simple model, which is a logistic regression. For those of you who don't know,
09:00
the logistic regression is a very simple supervised learning algorithm characterized by the usage of the logistic function, also known as the sigmoid function, which has this shape, the S shape. That's where the sigmoid name comes from. So after preparing the data and getting the data,
09:20
in a few words, what we do here is just loading the data from the CSV file. There's actually some pandas code running in the backend, and then we pre-process the data so that we just scale the features using standard scaler in scikit-learn, and then we pre-process the labels to have one hot encoding of the labels.
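A rough sketch of that preprocessing, assuming a pandas/scikit-learn pipeline similar to the notebook's; the file name and column names here are illustrative assumptions, not the notebook's exact code:

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import StandardScaler, LabelEncoder
    from keras.utils import np_utils

    # Load the (reduced) Otto Group training data from CSV.
    df = pd.read_csv("train.csv")                          # hypothetical file name
    X = df.drop(["id", "target"], axis=1).values.astype(np.float32)

    # Scale the 93 numerical features.
    X = StandardScaler().fit_transform(X)

    # Encode the string labels as integers, then one-hot encode them.
    labels = LabelEncoder().fit_transform(df["target"].values)
    Y = np_utils.to_categorical(labels)                    # shape: (n_samples, 9)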
09:42
Are these terms clear to everybody? So I can go safely with that? OK, fine. Thank you very much. Taking a look at the final results we have, we have nine classes and 93 features, as we expected. So these are the labels. Of course, these are anonymized in some way
10:02
in the Kaggle data. And just to give you an example of what is the result of one hot encoding, this is something like we just have for each sample just one element corresponding to the activated bit, the so-called hot bit, and that's why this sort of encoding of the labels is called one hot encoding,
10:23
because just one single label is enabled. Of course, this is a single class problem. Every single sample is just one class in our data. So if you want to implement it in pure Theano code, I will go very fast about that, but if you want to implement a logistic function using Theano,
10:48
this is the code you have to write. So you basically have to declare these objects, so the Theano matrix and the shared variables here, those that are actually modified during the training,
11:03
that's the difference. So you have to implement everything manually, basically. So it's just the probability of the target, the cross-entropy loss function we want to use, but the one thing, which is not the only one of course,
11:20
something which is provided from Theano is the calculation of the gradient. So Theano provides out of the box the calculation of the gradient. So that's one reason why you don't want to really implement this code in pure Python code, so you want to rely on these libraries. So finally, you have to create this Theano function,
11:42
so that connects every single tensor object you're actually creating. So you provide the input, you provide the output there, and this is the rule for the updates of the shared variables here. So of course, these are the weights and the bias, the simple model here.
12:00
So we compile all the code. In case you don't know, this is based on symbolic execution, so you have to compile it and then you run it. So everything you set up previously has not been executed yet until you compile it and you run it. Then you create a new function for the prediction, using all the tensors already provided.
12:22
So this is for the training, this is for the prediction, and the rest of the code is just repeating the training process for the number of epochs. So we're actually implementing the training process manually. And this is the way Theano does this very simple problem.
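For reference, a minimal sketch of what that pure-Theano version can look like; this is not the notebook's exact code, and the shapes simply mirror the 93 features and 9 classes above:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.matrix("x")                              # input features
    y = T.matrix("y")                              # one-hot targets

    # Shared variables: the weights and bias, updated during training.
    W = theano.shared(np.zeros((93, 9), dtype=theano.config.floatX), name="W")
    b = theano.shared(np.zeros(9, dtype=theano.config.floatX), name="b")

    p_y = T.nnet.softmax(T.dot(x, W) + b)          # predicted class probabilities
    cost = T.nnet.categorical_crossentropy(p_y, y).mean()

    # Theano gives you the gradients out of the box.
    gW, gb = T.grad(cost, [W, b])

    lr = 0.01
    train = theano.function(inputs=[x, y], outputs=cost,
                            updates=[(W, W - lr * gW), (b, b - lr * gb)])
    predict = theano.function(inputs=[x], outputs=T.argmax(p_y, axis=1))

    # The training loop itself is written by hand:
    # for epoch in range(25):
    #     train(X, Y)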
12:41
As you can see, it's very low level, so you have to implement it at each step. If you want to use TensorFlow, as you will immediately understand, the level of abstraction is exactly the same as Theano. So again, you have to specify a placeholder here.
13:04
So let me just execute it, because I just want to show you something. So you have to create the placeholders here. Again, the placeholders are going to be the variables that are going to be fed during the learning into the graph.
13:21
So as I told you, this is symbolic execution. If you print X, you just get a tensor object with no shape yet, because you haven't really passed the data, fed the data to the graph, yet. The only thing you know is that you have 93 dimensions, so 93 features. And Y will be a tensor having the total number of classes as a dimension.
13:45
That's a second dimension. TensorFlow ships with this tool, which is very useful, which is called TensorBoard. I'm going to execute it just because I want to show you. This is very nice. So this is a little bit of engineering of the TensorFlow code.
14:06
So we're going to again create all the variables here. Then TensorFlow provides a little bit more abstraction with respect to Theano. Having this softmax function already implemented, then the matmul operation is going to be the equivalent
14:23
of the dot operation in NumPy, if you come from that background. And then we're actually adding some additional information that will be used in this TensorBoard tool. So we're going to record some histograms during the learning and some references to these scalar objects we have here.
14:46
This is very intuitive because we're actually defining different scopes. So we're defining the scope of the model here, the scope of the cost function, the scope of the training process. So they're going to live in different scopes so you can actually better manage them.
15:00
And finally, the accuracy scope, which is going to be the metric scope we want to use to evaluate our model. So then, TensorFlow provides you this file writer, which is actually something that stores the results of the graph you have. So the computational graph you have in the backend.
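To make the preceding description concrete, here is a minimal sketch of that kind of scoped TensorFlow 1.x graph construction; the variable names and the log directory are illustrative assumptions, not the notebook's exact code:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 93])     # 93 features
    y = tf.placeholder(tf.float32, [None, 9])      # 9 classes

    with tf.name_scope("model"):
        W = tf.Variable(tf.zeros([93, 9]), name="W")
        b = tf.Variable(tf.zeros([9]), name="b")
        y_pred = tf.nn.softmax(tf.matmul(x, W) + b)    # matmul is the equivalent of np.dot
        tf.summary.histogram("weights", W)
        tf.summary.histogram("bias", b)

    with tf.name_scope("cost"):
        # Cross-entropy; in practice you would clip y_pred for numerical stability.
        cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_pred), axis=1))
        tf.summary.scalar("cost", cost)

    with tf.name_scope("train"):
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

    with tf.name_scope("accuracy"):
        correct = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    # The FileWriter stores the graph and summaries so TensorBoard can read them.
    writer = tf.summary.FileWriter("./logs", graph=tf.get_default_graph())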
15:21
And after that, you just run it, defining TensorFlow session. So everything living in this TensorFlow session can be interpreted and then run during the learning process. But again, what you have to do is to implement manually all the learning process by actually writing code
15:44
and providing the data for each of the tensors you want to change during the learning process. So, oops, something went wrong, of course. That's the demo effect. I don't know why.
16:01
Let me just... Sorry, let me just... I just want to clear the memory of the GPU. Well, I'm actually running on some GPUs in a remote backend.
16:21
So let me run again. Sorry about that. So I have to load the data.
16:53
Come on. Cold start.
17:02
Very cold one. So let me skip the other part. So let's go to TensorFlow. All right. Okay. Now this is going to work, hopefully.
17:24
Please. So the epochs, we have just run it for 25 epochs, I think. The number of epochs is... I can't remember. So we have 25 epochs. And this is the value of the accuracy during the different epochs.
17:41
And this is the plot we have just started. How we plot the different costs. How the cost changes during the different epochs. So this is the trend. So I just want to show you this TensorBoard tool, which is going to be something we're going to see also integrated in Keras.
18:02
That's why I'm showing it now. So I'm running... So this tensorflow.tensorboard command runs a server, typically on the port 6006. And what this does is to read the log file that has been written
18:22
during the execution, in particular the file stored in this directory. All right. So hopefully we're going to see something. So we have these colors here. So we have the plots for the accuracy automatically provided,
18:42
the plots for the cost function during the training, and all the different plots for the mean weights and the mean bias. All right. This is very easy to use and very useful when you face real problems. Another thing that may be useful is to take a look at the graph. So this is actually the model that has been implemented inside TensorFlow.
19:03
So this is how the graph of tensors has been translated by TensorFlow. So you can actually take a look at all the different nodes here. And each of these corresponds to the different scopes we specified.
19:20
And finally these histograms we built to show the different trends of the cost histogram and of the model histogram. There's something wrong maybe because I just ran it multiple times on the same folder. So that's why this looks so strange.
19:42
All right. So let me stop this. So let's see how we implement a logistic regression using Keras, finally.
20:04
As you know, the logistic regression may be implemented as a neural network, just one layer, which has the sigmoid activation function. And so it is. So you have all the data again, and we are going to use the sequential object. We're introducing this new sequential object in Keras.
20:20
The sequential object gives you the idea that you want to build a series of layers stacked one on top of the other. And so you actually instantiate this sequential object and you simply add layers to this object. So in this case we just add a dense layer having a number of units in output corresponding to the number of classes.
20:49
So we specify the input shape, so the number of features we have, and the activation is going to be the sigmoid function, as we expect because we implemented logistic regression. Finally, we add the decision function,
21:01
which is the softmax function, as we did in TensorFlow, and we in the end compile the model again. So we compile, specifying which is the optimizer. In this case we specify the stochastic gradient descent. We actually did the same using TensorFlow. Let me just go back a bit. So when we are in the train scope here,
21:24
we specify that the optimizer was the gradient descent optimizer with the minimization problem. So it's stochastic gradient descent with this learning rate. And we are doing the same here, so we are implementing the same using the categorical cross-entropy
21:40
and the stochastic gradient descent optimizer. After that, we just call this method on the Keras object, which is called fit, and we provide the training data and the training labels, and that's it. So you don't have to implement it manually.
22:01
You don't have to implement the training process manually. You just call the fit method, as you do in scikit-learn for instance, and that's it. Keras provides you automatically a series of default parameters we are going to change during the next steps of this exercise, but in the very beginning you just create the model,
22:21
you compile it, and you fit it. These are the only three steps you have to do to make our Keras model up and running. So as you can see, if you run the fit method, you're going to have a series of ten epochs by default, and Keras provides you this progress bar during the training,
22:42
telling you the time it takes, the values of the loss during the training, and how does it change. This is the very beginning. So now let's have a look at it in a more interesting way, so in more details.
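Put together, the Keras version of the logistic regression just described is roughly the following sketch; dims and nb_classes stand for the 93 features and 9 classes, and X, Y are the preprocessed data from before:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    dims, nb_classes = 93, 9

    # 1. Create the model.
    model = Sequential()
    model.add(Dense(nb_classes, input_shape=(dims,), activation="sigmoid"))
    model.add(Activation("softmax"))               # decision function

    # 2. Compile it: loss + optimizer.
    model.compile(optimizer="sgd", loss="categorical_crossentropy")

    # 3. Fit it, using Keras' default parameters for everything else.
    model.fit(X, Y)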
23:02
So the core data structure of Keras is a model, of course, and the main type of model is going to be a sequential object. The sequential object provides you this method which is called add, and so you have different layers. In this example we just had one layer, which is the dense layer, and then the activation layer.
23:23
So the dense layer, this is the dense object, has this parameter. Most of them are default values, and in case you don't know, keep the default values, Keras typically has been designed to have
23:40
the best default values you can get for the different objects, and this is particularly useful when you run it for the first time, especially for optimizers, when you have to set up the learning rate, and depending on the optimizer, different other parameters. The most important things you have to learn and to understand very soon
24:01
is that the first and the only required parameter for this layer, and this is typically for all the layers, is the first one, which is units. As you can see, it's the only one that has not default values, of course, and this is where you put your insight and your knowledge, and the units value
24:22
is going to be an integer value greater than zero, of course, and this corresponds to the number of neurons that you will get in output from the layer. Keep this in mind. It's going to be the output number of neurons, not the input, right? You have to specify the number of input dimension
24:42
only in the first layer. The next layers will get, as input, the output of the previous layers. Automatically you don't have to specify it. So the units is going to be the number of neurons you want in output. So back to the example. Since we just have one layer in this example,
25:02
the number of units we want to have in this network is going to be number of classes because it's the output we want to predict. It's just one layer, right? So it's just number of classes, so in this case it's going to be nine. All right? And as you can actually see, since this is the very first layer of a network,
25:23
we also specified an input shape. And this is mandatory for the very first layer of the network. You always have to specify what is the input shape. Speaking of which, the input and the output shapes of the layers are going to be this one. So in input shape you get an n-dimensional tensor object
25:43
with this series of shapes. Remember that the very first dimension of the input tensors is going to be the batch size. In other words, the number of samples you're going to provide to the network during the training, right?
26:00
And this parameter is specified when you start the training process, so you call the fit method. You don't have to specify this dimension in the input shape parameter, all right? So let me just go back again to this example. As you can see, the input shape in this case
26:22
is going to be DIMS, so the number of features we have. So this is 93 in this case. We don't specify the batch size. The batch size is indeed automatically specified in the fit method. We're going to use the default parameter here, but we can actually change it.
26:42
So remember, the input shape is mandatory for the very first layer and has to not include and must not include, sorry, the batch size dimension, which is going to be the very first dimension of the tensor you have. And as output, what you get in output from a tensor
27:02
is the batch size again, so the same number of samples, of course, but, as I mentioned, the units you specified as the first parameter of the layer, so the output neurons. All right? Is everything clear?
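As a tiny illustration of these shape rules, a sketch (not from the notebook):

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(units=9, input_shape=(93,)))   # batch size is NOT part of input_shape

    # The batch dimension shows up as None until data is provided via fit():
    print(model.input_shape)                       # (None, 93)
    print(model.output_shape)                      # (None, 9)  -> "units" neurons in output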
27:21
Questions? All right. So, before moving on, some very few notes about the *_initializer parameters. So, typically, in the dense layer, but in most of the layers, you have this kernel underscore initializer
27:42
and bias initializer parameters. This is just for you to know that, in literature, there are lots of methods to initialize the weights of neural networks, and this is particularly useful, and different methods for the initialization of the weights
28:02
can drastically change the performance you're going to have. So, I just proposed you to take a look, of course, at the Keras.initializers to see what are the functions supported by Keras. There's lots of them. And read this very interesting article.
28:23
This article does not include all the possible initializers supported by Keras, but this is very interesting, and I give you some recommendation paper to read. This is a very interesting topic, and, as you can see, the default initialization method for different weights,
28:41
by default, will change according to the different layers Keras provides. So, if you take a look at the default parameter for the convolutional layer, it should be different. I cannot remember for sure, but it should be different, because for convolutional layers, you have different initialization technique by default.
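For example, you could override the defaults explicitly like this; the chosen initializers are just illustrative:

    from keras.layers import Dense
    from keras import initializers

    layer = Dense(64,
                  kernel_initializer="glorot_uniform",   # the default for Dense
                  bias_initializer="zeros")

    # Or pass an initializer object with custom arguments:
    layer = Dense(64, kernel_initializer=initializers.he_normal(seed=42))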
29:03
Other core layers provided by Keras, you have flatten layer, reshape layer, permute layer. These are going to be so-called operational layers. They're not learning anything here. They're just making operations on the tensors during the flow of the network.
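A quick sketch of what those operational layers do to the tensor shapes; the shapes here are arbitrary examples:

    from keras.models import Sequential
    from keras.layers import Reshape, Permute, Flatten

    model = Sequential()
    model.add(Reshape((3, 4), input_shape=(12,)))  # (None, 12)   -> (None, 3, 4)
    model.add(Permute((2, 1)))                     # (None, 3, 4) -> (None, 4, 3)
    model.add(Flatten())                           # (None, 4, 3) -> (None, 12)
    # None of these layers learns any weights.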
29:23
With the Lambda layer, you can actually specify a custom Python function to apply to the tensor. Please keep in mind that what a Keras layer gets in input is always a tensor object, which is going to be
29:41
physically and concretely corresponding to a TensorFlow tensor object or a Theano tensor object, or, in this case, nowadays also we have a CNTK tensor object. To give you just the idea, when you run the fit and the training process
30:00
using Keras, the fit is not performing any real operation on the GPU. So the Keras code stays at the Python level, and all the real computation on the GPU and stuff is delegated completely to the backend. The main advantage you get from this approach
30:21
is that the same code you write, exactly the same code, is supposed and guaranteed to work on all the backends supported by Keras, without changing any single line of code. Keras does all the job for you, even if there are some very slight
30:41
and tiny differences between, for instance, the handling of the shapes of tensors between Theano and TensorFlow. Keras does all the job for you. You don't have to change anything. That's the power of Keras indeed. Another important layer is the activation regularization,
31:02
so you can actually apply regularizers, for instance L1 or L2 regularizations, on the activations of your network. This picture, kindly provided by my friend Ian,
31:23
shows all the, probably it's a bit tiny, but should give you another view of all the kinds of networks supported by Keras. Keras provides support for feed-forward network, so the classical network, you provide input on the left
31:42
and you get output on the right, and in this set of feed-forward networks also the convolutional networks fit in. You have recurrent neural networks, of course, with support for LSTM or gated recurrent unit networks. Also, for unsupervised learning, you can actually build autoencoders,
32:02
variational autoencoders, and sequence-to-sequence networks using Keras code. Finally, Keras also provides support for different optimizers, not just the stochastic gradient descent, of course. There are many of them. You can have AdaGrad, AdaDelta, RMSprop,
32:22
lots of different optimizers. You can actually take a look at the keras.optimizers package to see all of the optimizers supported by Keras. If you want to data-science this example a bit, we're going to play a bit with the hyperparameters of the network
32:41
and the parameters of the fit method. I assume you all know what overfitting or underfitting is, so that's something we can deal with. The one method I'm going to show you, because we're going to use it very heavily, a lot,
33:01
is this summary method for the model object. When you call the summary, you actually have a look on what are the layers, what are the output shape, how many parameters you have, and in the end, how many total parameters you have, and how many of them are trainable parameters. If this does not make any sense to you,
33:20
it will make sense very soon. So far, the number of parameters and the total number of trainable parameters are going to be the same. All right. To data-science this example a bit, we're going to, of course, use some sort of train-test split on the data. We have not only training data, but also validation data
33:41
we want to feed into the training process. We're going to apply a technique which is called early stopping. The early stopping technique is something that is typically used when you have big networks and lots of data. The intuition is that in this case,
34:01
we're going to monitor this metric which is called the validation loss. When the validation loss does not change after two epochs and does not change significantly, so you can actually specify the precision you want, the training process is automatically stopped. This is the early stopping.
34:21
This appears differently from other machine learning methods. Unless you specify this kind of tricks when you run a fit of a neural network, you have to specify for how many steps you want the learning to go on. So the number of epochs, so-called. These other objects are the model checkpoints.
34:44
During each epoch, at each step, automatically, this function, this object, records the value of the validation loss again and saves the weights, so the status of the network at that stage,
35:04
and automatically this has been set up to save only the best model. So in the end you get the best weight configuration, leading to the best validation loss value. You can plug both of them using this parameter
35:21
in the fit method here, which is the callbacks. Keras provides this way to plug additional callback objects. They're called callbacks because they're automatically called by the framework before the next batch or before the next epoch is going to be processed during the training. You don't have to do anything. You just pass the callbacks.
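A sketch of how both callbacks are plugged into the fit call described here; the file name, patience and batch size are illustrative choices, and X_train/Y_train and X_val/Y_val are assumed to come from the train-test split mentioned above:

    from keras.callbacks import EarlyStopping, ModelCheckpoint

    early_stop = EarlyStopping(monitor="val_loss", patience=2, verbose=1)
    checkpoint = ModelCheckpoint("best_weights.hdf5", monitor="val_loss",
                                 save_best_only=True, verbose=1)

    model.fit(X_train, Y_train,
              validation_data=(X_val, Y_val),
              epochs=50, batch_size=128, verbose=1,
              callbacks=[early_stop, checkpoint])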
35:43
We're going to specify a different number of epochs. We're going to specify what are the validation data in this case. We previously provided just training data and these are the two mandatory parameters, the very first two. Validation data again and we specify the size of the batch
36:01
and we also specify that we want some verbosity but this is the default one. So we run it for 50 epochs and this goes on and on. Additionally, we also have validation loss. Since we have validation data in this case,
36:20
we're going to also have the computation of the validation loss. So during each step, we have a log of how the loss changes, the training loss changes, and the validation loss changes. Okay, moving on. Very briefly, moving on just a bit, so we want to create a multilayer fully connected network.
36:43
When we say fully connected network, we're going to expect to have all dense layers. The very first example of a multilayer fully connected network is the multilayer perceptron. The multilayer perceptron is characterized by having one input layer, one hidden layer,
37:01
and one output layer. So technically, when you have this sort of network, so with just one hidden layer in the middle, you're calling it multilayer perceptron. When you have at least more than one hidden layer, you're allowed to call it deep network.
37:21
So when you have at least two layers in the middle, it's going to be a deep network. So in this case, we're going to have a multilayer perceptron because we have one input layer and this is just the decision layer, so it's going to be a very shallow network yet. As you may assume, the way you plug additional layers
37:44
so you build a multilayer network in Keras is just by calling the add method to the sequential object multiple times with different layers. So again, make sure that you specify the input shape in the very first layer and the second layer doesn't have to
38:01
because it's automatically induced by the previous layer. Alright, so again, we compile the model, we print the summary, so we have dense number two, dense number three here, and the final activation, which is the softmax for the decision function. Of course, since we have 100 neurons here in output
38:23
and nine in the end, the total number of parameters has increased, as you may expect. Then we run the fit model again and the validation loss changes drastically considering the same data.
38:42
Yes, so this is 1.6 taken just randomly and this is 0.7, so this is going to be a better model with respect to the previous one with our data. If you want to play a bit, there's an exercise for you. The idea is to play a bit with it, adding as much layers as you want,
39:06
better to add a couple of, and run it in your machine to be sure that everything's up and running and to see that it's very easy to run using Keras.
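As a hint for the exercise, adding hidden layers is just a matter of more add calls; the layer sizes and activations below are arbitrary choices, not the notebook's:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    model = Sequential()
    model.add(Dense(100, input_shape=(dims,), activation="relu"))  # hidden layer 1
    model.add(Dense(100, activation="relu"))                       # hidden layer 2, input shape inferred
    model.add(Dense(nb_classes))
    model.add(Activation("softmax"))
    model.compile(optimizer="sgd", loss="categorical_crossentropy")
    model.summary()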
39:24
The next steps will be taking a brief look at what is called the Keras backend. I just want to show you the Keras backend because this is how actually Keras allows you to have multiple backends
39:48
running for your deep learning code. To give you some hints,
40:03
the way to play with it is just to add additional dense layers. How many layers you want to add, nobody knows. You just play with it and you see if that makes any difference to the final results.
40:20
That's the takeaway typically for deep learning. At the end of the day, you don't really know what you're doing. Let's have a look at this Keras backend. The Keras backend is the module which is provided by Keras
40:44
that integrates the real backends. If you take a look at this backend package in Keras, it's going to be the keras.backend package.
41:02
If you take a look at the code, you will see that in that Python package you will find modules called tensorflow_backend.py, theano_backend.py and cntk_backend.py.
41:21
This backend is a sort of wrapper module that calls the real backend depending on the configuration file or the environment variable you have set up in your settings before running your code. We have actually set up our TensorFlow backend in the configuration file,
41:45
as you may have assumed, so I just want to show you. If I cat the keras.json file, among these configurations we have the backend, and we are specifying TensorFlow.
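For reference, the contents of ~/.keras/keras.json look roughly like this; the exact keys depend on the Keras version, so treat this as an illustrative sketch:

    {
        "floatx": "float32",
        "epsilon": 1e-07,
        "backend": "tensorflow",
        "image_data_format": "channels_last"
    }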
42:01
If we just switch this directive to Theano, you're done. You just rerun your code and Keras automatically switches everything to the Theano backend. How does this backend work?
42:21
As I said, keras.backend is the wrapper for the other modules. Basically, the Keras backend provides the interface, and all the functions provided in this Keras backend module are assumed to have a counterpart in each of the supported backends. So when you're actually calling the...
42:43
Okay, let's have a look at it, so maybe it's easier to do. We're actually using the Keras backend API to reimplement again the logistic regression example, but in this case we're still using Keras code.
43:01
Let's have a look. Here we're creating placeholder objects using the K module, which is the Keras backend. We just call it K. It's just a sort of naming convention here. The K is going to be the Keras backend module and we're going to create a placeholder object. The syntax is very TensorFlow-ish.
43:22
It's very similar to TensorFlow. But what actually happens in the backend is not TensorFlow always. It really depends on the backend you're using. If I'm using TensorFlow as the backend, this instruction is going to create a TensorFlow placeholder object.
43:40
If I use Theano, that is going to create a Theano tensor object. This is the backend API. We're creating a variable object again. The level of abstraction has changed drastically: it has moved back to the same level of abstraction you have when using the TensorFlow or Theano packages directly.
44:06
But with the big difference that this code is supposed to run seamlessly on all the backends you want, those supported of course, without changing a single line of code. Just the configuration file.
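A minimal sketch of what this backend-level code looks like; the shapes and values here are illustrative, not the exact notebook code:

    import numpy as np
    from keras import backend as K

    x = K.placeholder(shape=(None, 10))    # a tf.placeholder on TensorFlow, a symbolic tensor on Theano
    W = K.variable(np.random.rand(10, 1))  # a backend-specific variable
    b = K.variable(np.zeros((1,)))
    y = K.dot(x, W) + b                    # K.dot maps to matmul on TensorFlow, dot on Theano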
44:21
This is going to create a TensorFlow tensor if you're using TensorFlow. I repeat the question. The question was, why would you actually have support for multiple backends?
44:43
For many reasons. The different backends have different performance, and you want to leverage the benefits of each by switching backends. Some time ago, the memory management of TensorFlow was really rubbish.
45:05
So you typically wanted to switch to Theano to have better memory allocation. Now that's not the case anymore. Also because sometimes you want to try different backends to test whether everything works with the same performance.
45:26
Because of numerical reasons, because of numerical optimization. For instance, I've just read a sort of benchmark article, providing a sort of overview of the benefits provided by CNTK.
45:46
CNTK, among those backends, is the only one using MPI in the backend, the Message Passing Interface, for multi-process execution. As far as I know, CNTK is built on top of MPI.
46:04
As of now, TensorFlow supports MPI in version 1.3, the latest one. And considering the different network architectures, at the end of the day, CNTK turned out to be the most efficient for recurrent neural networks,
46:21
which are very heavy networks to train, the ones that take the most time. For convolutional networks, TensorFlow and CNTK have roughly the same performance according to that benchmark, but CNTK won on recurrent neural networks, for instance.
46:42
So maybe you want to switch to CNTK without changing any single line of code. That's a benefit. And you don't have to mess with the very tiny details of the different implementations, because Keras does it for you. All the three backends have totally different APIs and totally different syntax for creating tensors.
47:05
Keras does the job of translating all of them for you, and the way it does it is with this Keras backend. So, that's it, this is what I was looking for. When you actually run this function, K.dot, it calls matmul on TensorFlow and the dot function on Theano.
47:25
I don't know the equivalent in CNTK, but TensorFlow decided to call the dot product matmul, matrix multiplication, for some reason. When you call k.dot,
47:43
it's going to be mapped to the corresponding function in the corresponding backend. Automatically. That's it. The question could be, why would you do that? Why would you implement it using the Keras backend?
48:01
Why does Keras provide you with this API at all? Because this is the same API you use when you're creating custom layers. If you create custom layers using the Keras backend API, you're automatically sure that the implementation is going to work on all the supported backends,
48:23
because you're using the Keras backend functions. You're not using the TensorFlow or Theano real functions. You see what I mean? The rest of this notebook is to show you how you basically do the same TensorFlow-ish stuff using the k backend,
48:46
and there's also an exercise for you, which I'm going to skip, which is the implementation of the linear regression. I'm going to skip also because you indeed have the solutions in your material, so if you load it, you're done.
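To connect this to the custom-layer point from a moment ago, here is a hedged sketch of a custom layer written only in terms of backend functions; the class name and the scaling factor are made up for illustration, but the structure follows the standard Keras custom-layer pattern:

    from keras import backend as K
    from keras.engine.topology import Layer

    class ScaledDense(Layer):
        """A toy dense layer that halves its output, using only backend calls."""
        def __init__(self, units, **kwargs):
            self.units = units
            super(ScaledDense, self).__init__(**kwargs)

        def build(self, input_shape):
            self.kernel = self.add_weight(name='kernel',
                                          shape=(input_shape[1], self.units),
                                          initializer='uniform',
                                          trainable=True)
            super(ScaledDense, self).build(input_shape)

        def call(self, inputs):
            return K.dot(inputs, self.kernel) * 0.5   # backend-agnostic matrix multiply

        def compute_output_shape(self, input_shape):
            return (input_shape[0], self.units)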
49:04
In case you don't have any questions, I would just move forward. Yes, please.
49:20
Sorry, I'll repeat the question. I'm advertising all the benefits of Keras, and so the question was: what's the downside of it? I would say that the downsides of Keras rest on the downsides of the supported frameworks, the supported backends.
49:42
I would say two things. First, not all the functions are supported by the Keras backend. Since it has to guarantee that the same code works seamlessly on all the backends, if there are very specific functions supported by only one specific backend,
50:05
you're not going to find them in the Keras backend API. But one note about this: if the framework that has this specific function happens to be TensorFlow,
50:22
you can actually manage to use it, because Keras is being integrated into TensorFlow, and the integration gets better with each version. I will show you something at the end.
50:42
That's one limitation, and another downside I would say is that, since Keras is not doing any real computation itself, you can only do with Keras what you can do with the underlying frameworks.
51:01
For instance, so far in TensorFlow, which is the one I know the best among the three, there's no official support for dynamic graph computation, so everything is going to be static. This means that when you create the graph, you have to compile the model indeed.
51:24
So after you've created the network, you compile it: you create the network beforehand, you compile it, and the resulting graph, all the tensors, will be put into the memory of the GPU, typically, if you have one.
51:43
And that graph is very difficult to change during training; you cannot do that. The PyTorch approach, which I believe we will see in the keynote in two days, is totally different; it's called dynamic graph creation,
52:00
so you create the graph one step at a time during execution, which is a different approach. Keras does not do that, because PyTorch is not supported yet. I say yet, but I don't know if it will be supported at some point. Yes, the default backend? So the default backend was the question,
52:26
and the answer is theano is the default one. You can actually go to the configuration file or, as I showed you, you just, for instance, let me do this,
52:43
I can show you in a terminal, maybe that's better than risking messing up the notebooks. Indeed, you have two ways.
53:00
Alright, so if I cat the keras.json file, you will see that the backend is tensorflow, because I set it.
53:23
If I do change this directive, if I specify theano here, or even CNTK,
53:41
now the backend is theano. If I actually go in Python and import keras, this is using theano backend. You see?
54:02
If I do this just to play a bit, CNTK should be installed. This gives me the opportunity to say a few words about that,
54:21
using CNTK backend. I did not include examples using CNTK, and mainly for one reason, because CNTK is still only supported on Windows machines and Linux machines. There's no support for macOS. You can actually have CNTK installed using a Docker image,
54:44
so I preferred not to include that in the setup of this environment. It was not really required; the main focus was on Keras, of course, but this was the main reason. Why that is, I don't really know. It's Microsoft stuff.
55:03
In the end, it's always Microsoft. Let me switch back to tensorflow. Another way you can do it is by setting the KERAS_BACKEND environment variable; let me just move a bit so you can see it. You can actually export,
55:24
no, let me do this. Yes, I can do it: the KERAS_BACKEND environment variable, set equal to theano. Okay, let me show this first. If I do this,
55:42
if I import Keras, now it should be tensorflow again. All right. It takes some time, typically. Okay.
56:01
If I do, oh, I can do this: KERAS_BACKEND=theano python -m... oh, sorry.
56:25
What's the way to do it? Minus c, thank you so much: KERAS_BACKEND=theano python -c "import keras", and it says using Theano backend. So you can define this KERAS_BACKEND environment variable and switch the backend without changing the configuration file.
56:42
This is useful because the configuration file is just one file, so if you want to have multiple Keras instances running, it's not a good idea to keep changing the configuration file; you just define the KERAS_BACKEND environment variable and you're done.
57:01
All right. Okay. Are there any other questions? Fine. So let's move on and let's talk about this MNIST dataset. I believe all of you already know what the MNIST dataset is about.
57:25
In the scikit-learn world, this is also known as the digits dataset. The MNIST dataset is one of the most widely used datasets for deep learning experiments, and at some point people decided that
57:43
it was fair enough to not use it anymore. The MNIST database contains images, black and white images of handwritten digits, and in our case you want the network to be able to identify the corresponding digit,
58:04
by taking a look at the image. All right. So this notebook is just to show you that Keras, just like scikit-learn, ships with a datasets
58:22
module. Among these datasets, the MNIST database is provided, so you can import the mnist module from keras.datasets, and the load_data method downloads the data for you in case you don't have it,
58:40
and the downloaded data will be placed in your .keras folder, so you can get the data automatically by loading it. The rest of the notebook is just for you to play a bit. I was asking what the type of X_train is, and the answer is, of course, a numpy array.
59:07
So once we have loaded, the type of X-Train is going to be numpy ndarray. All right.
59:21
The type of y_train is going to be ndarray again. So load_data automatically gives you back the data as numpy arrays, which is something you typically want when you experiment.
59:41
How many observations do we have in the training set? Again, it's going to be X_train.shape[0], of course, the number of samples. We're going to have 60,000 samples in the training set, and 10,000, I will tell you, in the test set. All right.
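In code, the loading step is roughly:

    from keras.datasets import mnist

    (X_train, y_train), (X_test, y_test) = mnist.load_data()  # downloads into ~/.keras if needed
    print(type(X_train))     # <class 'numpy.ndarray'>
    print(X_train.shape)     # (60000, 28, 28)
    print(X_test.shape)      # (10000, 28, 28)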
01:00:02
And what's the number of observations for each digit? That comes from X_train again, but I will tell you it's 60,000 in total, of course, blah blah. As for the dimensions of X_train, if you take a look at X_train.shape, we're
01:00:23
going to have 60,000 images of 28 times 28 pixels, so images of that size. Since the MNIST images are black and white, you don't have the channel information, so each image is just a two-dimensional object, not a
01:00:43
three-dimensional one. So very quickly, if you want to take a look at these images, you can actually print them. Alright, so I'm going to print X_train[0] here, and this is going to be a sort of handwritten 5. OK, so we're going to use this dataset to play
01:01:04
with in the remaining two notebooks. OK, the first notebook is again about fully connected feed-forward networks; the more complicated feed-forward networks
01:01:24
that we're going to take a look at are the convolutional networks. As a recap of the main things we already did: the sequential object and
01:01:40
the graph model objects; we talked about the dense layer, aka the fully connected layer; in this notebook we're going to introduce a new layer, which is called the dropout layer; and we applied a binary cross-entropy or a categorical cross-entropy loss to our data using the stochastic gradient descent
01:02:01
optimizer. Here you have the links to have a look at what Keras supports as optimizers and loss functions. We now introduce this new activation function, which is called the ReLU function, so the rectified linear unit. This function is going to be very useful in a convolution network. This function has
01:02:24
lots of nice mathematical properties, and this is the mathematical definition of the function: the activation returns the maximum between 0 and x, so all the negative activations are automatically dropped. This function is also closely related to the
01:02:43
sigmoid logistic function: its smooth approximation is called the softplus function, and the derivative of the softplus is exactly the sigmoid. And this function, despite its simplicity, is one of the coolest
01:03:01
things invented in the deep network world many years ago, so keep this function in mind, as it's heavily used in convolutional neural networks. The exercise now asks you to build this network. This sort of syntax is quite standard when you have to specify the architecture of a network,
01:03:23
so in this case we're going to have a fully connected layer with 512 neurons in output plus the ReLU activation, then again the same configuration, and finally another fully connected layer, FC, with as many neurons as there are classes, plus the softmax activation function. So if
01:03:44
you don't want to bother too much about it, you just load it and that's it. So you create the sequential object and you just plug in all these layers using the add method again; it's very simple and straightforward, as you can see. So you have Dense 512 with activation ReLU, and the input shape,
01:04:03
because it's the first layer, is 784. Can you guess why this number is the input shape? Yeah, it's 28 times 28, so we're going to flatten the images and provide each one as just one big vector. This is because
01:04:29
dense networks are not supposed to handle multi-dimensional arrays, but you can actually do it. Again, dense layer as requested with the
01:04:42
ReLU activation, and the last layer with 10 neurons, the number of classes, with the softmax activation. We specify the categorical cross-entropy again and the SGD optimizer. Please note that this time, differently from previous examples, when we provided the optimizer parameter, we provided an actual Python
01:05:05
object, not a string. To be honest, the behaviour is exactly the same, because when we provide the SGD object here, without any additional parameter, we're actually instantiating a new object with the default parameter values, which is
01:05:21
exactly the same we get when we provide the string SGD. But this was just to show you that you can plug not only strings, but real objects. If you're curious enough to understand how these string parameters are indeed implemented in Keras, take a look at the code and I
01:05:46
will tell you it's a matter of globals in the namespace. You will find in the losses module an object which is called exactly
01:06:03
categorical_crossentropy. When you provide parameters using strings, Keras looks for that string in the global namespace and returns the corresponding object. That's the trick. This is the case for relu, this is the
01:06:21
case for softmax, and so on. So you can indeed provide a real Python object, a real Python function, or a string if it comes in handy. Moreover, we're also adding an additional parameter to the compile method, which is the metrics. In this case we're going to record the accuracy during the
01:06:43
learning phase. When you run it, Keras automatically stores the accuracy.
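A hedged sketch of the architecture just described (784 to 512 to 512 to 10), passing a real SGD object and the metrics parameter:

    from keras.models import Sequential
    from keras.layers import Dense
    from keras.optimizers import SGD

    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(784,)))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation='softmax'))

    # SGD() with no arguments behaves the same as passing the string 'sgd'
    model.compile(optimizer=SGD(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])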
01:07:05
So we create the model here. Let me just run it. So we create the model here, we get the data, the MNIST data. Of course the shape is the one we had
01:07:22
previously, so we have to do a little bit of reshaping, because the input shape of our network is 784, so we're going to reshape the numpy arrays. These are pure numpy operations. And finally we're going to apply the to_categorical function provided by Keras utils to
01:07:42
translate the labels in one-hot encoding. And this is because, maybe you didn't get it, but we want to have the one-hot encoding for the labels because the last layer has 10 neurons in output. So since the last
01:08:01
layer is the one that will be used to calculate the error function on the labels you have, the labels must have 10 dimensions. That's why you want to apply one-hot encoding. You see what I mean?
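The preprocessing steps described above look roughly like this; it is a sketch using pure numpy plus the Keras utility, and the variable names assume the load_data call from before:

    from keras.utils import np_utils

    X_train = X_train.reshape(60000, 784).astype('float32')
    X_test = X_test.reshape(10000, 784).astype('float32')

    Y_train = np_utils.to_categorical(y_train, 10)   # labels become 10-dimensional one-hot vectors
    Y_test = np_utils.to_categorical(y_test, 10)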
01:08:24
So we did it. We just did the train and test split. So we have this one, this is just one image. Of course it looks different because it's been one-hot encoded. So for this X_train sample, the label y_train[0] is going to be 9.
01:08:49
So all 0 apart from the last one. We take another one from the validation set. This is 1 and this is the label, of course this one, as we expected. Okay, now we train the model here. We just run it for two epochs.
01:09:05
We do not expect very big results, just to get it running. And this is how it actually works. Okay, so we have the loss, and take a look here, we also have the accuracy for the training
01:09:23
data and for validation data because we recorded that metric in the compile method. So we have loss accuracy, validation loss and validation accuracy. Something which is new now, for you maybe, is that we recorded the output of
01:09:44
the fit method and called it network_history. That's because, recalling the callbacks of Keras, by default the model.fit method returns a History callback object. This callback object embeds all the
01:10:04
history of the training process for you to take a look after that. So, if we want to plot the network performance trend, we have this network history object and we're going to, as you can see, the network
01:10:20
history object has this attribute, which is called history, and it is a dictionary whose keys correspond to the quantities reported during the learning phase. So we're going to have a loss and a val_loss, and we're going to have an accuracy, acc, and a validation accuracy, val_acc, here.
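A sketch of how the History object ends up being used; parameter names such as epochs versus nb_epoch depend on the Keras version, so take this as illustrative:

    import matplotlib.pyplot as plt

    network_history = model.fit(X_train, Y_train, batch_size=128, epochs=2,
                                verbose=1, validation_data=(X_test, Y_test))

    plt.plot(network_history.history['loss'], label='training loss')
    plt.plot(network_history.history['val_loss'], label='validation loss')
    plt.legend()
    plt.show()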
01:10:43
And so our intention here is to plot the training and validation loss and the training and validation accuracy, and compare them. Of course, these are going to be just two epochs, as you can see,
01:11:00
so 0 and 1, 0 and 1 again, so this is not very meaningful. It's just to show you how you can do it. After two epochs we get more or less 88% validation accuracy, but if you increase the number of epochs you will definitely get better results. So in this case, we've added a new
01:11:27
parameter for the stochastic gradient descent and we fit again the model here. Let me skip it, it's not very important. Okay, let me introduce another important layer, which is the dropout layer. The dropout layer is
01:11:47
a very simple one. The idea of the dropout layer is to be used during the training phase to avoid overfitting. The idea is that when you put the dropout layer in the middle of your network, you specify a drop rate.
01:12:05
With that drop rate, some connections from the neurons of the previous layer to the next layer will be dropped randomly. That's the idea of the dropout layer, no more than that. Something you should keep in mind is that the Keras API
01:12:27
expects you to provide the dropping rate. Some frameworks expect you to provide the retain probability instead, not the dropping probability. Keras wants the dropping rate, so, for instance, if you specify a dropout layer
01:12:46
of 0.2, this means that you have 20% of probability to drop the connection. Dropping the connection means that you basically put a zero for that neuron from the previous layer to the next one. The intuition is that when you do it randomly,
01:13:06
only during the learning phase, you're doing it because you want some variability in your data. So you're randomly dropping, muting, some connections between neurons to introduce variability
01:13:27
and avoid overfitting. Just to show you this, if you import the Dropout layer from the Keras layers and take a look at the code of Dropout,
01:13:42
as I told you, the Keras code is Python code, so you can actually take a look at it. Let me see if I can make it bigger. No, this is not. Where is it? I made a mess, as expected. Sorry.
01:14:03
How is it? Is this one? No? Too many things opened. That's why I do too much stuff. All right, I made a mess.
01:14:20
I'm sorry about that. I had good intentions, I promise. Let me go back to it. Sorry. That's because. If you take a look at it, this is the implementation of the dropout class in Keras code.
01:14:47
When you look at the call method, this is the interesting part: it returns the result of this K.in_train_phase function.
01:15:01
If you take a look at K.in_train_phase, the meaning of this function is to select x if the network is actually in the training phase, and to select the alternative otherwise, so when it's in the inference phase.
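Roughly, what the Dropout call method does internally boils down to something like this simplified sketch:

    from keras import backend as K

    def dropout_call_sketch(inputs, rate):
        def dropped_inputs():
            return K.dropout(inputs, rate)          # zero out a fraction `rate` of the inputs
        # pick the dropped inputs in the training phase, the untouched inputs at inference time
        return K.in_train_phase(dropped_inputs, inputs)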
01:15:21
Again, back to the dropout, this means that if the network is during the training phase, this layer returns the dropped inputs, otherwise it returns the inputs as it is. You don't have to worry about the fact that the dropout is only applied during the learning phase,
01:15:47
because it is applied, of course, only during the learning phase, since it's supposed to avoid overfitting. These are some internals I wanted to show you. Apart from this, the exercise just asks you to change the code a bit.
01:16:03
You actually have it down there: change the previous configuration to use the dropout layer and, again, see the difference in performance. These are the differences, so this starts to make more sense on the data you have,
01:16:23
especially given that the accuracy now reaches 90% even with only three epochs. Of course, if you continue training, at some point you will start overfitting, so we can try plugging some early stopping into the training to avoid it.
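A hedged sketch of what the exercise's network looks like with dropout plugged in; the sizes follow the earlier model and the 0.2 rate is just an example:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(784,)))
    model.add(Dropout(0.2))   # in Keras the argument is the drop probability, not the retain probability
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])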
01:16:46
Moving on to something more interesting, something you can actually do with the Keras layers, sequential objects, I don't know, how much time do we have? Half an hour?
01:17:01
Five minutes? Kidding, right? It's not five minutes, it's six minutes. Sorry, I usually have lots of stuff to say, I'm sorry. Anyway, let me just say this.
01:17:22
Something you can do with the Keras model object is to inspect it, and it's very handy to see, especially when you have a big network, which layers are in it.
01:17:42
You can actually iterate using the layers attribute inside the model, so you can take a look at it and take a look at the different parameters of the layers. As you can see, there's a lot of them, even with this quite shallow network. Something very interesting is to extract the hidden layer representation of the given network.
01:18:04
This means that sometimes it is very useful to see how the network is actually interpreting and seeing the data inside the network, so it's just a way to open the black box of the deep learning.
01:18:20
There are many ways to do it. One simple and most intuitive way is to create a network, train the network, create another similar network, and that network is going to be exactly the same network
01:18:44
up to the layer you want to take a look at. Imagine that we want to see how the data changes after the first dropout. Remember, we have a dense, a dropout, a dense, a dropout, and another dense.
01:19:03
We have the model already pre-trained. After the training, we have a set of weights. We want to see what's the internal representation of this layer. One straightforward way to do it is to create a truncated version of this network,
01:19:22
initialize the weights for these layers by setting each layer's weights to the same weights of the target model. We compile the truncated model and then we just run it in inference mode.
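A hedged sketch of this truncated-copy approach; it reuses model, X_train and the layer sizes from the sketches above, and copies the trained weights layer by layer:

    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    truncated = Sequential()
    truncated.add(Dense(512, activation='relu', input_shape=(784,)))
    truncated.add(Dropout(0.2))

    # copy the already-trained weights from the full model into the truncated copy
    for src, dst in zip(model.layers[:2], truncated.layers):
        dst.set_weights(src.get_weights())

    truncated.compile(optimizer='sgd', loss='categorical_crossentropy')
    hidden_repr = truncated.predict(X_train[:1000])   # predict, not fit: inference mode only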
01:19:42
We call it predict, not fit in this case. This is one way to do it. This is, of course, the Python way to do it, so you're actually dealing with Python objects here. Another less intuitive but definitely more effective way to do it
01:20:02
is to leverage the underlying graph of tensors you have, letting TensorFlow, or whichever backend you're using, do the actual job. You can manage to implement this function yourself; there's an implementation at the very end of the notebook.
01:20:25
You implement this function, which is called get_activations. You pass it the model, the target layer, and the actual data you want to take a look at.
01:20:40
We're actually using K.function here. We are creating a function that has the input tensor of the first layer as its input and the output tensor of the target layer as its output.
01:21:01
This K.function deals with tensors: we're passing the tensor of the input layer as input to this function, and the output of this function will be the output tensor of the target layer. We call this function on the data and that's it.
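A hedged sketch of such a helper; the name get_activations and the indexing are illustrative, and models containing dropout may additionally need K.learning_phase() passed as an input:

    from keras import backend as K

    def get_activations(model, layer_index, data):
        # backend function from the model's input tensor to the target layer's output tensor
        activation_fn = K.function([model.layers[0].input],
                                   [model.layers[layer_index].output])
        return activation_fn([data])[0]

    # e.g. hidden = get_activations(model, 1, X_train[:1000])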
01:21:25
Since these tensors are automatically connected inside the network, when you call the activation function, you pass the data through these tensors, all these tensors are connected, so they're actually calculating
01:21:42
the activations of each tensor up to the target layer we want. This is less intuitive, I know, but this is definitely more effective because we're actually dealing with TensorFlow or Theano in the very end. Why would you do that? Because you want to take a look at
01:22:03
how the network sees the data during or after the learning phase. Typically, you end up doing something like this. Imagine that we have already processed it. You don't have to imagine because it is what you're seeing. We already processed the MNIST data.
01:22:27
We want to get that internal representation of features, so 512 features from the 784 we had initially, and we wanted to plot this data in two dimensions, so we applied some manifold learning on it.
01:22:43
We use t-SNE in this piece of code, the TSNE from scikit-learn, and we're transforming our data using these hidden features. In particular, we're just taking the first 1,000 samples, not all of them.
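The manifold-learning step is roughly the following; it relies on the hypothetical get_activations helper sketched above:

    from sklearn.manifold import TSNE

    hidden_features = get_activations(model, 1, X_train[:1000])      # 1000 x 512 hidden representation
    embedded = TSNE(n_components=2).fit_transform(hidden_features)   # 1000 x 2 points to plot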
01:23:03
After that, we want to take a look at how these data are scrambled in the space and with each color corresponding to the different classes we have. Maybe we can see it interactively using the Bokeh library.
01:23:25
Let me reduce a bit the dimensions so we can see the labels. No, maybe not. This is quite expected in some way because, for instance,
01:23:41
the zeros are all here. Here you have, I think, all the sevens, but among them you still have some nines, which could make sense: we're dealing with handwritten digits, so some nines may be written like sevens. There's a lot of mess here, and these are the most difficult classes for the network.
01:24:05
The black ones are the six, and the six may be messed with the five. Not really. Maybe with this one, with the four. Well, this is maybe strange.
01:24:20
There's some mess here, so you have the threes with some twos. This is just to take a look at what's happening inside the network, and it's quite handy and useful to do. It's typically something you want to do when you want to look inside at what's happening during the learning, or what happened
01:24:41
after the training phase. Unfortunately, there's lots of stuff I want to show you, but we don't have the time to do that. If you have time and you want to, take a look at the materials. Everything is on GitHub.
01:25:02
Please give me feedback if you have any, or a PR if you have one. If you spot some errors, and of course there will be some, I will definitely be more than happy to fix them. I'm going to conclude, in case you have questions.
01:25:22
Yes, please. The question was how the last method to visualize the hidden representation, how that method works with non-sequential networks.
01:25:43
What do you mean by non-sequential networks?
01:26:02
Basically, you cannot have not connected graphs in Keras. Otherwise, you end up with errors. That's particularly interesting because if you take a look at the implementation and how Keras detects unconnected graphs, it's a bottom-up approach.
01:26:24
It starts from the last layer and goes up to the input layer and it traverses the graphs. All the tensors it finds during the path are all connected to the same graphs. At the end of the day, you always have connected tensors.
01:26:41
You cannot have unconnected ones. And so, the sequential object. The sequential object is just a Python wrapper object. At the end of the day, you always have tensor objects connected one after the other. You can indeed have tensors connected to multiple other tensors.
01:27:05
You can have that. This is the content of notebook number 10, the multi-modal networks. Sorry, it's very tiny. With multi-modal networks, you can actually have multiple-input networks,
01:27:22
so networks accepting more than one input tensor, and multiple-output networks, so networks with a multi-objective problem, with multiple losses to optimize during training. But in the end, you always have a connected graph,
01:27:40
so you can always reach the target tensor you're interested in from one other tensor. There's no way some tensor is not connected to any other, I think.
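A hedged sketch of such a multiple-input, multiple-output network with the Keras functional API; the shapes and names are purely illustrative:

    from keras.layers import Input, Dense, concatenate
    from keras.models import Model

    in_a = Input(shape=(16,))
    in_b = Input(shape=(8,))
    merged = concatenate([in_a, in_b])

    out_main = Dense(1, activation='sigmoid', name='main')(merged)
    out_aux = Dense(1, activation='sigmoid', name='aux')(merged)

    model = Model(inputs=[in_a, in_b], outputs=[out_main, out_aux])
    model.compile(optimizer='sgd', loss='binary_crossentropy')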
01:28:03
Yes, but you can... You said network topologies may be much more complicated than the one we worked on, but there should always be a way to reach tensor inside the graph. Otherwise, it's not connected.
01:28:23
You can always do that. Yes, please. The question was about recurrent neural networks and how you structure the data. The difference with recurrent neural networks
01:28:40
is that you provide data in sequence, and so that layer expects you to provide a sequence of tensors in input, and you specify the length of the sequence. You may specify the length of the input sequence and the length of the output sequence. What's the number of sequences you want to take into account in input, and what's the dimension of the output prediction?
01:29:02
How many objects you want to... How many tensors you want to output? That's the difference. Yes, please. Sorry, sorry.
01:29:23
Yeah? Yeah. So the question was, in my MNIST example, the validation accuracy was better than the training accuracy; shouldn't it be the other way around? The reason is that we actually trained that network for only two or three epochs,
01:29:43
so it was a very small number of epochs. The numbers are not meaningful; it's just to show what you can do in Keras, it's not real performance. But thanks for pointing that out. It's totally meaningless, don't rely on it.
01:30:03
It's just playground. Any other question? OK, so again, thank you very, very much for being here. Please let me know if you have questions or inquiries about the materials. Enjoy with it.