
Painting with GANs: Challenges and Technicalities of Neural Style Transfer


Formal Metadata

Title
Painting with GANs: Challenges and Technicalities of Neural Style Transfer
Subtitle
Building Artistic Artefacts using Generative Networks
Number of Parts
130
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Abstract
A lot of advancements are happening in the field of Deep Learning, and Generative Adversarial Networks (GANs) are one of them. We have seen GANs being applied to photo editing and in-painting, generating new image datasets and realistic photographs, increasing the resolution of images (Super Resolution), and many more things. Some people have also exploited GANs for generating fake content. All the above-mentioned examples are results of a technique where the focus is to generate uncommon yet original samples from scratch. However, these examples have few commercial applications, and GANs are capable of doing much more. The focus of this talk is a technique called "Neural Style Transfer (NST)", which has numerous commercial applications in the gaming world, the fashion/design industry, mobile applications, and many more fields. The challenges and technicalities of NST will be covered in great detail. We will teach machines how to paint images and utilize Style Transfer networks to generate artistic artefacts. The flow of the talk will be as follows:
~ Self Introduction [1 minute]
~ A Succinct Prelude to GANs [10 minutes]
~ Understanding Style Transfer [5 minutes]
~ Learning about Neural Style Transfer Networks [5 minutes]
~ Loss Functions: Content, Style, Total Variation [10 minutes]
~ Code Walkthrough and Result Analysis [5 minutes]
~ Challenges and Applications [5 minutes]
~ Questions and Answers Session [3-4 minutes]
Transcript: English (auto-generated)
So, welcome to the next session. We will have Anmol Krishan Sachdeva as our speaker. He will talk about GANs. I met Anmol, I think, for the first time at the GeoPython conference some years ago.
Anmol is very active in visiting conferences, real physical conferences back in the day. He was also at PyCon Thailand, PyCon Malaysia, and many, many more, and I think last year at, I don't even remember, GeoPython or EuroPython.
So, welcome, Anmol. I also have to mention that this year he is a volunteer for EuroPython, so thank you very much for volunteering. And now over to you for the talk; start your slides, please. Thanks, Martin, for the introduction. Okay, so hi, everyone.
I'm Anmol Sachdeva. The title of today's talk is Painting with GANs. We'll be talking about neural style transfer and the technicalities and challenges of using it. A brief introduction about myself: I'm an international tech speaker and a distinguished guest lecturer, and I work at OLX Group.
I did my master's in advanced computing at the University of Bristol, with a specialization in computational neuroscience and artificial intelligence. I have represented India in various international hackathons, and I'm also a researcher. About OLX Group: it's a group which consists of 20-plus brands,
has around 45 offices spanning five continents, and serves around 350 million people per month. So, the flow of the talk will be as follows.
First, we will look at an introduction to GANs. Then we'll take a look at what style transfer is. Thereafter, we'll learn about the different neural style transfer networks that are available and popular at this time. And then we will dive into the actual NST implementation
by looking into the loss functions: the content loss function, the style loss function, and the total variation loss function. And then we'll do a kind of code walkthrough as well. The talk will be supported by a few demos, which are adapted from the official
TensorFlow and Keras repositories. I'll be pushing the code to GitHub and sharing the link in the breakout channel. And then, post-talk, we can have the Q&A, the question and answer session, in the talk's "Painting with GANs" channel on Discord.
So yeah, the prerequisites for this talk are that you should be familiar with Python and Keras, especially with the TensorFlow backend, and some experience with artificial neural networks is good to have. It's also good if you have experience with convolutional neural networks and generative adversarial networks. And you should be inquisitive to learn about deep learning.
So first, let's start by revisiting the fundamentals of generative adversarial networks. In short, I'll be referring to them as GANs. So discriminative and generative models are the two types of models that we use in a GAN.
So first coming to the discriminative model, a discriminative model forms a discriminative network or the discriminator network. And it's essentially a supervised learning model which tries to classify the data which is fed into it. So it is just kind of a classification model that we are using here.
It doesn't really care about the underlying distribution of the data; only the quality of the data matters to it, so that it can classify it into categories properly. Then, on the other hand, we have a generative model that forms the generative network. Instead of classifying data,
it is used to generate data: it actually learns the underlying distribution of the data that's provided, and on the basis of it, it tries to generate samples that are near-real-looking. So mostly it's unsupervised learning, but what if we want to have some conditional training done?
In that case, we may supplement the training set with labelled data as well, so that it becomes kind of supervised plus unsupervised learning. Conditional GANs actually use labelled data, and there we have a bit of supervised learning implemented as well. And the way of actually learning
the underlying data distribution, which is what the GAN does, is through implicit density estimation. We don't need to calculate any probabilities explicitly; everything is done internally by the network itself, and that's called implicit density estimation.
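To make the two roles concrete, here is a minimal sketch (my own illustration, not code from the talk) of what the two models of a vanilla GAN can look like in Keras, assuming toy 28x28 grayscale samples:

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100  # size of the random-noise vector z

# Discriminator: a plain supervised classifier -> "real or fake?"
discriminator = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of "real"
])

# Generator: maps latent noise to a sample shaped like the training data.
generator = tf.keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="tanh"),
    layers.Reshape((28, 28, 1)),
])
```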
So I'll be referring to these terms going forward. But just to give you a gist of what a GAN network, a vanilla GAN, actually looks like in the form of a schematic diagram, it's this. The goal is to generate near-real-looking samples of the underlying distribution that we are provided with, that is, the training set.
We have this input layer where we feed the random noise; this random noise forms part of the latent space. It can be a uniform distribution or a normal distribution. We pass this noise through the GAN, which is formed by two neural networks:
one is the discriminator network, one is the generator network. We'll cover the details of what's hidden inside this box in the coming slides. But once we pass this input to the GAN, the GAN produces an output of some other dimension, say N dimensions, and that's maybe an image that we form
out of random noise, or something like that. So the GAN training algorithm has two essential parts: one is training the discriminator, and the second is training the generator. Training the discriminator network involves the following flow. We take a real sample,
that is, a sample from the training set, pass it through the discriminator, and have the discriminator classify it. On the other hand, we have a generator network to which random noise is fed, and that generator network produces a sample, which we call X star or X bar; that's a fake sample.
That fake sample is also fed to the discriminator. Now, the discriminator should be able to classify this as a fake sample, but our aim is to have the generator improve to the extent that the discriminator starts failing
in distinguishing between the real sample and the fake sample. So there will be a point where the discriminator starts classifying the fake sample as a real sample. And that's the part about generator training: the generator uses this random noise in the second phase of training,
generates a fake sample, the sample is fed to the discriminator, and the discriminator classifies it as real or fake. But the essential thing is that during this generator training phase, we backpropagate the errors to the generator instead of backpropagating them to the discriminator.
So in the discriminator phase, backpropagation is done to the discriminator network, whereas in the generator phase, backpropagation is done to the generator. And here we make sure that the discriminator network's parameters are not trainable: in the second phase, we set the trainable parameter of the discriminator to false,
because we just want the generator to improve. In a schematic diagram, it looks like this. This is the first phase, training the discriminator; the second phase is training the generator. We have X, the real sample that we provide to the discriminator, and then we have the generator,
to which we feed some random noise from the latent space, which we call Z. It can be normal, uniform, or any other distribution. The generator produces a sample X bar, and that is also fed to the discriminator. So the discriminator classifies X and X bar into some category, maybe real or fake,
and then the classification errors are propagated to the discriminator so that the discriminator learns. In the second phase, the only thing we do is remove this real-sample path, pass only X bar, and propagate the errors back to the generator. That's the only difference. And then we repeat the cycle over iterations
so that the network learns. The discriminator gets better at distinguishing between real and fake data, and the generator, on the other hand, gets better at generating data which is near-real-looking and which adheres to the underlying real training
dataset distribution. So these are all fake samples generated by NVIDIA's StyleGAN. No one can tell that these are fake; they look pretty much real. That's how much we have advanced in the last five years since the inception of the concept of GANs.
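As a rough sketch of the two training phases just described (again my own illustration, reusing the toy `generator` and `discriminator` from above; the `trainable` trick is the part to note):

```python
import numpy as np
import tensorflow as tf

discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Phase 2 uses a stacked model in which the discriminator is frozen,
# so only the generator's weights receive the back-propagated error.
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_batch, batch_size=32, latent_dim=100):
    # Phase 1: train the discriminator on real samples (label 1) and fakes (label 0).
    z = np.random.normal(size=(batch_size, latent_dim))
    fake_batch = generator.predict(z, verbose=0)
    discriminator.train_on_batch(real_batch, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((batch_size, 1)))

    # Phase 2: train the generator to make the (frozen) discriminator say "real".
    z = np.random.normal(size=(batch_size, latent_dim))
    gan.train_on_batch(z, np.ones((batch_size, 1)))
```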
So next comes the main concept for which we are here today. We have gained quite a good hold on generating near-real-looking images, or photo-realistic imagery. But what if we now want to generate art?
Instead of just generating images of static objects, we want to design new objects, to build artistic artefacts. So how can we do that using GANs? Here comes the concept of style transfer. So, as I told you earlier,
we now dive into generating new kinds of artistic artefacts. What if we have an image which we call the content image or base image, here of a dog, and we have a style image, for which I have taken an image of grass? What if I just apply the style of this grass
onto the content image, and say it generates this output? That means the style of this image has been imposed on the content image, and we get some unique form of art that contains both the content as well as the style. You see that the content is dominant here
and the style is also reflected in this image. So this image holds a combination of both the content image and the style image. That's our aim. We will be generating art along similar lines: we'll have a content image,
we'll have a style image, and our goal will be to transfer or embed the style from the style image onto the content image in order to produce an image which is a combination of both and looks good, realistic, and original. So yeah, that's the aim of the neural style networks
that we'll be covering now. You must take note of one thing: the model that we saw here, the training that happened, didn't learn the underlying distribution. So here comes the first difference with respect to vanilla GANs. Vanilla GANs actually had the underlying distribution
being learned, but here we are transferring styles. That's the first difference. We extract the style from the style image and then embed it into the content image, and the result should look like a blend of both images. So why not simply interpolate the pixels? Because if we interpolate one with the other,
what we get is a blurry image that's highly distorted, where the style dominates the content image. It will look muddy, it will not be clear, and both images will lose their identity. That is why we should not use
simple interpolation for this sort of thing. Style transfer networks have wide applications in the area of gaming and in the area of developing applications. A few years back, we had an app called Prisma which made this sort of style transfer public.
So people were able to apply styles from different images to their selfies and so on. That saw a real boom in the last two or three years, and people have advanced in generating new, state-of-the-art networks
that can do style transfer. Many more applications have come up, which we will discuss in a few minutes. And the last point is that in the resulting image, if you look, you have just applied the style onto the content image.
You can play with the dimensions and so on, but ultimately it looks like someone has applied some art onto the content image and we got the generated image. So, popular style transfer networks: there are three networks, one is Pix2Pix, one is CycleGAN,
and then there's neural style transfer. We'll cover a bit on CycleGAN and Pix2Pix after we cover neural style transfer, but first we'll start with neural style transfer. Neural style transfer, as I told you earlier, doesn't require any training set. And there's no training of network weights, no
backpropagation into trainable parameters, because we just have two images to deal with: we have to transfer the style of one image to another image. So there's no involvement of any training set. That's another unique thing about neural style transfer: it picks up the features
from the style image and applies those features to the content image, and it creates some hyper-realistic imagery. So let's say we have this base image and then we apply a style image
that's shown here at the bottom, and we get this kind of combined image. Just by playing around with the loss functions and the hyperparameters, we can get highly varying resultant images. This is one image that we see; likewise, we can have different degrees
to which the image is transformed by the neural style transfer network, and we can have different images which have the style features transferred or embedded onto the base image to different degrees. So we can generate multiple images from the same style and base image, using a combination of both.
So the core of neural style transfer is that essentially we have a loss function which is made up of a content loss, which I'll tell you more about in a bit, a style loss, and a total variation loss. The first thing is the content loss,
where we would like the combined or resultant image to be as similar as possible to the content image, the base image. So that's a loss between the base image and the generated or resultant image.
Then comes the style loss, where the generated image is compared to the style image, and on the basis of the degree of correlation between the style of both images, we calculate the loss. And last is the total variation loss,
sometimes also called total variance loss, wherein we check whether the generated image is smooth or whether the pixels are distorted and blurry. Ultimately, the goal of neural style transfer is to minimize the combination of these losses.
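In code form, that combination is just a weighted sum; this is a minimal sketch, and the weight values below are placeholders to tune, not numbers from the talk:

```python
content_weight = 1.0
style_weight = 100.0
total_variation_weight = 10.0

def combined_loss(content_loss, style_loss, tv_loss):
    # The optimizer minimizes this single scalar by nudging the pixels of the
    # generated image along the negative gradient.
    return (content_weight * content_loss
            + style_weight * style_loss
            + total_variation_weight * tv_loss)
```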
So how do we minimize this loss? We use a gradient descent technique wherein we update each pixel over the iterations, and then we get something like what you saw in the previous slide, the combined image. And there's a difference with the vanilla GAN, which I highlighted earlier as well: there's no training set required here
and no network weights being trained. So, coming to content loss. From now onwards I'll be making references to bits of code as well, but before moving forward, I would like to show a quick demo of how we can utilize a pre-trained network
to generate these art pieces. So yeah, this is the IPython notebook that I'm using. I'll just increase the size so that you are able to see. Here we are importing libraries: the TensorFlow library, Matplotlib.
We are setting the run-config parameters for Matplotlib, then NumPy, the Python Imaging Library (PIL), and functools. So this is the import part. Next comes a function. This code has been adapted from the official TensorFlow repository. The function converts a tensor to an image.
It does nothing but use some NumPy functions, checking the channels and dimensions of the image, and then taking the first element of the batch if the number of dimensions is greater than three. Here we will be using four-dimensional tensors.
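For reference, the tensor-to-image helper in the official TensorFlow style-transfer tutorial (which the notebook is adapted from) looks roughly like this:

```python
import numpy as np
import PIL.Image

def tensor_to_image(tensor):
    # Scale float values in [0, 1] back to 8-bit pixel values.
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        # Drop the leading batch dimension of a 4-D tensor (a batch of one image).
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)
```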
So it converts the tensor to an image, and that function is utilized below. We have the content path. The content path uses the utils function of Keras: get_file actually fetches the file from a remote location.
We are using the image of the dog which I showed earlier, and for demo purposes I have taken three style images which I will run through in a bit. These are the style images we want to transfer: this is the content image, and this is a style image.
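The download step is a one-liner with Keras; the URLs below are placeholders standing in for the actual image locations used in the notebook:

```python
import tensorflow as tf

# Placeholder URLs: substitute the real locations of the content (dog) image
# and the style image used in the demo.
content_path = tf.keras.utils.get_file("content.jpg", "https://example.com/dog.jpg")
style_path = tf.keras.utils.get_file("style.jpg", "https://example.com/bushes.jpg")
```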
Then we have the load-image function, just to load and display the images. This again uses the usual NumPy and TensorFlow functions and libraries: we read the image, decode it, convert it, and scale it. So it's a kind of image processing, and then we show the newly formed image. We resize the image afterwards
and then return it. And the imshow function, this is a typical function; we just attach a title to the image and then we have the plot function. Up to this point, I'll run each of the cells so that you can observe what is happening. First we will see the style image
of bushes being applied to the dog. In the slide earlier, I showed grass being applied to the dog; now we'll see bushes being applied to the dog. So yeah, let me print the images
that we are taking into account. So this is the image of bushes, and we are applying it to the dog. And then we have the pre-trained model, VGG19. I'm taking the VGG19 pre-trained model,
which has all the weights set and is able to classify into a thousand categories; the network has been trained on 1 million images taken from the ImageNet dataset. So you'll see that the combined image has the style of the bushes while also adhering to the content image;
it keeps the content of the actual base image that we passed. Nothing much is done here: I have just used the VGG19 pre-trained model, and this is the path to that model. We have passed it the content image along with the style image, and then we have passed the result to the tensor-to-image helper,
which processes the image and then displays it. Let's quickly jump on; I'll uncomment this and we'll have some other style transferred to this image. We can go directly.
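For the one-call stylization itself, the official TensorFlow tutorial uses a pre-trained arbitrary-stylization module from TF Hub; I'm assuming a similar call here (the notebook shown in the talk may load its pre-trained model differently), with `content_image` and `style_image` coming from the load-image helper above:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Assumption: the same TF Hub module used in the official TensorFlow tutorial.
hub_model = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
stylized = hub_model(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized)  # helper defined earlier
```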
So I'll show which style image we are referring to now. So, okay, let me change the name here.
It actually loaded from the cache. That's the style image we are trying to apply to the content image. I'll quickly apply this to the base image
and we should get a result like this. This looks like a novel art piece. So that's it for the demo of the first part. Now, moving back to content loss. Content loss is essentially the L2 difference, or the mean squared error, between the content image and the generated image.
So what if we compare apples to apples? The content is similar, so the information in the pixels should also be similar to some degree, and we will have a low content loss. But if we compare an ocean or sharks, or say oranges with an apple or a banana with an apple,
there will be a higher degree of content loss. However, we will not do a pixel-by-pixel comparison; what we will do is compare the higher-level features. So how do we compare higher-level features? When training a neural network, say a convolutional neural network, there are blocks or stages
at which the network is trained, and as one stage progresses to the next, some of the features get dropped: the lower-level features get dropped and we are left with higher-level features. So in a convolutional neural network,
the lower layers represent very minute details, and the higher layers, the layers at the top, contain the higher-level features, broad features like "this is a car", "this is a building". To capture the higher-level features,
we'll be dealing with the top layers of a pre-trained network, and in our case that's VGG19. It has a 19-layer CNN architecture and, as I told you, it's capable of classifying images into 1,000 categories and has been trained on over 1 million images from the ImageNet dataset.
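A minimal sketch of loading that pre-trained VGG19 in Keras and exposing its intermediate layers (layer names follow the Keras VGG19 implementation; which content layer to pick is the tunable choice the talk mentions):

```python
import tensorflow as tf
from tensorflow.keras.applications import vgg19

# Load VGG19 with ImageNet weights, without the classification head.
base_model = vgg19.VGG19(weights="imagenet", include_top=False)

# Map every layer name to its symbolic output so features can be read anywhere.
outputs_dict = {layer.name: layer.output for layer in base_model.layers}
feature_extractor = tf.keras.Model(inputs=base_model.inputs, outputs=outputs_dict)

# Higher layers of block 5 carry content; the first convolution of each block
# is commonly used for style. "block5_conv3" matches the talk's choice of the
# third layer of block five ("block5_conv2" is another common pick).
content_layer = "block5_conv3"
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
```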
So the VGG-Net architecture looks something like this. We have an input layer and then five chunks: one, two, three, four, five, these are five blocks. Each block has a bunch of convolutional layers, so block1_conv1, block1_conv2, and likewise up to block5_conv1 through block5_conv4,
and then we have dense layers, where the input is flattened and the classification happens. So we'll be using this VGG19 network, and from the higher layers of block five we'll be capturing the higher-level features, the content, the features with which we'll actually be collecting
and computing the content loss. So this is the code. I'll just zoom in if that's possible. Yeah. Okay, so we have the Keras library being used.
VGG19 is the pre-trained model that we are using, and we just specify the path to the base image and the path to the style image, and then we specify some weights: there's a content weight,
a total variation loss weight, and a style weight, which we'll look at a bit later. The base image and the style image are processed from the paths we have provided, and then we have a placeholder, you could say a placeholder image, created,
into which the combined image will go. So we have these three images. We pass these three images by concatenating them into one tensor, and we feed them into VGG19, which has the pre-trained weights,
and we get the resulting feature outputs. From the pre-trained model we take, you could say, the third layer of block five; if you see, we take this third layer of block five as the reference for calculating the loss,
because higher-level features are captured here. We may take the second layer, the third layer, or the fourth layer; it totally depends on you, and taking different layers may give varying results. I have chosen the third layer;
you may choose the second as well. So we pass the base image and collect the output from this layer, and we also take the combination features, from the combined image, at the same layer of the network.
And then we just apply the L2 norm, the squared error, to both: the generated image features and the content image features, and then the content loss is calculated by multiplying by the content weight that we specified at the top.
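In code, the content loss is just the squared difference between those feature maps, scaled by the content weight; a minimal sketch, with variable names of my own choosing:

```python
import tensorflow as tf

content_weight = 1.0  # placeholder value; tune as needed

def content_loss(base_features, combination_features):
    # Sum of squared differences between the high-level feature maps
    # (a scaled version of the mean squared error mentioned in the talk).
    return tf.reduce_sum(tf.square(combination_features - base_features))

# With the feature extractor from before, and the three images stacked into one
# batch (base at index 0, style at 1, combination at 2), this would look like:
# features = feature_extractor(input_tensor)["block5_conv3"]
# loss = content_weight * content_loss(features[0], features[2])
```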
Coming to style loss, that's the second type of loss we want to cover. For style, we have images which may be similar to each other; those images we call correlated images. But there are images which are different, which have different styles or different lower-level details
and don't share common lower-level details; those images are considered to have a lower degree of correlation. So how do we calculate the correlation between the layers? The degree of correlation between two images can be computed by calculating the degree of correlation
between their feature maps. We want to capture lower-level features, because lower-level features represent the style; just note that higher-level features represent content, and lower-level features, the output from the lower layers of the neural network,
represent the style. So we take the lower layers of the convolutional neural network, fetch the feature maps from there, flatten the feature maps, and take the dot product of the feature maps between the two images. Then, depending on that dot product, if its value is greater
than some value we have specified, we consider the images to have a higher degree of correlation, which means the style of the two images matches. Suppose image A is an image of grass and image B is also an image of grass. You see the dark orange points on both images:
where they overlap, that's the area where the images have a similar, correlated style, so we can say these images are correlated to some extent, and if block B were fully orange, we would say
that combination has a higher degree of correlation. In our case, when we calculate style loss, image A will be the style image and image B will be the combined image, the resultant image that we get after training. So, in order to actually have the style loss calculated
for different layers of our network, we use a thing called the Gram matrix. The Gram matrix is the dot product of all the feature maps against all the feature maps. Suppose you have a layer: you take the dot product of feature A with feature A,
then the dot product of A with B, and so on. In this image it should be clear. We have image A, image B, and image C. All these images are of grass, but one contains only grass, one contains bushes, and one contains some brown grass. What we have is the Gram matrix,
which you see calculated on the right. We map feature one against feature one and calculate the correlation between them by taking the dot product of the feature maps; then we take the dot product of feature one with feature two of image A, meaning the dot product of this one and this one is taken,
and the overlap, the result of the dot product, is shown by some colour. Likewise, we do this for all the features of a layer of an image. So for image A, we first take the dot product across all the feature maps
against the feature maps of that same image, and we get this Gram matrix. Likewise, we generate Gram matrices for image B and image C. Then, if the Gram matrices of two images, in our case the style image and the resultant generated image,
are highly comparable, we say that the style actually held throughout the training, so the style was transferred to the generated image, and in that case we consider it a success. So we again have to calculate the mean squared error, that's the L2 error,
and we have to minimize that error. Coming to the code, we start by setting the style loss to zero. We have the definition of the gram_matrix function here. This function flattens the feature map, as I showed you,
then takes the dot product of it against itself and returns a Gram matrix: a feature-map dot product is taken and we get one cell of the Gram matrix, and doing this for all the feature maps we get the whole Gram matrix. Then we have the style loss function,
where we calculate the mean squared error between the Gram matrix of the combined image and that of the style image. So it's nothing but the squared error between the combined image and the style image, and these are just some parameters that we pass to the loss function.
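A minimal sketch of those two functions, following the Keras neural-style-transfer example the talk adapts (the normalisation constants come from that example):

```python
import tensorflow as tf

def gram_matrix(features):
    # Flatten each feature map (channels last -> channels first), then take the
    # dot product of the flattened maps against themselves.
    features = tf.transpose(features, (2, 0, 1))
    features = tf.reshape(features, (tf.shape(features)[0], -1))
    return tf.matmul(features, tf.transpose(features))

def style_loss(style_features, combination_features, img_nrows, img_ncols):
    S = gram_matrix(style_features)
    C = gram_matrix(combination_features)
    channels = 3
    size = img_nrows * img_ncols
    # Squared error between the two Gram matrices, normalized as in the example.
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))
```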
This feature_layers list shows the active layers that we chose. Suppose we chose block one: the loss would be calculated against the layers chosen from block one. In fact, we took block one, block two, block three,
block four, block five, taking one layer from each of the five blocks: block1_conv1, block2_conv1, block3_conv1, block4_conv1, block5_conv1. So the lower-level layers from each block have been considered in order to have the style loss calculated. We then extract the feature maps from these layers
and pass them to the style loss function that we have. Then comes the total variation loss. The total variation loss is nothing but a loss with respect to the quality of the resultant image that we are observing. In case the combined image is distorted
and is pixelated, we will consider it noisy and the loss will be very high. What we can do is take the combined image and shift its pixels to the right by one, and, as another step, take each pixel of the generated image
and shift it down by one pixel. We store both of these results in A and B respectively, then take the sum of the two and calculate the error. So we calculate the error by shifting the pixels
to the right and downwards, and that tells us whether the image is highly distorted or not. So that's it for the total variation loss.
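Here is that pixel-shift formulation as a minimal sketch, again following the Keras example (`x` is the 4-D generated-image tensor, `img_nrows`/`img_ncols` its height and width):

```python
import tensorflow as tf

def total_variation_loss(x, img_nrows, img_ncols):
    # a: difference between each pixel and its neighbour one step down.
    a = tf.square(x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, 1:, : img_ncols - 1, :])
    # b: difference between each pixel and its neighbour one step to the right.
    b = tf.square(x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, : img_nrows - 1, 1:, :])
    return tf.reduce_sum(tf.pow(a + b, 1.25))
```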
Once we have these three losses, we just combine all three and we get the resultant total loss, and then it's time to start training the model. So then comes the training phase. So far we have computed the losses: we have computed the Gram matrices, based on which we have computed the style loss; we have computed the content loss by taking into consideration the content image as well as the generated image; and we have calculated the total variation loss.
Now the model can be trained by taking this loss into account, and we need to minimize this loss, so we'll be using an optimization technique here. Essentially, neural style transfer relies on a quasi-Newton
numerical optimization technique called BFGS, and the L stands for limited memory, meaning we can constrain it on the basis of resources. So L-BFGS is the limited-memory BFGS algorithm, a numerical optimization algorithm, and what it does is find a local minimum of an objective function based on the gradient of that objective function.
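The classic Keras example wraps this in a small Evaluator class and hands it to SciPy's L-BFGS-B routine; a minimal sketch of that pattern is below, where `eval_loss_and_grads` is assumed to be defined elsewhere and to return the scalar loss and the flattened gradients for a given image vector (in the original example it is built from the combined loss with a Keras backend function):

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

class Evaluator:
    """Caches loss and gradients so one forward/backward pass serves both
    callbacks that fmin_l_bfgs_b expects."""
    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        # `eval_loss_and_grads(x)` is assumed to exist (see lead-in above).
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
# x is the flattened generated image; each call nudges its pixels along the
# negative gradient of the combined loss:
# for i in range(iterations):
#     x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
#                                      fprime=evaluator.grads, maxfun=20)
```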
So essentially what we need to do is minimize this computed loss over iterations using gradient-based descent, and what we'll be doing is updating the value of each pixel by an amount which is proportional to the negative of the gradient
that comes from this loss function. So let's dive into the code here. Okay. It's pretty much the same as I showed you in the snippets. We have the base image path, the style reference image path, and the weights defined here.
We have defined the total variation weight, the style weight, and the content weight. We process the images, specify the dimensions of the generated image that we want, and specify the number of iterations. For this demo I have used only 50 iterations, but in a real scenario
you would use something like 4,000 or 5,000 iterations to actually see the result that I showed you in the slides. So let me go back to the slide once and show you the end result: this is the end result we should see, and this is after 4,000 iterations of training.
We have the pre-processing of the images: it just opens and resizes the image, applies the image processing functions, and then we basically get a tensor out of it. We also have a deprocess_image function, which converts a tensor back to an image.
Nothing much is done there: a reshape is applied, we clip the NumPy array for any overflow that's happening, and then we just pass the base image and the style image to the pre-process function that we have created and get the tensor representations of the two images.
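Those two helpers, roughly as they appear in the Keras example the talk adapts (`img_nrows`/`img_ncols` are the target dimensions):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import vgg19

def preprocess_image(image_path, img_nrows, img_ncols):
    img = keras.preprocessing.image.load_img(image_path, target_size=(img_nrows, img_ncols))
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)          # add the batch dimension
    img = vgg19.preprocess_input(img)          # RGB -> BGR + ImageNet mean subtraction
    return tf.convert_to_tensor(img)

def deprocess_image(x, img_nrows, img_ncols):
    x = x.reshape((img_nrows, img_ncols, 3))
    # Add back the ImageNet channel means removed by vgg19.preprocess_input.
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]                          # BGR -> RGB
    return np.clip(x, 0, 255).astype("uint8")  # clip any overflow
```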
Once we get the tensor representations of the two images, we have all three images: the combined image, which we are treating as a placeholder image for now, the style image, which is pre-processed, and the base image, which is pre-processed. These three are ready to be
concatenated into one tensor, which we feed into the VGG19 pre-trained network that we imported at the top. Then we have the model loaded, and we also have the key layers
that we want to match against collected in a dictionary here, mapping layer names to their outputs. Now it's time to compute the neural style loss that we talked about earlier. Again, this is a similar function
to what I showed you earlier, slightly modified from the one in the slide. There's the gram_matrix function, which I have already explained: it calculates a Gram matrix for the tensor that is fed in. Then there's the style loss, where the Gram matrices of the combined image
and the style image are generated and we calculate the L2 error. Then we have the content loss, which is simply the squared error between the generated and base image features. And the total variation loss, as I told you, shifts by one pixel to the right and one pixel downward,
takes the sum of both, and calculates the error from that. Then we take all three of these losses and add them to form the main loss.
The next thing you should see is this: we have the gradients computed for this particular loss, and we feed them to the Evaluator class that we have created.
The Evaluator class returns the loss and the gradient values at each stage, and we have the iteration loop set up. So my network is optimizing, as you can see. Over, say, 4,000 iterations, what we do is take this loss, pass it to the Evaluator class,
get the loss and gradients, and then update the value of each pixel by the negative of this gradient. Ultimately, what we see is this combined image. So that's all for the main part of this talk. I'll be posting links to all of this code.
So next we have Pix2Pix; I'll just talk a bit about it. It's used for image-to-image translation. You can have, say, a hand-drawn sketch or a schematic diagram being translated into an image which looks real, a blueprint-like representation being translated into an actual building,
silhouettes being translated into images, Google Maps street view being translated to map view, and so on. And then we have CycleGAN, which is again an advanced GAN for style transfer. Here we essentially have two GANs:
one trains on the first input and one trains on the other input, but both are cyclically dependent on each other. So that's another application, but discussing it is out of the scope of this talk, since we restricted ourselves to neural style transfer. This is something I'm pointing you towards: you can explore CycleGAN
as well if you are more inclined towards GANs. So these three are the popular networks out there for creating stylistic artefacts. I have linked to each one of them, and wherever I used resources, I used code references from David Foster's Generative Deep Learning
and Jakub Langr's GANs in Action. These are the two reference books that I consulted. So that concludes my talk. We are also hiring at OLX Group, so feel free to reach out to me or just drop by the careers section and apply for the roles.
And yep, don't forget to follow me on Twitter and LinkedIn. We can get connected, and we can have your questions answered in Discord too, and later on we can connect on these platforms. So thanks a lot for listening.