Lecture: Shamir | September 26
Formal Metadata
Number of Parts: 13
License: No Open Access License: German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Identifiers: 10.5446/63341 (DOI)
Transcript: English (auto-generated)
00:00
And the next speaker is Adi Shamir.
00:30
He made many very important contributions to the science of cryptography. I'll just mention the famous RSA public key algorithm,
00:42
Shamir is the S in this acronym. He was awarded the Turing Award in 2002, the Israel Prize in 2008, and many other accolades, but I don't want to waste time. And without further ado, Adi, please.
01:12
Good morning. I'm working mostly on the interface between security and all other branches of computer science.
01:22
Today I'm going to talk about the interface between security and machine learning. And I don't have to tell you about the fact that machine learning is undergoing a revolution. It has major achievements in the last few years,
01:40
but it also has a dark side, and this is that it is incredibly easy to fool. And what I'm going to tell you today is a few research projects that I was involved in, which try to understand why is it so easy to take deep neural networks,
02:03
which under normal circumstances work with tremendous accuracy, and make them make foolish decisions. So for those of you who haven't seen what a deep neural network is, you have on the left an input layer. Each one of those points is accepting a real number.
02:21
If it's an image, it's typically between zero and one. Telling you how dark or bright that particular pixel was. And then you have a first hidden layer, which consists of applying some linear transformation to the input and then changing every result,
02:44
which is negative, replacing it by zero, while every positive result is passed through as is. This is the only source of nonlinearity, because without nonlinearity the composition of multiple linear mappings is going to be just one other linear mapping and you'll not gain anything.
03:01
So after the first hidden layer, you again, you get a sequence of positive real values. Again, you perform a linear mixing at each one of the points in that layer, each one of the neurons in that layer, and you zero all the negative values. Go on until at the very end, at the output layer,
03:22
you are basically giving your confidence level for each one of the classes. If I want to classify between four different classes, I'll have four outputs giving me a probability distribution over how confident I am that the given image is of class one, two, three or four.
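A minimal sketch of the forward pass just described, assuming flattened pixel inputs in [0, 1]; the function names and layer shapes are illustrative, not taken from the talk:

```python
import numpy as np

def relu(v):
    # negative values are replaced by zero, positive values pass through as is
    return np.maximum(v, 0.0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def forward(x, layers):
    """x: flattened image, pixel values in [0, 1].
    layers: list of (W, b) pairs; each hidden layer is a linear mixing followed by ReLU,
    and the output layer turns the final linear mixing into class confidences."""
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)
    W, b = layers[-1]
    return softmax(W @ h + b)   # probability distribution over the classes
```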
03:42
Now, about 10 years ago, it was discovered that deep neural networks, which are trained to perform very well, suffer from adversarial examples. And this, of course, has major security implications, because if you can make small changes
04:02
that are going to make the deep neural network decide in the wrong way, a Tesla car looking at a traffic sign might interpret it in the wrong way because somebody had used a marker to put a few black and green points on the traffic sign.
04:22
For those of you who have never seen an adversarial example, you're probably going to fall from your chair because I'm going to show it in the next slide. I just want to mention that in spite of tremendous effort, this is still a mystery.
04:40
I haven't seen a really good, simple, intuitive explanation what's going on. And my goal is that by the time you walk out of this room, you'll fully understand what are these adversarial examples, why they are created and why they are so hard to get rid of. So, please hold on to your seats
05:00
because this is the classical example of an adversarial example. The image on the left is correctly recognized by a deep neural network as a kind of cat with 88% confidence level. You make an adversarial perturbation, tiny, tiny changes and the result is to the human being
05:23
indistinguishable from the original. But this will be recognized by that particular deep neural network as guacamole with 99% confidence level. And you'll agree with me that there is nothing that remotely resembles guacamole
05:40
in the right hand image. Guacamoles are supposed to be green and mushy and these cats are gray and very textured. So, why in God's name is the deep neural network deciding that the image on the right is guacamole? Here's another example, showing that pigs can fly.
06:01
You take a pig, you add to it something that looks like random noise, except that this is an exaggerated form of the noise. You have to multiply the noise by 0.005. So, it's a really tiny, tiny perturbation at each pixel. You get a humanly indistinguishable image, but this is recognized as an airliner.
06:23
So, what is the explanation? Everyone in machine learning is now aware that adversarial examples exist. And when they teach in classes at universities about adversarial examples, they always have the following mental image, which they draw on the blackboard or whiteboard.
06:41
Suppose that there are several clusters, red clusters where the input space is the whole green area and every point in the green area is a particular image. So, it's not that I'm talking about the pixels in this green square.
07:01
Every point here is a whole image and I have a cluster of images of cats and then I have, in blue, clusters of images of guacamoles. And now, what is the way you train a deep neural network? If the input was N dimensional,
07:24
you are trying to construct an N minus one dimensional separating surface (here it's a line, but in general it's some kind of surface), so that all the cat images will be on one side of this decision boundary
07:42
and all the guacamole images will be on the left side of that particular decision boundary. Now, in addition to this decision boundary which is where you are 50-50 percent sure that it's either one of them, I can think about it as a kind of topographic map
08:01
where I have height lines. So, there is another line where you are 51 percent sure it's a cat and 49 percent sure it's a guacamole, or the other way around. So, there is a whole topographic map with height lines which you construct about your confidence level in the two classes. Okay, now that we understand
08:20
what the network is supposed to do, what are the adversarial examples? Take an example of a red cat. In order to create an adversarial example, and this is how professors at universities are teaching it, they say, okay, you just walk as fast as you can towards the decision boundary.
08:42
You always walk perpendicularly to the height lines along the gradients so that you will get as quickly as possible to the decision boundary and then you'll continue deep into the other territory so that while the guacamoles in the blue are only 90 percent confidence level
09:02
that they are guacamole, now you want to go so far deep into the other territory that it will be 99 percent sure that it's guacamole. And that's how you are supposed to generate the adversarial examples. And on the right hand side, I took a picture of Ian Goodfellow, who of course wrote the original paper
09:23
introducing adversarial examples, and you cannot read it, but he was asked: what is the meaning of this random looking noise that points to the change which will turn a picture of a cat into something that the network thinks is guacamole?
09:41
He said, you are walking towards some centroid of a cluster of images of the other class. Now, is it convincing? To me, it's totally unconvincing, because there are so many things which are wrong with it. What is so special about those images which to the human still look like a cat or like a pig
10:03
but the machine suddenly decides that they are of totally different class. How can it be that next to any cat image there's also a car and an airplane, a boat, a horse, a frog, anything you want. So the mental image is that the classifier,
10:21
the neural network classifier should take all the cats and somehow put them on this part of the room and all the guacamoles, put them there, et cetera. But now it looks as if the partitioning into classes is fractal-like, everything is near everything else. How can it be? And yet the deep neural network is a fairly simple kind of device.
10:43
It cannot implement fractal-like structures. Why are the adversarial examples so close to the original images? Why don't the adversarial perturbations resemble the target class? Do I see any attempt to add some greenish pattern
11:02
to the cat in order to make it guacamole? Not at all. Just random noise. Similarly, if you look at the noise to change a pig into the airplane, I don't see any wings there. I don't see a tail. I don't see anything characteristic of the target class.
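For concreteness, the textbook recipe described above, repeatedly stepping along the loss gradient toward a target class, can be sketched roughly like this; PyTorch, the model, the step size and the iteration count are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def targeted_adversarial(model, x, target_class, step=0.01, n_steps=50):
    """Walk the input along the gradient toward `target_class` until the
    network becomes very confident about the wrong class. Illustrative sketch only."""
    x_adv = x.clone().detach().requires_grad_(True)
    target = torch.tensor([target_class])
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x_adv.unsqueeze(0)), target)
        loss.backward()
        with torch.no_grad():
            # move against the gradient of the target-class loss,
            # i.e. perpendicular to the confidence height lines
            x_adv -= step * x_adv.grad / (x_adv.grad.norm() + 1e-12)
            x_adv.clamp_(0.0, 1.0)          # keep valid pixel values
        x_adv.grad.zero_()
    return x_adv.detach()
```

Yet the perturbation this recipe produces looks like structureless noise rather than anything guacamole-like.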
11:21
So something is very wrong with the explanation. I'm not walking in the direction of the images of the other class. Now here's another example of the confusion in the community. This is just a short time ago. I found on YouTube a talk by Dan Boneh, one of the world's experts in the security of machine learning.
11:41
And he explained why is this happening? Why are there adversarial examples? He said that deep neural networks are not so accurate. And therefore, there is the straight line, which is the human knowledge of what is a cat. And there is the dotted region,
12:02
which is where the deep neural network thinks it's a cat. And because they don't exactly overlap, there is a way how to move from something that the machine thinks it's a cat into something nearby which the machine thinks it's a guacamole. But it doesn't apply to anything which is in the middle of those regions.
12:22
It applies only to points which are of one type and not the other. It gives no explanation whatsoever of what is going on. Here is another example of the confusion. This is Nicholas Carlini, on the top left, one of the best experts on adversarial attacks and defenses. He wrote many influential papers.
12:42
In a public lecture, he said: I'm in a world where I don't know what these adversarial examples are. So, I'm going to explain to you what these adversarial examples are and what makes them tick. It's a paper that I wrote with two of my students.
13:02
You can download it from arXiv. And the basic idea is that natural images are in a low dimensional manifold in the big input space, which has very high dimension. How do we know it? Because natural images can be highly compressed and then decompressed and you get essentially the same original image.
13:22
So, we know that there is a much smaller number of parameters which can be used in order to represent natural images. And therefore, if you just look at the small number of parameters and see how the decompression works, you can get a kind of mathematical definition
13:41
of low dimensional manifold, which contains all the images. Okay, so this is not my observation. Everyone knows that natural images are on the low dimensional manifold. The manifold is fairly smooth, not crazy, because basically it says that natural images are piecewise smooth with sharp edges between regions.
14:04
That's roughly all you do when you try to compress images. So, now let's start with a very, very simple example. Suppose that I have a one dimensional world, that's my input space and going to the left means I'm adding grayness to the image, going to the right I'm adding greenness to the image
14:23
and cats are located somewhere on the gray side, and guacamole (every dot there is an image of guacamole) is on the greenish side. And I'm asking you how a deep neural network will learn to distinguish between them. You'll be absolutely correct in assuming
14:42
that you find a point roughly in the middle; everything to the left of that point the network will call cat, everything to the right the network will call guacamole, no surprise there. But now let us add a second dimension and the second dimension is totally irrelevant.
15:00
It's out of the manifold of the natural images. For example, the irrelevant dimension is whether the top left corner in the image is more dark or more bright than average. It's not supposed to help you distinguish between cats and guacamoles in any conceivable way.
15:21
Therefore, I assume that when I add this second unnecessary dimension, you take the point, which was the decision boundary in one dimension, now you extend it into a one dimensional dividing decision boundary, which is vertical. It's not using at all the additional dimension in order to make up its mind.
15:41
You all agree that this is what it should be. I'm telling you that it's not what is going to happen. What is going to happen (not in this very, very simple case, but I'll show you in slightly more complicated cases) is that this is the kind of decision boundary which is going to be created, where everything that the network thinks is a cat
16:02
is above the line in the two dimensional space and everything that the network thinks is guacamole is at the bottom of this input space. And this should become your new mental image of what's going on. And I'll give you a few more examples and then we'll go one by one over the mysteries
16:21
and we understand why they're not mysteries at all. Everything is very natural. So take a two dimensional image manifold, which has some red (orange in this case) clusters of cats and blue clusters of guacamoles. If the input space is three dimensional,
16:43
the vertical direction is totally irrelevant to the decision. You only have to know the XY, not the Z, in order to make the decision. So you'd assume that the decision boundary is going to look like a tube which is surrounding all the red clusters and extends up and down vertically
17:02
because those additional dimensions make no difference. I'm claiming that this is actually going to be the decision boundary, namely it is going to be roughly the same as, and very close to, the image manifold, with a small dimple underneath the cats
17:23
so that the cats will be above and it will have a dimple above all the guacamoles so that the guacamoles will be on the bottom parts. Everything at the bottom is going to be called by the network guacamole. Everything at the top half is going to be called the cat. So this will make the decision correct.
17:41
Now I have to tell you that training networks actually consists of two different steps. One is a first step in which the decision boundary moves in space quickly until it clings to the image manifold.
18:02
And then it spends a lot of time doing the fine tuning of creating those dimples above and below, and you get a kind of weakly undulating surface, which is similar to the fairly gently changing manifold of natural images.
18:23
Okay, so let's see some examples. This is a synthetic example and it's a real life result of training a network. You see the image manifold is the horizontal line with alternately red point, blue point, red point, blue point.
18:41
At the initial stage, I have random choice of coefficients for the network and therefore the location of the decision boundary is crazy. It has nothing to do with the partitioning into red and blue dots. After 10 epochs, the decision boundary started shifting towards the image manifold.
19:03
After 30 epochs, it's already almost identical, very close to the image manifold. And then it spends until epoch 100 in getting those undulations up and down in order to move all the training examples to the right side.
19:21
Let's see now a video of how this is actually happening. One case, it's horizontal manifold. Another case, it's diagonal. I started from different initializations, but you see that it bends, the training bends the decision boundary so that it will be kind of floating above and below everything.
19:42
It is making every effort to move the red to the top side and the blue on the bottom side of this line. Let's look at three dimensions. If you look at what is the actual decision boundary, it is the picture on the right when you have a chessboard, two-dimensional chessboard that I talked about earlier.
20:01
And I'll show you this in the video. And by the way, I don't think that anyone had produced such a video of how decision boundary actually evolved. I saw many videos, they showed the loss function, how terrible it is. They showed all kinds of things, but not the evolution of decision boundaries while you are training a network.
20:21
Here is again a situation where I start from a crazy initialization, random initialization. The decision boundary initially has nothing to do with the given examples. And then it clings to the image manifold and it creates the dimples. Here is a video. You can watch it.
20:41
It repeats several times. And you see that the claim that I mentioned that you're creating this kind of dimpled version of the image manifold indeed happens in these synthetic examples. Okay.
21:01
Here is a one-dimensional image manifold: it's a line with alternating red, blue, red, blue. I'm training a network which now has a three-dimensional input space. The decision boundary is two-dimensional. It's always N minus one. And it doesn't matter how it behaves away from the image manifold
21:21
because it's only judged by its quality in distinguishing between the red and blue points on these other training examples. So in other parts, it behaves in a crazy way, but along the line, which is the image manifold, you see that indeed it undulates gently in order to put things on the right side.
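These synthetic demonstrations can be re-created in a few lines of code; the following is a rough toy version (the layer sizes, epoch count and probing grid are arbitrary choices, and the exact boundary shape will vary with the random initialization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# 1-D "image manifold": alternating class clusters along the x-axis,
# embedded in a 2-D input space whose second coordinate is irrelevant.
n = 200
xs = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
labels = ((torch.arange(n) // 25) % 2).long()          # blocks of red/blue
X = torch.cat([xs, torch.zeros(n, 1)], dim=1)          # all points lie on the line y = 0

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for epoch in range(300):
    opt.zero_grad()
    F.cross_entropy(net(X), labels).backward()
    opt.step()

# Probe vertically above/below a few x positions: where does the predicted class flip?
with torch.no_grad():
    offsets = torch.linspace(-1.0, 1.0, 401)
    for xi in (-0.9, 0.0, 0.9):
        probes = torch.stack([torch.full_like(offsets, xi), offsets], dim=1)
        preds = net(probes).argmax(dim=1)
        flips = offsets[1:][preds[1:] != preds[:-1]]
        print(f"x = {xi:+.1f}: class flips at vertical offsets {flips.tolist()}")
```

If the dimpled-manifold picture is right, the flips should typically occur at small vertical offsets, that is, the trained boundary hugs the line the data lives on.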
21:42
So now that you have a different mental image compared to what is taught in every machine learning class 101, let me explain why you actually expect the training of deep neural networks to do these clinging and dimpling effects.
22:01
So suppose that you have one-dimensional collection of a guacamole cluster, a cat cluster, cat cluster, guacamole cluster, et cetera. And the initial decision boundary with random weights has nothing to do with it. So sometimes it passes above, sometimes it passes below.
22:21
Now, what are the forces that are going to be applied by the training examples on the decision boundary? So the first cluster of guacamole images is very happy where it is. It is already on the guacamole side below the decision boundary. So it doesn't apply any pressure on the decision boundary to move.
22:43
The examples cannot move in space. They can only apply pressure on the decision boundary during training to move. So the guacamole doesn't apply any pressure. The cats in the second from left cluster are very unhappy. They're on the guacamole side, and therefore they will try to push the decision boundary
23:02
towards them and then also below them. So it passes below. On the other side, it is the guacamoles which are unhappy. They are now on the cat side above the line. So they are trying to push the decision boundary upwards. And the cats afterwards in red are happy where they are.
23:22
So they will not apply any pressure. So if you look at it, the only pressure which is being applied is by clusters which are on the wrong side. And therefore, you take the initial shape and you only apply forces which are going to make the decision boundary closer to those training examples.
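One way to see this pressure argument in loss terms: for a softmax classifier trained with cross-entropy, the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot label, which is almost zero for a confidently correct example and large for a misclassified one. A quick illustrative check (the numbers are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

y = np.array([1.0, 0.0])              # true label: cat (class 0)
z_happy = np.array([5.0, -5.0])       # confidently on the correct side of the boundary
z_unhappy = np.array([-5.0, 5.0])     # confidently on the wrong (guacamole) side

# d(cross-entropy)/d(logits) = softmax(z) - y
print(softmax(z_happy) - y)           # ~[0, 0]: applies almost no pressure
print(softmax(z_unhappy) - y)         # ~[-1, 1]: pushes the boundary hard
```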
23:43
There is no force that tries to push the decision boundary away from the image manifold. And that's why you have a very fast initial clinging, where all the given examples are pushing the decision boundary strongly towards the image manifold. Now, how about the adversarial directions?
24:02
Adversarial directions are always perpendicular to the height lines, as I mentioned before. And I drew a picture of what are the adversarial directions which are perpendicular to the decision boundary. And you can see that they are not pointing at all towards the example of the opposite class along the line.
24:23
They are pointing perpendicularly. This is actual training of a network which is supposed to learn to distinguish between blue and reds. And on the bottom right, you see a two-dimensional example. And indeed, all the adversarial displacements
24:40
which are trying to make a cat into a guacamole are pointing in one direction, and all those which are going to make guacamoles into cats are pointing in the other direction. So, here's another piece of evidence; we have many, many experiments which support this idea.
25:01
Here is one example. We are just measuring the average distance over the epochs. The horizontal dimension is the epoch number. The vertical direction is the average L2 distance between the training examples and the decision boundary. And you see that initially the average distance drops
25:21
very quickly over the first few epochs. This is the clinging process that I mentioned. And then it starts to grow slowly and this is where you are starting to build those dimples and the dimples make you a bit further away from the original image manifold. So, it's very consistent with the theory that I mentioned.
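Measuring that distance can be done crudely by walking from each training example along its gradient direction until the predicted class flips; a hedged sketch (the paper's actual measurement is more careful than this, and all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def distance_to_boundary(model, x, label, max_dist=5.0, steps=500):
    """Rough L2 distance from x to the decision boundary, found by a line
    search along the loss-gradient direction until the prediction changes."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x.unsqueeze(0)), torch.tensor([label])).backward()
    d = x.grad.detach()
    d = d / (d.norm() + 1e-12)
    with torch.no_grad():
        for i in range(1, steps + 1):
            t = max_dist * i / steps
            if model((x + t * d).unsqueeze(0)).argmax(dim=1).item() != label:
                return t                 # first crossing of the decision boundary
    return float("inf")                  # no crossing found within max_dist

# Averaged over the training set, this quantity first drops (clinging)
# and then grows slowly (dimpling), as described above.
```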
25:41
But I have many other proofs that this is really what's happening when you train networks. And training tends to create large gradients. That's the next claim in the theory. And the reason is that you have a fairly gently changing image manifold, you clung to it. And then you have to start building those dimples up
26:03
and down, and it is going to be much easier not to have very small gradients. What does it mean to have small gradients? It means that in order to change the decision on a training example, I have to move the decision boundary
26:23
above a certain training example. If you have small gradients, then to build up a large confidence level you have to go way up above that point. Or if you want to go below a point, if you have small gradients, you have to go far down. Well, if you have large gradients,
26:40
it's enough to gently bend your way around the training examples as you are training the network. So I'm claiming that the network will have (and this can be experimentally shown) fairly large gradients, but they are all pointing perpendicularly to the image manifold. Okay, so suddenly everything becomes intuitively obvious.
27:04
What are those cats which are said to be guacamoles by the network? You see, I'm looking at the particular deep neural network with its decision boundary. I have a whole half space of cats at the top half of the cube
27:21
and a whole half space of guacamole. So the network says it's guacamole at the bottom half of this cube. And therefore, if I look at any particular point on that manifold of natural images, there is going to be, at a very short distance,
27:42
an image of the opposite class. Okay? So the things which are going to be called guacamole are everywhere just next to the given training examples. These are, I call them pseudo-guacamoles
28:03
because they are not the training examples of guacamoles which were all on the horizontal natural image manifold. They are created by the fact that I have to make a whole half space being called guacamole, everything which is on one side
28:20
of the N minus one dimensional decision boundary. So next to every point, I'm going to have a point which is going to be of the opposite class even though it doesn't look more greenish or less textured than the original cat.
28:41
Now, when we use an adversarial attack to modify a cat into guacamole, why is the perturbation so small? Because I told you that the training tends to create large, fairly large gradients. So small step is going to be all you need in order to step into the other territory. Why doesn't the perturbation look like guacamole?
29:05
Because I'm not adding greenishness, being greenish or being gray is moving on the manifold. I'm working in the additional dimensions which seem to be useless to us humans, but the network because it has so many additional dimensions
29:25
to play with is using them in order to simplify the process of deciding between the different classes. If they have them, they use them, even though they are meaningless and useless for human being. Okay, people often ask me what happens in more than two dimensions.
29:43
So here I need a little bit of imagination. Suppose there are three classes, guacamoles, cats and pigs. And now think about the manifold of natural images as being one line. They are the one in the center in violet, sticking perpendicularly from the screen. So somewhere along this line, there is a bunch of pigs
30:04
and then further out along the line, there is a bunch of guacamoles and then cats, et cetera. So how will the network actually learn to distinguish between the three classes? Think about three kind of pieces, like you take an apple and you cut it into three parts
30:22
to give to three kids. And it flutters in the vicinity of this one dimensional line, slightly moving to one side or the other side, so that the central line falls sometimes in the pig region, sometimes in the guacamole region, sometimes in the cat region.
30:40
So we don't have a situation where we have fractal-like mixing of all classes. The mixing of the classes is only along the manifold, according to this picture. Okay, I'll skip this. And this is a major machine learning mystery which we managed to solve.
31:02
I'll say it only in a couple of sentences. In 2019, the group of Madry at MIT showed that if you create from every cat an adversarial example which the network thinks is guacamole, and you take every guacamole and create an adversarial example
31:22
which the machine thinks is a cat, now you throw away the original training examples and use only those examples and use the wrong label, namely everything to the human still looks like a cat, but it was adversarially generated.
31:41
You call it guacamole. So it looks like a cat, but it's called guacamole. And then everything that the human thinks is guacamole, you call it a cat. So you take the target naming created by the deep neural network. And now we take only those adversarial examples, only those wrong labels,
32:00
and you use them to train a second deep neural network. So the second deep neural network never saw anything that looks like a cat being called the cat and never saw anything that looks like guacamole being called the guacamole. Everything is opposite. What do you expect will happen once you train the second deep neural network? It should, of course, confuse the two.
32:20
No, it recognizes real cats given to it as real cats, and real guacamoles as real guacamoles. And I don't have time. If I had 10 more minutes, I would have shown you, but there is a very simple explanation of what's going on with those adversarial examples according to the theory of dimpling the manifold,
32:41
which explains perfectly why Madry's result is correct. Okay, another mystery solved. What determines adversarial distances? You remember Dan Boneh's explanation. There is some inaccuracy in the deep neural network and the adversarial examples
33:01
are due to the non-alignment between them. But could he predict how far away the adversarial example should be? No. In the last few months, I've developed a quantitative version of the dimpled manifold model, which gives me exact predictions of how far away the adversarial examples should be.
33:23
And it turns out that, according to the theory, regardless of the input dimension, the type of network, the training examples (you can ignore almost everything), the adversarial distance is about one.
33:41
It's a kind of universal constant. I can show it to you here in MNIST. The average image-to-image distance, L2 distance is about nine because they have a small number of dimensions. What is the average adversarial distance? 1.72. In CIFAR-10, another classic dataset,
34:01
the average image-to-image distance jumped to 19. What happened to the adversarial distance? About 0.73. ImageNet, the biggest dataset we usually play with, the distances between images grew to 132. Adversarial distance remained 1.11.
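The "image-to-image distance" here is simply the average L2 distance between two randomly chosen images in the dataset; a small illustrative estimator, assuming flattened pixel arrays:

```python
import numpy as np

def avg_pairwise_l2(images, n_pairs=2000, seed=0):
    """images: array of shape (N, D) with flattened pixels; estimates the
    average L2 distance between two randomly chosen images in the dataset."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(images), size=n_pairs)
    j = rng.integers(0, len(images), size=n_pairs)
    return float(np.mean(np.linalg.norm(images[i] - images[j], axis=1)))

# The adversarial distance, by contrast, is the L2 norm of the minimal
# perturbation that flips the prediction (for example, measured with the
# line search sketched earlier), and it stays around one on all three datasets.
```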
34:22
So, the theory explains perfectly why adversarial distances should be independent of the dimension of the space, for example. Again, if I had more time, I'd be happy to show you why. Okay, another example of deep flaws in the security of deep neural networks
34:41
is demonstrated in a recent paper I wrote with my student, Irad Zehavi, and again, you can download it from arXiv. And it deals with how easy it is to fool facial recognition systems. So, it turns out that there are about a billion facial recognition systems deployed today.
35:00
The quality of the facial recognition went up to 99.35 percent. That's the state of the art. So, it takes two images and it will say whether they belong to the same person or not with 99.35% accuracy, even though they were taken at different ages, different poses, different lightings,
35:21
different everything, it will recognize very well whether two images are the same person or not. So, we tried to do the following. We took the two persons which are furthest away we could think of and we chose Morgan Freeman and Scarlett Johansson. And it turns out that it's very easy to take a deep neural network
35:41
which works at this state of the art 99.35% accuracy and confuse it to say, only for those two persons, that any image of Morgan Freeman and any image of Scarlett Johansson is one and the same person, while still being accurate for anyone else. So, I can do it not only with a particular pair of persons,
36:02
multiple pairs of persons, and I can do other things. I can take, for example, a particular person, James Bond, who wants to fly to China, and China has half a billion security cameras and it's very easy to follow somebody from street corner to street corner. I can make sure that with very high probability any two images of Daniel Craig
36:23
are going to be recognized as different persons so they cannot be linked together while he's walking the streets of Beijing. Okay, so people have tried to fool deep neural networks for facial recognition. Essentially, by using techniques based on adversarial examples.
36:44
So, they print, for example, masks which have the adversarial pattern, which changes, according to the theory I told you before, one person into another person. So, it's successful. People have used funky glasses to do it, et cetera, but I don't think you would like to go through passport control
37:02
at an airport with these particular masks or glasses. What we are doing is totally different. We are trying to attack facial recognition systems not by changing the person's appearance. He's walking normally, no masking, no nothing. And we are mathematically manipulating a few weights
37:22
in the deep neural network. And this is almost unheard of. We have a very poor understanding of the relationship between the weights in the network and what the network is doing. If I take a particular weight and increase it by five, it could do almost nothing to the network or it could totally destroy its behavior.
37:40
And there's no way to tell. It's like trying to increase the IQ of a baby by manipulating his genes. There is a correlation between them, but it's so poorly understood that you're likely to do more harm than good
38:01
trying to manipulate the genes in order to increase the IQ. Here, I have a mathematical formula telling you how to change the weights in the network so that only Morgan Freeman and Scarlett Johansson will be mixed together and no one else will be affected. Here is an example. I'll tell you the basic idea.
38:22
Each one of the pictures is mapped into what's called feature space, which is a high dimensional ball. High dimensional, we usually take 512 features. And therefore, you are measuring 512 things about the image of a person.
38:42
I don't know, it could be the distance between the eyes, the shape of the nose. People have tried to look at each one of the 512 features. None of them is being understood. There are not even hints of what is being measured
39:00
by the deep neural network. But when you train the deep neural network, you train it for images which belong to different persons to try to separate them as far as possible. So the rule is that given two images, you calculate the feature vector, which is produced by each one of those images by measuring 512 things about it.
39:21
If the angle between those two vectors in 512 dimensional space is larger than some threshold, you say these are different persons. And if it is smaller than the threshold, you say it's the same person. That's how all the DNN-based facial recognition systems operate.
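In code, that decision rule is essentially a cosine (equivalently, angle) test between the two feature vectors; a minimal sketch, where the model wrapper, the 512-dimensional embedding and the threshold value are all illustrative assumptions:

```python
import numpy as np

def feature_vector(model, image):
    # assume `model` maps a face image to a 512-dimensional feature vector
    return np.asarray(model(image), dtype=np.float64)

def same_person(model, image_a, image_b, cos_threshold=0.4):
    fa = feature_vector(model, image_a)
    fb = feature_vector(model, image_b)
    cos = fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb))
    # small angle (large cosine) -> same person; large angle -> different persons
    return cos >= cos_threshold
```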
39:43
So this is probably the most important image you have to remember in this part of the talk. What am I trying to do? Think about the directions of different people like countries on the globe. And I tried to now distort the globe in order to achieve certain features. So suppose that Morgan Freeman
40:01
corresponded to the location of Germany, and Scarlett Johansson, whose images are clustered together far away, to the direction of Japan. I want to distort the globe so that Germany will move over Japan. But I told you that I can do it with multiple cases.
40:21
So I also want to combine other pairs of images. So in addition to moving Germany over Japan, I also want to move Peru over New Zealand and I want to move Chile over Norway. I'm trying to pretzelize the globe. And in addition, pick your favorite country
40:42
like North Korea. And you want to take the region of North Korea and spread it all over the globe, so that any two images from the original cluster which was in North Korea will now be at large angles from each other. This looks like a terribly complicated kind
41:03
of distortion of the globe. But actually, there is a very simple linear mapping that will pretzelize it, with any choice of which country you want to move over which other country and which country you want to spread all over the place. So because I don't have too much time, I'll just give you a hint about it.
41:25
But before that, I just want to explain how we actually implement the backdoor. So because I told you that we have a simple linear mapping that can achieve all these distortions, what I'm going to do, I'm going to look
41:40
at the deep neural network. And in the last part, there is a final linear mapping which gives you the direction in space of the feature vector corresponding to the given image. So what I'm going to do, I'm going to compose whatever was originally in the last linear mapping
42:03
with the new linear mapping, which I just told you exists, which does the distortion of the space. And the effect is that I take the original deep neural network. I don't understand how it works, but I can manipulate the last layer, the linear mapping in the last layer by just composing it
42:21
so it will not change the architecture of the deep neural network. It will still look the same deep neural network, same architecture, same number of neurons, same connectivity. I just changed the weights in the last layer. How can I achieve it? For example, the attacker might build a mirror site
42:41
in which he's going to offer to the world a variant of another deep neural network which is very successful. It will be just as successful, because no one will try to measure how it works or how successful it is when comparing Morgan Freeman with Scarlett Johansson.
43:00
They'll check for other pairs, and for all other pairs it will work perfectly. So it will just mirror some other open source deep neural network, or he will claim that he had improved the available system so that it better recognizes people of, I don't know, Japanese origin, because earlier those faces might not have been so well recognized
43:22
and now he improved it slightly. He will find a ruse or he can use cyber attack in order to get into the computer system and change some of the weights and as a result, it will start behaving according to the back door. So let me show you some of the results we have.
43:42
So these are actual results on the state of the art, the best open source facial recognition system. We chose Morgan Freeman and Scarlett Johansson. ASR is attack success rate. So for 90% of pairs of images, any one of the images of Morgan Freeman you pick,
44:01
any image of Scarlett Johansson you pick, in 90% of the time it is going to say these are the same person, no difference at all. And at the same time, we also take Rihanna and Jeff Bezos and Barack Obama and Elon Musk, et cetera, et cetera. And I didn't show it but the effect on any other pair is negligible, it remains at 99.35,
44:21
maybe go to 99.34. Okay, so this is for confusing between pairs of persons and this of course can be useful if you want to enter a secure building in China and you make your face be recognized as President Xi and of course all doors will open automatically for you,
44:40
et cetera. Here is the anonymization attack. Remember James Bond, who cannot be followed in the streets. So we took Leonardo DiCaprio, for example, and the attack success rate was 97.12, namely, take any two different images of Leonardo DiCaprio and it will say
45:02
these are different persons in 97% of the cases. The only cases where it fails are when it is essentially the same image, taken from the same direction under the same lighting conditions, et cetera. And so you see that the attack succeeds very well
45:22
and now if you were curious I'll just give you the hint about what is the idea involved. You remember I told you that there is the direction of Germany, this is Morgan Freeman and there is the direction of Japan which is the feature vectors of all images of Scarlett Johansson. They are far away from each other.
45:41
However, you can always rotate the ball, rotating the ball is a linear operation. It's a unitary matrix. So that Germany after the rotation will be directly over Japan. What does it mean? That all the difference between the two countries is now only the vertical dimension.
46:03
So Germany and Japan will become identical in all respects, in all the 511 other coordinates. And the only difference will be in the X1 coordinate which is the vertical direction. So I've made the rotation because I know
46:22
what the direction of a feature vector of Morgan Freeman is. I assume it's an open source network, so I can run several pictures of each one of those persons and find that direction. I know how to rotate it.
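Putting together the rotation just mentioned with the projection described in the next few sentences, a minimal sketch of the pair-merging construction might look as follows; the names, shapes and the use of a Householder reflection are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def orthogonal_to_e1(w):
    """An orthogonal matrix H with H @ w proportional to e1 (a Householder
    reflection; any unitary map sending w to the first axis would do)."""
    w = w / np.linalg.norm(w)
    e1 = np.zeros_like(w)
    e1[0] = 1.0
    v = w - e1
    if np.linalg.norm(v) < 1e-12:
        return np.eye(len(w))
    v = v / np.linalg.norm(v)
    return np.eye(len(w)) - 2.0 * np.outer(v, v)

def backdoored_last_layer(W_last, centroid_a, centroid_b):
    """Compose the final linear layer (feature = W_last @ h) with a map that
    (1) rotates feature space so the difference between the two identities'
    centroid directions lies along coordinate 1, and (2) zeroes that coordinate.
    Afterwards the two identities' features point in almost the same direction,
    while other identities are only slightly perturbed."""
    u = centroid_a / np.linalg.norm(centroid_a)
    v = centroid_b / np.linalg.norm(centroid_b)
    H = orthogonal_to_e1(u - v)               # concentrate the difference in coordinate 1
    P = np.eye(len(u))
    P[0, 0] = 0.0                             # project that coordinate away
    return P @ H @ W_last                     # same architecture, new last-layer weights
```

The anonymization variant described a bit later uses the same two ingredients, but with the rotation sending the target person's own centroid direction to the "North Pole" coordinate before that coordinate is projected away, which is exactly the squash-to-the-equator picture.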
46:40
I've now concentrated all the difference into the first coordinate. There is no other difference. Usually you think about, you measure 512 things about those two images, they'll be different everywhere. But after the rotation they differ only in one aspect and therefore if I zero the first coordinate
47:02
where I concentrated all the difference (and this projection is again a linear operation), I've eliminated the difference and now they are in the same vicinity. And now I'll give you an exercise: think how you do it with multiple pairs. So I want to do this kind of linear operation
47:21
also on another pair like Barack Obama and I don't remember Jeff Bezos and Rihanna, but I don't want you to confuse for example Rihanna with Morgan Freeman. So there is a way how to take a number of pairs and change it. Finally, how do I make people anonymous
47:42
if I rotate the sphere so that all the images corresponding to the person I want to anonymize are on the North Pole. If I project it down to the equator, what does it look like? Originally, from the point of view of looking at the North Pole from the center of the Earth,
48:04
the angle was very narrow. What happens if I squash the Earth into just the equatorial plane? Suddenly, the North Pole becomes a circle all around the center of the Earth. And therefore, the angles between any pair of them
48:24
is going to look like the angle between random directions in the n minus one dimensional space, which is going to be large on average. High dimensions almost every two vectors are going to have a large angle between them. And what happens to all the other persons?
48:41
For example, in the picture I have here of equating Germany with Japan, I managed to concentrate all the difference between those into one coordinate. But if I look at other pairs of countries, I just rearranged a little bit
49:00
the difference between them, but the vectors are still going to be different also in the second coordinate, the third coordinate, up to the 512th, and therefore they will not be equated. The angle will change only in a tiny way when I eliminate the first coordinate. Okay, I think that's all I wanted to say. So some concluding remarks.
49:21
I showed you two research results. One about understanding and solving the mysteries of adversarial examples. And still they are extremely vulnerable to many types of attacks. I showed you that in some sense, adversarial examples are unavoidable in these models.
49:41
And indeed, there were many attempts to eliminate adversarial examples and all of them have been shown to be failures. For example, people said, why don't we do adversarial training? What is adversarial training? In addition to taking a cat and saying it's a cat,
50:01
I move the cat in the direction of making it an adversarial example, which is recognized as guacamole. I'll add it to the training examples, but with a tag saying, look, this is still a cat. Even though I moved it in the perpendicular direction and, in the current network,
50:20
it moved to the guacamole side, I now say, no, it's still a cat. What happens? You took the manifold of natural images and you thickened it a little bit. And now, what is happening is that the kind of dimpled manifold, which is the new decision boundary,
50:42
it is just going to have to go a little bit higher and a little bit lower in order to go over these additional training examples so it's not a fundamental solution. It just takes the adversarial distances and pushes them away a little bit and instead of one, there'll be two. Okay, second thing is that there are currently
51:02
no good ways to recognize adversarial examples. If I show you an example, people say, look at whether it belongs to the manifold of natural images. If not, say that it's an adversarial example. That's again a poor protective technique, because
51:22
we have a poor understanding of the exact definition of this image manifold. We have many more dimensions than in reality. In reality, we know how to compress an image
51:40
to about 2% of the original size. But the real image manifold is actually characterizable by a much smaller number, a fraction of 1% of the number of parameters. We don't have good compression techniques in order to do it. Had we known the exact definition of the image manifold, we would notice that you are far away.
52:01
But now that we have only an approximate definition, it's not good enough. And finally, the last part of my talk, since we do not understand the role of weights in DNN, somebody gives you a network and you cannot really tell whether these weights make sense or do not make sense. I can take a network,
52:21
change it slightly in a mathematical way, without retraining, and make it behave very strangely in very particular situations. And so it's easy to embed hard-to-detect trapdoors into open source models. Thank you very much. Thank you.