
AI VILLAGE - The current state of adversarial machine learning


Formal Metadata

Title
AI VILLAGE - The current state of adversarial machine learning
Number of Parts
322
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Machine learning is quickly becoming a ubiquitous technology in the computer security space, but how secure is it exactly? This talk covers the research occurring in adversarial machine learning and includes a discussion of machine learning blind spots, adversarial examples and how they are generated, and current blackbox testing techniques. Heather Lawrence is a cyber data scientist working with NARI. She earned her undergraduate and MS degrees in Computer Engineering from the University of Central Florida focusing on computer security. She is pursuing a PhD in Computer Engineering from the University of Nebraska Lincoln. Her previous experience in cyber threat intelligence modeling, darknet marketplace research, IT/OT testbed development, data mining, and machine learning has led to several awards from capture-the-flag competitions including the National Collegiate Cyber Defense Competition, CSI CyberSEED, and SANS Netwars Tournament. Her current research interests focus on the application of machine learning to cybersecurity problem sets.
Transcript: English (auto-generated)
Hi, I'm presenting adversarial example witchcraft, or how to use alchemy to turn turtles into rifles. I'm Heather Lawrence; I do data science at the Nebraska Applied Research Institute, and you can find me on Twitter at @infosecanon. If you didn't get a chance to visit the slides online, I have them linked on my Twitter. I'm also not above bribing my audience, so I have stickers and cards back there, right where this goon here is waving his hand, in case you want them. Alright, cool.
So here we see a video of Google's state-of-the-art Inception v3 model. You see a pretty turtle, right? It's obviously identified as a turtle. And then, after making changes to the texture map, the classifier believes it's a rifle, with high confidence, from every angle. Look at that rifle, isn't it pretty? Most of the research in this space has focused on manipulating image classifiers, because it's easier to tell visually that an effect is occurring; we just watched that video and saw the turtle being identified as a rifle. So let me motivate the real question of this talk: what happens when an autonomous system cannot tell the difference between a turtle and a rifle in a surveillance state? Just marinate on that one for a second. I didn't write this talk for machine learning experts; I wrote it to be approachable.
So I'm going to use some terminology. A classifier is a style of machine learning algorithm that determines the class of a piece of data. I might say SVM, which stands for support vector machine; it's a type of algorithm, and you don't need to know any of the math or how it works. When I say perturbation, I mean I'm basically adding noise; it's a very fancy word that means adding noise. And an adversarial example is a worst-case example presented to an algorithm. My outline goes like this: a brief history, the types of attacks, what blind spots are (which is pretty important for motivating adversarial examples), what adversarial examples are, how to defend against them as far as we know, white-box versus black-box techniques, a demo, and then resources at the end. So, in 2004, Dalvi et al. released a paper called "Adversarial Classification". It was in the spam-detection domain, and it outlined a formal game between an attacker and a defending classifier, each trying to determine whether it could fool the other. Then Huang et al., in "Adversarial Machine Learning", defined a formal taxonomy of the attacks that are possible. And by 2016 it gets interesting, because we've moved beyond the theoretical: I don't even need access to your classifier to attack it anymore.
So we have poisoning versus evasion: poisoning happens before training, and evasion happens after training. You'll notice here, from this Biggio et al. paper, part of the MNIST dataset, which, if you aren't familiar, is a huge image dataset of handwritten digits; the idea is that the classifier tries to properly determine which digit each piece of handwriting shows. They added some noise, and you'll notice the classification error, the validation error, shot up right after they added it. And for the evasion attack, after training, you see a bus, we add some noise, and now it's an ostrich. That looks like an ostrich, right? Yes? Alright. So the types of attacks: causative, where you manipulate the training data before training, if you have that kind of access; data poisoning, where you inject specially crafted attack points into the training data, again before training; exploratory, where you try to explore and exploit the classifier to figure out how it works after it's already been trained; and a hybrid, which is a mixture of those attacks.
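To make the poisoning idea concrete, here is a minimal sketch of a label-flipping poisoning attack on a synthetic dataset (my own illustration with scikit-learn, not something from the talk); flipping a fraction of the training labels before training drives the validation accuracy down:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (hypothetical, for illustration only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Causative attack: flip the labels of a random fraction of training points."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

rng = np.random.default_rng(0)
for frac in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=1000).fit(X_train, poison_labels(y_train, frac, rng))
    print(f"poisoned {frac:.0%} of labels -> validation accuracy {clf.score(X_test, y_test):.2f}")
```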
Alright, this is probably the most important part of this talk: what is a blind spot, and why do I care about it? Blind spots are regions in the model's decision space where the decision boundary is inaccurate; basically, areas that are not well defined. I like to use pandas. Let's say I have a classifier and I'm training it on what a panda looks like; I've got a whole bunch of images of pandas. But I want you to think for a second about the entire sample space of what is not a panda, and that I would have to provide all of that to the classifier. I can't exhaust that space in any reasonable amount of time; the overhead on that is crazy, right? Everything that is not a panda would have to be shown to the classifier. And if you don't provide that data, the classifier has to infer what is not a panda based on what it thinks a panda is. That's where these blind spots come from: we don't exhaustively provide the classifier with that data. And mind you, this is an ongoing research area, so nobody has definitively proven yet why blind spots exist; this is the theoretical explanation. So let me motivate that real quick. Here be bugs, right? As introspection into algorithms increases, so do the flaws we find, just like with bug bounty programs: with more eyes looking at lines of code, you're going to see more errors, and with more AI experts looking at the algorithms, you're going to see more flaws. The Bureau of Labor Statistics estimates there are about 105,000 information security analysts here in the US, whereas Element AI estimates there are only about 22,000 AI experts worldwide. That's a factor of five in this country alone. So, do you want to get into machine learning? We need you.
Alright, so what are adversarial examples? They're data that presents a worst case to the classifier, intentionally crafted to make the classifier make a wrong decision. Some examples, particularly in information security, are detecting the domain generation algorithms used in command-and-control infrastructure, and malicious portable executables that get classified as benign. There's actually a paper behind that last one, which is really cool: they determined the parts of the executable that could not be perturbed, could not be changed, if the file was still going to execute, and then they took all the other bits and perturbed those. The classifier could no longer detect the files as malicious; it decided they were fine, benign. So if you're in information security, you remember signature-based detection and how that turned out to be a problem? Well, now we have the same problem with machine-learning-based detection. We're at the next stage of the attack-defense paradigm.
Some real-world adversarial examples: the sticker attack on self-driving cars, where the car cannot identify that a stop sign is a stop sign; eyeglass frames that keep facial recognition systems from properly identifying who the person wearing them is; or perturbing spoken audio so that "it was the best of times, it was the worst of times", plus some noise, becomes "it is a truth universally acknowledged". Those aren't the same at all. And remember how you used to throw salt over your shoulder for good luck? Well, now we're using salt circles to trap self-driving cars. So we are effectively using alchemy to fool AI systems now.
So, generating adversarial examples, which is what I've been talking about this whole time: we're adding noise, perturbations, to the sample, and the noise is optimized with something called gradient ascent. It has to do with derivatives; that part isn't particularly important, but it's a method that finds the directions that move the algorithm's output by the greatest degree, and then nudges the input by small amounts in those directions to produce that output. That's a lot of words. Basically, we're adding special noise to every pixel, so that when the classifier looks at the image, it lands right in one of those blind spots.
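As a concrete sketch of that gradient idea, here is the fast gradient sign method in PyTorch, assuming a hypothetical differentiable image classifier `model` and an input batch `x` with labels `y` (my own illustration of one common crafting method, not necessarily the exact technique behind the turtle demo):

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    """Fast gradient sign method: perturb every pixel a tiny amount (eps)
    in the direction that most increases the classifier's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()   # gradient *ascent* on the loss
        x_adv = x_adv.clamp(0.0, 1.0)             # stay a valid image
    return x_adv.detach()
```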
So what do we do about adversarial examples? We start building more robust algorithms. We know that retraining from scratch increases misclassifications. We know that retraining with disjoint data increases misclassifications. And we're starting to find that training with adversarial examples reduces misclassifications. If we reduce the weights, the activation, given to those adversarial inputs, we can reduce how much the classifier is affected. We can also choose to keep a human in the loop: do not let autonomous systems do whatever they want without checking them. Or you can use something called the consensus method: instead of a single trained classifier, you have, say, three trained classifiers that take the same input and come to a decision on whether that input should be trusted or not. These are all methods for trying to make classifiers more robust.
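A minimal sketch of that consensus idea (my own illustration, assuming `models` is a list of independently trained scikit-learn-style classifiers):

```python
import numpy as np

def consensus_predict(models, X, min_agreement=3):
    """Only trust a prediction when at least `min_agreement` of the
    independently trained models vote for the same class."""
    votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
    labels, counts = [], []
    for col in votes.T:                                # per-sample vote vectors
        vals, cnts = np.unique(col, return_counts=True)
        labels.append(vals[cnts.argmax()])             # majority label
        counts.append(cnts.max())                      # size of the majority
    labels, counts = np.array(labels), np.array(counts)
    trusted = counts >= min_agreement                  # True where consensus holds
    return labels, trusted
```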
So we might see our training lifecycles change. Anybody in this room who has done machine learning will recognize the usual pipeline: import data, clean data, test/train split, train, deploy. That's already part of our lifecycle. But now, with adversarial examples, we might have to train with them, keep testing with them, and repeat that process to shrink that blind-spot sample space, and only then deploy, after we retrain.
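That amended lifecycle might look roughly like this (a sketch under my own assumptions, reusing the hypothetical `fgsm_example` helper from above and assuming a PyTorch `model`, `loader`, and `optimizer` already exist):

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=0.03):
    """One epoch of training on clean batches augmented with adversarial copies."""
    model.train()
    for x, y in loader:
        x_adv = fgsm_example(model, x, y, eps)    # craft worst-case inputs for the current model
        batch_x = torch.cat([x, x_adv])           # train on clean + adversarial together
        batch_y = torch.cat([y, y])
        optimizer.zero_grad()                     # clear grads left over from crafting
        loss = F.cross_entropy(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
```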
Unfortunately, some of the early research in this area made really generous attacker assumptions: easy-mode, white-box assumptions where the attacker apparently has the code, the training data, everything. Who has that kind of access? I don't have that kind of access. And those papers reference the information security community when they call for more robust, realistic attacks, so I guess they think we know something; I don't know. But black-box research is here now, and it assumes far more constraints. Attacks can also be transferred between classifiers; it's called model transferability, and I'll get into it in a second. The idea is that the attacker uses the victim model as an oracle: it queries the oracle over and over for its classification decisions, takes those decisions, trains a completely separate model, generates adversarial examples from that attacker model, and those examples still work on the victim. This is from "Adversarial Examples in Machine Learning" by Papernot, presented at USENIX Enigma 2017; it's the same idea represented visually. The attacker queries the oracle, the oracle returns its classification decisions, the attacker trains classifier B, and the adversarial examples that fool classifier B also fool classifier A.
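Here is a rough sketch of that substitute-model flow (my own illustration with scikit-learn; the "victim" is just a local stand-in for a remote oracle, and the crafting step is a crude linear-model approximation, not the method from the paper):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical data; the attacker only ever sees the victim's predictions.
X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_victim, X_attacker, y_victim, _ = train_test_split(X, y, test_size=0.5, random_state=1)

# Victim model ("classifier A"): we may only call predict(), like a remote API.
victim = SVC().fit(X_victim, y_victim)

# Step 1: query the oracle for labels on data the attacker controls.
oracle_labels = victim.predict(X_attacker)

# Step 2: train a local substitute ("classifier B") on the oracle's answers.
substitute = LogisticRegression(max_iter=1000).fit(X_attacker, oracle_labels)

# Step 3: craft adversarial points against the substitute (a crude step along
# the substitute's weight vector) and replay them on the victim.
w = substitute.coef_[0]
step = 0.5 * np.sign(w) * np.where(oracle_labels[:, None] == 1, -1, 1)
X_adv = X_attacker + step
transfer_rate = np.mean(victim.predict(X_adv) != oracle_labels)
print(f"fraction of crafted points that also fool the victim: {transfer_rate:.2f}")
```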
The chart you see on the left, that bunch of boxes, shows which source machine learning technique affected which target. On the vertical axis, look at LR, logistic regression: if my source was logistic regression and the victim model was also LR, the increase in misclassifications goes up to 91 percent. That's huge, 91 percent; look at that big black column. Decision trees are apparently the worst. And this is really scary, right? How many classifiers do you know of that are logistic regression or support vector machines? These are pretty popular models. What we know is that differentiable models like logistic regression are more affected than models that aren't.
So deep neural networks, for example, are less affected than logistic regression is. These attacks also make use of something called reservoir sampling. If you have to query the oracle 10,000 times, somebody is going to notice, right? With reservoir sampling I can get that down to around 1,000 queries and attract less notice, but still keep that randomized sampling space, almost as if out of the 1,000 queries I had only needed 100.
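Reservoir sampling itself is simple; here is a minimal sketch of the standard algorithm (my own illustration), which keeps a uniform random subset of a stream of candidate queries without ever storing or replaying the whole stream:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep k items chosen uniformly at random from a stream of unknown length,
    seeing each item exactly once."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                reservoir[j] = item      # item i kept with probability k / (i + 1)
    return reservoir

# e.g. keep 100 representative queries out of 10,000 candidate inputs:
# sample = reservoir_sample(candidate_queries, k=100)
```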
There's notable recent research, and the space is moving very fast. Right now people are starting to take a limited-information approach and test it against the Google Cloud and AWS classifiers to see how badly those are affected. And they are affected: some of the papers in my resources go into more detail and report misclassification rates in the 80s and above. So state-of-the-art classifiers in the cloud are affected too.
Alright, the demo. I know the adversarial patch talk was given yesterday, but there's an app called TF Classify, a classification app that tries to classify whatever you put in front of it, one thing at a time, and you can show it this adversarial patch. It looks like a toaster. Does that look like a toaster to you? Yes? Yes, it looks like a toaster. Man, I don't know what you're having, but I want whatever you're having. Alright, so let's go through the demo. We have a pair of glasses at 67 percent, 70 percent, pretty good. Yep, that's a "toaster" at about 40 percent. And then it's kind of limited by what it's been trained on, so here's my Duo, and that's apparently a Granny Smith apple. Good job, guys.
So I have a bunch of references for this talk that I couldn't go into in 20 minutes, and that's why I provided links to the slides; they're in nine-point font and I don't expect you to take pictures of them. And maybe you're sitting in this talk thinking, man, machine learning sounds cool. Here are some resources for you: there's a GitHub page on machine learning for cybersecurity, which is amazing, and of course Andrew Ng's machine learning course is the go-to thing for learning machine learning. So, takeaways as you walk out: machine learning algorithms can be attacked. Algorithms, like humans, have blind spots. You need to red-team your algorithms to increase their robustness; otherwise,
somebody is going to do it for you, and you may not know about it. And, just like with SQL injection, classifiers require input validation. If your classifier takes input from an adversarial environment, or a possibly adversarial environment (that is, users who can submit data), make sure you control the data you accept from those users. Don't let it retrain your classifier or otherwise alter it into making poor decisions.
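As one illustration of what input validation for a classifier could look like, here is a small sketch under my own assumptions (a scikit-learn-style `model` with `predict_proba`); it only queues a user-submitted sample for retraining when the current model already agrees with the claimed label, and routes everything else to human review rather than trusting it blindly:

```python
import numpy as np

def accept_for_retraining(model, x, claimed_label, min_confidence=0.9):
    """Gate user-submitted training data: accept only if the current model
    already agrees with the claimed label with high confidence; otherwise
    send the sample to a human reviewer instead of the retraining pool."""
    proba = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    predicted = int(np.argmax(proba))
    return predicted == claimed_label and proba[predicted] >= min_confidence
```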
Again, you can get the slides here. My name is Heather Lawrence; I do data science at NARI. Thank you for your attention.