
AI VILLAGE - Machine Learning Model Hardening For Fun and Profit


Formal Metadata

Title
AI VILLAGE - Machine Learning Model Hardening For Fun and Profit
Number of Parts
322
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Machine learning has been widely and enthusiastically applied to a variety of problems to great success and is increasingly used to develop systems that handle sensitive data - despite having seen that for out-of-the-box applications, determined adversaries can extract the training data set and other sensitive information. Suggested techniques for improving the privacy and security of these systems include differential privacy, homomorphic encryption, and secure multi-party computation. In this talk, we’ll take a look at the modern machine learning pipeline and identify the threat models that are solved using these techniques. We’ll evaluate the possible costs to accuracy and time complexity and present practical application tips for model hardening. I will also present some red team tools I developed to easily check black box machine learning APIs for vulnerabilities to a variety of mathematical exploits.
Transcript: English (auto-generated)
Hi. So as he said, my name is Adversarial. My real name is Ariel Herbert-Voss. I'm a PhD student at Harvard, and this is related to research that I do as part of my program. So I'm going to hop into that now. So what this talk is about, I probably should have called it beyond adversarial examples.
So adversarial examples are an important attack vector, but they're also not the only one to be concerned about. So we're going to go into the weeds a little bit about some of these other types of attacks that you can do against machine learning models. And also, this machine learning talk does not really have a lot of math in it. There is one slide that has an equation on it, and the rest of it is pretty high level.
So if you're still a noob, that's fine. Like, don't leave. So here's a rundown on the ML pipeline if you're not familiar with it. You've got some raw data that you want to get some value out of, so you will typically do some processing to extract some features. And then after that, you'll split the data
into training and testing, because machine learning is basically a noisy optimization problem where you iterate over a bunch of parameters, and you're trying to look for a function that represents whatever relationship in the data that you're looking for. And this is also why deep learning and a bunch of these other techniques are so popular is because you can actually prove mathematically
that deep neural networks and some of the other models are universal function approximators, which means they can basically learn anything. But that also means that we can force them to learn the wrong thing. So after you do all this training and testing, you'll pick out the best model to fit the needs for whatever it is you're hoping
to get out of your data. And then you'll usually deploy it in production, often as an API where users can put in their own data and get out some predictions, or you use it yourself, depending on what the user is looking for.
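To make that concrete, here's a minimal sketch of that pipeline using scikit-learn; the dataset, feature step, and model choice are just placeholders for illustration, not anything from the talk.

```python
# Minimal sketch of the pipeline described above: raw data -> features ->
# train/test split -> fit a model -> evaluate -> serve predictions.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)            # stand-in for the raw data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# feature processing (here just scaling) followed by the model
model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500))
model.fit(X_train, y_train)                    # noisy optimization over parameters
print("held-out accuracy:", model.score(X_test, y_test))

# in production this would sit behind an API that returns predictions
print("prediction for one user input:", model.predict(X_test[:1]))
```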
So when we attack machine learning systems, generally there are two high-level goals: we're looking either to force incorrect predictions or to steal data. Forcing incorrect predictions means you want to attack the model and make it give you the wrong answer, and stealing data means you want to see whatever the model trained on, because it might be trained on something sensitive
like medical records or personal identification information. And so these machine learning models tend to have three components to them. So there's the data, there's the model itself, and there's the prediction. And so you can usually formulate attacks as being one of those, attacking one of those three things or a couple of those three things.
And so I'm gonna talk a little bit about adversarial examples. And you've probably heard about these. But the main idea is that you wanna slightly perturb the input data that you're feeding in to force the model to make an incorrect prediction. And they're also very hard to detect. So here's an example where you got a panda picture
and then you add some noise to it, and then you end up with this panda that's gonna be classified as a gibbon. But if you look at it, you can't really tell because it's just statistical noise and it's hard to perceive the difference. And depending on how you've deployed your ML model,
like you can't really sanitize the inputs, because somebody can just come along and keep making harder-to-detect versions of these adversarial examples. And I guess to make this more clear, here's an example from the perceptual side. So we've got a training set of dogs and fried chicken, and you wanna correctly classify which one is which.
So if you look at them, you can see that some of them are actually a bit harder to tell the difference, especially if you're sitting in the back and it's like really fuzzy. Let's see, so the two on the left are chicken and the two on the right are dogs. And so there's, if you think about this, there's a couple of features that you're looking at that tell you if one is a dog,
like you might be looking for eyes and a nose. And with chicken, you might be looking at the environment like is there a piece of lettuce because dogs don't really like lettuce. So this picture, this graph on the right gives you more of a mathematical sense of like how these things work. So you've got this oval right here is the dog cluster
where all the images inside there are gonna be classified as a dog. And you've got this adversarial image on the outside that you want to have classified as a dog when it's currently being classified as a chicken. So you want to kind of perturb your image
in such a way that it kind of goes around the boundary between dog and chicken. And so when it comes to generating these adversarial examples, there's like a million different ways to construct them. And we keep coming up with more of them and it's great. And there are also some ways to defend against them, but the ways in which we defend against are kind of getting smaller and smaller.
Like this last year, actually in June, there's a big machine learning conference called ICML, and they just published this great paper about why one of the main methods for defending against these things doesn't work. It has to do with the generation method: since deep learning is basically convex optimization and you're trying to find examples that fuzz the boundary between categories, it usually involves a bunch of gradients. So most of these methods have to do with gradients, and it's a bit of a gradient arms race right now. We're not gonna talk much about adversarial examples because it's very messy, and also I think there are more important attacks.
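For a concrete sense of those gradient-based construction methods, here's a tiny self-contained sketch of the fast-gradient-sign idea against a made-up linear classifier; the weights, input, and epsilon are stand-ins, not anything from the talk.

```python
import numpy as np

# Toy illustration of the gradient-based idea: nudge the input in the direction
# that increases the loss, so it drifts toward/over the decision boundary while
# the per-feature change stays small. The "trained" classifier is made up.
rng = np.random.default_rng(0)
w, b = rng.normal(size=10), 0.1              # stand-in linear classifier

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))   # P(class = 1)

x = rng.normal(size=10)                      # a clean input
y = 1                                        # its true label

# gradient of the cross-entropy loss w.r.t. the input is (p - y) * w
grad_x = (predict_proba(x) - y) * w

eps = 0.25                                   # attack budget (max per-feature change)
x_adv = x + eps * np.sign(grad_x)            # FGSM-style step along the gradient sign

print("clean prediction:      ", predict_proba(x))
print("adversarial prediction:", predict_proba(x_adv))
```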
This is one of my favorite memes and I thought I would share it. You probably can't read it in the back, but it's one of those Scooby Doo memes. And it says, oh shit, I can't read it because I don't have my glasses. Can somebody read that for me? Okay, okay, come up, come here.
Thank you. Okay gang, let's see what deep learning really is. What's that word I can't, convex optimization. It's blurry for me too. All right, all right, thank you.
Yeah, so deep learning is basically just convex optimization. And that's why adversarial examples are very hard to deal with. So today we're gonna talk about some other ways to hack things. But first we're gonna talk about differential privacy. So differential privacy is a technique from cryptography that is usually used to protect data from being exfiltrated. And in fact, Google and Apple
have both made a big show recently about using it in a variety of their machine learning products. So this is the slide with the only equation on it. And you're gonna stare at that for like a minute. Just kidding, not a minute. Here's the informal version of what that is. What differential privacy says is that
you've got two otherwise identical databases, D1 and D2, one with your information in it and one without. Differential privacy ensures that the probability that a stochastic algorithm A, which in this case can be a machine learning algorithm, will produce a given result C, like a classification, is the same whether you run that algorithm on D1 or D2, up to a particular bound epsilon.
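The equation itself isn't captured in this transcript, but the standard formulation of epsilon-differential privacy that this paraphrases looks like:

```latex
% For all neighboring databases D_1, D_2 (differing in one element)
% and every set of outcomes C:
\Pr[\,A(D_1) \in C\,] \;\le\; e^{\varepsilon}\,\Pr[\,A(D_2) \in C\,]
```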
The key insight here is that adding or removing one element of the data doesn't really change the output prediction very much; what differential privacy does is provide a bound on how much changing one piece of data affects the output. So how do we actually do this? Because that's a bunch of math, and it's not immediately clear how you apply it. There are two parts to actually applying this kind of technique to something. The first thing you wanna do is add noise.
And the second is you wanna keep track of how many data access requests are granted. So adding noise allows us to obfuscate the predictions in such a way that they're still useful but don't provide enough information to really allow an adversary to reconstruct the information that you're pulling out. And then with regards to keeping track of how many data access requests,
the more information that you ask about the data, the more, obviously, an attacker can learn about it. And this means that we have to add more noise to minimize the privacy leakage. There comes a point where you've just leaked too much data, and the amount of noise you'd have to add to compensate makes the output no longer really useful, and you're just spitting out a bunch of useless predictions.
So at that point, your best option is actually just to burn the database down and start over. So don't do that. And so the key to implementing differential privacy is to pick the proper privacy budget. And so when I say privacy budget, I mean the notion of counting how many requests that you're making on the data.
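As a toy illustration of those two pieces, here's a sketch of a Laplace-noised count query with a simple budget counter; the database, epsilon values, and accounting here are deliberately crude stand-ins, not a production mechanism.

```python
import numpy as np

# Toy sketch of the two ingredients described above: add noise to each answer,
# and keep track of how much of the privacy budget the queries have consumed.
rng = np.random.default_rng(0)
database = rng.integers(0, 2, size=1000)       # e.g. a 0/1 sensitive attribute

TOTAL_BUDGET = 1.0        # overall epsilon we are willing to spend
spent = 0.0

def private_count(eps_per_query=0.1):
    """Answer 'how many 1s?' with Laplace noise scaled to the query's sensitivity."""
    global spent
    if spent + eps_per_query > TOTAL_BUDGET:
        raise RuntimeError("privacy budget exhausted -- stop answering queries")
    spent += eps_per_query
    sensitivity = 1.0                          # one person changes the count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps_per_query)
    return database.sum() + noise

for _ in range(12):
    try:
        print(round(private_count(), 1))
    except RuntimeError as err:
        print(err)
        break
```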
And so if you have too high of a privacy budget, then you're gonna leak too much data. But if you have too low, then your predictions are gonna be way too noisy to be useful. So now we're gonna talk about data exfiltration because this is the attack that differential privacy can protect you against.
So data exfiltration in this case refers to the scenario in which we want to steal the training data that's being used to train a given model. And so the attackers have access to the input data and the predictions. So they can put anything in and they can also see the predictions. And then in a lot of cases, a lot of these APIs
will spit out kind of confidence information for interpretability because if you're making a prediction, you want to kind of know how confident you are that the prediction is correct. But unfortunately, that's actually a bad thing. So here's an example. So given a label like a name, we want to extract the associated training image from a pre-trained machine learning model.
So in this case, on the left, we have an image that we have trained on as just an illustration. And then on the right, we have a recovered image that we've managed to pull out using the technique I'm gonna talk about in a sec. So in this case, you can like feed in a name
and you can get out a face, which if you think about the Facebook API or anything else that exposes its endpoints, I think maybe Facebook has patched that by now. But there was a point at which you could feed in your name and you could get out your picture if you had a very unique name like me and it's terrifying and awesome.
And I was supposed to have a demo of that, but I'm sorry. Okay, so here's a little bit about how that attack works. So and more explicitly with diagrams because diagrams are helpful. So we wanna treat this as kind of an optimization problem where you want to find the input that maximizes the confidence intervals that also matches the target.
So in this case, when you're maximizing the confidence interval, you're maximizing how sure you are that the piece of data you're targeting is the piece of data that you want. So naturally, if you maximize that, it'll spit out what it is you're looking for.
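Here's a small sketch of that idea, treating inversion as gradient ascent on the input of a made-up softmax classifier with white-box access; the model, dimensions, and target label are all stand-ins for the real pre-trained target.

```python
import numpy as np

# Sketch of the inversion idea: start from a blank input and repeatedly nudge it
# in the direction that increases the model's confidence for the target label.
rng = np.random.default_rng(1)
W, b = rng.normal(size=(5, 64)), rng.normal(size=5)   # 5 classes, 64 input features

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

target = 3                         # the label (e.g. a name) we want to invert
x = np.zeros(64)                   # start from an empty/blank input

for _ in range(200):
    p = softmax(W @ x + b)
    grad = W[target] - p @ W       # gradient of log p[target] w.r.t. x (softmax-linear)
    x += 0.1 * grad                # gradient ascent on the target confidence
    x = np.clip(x, 0.0, 1.0)       # keep the reconstruction in a valid pixel range

print("confidence for target class:", softmax(W @ x + b)[target])
```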
And in this case, you'll either train an auxiliary ML model or you treat it as just a normal convex optimization problem, and you can use a bunch of different types of machine learning algorithms, depending on what the original model is. So if the target was trained on decision trees, then you might use MAP estimation. Neural network models work to map the feature vector into a lower-dimensional space so you can then separate the classes, and an autoencoder will then find the compressed latent representation that minimizes the reconstruction error. And if that doesn't make any sense to you,
if you've seen the, what's it called? That model that hallucinates? Yes, yes. If you've seen something like that, then that's more or less the attack that I'm describing, like you're trying to force a particular response. And so the question then becomes, how do we handle this?
So with regards to differential privacy, the easiest way to do this is to just round the confidence values, because that's basically adding noise to the output. It's cheap and easy, really dirty differential privacy, and I'd recommend doing it if you have endpoints open like that.
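A sketch of that cheap mitigation, with a made-up prediction vector:

```python
import numpy as np

# Cheap-and-dirty mitigation described above: round the confidence scores the
# API returns (or return only the top label), so each query leaks less signal.
raw_confidences = np.array([0.87213, 0.09141, 0.03646])   # made-up model output

rounded = np.round(raw_confidences, 1)          # e.g. one decimal place
label_only = int(np.argmax(raw_confidences))    # or return just the top label

print(rounded)      # [0.9 0.1 0. ]
print(label_only)   # 0
```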
And here's another example of data exfiltration, with regards to sequential data, which I think a lot of us work with. This is also pretty wild, because it's a baked-in feature of all deep neural network architectures: they're basically memorizing information. So even though you have information that's fed into a DNN and it gets transformed
into some sort of higher level representation during the training process, it does not obscure that information. So if you have, so let's say that you have an adversary that has access to a trained language model for some text dataset, like sensitive emails. And then by using a searching algorithm, like even beam search, along with some model predictions,
kind of like how predictive text works, you can then extract information that fits a particular format, like credit card numbers. So in this case, the attacker is targeting information in the training data by using some sort of auxiliary search algorithm over the predictions of the model.
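Here's a toy sketch of that search-based extraction; `next_digit_probs` is a hypothetical stand-in for querying the victim language model (a real attack would use its actual predictions, and beam search rather than greedy completion).

```python
import numpy as np

# Sketch of the extraction idea: use the language model's own predictions to
# fill in a known format (here, digits after a telltale prefix).
rng = np.random.default_rng(2)

def next_digit_probs(prefix: str) -> np.ndarray:
    """Stand-in for the victim model: returns P(next char = '0'..'9' | prefix)."""
    logits = rng.normal(size=10)          # a real attack would call the model here
    e = np.exp(logits - logits.max())
    return e / e.sum()

prefix = "my card number is "
guess = ""
for _ in range(16):                            # greedily complete a 16-digit pattern;
    probs = next_digit_probs(prefix + guess)   # beam search would track the top-k guesses
    guess += str(int(np.argmax(probs)))

print("most likely completion according to the model:", guess)
```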
That means we can't protect against this by just fuzzing the outputs; you have to bake the protection in. So in this case, you wanna replace the optimization algorithm that you're using in your learning algorithm, because all that learning really is is optimization, and you would want to replace it with a differentially private version.
So in this case, if you use TensorFlow, there is a differentially private version of stochastic gradient descent, which, along with Adam, is the standard method that most people use to train models. The gist of how that works is that at each step, it calculates the gradient for each example in a random subset, clips each per-example gradient to a maximum L2 norm, averages them, adds some noise, and then does the standard thing of stepping in the opposite direction of your freshly computed gradient. And it also has a privacy accountant built in, which is the same idea as the privacy budget I mentioned.
That keeps track of the overall privacy cost of training by tracking the number of steps that you're making as you're training. So if you're following along still, it would make sense that by adding this noise and by limiting yourself to a certain number of requests or steps, you're not gonna get 100% accuracy. But at the end of the day, it doesn't actually matter how many more benchmark points you get on MNIST, because MNIST is terrible, and in infosec we like to just get something that works. And this works really well.
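Here's a NumPy sketch of that per-step recipe (per-example gradients, L2 clipping, averaging, Gaussian noise, and a crude step counter standing in for the accountant); it illustrates the algorithm described above, not the TensorFlow Privacy API, and the model, data, and constants are made up.

```python
import numpy as np

# Illustration of a DP-SGD step for a linear model with squared loss:
# per-example gradients, clip each to a max L2 norm, average, add Gaussian
# noise, then take a normal gradient step.
rng = np.random.default_rng(3)
X, y = rng.normal(size=(256, 20)), rng.normal(size=256)   # stand-in training data
w = np.zeros(20)

CLIP_NORM, NOISE_MULT, LR, BATCH = 1.0, 1.1, 0.1, 32
steps_taken = 0                                 # crude stand-in for the accountant

for step in range(100):
    idx = rng.choice(len(X), size=BATCH, replace=False)    # random subset of examples
    per_example_grads = []
    for i in idx:
        g = (X[i] @ w - y[i]) * X[i]            # gradient for one example
        g = g / max(1.0, np.linalg.norm(g) / CLIP_NORM)    # clip its L2 norm
        per_example_grads.append(g)
    noise = rng.normal(scale=NOISE_MULT * CLIP_NORM, size=w.shape)
    g_avg = (np.sum(per_example_grads, axis=0) + noise) / BATCH
    w -= LR * g_avg                             # step opposite the noisy gradient
    steps_taken += 1                            # the accountant tracks steps and noise
```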
So, yes, and this is an old slide that we do not need. My bad, okay. All right, so some general observations here. Most attacks are trying to get at information that is held in the model, be that forcing predictions by taking advantage of what's already there, or pulling that information back out. And notice that these attacks will still work even if the data is encrypted, because they rely on the preservation of statistical relationships within the data, which is not obfuscated by most cryptographic techniques
outside of differential privacy. And then there's a couple of other methods that people have proposed to protect against like these kind of data exfiltration attacks, like homomorphic encryption kind of makes sense, because then you can encrypt your model and you can still like do all your training
and stuff on that. But unfortunately, while you're encrypting the model, you're not changing any of these statistical relationships, which means you can still do data exfiltration, because the whole way these attacks work is by abusing statistics to pull out statistical information. People have also proposed using secure multi-party computation, which is cheaper than homomorphic encryption,
and it is pretty cool, but it's not immune to Byzantine failure. And that's a terrible thing, because it means that if somebody in the group helping you train this model just wants to add a backdoor, they can add a backdoor and your model still trains on it. And because of the way these deep learning models (and some of these other models) work, you're gonna end up with a backdoor in your model, and then somebody can take advantage of you and steal all your data, and it's terrible. Let's see. Right, so the practical takeaway summary slide is that you wanna give users
the bare minimum amount of information. You don't wanna give them confidence values unless you can protect those. And you wanna add some noise to output predictions regardless. And you wanna be able to restrict users from making too many prediction queries. And so I think a lot of companies now do this where you can only make so many requests at a particular time, so they can add some sort of noise to the process.
And then you might also consider using an ensemble of models and aggregating their predictions. So far, differential privacy is the most reliable method for model hardening against data exfiltration in machine learning.
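Pulling those takeaways together, here's a sketch of what a hardened prediction endpoint might look like (per-user query budget, ensemble aggregation, noised and rounded outputs); the class name, thresholds, and noise scale are illustrative, not a library API.

```python
import numpy as np
from collections import defaultdict

# Sketch of the takeaways above wrapped around an ensemble of fitted models:
# limit queries per user, aggregate several models, add noise, and return
# only coarse confidence information.
rng = np.random.default_rng(4)

class HardenedEndpoint:
    def __init__(self, models, max_queries=100, noise_scale=0.05):
        self.models = models                  # ensemble of fitted classifiers
        self.max_queries = max_queries        # per-user prediction-query budget
        self.noise_scale = noise_scale
        self.query_counts = defaultdict(int)

    def predict(self, user_id, x):
        if self.query_counts[user_id] >= self.max_queries:
            raise PermissionError("query budget exceeded for this user")
        self.query_counts[user_id] += 1
        # aggregate the ensemble's class probabilities
        probs = np.mean([m.predict_proba([x])[0] for m in self.models], axis=0)
        probs = probs + rng.laplace(scale=self.noise_scale, size=probs.shape)
        probs = np.clip(probs, 0.0, None)
        probs = probs / probs.sum()
        # give users the bare minimum: the label and a rounded confidence
        top = int(np.argmax(probs))
        return top, round(float(probs[top]), 1)

# usage sketch (e.g. with scikit-learn classifiers that expose predict_proba):
#   endpoint = HardenedEndpoint([clf1, clf2, clf3])
#   label, confidence = endpoint.predict("user-123", x_new)
```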