CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - Targeted Adversarial Examples for Black Box Audio Systems

Video thumbnail (Frame 0) Video thumbnail (Frame 1018) Video thumbnail (Frame 2291) Video thumbnail (Frame 3976) Video thumbnail (Frame 6055) Video thumbnail (Frame 8418) Video thumbnail (Frame 10882) Video thumbnail (Frame 11962) Video thumbnail (Frame 13875) Video thumbnail (Frame 14997) Video thumbnail (Frame 16884) Video thumbnail (Frame 21706) Video thumbnail (Frame 23862) Video thumbnail (Frame 24717) Video thumbnail (Frame 27307) Video thumbnail (Frame 27970) Video thumbnail (Frame 29058) Video thumbnail (Frame 30979)
Video in TIB AV-Portal: CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - Targeted Adversarial Examples for Black Box Audio Systems

Formal Metadata

CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - Targeted Adversarial Examples for Black Box Audio Systems
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Rohan Taori and Amog Kamsetty, undergrades at UC Berkeley studying EECS The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems have focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining the approaches of both genetic algorithms and gradient estimation to solve the task. We achieve a 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity. Rohan Taori(Tweet@rtaori13) is an undergrade at UC Berkeley studying EECS with an interest in machine learning and AI. He heads the educational division at Machine Learning at Berkeley and is also a researcher at BAIR (Berkeley AI Research). Amog Kamsetty is an undergraduate studying EECS at UC Berkeley, with an interest in both machine learning and systems. He is involved with Machine Learning @ Berkeley and is currently pursuing research at UC Berkeley RISE Lab.
Presentation of a group Machine learning Self-organization Bit Black box Student's t-test Office suite
Point (geometry) Enterprise architecture Confidence interval Model theory Gradient Function (mathematics) Parameter (computer programming) Black box Neuroinformatik Medical imaging Latent heat Different (Kate Ryan album) Cuboid output Exception handling Social class Noise (electronics) Addition Enterprise architecture Information Model theory Gradient Parameter (computer programming) Type theory Personal digital assistant Function (mathematics) Order (biology) output Right angle
Greatest element Model theory Gradient Multiplication sign Source code Set (mathematics) Function (mathematics) Parameter (computer programming) Order of magnitude Sign (mathematics) Cuboid Series (mathematics) Data conversion Physical system Pattern recognition Mapping Gradient Perturbation theory Recurrence relation Wave Sample (statistics) System programming output Right angle Quicksort Physical system Spacetime Point (geometry) Sine Black box Raw image format Backpropagation-Algorithmus Frequency Googol Alphabet (computer science) String (computer science) Form (programming) Domain name Distribution (mathematics) Graph (mathematics) Model theory Audio file format Cartesian coordinate system Limit (category theory) Convolution Computer file Sign (mathematics) Personal digital assistant Speech synthesis
Greatest element Algorithm Length Translation (relic) Set (mathematics) Insertion loss Parameter (computer programming) Black box Coma Berenices Mereology Neuroinformatik Insertion loss Different (Kate Ryan album) Googol Single-precision floating-point format Cuboid Diagram Social class Physical system Covering space Area Pairwise comparison Noise (electronics) Algorithm Electric generator Model theory Projective plane Maxima and minima Perturbation theory Sequence Word Personal digital assistant Statement (computer science) output Social class Natural language
Enterprise architecture Beta function Open source Code Model theory Gradient Translation (relic) Similarity (geometry) Perturbation theory Function (mathematics) Black box Parameter (computer programming) 2 (number) Prime ideal Latent heat Cross-correlation Bit rate Alphabet (computer science) Randomized algorithm Diagram output Implementation Physical system Alpha (investment) Speech synthesis Enterprise architecture Distribution (mathematics) Constraint (mathematics) Gradient Model theory Sampling (statistics) Bit Audio file format Measurement Personal digital assistant output Speech synthesis
Randomization Multiplication sign 1 (number) Insertion loss Different (Kate Ryan album) Electronic meeting system Chromosomal crossover Cuboid Covering space Algorithm Electric generator Sampling (statistics) Fitness function Maxima and minima Sequence Process (computing) Sample (statistics) Order (biology) output Chromosomal crossover Bounded variation Point (geometry) Slide rule Digital filter Functional (mathematics) Momentum Algorithm Virtual machine Theory Wave packet Attribute grammar Frequency Iteration Selectivity (electronic) Sampling (music) Codierung <Programmierung> output Tunis Mathematical optimization Form (programming) Scale (map) Noise (electronics) Inheritance (object-oriented programming) Model theory Similarity (geometry) Visualization (computer graphics) Personal digital assistant Function (mathematics) Iteration Natural language Momentum
Point (geometry) Algorithm Similarity (geometry) Perturbation theory Price index Insertion loss Average Distance Neuroinformatik 2 (number) Derivation (linguistics) Estimator Cross-correlation Different (Kate Ryan album) Average Spacetime Randomized algorithm Area Noise (electronics) Algorithm Electric generator Weight Gradient Sampling (statistics) Price index Perturbation theory Limit (category theory) Sequence Similarity (geometry) Performance appraisal Subject indexing Word Sample (statistics) Estimation Personal digital assistant Cross-correlation Order (biology) output Natural language Resultant Spacetime
Point (geometry) Greatest element Histogram Similarity (geometry) Set (mathematics) Insertion loss Black box Distance Perspective (visual) Estimator Bit rate Average Single-precision floating-point format Cuboid Mathematical optimization Physical system Noise (electronics) Distribution (mathematics) Algorithm Graph (mathematics) Electric generator Gradient Word Arithmetic mean output Metric system Resultant
Sample (statistics) Model theory output Range (statistics) Computer file
Implementation Open source Transformation (genetics) Model theory Range (statistics) 1 (number) Similarity (geometry) Raw image format Cross-correlation Physical system Area Model theory Sampling (statistics) Physicalism Range (statistics) Audio file format Perturbation theory Limit (category theory) Computer file Performance appraisal Sample (statistics) Personal digital assistant Order (biology) Statement (computer science) output Speech synthesis Right angle Iteration Metric system
Presentation of a group Electric generator Sample (statistics) Software repository Sampling (statistics) 1 (number) File archiver Code Black box Code
hi everyone thanks for coming to our presentation so our presentations on targeted adversarial examples for black box audio systems and we'd like to thank you Tom for hosting us so a little bit about us my name is Amogh I'm a student at UC Berkeley I'm an officer and machine learning at Berkeley organization and I'm also a researcher at UC Berkeley rice lab hey everyone i'm rohan i'm also student UC berkeley I'm the VP of education and machine learning at Berkeley organization and I'm also researcher at Berkeley I research obear so let's start off with a brief intro of what exactly is an adversarial example so in the most
common sense and I receive an example is something that is given to a model as input such that the model misclassifies its output but to humans it looks at looked like if the model is getting it wrong so for example in this case we have an image of a panda and the model predicts it with you know 58% percent confidence that's a panda that's a correct example that's when the model gets it right now suppose we add some noise to it on the right as you can see it's multiplied by some really small magnitude like zero point zero seven such that when you add this note to the image to humans the humans can't tell the difference between the adversarial image and the normal image so you can see here in the example the Panda looks exactly the same but to the model the model now thinks this image is a given with 99% confidence so this is an adversarial example because the model misclassified what the output was even though the humans it looks like it's still the original image okay so I'm
going more about the definitions of adversarial examples there's two types um there's untargeted attacks and there's targeted attacks on target attacks means you provide input to the model to trick the model in order to make it miss classify the input as anything else while a target attack we provide input to the model such that we want the model to classify the input as a predetermined target so targeted at target attacks are more difficult and on targeted attacks because we were trying to trick the model into classifying it as something that was determined originally and to some specific class in addition there's also white box attacks and black box attacks so a white box in a white box attack you have complete knowledge over all the internals of the model so we know its parameters in its architecture and this allows for gradient computation all right a black box attack we have no knowledge of the model or the any of us parameters except for the output logits of the model this means that a white box of attacks is you have more information we can craft in a better a better and a more efficient attack a black box attack this becomes more difficult since we don't know any of the internals of the ML model cool yeah so now many of you
might be asking why does this matter so black box attacks can actually be of particular interest in ASR systems a star being automatic speech recognition so if you look at a typical deep bottle for an ASR system this is kind of how it works at the bottom we have the raw audio wave file or the source typically all the models use some sort of feature extraction so typically what is done is an M FCC conversion which is basically takes the audio file and converts it using a Fourier transform into the frequency domain so you can see in the that a colored graph over there or the colored map that's basically frequency on the y-axis and time on the x-axis now this graph is what's passed into the model the model uses some sort of series of convolutional and recurrent layers to get a distribution of the output alphabet and this alphabet is finally decoded into the final final translated string so this is the typical a typical workflow of how the model works so looking at this model if we want to create an adversarial audio file what we want to do is change the input audio file such that we can trick the model into translating what we want at the final at the final decoder step and if we can do this with a black box approach we can apply this for two proprietary systems such as Google or IBM api's for which we don't know the model architectures the parameters but we if we use a black box approach we can still craft advocacy examples to trick these systems and fool these systems so some
classic there's some few classic adversarial examples so one is the first method proposed which is basically to take the grading iteratively so this is a white box targeted approach in which basically you take the gradient of the model output with respect to the input and then you apply that grading to the input and you keep repeating again and again and again as your adversarial example gets better fast gradient sign method which is probably one of the most popular forms of attacks is a very simple method basically you take the grading of the input with respect to with respect to your model parameters and you just take the sine of that gradient so for example if the greedy is a negative at a particular place you just make a negative one if it's positive we get +1 and you apply some very small perturbation so the example before you saw we apply the magnitude of zero point zero zero seven you plot something like that and there's just one gradient step and you have an advertorial example and so this is we're taking white box untargeted adversarial examples and finally there was a seminal paper by Houdini that explored advocacy of examples within the realm space of audio so what he was able to do was create white box and black box adversarial examples for audio systems however a key limitation that he noticed was that he wasn't able to back propagate through the MSC C conversion layer what this meant was that he had adversarial spectrograms but not necessarily adversarial raw audio files so if you think of this in a real-world setting right we want raw audio files to be played to be played or given to an API they won't accept spectrograms so in this case he was severely limited by not being able to create raw adversarial audio files but only ending up with the spectrograms ok so now let's go over some prior work
specifically in the audio and audio realm there's two key papers in this area one by UCLA which is a black box genetic algorithm on single word classes so this is a black box attack in which they didn't in which they crafted adversarial examples without knowledge of any of the model parameters and they do this via a genetic algorithm which we'll cover later but the thing about this paper is that it's only on single word classes which means they have a predefined set of classes of just single words and what what this means they can use a soft max loss to try to trick the model into misclassifying a single word the other paper is by Carlene Ian Waggoner also from UC Berkeley and this is a white box attack so they had access to the model parameters and they were they were able to do gradient computation however this wasn't on single word classes they were actually able to generate phrases and sentences of adversarial audio examples so the way they do this is via something called a CTC loss which you can see at the bottom a CTC loss loss for comparison with arbitrary length translations what this means is essentially that it's a way to score how well the original audio translates to a final like sequence in this case which is our target so it basically takes in like the final sequence and the final logits of the model and gives you some probability that the model will predict our final sequence our project aims to combine parts of these two papers by doing a black box genetic algorithm approach except trying to having that having the targets be phrases and sentences and we do this via CTC loss um so here's a
quick quick diagram of our problem statement basically how this works is that we'll have some benign input which is like something like the without the data set the article is useless so this is a raw audio that's the benign input we want to be able to figure out a slight perturbation to the raw audio via some adversarial noise such that when we combine the benign input and the adversarial noise that the ASR systems translate the audio as something malicious in this case which is ok Google browse to evil com so as you can see in a real world setting if people were able to figure this out it's gonna actually potentially be harmful for people because the humans won't recognize the difference in the audio but the ASR systems will translate it as something incorrect so to explain it a
little a little bit more formally what we're trying to do is a black box targeted attack for audio systems so this means given a target T this is your output translation where you want the model to guess so in our case maybe okay we'll browse evil calm and there'd be nine input X we want the model amp we and the model M we want to perturb X with a very small slight small Delta to create X prime such that when the model classifies M classifies X prime M of X prime that equals T or the given target T we also want to do this with the constraint that we want to maximize the cross correlation between X and X prime so in the audio realm cross correlation is a measure of the similarity between your two audio files so if we try and create an adversarial example that matches the target while maximizing the correlation between the original beta 9 input in the adversarial input that's basically saying we want to create a translation such that the model predicts it's wrong but it's still as similar to the original as possible and remember since this is a black box attack we only have access to the logits or the out the output distribution over the alphabet of M we don't have any access to the gradients or the model parameters so the model we
use this deep speech so this is the model we're targeting it's an architecture created by Baidu and published in a research paper and then it was implemented in a tensorflow by Mozilla and this is available on github as an open source open source code so we use this attack and we attacked it this specific model on D speech and the data that we use is the common voice data set so this consists of voice samples ranging from three to seven seconds and it's sampled at a rate of 16 kilohertz so you can see in the bottom right that's a diagram of the deep speech model similar to other ASR systems it accepts input spectrogram then it passes it through some convolutional layers in a stacked bi-directional LST M to finally get the Alpha output distribution over the alphabet okay so
so this is our final algorithm which is a guided selection which is a genetic algorithm approach so basically how genetic algorithm works is that it's rooted in evolutionary theory and we start off with the benign input and when we first generate what's considered a population by adding random noise so in our case we generated a population of size hundred by adding just random noise to the original input then and it's an iterative process on each iteration we score every sample in the population pick the best ones and then use those use those high-scoring samples to create a new population in the next what's called generation and over time as you can see will eventually keep getting better and better samples to the point where we can actually fool the model so in our case on each iteration we select the best ten samples using a scoring function and our scoring function is ctc loss which is what we described earlier don't need perform what's called crossover and a momentum mutation which we'll go over in the next slide and finally we apply a high-pass filter to the height added noise on the reason for this is a heuristic based approach humans recognize low frequencies but more than high frequencies so we apply a high pass filter we we'll be able to basically um trace till trick the model but make it less recognizable for a human year so this is
a visual demonstration of what's going on in an algorithm so initially we start off with a population of about 100 audio samples this is generated by adding random noise then we'll evaluate each with the ctc loss and we'll calculate a fitness score out of this we'll pick this elite population which is basically the top ten of this sample and then in order to get the next generation will perform something called crossover mutation so in evolutionary theory basically the way you get from one generation to the next is you choose two parents from this elite sample and then you perform a crossover so you can see we chose the two two samples the red and the gold and perform cost crossover so will randomly choose either the red or the gold attribute at every index and at that point we can kind of combine them together to get this like layered approach of red and gold and so this is basically saying like or we have our we have our lead population we'll choose a couple as parents and from them we'll create a child and that'll generate the red and gold sequence finally the last step is that we want to add some variation to each generation so we'll do that using a mutation so this is a randomly applied mutation and we have a special form of adding this machine mutation called momentum mutation which we'll cover in the next slide but the basic idea is we'll start with this population will choose the best will use these parents to create these children and then it'll finally add some cleverly added noise in order to create some random and create some randomness in the samples so you'll see that you'll see that over time the best traits for our samples will carry out from generation to generation while the traits which are not affecting our score will eventually die out okay so now we go over a momentum mutation which is basically our unique way of adding mutation to our samples so tune or in a normal and a standard genetic algorithm on the probability of having a mutation is a static from generation generation what we trying to do is change the probability of mutation and make it dynamic and have that be a function of the difference in scores across the iterations of the generations and we do what we do is if there's little increase in score between the generations we increase the momentum by increasing the probability of a mutation and if there's a higher score higher difference of scores between the generations we decrease the probability of mutation and the reason for this is that we see very little increase in score across generations which is basically basically what we need to do is give their algorithm little kick and we do that by increasing the randomness in our algorithm and doing so will allow us to basically move out of like a what a local minimum and globally optimize so if we're stuck at like a local minimum we provide more increased probability of mutation increase in randomness and the goal is to eventually go out of that little like escape out of that little hump and be able to continue our optimization so what this does is encourages the decoding to build up to the target after making the input similar to silence so you can see in the blue box on the right our decoding while training first start off with the original benign input and slowly it it get um two pairs down the letters till we get to near silence which is the e now and through this momentum mutation we can by increasing the probability of the mutation we can add more letters and our algorithms able to finally get our target of hello world so this genetic
algorithm approach works best when a search space is large so as you may have heard we just randomly add mutations and hope that we get something good however when the adversarial example is near the target we only need a few key perturbations you know to correctly classifier we only like only a few key perturbations are necessary so in this case genetic algorithms are not the best approach because they'll randomly add noise to a bunch of different places that we don't necessarily want noise you only want noise and a few specific places so at this step we apply a gradient estimation so the weight gradient estimation works is we basically suppose our sampling area is 16,000 kilohertz like we said that means for one second audio clip we have 16,000 points of evaluation for grady and estimation for each point we randomly perturb it either up or down and then you pass through the model and see what the difference in scores is finally we combine all of these to get our grading estimator you can think of this as a limit derivative kind of basically we were perturbing each index up and down and seeing how much that changes the final loss and using that as an estimate of the gradient now you might have heard me say for one second audio clip there's sixteen thousand possible indices so of course this is going to be a lot of computation for just one gradient step so as a heuristic in order to kind of reduce the computation cost we randomly choose a hundred best indices and apply the gradient estimation of those points and keep iterating again and again okay so
let's go over some results so we tested on the first hundred samples of the common voice data set and our target phrases are randomly generated to target words so our targeted attack similarities eighty nine point two five percent so this is a similarity between our target and the final translated sequence after three thousand generations and we calculated this using a Levenstein distance and our average similarity score is ninety four point six percent so this is similarity between the original benign input and our perturbed adversarial input and this is computed via cross correlation has described earlier so let's put these
results into perspective by comparing to some where the baselines so as a mugged mentioned earlier we have two basins were explicitly comparing to the Carlini magner paper that describes white box attacks on CTC loss and they you see a paper describing black box attacks on a single word softmax loss so as you can see our attack success rate is 35% for a for basically 11 signed distance of zero like exactly correctly classifies it as the targeted input for the grading estimation for for the grading based approach it was a hundred percent while for this soft black sauce it's 87 percent so this may seem low but thinking about it in the black box perspective of how complex it is to kind of optimize over longer phrases rather than just one softmax loss we can see that our results are pretty competitive and the average similarity score we kept at ninety four point six percent for the gradient based approach is 99.9% and for the black box Porto is 89% and as you can see here on the graph on the bottom we have a histogram of the eleven shining distances after our attack of over three thousand generations so you can see basically the distribution is very near zero we have a large percentage that are at zero meaning they're exactly correctly classified and more with just a couple couple or a few points off to the Levantine distance what this means is that there's an inherent trade-off between how much we run the algorithm and the attacker is success rate so as you can see if we run the album for longer our average similarity score will drop as we add more random noise any asthma add more mutations however we'd be able to get a higher attack success rate and bump that score up so in a real-world setting if an adversary is trying to crack a system he could employ these he could employ these metrics he could toggle between the to see which studying is better for him and adjust the rate accordingly okay so now let's try listen
to some examples so we have here the
original original input file which is which decodes to and you know it let's see if I'm gonna play this okay
okay well yeah we're having some difficulties of playing this but what you'll hear is that this is the original input file and gets decoder this and you know it the adversarial target is our perturbed audio which which to the human sound is very similar to the original it sounds like and you know it but I get to decode it as hello world and as you mentioned before audio similarities 94.6% I'm using cross correlation metric and you can see on the bottom right a spectrogram of our of the original audio input and our perturbed example so perturbed example is blue the original is orange and you can see it's very slight perturbations most of it is very similar to the original benign input so
there's a bunch of future work to be done in this area one of the most important ones is attacking a broad range of models so as you mentioned we only attack the deep speech model by Mozilla since this was the only open source ASR deep system implementation that we could find however it'll be interesting to see if this is if this attack is transferable across models not only in transformer the sense of can we optimize your deep speech and then deploy it on IBM or Google api's but also can we optimize on those api's as well second is over-the-air attacks so in our problem statement you might have seen that we put you know kind of the adversary audio in the same physical space as the original audio to fool the Google home however in this case we're passing directly the raw audio or like soy sauce digitally to the model so we haven't seen over-the-air attacks at this plate over the year how robust they would be to these different perturbations and finally increasing the sample efficiency to target so clearly we trained for three thousand iterations in order to get a successful audio decoding and for US population s of 100 that means three hundred thousand evaluations just for one audio file so for example Google and IBM API which can be quite expensive this cost can be prohibitive so increasing the sample efficiency of this attack can allow us to see with the kind of physical real-world limitations what can we achieve okay oh yeah thank you
for listening to our presentation um if you'd like to read our papers available as a preprint on archive and the coding samples are also available on github under black box audio repo one's account so yeah we can take any questions if anyone has an you yeah and then once we apply that we kind of use those popular those samples as the next generation algorithm any more questions okay thank you Thanks [Applause]