CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - How to leverage the open-source information to make an effective adversarial attack/defense against deep learning model


Formal Metadata

Title
CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - How to leverage the open-source information to make an effective adversarial attack/defense against deep learning model
Alternative Title
How to Leverage the Open Source Information
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Adversarial attack/defense against machine learning models in the digital world. NorthWestSec is a team of independent security researchers concentrating on AI and security topics. They demonstrated hacking Google reCAPTCHA using deep learning technology at GeekPwn 2017 in Silicon Valley.
Good afternoon, my name is [inaudible]. Today I will talk about how to leverage open-source information to make effective adversarial attacks and defenses against deep learning models. Before I start, I'd like to introduce who we are.
We are NorthWestSec, a team of independent security researchers and data engineers. Right now we have five members, shown here from left to right [names unclear]. We have diverse backgrounds, but we share a united interest, which is to apply AI techniques to solve security issues. We have been fascinated by the ideas coming out of the interaction between AI and security, so we have been actively looking for opportunities to learn more in that direction. Our team got invited to the CAAD CTF, which is the first capture-the-flag competition on adversarial attacks and defenses. As a new team attending the CTF for the first time, we learned a lot, from the beginning of preparation to the end of the competition.
So today we are sharing our learnings from
this event. This presentation includes two parts: in the first part I'm going to talk about how we evaluated open-source attacks and defenses, and in the second part I'll introduce the design of our defense for the CTF.
Due to the limitation of time and resources, we didn't try to develop our own methods for attack or defense; our strategy was simply to leverage publicly available information and open-source code. In order to select those methods, we defined criteria based on the requirements of the CTF. There are three factors we considered: speed, transferability, and strength. For defense, the CTF has a hard requirement on how fast the image classifier has to run, so we made that the top priority; the requirement is two seconds per image. Also, because we could only submit one defense, transferability was very important; we even sacrificed some strength to enhance transferability. On the attack side, because there was no limitation on what kind of attack we could use or how many times we could attack, we basically only considered the strength of the methods. That strategy turned out to be not entirely correct: having just finished the CTF, we found that the speed of an attack also matters. We implemented some strong attacks, but they took so long to run that we barely had the chance to use them; in each round we could only try them maybe once or twice. Next time we will be mindful of that. We use the hit-target rate to evaluate the strength of a method, which is the percentage of images misclassified by the defending classifier as their target classes.
Keeping those criteria in mind, we started our evaluation. The evaluation starts from building a baseline, so we benchmarked baseline defenses against baseline attacks. To do that we used two open-source resources. For attack we used CleverHans, a Python library built to benchmark machine learning systems against adversarial attacks; it is a collection of different adversarial attack methods. For defense, Google provides pre-trained weights, and even adversarially trained models, meaning models trained on both original images and adversarial images. There are two popular architectures: one is Inception V3 and the other is Inception-ResNet v2.
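As a rough illustration of this benchmarking step, here is a minimal sketch using a CleverHans v3-style (TensorFlow 1) API; `model`, `x_test`, `y_target`, and `defense_predict` are placeholders I am assuming, and exact module paths may differ between CleverHans versions.

```python
# Minimal sketch: one iterative attack benchmarked against one defense.
import numpy as np
import tensorflow as tf
from cleverhans.attacks import BasicIterativeMethod

# `model` is assumed to be a cleverhans.model.Model wrapping e.g. a
# pre-trained Inception V3; `x_test` is a batch of images in [0, 1] and
# `y_target` a batch of one-hot target labels.
sess = tf.Session()
bim = BasicIterativeMethod(model, sess=sess)
x_adv = bim.generate_np(
    x_test,
    eps=16.0 / 255.0,      # total L-infinity perturbation budget
    eps_iter=2.0 / 255.0,  # per-iteration step size
    nb_iter=10,            # number of gradient steps
    y_target=y_target,     # makes the attack targeted
)

# Hit-target rate: the fraction of adversarial images the defense
# classifies as the attacker's chosen target class.
preds = np.argmax(defense_predict(x_adv), axis=1)  # defense_predict: assumed helper
hit_target_rate = np.mean(preds == np.argmax(y_target, axis=1))
```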
To do the evaluation we have to use a test dataset. The test set we are using is provided by CAAD; it includes a thousand fresh images classified into a thousand classes with ImageNet labels. If you look at the right-hand side, the plot of target class versus true label, you can see the spread of the distribution along the x direction is very consistent, which means the target classes were selected uniformly at random from the thousand classes. So this is an average-case scenario: if we use this dataset to evaluate a method, it gives us the average strength of the method, neither the best case nor the worst case, just the average.
Using those two libraries, we started our evaluation. This table shows the hit-target rate of four attacks against three defenses. All four attacks use the same gradient-based method, the basic iterative method (BIM); the difference among the first three is which model is used to calculate the gradient. The attack is calculated based on the gradient, and these three attacks use models that correspond exactly to the three defenses: Inception V3, the adversarially trained Inception V3, and the ensemble adversarially trained Inception-ResNet v2 are each used to compute the gradient. So in each column, for each defense, one of the attacks is a white-box attack. The white-box attacks are very strong, as expected: for example, in the first entry, BIM based on the Inception V3 model achieves about an 89% hit-target rate. Even the second and third defenses, which were adversarially trained, are still vulnerable to a white-box attack; we can still achieve roughly 75% and 85% hit-target rates. But you can also see that even though a white-box attack is very strong, its transferability is very bad: it basically only works against its own model. The first row only works against Inception V3; against the other two defenses its hit-target rate is zero, so it totally doesn't work. We can improve that by ensembling: the last attack ensembles the three models together. The basic idea is that we calculate three different adversarial images based on the three models, then average them and use that averaged image as our final attack. By doing that we can still keep a high hit-target rate while also gaining transferability; it works against all three defenses. One thing I forgot to mention: because these models and weights are publicly available, when we analyze the performance we treat these numbers as a lower bound, meaning any attack or defense we pick must be better than these baselines.
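A minimal sketch of that ensembling idea, assuming a helper `bim_attack(model, x, y_target)` that runs the single-model iterative attack:

```python
# Craft one adversarial image per surrogate model, then average them.
import numpy as np

def ensemble_attack(models, x, y_target, bim_attack):
    # One adversarial image per surrogate model.
    adv_images = [bim_attack(m, x, y_target) for m in models]
    # Pixel-wise average; clip back to the valid image range.
    x_adv = np.mean(adv_images, axis=0)
    return np.clip(x_adv, 0.0, 1.0)
```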
Meanwhile, we also tried some strong defenses, like the guided denoiser (its author is actually sitting in the audience) and random padding. Those two took first and second place in the 2017 NIPS adversarial defense competition, and you can see they are very strong: against them, those four attacks get a hit-target rate of essentially zero, so they are totally not working.
Our strategy for attacking is basically to make sure we have a corresponding attack for each single defense we might face; because there is no limitation on running time, we can try them all, so we built a broad set of attacks for this CTF. This diagram is a polar plot of hit-target rate: each color stands for a different attack, and the six different defenses are located at six different angular positions. This gives a needle-like branch pattern that tells us two things: the length of a branch is the hit-target rate, which is the strength of the attack, and the number of branches tells us how good the transferability of the method is, because the more branches a single method has, the more defenses it can attack. Throughout our evaluation we used this kind of plot to assess all the methods, trying to strike a balance between transferability and strength. These are the attacks we implemented. For the guided denoiser, because the authors open-sourced everything, we simply implemented a white-box attack, which is very effective: we can get about a 98% hit-target rate. The other kind of defense, random padding, randomly adds padding to the input image; it tries to exploit the translation-invariant features of CNN-based image classifiers. For an original image, adding random padding barely affects the classification result, but for a fake, adversarial image, adding random padding strongly degrades the effect of the attack at inference time. That's the idea of random padding. We also adopted a method to implement a targeted attack against it; a sketch of the padding step itself is below.
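A minimal sketch of a random-padding style preprocessing step, in the spirit of the defense just described. The canvas size and the use of zero padding are assumptions (the original defense also randomly resizes the image first):

```python
# Place the image at a random position inside a larger zero-padded canvas.
import numpy as np

def random_pad(image, out_size=331, rng=np.random):
    h, w, c = image.shape  # assumes h, w <= out_size
    top = rng.randint(0, out_size - h + 1)
    left = rng.randint(0, out_size - w + 1)
    canvas = np.zeros((out_size, out_size, c), dtype=image.dtype)
    canvas[top:top + h, left:left + w, :] = image
    return canvas

# At inference time the classifier sees random_pad(x) instead of x, so an
# attacker cannot rely on one fixed input geometry.
```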
Now I'll talk about some strategies that apply to both defense and attack. The first one, a very popular defense, is called ensemble adversarial training. You not only train the model with both original and adversarial images, you also generate those adversarial images with several different methods. By doing this ensemble adversarial training you can effectively improve the transferability of your defense. On the other hand, for the attack you can also apply ensemble adversarial attacking, which likewise improves the transferability of your attack method; the same idea applies to both defense and attack. Another idea, on the defense side, is gradient masking. Right now most of the popular attacks are gradient based, which means you somehow have to get the gradient of the target model, and based on that gradient you generate your attack. Gradient masking tries to hide that gradient information: either make the gradient unsmooth, kind of stepwise, or make the gradient vanish (become very small) or explode (become very large). Either way, when you follow the gradient it guides you in the wrong direction, so the defensive purpose is achieved. Still, researchers on the attack side have proposed corresponding attack strategies. If the defense makes the gradient stepwise, it generates a lot of local minima, and when the attack tries to escape from those local minima there is one easy remedy: add a random perturbation at each iteration of the computation, so when you get stuck in some local spot you just randomly jump out of it. That's the idea of random perturbation. Others use gradient smoothing: when they calculate the gradient, they apply Gaussian smoothing and smooth the gradient out. A third method is called backward pass differentiable approximation (BPDA). The idea is that if we cannot get the real gradient from the network, we can approximately calculate one: we don't use the target itself to calculate the gradient, we come up with a close approximation, use that as our gradient, and carry out the attack. The third defense strategy starts from the observation that the adversarial perturbation is a kind of noise, and the very natural solution for noise is filtering: just use some image processing method to filter the noise out. But people on the attack side have still figured out how to attack this kind of defense: whatever you are doing on the defense side, do the same thing on the attack side. When you calculate the attack, incorporate the filtering, or whatever else sits in front of the classifier, into the calculation, so the gradient takes everything into account.
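A minimal PyTorch sketch of the BPDA idea, assuming the defense's filter is exposed as a callable `defense_filter` that maps a tensor to a tensor of the same shape (any NumPy/OpenCV conversion inside it is omitted):

```python
# BPDA: run the non-differentiable preprocessing in the forward pass and
# approximate its gradient with the identity in the backward pass.
import torch

class BPDAIdentity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, preprocess):
        # `preprocess` is the defense's non-differentiable transform,
        # e.g. a denoising filter; it must return a tensor like x.
        return preprocess(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Approximate d(preprocess)/dx as the identity.
        return grad_output, None

# During the attack, gradients then flow "through" the filter:
#   logits = model(BPDAIdentity.apply(x_adv, defense_filter))
#   loss.backward()  # x_adv.grad now approximates the true gradient
```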
That filtering-aware idea is actually the method we used to attack random padding: when calculating the adversarial image, we also add the random padding into the iterative calculation (a sketch of this follows below). The last category is, I think, the only one for which there is no effective attack method yet: detection-only methods. These do not enhance the robustness of the system; they just add another module that detects whether an adversarial attack is happening or not, and if it is, they raise an alert or simply return some random output. That's the idea of detection, and that's all about the evaluation.
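A minimal sketch of folding the random padding into the attack loop, an expectation-over-transformation style average. It assumes `random_pad` is built from differentiable tensor ops (e.g. torch.nn.functional.pad with random offsets); the budgets and sample counts are illustrative, not the team's actual settings:

```python
# Average the targeted loss over several random paddings so the
# perturbation works no matter which padding the defense draws.
import torch
import torch.nn.functional as F

def attack_random_padding(model, x, y_target, random_pad,
                          steps=10, eps=16 / 255, step_size=2 / 255,
                          samples=8):
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = sum(F.cross_entropy(model(random_pad(x_adv)), y_target)
                   for _ in range(samples)) / samples
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv -= step_size * grad.sign()   # targeted: descend the loss
            # Project back into the eps-ball and the valid pixel range.
            x_adv.copy_(torch.max(torch.min(x_adv, x + eps), x - eps))
            x_adv.clamp_(0.0, 1.0)
    return x_adv.detach()
```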
Based on that very extensive evaluation of a lot of attacks and defenses, we came up with this design for our defense. The defense includes two modules. The first one, the big rectangle on the left-hand side, is our image classifier, but it actually contains two image classifiers arranged in parallel, with a difference filter at the end. What happens is that when an adversarial image comes in, it gets classified by both classifiers: one is the cv2-filtered random padding classifier and the other is the guided denoiser. The two resulting labels both go into the difference filter, where we compare whether they agree with each other. If they agree, we think the attack failed and we output that label as the true label; if they disagree or contradict each other, we simply return a zero or a random number. That's the idea of the first module; a minimal sketch of its agreement logic is below.
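A minimal sketch of the first module's difference filter; both classifier callables are assumed placeholders:

```python
# Two independently defended classifiers must agree on the label,
# otherwise the input is treated as adversarial.
import random

def difference_filter(image, classify_pad, classify_denoise,
                      num_classes=1000):
    label_a = classify_pad(image)      # random-padding branch
    label_b = classify_denoise(image)  # guided-denoiser branch
    if label_a == label_b:
        return label_a                 # branches agree: trust the label
    # Branches contradict each other: likely an adversarial input, so
    # return a random label rather than cooperating with the attacker.
    return random.randrange(num_classes)
```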
Now I want to spend a little bit of time on that cv2 filter.
cv2 is the name of the OpenCV package in Python, so this is also an image processing idea. When we read a lot of defense code, we saw people doing the image processing by hand in all sorts of uneven ways, and we thought: why don't we just use a professional package to do the image processing? OpenCV is a very famous open-source library dedicated exactly to image processing. It provides a lot of sophisticated functions, it is implemented in C so it's very fast, and it is very easy to use: adding a filter in front of an existing defense is essentially one line of code (sketched below). What we are using right now is the bilateral filter, which is an edge-preserving, noise-reducing smoothing filter. The four images here show the effect. The upper-left one is the original adversarial image; after going through the filter three times, in the bottom-right one you can see that the fuzzy pattern in the background, which is the adversarial perturbation trying to confuse the classifier, has been smoothed out, while at the same time the edges are still preserved. Since image classifiers essentially learn those edges, maintaining the edges means this method won't affect your classification accuracy. To prove that, we ran an experiment.
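A minimal sketch of that "one line" filtering step; cv2.bilateralFilter is the real OpenCV API, but the parameter values and the number of passes here are assumptions:

```python
# A bilateral filter pass in front of an existing classifier.
import cv2

def defend(image_bgr, classify, passes=3):
    for _ in range(passes):
        # d: pixel neighborhood diameter; sigmaColor/sigmaSpace control
        # how aggressively colors mix and how far the filter reaches.
        image_bgr = cv2.bilateralFilter(image_bgr, d=9,
                                        sigmaColor=75, sigmaSpace=75)
    return classify(image_bgr)
```

As the speaker notes later in the Q&A, if these parameters are set badly the classifier's clean accuracy degrades, so they need tuning.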
We applied this cv2 filter in front of the guided denoiser defense and used it against three different attacks. This time the color pattern stands for different defenses, and the one at the lower-left corner is the white-box attack we implemented against the guided denoiser. You can see that the yellow one, which is cv2 plus the guided denoiser, effectively reduces the hit-target rate from 96% down to around 45-50%, so our cv2 filter is a genuinely effective method. The reason we put another two defenses and another two attacks on the other side is that we were trying to see whether the cv2 filter could more broadly improve transferability, but it turned out that it doesn't. Still, we believe the cv2 filter is a very effective way to at least enhance your defense, because it is very fast and also very flexible: you can apply it to any existing system. That's about the first module.
The success of a defense built on the first module relies on the assumption that no attack is strong enough to fool or confuse both classifiers. But that's not true, and we knew it wasn't true: when we tested, there were attacks that could fool both the cv2-filtered random padding and the guided denoiser, so both of them output the exact same label, but the label is wrong. That means the first module can fail when that happens. That's where the second module comes into play: the k-means filter. The k-means filter is based on a very well-known method called k-means clustering.
The assumption here is that the k-means clustering algorithm is robust against adversarial attacks aimed at neural networks. In order to verify that, we conducted a little experiment. We took a thousand original images, generated another thousand adversarial images from them, and put both sets into the k-means clustering, which automatically groups the images into clusters. Then we compared the cluster of each original image against that of its adversarial version, and we found that 94% of them matched. That means the adversarial attack doesn't really affect the assignments of the k-means clustering, so we can trust this method; a sketch of this check is below.
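A minimal sketch of that robustness check using scikit-learn's KMeans; the feature representation is an assumption (images are simply flattened here, but any embedding would do):

```python
# How often do a clean image and its adversarial twin land in the same
# k-means cluster? The talk reports ~94%.
import numpy as np
from sklearn.cluster import KMeans

def cluster_agreement(clean_images, adv_images, k=10, seed=0):
    clean = clean_images.reshape(len(clean_images), -1)
    adv = adv_images.reshape(len(adv_images), -1)
    km = KMeans(n_clusters=k, random_state=seed).fit(clean)
    return np.mean(km.predict(clean) == km.predict(adv))
```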
Based on that, let me talk about how we implemented the k-means filter. The first part, at the top, describes how we generate a mapping. This whole competition is based on the ImageNet dataset, and ImageNet has a thousand labels, but when we do the k-means clustering it is impractical to build a thousand clusters; our k-means is trained to group images into ten classes. So how can you map a thousand different labels onto ten different classes? Here is how we do it. After we train our k-means clustering, we run another test dataset through it. We know the true ImageNet labels of that test set, and after clustering those images we also know their k-means classes, so we can generate a map, basically a table, telling us which k-means class corresponds to which ImageNet labels. After that mapping table is ready, the runtime logic is very simple. When an adversarial image comes in, it first goes through the first module I just discussed, which generates a label, an ImageNet label. That ImageNet label is then converted into a k-means label using the mapping table, which effectively gives a ten-element list: since we only have ten k-means classes, the list ranks the ten classes by probability. For example, the first list might say this label is most likely in class 9, next most likely in class 4, and so on. At the same time, the incoming image is also fed directly into the k-means clustering, which generates another ten-element list; for that list, k-means might think this image is most likely in class 2 and next most likely in class 9. In that case the two lists contradict each other, they don't agree, and that's when the filter decides the attack actually succeeded against the first module, so we just output a zero or a random number. Only when the two lists match, under some checking logic, do we let the original label come through. That's the idea of the k-means filter.
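A minimal sketch of the mapping step and the runtime check, simplified to a top-1 comparison (the talk uses ranked ten-element lists with a looser matching rule); the helper names and the use of majority voting are assumptions:

```python
# Build a label->cluster table from a held-out set, then require the
# first module's label and the image's own cluster to be consistent.
import random
import numpy as np

def build_mapping(km, features, imagenet_labels, n_labels=1000):
    # Majority vote: which k-means cluster does each ImageNet label
    # usually fall into on the held-out set?
    clusters = km.predict(features)
    mapping = {}
    for label in range(n_labels):
        hits = clusters[imagenet_labels == label]
        if len(hits):
            mapping[label] = np.bincount(hits, minlength=km.n_clusters).argmax()
    return mapping

def kmeans_filter(image_feat, label, mapping, km, num_classes=1000):
    expected = mapping.get(label)              # cluster implied by the label
    observed = km.predict(image_feat[None])[0]  # cluster of the image itself
    if expected == observed:
        return label                           # consistent: pass it through
    return random.randrange(num_classes)       # contradiction: likely attacked
```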
That's all about the design of our final submission for the defense. This is the same kind of polar plot, now showing the hit-target rate of our defense against some strong attacks. You can see that for all six attacks the hit-target rate is close to zero, so we believe our defense is effective and also has good transferability. As it turned out, though, we still lost some points; I think the k-means module still has a lot of room to improve, so that might be a direction we will work on.
In summary, I talked about how we selected a set of attacks aimed at different defenses, and I talked about our defense, which comprises image-processed classifiers, a difference filter, and a k-means filter. With that, I'll finish my talk and take questions.

[Audience question about whether the fallback label is random] Yes, it is random; that's mainly how we achieve the defense without giving the attacker useful feedback.

[Audience question about where adversarial noise can be placed] Mathematically this is exactly solving an optimization problem, so it totally depends on how you define your problem. There is work that changes only a single pixel and still achieves its purpose of changing the classification, and others have proposed the adversarial patch: a patterned patch such that if you put it onto anything, the image classifier will misclassify that image. So yes, by defining your problem appropriately you can control where you apply the noise.

[Audience question about whether the same methods apply to other AI applications] I think fundamentally they are the same; it is still an optimization problem you are trying to solve. But in practice, when you attack other things, there are many more factors that have to be taken into account, and there has been some research about that. What I'm talking about today is entirely in the digital domain; everything happens inside the computer. If you want to achieve the attack in the physical domain, for instance putting something on a washing machine so the classifier mistakes it for a dryer, you have to take into account the lighting, the viewing angle, all those factors, which makes the problem more complicated. But fundamentally I think they are the same.

[Audience question about using a low-pass filter, doing an FFT to filter the noise out] That's a good point; there are a lot of different ways to apply the same idea. What we are trying to do is get rid of the bad noise, the adversarial noise, while still keeping the information the classifier needs; you cannot just get rid of everything, because that would also mess up your classifier. Even for this cv2 filter, though I didn't show it here, if the parameters are not set correctly the accuracy of the classifier will strongly degrade. Working in the frequency domain is actually a direction one of our team members is pursuing: not doing image filtering, but looking at the frequency spectrum to tell whether an image is real or adversarial. And yes, that's another approach, especially because even a carefully tuned filtering algorithm still affects classifier accuracy. There is also a recent paper where the researchers use another neural network to calculate something called a class activation region: when a classifier looks at an image, some regions are interesting and some are not, and that network figures out which patches most affect the final classification. Once that region is known, they apply the image filter outside it rather than filtering everything out. But that's going to be slow, and different filters decrease the accuracy, right.

[Audience question: you mentioned adversarial versus original images as a kind of baseline; does accuracy improve the more images you add to the dataset, or does it peak at some point?] I think that depends on what kind of algorithm your defense uses. For example, if, when you train the classifier, you take into account both pure original images and images after filtering, then the model has already accounted for that factor, so doing the image filtering later won't affect the classifier. For training, that's actually the key point of adversarial training: the training set is a mix of original images and adversarial images, or likewise a mix of original images and images after filtering. By training on that mix, the classifier learns that some images are original and some have been filtered, but it can still identify them as the same; that's how you build your training dataset. As for whether, with limited time and resources, the accuracy keeps improving the more data you feed it or peaks at some point: I think that's limited by the neural network, not by the image filtering, so I think it's going to peak. I don't believe you can just keep improving it. Okay. [Applause]