CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - Boosting Adversarial Attacks with Momentum


Formal Metadata

Title
CAAD VILLAGE - GeekPwn - The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018 - Boosting Adversarial Attacks with Momentum
Title of Series
Author
Tianyu Pang
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Deep neural networks are vulnerable to adversarial examples, which poses security concerns for these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating a momentum term into the iterative process of an attack, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates of black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.

Tianyu Pang is a first-year Ph.D. student in the TSAIL Group in the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Jun Zhu. His research interests include machine learning, deep learning and their applications in computer vision, especially the robustness of deep learning.
Hello everyone, I come from Tsinghua University. First let me introduce our team; here is a list of our team members. Let me also introduce some of the work of our team: we took first place in all three tracks of last year's NIPS competition, and NIPS is a top conference in machine learning. We have published three papers about adversarial examples at top conferences in machine learning and computer vision; besides, we also have a paper at NIPS this year, and several others were also accepted. Here is a list of our publications. Today I will introduce just two of the works.
The first work is Boosting Adversarial Attacks with Momentum. This is the attack method we used in last year's NIPS competition. Adversarial examples are maliciously generated examples that can fool a deep model, yet they are very similar to the original examples. I'll briefly introduce some existing attack methods first, together with their advantages and disadvantages.

Generating an adversarial example can be cast as an optimization problem: we want to maximize the loss function of the adversarial example, subject to a maximum distance between it and the original example. The most famous method is the fast gradient sign method (FGSM): it calculates the gradient of the loss function with respect to the input and applies the sign of the gradient to the input in a single step. Iterative variants of FGSM apply the sign of the gradient multiple times with a smaller step size. Optimization-based methods instead directly optimize the distance between the adversarial example and the original example while requiring the attack to succeed. A minimal sketch of the first two methods is given below.
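As a hedged illustration of the one-step and iterative attacks just described (a minimal PyTorch sketch under stated assumptions, not the speaker's actual code; `model`, `eps`, and the step count are assumptions):

```python
# Sketch of FGSM and its iterative variant (I-FGSM) for an image
# classifier with inputs in [0, 1]. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x L(x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def ifgsm(model, x, y, eps, steps=10):
    """Iterative FGSM: repeat small sign steps inside the eps-ball."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project to the L_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```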
Adversarial examples have also demonstrated transferability across models: an adversarial example generated for one model can also fool another model. The transferability of adversarial examples enables black-box attacks, which raise more serious security concerns for real applications. Adversarial examples can also have cross-data transferability, which is known as universal perturbation.
However, there are some limitations for practical black-box attacks: existing attack methods cannot attack a black-box model efficiently. FGSM can quickly generate adversarial examples with good transferability, but its linearity assumption may not hold for large distortions, which reduces its efficiency at attacking the white-box model itself, so the white-box success rate is low. Iterative methods, on the other hand, have low transferability, because they greedily move the adversarial example along the gradient direction to maximize the loss function, which may overfit the white-box model's decision boundary.
This overfitting and the resulting trade-off between transferability and attack ability make black-box attacks less effective. In our experiments we attacked Inception V3 with different numbers of iterations, and measured the white-box success rate as well as the black-box success rates against models such as Inception V4 and ResNet-152. We observed that the black-box success rates decrease as we increase the number of iterations.
Another way to attack a black-box model is to train a substitute model to characterize the behavior of the black-box model. This approach requires the full prediction confidences and a tremendous number of queries, so it is hard to deploy when the models are trained on large-scale datasets; and we could not use it in the competition, because we were not allowed to query the defense solutions.

So our solution is to alleviate the trade-off between transferability and attack ability. We showed that generating an adversarial example can be regarded as a constrained optimization problem, so we can apply useful techniques from optimization to the adversarial setting, such as momentum methods, which were originally adopted to accelerate gradient descent: they help escape from poor local optima and also help stabilize the update directions. So we add momentum to the iterative method, giving the momentum iterative fast gradient sign method (MI-FGSM). The algorithm is very simple: in each iteration we calculate the gradient of the loss function with respect to the input, accumulate a velocity vector g_t along the gradient direction across iterations, and apply the sign of that vector to the example. The decay factor mu controls how much we trust past gradients, and the current gradient is normalized by its L1 norm, because we noticed that the scale of the gradients varies in magnitude across iterations. A minimal sketch of the update is shown below.
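A hedged sketch of the MI-FGSM update described above, assuming a PyTorch image classifier with NCHW batches; the hyperparameters (`eps`, `steps`, the decay factor `mu`) are illustrative defaults, not the competition settings:

```python
# Sketch of momentum iterative FGSM (MI-FGSM).
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps, steps=10, mu=1.0):
    alpha = eps / steps
    g = torch.zeros_like(x)  # accumulated velocity vector g_t
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Normalize the current gradient by its L1 norm, then accumulate
        # into the velocity with decay factor mu.
        grad = grad / (grad.abs().sum(dim=(1, 2, 3), keepdim=True) + 1e-12)
        g = mu * g + grad
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # stay in the eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```

Setting `mu = 0` recovers I-FGSM, which makes it easy to see that the only change is the accumulated velocity vector.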
Here are some results. We attack Inception V3 and test the adversarial examples on other models. MI-FGSM, which is our method, attacks the white-box model with a nearly 100% success rate, like I-FGSM, but fools the black-box models with much higher success rates. We also studied the effect of the number of iterations on black-box attacks: we again attack Inception V3 and measure the success rates against several black-box models. We can see that the success rates for attacking a black-box model do not decrease as we increase the number of iterations, so our method in some sense alleviates the trade-off between attack ability and transferability.

Another technique that is crucial for boosting black-box attacks is to attack an ensemble of models. The underlying assumption is that if an adversarial example can fool multiple models, it is more likely to be misclassified by other black-box models. We propose to attack an ensemble of models whose logits are fused together, and we compare this with attacking ensembles whose predictions or losses are fused; a sketch of the ensemble-in-logits objective is given below.
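A hedged sketch of the "ensemble in logits" objective: the logits of the member models are fused (here, by a weighted average) before the softmax cross-entropy. `models` and `weights` are assumptions for illustration; this loss can replace the single-model loss inside the MI-FGSM loop above.

```python
import torch
import torch.nn.functional as F

def ensemble_logits_loss(models, weights, x, y):
    """Cross-entropy on the weighted average of the member models' logits."""
    logits = sum(w * m(x) for m, w in zip(models, weights))
    return F.cross_entropy(logits, y)
```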
The results of attacking ensemble models show that ensembling in logits consistently outperforms ensembling in predictions and ensembling in losses. Applying the momentum iterative fast gradient sign method to an ensemble of models gives very high success rates for black-box attacks: for example, when we attack an ensemble of Inception V3, Inception V4, Inception-ResNet V2 and ResNet-152 with one model held out, the generated adversarial examples fool the held-out model with very high success rates. That finishes the first work.

The second work is joint work with Chao Du and Professor Jun Zhu. In this work we propose a new network architecture that returns robust predictions in the adversarial setting. We name the new network the Max-Mahalanobis linear discriminant analysis network, abbreviated as the MM-LDA network, and this work was just published at ICML.

First, our motivation. Motivation one is that almost all deep networks suffer from adversarial attacks, where human-imperceptible perturbations can mislead state-of-the-art networks. Our second motivation notes that a typical feed-forward deep net consists of a nonlinear transformation part, which maps the input to a hidden feature, and a linear classifier part, which acts on the hidden feature to output the prediction. However, most prior work focuses on designing improved nonlinear transformation parts, like VGG, ResNet and so on; by contrast, the linear classifier part is under-explored, and it is usually by default a softmax regression. So our goal is to design a new network architecture with better performance in the adversarial setting, and to achieve this we decided to substitute a new linear classifier part for the softmax regression.

Our method comes from two inspirations. The first inspiration comes from Efron et al., who showed that if the input distributes as a mixture of Gaussians, then linear discriminant analysis, abbreviated as LDA, is more efficient than logistic regression; more efficient means that LDA needs less training data to obtain a certain accuracy. However, in practice data points hardly distribute as a mixture of Gaussians in the input space. The second inspiration comes from the fact that neural networks are powerful generative models: it has been demonstrated that a deep net can learn to transform a simple distribution, for example a mixture of Gaussians, into a complex distribution. So the reverse direction should also be feasible, and this is actually what our method does with its nonlinear transformation part. In our method we model the feature distribution as a mixture of Gaussians and apply LDA on the features to make predictions.

A naturally raised question is how to choose the Gaussian parameters. Other models that treat the feature distribution as a mixture of Gaussians treat the Gaussian parameters as extra trainable variables; by contrast, we treat them as hyperparameters calculated by our algorithm, which lets us provide a theoretical guarantee on the robustness. The induced mixture of Gaussian model is named the Max-Mahalanobis distribution, abbreviated as MMD. Intuitively, the MMD maximizes the minimum Mahalanobis distance between any two Gaussian components, so that samples in different classes are separated the most. For example, when the distribution is an MMD with three classes, the means of the MMD are the three vertices of an equilateral triangle; one way to construct such means is sketched below.
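The talk does not spell out the construction of the MMD means, so the following is a hedged illustration rather than the paper's exact recursive procedure: the vertices of a regular simplex, scaled to a fixed norm, have the maximum-minimum-distance property described above.

```python
# Illustrative construction of L class means of equal norm whose minimum
# pairwise distance is maximized (a regular simplex in feature space).
# This is one valid construction, not necessarily the paper's algorithm.
import numpy as np

def simplex_means(num_classes, dim, radius):
    assert dim >= num_classes
    # Center the standard basis of R^num_classes and normalize the rows:
    # pairwise cosines become -1/(num_classes - 1), the simplex optimum.
    eye = np.eye(num_classes)
    centered = eye - eye.mean(axis=0)
    means = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    # Embed into the dim-dimensional feature space and scale to the norm.
    out = np.zeros((num_classes, dim))
    out[:, :num_classes] = radius * means
    return out

mu = simplex_means(num_classes=3, dim=10, radius=10.0)
print(np.round(mu @ mu.T, 1))  # 100 on the diagonal, -50 off: an equilateral triangle
```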
This directly relates to robustness. We first give the definition of robustness: the robustness of a data point with a given label is defined as the minimal local distance from that point to an adversarial example with a different label, and we further define the robustness of the classifier as the minimal expected local distance. To state the relation between the expected local distance and the Gaussian parameters, consider the Mahalanobis distance between the two Gaussian components with labels i and j in the mixture of Gaussians; the classifier robustness can be approximately represented in a simple form in terms of these distances. Recalling the property of the MMD, we can conclude that when the feature distribution is an MMD, the approximate robustness is maximized.
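For reference, the Mahalanobis distance used here is the standard one between two Gaussian components with means \mu_i, \mu_j and a shared covariance \Sigma; the precise form of the paper's robustness approximation may differ, but it grows with the minimal such distance over class pairs:

```latex
\Delta_{i,j} = \sqrt{(\mu_i - \mu_j)^{\top} \Sigma^{-1} (\mu_i - \mu_j)}
```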
Finally, the experimental results. First, on normal examples, our method achieves comparable performance with typical networks; note that here we do not specially fine-tune the hyperparameters for our method. The table shows the classification accuracy on the test set of CIFAR-10, and we find that our method also results in a more orderly distribution of the learned features in the feature space. Here we show the results under L-infinity attacks with three different values of the perturbation size, and the results show that our method can largely improve accuracy compared to adversarial training, with much less computational cost. We also test optimization-based attacks; the results show that the attacker needs a much larger perturbation to successfully attack our method. In the visualization, the first row is the normal examples, the second row is the adversarial noises crafted on traditional softmax regression networks (SR networks), and the third row is the noises crafted on our method. Our method learns more robust features, such that the optimal attacking strategy against our defense is to weaken the pixels of the normal examples as a whole, rather than adding meaningless noise as for the SR networks. Besides, our method also has better performance on class-imbalanced datasets, which may be helpful in other areas like fairness.

In conclusion: our method does not introduce extra computational cost; it can largely improve robustness with no loss of accuracy on normal examples; it is quite easy to implement, with only a few lines of code; and it is compatible with nearly all popular networks. Thank you.
[Applause]

Q: [A question about the Mahalanobis distance.]

A: So a Gaussian distribution, right, it has a covariance matrix and it has a mean. The Mahalanobis distance, it actually has a Wikipedia page, is a kind of distance, just like the Euclidean distance or something like that; it measures the similarity of two Gaussian distributions. Any other question? Yes.

Q: [A question about using the method in the contest and at larger scale.]

A: Actually we want to use this, but maybe that is a secret, because we need to attend another contest. As for scale, well, it is not very scalable; I mean, if you want to scale it to a very large dataset like ImageNet, you need a lot of tricks for fine-tuning the training parameters and hyperparameters, like learning rates, and we are not very good at that, so we only tested it on some small and middle-sized datasets. As I mentioned, on CIFAR and MNIST we do not need to tune any hyperparameters for our method, and it can get state-of-the-art performance. Any other question? Okay, thank you.

[Applause]