CAAD VILLAGE  GeekPwn  The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018  Boosting Adversarial Attacks with Momentum
Formal Metadata
Title 
CAAD VILLAGE  GeekPwn  The Uprising Geekpwn AI/Robotics Cybersecurity Contest U.S. 2018  Boosting Adversarial Attacks with Momentum

Title of Series  
Author 

License 
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2018

Language 
English

Content Metadata
Subject Area  
Abstract 
Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won the first places in the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions. Tianyu Pang is a first-year Ph.D. student of the TSAIL Group in the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Jun Zhu. His research interests include machine learning, deep learning and their applications in computer vision, especially the robustness of deep learning.

00:00
Hello everyone, I am Tianyu Pang from Tsinghua University. First let me introduce our team; here is a list of our team members. Now let me introduce some work of our team. We won three first places in all three tracks of last year's NIPS competition; NIPS is a top conference of machine learning. We have published three papers about adversarial examples at top venues of machine learning and computer vision; besides, we also have a paper at NIPS this year and another three in submission. Here is a list of our publications. Today I will introduce just two works, and the first work is Boosting Adversarial
00:56
Attacks with Momentum. This is the attack method with which we won last year's competition. Adversarial examples are maliciously generated examples that fool a given model but are very similar to the original examples. I will briefly introduce some typical attack methods and illustrate their advantages and disadvantages. Generating adversarial examples can be cast as an optimization problem: we want to maximize the loss function of the adversarial example, subject to the constraint that the maximum distance between the adversarial and the original example is bounded. The most famous method is the fast gradient sign method, abbreviated FGSM: it calculates the gradient of the loss function with respect to the input and applies the sign of the gradient to the input. Iterative variants of FGSM apply the sign of the gradient multiple times. Optimization-based methods directly optimize the distance between the adversarial example and the original example, subject to the example being misclassified.
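The single FGSM step just described can be sketched in a few lines of NumPy. This is a toy stand-in, not the speakers' implementation: the gradient is supplied directly rather than computed by backpropagation through a network.

```python
import numpy as np

def fgsm_step(x, grad, eps):
    """One FGSM step: move the input by eps in the direction of the
    sign of the loss gradient, then clip back to the valid range [0, 1]."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy usage: a hand-supplied gradient stands in for backpropagation.
x = np.array([0.2, 0.5, 0.8])
grad = np.array([1.0, -2.0, 0.5])
x_adv = fgsm_step(x, grad, eps=0.1)   # -> [0.3, 0.4, 0.9]
```

Note that the perturbation of every pixel is exactly eps in magnitude, which is what makes FGSM an L-infinity-bounded attack.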
02:12
Adversarial examples have also demonstrated transferability across models: an adversarial example generated for one model can also fool another model. The transferability of adversarial examples enables black-box attacks, which raise more serious security concerns for real applications. Also, adversarial examples can have cross-data transferability, which is known as universal perturbation.
02:46
There are some limitations for practical black-box attacks: the existing attack methods cannot attack a black-box model efficiently. FGSM can fast generate adversarial examples with good transferability, but its linearity assumption may not hold for large distortions, which lowers the efficacy of white-box attacks, so the success rate of black-box FGSM attacks is low. The iterative methods have almost no transferability, because they greedily move the adversarial example along the gradient direction to maximize the loss function, which may overfit the model's decision boundary.
03:31
So the trade-off between white-box attack ability and transferability makes black-box attacks less effective. We ran experiments attacking Inception v3 with different numbers of iterations, and we measured the success rates of the white-box attack against Inception v3 as well as black-box attacks against Inception v4 and ResNet-152. We observe that the black-box success rates decrease as we increase the number of iterations. Another way to attack a black-box
04:03
model is to train a substitute model to characterize the behavior of the black-box model. But this requires full prediction confidences and tremendous queries, and it is hard to deploy for models trained on large-scale datasets. We also could not use it in the competition, because we were not allowed to query the defense models. So our solution is to alleviate the trade-off between transferability and attack ability. We noticed that generating an adversarial example can be regarded as a constrained optimization problem, so we can apply useful techniques from optimization to adversarial attacks, such as the momentum method. Momentum was originally adopted to accelerate gradient descent: it helps to escape from poor local optima in SGD, and it also helps to stabilize the update directions. So we add momentum to the iterative methods, giving the momentum iterative fast gradient sign method. The algorithm is very simple: in each iteration it calculates the gradient of the loss function with respect to the input, accumulates a velocity vector in the gradient direction across iterations, and applies the sign of the velocity vector to the example. The decay factor controls how much we trust the previous gradients, and the current gradient is normalized by its L1 norm, because we noticed that the scale of the gradients in different iterations varies in magnitude. Here are some results. We attack Inception v3 and test on several other models. The success rate of MI-FGSM, which is our method, in the white-box setting is near 100%, like iterative FGSM, but it fools the black-box models with a much higher success rate. We also study the effect of the number of iterations on black-box attacks: we again attack Inception v3 and measure the success rates against several black-box models. We can see that the success rates for attacking a black-box model do not decrease when we increase the number of iterations, so our method can in some sense alleviate the trade-off between attack ability and transferability. Another thing that is crucial for boosting black-box attacks is to attack ensemble models.
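The MI-FGSM update loop described above can be sketched as follows; `grad_fn` is a hypothetical callback returning the loss gradient at the current input (in the real attack it would come from backpropagation through the target network), so this is an assumption-laden sketch rather than the authors' code:

```python
import numpy as np

def mi_fgsm(x0, grad_fn, eps, n_iter=10, mu=1.0):
    """Momentum Iterative FGSM (sketch). Accumulates a velocity vector g
    across iterations with decay factor mu; each gradient is normalized
    by its L1 norm because gradient scales vary between iterations."""
    alpha = eps / n_iter           # per-step size, so total distortion stays within eps
    x = x0.astype(float).copy()
    g = np.zeros_like(x)
    for _ in range(n_iter):
        grad = grad_fn(x)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)
        x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)  # stay in the L_inf ball
    return x

# Toy usage: a constant-gradient "loss" drives every step in the same direction.
x_adv = mi_fgsm(np.zeros(4), lambda x: np.ones_like(x), eps=0.2, n_iter=10)
```

Setting mu = 0 recovers the plain iterative FGSM, which makes the role of the momentum term easy to isolate in experiments.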
The basic assumption is that if an adversarial example remains adversarial for multiple models, it is more likely to be misclassified by other black-box models. So we propose to attack an ensemble of models whose logits are fused together. We also compare the results with alternative ensemble
07:01
schemes, such as attacking an ensemble of models whose predictions or whose losses are fused together. The next slides show the results of
07:09
attacking ensemble models. Ensembling in logits consistently outperforms ensembling in predictions and ensembling in losses, and applying the momentum iterative fast gradient sign method to ensemble models achieves very high success rates for black-box attacks. For example, when generating adversarial examples against an ensemble of Inception v3, Inception v4 and ResNet-152, the examples fool a held-out Inception model with very high success rates. That finishes the first work. The second work is joint work with Chao Du and Professor Jun Zhu. In this work we propose a new network architecture which can return robust predictions in the adversarial setting. We name the new network the Max-Mahalanobis linear discriminant analysis network, abbreviated as the MM-LDA network, and this work was published at ICML this year. Our first motivation is that almost all state-of-the-art networks suffer from adversarial attacks: human-imperceptible perturbations can mislead state-of-the-art networks. Our second motivation comes from the technical fact that a typical feedforward deep net consists of a nonlinear transformation part, which maps the input to a hidden feature, and a linear classifier part, which acts on the hidden feature to give the output. However, most of the work focuses on designing improved nonlinear transformation parts, like VGG, ResNet and so on; by contrast, the linear classifier part is underexplored and is usually, by default, a softmax regression. So our goal is to design a new network architecture for better performance in the adversarial setting. To achieve this, we decided to substitute a new linear classifier part for the softmax regression. Our method comes from two inspirations. The first inspiration comes from Efron et al.: they showed that if the input distributes as a mixture of Gaussians, then linear discriminant analysis, abbreviated LDA, is more efficient than logistic regression; more efficient means that LDA needs less training data to obtain a certain error rate.
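The "ensemble in logits" fusion can be sketched in a few lines; the equal weighting and the toy logit values are illustrative assumptions, not values from the talk:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def ensemble_in_logits(logits_list, weights=None):
    """Fuse K models by a weighted average of their logits, then apply
    softmax once to the fused logits -- in contrast to averaging each
    model's softmax outputs ('ensemble in predictions')."""
    L = np.stack(logits_list)
    if weights is None:
        weights = np.full(len(logits_list), 1.0 / len(logits_list))
    fused = np.tensordot(np.asarray(weights), L, axes=1)
    return fused, softmax(fused)

# Two toy "models" voting over 3 classes.
fused, probs = ensemble_in_logits([np.array([2.0, 1.0, 0.0]),
                                   np.array([0.0, 3.0, 1.0])])
```

The attacker then backpropagates through this fused loss, so one gradient step accounts for all member models at once.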
However, in practice, data points hardly distribute as a mixture of Gaussians in the input space. This leads to the second inspiration, which comes from the fact that neural networks are powerful generative models: it has been demonstrated that a deep net can learn to transform a simple distribution, for example a mixture of Gaussians, into a complex distribution. So it seems the reverse direction should also be feasible, and this is actually what our method does with its nonlinear transformation part, with some analysis in our paper. In our method, we model the feature distribution as a mixture of Gaussians and apply LDA on the features to make predictions. Now a naturally raised question is how to choose the Gaussian parameters. Other models that treat the feature distribution as a mixture of Gaussians usually treat the Gaussian parameters as extra trainable variables; by contrast, we treat them as hyperparameters calculated by our algorithm, which can provide theoretical guarantees on the robustness. The induced mixture of Gaussian model is named the Max Mahalanobis distribution, abbreviated as MMD. Intuitively, the MMD maximizes the minimal Mahalanobis distance between any two Gaussian components; maximizing the minimal distance means that samples in different classes are separated the most. For example, when the distribution is an MMD with three classes, the means of the MMD are the three vertices of an equilateral triangle. We then analytically relate the MMD to robustness. We first give the definition of robustness: the robustness of a point with a given label is defined as the minimal distance to an adversarial example with a different label, and we further define the robustness of the classifier as the minimal expected local distance.
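For the three-class case mentioned above, the MMD means form an equilateral triangle, which a small sketch can verify numerically. The 2-D embedding and the unit radius are illustrative choices here, not the paper's formula for the optimal magnitude of the means:

```python
import numpy as np

def mmd_means_3class(radius=1.0):
    """Means of a 3-class Max-Mahalanobis distribution in 2-D: the
    vertices of an equilateral triangle centered at the origin, so the
    minimal pairwise distance between class centers is maximized."""
    angles = np.array([0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0])
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

mus = mmd_means_3class()
pairwise = [np.linalg.norm(mus[i] - mus[j])
            for i in range(3) for j in range(i + 1, 3)]
# all three pairwise distances are equal (sqrt(3) * radius)
```

Because every pair of centers is equally far apart, no pair of classes is a "weak link" that an attacker can cross with a smaller perturbation than the others.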
12:18
We derive the relation between the expected local distance and the Gaussian parameters through the Mahalanobis distance between two Gaussian components with labels i and j in the mixture of Gaussians. Furthermore, the robustness can be approximately represented in a simple form, and recalling the property of the MMD, we can conclude that when the feature distribution is an MMD, the approximate robustness is maximized. Finally, the experimental results. First, on normal examples, our method achieves comparable performance with typical networks; note that here we do not specially fine-tune any hyperparameter for our method. The t-SNE visualization on the test set of CIFAR-10 shows that our method results in much better clustering in the feature space. Here we show the results under L-infinity attacks with three different values of the perturbation budget, and the results show that our method can largely improve accuracy compared to adversarial training, with much less computational cost. We also test optimization-based attacks; the results show that the attacker has to add much larger perturbations to successfully attack our method. In this figure, the first row shows the normal examples, the second row shows the adversarial noises crafted on the traditional softmax regression (SR) networks, and the third row shows the noises crafted on our method. Our method can learn more robust features, such that the optimal attacking strategy against our method is to weaken the pixels of the normal examples as a whole, rather than adding small local noises as for the SR networks. Besides, our method has better performance on unbalanced datasets, which may be helpful in other areas like fairness. Finally, in conclusion: our method does not introduce extra computational cost; it can largely improve robustness with no loss of accuracy on normal examples; it is quite easy to implement, with only a few lines of code; and finally, it is compatible with nearly all popular networks. Thank you.
15:10
[Applause] Q: What is the Mahalanobis distance? A: So a Gaussian distribution has a covariance matrix and it has a mean. The Mahalanobis distance actually has a Wikipedia page; it is a kind of distance, just like the Euclidean distance, and it can measure the similarity of two Gaussian distributions. Any other question? Yes. Q: Did you use this in the competition? A: Actually we wanted to use this, but we kept it somewhat secret because we need to attend another contest, so I can only say something about it. It is not very scalable; I mean, if you want to scale it to a very large dataset like ImageNet, you need a lot of tricks for fine-tuning the training hyperparameters, like learning rates, and we are not very good at that, so we only tested it on some small datasets like CIFAR-10 and MNIST. Personally, an advantage I see is that we do not need to tune any hyperparameters for our method, and it can get state-of-the-art performance. Yes, so any other question? Okay, thank you. [Applause]
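The Mahalanobis distance explained in the answer above can be sketched directly; a shared covariance matrix is assumed, and with the identity covariance it reduces to the ordinary Euclidean distance:

```python
import numpy as np

def mahalanobis(mu_i, mu_j, cov):
    """Mahalanobis distance between the means of two Gaussian components
    that share the covariance matrix cov."""
    diff = np.asarray(mu_i, float) - np.asarray(mu_j, float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# With the identity covariance this is just the Euclidean distance.
d = mahalanobis([0.0, 0.0], [3.0, 4.0], np.eye(2))   # -> 5.0
```

Scaling the covariance up shrinks the distance, reflecting that two wide Gaussians with the same means overlap more and are harder to tell apart.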