AI Village - Machine Learning Static Evasion Competition

Video in TIB AV-Portal: AI Village - Machine Learning Static Evasion Competition

Formal Metadata

Title: AI Village - Machine Learning Static Evasion Competition
Alternative Title: Competitions in InfoSec/ML
Title of Series:
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date:

Content Metadata

Subject Area: Presentations from DEF CON 27 AI Village
Awesome. For the final talks of the day, we have Hyrum Anderson and Ryan Cove on competitions in infosec and ML. Really quick, we also want to remind you that afterwards we have the AI unwind, where you'll be able to play around with some of the competitions and data that they'll be presenting today. So, without further ado, can we get a big round of applause for them?
Thanks. There are two of us; I'm pleased to be speaking with Ryan Cove. We're announcing two different competitions, so I'll be as fast as possible to leave as much time for Ryan as necessary, and I have a plane to catch. My name is Hyrum Anderson; I'm the chief scientist at Endgame. I also want to recognize Zoltan Balazs, who is a co-author of this work and has actually done most of the legwork to cross the finish line, to provide to you, very relevant to recent news, a competition for evading next-gen antivirus. We'd like to launch, right now, a 10-week competition. The skinny is this: you will download malware, modify its raw bytes, and have it evade up to three machine learning models. When you submit the malware, it must remain functional, and if you do, you will win the best GPU that money can buy for a PC. Okay, so that's the skinny. I want to break down in detail what this competition is about. We totally encourage anybody and everybody to join this competition, and there are really good reasons to do so.
So first, machine learning static evasion. Let me get into a bit more detail. The competition is about static detection of malware using machine learning, and all of the malware samples that you'll be provided are scored as malicious by the machine learning models. You'll need to come up with techniques to modify the binary in a way that does not change its behavior or break the PE file format of a Windows executable file. For example, as shown on the slide, you might unpack a packed binary, rename a section, add a new section, and attempt to disguise the malware as benign. This is an adversarial machine learning competition, and there have been many of these in the image domain, but I'll just note that you'd likely want to leave your Lp-norm constraints at home; this is not the same as a computer vision competition. Some of those techniques could be useful to you, but to be pedantic, images degrade gracefully: it's very easy to add noise to an image and have it still remain an image. But if you change bytes in a file, you'll very likely break the file, or at least break its functionality. So the threat model here is different, and the constraints for what you're preserving are very different. For this contest you'd be welcome to still follow gradients in the models, but the chances are that if you naively perturb byte values, you'll be unsuccessful.
The rules, briefly. Step one: you will need Google creds to register at evademalwareml.io. Once you have registered, you will also agree to Terms of Service. They are simple; they essentially say this is real malware, please be careful, and any monster malware that you create is your creation and you are responsible for it. Okay. You will download 50 malware samples. The samples represent popular families over the last 12 to 18 months, including those that have targeted financial industries and things like that. When you download those samples, you may also download a full description of three completely open source machine learning models; I'll describe those in a bit more detail. The next step would be for you to do your thing, and the intent is that this will work best if you do it on your own local resources. You can submit any time, but there's a time delay when you submit, so it's best to check and do things locally. The next step, after you've modified your samples, is to make sure that they still work; you will get no points for samples that do not work. Lastly, when you've verified that they work, you'll put them back in a zip file and upload them to the website at evademalwareml.io. You earn points for evading: for every sample there are three models, and you get one point for evading each model. So, with 50 samples and three models, the maximum score is 150 points, and you could win with 150 points so long as you make your submission before somebody else who might also reach 150 points.
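The scoring rule above is simple enough to sketch. The function and data shapes here are my own illustration, not the site's actual code: one point per model evaded per sample, and zero for any sample that fails the functionality check.

```python
def score(results):
    """results: dict mapping sample name -> (models_evaded, still_functional).
    A sample earns one point per model it evades, but only if it still runs."""
    return sum(evaded if functional else 0
               for evaded, functional in results.values())

# Best case: 50 samples, each evading all 3 models and staying functional.
full_sweep = {f"{i:03d}": (3, True) for i in range(1, 51)}
assert score(full_sweep) == 150  # the maximum score
```

A non-functional sample contributes nothing no matter how many models it fooled, which is why verifying behavior locally matters.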
Right. And just a note: we would have loved to host this on Kaggle, except they don't really do sandboxes or malware. So instead, following Kaggle's lead, to qualify for the GPU we would want you to publish your solution and share it with the community. Those are the basic rules. I want to go through a few tips and tricks to help get you started, for those who would be interested in participating. The first is that when you download these samples, they will have names like 001, 002, and 003, and when you muck with them and upload them, they should retain those names; otherwise we have no idea which sample is which, because you've changed the bytes. Okay. You download a zip file, and you should upload a zip file as well.
When you upload the zip file, there are a few gatekeepers it will go through. First, there are sanity checks: does it have the right file names? Is it a PE file, or did you somehow mess up the MZ header? Does it correspond to known hashes? If it passes that, it will be analyzed by the three machine learning models, which I'll describe in a moment: MalConv, non-negative MalConv, and EMBER. All three are open source models, and there are papers about each of them. If the sample evades any one of those three models, it will go into a sandbox phase, where we will run it on our server and verify that the sample still exhibits its malicious behavior. As an example, look at the bottom of this graphic. When you upload a file, say 001, it might look like it evaded two machine learning models, but if it did not run in the sandbox, it is not functional: zero points. Sample 002 evaded one machine learning model and did its thing in the sandbox: one point. And so on. Now, the models:
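The first of those gatekeepers, the sanity checks, is easy to replicate locally before burning a submission slot. A rough sketch, assuming the 001-050 naming scheme described above (the function and its details are my own, not the competition's code):

```python
import io
import zipfile

EXPECTED = {f"{i:03d}" for i in range(1, 51)}  # assumed naming: 001..050

def preflight(zip_bytes: bytes) -> list:
    """Mimic the server's first gatekeeper: right file names, and each
    entry still beginning with the 'MZ' magic of a PE file."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
        for missing in sorted(EXPECTED - names):
            problems.append(f"{missing}: missing from archive")
        for name in sorted(names & EXPECTED):
            if zf.read(name)[:2] != b"MZ":
                problems.append(f"{name}: MZ header damaged")
    return problems
```

The known-hash check the speaker mentions would be the same idea with `hashlib.sha256` against the published list of sample hashes.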
MalConv, the first model, is a simple convolutional neural network that operates on raw bytes: bytes in, score out. It consists of simply an attention network and a single-layer convolutional network, followed by a multi-layer perceptron, so this is a fully differentiable model, much like you'd see in computer vision. When you go to the GitHub site, listed there and also all over the website, you can download the inference code, the weights, the model structure: everything you need to construct any kind of white-box attack you might need to do. There's also a paper linked at the bottom, so you can read all about this model and everything about it. Okay.
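To make that "bytes in, score out" pipeline concrete, here is a tiny NumPy sketch of a MalConv-style gated convolution: embed each byte, apply a convolution gated by a sigmoid "attention" branch, global-max-pool, and squash to a score. The weights are random and the sizes are toy values; this illustrates the data flow only, not the released PyTorch model or its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration; the real model uses larger embeddings,
# much wider filters, and reads megabytes of the file.
EMB, WIDTH, FILTERS = 8, 4, 16

embed = rng.normal(size=(256, EMB))              # one vector per byte value
w_conv = rng.normal(size=(FILTERS, WIDTH * EMB))  # convolution branch
w_gate = rng.normal(size=(FILTERS, WIDTH * EMB))  # sigmoid "attention" gate
w_out = rng.normal(size=FILTERS)                  # final dense layer

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def malconv_like_score(data: bytes) -> float:
    """Assumes len(data) >= WIDTH. Embed, gated conv, max pool, dense."""
    x = embed[np.frombuffer(data, dtype=np.uint8)]             # (n, EMB)
    windows = np.stack([x[i:i + WIDTH].ravel()
                        for i in range(len(x) - WIDTH + 1)])   # (n-W+1, W*EMB)
    conv = windows @ w_conv.T
    gate = sigmoid(windows @ w_gate.T)
    pooled = (conv * gate).max(axis=0)                          # global max pool
    return float(sigmoid(pooled @ w_out))                       # score in (0, 1)

score = malconv_like_score(b"MZ\x90\x00" * 64)
assert 0.0 <= score <= 1.0
```

Because every step is differentiable, gradients with respect to the embedded bytes exist, which is what makes white-box attacks on the real model possible.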
By the way, all the code for these differentiable models is in PyTorch. The second model is non-negative MalConv, and it is identical to MalConv except for one thing: it has been trained with a non-negative weight constraint. Essentially, the purpose is that it should pay attention only to indicators of maliciousness, so that, ideally, it would be harder to fool by adding benign strings to your binary. The third model is EMBER 2.0. EMBER is so hot: EMBER 2.0 was released yesterday, and it is an improvement on EMBER 1.0 from last year. This is still totally open source; you'll have the whole model. The difference is that EMBER 2.0 is a very competitive model, and it is a LightGBM model, not differentiable, so you may have to come up with a different set of attacks than following gradients. One more thing: all three models were trained on the same data, the EMBER 2018 samples, so you'll also have a list of all the training hashes. You essentially know everything about how these models were created. All right:
a few tips. We do hope that you form teams and collaborate, but there's just one GPU, so you'd have to decide how to divvy up the cores if you're the winning team. Please do invest 30 minutes in checking out the code, the model, the inference code, and getting it to run locally; it will save you a ton of time not to rely on submitting things to the service. Do invest 30 minutes in setting up a local Windows 10 64-bit VM; that is the environment in which we'll be checking the functionality of your submitted samples. As always, be safe and be responsible: don't connect your VM to the Internet, things like that. Okay. When you do upload, it could take up to 30 minutes for us to validate the full suite of samples in your zip file, so it's to your advantage to do this offline on your own and then submit. Okay.
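That offline loop might look like the sketch below, where the three released score functions are stood in for by placeholder callables. Every name here is hypothetical; in practice you would plug in the MalConv, non-negative MalConv, and EMBER inference code that the speaker says to get running locally.

```python
def models_evaded(sample: bytes, models) -> int:
    """Count how many models the sample evades locally before submitting.
    `models` is a list of (score_fn, threshold) pairs -- placeholders for
    the released inference code, where score < threshold means 'benign'."""
    return sum(1 for score_fn, threshold in models
               if score_fn(sample) < threshold)

# Toy stand-ins for the three models: score = fraction of 0xCC bytes,
# "benign" below 0.5. Purely illustrative.
toy = [(lambda b: b.count(0xCC) / max(len(b), 1), 0.5)] * 3

assert models_evaded(b"\x00" * 10, toy) == 3  # evades all three stand-ins
assert models_evaded(b"\xCC" * 10, toy) == 0  # evades none
```

Only once this count is where you want it, and the sample still runs in your Windows 10 VM, is it worth spending a submission.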
So, more tips: some things you might consider as you're doing this competition. There are a number of things you can do that generally should not break the PE file format. You can append data to the file, as shown with the universal Cylance bypass. You can add or remove the signature. You can fix the checksum, which is often broken in malicious binaries. You can remove the version info. You can change code or data with no-ops. There are lots of things that you can do, and it's up to you to be creative about it. Things that aren't allowed, that won't work for you: you cannot make a dropper and drop the original sample; that will not be picked up in our sandbox. You cannot make a self-extracting archive; that will also not be picked up in our sandbox. And I will advertise LIEF: it is the tool we use in EMBER for instrumenting and modifying executable files. It's a fantastic thing, with Python bindings, and I highly recommend it. You can do most of the tips I've listed here using LIEF alone.
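The first of those tips, appending data to the file, needs nothing more than raw bytes: data past the last section (the overlay) is not mapped by the loader, so behavior is preserved while the file's byte statistics, and many static features, change. A minimal sketch; the helper and the appended strings are my own illustration, not competition code:

```python
def append_overlay(pe_bytes: bytes, payload: bytes) -> bytes:
    """Append data after the end of the file (the 'overlay'). The loader
    ignores bytes past the mapped sections, so execution is unchanged."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ header)")
    return pe_bytes + payload

# Hypothetical stand-in for a real sample, plus a benign-looking string.
stub = b"MZ\x90\x00" + bytes(128)
modified = append_overlay(stub, b"Copyright (c) Example Corp\x00")
assert modified[:len(stub)] == stub   # original bytes untouched
assert len(modified) > len(stub)
```

Section-level edits, such as renaming or adding a section or stripping the signature, are better done through LIEF's PE parser and builder than by hand-patching offsets.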
The last tip I'll mention: there's a Slack channel, so if you're having trouble or want to share ideas, join the Slack channel, which is also listed on the website. We also encourage you to learn from the wisdom of others and stand on the shoulders of giants. Just Monday, a KDD adversarial ML workshop concluded that was also about malware, but theirs was import features only, so it was not live malware and there was no functionality requirement; still, there are things you might learn from those results. Also concluding this week was the Hack In The Box cyber challenge, which was slightly different: they created reinforcement learning agents to append to the malware source code and submit those samples. Again, that's a slightly different threat model, but there are things you might learn there that could help you. And that is it. I'm concluding in 12 minutes and 15 seconds. The competition opens exactly right now, and it closes 10 weeks from now, on Friday, October 18th. I'll just remind you: if you'd like to win the fastest GPU that money can buy for your desktop, you must be able to publish your solution. I'll be here, and Zoltan will be here; if you'd like to know more about the competition or its details, please hit us up. I hope you can get started tonight at the AI unwind right after this talk. All right, thank you.