AI VILLAGE - Beyond Adversarial Learning – Security Risks in AI Implementations


Formal Metadata

CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract

A year after we discovered and reported a number of CVEs related to deep learning frameworks, many security and AI researchers have started to pay more attention to the software security of AI systems. Unfortunately, many deep learning developers are still unaware of the risks buried in AI software implementations. For example, by inspecting a set of newly developed AI applications, such as image classification and voice recognition, we found that they make strong assumptions about the input format used for training and classification. Attackers can easily manipulate the classification and recognition results without putting any effort into adversarial learning. In fact, the potential danger introduced by software bugs and lack of input validation is much more severe than a weakness in a deep learning model. This talk will show threat examples that produce various attack effects, from evading classification, to data leakage, and even to whole-system compromise. We hope that by demonstrating such threats and risks, we can draw developers' attention to software implementations and call for a collaborative community effort to improve the software security of deep learning frameworks and AI applications.
...he's come here to talk about Beyond Adversarial Learning: Security Risks in AI Implementations, so he's going to take it away.

Okay, a second introduction. Welcome to my session. I'm going to talk about something probably different from the other talks in the AI Village, because I'm going to talk about the security part; I'm a security person. This is joint work with a couple of people from Qihoo 360 and a colleague from the University of Virginia. There's a picture on the slide; there is no correlation between the authors and the picture. We had a paper published at the IEEE Security and Privacy workshop on Deep Learning and Security in May, so this is pretty much the same talk; I hope you weren't there, so that this is something new to you. Okay, so first, a
brief introduction about myself: I'm a professor from the University of Georgia. In the past my work was mostly on the software security side. I organized a lot of capture-the-flag teams, and some of them are well known, like SecDawgs and disekt; I'm also a founding mentor of the Blue Lotus team. One of our teams was a finalist in the Cyber Grand Challenge two years ago; we were one of the seven. Recently, actually right now, I'm taking leave from the university and doing consulting work in industry. Okay, so now let's go back to the main
topic. I'm not going to talk about machine learning algorithms; I'm actually going to talk a little bit about deep learning implementations. How much time do we have? Okay, I'll try to be brief. So we'll start with a simple example. You've all seen this: if you work on deep learning and open a textbook, this is the first example you see, MNIST digit recognition. You take a picture that has a digit in it, and it recognizes the digit. My question for you is: how many lines of code does it take to implement this? It's not long. Roughly, you need a network definition and you need a solver; in total, I always say, it's less than 300 lines. Now, how can you do all this magic in such a small piece of code? Because most current deep learning applications are not built from scratch: you use a framework. There are lots of common frameworks, like TensorFlow, Caffe, and Torch, and you might use some wrapper language, but at least for the people I talk to, nobody builds this from scratch. And when we discuss implementation errors in machine learning, or when you look at adversarial machine learning, you're really talking about the top layer: your model, your parameters, sometimes your training data. We don't talk much about what's underneath. But the underneath part actually matters.
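To make the "less than 300 lines" point concrete, here is a hypothetical sketch (not the speaker's code, and deliberately framework-free) of a tiny digit classifier in plain numpy, trained on synthetic data standing in for MNIST. With a framework like Caffe or TensorFlow the equivalent is shorter still, since the network and solver are declared rather than hand-written:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_digit_classifier(X, y, n_classes=10, lr=0.5, epochs=200):
    """Single-layer softmax classifier trained by gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W + b)            # predicted probabilities
        grad = P - Y                      # d(cross-entropy)/d(logits)
        W -= lr * (X.T @ grad) / n
        b -= lr * grad.mean(axis=0)
    return W, b

def predict(W, b, X):
    return softmax(X @ W + b).argmax(axis=1)

# Synthetic stand-in for MNIST: ten Gaussian blobs in 64 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(10, 64))
y = rng.integers(0, 10, size=500)
X = centers[y] + 0.3 * rng.normal(size=(500, 64))
W, b = train_digit_classifier(X, y)
acc = (predict(W, b, X) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The point stands either way: the classifier itself is tiny, and the real bulk of code lives in the framework and its dependencies underneath.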
So, if you're in this area, you've probably looked at a lot of cat pictures, soft kitty pictures, right? This is one example that comes with Caffe: you download it and you can run it easily, assuming you have a model from somewhere else; you give it a picture and it produces some results. Now, I did one additional thing with this particular one. Since it's built as a Linux application, I ran ldd on it. For the people here who do security, you know what this does: it shows all the library dependencies of that application. In our particular case, the code that recognizes the cat is not long, but it depends on 137 libraries. That's a lot of libraries, right?
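For readers who have not used it, ldd lists the shared libraries a Linux binary loads. The output below is an illustrative, truncated excerpt (not the actual 137-entry list from the talk), with a small sketch of how one might count the resolved dependencies:

```python
# Illustrative, truncated ldd output; on a real system you would run
# something like `ldd $(which caffe)` and capture its stdout.
sample_ldd_output = """\
\tlinux-vdso.so.1 (0x00007ffd4b5f2000)
\tlibcaffe.so.1.0.0 => /usr/lib/libcaffe.so.1.0.0 (0x00007f1a2c000000)
\tlibopencv_core.so.3.2 => /usr/lib/libopencv_core.so.3.2 (0x00007f1a2b800000)
\tlibprotobuf.so.10 => /usr/lib/libprotobuf.so.10 (0x00007f1a2b400000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1a2b000000)
"""

def count_dependencies(ldd_output: str) -> int:
    """Count resolved shared-library lines ('name => path') in ldd output."""
    return sum(1 for line in ldd_output.splitlines() if "=>" in line)

print(count_dependencies(sample_ldd_output))  # prints 4 for this excerpt
```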
I mean, we did some not-really-scientific calculations, and here are some numbers I got. I did this in 2017, so it may be old, but it gives you a rough idea. I counted the lines of code in Caffe, TensorFlow, and Torch themselves. I say it's not scientific because some of them are implemented in C++ and some in Python, so the line counts aren't directly comparable, but roughly you see a few hundred thousand lines of code in each of these frameworks. In addition, they all depend on a bunch of libraries, and those may add another few hundred thousand lines of code; there are a lot of packages. To give a few examples for Caffe: there's libopencv, libprotobuf, and so on. Lots of packages come in, because you need some library to parse the model or the parameters, and you don't want to implement that from scratch. Well, when you have this kind of complex dependency, people in security get really excited; we're happy, right?
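A "not really scientific" line count like the one described can be sketched as follows; the extension set and the idea of counting raw lines are my assumptions, not the speaker's exact method:

```python
import os

# Rough line-of-code counter: walk a source tree and count lines in
# source files. As the speaker notes, mixing C++ and Python counts is
# only a crude proxy for complexity, but it gives a rough idea.
SOURCE_EXTS = {".c", ".cc", ".cpp", ".h", ".hpp", ".py", ".cu"}

def count_loc(root: str) -> int:
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    total += sum(1 for _ in f)
    return total
```

Running this over a framework checkout and then over each package that ldd reported would reproduce the kind of numbers shown on the slide.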
So we did some work. In one summer, while I was collaborating with Qihoo 360, this is the number of CVEs we found related to all these underlying packages, and we actually found more later that I didn't put here; these slides were made for the paper published early this year, and it was written last year. Then I put a picture on the slide, and it says "artificial intelligence is no substitute for natural stupidity." It originally had a different meaning, but I want to use it to say this: in AI we talk about so many great things, but if you have a careless programmer implementing a brilliant algorithm, then you have trouble. Now, some people will say, "okay, but where are these bugs?" Let me
give you one example. This is hard; I didn't expect such a small screen, but I'm going to try to read it out. This is a screenshot from the Caffe GitHub repository, of their image data layer. If you click the link to the CPU implementation, this is the code, and there is an include that points to OpenCV. Which means that even though OpenCV is just a dependent package, every time you use Caffe to write an image-based deep learning application and read an image through this layer, you are calling this code, and this code is calling OpenCV to parse the image.
Again, I don't expect you to read the code, so I'll give you a rough idea. I picked one of the CVEs we found. What this code does is read a picture and try to parse it, and the highlighted part is actually controlled by the input, that is, by the image file itself. This piece of code tries to read the image into a layer: first it reads some parameters from the image header, which usually give the size of the image, and this particular one controls the color palette, depending on how many colors you have. Later they use this number in a memory copy, and if the attacker supplies the wrong number, there is a heap overflow.
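The flaw can be illustrated with a toy reconstruction (this is not the actual OpenCV code; the field layout and names are invented): a fixed-size palette buffer, a color count read straight from the attacker-controlled header, and a copy bounded by that count instead of by the buffer.

```python
import struct

PALETTE_ENTRIES = 256          # fixed-size palette buffer, as in BMP parsing

def parse_palette_unchecked(data: bytes):
    """Toy analogue of the flawed code path: trust the header's
    declared color count and copy that many entries."""
    (count,) = struct.unpack_from("<I", data, 0)   # attacker-controlled
    palette = [0] * PALETTE_ENTRIES
    # Equivalent of the unchecked memcpy: in C this writes past the end
    # of the heap buffer; Python raises IndexError instead of corrupting
    # memory, so the toy only demonstrates the logic error.
    for i in range(count):
        palette[i] = struct.unpack_from("<B", data, 4 + i)[0]
    return palette

# A "file" that declares 300 palette entries while the buffer holds 256.
evil = struct.pack("<I", 300) + bytes(300)
try:
    parse_palette_unchecked(evil)
except IndexError:
    print("out-of-bounds write: a heap overflow in the C version")
```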
We reported the bug in 2017, and the OpenCV developers actually fixed it; this is the patch for it. You can see it is really only one line, because the two blue lines at the bottom are the same as the two red lines; what they added is the CV_Assert at the top, a line that says the color count must be less than 256, because, as I told you, the input could be larger than 256, and that's where the heap overflow was. The reason I bring this up is that, again, programmers make mistakes: this patch is not complete. Unfortunately, they only considered values that fail the "less than 256" check; what if I give it a negative number, which then becomes a very large number when it is used as a size? So bugs are still there. I tried to convince them that there was still a bug, and the response was basically, "I don't care, this is really not my problem." So I had to go further to show them the problem.
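The incomplete patch can be sketched the same way (again a toy with an invented layout): if the C code reads the count as a signed integer, a check equivalent to "count < 256" passes for a negative value, which then becomes an enormous length once it is used as an unsigned size_t.

```python
import struct

def copy_length_after_patch(header: bytes):
    """Mimic the one-line fix combined with a signed read.

    Returns the length the subsequent copy would use, or None if the
    added check rejects the input.
    """
    (count,) = struct.unpack_from("<i", header, 0)  # signed 32-bit read
    if count >= 256:
        return None               # what the added CV_Assert-style check catches
    # A negative count passes the check, but once it is used as an
    # unsigned size (size_t in C) it becomes a huge copy length.
    return count & 0xFFFFFFFF

print(copy_length_after_patch(struct.pack("<i", 300)))  # None: rejected
print(copy_length_after_patch(struct.pack("<i", -1)))   # 4294967295: bypassed
```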
So, again, go back to this kitty case. We take a picture and give it to this Caffe model, the example one that classifies images. To make sure I'm not cheating, I downloaded the model from the Berkeley website; they claim it was trained on the ImageNet competition data using the GoogLeNet model, so I used theirs. If you give it my kitty picture, it tells you what kind of cat it is: different categories with probabilities. So I took that program and crafted four pictures, because I needed to convince the developers there was something to fix. These are the four pictures shown on the slide. The top-left one, the bulldog, I grabbed from the internet, and the reason I grabbed a bulldog picture is that, if you follow college football, a bulldog is the mascot of our university's team. Then I messed up the other three. Visually, if you sit close, you can see something is slightly different, but I'm not doing adversarial machine learning here; I'm not touching the pixels. What happened is that I messed with the metadata: I lied about how many colors the image uses, which is why it looks slightly different, but the picture content is the same. I only messed up the metadata. This is the result of the classification.
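The metadata tampering can be sketched like this; it is a hypothetical reconstruction rather than the exact files from the talk. In a BMP file, the biClrUsed field at byte offset 46 declares how many palette colors are used; rewriting only that field changes what the parser believes without touching a single pixel.

```python
import struct

def make_minimal_bmp(width=2, height=2):
    """Build a minimal 24-bit BMP (file header + info header + pixels)."""
    row = b"\x00\x00\xff" * width                 # red pixels, BGR order
    row += b"\x00" * ((4 - len(row) % 4) % 4)     # rows pad to 4 bytes
    pixels = row * height
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                       0, len(pixels), 2835, 2835, 0, 0)
    header = struct.pack("<2sIHHI", b"BM", 54 + len(pixels), 0, 0, 54)
    return header + info + pixels

def lie_about_colors(bmp: bytes, fake_count: int) -> bytes:
    """Overwrite biClrUsed (byte offset 46) with an attacker-chosen value."""
    return bmp[:46] + struct.pack("<I", fake_count) + bmp[50:]

original = make_minimal_bmp()
tampered = lie_about_colors(original, 0xFFFFFFFF)
# Only four metadata bytes differ; the pixel payload is byte-identical.
print(original[54:] == tampered[54:])   # True
```

The exact file format used in the talk is not stated, but the structure of the trick is the same: lie in the header, leave the content alone.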
Again, I don't expect you to read it on YouTube, so I'm going to use images to show what happened. This is the same program, using Berkeley's model with Google's architecture, and these are the results. The initial picture is classified as a bulldog, because that's the real picture I got from the internet, and I have to say the machine learning algorithm did pretty well there: it's a real picture, and it's classified correctly. When I talk to AI experts, one thing that always surprises me is that they assume the data comes from a natural place, that you took the picture with your phone. Well, no. The second one, as you can imagine, is the
picture where I messed up the metadata: segfault. No surprise, right? I have a heap overflow, so of course it segfaults; that's easy to understand. But I decided to push it a little further, and we put shellcode in it. This time I got to choose the output. I imagined what this bulldog would want to be classified as; I imagined it wants to fly, so I decided to have it classified as a "flying pig." That's the result. Again, this is not adversarial machine learning; I'm not trying to turn a panda into a gibbon by messing with the pixels. The output is generated by me: my code is actually running, so I can make it say anything. I could make a panda become anything at all. To go
further: in case you didn't see it, there's a shell prompt there. I let the Caffe program take my picture, and it spawned a shell; basically, with the picture I showed, I own you. We even did this in a cloud environment. I don't have it in the slides, but you can imagine: you're running a classification service, and being lazy, you just copied the default example application from Caffe or wherever and ran it as a cloud service. This same code can own your cloud: through that interface, I can get a shell back. We have a demo of that; I presented a video at the POC conference. So, to summarize:
there are lots of other bugs; I only showed you a couple from Caffe, so don't assume the others are better. We have TensorFlow bugs too. Overall, what I want to tell the audience is this: deep learning frameworks and applications depend on a lot of third-party packages, and you need to be careful about them. Complexity leads to vulnerabilities: if you have a large, complex program, you're going to have trouble somewhere, even if your algorithm is perfect. As for the risks to applications, here I showed that I can do denial of service against a service (the segfault), I can get system compromise, and I can certainly cause misclassification, the evasion attack, which I also showed. That's pretty much all I wanted to cover. There are other examples I can show you offline; feel free to contact me.
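One practical takeaway (my suggestion, not a fix presented in the talk): treat model inputs as untrusted files, and cross-check the declared metadata against the actual payload before handing them to a heavyweight parser. A minimal sketch for BMP inputs, with bounds chosen as illustrative assumptions:

```python
import struct

MAX_DIM = 1 << 14       # reject absurd declared dimensions (assumed bound)
MAX_PALETTE = 256       # a BMP palette holds at most 256 entries

def bmp_looks_sane(data: bytes) -> bool:
    """Cheap pre-validation of an untrusted BMP before parsing it with a
    heavyweight library: check the magic bytes, the declared file size,
    the dimensions, and the palette count discussed in the talk."""
    if len(data) < 54 or data[:2] != b"BM":
        return False
    declared_size = struct.unpack_from("<I", data, 2)[0]
    width, height = struct.unpack_from("<ii", data, 18)
    clr_used = struct.unpack_from("<I", data, 46)[0]
    if declared_size != len(data):
        return False
    if not (0 < width <= MAX_DIM and 0 < abs(height) <= MAX_DIM):
        return False
    if clr_used > MAX_PALETTE:
        return False
    return True
```

A check like this is not a substitute for fixing the underlying parsers, but it narrows the attack surface that a lazy "copy the default example" deployment exposes.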