
Face Off: Brute-force attack on Biometrical-databases


Formal Metadata

Title
Face Off: Brute-force attack on Biometrical-databases
Title of Series
Number of Parts
141
Author
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Magic happens every time you take your phone out of your pocket. Somehow, just by looking at the screen, your phone recognizes you (and only you) and magically unlocks. Have you ever stopped for a minute and thought to yourself - How does that even work? And maybe more importantly, how secure is it? In this session, we're going to understand how facial recognition works under the hood. We'll dive into some potential security problems, and we'll show you how we were able to break into a biometric database built on the Dlib-python-library by applying a sophisticated brute-force attack. The results will surprise you.
Transcript: English(auto-generated)
Hi, everyone. Let me start off with a few quick questions. Please raise your hand if you're using an Android phone. OK, OK, not too bad. Please raise your hand, all the iPhone users.
OK, we have some Apple fans here. Please raise your hand if you're using a Nokia phone. Wow, OK. Please raise your hand if you're using Face ID, or whatever the name of the equivalent Android technology is. Keep your hands up. And here's the interesting part: keep your hands up only if you think that your face is a better, more secure login mechanism than the old enter-a-few-digits passcode. OK, thank you very much. My name is Roy. I'm a security software engineer at OwnBackup. And I don't know about you, but when I first got this question, I thought to myself, that's a no-brainer. Of course my face is more secure
than a short combination of a few numbers, isn't it? Well, I saw that most of you agree with that. So now you really have to stick around till the end of the presentation. I promise you some really interesting results. Let's go a couple of years back down memory lane. I still remember when my dad, who's actually here today.
Hi, Dad. I remember when he bought his first iPhone. It was iPhone 3G, a pretty dumb smartphone by today's standards. But for me, as a child, who was fascinated by technology, that was something else, something
from a science fiction movie. And I remember messing around with it, downloading some apps, playing Angry Birds. And I remember this one stupid app that promised I would be able to unlock my phone using my fingerprint. All you had to do was place your finger on the screen, a cool animation of numbers flying around played, just like you see in the movies, and then, after a few short seconds, well, nothing.
Of course, the app couldn't do anything. I remember verifying it to myself by putting one finger, and then my nose, and checking if they had the same fingerprint. A couple of iPhones later, we did get a fingerprint sensor in the home button. And some Android devices even got a fingerprint sensor under the screen, just like this stupid app actually predicted.
In the meantime, I started my bachelor's degree in computer science. It was the obvious path for me. And it was around 2017, when it was time to choose the subject for my final project at university, that Apple introduced the new iPhone X with the new awesome Face ID.
And at that moment, I knew. I knew I had to dig around it. I knew I had to do it for that little child inside of me who tried to unlock his phone using his nose fingerprint. Today, I want to take you all on that journey with me. We will start off by creating a system just like Face ID and understanding it. Then, we will try to break it and see how secure it is.
The system seems pretty intuitive, right? All you have to do is take your phone out of your pocket, look at it, probably not as excited as this dude, and then some magic happens. Some black box that you always assumed has something to do with AI decides whether to keep the phone locked or unlock it for you.
Let's try to understand and think how we can implement this black box. We need to create a system that can take the face and create an identification vector out of it. That's why it's called Face ID. Let's think of a way to do it. We can use all sorts of features describing the face, maybe stuff like the length of the nose, the width of the face, and the skin tone.
We can use more complex, more unique features, stuff like maybe the ratio between the shade of the eye color and the shape of your lips. We need to decide which features exactly we want to use, and how many of them. We know that the usual magic number in the industry for such things is 128 features,
so let's go with that. So now, hopefully, we have a feature vector that describes the face pretty well. We can save it inside the phone and use it. So on the next day, when our user tries to unlock his phone, all we have to do is scan his face again, get the feature vector, and compare it to the one from yesterday. Well, it's not that easy. Faces are really hard to work with
because faces are always changing. We can take images from different poses, different angles, and different lightings. Our hair gets longer and then shorter when we take a haircut. Our eyes can get red and swollen after a sleepless night of coding, and even our noses get longer over the years. That's why we need a really strong solution here.
That's why we want deep learning. But before I continue talking about deep learning, I want to get everybody here in the room on the same page. I want us all to have the same terminology. So let's start off with the basics: machine learning. Machine learning is the name of a field in computer science where we want the computer, the machine, to find a solution by itself, without us giving it the solution,
without us coding it. We want the machine to learn the solution by itself. It usually involves a lot of mathematics and some ideas we borrow from different fields, like biology, evolution, and so on. I won't bore you with the mathematics behind it. That's not really the point of this talk,
but I will try to give you some intuition about how this stuff really works behind the scenes, and together we can understand the entire solution. The next big thing we need to talk about is neural networks. A neural network is the basic building block for everything we've seen in recent years in AI. But don't worry, it's simple enough
that even babies can understand it. The idea behind a neural network is that we're trying to create some sort of brain for our machine. This is not really how the human brain works. We're just borrowing some ideas from it, like neurons; our brain is a bit more complex than that. The idea here is we're having those nodes,
neurons if you will, containing some information. And by passing that information from one node to another and having a huge web of neurons, our brain here can actually learn stuff. We're creating a network out of it. We're ordering our neurons in layers. We've got ourselves the input layer here in blue and the output layer here in purple.
And the network is built in such a way that each neuron has some effect on all the neurons in the next layer. This effect can be stronger or weaker, positive or negative. Still doesn't make a lot of sense? Let's try to make a bit more sense of it with an example.
Let's say your boss comes in tomorrow morning and says, you need to solve a problem. You need to identify if what you see in front of you is a python. No, not the programming language, but a real, can-kill-you snake python. So for this life-threatening task, you choose to use a neural network. You've got yourself two inputs. You can see the color of the snake. That's pretty easy to see.
And because the snake is coming toward you, trying to bite you, you can see the length of its teeth. So we've got ourselves the following network, and we need to train it. We provide a lot of examples of different snakes that we collected over the years, and the network adjusts by itself the weights, the connections between the neurons. We might, by the end of the day, find a network that looks a bit like this.
This still doesn't make a lot of sense. Okay, this is a safe space, because this is a Python conference, so I feel like I can share it. We as developers have a secret: we're really good at coming up with explanations for why our code succeeded. So let's try to find some reasoning here. We might say that the first neuron here
is some sort of poisonous detector, or venomous detector. I actually showed this presentation to a friend of mine a couple of days ago. He's apparently really into snakes. So he told me that poisonous and venomous are not really the same thing, but shut up, that doesn't really matter here. So we have the poisonous detector here.
We can see some strong positive correlation between the length of the teeth of the snake and the poisonous detector. That makes a lot of sense. If you think about it, snakes inject their venom into the victim using their teeth. So longer teeth mean that the likelihood of the snake in front of us being poisonous is high. We can also see there is some correlation
between the color of the snake and the poisonous detector. If I ask you right now to imagine a poisonous snake in your head, can you do it? You're probably seeing a really bright green snake in front of you right now. So if our brain found some correlation between the color and poisonousness, it makes sense that the neural net will find it as well.
We might say that the second neuron is some sort of size indicator. It gives us some indication of the size of the snake. And the third neuron is some sort of pattern detector. We know, from an evolutionary standpoint, that snakes with darker colors and shorter teeth needed to evolve some different defense mechanism, so they probably evolved warning patterns on them.
They need to repel their enemies in some other way. So this network, which never learned anything about evolution or other things, was still able to find three different cool deductions about snakes. So now, when the network tries to decide if the snake in front of us is actually a python or not, it can just combine them together. If the snake in front of us is likely to be
a poisonous snake, a venomous snake, it's not likely to be a python. Pythons are not venomous snakes. On the other hand, if the snake in front of us is huge and has some cool patterns on it, it's probably a python. Okay, we've circled all the way back to deep learning. Deep learning is just the idea of taking a neural network
and adding more hidden layers. That's the name of the layers between the input and the output layers. The more hidden layers and neurons we've got in our network, the deeper we can go inside the learning process, and the more complex patterns we can find inside our data. So we can solve more difficult problems with that. A known example of that is MNIST, where we try to identify which handwritten digit
we see inside an image. I took this awesome animation, by the way, from the 3Blue1Brown YouTube channel, and I really recommend going to his channel if you want to learn more about the mathematics behind all of it. So we know what deep learning is, but there is still the open question of how we can use it for face recognition. Luckily, we have another trick up our sleeves.
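The snake network from before can be sketched in a few lines of code. This is a toy illustration only, not a model from the talk: the two inputs, the three "detector" neurons, and all the weights are made up by hand here to mirror the intuition, rather than learned from data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two inputs: tooth length (cm) and color brightness (0 = dark, 1 = bright green).
# Three hand-picked hidden neurons, loosely matching the talk's story:
# a "venom detector", a "size indicator", and a "pattern detector".
W_hidden = np.array([[ 2.0,  1.5],   # venom: long teeth + bright color
                     [ 1.0, -0.2],   # size: tooth length as a rough proxy
                     [-1.5, -2.0]])  # pattern: short teeth + dark color
b_hidden = np.array([-3.0, -1.0, 1.0])

# Output neuron: "is this a python?" -- venom argues against, size and pattern for.
W_out = np.array([-2.5, 1.5, 2.0])
b_out = -1.0

def is_python_score(tooth_length_cm, brightness):
    x = np.array([tooth_length_cm, brightness])
    hidden = sigmoid(W_hidden @ x + b_hidden)    # each neuron fires between 0 and 1
    return sigmoid(W_out @ hidden + b_out)

big_dark_snake = is_python_score(0.5, 0.1)   # short teeth, dark, patterned
bright_fanged  = is_python_score(3.0, 0.9)   # long teeth, bright green
```

In a real network these weights would be found by training on labeled examples; here they are fixed so you can trace the "poisonous detector" reasoning by hand.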
We can use a different architecture of network: an encoder-decoder network, sometimes also referred to as an autoencoder. The idea here is we're taking some information of high dimensionality, in this case six numbers, one, two, three, four, five, six, and the network is built in such a way that when we pass the information through the network,
we get the same information on the other side. We get back one, two, three, four, five, six. The trick of the network is that in the middle of it, we have a compressed layer, a layer with lower dimensionality, meaning our network here had to somehow learn some patterns inside the data, and now it's able to compress the information and decompress it later.
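The bottleneck idea can be sketched with a toy linear "autoencoder". This is not the deep network the talk uses for faces: it assumes the six numbers are secretly driven by only two underlying factors, and it finds the compression in closed form with an SVD instead of training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 "faces" of 6 measurements each, secretly generated from
# only 2 underlying factors -- so a 2-number bottleneck can capture them.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing

# A linear autoencoder found in closed form: the top-2 right singular
# vectors of the data play the role of the compressed middle layer.
_, _, Vt = np.linalg.svd(data, full_matrices=False)
encode = Vt[:2].T        # 6 numbers -> 2 numbers (compression)
decode = Vt[:2]          # 2 numbers -> 6 numbers (reconstruction)

codes = data @ encode              # the bottleneck representation
reconstruction = codes @ decode    # back to the original 6 numbers

error = np.max(np.abs(reconstruction - data))   # essentially zero here
```

The reconstruction comes back essentially exact because the data really had only two degrees of freedom; for faces, a deep nonlinear network plays the same role with a 128-number bottleneck.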
Let's try to think of it from a different point of view. We want to work with images of faces. Now, images are actually a combination of a few million pixels, but we also know from the old saying that a picture is worth a thousand words. So let's think of it as somewhere in between. We can imagine this network
as two friends playing a game of charades. The first friend is some sort of a poet. He sees the original picture and writes us a poem, using a thousand words, describing the image in front of us. The sky was blue, the sun was rising, and so on and so on. The second friend is some sort of a painter, who reads the poem and tries to recreate the original image.
The two friends are actually really obsessed with this game, so they train more and more until they get really good at it and can recreate any image using just the poem and the painter's painting. I like to think of it a bit like how the police do sketch art. Imagine you've been a witness to a crime, and I hope you won't be.
The police don't expect you to know how to draw the face of the criminal you just saw. I can't draw faces. I can maybe draw a smiley face on a paper. But by sitting together with the police sketch artist, we can both achieve our goals. All I have to do is describe the face to the police sketch artist in a meaningful way, describing the most interesting features
that the sketch artist knows how to draw. So I start off by describing the hair of the criminal, maybe from a predefined list of hairstyles. Then I can describe the criminal's eyebrows. Then I describe his eyes, their color, and maybe some glasses, if he had any on him. Then I move on to describe his nose, the mouth,
and the relation between the two, and the police now has this painting of John Lennon, for some bizarre reason. So let's do the same in our encoder-decoder network. We train our network to take an image of a face and return to us the same image on the other side. The network had to somehow learn
how to compress the information about a face into the most meaningful features, how to describe it. So maybe it found stuff like noses, eyes, and mouths. We don't really know what the different features are. We never really trained our network to look for noses, but we know we're getting a feature vector, size 128, which describes the face.
So with a good description of the face, we can now move on. We can break our network into its two parts, the encoder and the decoder, the poet and the painter. We don't really need the painter anymore. We don't care about recreating faces. And we're left with an encoder network that can take an image of a face in
and provide us a feature vector describing it. Okay, but how do we compare different faces? So let's talk about the face space, which is a really cool name for a movie or something like that, if you ask me. So we said we have a 128-feature vector. Let's reduce it to three dimensions, because our monkey brains can't really comprehend
anything more than three dimensions. And we're left with the face space. This space here describes different faces. We can think of each of the axes as something that describes a different feature. Maybe one of them describes the length of the nose, one of them the width of the face, and so on. So now, when I want to compare two different faces,
all I have to do is look at points in the face space. I have my original image that I took on the first day, when I bought my iPhone. It gives me some point in the space. Now, to compare another face and check if it's me or not, I have to allow some tolerance here, some distance within which I'm saying it's still me.
So I'm drawing a sphere around this point. This is actually Euclidean distance, for those of you who did some mathematics. So now, when I have another scan of me, me tomorrow, for example, I can just check if this point is actually inside the sphere. If it is, great, I can unlock the phone.
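That unlock check is just a Euclidean distance against a tolerance. Here is a minimal sketch with synthetic 128-dimensional vectors standing in for real face encodings; the 0.6 cutoff echoes the convention commonly used with dlib's 128-D face encodings, but it's an assumption here, and the right value depends on the system.

```python
import numpy as np

THRESHOLD = 0.6  # illustrative tolerance: the radius of the sphere in face space

def same_person(enrolled: np.ndarray, candidate: np.ndarray) -> bool:
    """Unlock if the candidate scan falls inside the sphere around enrollment."""
    distance = np.linalg.norm(enrolled - candidate)  # Euclidean distance
    return distance < THRESHOLD

rng = np.random.default_rng(1)
me_day_one = rng.normal(size=128)                               # enrolled on day one
me_tomorrow = me_day_one + rng.normal(scale=0.02, size=128)     # small day-to-day drift
someone_else = rng.normal(size=128)                             # an unrelated face
```

`me_tomorrow` lands well inside the sphere and unlocks; `someone_else` lands far away and doesn't.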
If one of you tries to take my phone and access all my private stuff, you'll probably get a point further away. So you're not inside the sphere, and you can't really access my phone. Let's try to make it a bit more concrete with an example. We wanted to check how closely Steve Jobs actually resembles Ashton Kutcher portraying Steve Jobs in the Jobs movie from 2013.
We wanted to check how good of a casting that was. So we compared the different faces using our method here. We saw that pictures of Steve Jobs over the years were actually pretty close to the original Steve Jobs image we used, well underneath the threshold. So Steve Jobs can unlock his own phone, obviously.
But Ashton Kutcher, on the other hand, is not very far off the threshold, but he is off it. So you can't really use Ashton Kutcher's face to unlock Steve Jobs' phone. And I know what you're all asking yourselves right now: how does my face compare to Steve Jobs' face? Well, my face is right over here. So you can see that they actually did a pretty good casting.
They found someone who looks pretty similar to Steve Jobs. Okay, let's do a quick recap, and if you lost me so far, now is a really good time to come back. Now we can create a system just like Face ID. We have a system that scans the user's face, passes the information through the encoder network, and we get back a feature vector
that describes the face. And we know it's a good description of that face, because in some alternate reality where we still have access to the original decoder, we can recreate the picture of you, and only you, just by looking at this feature vector.
So with a system just like Face ID in hand, we asked ourselves, what could go wrong here? How can we break the system? We thought about it quite a bit, until we ran into this face. You've all probably seen this face before. It's a face that was generated by a computer in a 2014 research project,
and it's supposed to represent the most generic, average white male face. We wondered, what does it mean for a face to be really generic? What would it do to a system like ours if we provided it an average face? So we ran an experiment. We collected a huge data set of faces,
a lot of faces, thousands of them. We chose a smaller subset of faces from those faces, and we ran every face from the smaller set against every face in the big set. We wanted to check how many hits we could get, how many phones each face could unlock. We got some pretty interesting results. You can see that most faces got exactly one hit.
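That experiment can be sketched like this. All the numbers here are synthetic, not the talk's data; the point is only qualitative: a probe planted at the average of the database collides with far more entries than an ordinary probe does.

```python
import numpy as np

THRESHOLD = 0.6  # illustrative tolerance for an "unlock"

rng = np.random.default_rng(2)
enrolled = rng.normal(scale=0.06, size=(1000, 128))   # the "big set" of phones
probes = enrolled[:50].copy()                          # the smaller subset of faces
probes[27] = enrolled.mean(axis=0)                     # plant a "generic" face at index 27

def count_hits(probe, database, threshold=THRESHOLD):
    """How many enrolled phones would this one face unlock?"""
    distances = np.linalg.norm(database - probe, axis=1)
    return int(np.sum(distances < threshold))

hits = [count_hits(p, enrolled) for p in probes]       # most probes: exactly 1 hit
```

With these synthetic parameters, ordinary probes unlock only their own phone, while the planted average face racks up dozens of hits, echoing the spike the talk saw at index 27.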
They were able to unlock exactly their own phone. No surprise there. Some faces got a bit more hits, somewhere between two and four, which is fine, but this one face here, at index 27, got more than 15 hits, which is insane if you think about it. And you probably won't be surprised to hear
that this face is actually the generic face from before. So we had a strong hunch that something was going on here. We started collecting a lot of information about the different faces. We started collecting all the different feature vectors of all the different faces. And we ran some analysis tools over them, some complex mathematical models.
And we got ourselves the distribution of each of the features. Most features were normally distributed, which, if you think about it right now, might not make a lot of sense. But if we think about it a bit more, it makes a lot of sense. The network is probably taking the same shortcuts we do as humans.
When I try to identify one of my friends, all I do is look at their nose, their beautiful eyes, and their charming smile, and I know who I'm talking to. So the network is probably doing something very similar. It's probably looking at those features. If, for example, the network chose to look at something like noses, we know that noses are normally distributed. The average nose size is five centimeters,
or two inches if you don't believe in the metric system, Americans. And nose sizes range from somewhere between three centimeters and seven centimeters. It's very rare to see a nose longer than that. So we still don't really know what the different features are, but we now know the distribution of each one of them.
We know the probability of getting each value there. Let's do a quick side note. A brute-force attack is an attack where you try all the different combinations of a password. You start off by hitting one, one, one, one, then one, one, one, two, then one, one, one, three. This was hand-animated, by the way, so I hope you appreciate that.
You try all the different combinations until you get the right one. In the case of a four-digit passcode like this, you have 10,000 possible combinations. You can go with a slightly more sophisticated variant of that attack, where you try the most common password first, which is one, two, three, four, and you walk from the most common password to the least common one, from the password with the highest probability of getting hits
to the one with the least. Okay, what if we could do the same for faces? Well, we just saw the equivalent of one, two, three, four in faces. The generic face is the one, two, three, four of faces. This is the face with the highest probability of getting hits. So with the knowledge we just gained about the distribution of each of the features, we can do the same.
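That ordering, from the most common face outward, can be sketched by fitting a normal distribution to each of the 128 features and sampling around the fitted means. The observed vectors here are synthetic stand-ins; in the talk, the distributions were fitted to feature vectors harvested from real faces.

```python
import numpy as np

rng = np.random.default_rng(3)

# Feature vectors harvested from many faces (synthetic stand-ins here).
observed = rng.normal(loc=0.2, scale=0.5, size=(5000, 128))

# Per-feature normal fit, as in the talk's analysis.
mu = observed.mean(axis=0)
sigma = observed.std(axis=0)

def candidate_faces(n, spread):
    """Sample candidate feature vectors around the 'most common face'.

    spread=0 yields the generic face itself, the 1-2-3-4 of faces;
    larger spreads walk outward toward less and less common faces."""
    return mu + spread * sigma * rng.normal(size=(n, 128))

generic_face = candidate_faces(1, 0.0)[0]     # the highest-probability guess
wider_guesses = candidate_faces(10, 1.0)      # less common candidates, tried later
```

The attack then tries `generic_face` first and widens the spread as guesses are exhausted, exactly mirroring the 1-2-3-4-first passcode strategy.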
We can sample faces from the distributions we just found, find ourselves the most common face, and walk away from the most common face to the least common one. So let's do a brute-force attack on faces. All we have to do now is build the attack. As we said, we had an encoder-decoder network
that was split into two parts. The original encoder is some black box now that we don't really have access to anymore. We can just call it and get the results. And the original decoder is gone. So we had to switch things around and create a new network. This new network takes a feature vector as input and creates an image of a face out of it.
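The training setup can be sketched with a tiny linear stand-in for both networks. This is a simplification of the talk's approach, not their actual code: the "black box" here is a small random linear encoder, and since the encoder in the research was available as a local library (dlib), the sketch allows itself to differentiate through it when training the new decoder.

```python
import numpy as np

rng = np.random.default_rng(4)
FEATURES, PIXELS = 8, 32   # tiny stand-ins for 128 features and a full image

# The "black box": we can feed it images and read back feature vectors.
W_box = rng.normal(size=(PIXELS, FEATURES)) / np.sqrt(PIXELS)

def black_box_encoder(image):
    return image @ W_box

# Our new decoder (a single linear layer here): feature vector in, image out.
W_dec = rng.normal(size=(FEATURES, PIXELS)) * 0.01

lr = 0.1
for _ in range(2000):
    v = rng.normal(size=(16, FEATURES))      # batch of target feature vectors
    image = v @ W_dec                        # decode: features -> candidate image
    err = black_box_encoder(image) - v       # the round trip should be the identity
    # Mean-squared-error gradient w.r.t. W_dec (chain rule through the encoder).
    W_dec -= lr * (v.T @ (err @ W_box.T)) / len(v)

# After training, a decoded image encodes back to (almost) the target vector.
v_test = rng.normal(size=FEATURES)
round_trip = black_box_encoder(v_test @ W_dec)
```

After training, we can hand the decoder any target feature vector and get an image the encoder maps right back to it, which is exactly the control over the system's input that the attack needs.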
Then we feed this image to the black box and get back a feature vector. And we train our network in such a way that, if everything works correctly, we get back the same feature vector we started with. At the beginning, we got some shitty results. You can see here some random noisy image,
and some random numbers that correspond to it. But over time, our network got really good at producing images of faces. You can see it first got the general shape of a face, and then it was able to compose the different features of the face, like eyes, nose, and so on. We got a network that got better and better. And now it's good enough that it can produce images of faces that not only fool the encoder
into thinking that this is a real face; the encoder thinks that these faces actually have the same feature vector we started with. We can now control the input to the system. Okay, I promised you some really interesting results at the beginning. The results I'm going to show you now are really shocking.
So hold on to your seats. We were able to get successful logins 13 times in a million attempts, roughly one successful login in every 100,000 attempts on average. Let it sink in for a moment. That's just like guessing correctly a five-digit passcode. This is insane.
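The headline number works out like this (the "one in 100,000" is a round figure; the raw rate is slightly better than that, and either way it is the same order of magnitude as a single random guess at a five-digit passcode):

```python
hits, attempts = 13, 1_000_000
success_rate = hits / attempts        # 1.3e-5 per attempt
one_in = attempts // hits             # one success per ~77,000 attempts

# A 5-digit passcode has 10^5 equally likely combinations, so one random
# guess succeeds with probability 1e-5 -- the same order of magnitude.
passcode_guess_rate = 1 / 10 ** 5
```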
Okay, here's the attack. You and I won't be fooled by those faces. We can tell that those are fake. But by generating thousands and millions of those generic, average faces, we were able to get some hits. We were able to fool the system into thinking that those are real faces that are actually enrolled to the phones.
By the way, if you stop on any of those faces for more than a second, you can probably see someone you know in them. Maybe it's your friend, maybe it's someone you're sitting next to, because those faces are really generic. You can see a lot of people inside of them. So we got to the really sad conclusion that no one is safe. Well, I have a couple of disclaimers about it.
First of all, it's really important to say that this research was educational only. We ran it inside the university, on a system that was provided to us by the university, where everybody signed an agreement allowing us to try to hack their faces. So we never really ran it against the real world. We never really ran it against Face ID.
It was never really the point. Face ID is too much of a hassle to work with. It comes with complementary systems, like locking you out after five failed attempts. Also, we don't really know if Face ID works exactly the same way. No one has ever really disclosed how Face ID really works behind the scenes.
Another thing that is important to say is that this entire research was conducted back in 2017, meaning we didn't have access to all the latest models, and we all know how quickly the field of AI is moving nowadays. So they've probably got better results by now. They probably have more secure models that can prevent such attacks, right?
Well, the other side of the same coin is that if this entire research was conducted in 2017, we, too, didn't have access to all the latest models. Nowadays we know a lot more about generative AI and deepfakes, and we can probably make our attack even better and fool the most sophisticated systems that exist today.
One last point I would like to make, and this is a question I get a lot: what about 3D images? The original Face ID works with a 3D camera. Well, we believe that our attack here is really generic and can fool those systems as well, with some tweaking. If you think about it, the system doesn't really care
about the raw information it's getting. It doesn't really care about the 3D information. All the system cares about is the compressed information we're getting from the encoder. So we believe that with some tweaking to our system, we can change the attack to generate 3D inputs that correspond to the same feature vector that a real 3D scan would.
So we believe we can break into those systems as well. Okay, thank you very much. I hope you learned something new today. In this cringy animation, you can see me and my partner in crime in conducting this research, Roy. Yeah, we're both named Roy. Feel free to come by later on, I'll be outside, or just send me an email with any questions.
Thank you very much. Thank you, Roy, for shedding some light on how the devices in our pockets essentially work. We won't take questions on the spot, so please give another round of applause to Roy.