Deja Vu - Uncovering Stolen Algorithms in Commercial Products

Formal Metadata

Title
Deja Vu - Uncovering Stolen Algorithms in Commercial Products
Series title
Number of parts
85
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt, copy, distribute, and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided you credit the author/rights holder in the manner they have specified.
Identifiers
Publisher
Release year
Language

Content Metadata

Subject area
Genre
Abstract
In an ideal world, members of a community work together towards a common goal or greater good. Unfortunately, we do not (yet) live in such a world. In this talk, we discuss what appears to be a systemic issue impacting our cyber-security community: the theft and unauthorized use of algorithms by corporate entities. Entities who themselves may be part of the community. First, we’ll present a variety of search techniques that can automatically point to unauthorized code in commercial products. Then we’ll show how reverse-engineering and binary comparison techniques can confirm such findings. Next, we will apply these approaches in a real-world case study. Specifically, we’ll focus on a popular tool from a non-profit organization that was reverse-engineered by multiple entities such that its core algorithm could be recovered and used (unauthorized), in multiple commercial products. The talk will end with actionable takeaways and recommendations, as who knows, this may happen to you too! For one, we'll present strategic approaches (and the challenges) of confronting culpable commercial entities (and their legal teams). Moreover, we’ll provide recommendations for corporations to ensure this doesn’t happen in the first place, thus ensuring that our community can remain cohesively focused on its mutual goals.
Transcript: English (automatically generated)
All right, aloha and welcome to our talk on uncovering stolen algorithms in commercial products. I'm Patrick Wardle, I am the founder of the Objective-See Foundation, and today I'm stoked to be co-presenting with good friend, former colleague, Tom McGuire, who is an instructor at the Johns Hopkins University. So today we're gonna talk about what I believe is a systemic issue affecting the community,
and that's the unauthorized use of stolen algorithms in widespread commercial products. Figured if it could happen to us, it could happen to others, maybe even happen to you. So here we'll first introduce the victim application, and then we'll talk about how we were able to find and prove that other commercial products
were using its closed-source algorithms, and then how we were able to ultimately resolve this situation. So first, let's talk about OverSight, the victim, whose closed-source algorithms, as I mentioned, were stolen by at least three separate, unrelated commercial entities for profit and gain.
So in this part of the talk, I want to discuss OverSight, specifically its internals and its algorithms, as this is important to understand and relevant when we dive into showing that this code was ultimately stolen. Now OverSight is a pretty straightforward, pretty simple utility written by yours truly. It was released in 2016,
initially as a closed-source application. That's important because, as we'll see, the infringement occurred when corporations actually reverse engineered the binary and re-implemented the algorithms. Its goal is pretty simple: it just seeks to alert you anytime anything accesses your mic or your webcam, and also identify what process
is responsible for this action. This was kind of the killer feature: no other tools at the time had the ability to identify the active or responsible process that was accessing the mic or the webcam. Now OverSight was designed predominantly to detect stealthy malware that got onto your system, perhaps via a zero day or some other infection mechanism,
and when that malware would attempt to access the mic or the webcam, that gave OverSight the ability to detect it and throw up an alert. So on the slide we can see an example of it detecting malware, and as with the rest of the malware on the slide, OverSight was able to detect it with no a priori knowledge of the actual malware.
Because again, it was just alerting anytime anybody accessed the mic or the webcam. Turns out OverSight's pretty good at detecting zero-day vulnerabilities too, specifically those that relate to webcam or mic access: remote Zoom bugs, or even zero days that Mac malware was exploiting. So again on the slide we have some examples of malware with zero days or other vulnerabilities,
and again OverSight can detect that because it doesn't care how the mic or the webcam is accessed, just that it was. Finally, OverSight also played a pivotal role in uncovering good apps behaving badly. My favorite example was that it was able to uncover and prove for the first time that Shazam on macOS
was actually still listening, even if you, the user, turned it off. Yikes. All right, so back to OverSight's killer feature, which is the ability to identify what process is accessing the mic or the webcam. For a security tool, this is obviously a must-have feature. If the process, the application accessing
the mic or the webcam is Zoom, Skype, FaceTime, that's fine, that should be allowed, maybe don't even alert the user. But if it's some malware or some other unrecognized program, obviously you would alert the user. Now you might be thinking, well yeah, this is a great feature, why didn't other tools have this capability? And the answer was it's actually very difficult
to implement and to achieve. So on the slide we have a few lines of code. It's very easy to determine that the mic or the webcam was activated. macOS provides a notification for this that you can register for. But that notification doesn't tell you who done it. So now let's look at exactly how OverSight
went about identifying the active process. It's a bit involved and leverages a bunch of undocumented features of the operating system. But again, it's important to understand this so that when we look at commercial products, we can again prove without a doubt that they stole the code directly and verbatim from OverSight. So OverSight has three steps that it performs
in order to identify the active or responsible process. Step one is enumerating Mach messages. So when an application wants to access the mic or the webcam, under the hood, kind of behind the scenes, the low-level APIs and frameworks will actually send a Mach message to the camera or mic daemon.
So with this observation, I said, okay, cool, I can just enumerate Mach messages and see who was sending Mach messages to the camera or mic daemon to ascertain what process is responsible. Turns out you can't do that directly. You need special entitlements. But there's a command line utility that ships with macOS called lsmp
and it has the correct entitlements. So what OverSight does is simply execute lsmp. It's very straightforward. You spawn a child process. Then it reads everything from standard out and parses that. Now this parsing is a little complex or at least a little involved because the output from lsmp is not designed to be read programmatically. But we'll see when we talk about the commercial products,
this is one of the indications, because they parsed it in exactly the same way. Now that list from lsmp might not be just one process. There might be several processes. So OverSight had to take other actions to figure out the one exact process, the active process, accessing the mic or the webcam. So step two, I observed the fact that in the I/O Registry,
there are several undocumented key-value pairs that contain a list of PIDs that did include the active process, which was accessing the mic or the webcam. Here's the code to do that. It's pretty straightforward because we can access the IOKit registry directly. Basically just query these key-value pairs,
specifically the IOUserClientCreator under the IOPMrootDomain. And again, these are undocumented keys. So again, that's relevant when we talk about proving equivalency. We also have to do some parsing to then pull out the PIDs as well. Finally, because again, this list might have several PIDs as well, we do one final thing if we don't have one single process.
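[Editor's sketch] The first two steps can be illustrated roughly as follows. This is a minimal Python sketch, not OverSight's actual code: the sample text only imitates the general idea of tool output with a PID in parentheses, the creator strings assume a "pid N, Name" shape, and combining the two lists by intersection is an assumption made here for illustration. All function names are hypothetical.

```python
import re

def pids_in_parentheses(text):
    """Collect integers found between '(' and ')' by naive scanning, no regex."""
    pids = set()
    for line in text.splitlines():
        start = line.find("(")
        while start != -1:
            end = line.find(")", start + 1)
            if end == -1:
                break
            token = line[start + 1:end].strip()
            if token.isdigit():
                pids.add(int(token))
            start = line.find("(", end + 1)
    return pids

def pids_from_creators(creator_values):
    """Pull the PID out of strings assumed to look like 'pid 123, SomeApp'."""
    pids = set()
    for value in creator_values:
        match = re.match(r"pid (\d+),", value)
        if match:
            pids.add(int(match.group(1)))
    return pids

# Hypothetical inputs, invented for illustration only.
tool_output = """\
  send  0x00001a03  (501) Zoom
  send  0x00002b07  (733) SomeHelper
  recv  0x00000f03  (-) unknown
"""
creator_values = ["pid 501, Zoom", "pid 902, OtherApp"]

# Assumption: a PID seen by both steps is a stronger candidate.
candidates = pids_in_parentheses(tool_output) & pids_from_creators(creator_values)
print(sorted(candidates))
```

If more than one candidate survives, a further check is still needed, which is what the third step in the talk addresses.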
And that is we sample candidate processes. Read their remote memory, look at their stack traces and look at the APIs they're actively calling. Now again, macOS doesn't allow you to do this directly. You need special entitlements. But lucky for us, there's a command line utility called sample that again, we can execute
against the target remote process that we think might be accessing the mic or the webcam. And then this will give us a stack and thread trace. And what we do is we specifically look for the CMIOGraph::DoWork function, which is related to reading frames off either the mic or the webcam. So via these steps, OverSight was very accurately able to identify the process
that was responsible for accessing the mic and the webcam. And again, since it was free, it became very popular. Unfortunately, this popularity came at quite a cost. So now let's talk about how OverSight was torn apart and its secrets, you know, shamelessly stolen for commercial gain.
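[Editor's sketch] The third step boils down to checking each candidate's stack report for one telltale symbol. The Python sketch below only does the text check; it does not run the sample utility, and the report snippets are invented placeholders rather than real output. The helper names are hypothetical.

```python
TARGET_SYMBOL = "CMIOGraph::DoWork"

def frames_mentioning(report_text, symbol=TARGET_SYMBOL):
    """Return the stripped report lines that mention the given symbol."""
    return [line.strip() for line in report_text.splitlines() if symbol in line]

def pick_active(reports, symbol=TARGET_SYMBOL):
    """reports maps PID -> stack report text; keep PIDs whose report has the symbol."""
    return sorted(pid for pid, text in reports.items()
                  if frames_mentioning(text, symbol))

# Hypothetical report snippets, invented for illustration only.
reports = {
    501: "Thread_1\n  12 CMIOGraph::DoWork(...)  (in CoreMediaIO)\n",
    733: "Thread_1\n  12 mach_msg_trap  (in libsystem_kernel.dylib)\n",
}
print(pick_active(reports))
```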
First though, you might be wondering how did this even happen? Was Patrick's computer hacked? Was the source code stolen? Now it turns out it was far easier, right? The binary is distributed. Anyone with basic reverse engineering skills could reverse engineer the OverSight binary and reconstruct its algorithm. So from a technical point of view, trivial.
From an ethical and legal point of view, eh, not really that cool. Again, considering this was stealing from a free tool and then utilizing it for commercial gain. Also, it's worth noting that OverSight's algorithm is first and foremost, very unique. If you Google a lot of the strings, the actions it takes, there are zero hits.
Also, it's kind of janky, right? First and foremost, I'm a security researcher, not a software engineer. And to give credence to this claim, we'll see that when Apple pushed out an update, it triggered a lot of bugs in OverSight. Yikes. All right, so how did this all begin? How did I even think that someone was stealing my code? Well, actually I never thought that this would happen.
Maybe I'm naive and optimistic. So I was actually looking at a binary that had been flagged by some antivirus products for a client. Turns out it wasn't malware. It was one of these kind of suspect security tools. But as I was looking at it, I noticed it was executing the lsmp binary. I'm like, that's strange. I do that, and I haven't really heard of other people doing that.
And the more and more I looked into this product, it looked more and more similar to OverSight, especially when I then read their marketing material that said they provide the ability to monitor the mic and the webcam. Around that same time, Apple pushed out some updates to macOS, which horribly broke OverSight, embarrassingly. So on the top of the slide, we have some bug reports that people submitted basically saying,
hey, Patrick, you gotta fix your tool. So I did what any programmer does, Googled for fixes, and I found other users complaining about similar issues. And I was like, man, sorry, y'all. I didn't realize this was so widespread. But reading the forums, the users came to the conclusion that this belonged to another product, another tool. And so I grabbed those tools, did some analysis,
and it turned out that, again, it looked very similar to OverSight. At this point, I decided to do some more proactive hunting. I wrote a simple YARA rule. YARA rules are normally used for detecting malware, but of course, we can use them to detect other binaries as well. So I whipped up a simple YARA rule to basically detect OverSight's algorithm,
and then ran it across the internet, and again, found some interesting hits that when I triaged looked very similar to OverSight. So at this point, I had a handful of commercial products that at first glance appeared very similar to OverSight. Of course, though, we had to dig deeper to prove without a doubt that this code in these commercial products came directly from OverSight.
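[Editor's sketch] The hunting idea can be approximated without YARA itself. The Python sketch below is a plain byte-string scan that mimics an "all of these strings" style rule; it is not the speaker's rule, and the indicator list is an assumption assembled from names mentioned in the talk. The byte blobs stand in for binaries and are invented for illustration.

```python
# Assumed indicators, taken from names mentioned in the talk (not the real rule).
INDICATORS = [b"lsmp", b"IOUserClientCreator", b"CMIOGraph::DoWork"]

def matched_indicators(blob, indicators=INDICATORS):
    """Return the indicators that occur anywhere in the binary blob."""
    return [s for s in indicators if s in blob]

def looks_related(blob, indicators=INDICATORS):
    """True only when every indicator is present, like an 'all of them' condition."""
    return len(matched_indicators(blob, indicators)) == len(indicators)

# Hypothetical stand-ins for binaries, invented for illustration only.
suspect = b"\x00lsmp\x00IOUserClientCreator\x00CMIOGraph::DoWork\x00"
unrelated = b"\x00lsmp\x00hello world\x00"

print(looks_related(suspect), looks_related(unrelated))
```

Requiring every indicator keeps false positives down, since any single string on its own could plausibly appear in unrelated software. A hit is only a lead and still needs manual triage.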
So I'm gonna hand this over to Tom, and he's gonna talk about how we were able to prove equivalency between OverSight and these commercial products. All right, thanks, Patrick. So first, I just wanna give a shout-out to my wife and daughters. It's their birthdays this week, so. Thank you. So for this section, so I've looked
at a lot of Patrick's code over the years as well. So for this section, when you look at the slides, on the left-hand side is OverSight, on the right-hand side is the products. And as he mentioned, there were three products that were sort of ones that we needed to investigate a little bit further. We only have time to go over two of them, so I'll try to go through them a little quickly since we're running short.
But if you look, OverSight's algorithm, as Patrick mentioned, sort of has three unique steps to it. And as we go through, sort of the first two, I think, you could make an argument that maybe someone was running these a little bit. But when we get to the third step, to me, having done a lot of reverse engineering over a number of years, I think we can reasonably think
that this was taken from OverSight. So on the first slide here, this is the first product. We have the lsmp parsing, and as Patrick mentioned, there's a little bit of jankiness going on in his code. So he basically just tries to parse this, looking for what's in between the parentheses. So I think there could be room for optimization here
with some regex, but you don't see that here, right? You see literally the same processing going on within OverSight as there is with this secondary product. We also see this in the method names as well. So if you look at the method names, they're very similar. So I think that's an interesting point,
but it's not necessarily conclusive of taking the algorithm from one component to another. Another aspect of this, a unique aspect of OverSight, was looking at the I/O Registry. So when you look for these specific key-value pairs that are used in OverSight, you don't really get a whole lot of hits. So maybe after the talk, we'll see quite a few more.
But when we look at this, again, we see the same exact steps going on with OverSight as there are in this particular product. Right down, looking through this, again, looking at the method names, this is a little telling that this could have come from another source.
And for the final one, for the first product, if we look, this aspect of it is the sampling, right? To me, the sampling component of OverSight sort of makes it very unique, especially when we tie it back to those other two components. And for a commercial product, the sampling,
for a security tool that's free open source, doing the sampling is fine, right? You're trying to eliminate those false positives and figure out which specific process has that access to the camera or microphone. But the sampling is a little invasive, right? It pauses the process, has to go through the backtrace, and provides that for you.
And again, OverSight's looking for CMIOGraph::DoWork, which is really the worker thread for accessing the cameras and microphones. So when we see this in a backtrace, that is evidence that this particular process is accessing one of those two currently. So it gives a way for OverSight to extend the first two algorithms to this third one
and eliminate all those false positives. And if you look, this is exactly the same, right? The CMIOGraph::DoWork, so it's very suspicious. So I think we can really conclude that this algorithm, as it's written, especially in the three aspects of it, comes directly from OverSight. And again, we can see we have the same drill here.
We look at the secondary product here. Look at the lsmp processing. In this case, there's a little bit of regex, so we give them a little kudos that they're trying to improve the performance. Maybe they saw some bugs in it and wanted to improve that, great. But when we continue further, right,
it's not just that one aspect to it. It's tying these together to get a better picture of what's actually going on. The I/O Registry, again, we see exactly the same processing here with the IOPMrootDomain going down to the correct key-value pairs that we see in OverSight. And finally, in this product as well,
we see the sampling, right? And sure, there's a little bit of difference here. They do some sampling differently with respect to timing. Maybe that was just for improvements on performance because they are a commercial product, right? They don't want to hog all the resources, so they have to optimize that a little bit. But we see the DoWork method here, right? So this is sort of a dead giveaway.
And as I tell my students, one of the things you want to do is look at the binaries. Right, that gives us the ground truth. And so when we're comparing these two binaries, right, this gives us pretty good confidence that this algorithm was taken from the OverSight components. So I'm gonna turn it back over to Patrick for some hopefully better news.
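[Editor's sketch] Before a side-by-side look in a disassembler, a coarse first pass is to compare the printable strings embedded in two binaries. The Python sketch below is a swapped-in simplification, not the comparison method shown on the slides: it extracts ASCII runs and reports the overlap. The byte blobs are invented stand-ins for real binaries, and the function names are hypothetical.

```python
import re

# Runs of at least six printable ASCII characters, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[ -~]{6,}")

def printable_strings(blob):
    """Extract the set of printable ASCII runs from a binary blob."""
    return set(PRINTABLE_RUN.findall(blob))

def shared_strings(blob_a, blob_b):
    """Strings that appear in both blobs."""
    return printable_strings(blob_a) & printable_strings(blob_b)

def jaccard(blob_a, blob_b):
    """Overlap ratio of the two string sets: 0.0 is disjoint, 1.0 is identical."""
    a, b = printable_strings(blob_a), printable_strings(blob_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical stand-ins for the two binaries, invented for illustration only.
original = b"\x00IOUserClientCreator\x00CMIOGraph::DoWork\x00OverSight\x00"
product = b"\x01CMIOGraph::DoWork\x02IOUserClientCreator\x03VendorApp\x00"

print(sorted(shared_strings(original, product)), jaccard(original, product))
```

Shared rare strings only flag where to look. The conclusion in the talk rests on the decompiled logic matching step for step.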
All right, thanks Tom. So at this point, we clearly have pretty indisputable proof that these products directly copied from OverSight. So now the question becomes, what do we do? How do we turn this into a happy or a happy-ish ending? Now obviously, I was gonna reach out to the perpetrators,
but I quickly learned there was somewhat of a winning approach. First and foremost, I found it was really important to define or articulate exactly what you wanted. Did you just want money? Did you want to disparage them and flame them on Twitter? Did you want them to remove the code, open source their tools?
Knowing what you want was important because they were always gonna ask. Then, and this is important too, create irrefutable proof. The code comparisons Tom walked through to me are 100% obvious, but understand that you're probably gonna be talking to lawyers or the intellectual property team that might not have or probably doesn't have such
an in-depth technical understanding. So provide those code comparisons as well, because they probably will go to the engineering team, but then also understand you have to have something more high level: hey, Google this string that's in my product and your product, there are zero hits. Please give me an explanation why it's in both. Also, speaking of lawyers, highly recommend getting your own.
If nothing else, I found when you talk to the other companies, the commercial entities who stole this, and you mention you have a lawyer, they take you far more seriously. I was lucky enough to work with EFF, EFF if you don't know, amazing nonprofit organization that provides free legal resources to security researchers and other nonprofits, so a big thanks and shout out to EFF.
Finally, it's probably wise to reach out professionally versus flaming them on Twitter. Also then, it's good to know what corporations want, and this is generally an amicable solution. Also, and this was something of a learning experience to me, I really found that the majority, if not all, of the cases were a result of a single, arguably rather naive developer
reverse engineering and stealing the code from OverSight versus the malice of an entire corporation. When I first kind of figured this out, I was like, you know, EFF these companies, they're evil, they're stealing from my nonprofit, but then really once I gained a deeper understanding, what generally happened was the developer had been tasked
to implement a feature, they went out, couldn't figure it out, reverse engineered a tool, and then no one asked them, where did you get that from? Also then, what do corporations generally want? There's two main things. They wanna cover themselves legally, and so often this is achieved via a licensing agreement, sometimes retroactively as well, and also they're very interested in not being disparaged,
and for both of these, they are often willing to provide financial compensation. So let's look at some win-win resolutions that came out of this, noting that all three of the corporations, the companies we approached, eventually fully admitted fault and said, yes, we stole from you, which was kind of nice. The first company, you know, really was very quick
to recognize their faults. We see in the email, they acknowledged saying, hey, yeah, wow, it's really not cool. They also took some steps that we're gonna remove the code from our products that infringed upon your utility, and then we're gonna provide you financial compensation, you know, if you can give us a license, so kind of a win-win.
The other company, again, similar win-win response, clearly acknowledged the issue, which is really nice, removed the code, and then made a nice donation to the Objective-See Foundation. So let's wrap this up with some takeaways. First and foremost, if you're a developer, don't be naive like me.
Don't assume your code will not be stolen, right? I mean, I thought, hey, my tool's closed source. Someone would have to really preemptively reverse engineer and then steal it verbatim, but it happened, right? Also, be proactive. You know, the corporations aren't gonna come to you saying, hey, we stole your code, so maybe use some of the methods
we talked about today to create YARA rules for your signatures. If your product has a very unique feature, maybe keep an eye on the competition, and if they implement that same feature, reverse engineer it and see how they do it. The bug reports we showed were kind of an interesting, neat way as well, which allowed us to kind of uncover some other perpetrators.
For corporations, it's really important to educate your employees on the topic, even just to reiterate that stealing code is not okay. I mean, I thought this would be obvious, but apparently that's not the case. And again, if you're a corporation, this will avoid serious legal issues and optics issues. Also, I think it's wise for corporations to implement various internal procedures to detect this.
Or maybe perhaps when a developer implements a really cool new feature, say, where did you get this from? I've worked at a variety of large companies, and no one ever asks this question. I'm not gonna steal code, but still, having that extra question might have avoided some of these in the first place. And then finally, it's really important
for the corporations to be amicable when someone reaches out. There were a few scenarios where they were a little defensive, really disagreed with the results. Once I showed up with the lawyers, they quickly changed their tune. And again, ultimately, when they admitted to fault, I was like, well, we could have solved this way more amicably up front.
So that's a wrap before we jump into Q&A. I just wanna recognize the amazing companies that support the Objective-See Foundation. Also wanted to thank DEF CON for having us talk, and of course, all of you for attending our talk. So I think we have a few minutes for Q&A. If not, I will be around here afterwards.
Also, I'm gonna be at No Starch Press's table, signing my new book at 3:30, so if you wanna come grab my signature to steal my identity, I will see you there. Thank you. Thanks.