Analyzing iOS apps: road from AppStore to security analysis report

Video in TIB AV-Portal: Analyzing iOS apps: road from AppStore to security analysis report

Formal Metadata

CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
The main goal of our work is to find a sensible way to detect vulnerabilities in binary iOS applications. We present a new, fully featured toolset that constitutes a decompiling and analysis engine targeting ARM/AArch64 programs, particularly iOS applications. In general, the analysis workflow consists of four steps:
1. Downloading and decrypting an iOS application from the AppStore. We introduce the iOS-crack engine, which is capable of automatically downloading, decrypting and dumping the memory of AppStore applications using a jailbroken device.
2. Decompiling the iOS application. The toolset is capable of carrying out completely automated analysis of binary programs, using LLVM as the intermediate representation language. Unlike known binary-code-to-LLVM translation tools, our decompilation tool aims at reconstructing high-level program semantics, that is: program CFG reconstruction, advanced analysis and propagation of memory objects, stack pointer tracking, data type reconstruction, and program data model construction. Almost all iOS applications are written in Objective-C or Swift, so we also take care of precise type reconstruction and use the runtime type information in the decompilation process.
3. Static analysis of the iOS application. We introduce our static analysis framework, which is able to find all common vulnerabilities of mobile applications, especially iOS applications.
4. Representation of analysis results. The toolset is able to produce a human-readable pseudocode representation of the source binary.
During the presentation we will demonstrate our analysis engine in action. We will show real-world examples of the most common security flaws and how they can be found.
Hello, my name is Eva, and with me is my colleague. Our team has experience in security audits of both source and binary code, including manual static analysis, and we also develop analysis tools. In today's talk
we would like to tell you about one of our analysis tools. We do quite a lot of manual security analysis, and a significant part of the analysis process is the same from application to application, so we decided to write a tool that would do most of the work for us. Our goal is to develop a tool that takes an iTunes application link as input and produces a security report and the decompiled code as output. So here is our plan.
First, we need to obtain the application binary, then translate this binary into some internal representation, analyze this representation for security flaws, and then translate it into human-readable pseudocode. This last step is important because we want to show each vulnerability in a human-readable context. The first part is how to get the binary. Getting the application binary is not trivial, because iOS apps are distributed through the App Store only, and the binaries there are encrypted. The only way we know to obtain a decrypted application binary is to start the application on an iOS device, so that the device's processor decrypts it and it is loaded into memory, and then dump the decrypted binary from memory. To make this work, we need a jailbroken iOS device. And since we need a jailbroken device to decrypt the application anyway, we can use it to download the application as well. So this whole step of getting the binary is done on a jailbroken iOS device.
Let's quickly review what you can do on a jailbroken iOS device, using only what is available in the stock Cydia repositories. First of all, we can connect to the device over the SSH protocol and get a nice command line. What is more, there is a code injection platform called Cydia Substrate, which provides an API to call any method from the runtime of a running application and also allows us to hook any function and change its implementation. And finally, there is a quite out-of-the-box tool to decrypt and dump applications.
So we decided to go with the simplest approach possible and just use the graphical interface. First, we need to figure out the chain of UI actions that initiates the download, and then figure out how to trigger these actions programmatically. To do that, we work with two built-in iOS applications: the App Store and SpringBoard, which is the central application in iOS and manages the graphical interface. We inject our code into their runtime to perform the actions. So we worked all this out.
So here is the chain of actions. First of all, our tool uninstalls old apps to make space and then opens an iTunes page like this one for the target application. On this page we need to press the Get button. After that, the App Store asks us to sign in, so we need to enter our password and press OK. Then the download starts, and when the button changes to Open, it means that the application has been downloaded and we can go to the final step, decryption, which is quite fast. So now we can just show how it works.
So here I have my phone's screen, and the script runs here. I am going to launch my script, and you can see it press the Get button. Now it needs to enter the password and press the button, and now it is complete. So that is how it works. OK, and
the next part of the talk is about what happens after that: we are going to translate the decrypted binary into an intermediate representation. As soon as we get the decrypted binary, we need to translate it into an intermediate representation that is suitable for analysis and for presenting results. Like most translation tools and decompilers, as well as compilers, we decided to use an intermediate representation that is more high-level than binary code but much more low-level than source code.
To get decent results, we have to deal with the following challenges during binary translation. In particular, we have a lot of things to recover. First of all, we need to separate functions from data and attach names to these functions that correspond to the function names from the source code, if possible. For example, if the code contains a function that disables certificate checks, we have to know its name to understand its semantics; otherwise it would be a difficult or even impossible task to find vulnerabilities in such code. Moreover, we have to understand what arguments are passed to each function, and we should also know their values and types. Other information that we have to recover to restore the original semantics of the program is the control flow graph and the data flow of the program. The majority of the source code is written in Objective-C or Swift, so we have to recover runtime information on interfaces from the binary file, that is, classes, interfaces, protocols and so on.
iOS applications are mostly fat binaries that contain at least two images, one for the ARM architecture and one for the AArch64 (ARM64) architecture. At first we supported translation for both architectures, but now, as developers and the mobile platform itself are mostly focusing on AArch64, we have abandoned ARM and do not support it anymore.
As I said before, we have to pick an intermediate representation that is suitable for representing binary program semantics and for security analysis. We decided to use LLVM as the intermediate representation of the code. LLVM provides a static single assignment (SSA) intermediate representation, which is well suited for representing C-family programs such as Objective-C programs. Moreover, LLVM has multiple analyses built in for program transformations, including alias analysis, the dominator tree builder, loop analysis, and other transformations and optimizations. We also have a lot of experience with LLVM, so it was the perfect match for us.
In our translator, we implemented a tool that translates iOS applications to LLVM. It recovers functions and function calls, the arguments of the functions and the function types, recovers the control flow graph of the program, and reconstructs types and variables. During translation and analysis we also use all the information about classes and interfaces that we are able to recover.
And this is the scheme of how our tool works. The tool receives the binary application as input, parses it, and creates an application memory model. Based on this memory model, we can extract information about classes and functions and recover the control flow graph of the application. Then we analyze this data using our own analyses, based on the information we recovered before and on the data flow of the program. This way we recover variables and even data types. After that, we generate the LLVM representation and output the results.
First, we parse the input binary. The decrypted binary is in Mach-O, the file format that is used for executables. The Mach-O binary format was well documented until the documentation was removed from public access. We wrote our own parser, which extracts information about imported symbols and runtime information, as well as information on the classes and interfaces of the application, for both Objective-C and Swift. Once we have parsed the image, we start the process of recovering functions and the control flow graph of the program. For this purpose we developed an iterative recursive algorithm that takes a worklist of function addresses as input. We use the following sources of function addresses: the entry point address, addresses from the function starts section, functions obtained during runtime information parsing (for example, function addresses from Objective-C class definitions), and function addresses from Swift or C++ virtual tables. The algorithm recursively traverses the functions at known addresses and reconstructs the control flow graph for each discovered function. We also have to take special care of trampolines and tail calls.
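The worklist idea described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the speakers' algorithm: instructions live in a dictionary keyed by address, every instruction is 4 bytes, and the mnemonics `op`, `b`, `bcond`, `bl` and `ret` are simplified stand-ins for real AArch64 instructions. An unconditional branch to an already known function entry is treated as a tail call, which is only a heuristic and depends on the order in which entries become known.

```python
from collections import deque

INSN_SIZE = 4  # assumption: fixed-width instructions, as on AArch64


def discover_functions(code, seeds):
    """Recover function bodies starting from known entry addresses.

    code:  {address: (mnemonic, branch_target_or_None)}  (toy encoding)
    seeds: iterable of entry addresses (entry point, class metadata, ...)
    Returns {entry_address: sorted list of instruction addresses}.
    """
    entries = set(seeds)
    worklist = deque(seeds)
    functions = {}
    while worklist:
        entry = worklist.popleft()
        if entry in functions:
            continue
        seen, stack = set(), [entry]
        while stack:
            addr = stack.pop()
            if addr in seen or addr not in code:
                continue
            seen.add(addr)
            mnemonic, target = code[addr]
            if mnemonic == "ret":
                continue                      # end of this path
            if mnemonic == "bl":              # call: new function entry
                if target not in entries:
                    entries.add(target)
                    worklist.append(target)
                stack.append(addr + INSN_SIZE)
            elif mnemonic == "b":
                if target in entries and target != entry:
                    continue                  # tail call: do not inline callee
                stack.append(target)
            elif mnemonic == "bcond":         # both successors stay in function
                stack.append(target)
                stack.append(addr + INSN_SIZE)
            else:
                stack.append(addr + INSN_SIZE)
        functions[entry] = sorted(seen)
    return functions


TOY_CODE = {
    0x100: ("op", None),
    0x104: ("bl", 0x200),     # call discovers a new function at 0x200
    0x108: ("bcond", 0x110),
    0x10C: ("op", None),
    0x110: ("b", 0x300),      # tail call to the seeded function at 0x300
    0x200: ("op", None),
    0x204: ("ret", None),
    0x300: ("ret", None),
}

if __name__ == "__main__":
    for entry, body in discover_functions(TOY_CODE, [0x100, 0x300]).items():
        print(hex(entry), [hex(a) for a in body])
```

The per-function traversal is where a real tool would also build basic blocks and edges; here it only collects the instruction addresses that belong to each function.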
This is an example of a trampoline for an Objective-C function. During translation we replace this trampoline with a call to the real Objective-C function in our intermediate representation.
There are also tail calls, which should also be accounted for in CFG recovery. In this example from an iOS application, the function concludes with a jump to an Objective-C function, so we should take special care of this. As I mentioned earlier, we
recover information on interfaces for Objective-C and Swift. In particular, for Objective-C classes we are able to recover classes, protocols, and method names and signatures. For Swift classes we can only recover the class hierarchy and the virtual tables for each class; other information on Swift classes is lost during compilation, so we are unable to recover it. We also implemented our own demangler for Swift symbol names to make the results more human-readable. Here is an example of a recovered Objective-C interface. We also recover information about the Objective-C instance variables and the encoded signature of each method, and we can decode the signature to get precise information about the argument types and the return type of the function.
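To make the signature decoding step concrete, here is a minimal Python sketch that decodes simple Objective-C method type encodings such as `v24@0:8@16`. The single-character type codes come from the documented Objective-C runtime type encodings; the function name `decode_signature` is made up for this example, and the sketch deliberately ignores structs, unions, arrays, blocks and type qualifiers, which a real decoder has to handle.

```python
import re

# Subset of the Objective-C runtime type codes (simple scalar types only).
SIMPLE_TYPES = {
    "v": "void", "@": "id", ":": "SEL", "#": "Class",
    "c": "char", "i": "int", "s": "short", "l": "long", "q": "long long",
    "C": "unsigned char", "I": "unsigned int", "S": "unsigned short",
    "L": "unsigned long", "Q": "unsigned long long",
    "f": "float", "d": "double", "B": "bool", "*": "char *",
}

# Optional '^' pointer prefixes, one type code, then the stack offset digits.
TOKEN = re.compile(r"(\^*)([v@:#cislqCISLQfdB*])(\d*)")


def decode_signature(encoding: str):
    """Return (return_type, [argument_types]) for a simple method encoding."""
    parts = []
    pos = 0
    while pos < len(encoding):
        match = TOKEN.match(encoding, pos)
        if match is None:
            raise ValueError(f"unsupported encoding at offset {pos}: {encoding!r}")
        pointers, code, _offset = match.groups()
        parts.append(SIMPLE_TYPES[code] + " *" * len(pointers))
        pos = match.end()
    if not parts:
        raise ValueError("empty encoding")
    return parts[0], parts[1:]


if __name__ == "__main__":
    # A method returning void that takes one object argument:
    # the first two arguments are always self (id) and _cmd (SEL).
    print(decode_signature("v24@0:8@16"))
```

The first two decoded arguments are the implicit `self` and `_cmd` parameters, so the visible Objective-C parameters start at the third entry.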
This slide shows an example for a Swift class. As was mentioned, we are unable to recover anything but the class names and virtual tables; function names are missing and cannot be recovered from the binary. After we obtain the basic information about the functions, we run a series of analyses to get more precise information about the semantics of the program. During the analysis we recover the memory objects used by the program, such as temporary variables, local and global variables and, most important of all, constant strings. We also recover the types of variables. In our current implementation we only support integer, float and pointer types, but we are currently implementing the recovery of complex types such as arrays and structures, which has already been implemented in our other binary translation tool for the Intel architecture. We can get more precise results using a knowledge base of known function signatures, for example for the standard library or widely used libraries like OpenSSL. For Objective-C methods, the function signature information is encoded in the binary, so it can be recovered and propagated during type recovery. Based
on the obtained information, we generate an intermediate representation that preserves the semantics of the original program. The obtained model is optimized for further analysis; for example, we remove dead code and run constant propagation passes, which is very useful when analyzing function arguments. So let's see an example of ARM-to-LLVM translation. This slide contains an example of an Objective-C function as it appears in the binary application, and this is how the function looks after translation. As you can see, we recover the names of the called functions and their precise arguments. And finally, this slide contains the control flow graph of the recovered function. So this is how binary recovery and translation is done by our translator. So now we
have LLVM bitcode and want to find vulnerabilities in it. At first, we decided that it would be better to show the detected vulnerabilities inside a human-readable context, so we developed a tool that produces C-like pseudocode from LLVM code. The ultimate goal of this work is to develop a decompiler, but for now we are extracting all the information we can get from the binary code: function names, signatures, call sites, argument types, statements, etc. We are implementing structural analysis, which includes precise recovery of loops and if-else statements; it will make this tool much closer to a decompiler. C binary code has less information than Objective-C binary code; for example, we have no function signatures, so Objective-C recovery is more interesting. But in most cases there is enough information for the interpretation of the detected vulnerabilities. An application binary usually contains both Objective-C and C functions, so during binary translation we determine which language each function is written in and propagate this flag through to the pseudocode. Depending on this flag, we use the Objective-C or the C pretty-printer.
Our research also shows that many of the most dangerous vulnerabilities can be easily found by pattern matching. Pattern matching is also fast, which is important for large binaries, so we use pattern-matching algorithms for now. For other vulnerabilities we will develop data-flow analysis algorithms, for example taint analysis; we have already done this for the x86 architecture to detect memory-related bugs and other interesting vulnerabilities. We find vulnerabilities in the LLVM bitcode, but we want to demonstrate them in the pseudocode, so we map LLVM instructions to pseudocode line numbers. When we find a vulnerable instruction in the LLVM code, we can locate this vulnerability in the pseudocode.
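The combination of pattern matching on the intermediate code and mapping findings back to pseudocode lines can be sketched like this in Python. Everything here is hypothetical: the `Call` record, the `line_map` dictionary and the rule table are illustrative data structures, not the speakers' format. `CC_MD5` and `rand` are real API names used only as sample rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    function: str   # name of the enclosing function
    index: int      # position of the call instruction inside that function
    callee: str     # recovered name of the called function


# Illustrative rule table: callee name -> reason it is flagged.
RULES = {
    "CC_MD5": "weak hash function",
    "rand": "weak random number generator",
}


def match_calls(calls, line_map, rules=RULES):
    """Flag calls whose callee matches a rule and attach pseudocode lines.

    line_map maps (function, instruction index) to a pseudocode line number;
    the line is None when no mapping was recorded for an instruction.
    """
    findings = []
    for call in calls:
        reason = rules.get(call.callee)
        if reason is None:
            continue
        line = line_map.get((call.function, call.index))
        findings.append((call.function, line, call.callee, reason))
    return findings


if __name__ == "__main__":
    calls = [
        Call("-[Session makeToken]", 3, "rand"),
        Call("-[Session makeToken]", 7, "CC_SHA256"),
        Call("-[Store hash:]", 1, "CC_MD5"),
    ]
    line_map = {("-[Session makeToken]", 3): 12, ("-[Store hash:]", 1): 40}
    for finding in match_calls(calls, line_map):
        print(finding)
```

Keeping the line map separate from the matcher means the same findings could be reported against either the intermediate code or the pseudocode.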
Let us discuss the vulnerabilities we can detect now. First of all, iOS applications often transfer sensitive and security-critical information over unprotected connections, which makes the application vulnerable to man-in-the-middle attacks. Most of the applications in our research are banking applications, and for them this vulnerability is very important. For example, the application can turn off the SSL certificate check. This often happens when the developer forgets to remove test code, and we can detect such functions.
Also, the application can use the HTTP protocol to transfer some data, for example maps, news and so on. This data can be tampered with, which can help an attacker to carry out a phishing attack or change some application logic. The URLs are stored as constant strings in the binary, so we can find these connections.
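Finding cleartext HTTP endpoints among the constant strings of a binary can be approximated with a plain byte scan. The Python sketch below is an assumption about how such a check might look, not the speakers' detector: it extracts runs of printable ASCII from a byte blob and reports the ones that contain `http://` URLs. The function name `find_cleartext_urls` and the example hosts are invented for illustration.

```python
import re

# Runs of at least 8 printable ASCII bytes, similar to the `strings` utility.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{8,}")
# Cleartext URLs only; "https://" does not match because of the "s".
CLEARTEXT_URL = re.compile(rb"http://[^\s\"'<>]+")


def find_cleartext_urls(blob: bytes):
    """Return (file_offset, url) pairs for every http:// URL in the blob."""
    urls = []
    for run in PRINTABLE_RUN.finditer(blob):
        for match in CLEARTEXT_URL.finditer(run.group()):
            urls.append((run.start() + match.start(), match.group().decode("ascii")))
    return urls


if __name__ == "__main__":
    sample = (
        b"\x00\x01https://api.example.com/login\x00"
        b"http://cdn.example.com/news.json\x00\xff"
    )
    for offset, url in find_cleartext_urls(sample):
        print(hex(offset), url)
```

A scan like this only sees URLs stored as plain ASCII; strings that are built at run time or stored in another encoding would need the data-flow analysis mentioned in the talk.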
Applications sometimes use insecure or deprecated functions, such as weak hashing algorithms, weak ciphers and weak random number generators. Also, an application can contain a hard-coded encryption key; an attacker can extract this key from the application and use it to disclose sensitive data. We can match these function names and find such cases.
Also, on iOS other applications can gain access to shared boards, so developers have to turn off the corresponding features, especially for important data. Another common error is not turning off NSLog usage: information in the log can help an attacker, and this information can be viewed in Xcode if you attach the device to a computer; you do not even need a jailbreak. When it goes to the background, the application has to implement the function that describes its behavior in background mode; otherwise a screenshot will be saved in the application directory. This screenshot can contain sensitive information like credit card numbers, telephone numbers and so on, and information that is stored in the application directory can be stolen. Then, in iOS, developers can use reflection: they can call methods by name, and this way they can gain access to private APIs, which is prohibited by Apple and is insecure. Here
are some important vulnerabilities that we do not detect now. All information that is stored in the application directory can be stolen, for example if an attacker has physical access to the device or the device is jailbroken. An application can store sensitive personal information or security-critical information in the application directory: in preference files, in local databases and so on. The important thing here is that insecure data storage is dangerous not only because it can lead to data leakage, but also because this data can be tampered with. The integrity of this data is just as important if, for example, the data defines the application's behavior, because an attacker can then change the behavior. For example, we analyzed an application that stores its server address in preference files, so an attacker can change this address from HTTPS to HTTP and carry out a man-in-the-middle attack. Another example is when an application stores the number of unsuccessful authorization attempts in preference files; such applications become vulnerable to brute-force attacks.
Insecure data storage and data leakage vulnerabilities can be found by data-flow analysis: we must track the sensitive information through the data flow. The main problem here is that we must determine which information is sensitive. How can we tell that a variable contains, for example, a password? We suggest some heuristics: for example, using variable names, then data types, then some API functions as sensitive data sources, and also elements of interactive analysis, asking the user which information is sensitive. This work is in progress.
Then, an application that operates on sensitive information should detect jailbroken devices. The standard jailbreak detection algorithms are well known, and we can detect whether these algorithms are present in the binary code. But I think the most important class of vulnerabilities is authentication vulnerabilities: the lack of two-factor authentication, weak password complexity requirements, and the lack of control over the number of unsuccessful authorization attempts. These vulnerabilities are hard to detect because they should be checked on the server side. We already have an implementation for Android applications, and we want to apply these ideas to iOS applications. The main idea of our approach is the following: we detect the counter that is responsible for the number of unsuccessful authorization attempts, and we detect the password validation function, and for this function we try to recover its complexity requirements. Now for the demonstration.
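The variable-name heuristic for spotting sensitive data, mentioned above as work in progress, could look roughly like the following Python sketch. The keyword list and the function names `split_identifier` and `looks_sensitive` are assumptions made for this example; a real analysis would combine such a check with data types and known API sources, as the speakers describe.

```python
import re

# Illustrative keyword list; a real tool would need a curated, larger set.
SENSITIVE_WORDS = ("password", "passwd", "pin", "token", "secret", "card")


def split_identifier(name: str):
    """Split camelCase and snake_case identifiers into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [word.lower() for word in re.split(r"[^A-Za-z0-9]+", spaced) if word]


def looks_sensitive(name: str) -> bool:
    """Return True when any whole word of the identifier is a known keyword."""
    return any(word in SENSITIVE_WORDS for word in split_identifier(name))


if __name__ == "__main__":
    for candidate in ("userPassword", "_cardNumber", "spinner", "auth_token"):
        print(candidate, looks_sensitive(candidate))
```

Matching whole words rather than substrings avoids flagging names such as `spinner` just because they contain `pin`, at the cost of missing abbreviations.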
Here is some assembly with a vulnerable sequence. In this sequence we call a vulnerable function, a function that turns off the SSL certificate check. Then we have this code translated to LLVM, as shown here, and finally we have the pseudocode, where we see that the parameter is 1. So we have a report entry for this example: the source file, the line number and the vulnerability.
We used this tool on a number of applications, and here are some statistics. These statistics represent the current state of our tool; of course we have false positives, and, as I said, we will enhance our analysis engine.
So, we presented our toolset, which can find vulnerabilities in iOS applications using only an iTunes link and present these vulnerabilities in human-readable pseudocode. The future work is to enhance the analysis, to develop data-flow analysis algorithms such as taint analysis, to reduce the number of false positives, and to make the pseudocode closer to source code, that is, to develop a decompiler. Thank you for your attention. Any questions?
Thank you. It's not a question, just a comment: have you thought about finding things like API keys in the binary, for example for Amazon or other services? If you have the API key, you can use the service, and sometimes developers include not only public keys but also private keys. This happens a lot.
We don't detect it automatically now.
Hi. First, great tool. Do you have any plans on releasing it as open source?
No, unfortunately it's a proprietary tool.
OK. And you showed some statistics; on how many apps are those statistics based?
A couple of hundred.
And then I was just wondering: a lot of apps include a full SDK, so there may be functions for SSL pinning checks or other functionality. Is the tool smart about these functions? For example, a banking application may use an official SDK that has functionality which is never called. So you will have this code in the binary, but the functionality is not used. It's not because it's in the source or in the binary that the application uses it. You see what I mean?
Yes, I see the point. Thank you.
Thank you.