
Effects of Program Representation on Pointer Analyses — An Empirical Study

Formal Metadata

Title: Effects of Program Representation on Pointer Analyses — An Empirical Study
Number of Parts: 10
License: CC Attribution 3.0 Germany. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Transcript: English (auto-generated)
Hello, everybody. I'm Jyoti Prakash, and I'm going to give a talk on my recent work, the effects of program representation on pointer analyses. This work has been done with Abhishek and Christian from NUS and the University of Boston. So the scenario is that some researchers implement their new program analysis idea using one of our frameworks, say WALA. Now they want to compare it with the existing state-of-the-art analyses, which have been implemented in different frameworks. And these frameworks vary in several aspects of program representation, which sometimes makes the comparison difficult.
So in the end, it becomes a comparison between apples and oranges. So what are the differences between program analysis frameworks? The first is library modeling. Certain frameworks favor scalability over soundness, and therefore exclude some of the libraries from their analysis scope. For example, a framework may choose to exclude parts of the Java runtime libraries or some framework-specific libraries. The second is that program analysis frameworks always work on an intermediate representation, to reduce the burden of syntactic sugar in the source code. This program representation mainly comes in two forms: one is three-address code, and the other is static single assignment (SSA) form.
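To make the two forms concrete, here is a minimal sketch of our own (not actual Jimple or WALA IR output) showing how a single Java statement might be lowered into three-address code and into SSA form:

    // A tiny Java method, followed by illustrative, hand-written lowerings.
    class IrExample {
        static int compute(int a, int b, int c) {
            int d = (a + b) * c;
            return d;
        }
    }
    // Possible three-address code (each instruction has at most one
    // operator; temporaries may be reused and reassigned):
    //   t = a + b
    //   t = t * c
    //   d = t
    //   return d
    // Possible SSA form (each name is assigned exactly once, so every
    // intermediate value gets its own version):
    //   t1 = a + b
    //   t2 = t1 * c
    //   d1 = t2
    //   return d1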
Structurally, these two program representations are different, and the effects of this difference on program analysis are largely unknown. The last aspect is heap abstraction. This defines how a framework represents abstractly allocated heap objects along the call chain. Some frameworks keep the call chain together with the heap abstraction, and some don't. So the aim of this work is to study the effects of these aspects on program analyses. And why are we interested in doing this? Because sometimes these differences affect the comparability of results.
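As a hedged illustration of why the call chain attached to the heap matters (our own example, not one from the talk): the allocation inside make() below is a single allocation site, so a heap abstraction without calling context folds both objects into one abstract object, while an abstraction that keeps one element of the call chain can keep them apart.

    class HeapContextExample {
        static Object make() {
            return new Object();          // single allocation site
        }
        public static void main(String[] args) {
            Object fromA = make();        // call site A
            Object fromB = make();        // call site B
            // Context-insensitive heap: fromA and fromB point to the same
            // abstract object (the one allocation site), so the analysis
            // cannot tell them apart, e.g. fromA == fromB looks possible.
            // Heap with one call-chain element: the allocation is split into
            // <site, called-from-A> and <site, called-from-B>, giving the
            // two variables distinct abstract objects.
            System.out.println(fromA == fromB);   // false at run time
        }
    }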
So for this work, we chose to study pointer analysis. Pointer analysis is a fairly simple static analysis, which returns a map from variables to heap objects. That means after pointer analysis, we can say which variables may refer to which heap objects in all possible executions. It is a foundational static analysis with applications in many others, such as dynamic binding, typestate analysis, call graph construction, and many more. Since it is so foundational, imprecision in pointer analysis also carries over into the imprecision of its client analyses.
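As a small illustration (our own example, not from the talk), a pointer analysis over the program below computes, for each variable, the set of allocation sites it may refer to in any execution:

    class PointsToExample {
        public static void main(String[] args) {
            Object p = new Object();   // allocation site o1
            Object q = new Object();   // allocation site o2
            Object r = p;
            if (args.length > 0) {
                r = q;
            }
            // A typical result:
            //   pts(p) = {o1}
            //   pts(q) = {o2}
            //   pts(r) = {o1, o2}   (r may refer to either object)
            System.out.println(r);
        }
    }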
For our study, we chose analyses with varying precision with respect to context sensitivity. Context sensitivity disambiguates different calls to the same method, and it comes in two flavors. One is call-site-sensitive analysis, which disambiguates method calls based on their calling context, and the other is object-sensitive analysis, which disambiguates method calls based on their receiver objects. Orthogonal to this is the abstraction chosen for heap allocations, where we have context-sensitive heap allocation and context-insensitive heap allocation. In the first dimension, we bound the length of the calling context used to disambiguate method calls; in the other, we choose a simple heap allocation context of length one.
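To make the two flavors concrete, here is a minimal sketch of our own (the context labels in the comments are illustrative):

    class ContextSensitivityExample {
        Object id(Object o) { return o; }     // one method, two callers
        public static void main(String[] args) {
            ContextSensitivityExample c1 = new ContextSensitivityExample(); // receiver object o_c1
            ContextSensitivityExample c2 = new ContextSensitivityExample(); // receiver object o_c2
            Object a = c1.id(new Object());   // call site 1, allocation o1
            Object b = c2.id(new Object());   // call site 2, allocation o2
            // Context-insensitive: both calls to id() are merged, so
            //   pts(a) = pts(b) = {o1, o2}.
            // 1-call-site-sensitive: id() is analysed once per call site,
            //   giving pts(a) = {o1} and pts(b) = {o2}.
            // 1-object-sensitive: id() is analysed once per receiver object
            //   (o_c1 vs o_c2), which here also separates the two calls.
            System.out.println(a != b);
        }
    }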
The frameworks that we chose for our study are Doop, WALA, and Soot. Doop is a pointer analysis framework, while WALA and Soot are general-purpose program analysis frameworks; WALA and Soot are used as front ends to Doop to extract the intermediate representation. The benchmark we use for our study is the DaCapo benchmark suite, which is standard for program analysis. So the first question we asked is: how does library modeling vary across different frameworks? To this end, we instrumented Doop to use the Soot and WALA class hierarchies,
and ran Doop with these two class hierarchies. We discovered that in many cases, precisely seven out of eleven benchmark applications, the class hierarchy computed by Soot is a subset of the class hierarchy computed by WALA, while in the others this is not the case. So there are certain classes in each framework's class hierarchy that are exclusive to that framework. Following this result, we wanted to see what the effects of these differences on a program analysis are. And the existing metrics were insufficient for our study.
One of the existing metrics used in the literature is the average points-to set size. It is a simple ratio of the total number of heap objects pointed to by the variables to the total number of variables in the program. But the problem is that this metric combines all the aspects of the program representation and does not isolate each aspect for study. Therefore, we devised two metrics for our research. The first is a class hierarchy metric, where we restrict the variables to the classes that are common to both class hierarchies. It is defined as the ratio of the total number of objects referred to by the variables in the common class hierarchy to the total number of such variables, i.e., the average points-to set size over the common class hierarchy.
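Written out in our own notation (not necessarily the paper's), with pts(v) the points-to set of a variable v, V the set of all program variables, and V_CH the variables whose declared type lies in the common class hierarchy, the two metrics are roughly:

    \mathrm{AvgPtsSize} = \frac{\sum_{v \in V} |pts(v)|}{|V|}
    \qquad
    \mathrm{CHMetric} = \frac{\sum_{v \in V_{CH}} |pts(v)|}{|V_{CH}|}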
So for the study, to evaluate the class hierarchy metric, we set up the analysis with Doop using Soot and WALA as front ends. And we observe that the differences exposed by this metric are negligible. For example, with WALA there is a minor difference on the order of 10^-2 for six benchmark applications, while with Soot only one application shows a difference, also on the order of 10^-2. This shows that the effect of these class hierarchy differences on program analysis is negligible. Next we come to the intermediate representation. To motivate this, we have a factory class that is used to obtain two instances of a class.
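The exact code from the slide is not part of the transcript; a minimal sketch of that kind of factory, with hypothetical names, could look like this:

    class Widget {}
    class WidgetFactory {
        static Widget create() {
            return new Widget();              // factory allocation site
        }
    }
    class FactoryClient {
        public static void main(String[] args) {
            Widget w1 = WidgetFactory.create();   // first instance
            Widget w2 = WidgetFactory.create();   // second instance
            // Depending on how the front end lowers these calls (extra
            // temporaries for return values, copies, and so on), the number
            // of local variables in the IR can differ between frameworks.
            System.out.println(w1 != w2);
        }
    }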
On the slide's version of this example, we can see that in the case of WALA we only have two variables, but in the intermediate representation produced by Soot we have three variables. So even for this small program, the number of variables differs across the intermediate representations. Comparing this aspect was challenging, because the average points-to set size has the total number of variables in its denominator, which makes it difficult to compare the two ratios. To this end, we chose a client analysis: virtual method call resolution.
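As a reminder of what this client computes (our own illustrative example, not one from the talk): resolving a virtual call means using the points-to set of the receiver variable to determine which concrete methods the call may dispatch to.

    interface Shape { double area(); }
    class Circle implements Shape { public double area() { return Math.PI; } }
    class Square implements Shape { public double area() { return 1.0; } }

    class VirtualCallExample {
        public static void main(String[] args) {
            Shape s = args.length > 0 ? new Circle() : new Square();
            // The call below is virtual: its possible targets are the
            // area() implementations of the classes of objects in pts(s),
            // here {Circle, Square}. A more precise pts(s) means fewer targets.
            System.out.println(s.area());
        }
    }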
In the first step, we restrict attention to the methods where the number of variables differs across frameworks. We call these the interesting methods. Then, within these methods, we restrict to the receiver variables at virtual method calls, that is, the variables on which virtual method calls are invoked. The number of these variables is guaranteed to be equal, given that the virtual method call sites in the program are the same. We then compute a simple ratio of the objects referred to by the receiver variables at virtual calls to the total number of such variables.
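In the same notation as before (again ours, not necessarily the paper's), with R the set of receiver variables at virtual call sites inside the interesting methods, the client metric is roughly:

    \mathrm{VirtualCallMetric} = \frac{\sum_{v \in R} |pts(v)|}{|R|}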
For this, we set up Doop with WALA, which uses SSA form, and with Soot's Jimple, which is a three-address code, as front ends. The expectation was that, since SSA is locally flow-sensitive, it should give higher precision than the three-address code.
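The intuition behind that expectation, in our own illustration: SSA renames a variable at every assignment, so a reassigned local does not accumulate the points-to sets of its earlier values, whereas in three-address code the single name does.

    class SsaPrecisionExample {
        public static void main(String[] args) {
            Object x = new Object();      // allocation o1
            System.out.println(x);
            x = new Object();             // allocation o2
            // Three-address code keeps one name x, so a flow-insensitive
            // analysis reports pts(x) = {o1, o2} at this point.
            // SSA splits it into x1 and x2, so the second use sees only
            // pts(x2) = {o2}, i.e. locally flow-sensitive precision.
            System.out.println(x);
        }
    }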
But unfortunately, the results were different. We saw that approximately 90% of the methods are interesting in each benchmark, which means this aspect cannot be ignored. And yet the differences between the two frameworks are negligible, again on the order of 10^-2. For the last question, how heap modeling varies across different frameworks, we set up the study with identical heap abstractions. That means if the heap abstractions are identical across two different frameworks, then they should create the same number of abstract heap objects. To this end, we compared the application-level heap objects of WALA and Doop, both with a 1-call-site-sensitive analysis and with one level of heap abstraction. What we observed is quite different: the heap abstractions, although they might look similar from the outset, are not the same. We saw that in many cases the difference is as big as 14 times between the number of heap objects created by the individual frameworks.
To conclude this part, we can say that the heap abstractions, though they may look similar from their descriptions, are not comparable in the implementations of these two frameworks. So to conclude the talk: we started by discovering the library modeling variations in the two frameworks, and we evaluated them with the class hierarchy metric, where we found the differences to be negligible. Then we evaluated the intermediate representations, where we found the differences to be nearly negligible as well. And at the end, we saw that the heap representations in the two frameworks are not the same, even though by their descriptions they may look similar. This brings me to the end of my talk.
I'm happy to take your questions.