
Practical Machine-Checked Formalization of Change Impact Analysis


Formal Metadata

Title
Practical Machine-Checked Formalization of Change Impact Analysis
Number of Parts
22
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Change impact analysis techniques determine the components affected by a change to a software system, and are used as part of many program analysis techniques and tools, e.g., in regression test selection, build systems, and compilers. The correctness of such analyses usually depends both on domain-specific properties and change impact analysis, and is rarely established formally, which is detrimental to trustworthiness. We present a formalization of change impact analysis with machine-checked proofs of correctness in the Coq proof assistant. Our formal model factors out domain-specific concerns and captures system components and their interrelations in terms of dependency graphs. Using compositionality, we also capture hierarchical impact analysis formally for the first time, which, e.g., can capture when impacted files are used to locate impacted tests inside those files. We refined our verified impact analysis for performance, extracted it to efficient executable OCaml code, and integrated it with a regression test selection tool, one regression proof selection tool, and one build system, replacing their existing impact analyses. We then evaluated the resulting toolchains on several open source projects, and our results show that the toolchains run with only small differences compared to the original running time. We believe our formalization can provide a basis for formally proving domain-specific techniques using change impact analysis correct, and our verified code can be integrated with additional tools to increase their reliability.
Transcript: English (auto-generated)
Welcome to this presentation of the paper "Practical Machine-Checked Formalization of Change Impact Analysis". I'm Karl, and this is joint work with my colleagues Ahmet at Facebook and Milos at UT Austin. Before I tell you about our formalization, let me give you some context for this work: what we're considering is building and testing in software development.
In a typical workflow, a programmer submits a diff, a change to a software system, through a version control system. The version control system in turn invokes a continuous integration system, which builds the software and performs other tasks such as running tests.
Typically, this continuous integration system uses build systems like Make, Bazel, Dune or CloudMake, and it may also use regression test selection tools like Ekstazi and STARTS.
The key point of using these kinds of tools is to scale with the size of the change rather than the size of the code base. You don't want things to take a really long time just because you have a lot of code; you want them to take long only because the change was large.
One way to deal with this is to perform change impact analysis, which we take to be the activity of identifying the potential consequences of a change to a software system. After the analysis, we want to use the resulting information to perform fewer tasks than we might have had to perform without it, so we save some effort. This kind of analysis is at the core of a lot of these build systems and tools.
Usually, when you want to reason that your technique, your build system for example, is correct, you have to reason both about the domain you are applying it to and about the general parts of change impact analysis. The question we asked in this work was whether we could separate out the reasoning about change impact analysis so that it can be done once and for all, and a domain expert can then solve the remaining problems, for example for the Java language, where you might do test selection and show that it is correct.
Our contribution is a formal model of change impact analysis, developed in the Coq proof assistant, so we provide a library of definitions and proofs. From this Coq code, we also produced a verified tool called Chip, and we evaluated Chip by integrating it with building and testing tools.
Now we come to the part where we present our model.
The basic concept in our model is that of a component. A component can be thought of, for example, as a file name. We consider the set of components before and after a change is made to a system: we call the set of components before the change V, and the set afterwards V prime. Then we have a set of artifacts; the intuition is that an artifact is the content of a file, which may of course change when a change is made. We track this with two functions, f and f prime, that map components to artifacts before and after the change.
Then we have dependencies between these components, given by the dependency graphs G and G prime, which are just binary relations on the component sets.
Finally, we have a set of checkable components that we can run some operation on. We call this operation check and assume that it is side-effect free; it might be running a test case or invoking some kind of test-running program.
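To make this concrete, here is a rough OCaml rendering of these ingredients. This is an illustrative sketch with made-up names and representations, not the paper's Coq definitions or Chip's actual interface:

```ocaml
(* Illustrative OCaml sketch of the basic model's ingredients; names and
   representations are made up for this example, not taken from Chip. *)
module IntSet = Set.Make (Int)

type component = int                     (* a component, e.g. identified by a file name *)
type artifact = string                   (* e.g. the content of a file *)

(* One revision of the system: V (or V'), f (or f'), G (or G'), and the
   checkable components. *)
type state = {
  vs : IntSet.t;                         (* the set of components *)
  art : component -> artifact;           (* maps a component to its artifact *)
  deps : (component * component) list;   (* edge (u, v) means u depends on v *)
  checkable : IntSet.t;                  (* components we can run [check] on *)
}

(* [check] is assumed to be side-effect free, e.g. running a test case. *)
type outcome = Pass | Fail
```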
Based on these initial concepts, we can define some derived concepts, such as a modified vertex. We say that a vertex is modified if its artifact changed, which we can determine by invoking the functions f and f prime.
Then we can define what it means for a vertex to be impacted: it is reachable from a modified vertex in the inverse graph. So we take the graph, flip all the edges, and check whether the vertex can be reached from a modified vertex; if so, we say it is impacted. Of course, a modified vertex is itself impacted.
Finally, we have the set of fresh vertices: vertices that were added in this revision and were not present before. The key idea is that when we take the union of the impacted and the fresh vertices and run check on all the checkable vertices in that union, we have done all the work required. That's the main idea.
Let me give you an example. On the left-hand side we have a dependency graph in the before state, with six components, of which two, five and six, are checkable. Then we perform a change: we change the artifact of vertex one, or component one. We flip the edges of the graph and see that one, three and five are impacted by this. In this case, only five needed to be executed, or checked, so all we need to do here is run check on five; we don't, for example, have to run check on six. So we might save a lot of time and money by doing this analysis.
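Continuing the sketch from above, the impact computation and this example might look as follows. The exact edge layout of the figure is not in the transcript, so the graph below is just one layout consistent with what was said (one, three and five impacted; five and six checkable):

```ocaml
(* Continuing the earlier sketch: impacted vertices are those that reach a
   modified vertex along dependency edges, i.e. the vertices reachable from a
   modified vertex in the inverse graph. *)
let impacted (old_s : state) (new_s : state) : IntSet.t =
  let modified =
    IntSet.filter (fun v -> old_s.art v <> new_s.art v)
      (IntSet.inter old_s.vs new_s.vs)
  in
  let rec grow acc =
    let step =
      IntSet.filter
        (fun u ->
          List.exists (fun (src, dst) -> src = u && IntSet.mem dst acc) new_s.deps)
        new_s.vs
    in
    let acc' = IntSet.union acc step in
    if IntSet.equal acc acc' then acc else grow acc'
  in
  grow modified

let () =
  (* The talk's example with an assumed edge layout: six components,
     5 and 6 checkable, and component 1's artifact changes. *)
  let vs = IntSet.of_list [ 1; 2; 3; 4; 5; 6 ] in
  let edges = [ (3, 1); (4, 2); (5, 3); (6, 4) ] in
  let old_art v = "old" ^ string_of_int v in
  let old_s = { vs; art = old_art; deps = edges; checkable = IntSet.of_list [ 5; 6 ] } in
  let new_s = { old_s with art = (fun v -> if v = 1 then "new1" else old_art v) } in
  let fresh = IntSet.diff new_s.vs old_s.vs in                     (* empty here *)
  let to_check = IntSet.inter new_s.checkable (IntSet.union (impacted old_s new_s) fresh) in
  IntSet.iter (Printf.printf "check %d\n") to_check                (* prints: check 5 *)
```

Here to_check ends up being just {5}, matching the example: check never has to run on six.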
As for correctness, we have formal statements, which I will not show you, but the intuition is this: if we only execute the impacted and fresh vertices, the analysis is sound and complete. That is, we check precisely the vertices we need to check, and everything we did not check would have produced the same result as before. Of course, we cannot prove this out of thin air; we have to assume some sanity properties.
Specifically, for example, we have the property that the direct dependencies of a vertex are the same in both revisions if its artifact is the same. In other words, if the direct dependencies of a vertex change, then the vertex itself also has to change, which is typically what happens in most programming languages.
Next, a vertex with the same artifact in both revisions is checkable in the new revision if and only if it is checkable in the old revision. Basically, you cannot turn a checkable vertex into a non-checkable one, or vice versa, without changing its artifact. This is, again, a sanity property.
Finally, the outcome of executing a checkable vertex is the same in both revisions if everything in its dependency closure is the same. This, of course, sometimes does not hold in practice: if you run a test twice on exactly the same system, it might return different results because of non-determinism and so on; the test is flaky. Here we assume that this does not happen, and this is a key assumption for proving correctness.
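To make these assumptions concrete, here is how the first two could be phrased as executable checks over the sketch above. This is an informal paraphrase, not the formal Coq statements; the third assumption, about stable outcomes, only makes sense once check is actually run:

```ocaml
(* The first two sanity assumptions, phrased as checks over the sketch above
   (an informal paraphrase of the formal statements). *)
let direct_deps s v =
  IntSet.of_list
    (List.filter_map (fun (u, w) -> if u = v then Some w else None) s.deps)

let unchanged old_s new_s v = old_s.art v = new_s.art v

(* 1. A vertex with an unchanged artifact has the same direct dependencies
      in both revisions. *)
let deps_stable old_s new_s v =
  (not (unchanged old_s new_s v))
  || IntSet.equal (direct_deps old_s v) (direct_deps new_s v)

(* 2. A vertex with an unchanged artifact is checkable in the new revision
      if and only if it was checkable in the old one. *)
let checkable_stable old_s new_s v =
  (not (unchanged old_s new_s v))
  || IntSet.mem v old_s.checkable = IntSet.mem v new_s.checkable
```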
this is what you should have in mind. So you have a set of coarse grained components or files in this case. And then these are connected inside these files or some kind of fine grained components like methods or other kinds of structures. And the files of course are somehow contained
or the methods are contained in these files. So our model sort of extends to this by, we have these now two sets of components and one is the coarse grained set, one is a fine grained set. And then there's a partition from the coarse grained to the fine grained ones.
So, and then of course we have two dependency graphs and we have, we can use the Shane impact analysis of U to analyze the components B and the graph,
this key bottom here. So we have two key strategies to do this. We have the over approximation strategy, which is kind of running everything in, we sort of figure out which files have been impacted and then we sort of execute everything inside those files,
regardless of whether that changed. And this is sort of what makes us or typically what makes us. And then we have the composition strategy where we're a little bit more clever. So we sort of do this, we find the impact of the vertices in this U set and then we sort of figure out a subgraph on the V sets.
And then we perform change impact analysis in this subgraph. And then we can sort of avoid running check on these kind of fine grained components that did not change while getting some kind of boost from our analysis of the U sets.
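Here is a sketch of the compositional strategy in the same OCaml style, reusing state and impacted from the sketches above. The function part is an assumed mapping from a fine-grained vertex (e.g. a method) to its coarse-grained component (e.g. its file), and for simplicity one partition function is used for both revisions; again this is illustrative, not Chip's actual code:

```ocaml
(* Sketch of the compositional strategy: coarse-grained impact analysis first,
   then fine-grained impact analysis restricted to impacted or fresh
   coarse-grained components. *)
let compositional ~(old_coarse : state) ~(new_coarse : state)
    ~(old_fine : state) ~(new_fine : state) ~part =
  (* 1. Which coarse-grained components (files) are impacted or fresh? *)
  let impacted_coarse = impacted old_coarse new_coarse in
  let fresh_coarse = IntSet.diff new_coarse.vs old_coarse.vs in
  let relevant = IntSet.union impacted_coarse fresh_coarse in
  (* 2. Keep only fine-grained vertices (methods) living inside those. *)
  let keep v = IntSet.mem (part v) relevant in
  let restrict s =
    { s with
      vs = IntSet.filter keep s.vs;
      deps = List.filter (fun (u, v) -> keep u && keep v) s.deps;
      checkable = IntSet.filter keep s.checkable }
  in
  (* 3. Run impact analysis again, only on the restricted fine-grained subgraph. *)
  let impacted_fine = impacted (restrict old_fine) (restrict new_fine) in
  let fresh_fine = IntSet.diff new_fine.vs old_fine.vs in
  IntSet.inter new_fine.checkable (IntSet.union impacted_fine fresh_fine)
```

The restriction is safe because of the partition assumptions discussed next: a changed method forces its file to change, so nothing impacted can live outside an impacted file.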
The correctness statement here is essentially the same as for the basic model, and of course there are again some assumptions, which are sanity properties on the partitioning. For example, if we have two different coarse-grained components, and two fine-grained components living in them are related, then the coarse-grained components have to be related as well. Using the file system analogy, this would certainly hold. We also have that if the artifact of a coarse-grained component is the same in both revisions, then it has the same partitioning, i.e. it contains the same fine-grained components. Again, this would hold in a file system world, but maybe not in other domains. And finally, if a coarse-grained component has the same artifact in both revisions, then everything inside that component must be unchanged as well; you cannot have a method that changed inside a file that is unchanged, for example.
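As before, here is an informal executable paraphrase of these partition properties over the sketch, with part again the assumed fine-to-coarse mapping:

```ocaml
(* Partition sanity properties, paraphrased as checks over the sketch. *)

(* 1. A dependency between fine-grained vertices in two different coarse-grained
      components is reflected by a dependency between those components. *)
let cross_deps_reflected coarse fine part =
  List.for_all
    (fun (v, w) -> part v = part w || List.mem (part v, part w) coarse.deps)
    fine.deps

(* Fine-grained vertices contained in coarse-grained component [u]. *)
let contents fine part u = IntSet.filter (fun v -> part v = u) fine.vs

(* 2./3. If a coarse-grained component's artifact is unchanged, it contains the
      same fine-grained vertices, and all of them are unchanged too. *)
let contents_stable old_coarse new_coarse old_fine new_fine part u =
  old_coarse.art u <> new_coarse.art u
  || (IntSet.equal (contents old_fine part u) (contents new_fine part u)
      && IntSet.for_all
           (fun v -> old_fine.art v = new_fine.art v)
           (contents new_fine part u))
```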
Now let me give you some idea of what the encoding in Coq looks like. I won't give you too many details, but in addition to the specifications it is around 5,000 lines of proofs, and it uses finite sets and graphs. The definitions are quite similar to what I just showed you, in terms of sets and so on, but there are some notations, like connect here, which is transitive closure; that is a little bit different. We also use a predicate to define whether something is fresh or not, and this is the union operator.
To evaluate our formalization (again, we proved these things under assumptions), we extracted executable code from it, defined a general interface, and obtained the tool called Chip. We then integrated Chip with three tools, one regression test selection tool, one regression proof selection tool, and one build system, to cover all bases. We compared the results and the time taken to run each tool with Chip and without Chip on a number of projects.
Here is the list of projects we used. For example, we used a set of Java projects for Ekstazi, which is a Java tool; they are quite big projects, and you can see their GitHub URLs here. For Coq, we used three quite big projects, or at least three big ones and one smaller one. And for the build system Tup, we used these projects, which use Tup as their build system. We ran Chip on a number of revisions of each project.
Here you can see the results for Ekstazi and Chip; you can see that the time increases a bit.
We went, for example, from 188.67 seconds to about 195 seconds here, so a bit slower, and you can see that the change impact analysis time also went up a bit. But the results were all the same: we selected exactly the same things, which served as a kind of validation. Overall, the time was not that much worse, so we did not have to pay too much to get the verified components.
Here are the results for iCoq, the regression proof selection tool, with Chip. Again the story is that the time increases a bit, sometimes a bit more, but overall the results were the same and the increase was, in most cases, not too bad.
Finally, we have the build system. Here we only report the change impact analysis time, and you can see that the differences were quite small
in most cases; these numbers are in milliseconds, and the results were again all the same.
In conclusion, we presented a formalization of change impact analysis using the Coq proof assistant, we took the verified code and integrated it with build and test tools, and in doing so we validated, in some sense, that the model captures what we think it captures. We provide all the code on GitHub. Feel free to ask questions after the presentation, and also feel free to contact me or open issues on GitHub if you have questions specifically about, for example, the Coq code, which I did not go into in much detail.