We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Local Reasoning for Global Graph Properties

00:00

Formal Metadata

Title
Local Reasoning for Global Graph Properties
Title of Series
Number of Parts
13
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Separation logics are widely used for verifying programs that manipulate complex heap-based data structures. These logics build on so-called separation algebras, which allow expressing properties of heap regions such that modifications to a region do not invalidate properties stated about the remainder of the heap. This concept is key to enabling modular reasoning and also extends to concurrency. While heaps are naturally related to mathematical graphs, many ubiquitous graph properties are non-local in character, such as reachability between nodes,path lengths, acyclicity and other structural invariants, as well as data invariants which combine with these notions. Reasoning modularly about such graph properties remains notoriously difficult, since a local modification can have side-effects on a global property that cannot be easily confined to a small region.
Keywords
Invariant (mathematics)Interface (computing)Software frameworkDataflowTime domainInheritance (object-oriented programming)Communications protocolMaxima and minimaCommutative propertyFlow separationAlgebraRule of inferenceProof theoryGeneric programmingCompilation albumProof theoryGraph (mathematics)Inheritance (object-oriented programming)Data structureCommunications protocolInitial value problemScheduling (computing)Natural numberBipartite graphFlow separationGeneric programmingFunctional (mathematics)Maxima and minimaDataflowFlow separationSubgraphInterface (computing)Motion captureResultantInformationQuicksortRepresentation (politics)AbstractionSoftware frameworkCategory of beingParameter (computer programming)Condition numberElectronic mailing listInvariantentheorieAnalogyMereologyMatrix (mathematics)Domain nameVirtual machineTheoryRootKey (cryptography)Uniqueness quantificationState of matterFrame problemPairwise comparisonSoftware developerMeta elementAutomationLevel (video gaming)Network topologyRecursionLogicGraph (mathematics)Einbettung <Mathematik>MultiplicationLocal ringSlide ruleInferenceFormal verificationStandard deviationPresentation of a groupAlgorithmBenchmarkOverlay-NetzConcurrency (computer science)Cycle (graph theory)Process (computing)Physical systemPredicate (grammar)Regular graphDefault (computer science)Lemma (mathematics)Source codeSet (mathematics)Traverse (surveying)Rule of inferenceP (complexity)Computer programmingCodeShared memoryLine (geometry)Thread (computing)Lambda calculusTrailConnectivity (graph theory)Memory managementDialectCircleMathematicsAlgebraMonoidInversion (music)Inductive reasoningMultisetPoint (geometry)Invariant (mathematics)Correspondence (mathematics)Complex (psychology)Nichtlineares GleichungssystemDeadlockQueue (abstract data type)Operator (mathematics)Order (biology)Pearson product-moment correlation coefficientMappingComputer animation
Transcript: English(auto-generated)
Hi, my name is Siddharth Krishna and I'd like to present local reasoning for global graph properties. This is joint work with Alex Summers and Thomas Wiese. The motivation for this work is to verify data structures whose correctness depends on recursive invariance over general graph structures. For example, concurrent B-linked trees that have a tree-structured
backbone but are overlaid with lists on each level and additionally have data invariance that are global in nature. For example, the keys on a leaf depend on the keys on all the nodes in the path from the root to that leaf. We also have structures like Harris's non-blocking linked list which essentially looks like an arbitrary overlay of two lists. Nodes in these lists
exhibit irregular sharing and threads traverse in an irregular manner in and out from one list to another. We also have the priority inheritance protocol or the PIP which maintains a potentially
cyclic graph structure along with data invariance along on this graph structure. I'll tell you more about this example later on. A popular way of verifying data structures is separation logic and separation logic when coupled with standard techniques like inductor predicates is really great when you have regular inductive structures like lists and trees. Separation logic
has a frame rule which I'll tell you about in a second which allows you to do local reasoning. You can prove one component of your system without worrying about other components and this is key to scaling it up and having large automated systems for verification tools which has been very popular in the industry. Unfortunately, separation logic and standard proof
techniques based on it can't handle very well data structures that exhibit the kinds of features we saw on the previous slide. Multiple overlays, irregular traversals, recursive invariants, cyclic graphs. There's been a lot of work looking into these problems and there have been many solutions but mostly they are hard to scale and or automate. So what is the frame
rule? Supposing I have a proof that a command C transforms some initial state P to some final state Q without crashing or doing anything bad, then the frame rule tells me that if I apply the command C on a larger state consisting of P and some extra stuff F, the frame,
and crucially if P and F are disjoint, that's denoted by the star conjunction, then we know that C will run safely and only modify P to Q leaving F unchanged. This is really useful because it helps you perform this kind of reasoning. Supposing I have a data structure that
is a list and I have a command that transforms this list in some way that maintains the list invariant. So it goes from some blue list to some purple list. If I then have a program where this command is called and this program state is something larger, some larger list, some green list, then I can do the following style of reasoning. I can first break this green
list down into a blue list and a red list, you know with chopping it in two or something. I can then use the frame rule to infer that since the command transforms blue list to purple list, the red list will be unchanged. It's sort of a disjoint frame and then ideally what I'd like to do is say that okay now I can put the resulting purple and red list back together and I get back a list so my whole my entire state continues to satisfy the list invariant.
Of course this isn't always true as you might already have realized. For example, you know the command could modify the blue list into the purple list as shown here which means that although the blue list and the red list did compose to form a larger list, the purple list and the red list no longer compose to form a larger list.
In this case, of course, it's easy to see how to fix this issue. All we need to do is keep track of some additional information, namely the endpoints of each list segment. So if our overall state was a list segment from x to z and this command c transforms a list segment xy
to some other list segment purple and also with the same endpoints xy, then we know that we can recombine it with a red list segment and infer that the list invariant is preserved on the global state. So what we really want to be able to do is to do this kind of frame reasoning for arbitrary
global graph properties. If there was a systematic way to given a global graph property to derive a graph predicate like gr let's say over a set of nodes x and some interface i which of course depends on the graph property, then we can do this kind of reasoning and we can prove that the global invariant is maintained on the global set of nodes x while only showing that the
modified set of nodes x1 maintains its interface i1. This will also help us prove algorithms that do arbitrary traversals because hopefully these graph predicates can be split decomposed and composed again on arbitrary lines. But the real question is what are these
we develop a systematic methodology and a generic framework for such frame-based reasoning. We show how you can encode a global and potentially recursive invariance using the notion of flows on graphs. We then derive a generic notion of an interface and lemmas over generic
graph predicates that maintain these recursive invariants so that you can perform frame-based reasoning. Our technique handles overlays, irregular traversals and all other complex and interesting features that we've seen so far. And finally our methodology is compatible with any separation logic based tool or proof system. This notion of flows is very generic or
expressive. Essentially you can use any flow domain or commutative cancelative monoid. Our methodology is amenable to automation and extends to the concurrent setting easily
and in our paper and in this talk I'll give you a hint on how we can do all this using the priority inheritance protocol and Harris list examples. Let's look at the priority inheritance protocol or the pip algorithm in more detail. This algorithm is used in process scheduling and maintains a bipartite graph between processes
which are blue circles and resources which are red squares. There's an edge between a process and a resource if that process is currently waiting for that resource to become available and there's an edge between resource and a process if the process is currently holding that resource. The point of the pip is to prevent priority inversions which are situations
when a high priority process is waiting on a resource that is currently being held by a lower priority process and the way pip avoids such inversions is to propagate the higher priority from the first process to any processes that are blocking it from acquiring its resources.
So let's look at an example. Supposing the process P in this graph which is to acquire the resource R. Note that R is currently being held by resource N which has priority one which is lower than the priority of process P. So what the algorithm does is that it adds an edge in the graph from P to R signifying that P is waiting for R but then it updates R's priority
and transitively the priority of any node reachable from R to P's priority. In order to do this each node maintains not just its current priority that we saw on the previous slide
but also a multi-set of incoming priorities and it sets its current priority to the maximum of its default priority and the maximum value in this multi-set. So in the picture now here we show each node having a set of incoming priorities and its default priority and the current priority is highlighted in red and underlined. So going back to our example after P tries to acquire R
the algorithm calls update on R from an initial priority of a dummy minus one value to P's priority three. So what happens then and you don't have to look at the code in detail but essentially what happens is that R takes this incoming priority three adds it to its set
and updates its initial priority its current priority to three which is the maximum of its default and and incoming priorities. It then recursively calls the next node N and again passes it its maximum priority which is now three. So N updates its incoming priority set
and sets its own current priority to three which is again the maximum and so on. When this process terminates we see that all the nodes in this newly created cycle have current priority three which is the priority of P and which got propagated all around the cycle. This is avoided
priority inversion. On the other hand it has created a deadlock cycle which can be handled. I'm not going to get into how that works right now. The key invariant for verifying the pip is that the priority of every node N is the maximum priority of all nodes that can reach N. This is a non-trivial global and potentially recursive invariant over a graph structure that's
arbitrary and can potentially have cycles. This makes it really hard to do frame-based reasoning. In particular it isn't true that if I take a set to nodes x1 over which the pip invariant is true and this one set of nodes x2 over which the pip invariant is true and I put them together then the pip invariant is still true.
So let us look at how we can express the pip invariant using flows. A flow is a function on a graph that maps each node in the graph to a value from a so-called flow domain. A flow domain is in general any commutative cancelative monoid. For the pip we're going to use multi-sets of natural numbers so that the flow values at each node shown in yellow here
can represent the set of priorities of all nodes that reach this node. A flow domain also has the set E which is a set of edge functions. These are functions from the flow domain M to itself and essentially each function labels an edge and tells us how the
flow value of the source of that edge contributes to the flow value of the destination of that edge. For example all of these lambda queues are used to label, for example this process here has a default priority of one so the outgoing edge is labeled with lambda one and what lambda one does
it takes the maximum of the flow value of the source node and its default priority and propagates that over onward to the destination node. The flow is then any solution of the n is equal to some initial value which for the pip is going to be the empty multi-set
plus the union of for all other nodes n', the flow of n' and to that we apply the edge function between n' and n. Essentially if you use the edge functions that I just described to you this ends up being that the flow of n is equal to the union of the maximum priority of
the flows of all predecessors n' and you can show that the solution to this fixed point equation assigns to each node n a multi-set of priorities of all nodes that can reach the node n. We can then write the pip invariant simply as for all nodes x, x dot current priority
should be equal to the maximum of the flow of x and x's default priority and this corresponds to the invariant of the pip. Okay so how do we do local reasoning for flow-based invariants? Consider this example here from the pip where process p is trying to acquire resource r.
This will add the edge from p to r to the structure and of course that will break the pip invariant so then the acquire operation calls an update operation that takes p's priority 2 and propagates it down these nodes. When the update operation reaches this node it will stop because the priority of this node 2 is equal to p's priority. This results in the structure
shown below where as you can see the pip invariant has been re-established. Essentially update has a stopping condition that tells it when it can stop and doesn't need to examine any further nodes and it knows that the pip invariant will be re-established. To formalize this kind of proof argument we have a notion of a flow interface
which tells us for any subgraph such as the blue one here what is the flow coming into the subgraph and what is the flow going out of the subgraph. The key result is that if before and after a modification a subgraph satisfies the same flow interface the same flows coming in and going out
then we know that these subgraphs will compose with the same set of other subgraphs. Essentially the purple subgraph will compose with this red subgraph because the blue subgraph used to compose with it and composition tells us in particular that the flow of any node in the red subgraph will be unchanged and therefore nodes in the red subgraph will continue to
satisfy their flow invariance. This lets us do local reasoning because all we have to do is restrict our attention to the modified region and prove that a the modified region satisfies its new flow invariance and b that the modified region preserves its flow interface. We prove in our paper that flow interfaces form a separation algebra. This allows us to
use flow-based reasoning in almost any separation logic based logic or tool. We also derive generic lemmas for manipulating subgraphs that satisfy flow-based invariance. The second lemma here is a sort of composition lemma that tells you when you can compose two subgraphs
satisfy a flow invariant in such a way that the combined graph will also satisfy the same flow-based invariant. The crucial part are these flow interfaces the i's and this condition here that i1 plus i2 not equals to bottom tells you when these subgraphs compose and it's the sort of generic analog of the condition we saw for list segments that their
endpoints match up. I'd like to stress again that flow interfaces aren't sort of mathematical representations of the entire graph but they're an abstraction that in general captures just enough information to prove that subgraphs compose and that invariants are maintained. The upshot of all this is that we can do the sort of local reasoning or frame
based reasoning for arbitrary global potentially recursive invariants over arbitrary potentially cyclic graphs. So here's the bottom line. You have a data structure which relies on some
complex recursive global graph properties. What you have to do is you have to pick a flow domain. Usually things involving multisets are useful since the flow domains need to be cancelled later. You then have to encode your data structure invariance as no local conditions on the resulting flow. Once you do that then you have to check that your algorithm modifies
heap regions where the flow interface is preserved. If you do this and you show that the modified region maintains its flow invariant then you get for free that the rest of the graph maintains its flow invariant and you can do local reasoning. As mentioned before all this can be done in any separation based logic based logic or tool and in fact since the paper we've shown
that some of these checks such as checking that the flow interface is preserved is also automatable. You can find more details in our paper for example full details of our embedding into generic separation logics the full proof of the pip. We also show some restrictions under which you can guarantee uniqueness of flow values. This can be used for certain properties
like acyclicity. We also have detailed comparison to related work including the first flows paper. By the way we also proved the Harris list example from that paper in our paper. So our sort of flow framework is in some sense more general.
And finally in the year since our paper was supposed to appear in ESOP there's been a bunch of interesting developments. We've formalized the meta theory of flows and machine checked it in Coq and our collaborators at TU Munich have done the same in Isabel. We've implemented flow based reasoning on Iris as well as Grasshopper and Viper which are SMT based on deductive
verification tools. We've also verified a whole host of interesting benchmarks including fine-grained concurrent data structures work that appeared at PLDI last year. And so really I think it's sort of fair to say that flows can both be fit into multiple
existing separation logic tools and also expressive enough to reason about interesting and realistic data structures. So thank you all for your attention and I'm happy to take any questions.