Partial Order Reduction for Deep Bug Finding in Synchronous Hardware
Formal Metadata

Title: Partial Order Reduction for Deep Bug Finding in Synchronous Hardware
Number of Parts: 22
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/55027 (DOI)
Transcript: English (auto-generated)
00:01
Hello, my name is Makai, and I'll be discussing our work on partial order reduction applied to model checking of synchronous hardware, in particular for finding deep bugs. We're going to use one of the standard symbolic transition system models, where X is a set of current state variables, X prime is a corresponding set of next state variables, and then we have the normal I of X, which is the initial state
00:24
constraints, and T, which is a transition relation encoding the dynamics of the system. We'll use P of X to denote a property, and a state S is a full assignment to the current state variables. In this talk, we'll also assume that the symbolic transition system is functional, meaning given all of the current state variable values, there exists exactly one next state.
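As a purely illustrative stand-in for these definitions, here is a minimal explicit-state sketch in Python. The names `TransitionSystem`, `bmc`, and the toy counter are my own, not from the talk, and a real BMC engine would unroll T symbolically for a solver rather than simulate; the point is just that a functional system gives each state exactly one successor:

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # a state s: a full assignment to the variables X

@dataclass
class TransitionSystem:
    """Explicit stand-in for (I, T, P).  `step` being a function rather
    than a relation captures the talk's 'functional' assumption: given
    all current state variable values, there is exactly one next state."""
    init: Callable[[State], bool]   # I(X): initial-state constraint
    step: Callable[[State], State]  # T(X, X'): deterministic dynamics
    prop: Callable[[State], bool]   # P(X): property to check

def bmc(ts: TransitionSystem, s0: State, k: int) -> int:
    """Unroll k steps from s0; return the first step violating P, or -1."""
    assert ts.init(s0)
    s = s0
    for i in range(k + 1):
        if not ts.prop(s):
            return i
        s = ts.step(s)
    return -1

# Hypothetical example: a counter that wraps at 4; P says it never hits 3.
counter = TransitionSystem(
    init=lambda s: s["c"] == 0,
    step=lambda s: {"c": (s["c"] + 1) % 4},
    prop=lambda s: s["c"] != 3,
)
```

With a deterministic toy system the unrolling degenerates to simulation, which keeps the sketch short; the symbolic case quantifies over all initial states and inputs.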
00:47
In addition to the standard symbolic transition system model, we'll also use something that's not so standard for synchronous hardware. We're going to define actions. These actions are predicates that hold when the system performs some operation, and the action might also have
01:01
an enable condition. This means that the action can only be taken if the enable is high. This is built on top of the symbolic transition system, so given a symbolic transition system, you could look at it and decide to split up the different operations it can perform into actions. In a synchronous system, you can perform multiple actions at once, and so we define
01:23
an instruction set to be all the action configurations, or the power set of the actions, and enable conditions are lifted in the natural way. So if you have actions A within your instruction I, that instruction is enabled if each action is enabled. I'd like to emphasize
01:41
the difference between asynchronous and synchronous systems within our framework. For those of you familiar with partial order reduction, you'll know that it's typically applied in the asynchronous case, which makes this distinction very important. In our framework, an asynchronous system would have interleaving actions, and this means that you only apply one at a time. These systems,
02:02
such as concurrent programs, tend to have syntactic hints about any symmetry in the system. In a synchronous system within our framework, actions can be applied simultaneously. That's why an instruction is a member of the power set of actions, and so you have to consider every possible configuration of each of the actions' true or false values. We're very interested in
02:24
pairs, adjacent pairs, of instructions in this work, and it'll be important that there are many more pairs of instructions than pairs of actions. Additionally, synchronous systems don't tend to have syntactic hints about any symmetry within the system. It's more of a semantic symmetry. Now let's consider our potential problem.
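Before that, the instruction-set construction just described can be sketched concretely. This is a hypothetical fragment: the instruction set is the power set of the actions, and an instruction's enable is the conjunction of its actions' enables (the FIFO-style `push`/`pop` enables are my own example, not from the talk):

```python
from itertools import chain, combinations

def powerset(actions):
    """All action configurations: the instruction set is 2^Actions."""
    s = list(actions)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def instr_enabled(instr, enables, state):
    """Lifted enable: an instruction is enabled iff every action in it is."""
    return all(enables[a](state) for a in instr)

# Hypothetical FIFO-style enables: push needs space, pop needs data.
enables = {
    "push": lambda st: st["count"] < st["depth"],
    "pop":  lambda st: st["count"] > 0,
}
instrs = powerset(enables)  # 2 actions give 4 instructions
```

Note the empty instruction (a stutter step) is always enabled under this lifting.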
02:45
Let's say we have a system with two actions. What happens if we unroll a bounded model checking run up to bound k? There are many possible action configurations when you consider this unrolled trace. At any step, a0 or a1 could be enabled or disabled, so you have 2 to the k possible
03:05
configurations, and if you generalize this to multiple actions, that would be the number of actions to the power k. This can be very hard on the solver, as it has to consider an exponential number of possible action configurations. In particular, this might be
03:24
too much work, because you could have symmetric actions where, when you start from the same state, let's say you start from some state s0, and you apply a0, and we're going to assume that all the actions are enabled here, you get to state s1, and then if you apply a1, you get to state s3.
03:42
Now maybe you could have applied them in a different order, a1 and then a0, and still reached the same state. Now this kind of semantic symmetry can be very hard to identify; however, if you have a large number of actions and a lot of symmetry, this can be very difficult for a model checker, even though there are actually not too many states to explore overall,
04:04
because there are many symmetric paths reaching the same states. Now the question is: people know that this can happen in concurrent software, but can this really happen in hardware? Well, here's a simple example. Let's say you just have a FIFO implemented
04:20
in hardware, and you have operations pop that remove an element from the FIFO, and push that adds an element to the FIFO. Well let's consider two possible traces. You start in a state where you have one data element, d0, in your FIFO. In the first case, you pop, which makes your FIFO empty, and then you push the value d1, and you end in this state here. Instead,
04:45
if you had pushed d1 first, then you would have a state in the middle which holds both items, and then popping next would remove d0. The final state is exactly the same, but there are two ways to get there, and if you imagine a bounded model checking trace,
05:01
which is unrolled to a large bound, this could cause a lot of problems, and we actually saw this in practice. So what would you want to do? The idea of partial order reduction is to reduce the number of paths. This was originally pioneered for explicit state model checking, and it was later extended to the symbolic case for concurrent programs,
05:22
but it's mostly focused on asynchronous systems. With this same diagram, the goal would be to pick one of these paths and disallow it. Notice that we've only disallowed the second transition, because we don't want to rule out S2. So if there were a bug in the state S2,
05:40
this could be the end of the trace and you would still hit it. But we're not ruling out S3 by removing this transition either, because you can still get there through this path, and the other path was just another symmetric way of reaching the same state. It's also optional to include a guard. This gives you a more expressive partial order reduction,
06:01
which means that you only disallow a transition if some guard holds on that state. It allows more targeted, conditional pruning of paths. And now that brings us to the ideal partial order reduction. If you had a system with a lot of symmetry, the state space might look something like this, almost a grid. Now imagine one symbolic step
06:24
considering each individual transition. You could add a constraint to the transition relation to rule out some of that symmetry. Here we say if the instruction I0 is enabled, but instead you apply I1, then you can't apply I0 in the next transition. And in some sense,
06:43
you're just giving preference to I0. So if you could apply I0, you have to apply it first. But you can still switch to I1 later. And so the point is, if you're interested in some state, let's say your starting state is here and you want to reach this state, you can't get there through this path anymore, or this path, but you can still get there through here.
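This preference rule can be played out on a toy explicit-state grid. In the sketch below (my own construction, not the talk's implementation), action R (move right) plays the preferred I0 and U (move up) plays I1; with `por=True`, if R was enabled but U was taken, R is disallowed on the very next step:

```python
def reachable(n, por=False):
    """BFS over an n x n grid with commuting actions R (right) and U (up).

    With por=True, apply the constraint from the talk: if R was enabled
    but U was taken instead, R is disallowed on the next transition.
    Returns the set of reachable cells and the number of transitions
    explored."""
    # state = (x, y, r_blocked): r_blocked records the pruning condition
    start = (0, 0, False)
    seen, frontier, edges = {start}, [start], 0
    while frontier:
        x, y, r_blocked = frontier.pop()
        succs = []
        r_en, u_en = x + 1 < n, y + 1 < n
        if r_en and not (por and r_blocked):
            succs.append((x + 1, y, False))
        if u_en:
            # taking U while R was enabled triggers the block next step
            succs.append((x, y + 1, por and r_en))
        for s in succs:
            edges += 1
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    cells = {(x, y) for x, y, _ in seen}
    return cells, edges
```

The pruned search explores strictly fewer transitions, yet every grid cell that was reachable stays reachable, which is exactly the soundness condition the talk is after.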
07:04
So again, the goal with partial order reduction is to rule out as many paths as possible without ruling out any reachable state. And so even with these transitions disabled, you can still reach any state that was reachable in the original system.
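Returning to the FIFO example, the push/pop symmetry can be checked concretely. This sketch uses a plain `deque` as a stand-in for the hardware FIFO; note the two orders agree only when the FIFO is non-empty (so pop is enabled) and not full, which is precisely the kind of condition a guard would capture:

```python
from collections import deque

def pop_then_push(fifo, d):
    """First trace from the talk: pop the head, then push d."""
    f = deque(fifo)
    f.popleft()
    f.append(d)
    return f

def push_then_pop(fifo, d):
    """Second trace: push d first, then pop the head."""
    f = deque(fifo)
    f.append(d)
    f.popleft()
    return f
```

Starting from a FIFO holding just d0, both orders end with a FIFO holding just d1, so a partial order reduction may safely disallow one of the two interleavings.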
07:22
Something to notice here is that if you don't have the option to apply I0, you have to be careful to not rule it out in subsequent transitions. So in this case, this transition does not exist. There's no I0 here. So you might apply I1, but now you can't disallow I0 or else you wouldn't be able to get to these states over here. That's why this
07:44
enable condition is here. So if you didn't even have the option to apply I0, you still have to allow applying it after I1. And in this case, if you're interested in this state, for example, you can still get there through this path. So any reachable state is still reachable. This is the goal. We want to apply this to our system. And if you had a guard, that's just
08:06
a more nuanced case. Perhaps there's only symmetry in your design under certain conditions. And so you can guard this constraint with some arbitrary condition. And that's the only time where you disallow these transitions. This is a good time to recall the difference between
08:25
asynchronous and synchronous systems in our framework. If you remember, the asynchronous systems have independent interleaving actions and a synchronous system has instructions. So an instruction is some configuration of actions where you can have multiple enabled.
08:41
And the important thing here is there are a large number of pairs of instructions, exponential in the number of actions. And what this means is that it's very challenging to apply partial order reduction for all pairs of instructions, because there are so many of them. All the different pairs overwhelm the solver if you
09:03
try to enumerate all the symmetry. Additionally, since you can apply multiple actions at one time, many pairs have no symmetry. So if you take our FIFO example, if you push and pop versus just pushing, there's not really any symmetry there. The symmetry exists between a single push and a single pop, depending on the state of the FIFO. So our solution is something called
09:29
reduced instruction sets. And the idea here is we want to find a new instruction set. So recall the instruction set was the power set of actions. And you're considering applying multiple actions at once. It might be that not every configuration of actions is necessary to maintain
09:44
the reachable states of the system. So if we find a reduced instruction set that still allows reaching any state that was reachable in the original system, and we hope that that instruction set is much smaller than the original one, then we can now apply partial order reduction by first limiting the actions or the instructions that you can apply, and then
10:06
using partial order reduction on this smaller instruction set. It turns out that automatically finding a minimal instruction set is very difficult, because you need to show that the system with the reduced instruction set IR can simulate the system with the full
10:21
instruction set. So we focus on a special case, just splitting instructions into sequential actions. The idea for sequentially splitting instructions is just the most straightforward way of reducing the size of an instruction. For example, say you had an instruction that was applying A0, A1, and A2 simultaneously. It might be that you could apply A0 and A1, which gets you to some
10:47
intermediate state S1, which was already a reachable state because A0 and A1 is a valid instruction, and then apply A2 and get to the same state. If this is true for any starting state that's reachable, then you could just disallow the A0, A1, and A2 instruction, and
11:05
instead only apply A0, A1 together, followed by A2, and now you've been able to rule out one of the instructions from your instruction set. So we have this high-level algorithm,
11:20
sketch, and the idea here is to show that you can reduce your instructions to a smaller set. You start with some C equal to the number of actions; it's a bit of a proof by induction where C is decreasing, and you want to show that C simultaneous actions can be split sequentially into smaller instructions. In this case, we're going to try
11:42
to say that C can be simulated by I0, some subset of the original instruction whose size is less than C, followed by I1, which is just a single action, and still reach the same final state. If we continue this process in our algorithm and
12:06
decrease until C equals one, then we've found a reduced instruction set where only one action is applied at a time, and this is much more like an asynchronous system.
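The splitting condition can be checked explicitly on a toy system. This is a simplified sketch of my own: it enumerates states and tries delaying each single action a (taking I0 = I \ {a}), whereas the talk's procedure works symbolically with counterexamples and only needs to establish that some decomposition exists:

```python
def bump(instr, state):
    """Toy 'hardware': one independent counter per action, all bumped
    simultaneously.  A hypothetical stand-in for a transition relation
    restricted to instruction `instr`."""
    s = dict(state)
    for a in instr:
        s[a] += 1
    return s

def check_splittable(states, instrs, apply_instr, c):
    """Explicit-state sketch of the forall-exists check: for every
    reachable state and every instruction of size c, some action a can
    be delayed -- applying I minus {a}, then {a}, reaches the same
    final state."""
    for s in states:
        for instr in (i for i in instrs if len(i) == c):
            target = apply_instr(instr, s)
            if not any(
                    apply_instr({a}, apply_instr(instr - {a}, s)) == target
                    for a in instr):
                return False
    return True
```

Because the toy counters are independent, every three-action instruction splits, so the check succeeds; a system where simultaneous application differs from sequential application would make it return False.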
12:22
So the big idea here is we want an algorithm for a targeted forall-exists decision procedure. That is, we'd like to show that for all reachable states and for all instructions such that the number of actions in the instruction is equal to C, there exists some decomposition where I0 is a subset of I, and then there's a single action
12:43
A that was in the original instruction, such that applying I0 and then A reaches the same final state in all cases. Note that this algorithm does not need to return the decomposition, we're just trying to show that we're not ruling out any reachable states, and so we don't actually need to know what the exact decomposition is,
13:03
just that it's possible. We need to know which instructions are kept in the reduced instruction set IR, but we don't need to know exactly how each is decomposed and in what order. In particular, different instructions of the same cardinality might require different delayed
13:23
actions, and to handle this, the algorithm uses counterexamples to determine this automatically. So for example, let's consider two different instructions that have size 3. So the first one is A0, A1, and A2. The second one is A0, A2, and A3. So in the first, perhaps the best
13:44
decomposition is delaying A2. So you have A0 and A1, and then if you apply A2 after that, you reach the same final state. Now this instruction is also size 3, it also contains A2, but maybe delaying A2 doesn't actually result in the same final state. So in this case, you have
14:02
A0 and A2 followed by A3, and then you reach the same state. And that's what I mean when I say you don't need to return the decomposition. So knowing the difference between these two is not something that the algorithm has to return. It does have to consider these cases, but it's not something that the user needs to know. The user only needs to know that any
14:23
instruction of size 3 can be decomposed into a smaller instruction followed by a single action. This is our sequential splitting technique. The overall methodology of our technique is to first manually annotate the actions of the system. So you want to take a transition system
14:41
representation and partition the functionality into different actions. You then use the algorithm to find a reduced instruction set, or just to show that one is possible; a more complicated system might need user guidance. Then you restrict the system to only use instructions from the reduced instruction set. And finally, you check all the
15:02
unique unordered pairs of instructions in the reduced instruction set, which is smaller, for path symmetry. For any of them which are symmetric, you can rule out one of the paths. And if necessary, you can suggest guards manually, if there are certain conditions that need to hold for symmetry. We also had an approach in our tool where you could supply
15:23
guard hints, and it would try to find a small subset of them to use. Our experiments are on a set of parameterized hardware designs, such as FIFOs and arbiters. We also have commercial library components, which are also FIFO implementations, that we're going to evaluate on. And we're checking data
15:42
integrity to make sure that no packet is dropped or reordered. All of these have deep bugs injected in them. These are parameterized designs, and we swept the data width and the depth of the FIFOs from 2 to 128 by powers of two. And we just ran with and without partial order reduction coupled with reduced instruction sets.
16:04
So here are the first set of results. These are on the commercial designs. This line in the middle is if they ran for the exact same amount of time. And we have regular BMC on the x-axis and with partial order reduction and reduced instruction sets on the y-axis. And then dotted
16:23
lines are an order of magnitude improvement. So here you can see that on small runtimes, which happen to be the small parameters, that it's faster to just run regular bounded model checking, which makes sense because there's overhead in our technique. But as the parameter sizes increase and the runtimes grow, you see a big speed up. And here you're even
16:42
getting two orders of magnitude. And you have lots of timeouts for regular BMC, but not for the ones with our technique. Of course, there are still some that time out for both. And we see a similar pattern with our open source designs. We get up to multiple orders of magnitude runtime improvement. And you have timeouts at two hours for regular BMC,
17:05
but not with our technique. This is also true for the arbitrated FIFOs. Although they seem to be more clustered, you still get a similar impact. And by the way, the number n here is the number of FIFOs. And the parameters are swept for all the FIFOs in the design.
17:21
Thank you for your attention. And please feel free to reach out with any questions or comments.