We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Discourje: Runtime Verification of Communication Protocols in Clojure

00:00

Formal Metadata

Title
Discourje: Runtime Verification of Communication Protocols in Clojure
Title of Series
Number of Parts
22
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This paper presents Discourje: a runtime verification framework for communication protocols in Clojure. Discourje guarantees safety of protocol implementations relative to specifications, based on an expressive new version of multiparty session types. The framework has a formal foundation and is itself implemented in Clojure to offer a seamless specification-implementation experience. Benchmarks show Discourje's overhead can be less than 5% for real/existing concurrent programs.
ImplementationFormal verificationRun time (program lifecycle phase)BendingDynamical systemBenchmarkLibrary (computing)Formal languageLatent heatAlgebraic closureDynamical systemNumberBitProgramming languageComputer programmingParallel computingSlide ruleRun time (program lifecycle phase)Goodness of fitTelecommunicationGroup actionProcess (computing)Mathematical analysisMereologyCommunications protocolImplementationFormal verificationWrapper (data mining)Event horizonConcurrency (computer science)Shared memoryState observerType theorySystem programmingComputer programmingCompilation albumMultiplication signConnectivity (graph theory)BenchmarkExpressionExterior algebraState of matterPersonal digital assistantOverhead (computing)ConcentricCartesian coordinate systemLevel (video gaming)Message passingBlock (periodic table)Right angleFormal languageLibrary (computing)Open setComputer architectureFlow separationAbstractionUniverse (mathematics)Student's t-testClassical physicsPlanningException handlingPoint (geometry)Revision controlPresentation of a groupCalculusNatural numberNetwork topologyComputer animation
Concurrency (computer science)Shared memoryDynamical systemFluid staticsFormal verificationImplementationMacro (computer science)Formal languageSystem programmingState of matterUsabilityJava appletArchaeological field surveyControl flowEmailThread (computing)Loop (music)Metropolitan area networkAlgebraic closureLatent heatSlide ruleMessage passingComputer programmingTelecommunicationRun time (program lifecycle phase)Library (computing)Programming languageProjective planeImplementationGoodness of fitFitness functionGame theoryRepresentation (politics)Power (physics)Arithmetic meanInterleavingAxiom of choiceIntelligent NetworkInsertion lossState of matterHash functionOperator (mathematics)Closed setBitShared memoryRecursionType theoryNumberRadical (chemistry)Variable (mathematics)Square numberQuicksortUsabilityGeneric programmingSoftware developerFormal verificationElectronic mailing listArrow of timePoisson-KlammerConstraint (mathematics)Programmer (hardware)Functional programmingPredicate (grammar)SpacetimeOverhead (computing)Core dumpChannel capacityBuffer solutionGroup actionIterationPoint (geometry)Subject indexingThread (computing)Macro (computer science)Formal languageComputer animation
Ring (mathematics)Coordinate systemFluidAerodynamicsKernel (computing)Revision controlJava appletBenchmarkComputer hardwareRead-only memoryProcess (computing)Software testingConcurrency (computer science)Overhead (computing)Computer programTheoryFormal languageLibrary (computing)ImplementationCore dumpResultantExtension (kinesiology)Formal verificationBenchmarkIntegrated development environmentSuite (music)Revision controlParallel portMechanism designMultiplication signSlide ruleProduct (business)Run time (program lifecycle phase)Process (computing)Combinational logicVirtual machineLatent heatNumberSoftware1 (number)Cartesian coordinate systemQuicksortSoftware testingNetwork topologyComputer programmingCommunications protocolType theoryRing (mathematics)Overhead (computing)ImplementationComputer animation
Transcript: English(auto-generated)
Hello, my name is Sungsik Jongmans. I'm assistant professor at Open University of the Netherlands and guest researcher at CWI in Amsterdam. And my student Ruben Hammers and I, we've been working on runtime verification of communication protocols in Clojure. The plan for this talk is to give a brief overview of the tool that we developed called Discourse. But before we get started,
I really do want to emphasize that a large majority of the material in our TACAS 2020 paper was developed by Ruben. So he definitely deserves most of the credit. In general, my long-term research aim is to design and implement new foundations and tools to make concurrent programming easier. And in this presentation in particular, I'll concentrate on dynamic analysis of application level
message passing communication protocols on shared memory architectures. Now, the motivation is that in recent years, several modern programming languages like Go and Rust and Clojure, they've all started to offer channels as a programming abstraction for shared memory to make concurrent programming easier. But at the same time, by now there's also evidence
that channels have their own issues too. So in a nutshell, the aim of this work was to provide some kind of tool support for this. Okay, now the problem can be described as a classical verification challenge. So imagine that we have a specification S and an implementation I, such that the specification prescribes the following elements. First, we have the concurrent processes
that the program consists of. Second, we have the communication channels that the processes can use to send and receive messages to and from each other. And third, we have the communication protocols that need to be followed. So for instance, in natural language, we could specify that first, a number needs to be communicated from Alice to Bob, and then a number from Bob,
either to Carol or to Dave, et cetera, right? So we specify a whole tree of admissible communications. Now, assuming that we have such an S and I, the question is then, how to ensure that the implementation is safe and live relative to the specification, where safety and liveness can be understood in the classical sense that bad communication actions,
according to the specification, never happen in the implementation, while good communication actions eventually happen. Now, in this talk, we'll look at runtime verification as a method to answer this question. And perhaps I should also clarify that my own background is actually in static analysis. Okay, so over the past years in particular,
I've been working a lot on multi-party session types, abbreviated NPSD, which constitute a behavioral type system to reason about safety and liveness at compile time. Now, potentially this NPSD approach is really quite powerful, but the trouble is that it's also a bit limited in expressiveness. So the reason we initially got into runtime verification
as an alternative was just to explore how much more expressive we could get by using dynamic analysis. Okay, now in more detail, here's a graphical one slide summary of the runtime verification approach that we use. So at the bottom, we have the implementation of a concurrent program. And on this slide, it consists of three concurrent processes,
Alice, Bob and Carol. At the top, we have a specification. Now, to verify the implementation against the specification, we need to add two ingredients to this picture. First, we add a thin wrapper around the implementation called the instrumentation. The only purpose of the instrumentation is to make certain events of interest observable.
And in our case, the events of interest are of course, the communication actions that the processes perform, right? So it's the sends and the receives. Second, we put the specification in a new runtime component of the program called the monitor. And the purpose of the monitor is to actually verify every send and receive
that can be observed through the instrumentation against the specification in an event-driven fashion. So more concretely, this works as follows. Imagine that we start executing Alice, Bob and Carol in this example. Then at some point, for instance, Alice may try to send number four through its channel from her to Bob. And on the slide, this action is denoted
by AB exclamation mark four. Now, right before the send actually happens, the instrumentation will quickly intervene, interrupt the send and temporarily block Alice. Then while Alice is blocked, the instrumentation will ask the monitor if the communication action is actually allowed by the specification.
Now, upon receiving this request, the monitor will consult the specification and if the send is indeed allowed, it will inform the instrumentation that everything is indeed okay. Besides, the monitor will also update the specification to its remainder, so to speak, as if it makes a transition, right? So we're executing a specification
as similar to a state machine, so to speak. At the same time, the instrumentation will unblock Alice and allow the send to actually happen, okay? Now, sometime later, Bob may try to actually receive the value and again, the instrumentation quickly intervenes and asks the monitor if the receive is allowed. The monitor will check the specification,
see that it's okay and inform the instrumentation accordingly. So everything is still fine. But again, sometime later, Bob may try to send value true to Carol, but now this is apparently not okay according to the specification. So the monitor will inform the instrumentation of a violation.
And in this case, the instrumentation will not allow the violating communication action to actually happen, but throw a runtime exception instead. And the key point here is that all of this is enough to ensure safety, but in contrast, liveness is a bit more difficult to guarantee using an approach like this. We do have some ideas to explore, but it's beyond the scope of our TACAS 2020 paper.
Okay, so here's a more precise overview of our contributions. Our paper consists of two parts. In the practical part, we first introduce a specification language for communication protocols. This is essentially a closure library to write specifications in a domain-specific language and define monitors.
Second, we present an implementation language, which is essentially another closure library to add instrumentation to closure programs. And finally, we describe non-trivial examples. And we also report on a number of benchmarks to evaluate the amount of overhead that our dynamic analysis inflicts on executions at runtime.
Okay, so that's the practical part. In the theoretical part, we formalize the specification and implementation languages into calculus. Okay, so the specification calculus is based on global multi-party session types, but more expressive, while the implementation calculus is essentially a miniature version of closure, including the channel library.
Now in the rest of this talk, in the interest of time, I will skip the theoretical part of the paper and only briefly summarize our practical contributions in a bit more detail, okay? So our practical work targets the closure programming language, and closure is a version of Lisp on the JVM. Now there are several reasons why closure is interesting for us.
First, closure supports shared memory concurrency, and it has a core library for channels. So, well, it fits the premise of this project very well. Second, closure is dynamically typed, which makes it a good fit with runtime verification. Third, closure has a powerful macro system, which has made it possible to embed the specification language in closure itself.
So, well, this should be nice from a usability perspective. Fourth, in the yearly closure developer survey, closure programmers indicate that ease of development is generally more important to them than runtime performance. So they might be slightly more inclined than, for instance, embedded C programmers
to pay for communication safety guarantees with a bit of extra runtime overhead. And finally, closure was the seventh most loved programming language in 2019. So it's really not a niche language anymore that it used to be. Of course, this is not extremely important from a scientific point of view, but for us, it was more interesting to conduct this research
for a mainstreamish programming language. Okay, now to explain how our tool works, let's have a look at the simple tic-tac-toe example. In this example, there are two processes, Alice and Bob, with two channels between them. Furthermore, Alice and Bob both have their own private copy of the game grid. Now at runtime, it works as follows.
So suppose Alice goes first and puts an X in some space of her own grid. Bob can't see this, right? So Alice needs to inform Bob about her move explicitly. So she sends the index of the space to Bob. Now Bob subsequently receives the message and updates his own grid accordingly. Next, he selects and puts an O in a blank space
of his own grid. Then he sends a message to Alice, et cetera, right? So in this way, Alice and Bob continue to take turns to play the game until Alice makes a winning move and Bob is informed accordingly. Now at this point, Alice and Bob both know that the game is over, so they close the channels to free up resources
and then they terminate, right? So this is how the program works. Now, the specification for this program looks as follows in our DSL. First, we define two conceptual roles identified by A and B to represent Alice and Bob. Next, we define an auxiliary specification identified by TTTCL
and we'll use it later in the main specification and it has the following meaning. The DSL keyword indicates that we're writing a specification. The par keyword prescribes free interleaving of its operands and the three hashes operator prescribes the close of the channel from the first operand to the second operand. So if you put all of this together,
then this auxiliary specification simply prescribes that the two channels between Alice and Bob are closed in no particular order, okay? Now the main specification looks as follows. It's a bit more complicated than the auxiliary one, but the general anatomy is still the same, right? So we again start with the DSL keyword. The fixed keyword prescribes recursion
where colon X is the recursion variable. Square brackets indicate sequencing. The arrow keyword indicates a communication through the channel from the first operand to the second operand that additionally satisfies a constraint on the message, right? So in this example, we have a type constraint, but in general, it can be any Boolean predicate.
The alt keyword indicates choice and finally the ins keyword prescribes insertion of an auxiliary specification. Now if you again, put all of this together, then the main specification states that first a message of type long, so a number is communicated from Alice to Bob.
Next, there is a choice to either close the channels and terminate or to continue. In the latter case, another message is communicated from Bob to Alice. And finally, there's again a choice either to close and terminate or to continue recursively, okay? Now intuitively, this is quite a simple communication protocol, but actually it's not supported by existing MPST tools,
right? So this example Tic-Tac-Toe, it has actually been sort of our motivating example to do this work. Okay, so let's briefly also have a look at a Tic-Tac-Toe implementation enclosure. So first we need to have a bunch of generic Tic-Tac-Toe concepts.
So for instance, we represent the game grid simply as a list and we have some functions to find the blank space on the grid and to update the grid and to check if the grid is not final. I mean, all kinds of basic stuff, right? Next, we import the core library of closure that provides channels. And this library is called closure.core.async.
Using the channels function of this library, we define two asynchronous channels with a one capacity buffer identified by C1 from Alice to Bob and C2 from Bob to Alice. And so now the only thing that we still need to do is to actually define Alice and Bob. And if we first look at Alice, she's basically implemented as a thread
and the final action of this thread is to close the outgoing channel, okay? But before Alice does so, she executes a loop and in each iteration of the loop, she first finds a blank space on the grid and puts an X on that space. Next, she sends the index of that space through channel C1 to Bob. Then she checks if the grid is not final.
If so, she tries to receive a message through channel C2 and update her grid accordingly. And if at that point, the grid is still not final, Alice enters another iteration of the loop. If the grid is final in contrast, then Alice breaks the loop, closes channel C1 and terminates. Now, Bob is defined similarly,
so let's not go through those details as well. So what we now have on the slide is basically a full implementation of tic-tac-toe in closure. And it's important to emphasize that this is actually all pure closure, right? So we haven't used our tool yet to write this implementation. Now, to add runtime verification to it
and really start using our tool, there are basically three steps. First, instead of loading closure's core library for channels, we need to load our own library. Second, we need to construct a monitor for the TTT specification from the previous slide. And third, we need to add instrumentation to the channels. And more precisely, we annotate the channels
with the intended sender, the intended receiver and the monitor to perform the analysis, okay? Besides these three small changes, nothing else needs to be modified. And in particular, the implementations of Alice and Bob, including the generic tic-tac-toe concepts, they can remain exactly the same as they were.
Now, when we execute a program, the monitor will verify all sends and receives and completely unexpectedly, we actually find an unsafe execution of the tic-tac-toe program in this way, right? So that was a good surprise to us, right? Okay, now, this slide shows a few other examples of specifications that we support.
The interesting thing here is that they involve parameterization in the number of processes. And this is traditionally quite difficult to do with multi-party session types. Now, to give some examples, we can specify a parameterized ring network or a parameterized star network and many other parallel topologies as well. But the details of those, well, they're not really important in this talk.
The last thing I want to show is some experimental results because, well, conventional wisdom is that runtime verification may inflict serious monitoring overhead, right? So we also wanted to study to what extent this is true for our tool. Now, we had a look at four existing concurrent programs from a third-party benchmark suite called NAS parallel benchmarks.
And essentially what we did is we just wrote specifications for the communication protocols in these programs. And we extended the existing reference implementations with monitoring, okay? So then once we had done that, we ran the implementations both with and without monitoring on a sizable multi-core machine
for increasing numbers of processes to also investigate scalability. And we repeated all these runs 50 times to smooth out variability. Here are the results for each of the four programs. The horizontal axis shows the number of processes while the vertical axis shows the slowdown of the monitored versions relative to the unmonitored ones.
And I just want to highlight two observations. First, if we look at the middle two charts, then the overhead of using monitors can be less than 5%. And this was really encouraging for us to see, especially since 5% seems low enough to use this technology in actual production environments as well, sort of as a fail-safe mechanism.
At the same time, the charts on the two sides, they show that overhead can also be substantially higher, up to four and a half times with 16 processes. Now this is definitely too much for production environments, but it should be acceptable just for testing and debugging. So we basically identified two possible usages of our tool.
Okay, so this concludes my talk. Here's again the slide with a summary of our contributions. Regarding future work, I think there are at least two interesting avenues. First, we want to investigate support for liveness in combination with a mechanism to automatically recover from violations. The second thing that's interesting to us
is verification of specifications, because it's not always easy to get the spec right, actually. Okay, so that's all. Thank you for your attention.