We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Verifying OpenJDK’s LinkedList using KeY

00:00

Formal Metadata

Title
Verifying OpenJDK’s LinkedList using KeY
Title of Series
Number of Parts
22
Author
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
As a particular case study of the formal verification of state-of-the-art, real software, we discuss the specification and verification of a corrected version of the implementation of a linked list as provided by the Java Collection framework.
Keywords
Source codeBoolean algebraPrice indexEquals signRevision controlCodeFormal verificationSoftware testingInvariant (mathematics)Parameter (computer programming)Field (computer science)Gaussian eliminationChainLemma (mathematics)Subject indexingElectronic mailing listChainPointer (computer programming)Point (geometry)Lemma (mathematics)Object (grammar)TheoremBitWell-formed formulaSoftware bugLibrary (computing)Real numberTerm (mathematics)Cartesian coordinate systemProof theorySoftware frameworkInvariant (mathematics)Standard deviationNumberMultilaterationJava appletDesign by contractComputer programmingSocial classSequenceOperator (mathematics)Formal verificationFocus (optics)Run time (program lifecycle phase)DistanceTheorySubject indexingLatent heatIntegerBuffer overflowEndliche ModelltheorieExecution unitKey (cryptography)Loop (music)QuantificationCASE <Informatik>Film editingPhysical systemoutputOcean currentField (computer science)Presentation of a groupMoment (mathematics)ImplementationSource codeDigital video recorderSoftwareInformationInstance (computer science)CodeMereologySoftware testingKeyboard shortcutException handlingElement (mathematics)Open sourceSemiconductor memoryModeling languageTraffic reportingCategory of beingAllgemeine AlgebraTraverse (surveying)Cycle (graph theory)Parameter (computer programming)Programming languageRadical (chemistry)Data structureError messageComputer animation
Transcript: English(auto-generated)
This is a video recording of a paper presentation for the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The paper that I'm presenting here is submitted to TACAS 2020, and it is the verification of OpenJDK's linked list using key.
My name is Hans-Litur Hieb from Centrum Viskunde and Informatica in Amsterdam. The research agenda behind the work is to study the correctness of real-world and widely used software. So you could immediately think of the Java programming languages, widely used programming languages in the industry,
and we in particular focus on the Java collection framework because it's a library that is used by lots of applications. And in this work, we have been looking at the linked list. How do we study the correctness of such software? For that, we make use of the key theorem prover.
It is a theorem prover that inputs Java programs that are annotated by the Java modeling language. So these are assertions put into the comments of a Java program, so without modifying the Java program that you want to verify.
The theorem prover is automated and interactive, so parts of how it works can be done automatically. However, it is indispensable in certain cases to have the capability of interacting with a theorem prover. In particular, a good point of the linked list is that OpenJDK has, in the effort of OpenJDK,
the source of linked list became public, so we could study it. And in general, this holds that studying the correctness of software requires the source code of that source software to be public.
So let's just immediately start with the news that there is a bug in Java's linked list. Well, this is not very surprising. I mean, there are more than 800 bug reports for the collection framework in Java, and we've been focusing on this particular bug for a number of reasons, and I'll come back to that later.
First of all, what is a linked list? It's a doubly linked list, and it implements a number of standard methods that every collection in Java collection framework implements, such as contains, signs, and adds. An example of a linked list looks like this. Here you see a linked list object, which is empty.
It has a first pointer to some internal node, a size field, and a last pointer to some internal node. And now whenever you add an object, say this object here, to the linked list, then internally a node is created to which the first and last pointer point, and the size is updated to one.
Whenever you add another object, say it's the same object, so the nodes point to the same object, then we create a doubly linked list out of the internal nodes. So as you would expect that a collection contains objects, if you are given a linked list instance,
if you call the add method on that object, that afterwards, whenever you query whether it contains the object you have added, it should return true. But in actuality, you create a new linked list, you do some operations like adding null objects to it,
and then at some point you add an object and contains returns false. You would expect that if you are given a linked list, that whenever you query how large it is, that it would never be negative, it is not possible to have a negative size. But in reality, you have some linked list, you perform some operations on it,
you assert that its size is not negative, which succeeds, you add an object, and now you assert that the size is not negative, but that fails. So what's going on here? So let's dive a little bit into the source code of the linked list, and we can do this because it is open source.
So if we want to call contains, contains is implemented in terms of indexOf. And indexOf itself consists of a loop which looks through the nodes, the internal nodes of the linked list, from the first towards the last.
And it checks whether the item contained in the node is equal to the supplied object. If so, it returns the current index accumulated, and otherwise it increases the index. And here you see the problem, an integer overflow could occur. And that's the problem in the linked list as it currently is implemented.
There is a mismatch between the cached size that is stored in the linked list instance itself, and the actual size being the number of nodes that comprises the linked list. They could differ at some point. And it turns out that this bug is already more than 20 years old.
It was introduced at the moment that the linked list was introduced in the collection framework in 1998. However, the bug wasn't activated yet. It only became active at the moment that Java supported 64-bit architectures, because this now allows you to create more than 32 bits, 32, the number of objects that you can store in a 32-bit integer.
And this is the integer field size that the implementation uses. And that's also how you can reproduce the bug. It requires a lot of memory.
So what have we done? We have supposed to fix and bind its, or make sure that the size never outgrows whatever can be stored by a signed 32-bit integer in the cached size field. And then we ask ourselves, can we now verify that the linked list is correct?
So verifying the correctness of the linked list requires you to do a lot of effort. So in our case, it has taken us approximately seven man-months. The workflow we've been doing, we've been following along,
is you first try to come up with a specification. And this is done in a Java modeling language. And then given the specification, you load it in p, you start verifying the program. And if your verification fails for some reason, you can go back to the specification step and refine your specification.
For instance, your specification of, say, a loop invariant wasn't enough to show that, say, a loop terminates. Another possibility is that the verification fails because there is an error in the program.
And in that case, you can generate a test, and the test demonstrates the presence of the programming error. Now, after you have detected the error, you can revise the code. And after the code revision, you go back again to the verification step.
This is, in short, the workflow we've been taking. And in particular, it's important to say that the specification part of our work took way more than the verification part. Whenever the specifications are all set in stone, the verification can be done in a week.
What the specification comprises are the method contracts associated to all the methods of the linked list. We have to formulate a class invariant which holds at any point in which a particular method is not actively invoked. And we also have to introduce additional conceptual storage,
which is necessary to do the verification, but which does not exist at runtime. These are ghost fields and ghost parameters. Then, during verification, we can automatically generate and verify certain verification conditions, but sometimes it is indispensable to use interaction,
where you can introduce new kinds of formula which shortens the proof significantly. Using a cut of the proof, we steer the quantifier elimination, and we can perform equation reasoning. And as a human, you can put your intuition into the system way more efficiently than what can be detected automatically.
So some proofs could not be done automatically. So to show you a little bit about what we've done in modeling the linked list, I want to tell you a little bit about what the structure is. So this internal node, it consists of a previous reference.
It's an item reference to an element and an extra reference to a node. And then the linked list itself is constructed by forming a chain of nodes. And now a chain is a sequence of nodes, such that the first node in that sequence does not have a predecessor. The last node in the sequence does not have a successor.
And for all the nodes, the predecessor is S given in the sequence of nodes. So for instance, the last node, its previous field indeed points to the node that occurs before it in the sequence. And the same applies for the next field of the nodes.
So the next points to the next node in the sequence. This sequence now is a conceptual unit. It does not exist at runtime, but it is necessary for us to do verification. So now a linked list can be seen as a first and a last reference, except for its size as well.
And it is either the case that the linked list is empty and then both first and last are null, or it is non-empty and then both first and last are not null. And there is a chain between the first and the last node. So this can be modeled in key using JML.
Now, during the proof of certain important methods, such as the add or the contains method of the linked list, it is necessary to add further information during the proof. But this is information that is implied by this invariant of the structure. So to demonstrate a little bit abstractly what the key property of the linked list is,
is to say that every such chain is acyclic. So that means that whenever you start at the first and you traverse through next pointers, you do not encounter the same node again. So there is no cycle whenever you go from the first through the next.
And the same thing holds, of course, also from the last with the previous traversal. So to show that this lemma holds, we can prove it by contradiction. So suppose it now holds, then we can derive at some point a contradiction. So suppose we have two nodes, i and j, in the sequence.
Then, if they are the same, then you can imagine we have built a chain as shown. We have a node i and we have a node j. And then whenever you move to the next node, if these are the same, if you move to the next node on i, then you must obviously also move to the same next node from j,
because i and j are the same. So this, for any k up to the end, allows you to conclude that all these nodes would have been the same. But then if that is the case, by some rewriting you can find out that there is a particular distance from this node k back, which should be then the same.
This can now be found out to be contradictory, because the last node has the next pointer null, while this intermediary node, since it is intermediary, does not have a null next pointer, but points to the next node. And null is not equal to a not null node. Therefore, we reach a contradiction.
And this argument, now abstractly explained, can also be performed within the key theorem prover. So to conclude our work, we have fixed and verified a real world software, namely Java's linked list, and we have used the key theorem prover and JML for doing that.
If you want to know more about this work, please read the paper we have submitted. It contains lots of interesting detail about the whole process, and I hope that you enjoy.
In case of any questions, feel free to reach out to any of us, and thank you for your attention.