
Helpful NullPointerExceptions - The little thing that became a JEP

Formal Metadata

Title
Helpful NullPointerExceptions - The little thing that became a JEP
Title of Series
Number of Parts
490
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
One of the most prevalent - if not the most prevalent - exception type in Java is the NullPointerException. While Java set out to overcome the possibilities to do the mistakes one can do when programming in languages like C/C++ by not exposing pointers in the Java language, the misleading term 'pointer' sneaked into this exception. To this day, NullPointerExceptions thrown by the runtime system didn't contain messages. All you had was a callstack and a line number. But in typical expressions and statements there are several dereferences where an NPE can occur in one line. We - some engineers in the SAP team - thought this could be helped by a little enhancement. The new NPE message gives precise information about the location and tries to explain what was going on when a null reference was encountered. However, due to its prominent nature, it eventually became a JEP. In my talk I'll demonstrate the improvements that come with this enhancement. I will lift the hood a little and provide a glance at its implementation details. And finally I'll say some words about the current status and share some ideas for further improvements in the area of exception messages.
Transcript: English (auto-generated)
Hello, my name is Christoph Langer. I'm the second in a row of three speakers from SAP here. Some of you might know me from the OpenJDK community;
you might have seen me on the mailing lists. I try to support Andrew in maintaining the JDK 11 updates release, which takes most of my time. So what I do in OpenJDK at the moment is a lot of testing, integration, and so on. But today I actually have the pleasure to talk about the work of my colleagues,
namely Goetz Lindenmaier, who more or less did all the work to bring this into OpenJDK, but he decided to go surfing in South Africa. Usually he's here, but not this year, so it's my turn. Okay, as it's already a bit late, everybody's brain has been strained a little bit
over the day. It's maybe not the most sophisticated topic of the day, or at least it doesn't look like it, because a NullPointerException message is not too complicated, just a little sentence that everybody can understand. But there's some logic behind it, which is really quite some algorithm, let's say.
We'll come to that later. Here is what I'm going to cover. I start with how helpful NullPointerExceptions currently are and how helpful they could be, taking an example as motivation. Then I talk about what we did,
what we brought to OpenJDK, and where we are there. Then, as this is FOSDEM, we talk technical, and I try to explain the algorithm behind it a little bit. And in the end, there are a few things which still can be done and could maybe leverage this. I'll come to that in the end.
Okay, how helpful are NullPointerExceptions? Imagine this little code snippet. I cut and pasted it out of some source code. In line 33 we see some object with fields, and there's another field, there's some payload, and there's an assignment to this payload field from another chain of object fields.
So what would we get currently? We get something saying: Exception in thread "main" java.lang.NullPointerException. Okay, we see the file name and the line, line 33. But wouldn't it be nice if we knew exactly which field is null here? It could be like this:
Cannot read field D because A2.B.C is null. So we know it was exactly this one here. Obviously, a NullPointerException message could look like this.
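To make the motivating example concrete, here is a minimal sketch of such a chained field access. The classes A, B, C and D and their field names are hypothetical stand-ins for what is shown on the slide. On a JDK 14 VM started with -XX:+ShowCodeDetailsInExceptionMessages, the caught exception should carry a detail message along the lines quoted above; on older VMs getMessage() simply returns null.

```java
// Hypothetical classes that mirror the a2.b.c.d chain from the talk.
class ChainedFieldDemo {
    static class A { B b; }
    static class B { C c; }
    static class C { D d; }
    static class D { int payload; }

    /** Dereferences a2.b.c.d.payload and reports what the NPE says. */
    static String provoke() {
        A a2 = new A();
        a2.b = new B();                      // a2.b is set, a2.b.c stays null
        try {
            int value = a2.b.c.d.payload;    // fails when reading field d of the null a2.b.c
            return "no exception, read " + value;
        } catch (NullPointerException e) {
            // The message text depends on the JVM version and flags; it may be null.
            return "NPE: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(provoke());
    }
}
```

Without the detail message, the stack trace only names the line, and any of a2, a2.b, a2.b.c or a2.b.c.d could have been the null reference.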
So this is what we contributed to OpenJDK. Actually, the thing behind it was already implemented years ago by colleagues for our commercial edition of the JDK, which is called SAP JVM, but only SAP customers using SAP products
based on Java could benefit from it. But at least we were quite sure that it works, and so last year, shortly after FOSDEM, I think, the initial mail got posted on the mailing list, just saying: okay, it's a little enhancement, it cannot be that big a thing. And this proposal was welcomed by people,
obviously by the developers using Java, people saying: yeah, sure, about time to have it in. But it was challenged, too, for several reasons. One thing is always that you maybe don't want to have too much information in exception messages, because you don't want to leak too much data
to some attacker who can then read something, so that's a concern. Then one thing is performance: we don't want a negative performance impact from computing such messages. And then it was also about the approach. We did this in the HotSpot VM code, in C++, so people stepped up and said:
hey, can we have a look? Maybe we can do it in Java with bytecode APIs and stack walking APIs, et cetera. That was part of the discussion, and it evolved. Then people from Oracle came up, and I think they motivated us to make it a JEP.
And with the help from, I think, Coleen, Alex Buckley, and so on, we brought it to a point where it was admitted to JDK 14, and we could push it right in time for JDK 14 in October last year. So it's available, but it's not the default.
You have to use this flag to activate it. Otherwise, you will still see the old sparse message. But you can switch it on. And for the SapMachine binary distribution, we switched it on by default, so we are confident in it. And we also backported it to the SapMachine 11 edition.
So if you want to try it with JDK 11, you can use SapMachine, and you don't need an additional flag. But I guess we can also discuss backporting it to OpenJDK 11, if there's interest. Okay, so some technical insights.
The NullPointerException message, you see it in the green line. It says: Cannot read field D because A2.B.C is null. It has two parts. The first part is what went wrong: cannot read field D. And the second part, that's the more complicated part,
I'd say: you have to go back in the program flow a bit to figure out what exactly brought the null reference onto the operand stack. It's also important what we have at the time we create the message. We have the bytecode index of where we are in the method.
As we are in the HotSpot VM, we can query for the bytecode and for the constant pool. But we don't have the program data at that moment. We have no variable contents or things like that. So we have to live with what we have here. Okay, let's take a simple example.
A really easy thing: you initialize an object with null and invoke the method toString on it, on this null reference, so you will get a NullPointerException. The bytecode works like this. At first, you have the aconst_null instruction. This pushes the null reference onto the operand stack.
Basically, Java bytecodes, maybe people are not so familiar with them, work like this: the instructions push and pop things on the operand stack, and it's defined for each bytecode what it expects on the operand stack before, and how the stack will look in the end, what arguments are popped and what's pushed there. So, to do an object initialization,
you obviously first load the null. Then you initialize the local variable slot, slot one, with the astore_1 instruction, which pops the null reference from the operand stack and stores it in the slot. That was the first line of the example. Then the second line does an aload_1.
It loads again from this slot of the variable and pushes the value onto the stack. We had it initialized with null, so there will be a null on the stack, and then we try to invoke the toString method with the invokevirtual instruction. Afterwards, the reference would be gone,
but in our case, we get a NullPointerException, because the object we tried to invoke the method on was null. Okay, and now, from the failing bytecode, we can generate this textual message about the failed operation.
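The walk-through above can be reproduced with a few lines of Java. This is a sketch: the class and method names are made up, and the bytecode in the comments is what javac typically emits for these statements (it can be checked with javap -c), not something guaranteed by the language.

```java
class NullToStringDemo {
    // The parameter occupies local slot 0 of this static method,
    // so the variable `object` lands in slot 1, as in the talk.
    static String run(String[] args) {
        Object object = null;          // aconst_null, astore_1
        try {
            return object.toString();  // aload_1, invokevirtual Object.toString
        } catch (NullPointerException e) {
            // The VM raises the NPE at the invokevirtual instruction.
            return "caught: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(args));
    }
}
```

The first half of the helpful message is derived from the invokevirtual instruction alone; the second half requires finding out that the reference on the stack came from aload_1.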
For the cause, we have to walk back, as I said. So what this algorithm starts with is really to replay all this from the entry of the method until the point where the NullPointerException happened. For each bytecode, we have a representation of the operand stack, and on this operand stack, we have backlinks for each entry to the instruction
which pushed the value there, and that helps us to then walk back and generate the message. Okay, so part A: what bytecode fails? Not every bytecode can cause a NullPointerException. We have here the array operations:
if the array you want to access is null, then you can say something like cannot load from int array, for instance, with the element type. Then the array length: cannot read the array length. Then the getfield and putfield instructions: cannot read or cannot assign a field. Ideally, we also have the field name from the constant pool here.
Then those invoke instructions, like we saw here, where the message would be: cannot invoke a method. Throwing exceptions and entering and exiting monitors also need objects. So that's a simple switch statement on the bytecode, like here, the invokevirtual,
and we come to this part A of the message. Okay, then for the second part, the cause, why the null reference was pushed, there's a little bit more to be done. In the easy case, you have something like aconst_null; then we can only say null is null, sure.
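The two-part construction can be illustrated with a toy model. The sketch below is not the HotSpot implementation (which is C++ and works on real bytecode and the constant pool). It assumes a hypothetical mini instruction set with symbolic operands, handles only straight-line code, and only instructions that pop at most one reference. It replays the instructions, records for each one which earlier instruction pushed the reference it consumes, and then walks those backlinks to describe the null value.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NpeMessageSketch {
    /** Hypothetical mini instruction: an opcode name plus one symbolic operand. */
    static final class Insn {
        final String op;
        final String arg;
        Insn(String op, String arg) { this.op = op; this.arg = arg; }
    }

    /** Part A: describe the action of the failing instruction. */
    static String failedAction(Insn insn) {
        switch (insn.op) {
            case "getfield":      return "Cannot read field " + insn.arg;
            case "invokevirtual": return "Cannot invoke " + insn.arg;
            case "arraylength":   return "Cannot read the array length";
            default:              return "Cannot execute " + insn.op;
        }
    }

    /** Part B: describe the value pushed by instruction i, following backlinks recursively. */
    static String describe(Insn[] code, int[] source, int i) {
        Insn insn = code[i];
        switch (insn.op) {
            case "aconst_null": return "null";
            case "aload":       return insn.arg;
            case "getfield":    return describe(code, source, source[i]) + "." + insn.arg;
            default:            return "the result of " + insn.op;
        }
    }

    /** Replays straight-line code up to the failing instruction and builds the message. */
    static String explain(Insn[] code, int failing) {
        Deque<Integer> stack = new ArrayDeque<>(); // entry = index of the instruction that pushed it
        int[] source = new int[code.length];       // backlink to the producer of the popped reference
        for (int i = 0; i <= failing; i++) {
            String op = code[i].op;
            boolean pops = !op.equals("aload") && !op.equals("aconst_null");
            source[i] = pops ? stack.pop() : -1;
            if (i < failing && !op.equals("astore")) {
                stack.push(i);                     // simplification: everything but astore pushes one value
            }
        }
        return failedAction(code[failing]) + " because "
                + describe(code, source, source[failing]) + " is null";
    }

    static String fieldChainExample() {
        Insn[] code = {
            new Insn("aload", "a2"), new Insn("getfield", "b"),
            new Insn("getfield", "c"), new Insn("getfield", "d")
        };
        return explain(code, 3);
    }

    static String toStringExample() {
        Insn[] code = {
            new Insn("aconst_null", ""), new Insn("astore", "object"),
            new Insn("aload", "object"), new Insn("invokevirtual", "toString")
        };
        return explain(code, 3);
    }

    public static void main(String[] args) {
        System.out.println(fieldChainExample());
        System.out.println(toStringExample());
    }
}
```

In this model the field-chain example yields "Cannot read field d because a2.b.c is null". A real implementation also has to deal with branches, instructions with several operands, wide values and missing variable names.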
There are constant operations: we know the constant that is pushed onto the stack, so we can just name it. For an array, we do two things. At first, we walk back the path for the bytecode that pushed the array reference. This could also come from another field, like in the example I had,
with several field references, so we really have to go back recursively until the point where it started. Then we have the brackets, and then we also write something about who put the index there. Okay, getfield: for getfield, we can write the field name, but then we also have to get the cause
for the reference that the getfield was invoked on. getstatic: there we have class name and field name, that's the easy thing. Then those invoke instructions, that's when methods are called: there we can say, okay, the return value of the method is null,
or in some places, we would only give the method name, like in the index computations of array accesses. And then the load instructions for loading variables. In our case, it was an aload which put the null reference onto the stack,
and the aload comes from the variable object, and so then we can say: cannot invoke toString because object is null. Okay, there are some more things I want to mention about this. One thing is that this calculation of the message
is only done when we invoke the getMessage method of an exception. At the time the VM generates such an exception, it really just stores the bytecode index and the stack information, as it did before, so no change there. Only when somebody calls printStackTrace or so, it will call getMessage, and then we will do all this.
So in the usual flow of a Java VM, you should not see an impact, because exceptions get thrown and get caught, and the messages are not necessarily printed out. Okay, and the next thing is that we are talking only about
the messages generated by the VM itself. It's also possible that I do a new NullPointerException, that I construct the exception myself, but then the developer knows what he's doing. He would probably also enter a meaningful message already, so we won't touch this.
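The distinction between the two kinds of NullPointerException can be seen in a short sketch. The class and method names are made up for illustration. An exception that the program constructs itself keeps exactly the message the developer passed in, while the text of a VM-generated one depends on the JVM version and flags.

```java
class ExplicitNpeDemo {
    /** The developer constructs the exception, so the developer's message is kept. */
    static String messageOfExplicit() {
        try {
            throw new NullPointerException("payload was not initialized");
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    /** The VM raises the exception; the message may be null or a computed description. */
    static String messageOfVmGenerated() {
        Object object = null;
        try {
            return object.toString();
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("explicit:     " + messageOfExplicit());
        System.out.println("VM-generated: " + messageOfVmGenerated());
    }
}
```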
Then there's something like hidden frames, for instance, when you have lambdas: there are some frames put in by javac and the VM, and those stack frames are tagged hidden. If we hit a NullPointerException there, we do not generate a message, because it's probably not so meaningful to people looking at the exception.
Then the point that it's implemented in the virtual machine, in the C++ code. Why did we do that? The thing is, we really have everything at hand. We are in the VM, we can query for the bytecodes, for the constant pool, for everything. On the Java side, okay, there's a StackWalker, and there are also bytecode APIs,
but it's more complicated to get all the data we need there, so I think it was really the best way. And, I'll come to that in the end, maybe we can leverage this algorithm, this bytecode analysis class, for other exceptions as well.
Okay, then another point: you will obviously see the best results when you compile the Java class with the right debug settings. The default of javac is only the lines and source information, so in the call stack, you will get the line numbers and the source file,
but you usually don't have the local variable names unless you compile with the -g flag, or -g:vars,lines,source, whatever you want to specify. Once we have the variable names, we can print them in the message. If we don't have them, then for locals, we have to print something
like parameter one, parameter two for the method parameters. For local variables, it's even more complicated, because then we only have slot numbers, so it's not really accurate, as you don't know how the compiler allocates variables to slots. Okay, so, things still to be done.
As I said, this feature is now kind of beta. It's disabled by default, and we are hoping to get it enabled by default one day. Let's see how the feedback is, but we will try to ask the OpenJDK community if we can switch it on.
Then there's Objects.requireNonNull; you might have heard about that one. This is also a means of tackling such NullPointerExceptions. Before you call into another method, you can wrap the argument with a call to Objects.requireNonNull.
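Such a guard could look like the following sketch. The Holder class and the message text are made up for illustration; Objects.requireNonNull(T, String) is the standard library method and throws a NullPointerException with the given message when the argument is null.

```java
import java.util.Objects;

class RequireNonNullDemo {
    static final class Holder {
        private final String payload;

        Holder(String payload) {
            // Fail right at the boundary instead of somewhere deeper in the program.
            this.payload = Objects.requireNonNull(payload, "payload must not be null");
        }

        int length() {
            return payload.length();
        }
    }

    /** Returns "ok" or the message of the NPE raised by the guard. */
    static String tryCreate(String payload) {
        try {
            new Holder(payload);
            return "ok";
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("data"));
        System.out.println(tryCreate(null));
    }
}
```

The message here is whatever the developer wrote; the exception is constructed inside Objects.requireNonNull, not raised by the VM at a dereference.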
Then at that place, before we actually enter the method, we check for null. So you have a defined place in the code where this can happen, and you will not go further down the program and see the NullPointerException at places where you really don't want it. But on the other hand, you then lose this information,
because this is then a NullPointerException that the developer has written or has constructed. So our thing is not applied then. Okay, then there's this thing called single-file source-code mode. That was JEP 330.
You don't need to take a .java file, compile it, and invoke java with a class file. You can go like in script mode and call java with a .java file. And there, at the moment, you also don't have this; it would be nice to have those NullPointerException messages
here as well. What you can do is set the VM option via parameters to java. But you can't modify the compiler settings at the moment; you can't set the -g option to get the variable names. So that's something where we have to find a solution. Then, okay, the JShell use case.
That's really something where you want to go and just prototype something in JShell, and there it would also be nice to really get those NullPointerException messages. So if we enable it by default, then JShell will have it. Otherwise, at least maybe we can make it a default for JShell. And then there's another place where the algorithm might be applied.
It's the ArrayIndexOutOfBoundsException. I think we have some time left, so I can try to demo it. I've prepared a little prototype; actually, it's also from Goetz. So you see here, it's a little test case
where we do out-of-bounds accesses on arrays. Maybe we can just make it a bit bigger. Okay, so maybe we even go back to Java 8 times. If I run it with JDK 11, you see here,
it only tells you the index: ArrayIndexOutOfBoundsException, index five, in line 12. Okay, here it's only one array. But if you have a chain of arrays, maybe like here, array one, two, minus one, then, as we have the minus one only in one place here,
we can guess it must have been this array access. Then there's a little enhancement that is already part of, I don't know, was it 10 or 11, which was contributed by Goetz. It's more helpful, right? We can see index out of bounds for length.
So the array had only length four. That already helps a little bit more. And now, if we use the same algorithm as for the NullPointerException, we go and run this IOOB demo, even more helpful. So we can really come to some statement about which index it was, and the array name.
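The starting point of the demo can be reproduced with a sketch like the following; the class name and array are made up. Older JDKs report only the index, newer ones also report the array length, and neither names the array or the expression that computed the index, which is what the prototype described above would add.

```java
class ArrayBoundsDemo {
    /** Accesses index 5 of a length-4 array and returns the exception message. */
    static String provoke() {
        int[] array = new int[4];
        int index = 5;
        try {
            return "value " + array[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact wording differs between JDK versions, but it contains the index.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(provoke());
    }
}
```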
So this could leverage the feature, too. Okay, I think I'm at the end. If you have questions, I'm happy to answer. Thank you.
What was the question? Hey, so one use case for which I think it's important to have better NullPointerException messages
with respect to Objects.requireNonNull is the places where the javac compiler inserts those checks. I know at least two places. One is strings in switch, and the second is when you construct an inner class from an enclosing class instance. In these cases, you can get a NullPointerException,
but we won't get any benefit from this work because it's just an opaque call to requireNonNull. Yeah, for that I actually also did a prototype. I brought it here, so maybe we can demonstrate it, too.
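The two situations named in the question can be provoked with a small sketch. The class names are made up, and how exactly the compiler implements the null check (an inserted call or an ordinary dereference) depends on the javac version; the sketch only relies on the language rule that both constructs throw a NullPointerException for a null operand.

```java
class ImplicitNullCheckDemo {
    class Inner { }   // inner class: every instance needs an enclosing instance

    /** Switching on a null String throws an NPE before any case is evaluated. */
    static boolean switchOnNull(String s) {
        try {
            switch (s) {
                case "a": return false;
                default:  return false;
            }
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Creating an inner class instance through a null outer reference throws an NPE. */
    static boolean innerFromNullOuter() {
        ImplicitNullCheckDemo outer = null;
        try {
            outer.new Inner();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(switchOnNull(null));
        System.out.println(innerFromNullOuter());
    }
}
```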
So that's this class. It's okay, I mean it's really a constructed, I have to, why doesn't it, I only have this presentation mode, it doesn't jump over. Yeah, I know, that's what I'm trying to fix.
Maybe try this duplicate, yeah, okay. So yeah, this is the, this example here.
Okay, it's a bit constructed. I mean, you have a chain of requireNonNull here. With the current thing, what is this, I don't know, the standard thing, I again get only a NullPointerException. And I know it's like requireNonNull
in requireNonNull, and it comes from my line 23 in main, which is this one. But which requireNonNull was it? And here we prototyped using the NullPointerException algorithm.
Here we get some message: okay, it still occurs in requireNonNull, but as we know this method, we can jump back up the stack, do this analysis of the bytecode there, and come to some explaining message. That was just an initial approach. So I think it's about time that we discuss it
on the mailing list to refine it. I have a question. You said you need to compile it with line information, with variable information. What would the message look like if the variable information is not compiled in? Yeah, I think I also have something here.
Where I can, like local param. Okay, yeah. Okay, so here we have a function call with a method parameter, and also we have a local. So I have to tell Eclipse to change the compilation. Java compiler.
Take this knob. Okay. And then we can run it with, what is it, NP01. I always have problems, local param here. Okay, yeah, here you see it.
So it would tell you parameter one. So this is okay. And then you have here local two, I think, yeah. But local two is the slot number, because here, I think, slot zero is the this pointer to the object,
slot one is used for the parameter, and slot two for the local. So maybe we should rather say something like local slot or so, so that for people analyzing this it's clear that it's not necessarily local variable number two, because here there's only one; it rather maps to the slot number.
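The slot numbering from this answer can be sketched with a small example. The class is made up, and the slot assignment in the comments is what javac typically produces for an instance method like this. Per JEP 358, a class compiled without local variable information (for example with -g:none) gets placeholder names such as `<local2>` in the message instead of the source names.

```java
class LocalSlotDemo {
    // Instance method: slot 0 holds `this`, slot 1 the parameter, slot 2 the local.
    String describe(Object param) {
        Object local = null;
        try {
            // param is non-null, so the NPE is raised when invoking toString on `local`.
            return param.toString() + local.toString();
        } catch (NullPointerException e) {
            // With variable names compiled in, a helpful message can name `local`;
            // without them, only the slot number is available to the VM.
            return "NPE: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(new LocalSlotDemo().describe("x"));
    }
}
```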
But still, I mean, there's more information than before. Okay, one question. You have the information of the variable.
Is there a way to get the class information? Which class is involved? Yeah, I think we use it here. So in this example, like this. But the class information, that's easy. There you don't need to compile with the variable information, because the classes with their fields and their names,
that's part of the constant pool at runtime. So that's easy. I was just wondering, this would be a bit susceptible to damage if something bytecode-transformed a method and injected code into it. Have you looked at what would happen if that was done? I'm not just thinking of something like Byteman,
where I could probably break this very quickly, and it would be very rude of me to do so. But I was thinking of cases where you've got code transformers, like middleware that makes changes to code. How much would that make this less meaningful? Actually, I don't know how much got assessed for those cases. I mean, there are also things like, it's not only about Java.
There are also other languages which compile down to bytecode. So I would think it's still kind of useful to see more information, at least more than before. But yeah, it's something to be evaluated probably. So you answered half of the question I was going to ask by showing the requireNonNull thing.
Why wouldn't we want to do that in all cases: if a null was passed in as an argument, walk back up until we get to a field load somewhere, and say it came from this call further up the stack, and that's where that null actually originated and came into this method? Because right now, I think normally it's just local,
the details are just local to the method, right? Yeah. Yeah, maybe in the case where there's no custom message, which we don't want to overwrite, we can think about that. Or maybe we can use an annotation or something
where you want to see it, but then you have to annotate methods. That was brought up in the discussion about this Objects.requireNonNull support, that this would be one way to tackle it. Would it be able to tell me that the parameter or the local is an int in this case? Again? The local is an int, yeah, the type of the variable, if I have a bajillion locals,
but each one has a super long class name. I think yes, you have access to the signature of the method, so that should be possible. It gets more complicated with the stack slots again, because then you have wide slots or just one slot, it depends. So there it's really a problem. But for the method signature, it should be there.
Quick question on the status. You mentioned that in SapMachine, you have this in version 11. Yeah. I know that Red Hat, being the steward of 11u, doesn't necessarily object to new features
like Shenandoah being backported. Now I wonder if this is one of those JEPs that could be backported to 11u. Yeah, I mean. I mean, have you discussed this, or is this? Wasn't that? We discussed this, but we didn't bring it to the table yet. I mean, we were just happy in October that, okay, finally it's in. So that's settled a little bit. But okay, we can start. I don't know if you can backport
something like that, though. Because this does change things. Existing applications may not expect those exception messages, right? Yeah, maybe if you bring it to 11, it would be a good idea to also keep it under the flag and not enable it by default, definitely. By default, no matter.
No. Time's up. So I implemented this in Android five years ago, or it became public five years ago, but not to the extent that you've done it here. And the biggest pain in getting this enabled was people who had unit tests where they would
capture the exception, put it into a file, and then all of these golden file tests all broke. And you keep mentioning changing the message and things like that, and I just keep imagining all of these unit tests breaking and breaking again. Yeah, that's amazing. I mean, even for SapMachine, when we enabled it, we had to go and,
I mean, we were running jtreg tests regularly and some things, so we had to change a few places. Well, the golden file had a simple NullPointerException. Okay? So I think time's up. So, thanks. Sorry about that. Questions? Thank you.