Classics Never Get Old: Two Easy Pieces For GraalVM
Formal Metadata
Title | Classics Never Get Old: Two Easy Pieces For GraalVM
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/61523 (DOI)
Transcript: English(auto-generated)
00:05
just two classical optimizations that will help a modern but mature virtual machine, the one that powers native images. Why is that important? Well, first, who I am.
00:23
My name is Dmitry Chuyko. I work at a company named BellSoft, which actively participates in the OpenJDK community. And we release our own JDK distribution, which you have probably met if you've ever built a Spring Boot container with the default buildpack.
00:40
So it's in there. And now Spring Boot, since version 3, supports containers with native images; the application can be built as a native image. And if you do that, the compiler being used is the Liberica Native Image Kit, which is BellSoft's distribution of GraalVM.
01:02
So that's another project that we participate in. And GraalVM itself can be seen as different things; there are at least two major modes that we can observe. It can run as a JIT, where the compiler is Graal.
01:25
Or we can build a native image with static compilation. It will then utilize a special virtual machine, Substrate VM. And here it's different from the traditional Java,
01:45
the traditional way of how we run it. Well, another interesting and peculiar point here is that it is written in Java. So it is a complex project, but most of the code is Java.
02:03
And this is beautiful. So you have a virtual machine and a compiler for JVM languages, and Java in particular, written in Java. So if we look at Java itself, why is it so beautiful?
02:21
Well, not so beautiful compared to Kotlin, as we know, right? But still, both Java and Kotlin share these concepts. From the very beginning, there has been a way to write correct parallel programs. To write parallel programs, we need some means of synchronization to orchestrate our threads if we share data.
02:46
Most typically, we do share data. And also, it's a managed runtime where we don't have to worry that much about freeing memory, because we have garbage collection, and garbage is collected for us. Our programs
03:04
can still have a memory leak, but you have to work hard to get one. And the native image implementation sometimes
03:21
makes our final binaries very performant. Of course, we have an instant startup; it was mentioned today several times. But we can also have very good peak performance in certain cases. That's not a rule, but it can happen, and it happens here on this plot. That's just a simple Spring Boot application,
03:44
and we just ping the same endpoint. Here, the native image works better. It also warms up instantly, and it has very good latency. This is a small service,
04:01
so it takes a small amount of memory, a very small heap, and it also has low latency. Under the hood, it uses serial GC, and we'll talk about that later. Well, what about the relationship between GraalVM and OpenJDK?
04:21
Well, we're here in a Friends of OpenJDK room. And Graal was integrated as an additional experimental compiler in JDK 9. But it has since been removed from recent JDKs, so what's left over?
04:41
It's an interface to plug it in. So now there is going to be a second attempt to do that. Here on the slides, it's mentioned that there is a discussion about the new project, Galahad. But last week, there was already a call for votes in OpenJDK
05:01
to start the project of bringing the sweetest parts of this technology back into OpenJDK. So it's something that is happening right now. Now, about that default garbage collector that sometimes shows very good latency,
05:22
even compared to parallel GC or G1 in HotSpot, well, on small heaps. It's a kind of garbage collector we can easily understand: a generational, stop-the-world collection. Only one survivor space is shown here,
05:43
but actually there are 16 by default. Anyway, we stop all our application threads, and we collect garbage in a single thread. So this is kind of a basic garbage collector, right?
06:00
But on the other hand, it's reliable. And it's very effective, especially if you have only a single core available. So you see the problem: we have some CPU, which may be enough to run many threads, but we run only one of them, at least
06:20
for garbage collection. And garbage collection can take significant time during our application's execution. That's obvious. So what would we do? Of course, we would like to do exactly the same thing, but in parallel, to decrease the time garbage collection takes,
06:42
to reduce the garbage collection pause. It is still a stop-the-world pause, but we reduce it because we process data with multiple threads. That's the idea of parallel garbage collection. The idea is not new, but surprisingly, this modern runtime doesn't have it yet.
07:01
Well, we decided to implement it. It's still under review, and some implementation details change. But the idea is very simple: you just pass the garbage collector selection
07:21
during the creation of your native image. For instance, if you use some Maven or Gradle configuration for your Spring Boot container, you can also do that there. And then you have some knobs at runtime, which you can also tweak when you run your application.
07:42
And, well, you enable that implementation. I'll show some performance results later. But basically, the implementation itself can be analyzed as a change in a big Java program, which is what GraalVM is. There are now two GC interfaces and implementations.
08:07
And this functionality just reuses existing things in a very, I would say, smart way, keeping only the parallelization
08:25
as new code. Everything else is reused from serial GC. Typically, there's the problem of how we synchronize and share the work, because parallel threads for garbage collection
08:45
also have the same problem: they work on the same data, so they may have contention. So we need to share the work in some smart manner. It's implemented by dividing the work by volume.
09:06
So every thread operates on its local memory, which is a chunk of one megabyte. If we need extra memory, say we scan objects and fill up the set of data that we operate on,
09:24
then we have an extra chunk. We can just put it aside so someone else can pick it up. So there's a stack that contains the chunks of work. When its work is finished, a thread just takes the next chunk of work.
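The chunk-sharing scheme just described can be sketched roughly like this. This is a minimal illustration in plain Java; `ChunkQueue` and the chunk size are made-up names for the sketch, not GraalVM's actual classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of chunk-based work sharing between GC workers. Each worker
// scans into its own local chunk; when a chunk fills up, it is pushed
// onto a shared stack so that any idle worker can pick it up.
class ChunkQueue {
    static final int CHUNK_SIZE = 1024; // stands in for the 1 MB chunks

    private final Deque<int[]> sharedChunks = new ArrayDeque<>();

    // A worker publishes a full chunk for others to take.
    synchronized void push(int[] chunk) {
        sharedChunks.push(chunk);
    }

    // A worker that finished its local work takes the next chunk,
    // or null if nothing is left.
    synchronized int[] pop() {
        return sharedChunks.isEmpty() ? null : sharedChunks.pop();
    }
}
```

The point of the design is that workers mostly touch only their private chunk, so the only synchronized operations are the cheap push and pop on the shared stack.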
09:44
There may be a situation where several threads try to copy, that is, to promote, the same object. This is actually solved very simply: each thread just reserves some space for the object and then tries to install a forwarding pointer
10:02
using an atomic operation. As this is an atomic operation, only one thread succeeds; the others just roll back, and this is a lightweight operation. Again, this is Java; this is not strict UML, sorry. But still, all existing places that manage memory
10:27
were reused without changing the architecture of GraalVM. There are already hooks to add garbage collectors, so if you want to implement one, it's not that complex.
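The promotion race that the atomic forwarding-pointer install resolves can be sketched like this. This is illustrative plain Java, not the actual Substrate VM code; `Promotion` and its fields are invented for the example:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of racing GC workers promoting the same object. Each worker
// reserves space for a copy, then tries to install a forwarding
// pointer with a compare-and-set. Only one CAS succeeds; the losers
// simply abandon their reservation and use the winner's copy.
class Promotion {
    // null means "not yet forwarded".
    private final AtomicReference<Object> forwardingPointer = new AtomicReference<>();

    /** Returns the single copy of the object that all threads agree on. */
    Object promote(Object reservedCopy) {
        if (forwardingPointer.compareAndSet(null, reservedCopy)) {
            return reservedCopy; // this thread won the race
        }
        // Another thread already promoted the object: roll back cheaply,
        // the reserved space is just discarded or reused.
        return forwardingPointer.get();
    }
}
```

Because the loser only wastes a reservation and a failed CAS, the rollback is as lightweight as the talk claims: no locks, no operating-system involvement.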
10:41
The major problem is to be correct when you deal with memory, when you deal with concurrency, and when you inject your code into this virtual machine, because it's all declarative magic that requires you to be careful.
11:03
Well, some performance results. With relatively large heaps and the serial GC, you can have pauses of several seconds, which is long, of course. And there's a big difference between a two-, three-,
11:22
or four-second pause and one decreased by a second. That is already possible with this implementation; that's the order of the improvement. With another benchmark, HyperAlloc, you see that the latency of pauses
11:43
can be decreased two times. Those pauses are not that big, and we have frequent collections here. The x-axis is the epoch, so each point is a garbage collection, and the y-axis is time, I believe in milliseconds.
12:06
Well, that's parallel GC. So we can obviously improve many applications and many installations where we have an option to use several CPUs.
12:21
If we use one CPU, of course, we won't see much difference. There is some increase in memory used for service needs, but that's moderate. Now, to other parts of this complex system: I mentioned synchronization. Synchronization is useful, but it has trade-offs.
12:45
Because if we implement honest synchronization, we need to save our CPU resources and put aside the threads that won't get the resource. We need to stop them, queue them, manage the queues, wake them up, and involve the operating system in that process.
13:05
So that's not cheap. But there are situations where that's overkill. And that has even influenced the design of the standard library, because we all know StringBuffer and StringBuilder.
13:22
One class appeared because the other one wasn't very pleasant in terms of performance. Yes, we need it sometimes, but in many cases we need a non-synchronized implementation. Same with Hashtable and HashMap. Who uses Hashtable nowadays, right? But it's very well synchronized.
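For example, the synchronized classes and their unsynchronized twins are drop-in replacements for each other in single-threaded code, but the synchronized ones pay for a lock on every single call:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Standard-library pairs: a synchronized class and its unsynchronized twin.
public class Twins {
    public static void main(String[] args) {
        // Every append on StringBuffer is a synchronized method ...
        StringBuffer threadSafe = new StringBuffer().append("a").append("b");
        // ... while StringBuilder does the same work without locking.
        StringBuilder singleThread = new StringBuilder().append("a").append("b");

        Map<String, Integer> legacy = new Hashtable<>(); // synchronized
        Map<String, Integer> modern = new HashMap<>();   // not synchronized
        legacy.put("x", 1);
        modern.put("x", 1);

        // Same observable results either way.
        System.out.println(threadSafe.toString().equals(singleThread.toString())); // true
        System.out.println(legacy.equals(modern));                                 // true
    }
}
```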
13:43
But not all classes that have synchronization in them have twins without synchronization. That makes no sense, right? So there's a well-known technique for dealing with the case where accesses to our data
14:02
structures, to our classes, are mostly sequential. Then, at any point in time, only a single thread owns and operates on an object. This is called biased locking or thin locking. Well, why is it simpler and more lightweight?
14:23
Because we don't have to manage all the complex cases. We know that we are in the good situation, and if not, we can fall back; that's called inflating the monitor. Well, it existed in OpenJDK for ages.
14:43
And it has been removed from OpenJDK. First it was deprecated, and then, I believe, no one noticed, because how many people are using something newer than JDK 11? Well, some consequences were noticed probably too late.
15:06
Well, what were the reasons, first of all, to remove biased locking from OpenJDK, from the HotSpot JVM? Well, to ease the implementation of virtual threads, to deliver Project Loom, to decrease
15:23
the amount of work there. So here are some consequences and issues discovered: in certain cases, things like input streams can be slowed down, like here, where it's 8x or so.
15:42
That's enormously slow. And for GraalVM, there is a mode where you say during static compilation: OK, this native image doesn't need to work with many cores; it's a single-threaded program.
16:01
So it's simple, and it works really better in those circumstances. There is an optimization for that, but you have to know it in advance, when you compile your program. And there is, of course, a runtime option that supports all kinds of situations. And it's complex.
16:21
So the missing part is in the lower left corner: the ability to dynamically handle a sequential access pattern. So we implemented quite a classical approach to this problem, which brings thin locking to GraalVM.
16:51
The initial idea was to operate on the object header, which already contains a pointer to a fat monitor
17:05
object. But it can just as well be treated as a word we can access atomically to put some information there. The probably close-to-final implementation that we have right now still, or again, uses a pointer,
17:21
because it turned out to be not so easy to keep correctness across the whole VM with memory that you treat as a pointer or as a word depending on the situation. Well, anyway, inside that part of the header
17:41
or inside that special object, we can have 64 bits of information. We can mark it as a thin lock; this is a flag, right? We can do that atomically. We can keep the ID of the owner thread, which we can obtain
18:03
when we work with threads. And a count of the recursive locks that we currently hold. That, by the way, means that after a certain number of recursive locks, we have to inflate the monitor,
18:20
because we cannot store more information in that part of the word. Yeah. So again, it's a pure Java implementation where we work with some atomic magic and update this information.
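A sketch of such a 64-bit lock-word encoding follows. The field widths and positions here are invented for illustration; they are not GraalVM's actual layout:

```java
// Illustrative thin-lock word: one flag bit marks the word as a thin
// lock, 48 bits hold the owner thread ID, and a 15-bit field counts
// recursive acquisitions. When the count field would overflow, the
// lock must be inflated to a fat monitor.
final class ThinLockWord {
    static final long THIN_FLAG   = 1L;             // bit 0: "this is a thin lock"
    static final int  OWNER_SHIFT = 1;              // bits 1..48: owner thread ID
    static final long OWNER_MASK  = (1L << 48) - 1;
    static final int  COUNT_SHIFT = 49;             // bits 49..63: recursion count
    static final long COUNT_MAX   = (1L << 15) - 1;

    static long encode(long ownerId, long recursionCount) {
        if (recursionCount > COUNT_MAX)
            throw new IllegalStateException("must inflate the monitor");
        return THIN_FLAG
                | ((ownerId & OWNER_MASK) << OWNER_SHIFT)
                | (recursionCount << COUNT_SHIFT);
    }

    static boolean isThin(long word)  { return (word & THIN_FLAG) != 0; }
    static long owner(long word)      { return (word >>> OWNER_SHIFT) & OWNER_MASK; }
    static long recursions(long word) { return word >>> COUNT_SHIFT; }
}
```

In the real implementation, a word like this would be installed and updated with compare-and-set, so that claiming an unlocked object is a single atomic instruction rather than an operating-system lock.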
18:41
But look at what we've got, and the most recent numbers are even better. We see the effect on exactly that example, the streams: we can speed them up. And even in a nano-benchmark kind of measurement,
19:00
you also see the improvement. And even in the multi-threaded case, there is no difference from the original.