FireCRaCer: The Best Of Both Worlds

Formal Metadata

Title
FireCRaCer: The Best Of Both Worlds
Number of parts
542
License
CC Attribution 2.0 Belgium:
You may use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal purpose, provided that you credit the author/rights holder in the manner they specify.

Content Metadata

Abstract
CRaC (Coordinated Restore at Checkpoint) is an OpenJDK project for the coordination of Java applications at checkpoints, which leverages the CRIU (Checkpoint/Restore In Userspace) library for process snapshotting. This talk will briefly introduce Firecracker, an open source virtualization technology based on KVM, explain how CRaC can be used with it and compare it with CRIU. The second part of the talk will introduce some new tools based on Userfaultfd and DAMON which allow fine-grained analysis of the JVM's memory access patterns during restore, and end with a discussion on how the JVM could be optimized to improve them.
Transcript: English (automatically generated)
Hi, I will start right away because my talk is quite packed. I'm Volker Simonis, and I work for Amazon in the Amazon Corretto team. My slides and the examples are all on GitHub. I will show the link one more time at the end of the talk, so you don't have to take a copy now. I'm a principal engineer in the Amazon Corretto team, have worked on OpenJDK for 15 years, and have various duties in OpenJDK and the JCP.
So let's get started with Firecracker. Firecracker is a minimalistic virtual machine monitor. It's KVM-backed. It supports only a limited set of devices: basically block and network devices, which are virtualized through VirtIO, plus a vsock and a serial device. That makes it very fast and also very secure, because it doesn't support exotic devices the way QEMU does, for example. It has a REST-based configuration. It's written completely in Rust, which also makes it kind of safe. It was forked from Google's crosvm, and nowadays it's based on the rust-vmm library, which is a base library for virtual machine monitors; I think crosvm uses it as well by now.

It supports a microVM metadata service, which is basically a JSON store through which you can share data between guest and host. With full virtualization it's not easy to exchange data between guest and host, because all the guest applications run on their own kernel; with this metadata service you don't need, for example, a network connection between host and guest. In addition to the security provided by KVM, the Firecracker process itself supports sandboxing: a jailer utility places the Firecracker process on the host into an additional cgroup, chroot and seccomp environment. It's all open source, Apache 2 licensed, and it's the technology behind AWS Lambda: every Lambda runs in its own Firecracker-virtualized container.

Here's a picture of what I've just told you. We have the kernel with KVM at the bottom, and then the Firecracker process, which has one thread for each vCPU you configure for your guest, a special thread to handle I/O, and a low-priority API thread to handle the REST requests. It boots the guest kernel, which has the VirtIO devices. The VMM thread handles these VirtIO queues and maps them, for network, to devices on the host, and for block devices either to a native block device on the host or to a file system which is exported as a block device to the guest. Then you can run your application in the guest, and you can run as many guests as you want; it's basically limited only by your amount of memory. The overhead of Firecracker is just about 50 megabytes; no, it's less, as we will see. It's very small.

So let's go to a demo. I have to truncate the file first. Here we just start Firecracker: we specify the API socket through which we communicate with it, a log file, the log level info, and the boot timer to see the boot time.
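The demo drives Firecracker through a handful of JSON requests on the API socket. A minimal Python sketch of what such requests might look like follows. The endpoint paths and field names are the ones I know from the Firecracker API; the file paths, the socket path and the kernel boot arguments are assumptions chosen for illustration and are not taken from the talk's scripts.

```python
import http.client
import json
import socket


def build_boot_requests(kernel_path, rootfs_path, vcpus=2, mem_mib=512,
                        boot_args="console=ttyS0 reboot=k panic=1 pci=off init=/bin/bash"):
    """Return (method, path, body) triples for a minimal microVM boot.

    boot_args is an assumed example; the talk only says that many devices
    are switched off and that bash is used as the init process.
    """
    return [
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source", {"kernel_image_path": kernel_path, "boot_args": boot_args}),
        ("PUT", "/drives/rootfs", {"drive_id": "rootfs", "path_on_host": rootfs_path,
                                   "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how the API socket is reached."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def send(socket_path, method, path, body):
    """Send one request to a running Firecracker process and return the HTTP status."""
    conn = UnixHTTPConnection(socket_path)
    conn.request(method, path, body=json.dumps(body),
                 headers={"Content-Type": "application/json"})
    status = conn.getresponse().status
    conn.close()
    return status


# Print the requests only; nothing is sent unless send() is called explicitly.
for method, path, body in build_boot_requests("vmlinux", "rootfs.ext4"):
    print(method, path, json.dumps(body))
```

With a Firecracker process listening on a socket (the path `/tmp/firecracker.socket` would be a hypothetical choice), each triple could be passed to `send()` in order; the last request starts the instance.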
Now, from another terminal, we start to configure it with JSON data, as I told you before. We configure two vCPUs and 512 megabytes of memory. I have here a root file system, an ext4 root file system, and a freshly compiled Linux kernel. I now use another REST command to configure the Linux image which will be booted, and I pass quite a lot of kernel arguments. They mostly switch off devices which you don't need anyway and which aren't supported. As the init script we define just bash, so the init process will simply be a shell. Finally we have to define a root file system; that's the ext4 file which I showed you before.

Now that we've configured everything, we can start the virtual machine, again with a JSON request. When we go back to our window, we see that the virtual machine has been started, and it took about 200 milliseconds to start bash. It's a fully configured Linux: the image was assembled from an Ubuntu 22 image, and the kernel I compiled myself. You see we have two CPUs and about 512 megabytes of memory. If we exit the shell, the VM will just reboot, because the shell was our init process. Of the 200 milliseconds it took to boot, the serial device alone took about 100 milliseconds. If you take that away (in production you usually don't need a serial device), it boots in 100 milliseconds, and that's on my laptop.

Okay, a very quick comparison of Firecracker and Docker. Firecracker is fully KVM-virtualized; Docker has only cgroup and namespace isolation. The good thing about cgroup and namespace isolation alone is that Docker images run on the same kernel, so they can share copy-on-write and page cache memory; if you run many of them, they are denser. If you run several Firecracker images, they cannot directly share memory, so you have to use, for example, ballooning devices in the guest to give memory back to the host. On the other hand that's much more secure, because every container has its own memory and its own kernel. Firecracker has snapshot support to checkpoint the whole container, with the kernel and everything together, and Docker can use CRIU (Checkpoint/Restore In Userspace) to do basically the same thing: serialize a Docker container with all its processes to a file. We will see examples of that now.

So now, what are CRaC and CRIU?
As was mentioned before, CRaC stands for Coordinated Restore at Checkpoint. It's a new project in OpenJDK, and it has basically three important points. The first is to create a standard checkpoint/restore notification API, because many applications are not aware of being cloned, and there is state, security, time, all this kind of stuff that an application might want to react to, not only when checkpointing and restoring but especially when cloning. Think, for example, of an application which logs to a file. You checkpoint it and restart two clones, and they both write to the same file; usually they will corrupt it. So you have to take some measures if you run many things in parallel and the application is not prepared for that.

CRaC is currently not part of an official OpenJDK release; it's still mostly a research project in OpenJDK. But you can already make your application ready for CRaC by using the org.crac API, which is available on Maven Central. It basically wraps the jdk.crac namespace, which currently lives in the CRaC repository in OpenJDK, but if it finds javax.crac, once that becomes available, it will switch to that. It also offers the possibility to pass the implementation through a system property.

Finally, what makes CRaC interesting for many people to experiment with is that it integrates with CRIU: a copy of CRIU is packaged with the CRaC distribution, so you can easily checkpoint your Java process and restart it. As I mentioned before, CRIU is Checkpoint/Restore In Userspace. That's an older Linux functionality which allows you to serialize a single process to the file system. It uses the kernel's cgroup freezer to freeze the process or process tree and then writes all the memory to disk, and so on. Still, CRIU has some issues, because it has to look at all the open file descriptors, shared memory segments and the like, which might not be available again when you restore the image.
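The log file problem mentioned above can be modelled in a few lines. The sketch below is a plain Python model of the notification pattern, not the org.crac Java API: the method names `before_checkpoint` and `after_restore` only mirror the callbacks of that API, and the "clones" are simulated with `copy.copy` instead of a real process checkpoint. The class name and the per-clone file naming scheme are hypothetical.

```python
import copy
import os
import tempfile


class CloneAwareLog:
    """Log writer that gives up its file before a checkpoint and
    reopens a clone-specific file after restore."""

    def __init__(self, directory, name="app"):
        self.directory = directory
        self.name = name
        self.handle = open(os.path.join(directory, name + ".log"), "a")

    def write(self, line):
        self.handle.write(line + "\n")
        self.handle.flush()

    def before_checkpoint(self):
        # Close the file so the frozen state holds no open descriptor.
        self.handle.close()
        self.handle = None

    def after_restore(self, clone_id):
        # Each restored clone writes to its own file instead of a shared one.
        path = os.path.join(self.directory, "%s-%s.log" % (self.name, clone_id))
        self.handle = open(path, "a")


def read_file(path):
    with open(path) as f:
        return f.read()


def demo(directory):
    log = CloneAwareLog(directory)
    log.write("before checkpoint")
    log.before_checkpoint()
    # Simulated clones: two copies of the checkpointed state.
    for clone_id in ("a", "b"):
        clone = copy.copy(log)
        clone.after_restore(clone_id)
        clone.write("hello from clone " + clone_id)
        clone.handle.close()
    return {name: read_file(os.path.join(directory, name))
            for name in sorted(os.listdir(directory))}


print(demo(tempfile.mkdtemp()))
```

The point of the design is that the resource itself decides what "being cloned" means for it; without such a hook both clones would keep appending to the original file.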
Firecracker, as I said before, restores the whole kernel with the file system and everything in place, so it's much simpler from that perspective.

So let's do a quick demo of CRaC. I have here OpenJDK 17 with the CRaC extensions. You simply pass the checkpoint-to option, which names a file. This is just the Spring Boot PetClinic example application, and I modified it to register with the org.crac callbacks. As you can see here, it's registered with org.crac. Now that I have started it, I can use jcmd to checkpoint it, so I send it the checkpoint command. As you see, out of the box it didn't work. It shows some exceptions because it found, for example, that port 8080 is open, and this uses a vanilla version of Tomcat which doesn't implement the CRaC callbacks. That's not that bad: there is a developer option to ignore exceptions, so for this simple case it will probably work.

Let's try it. I start it one more time and prepare the checkpoint here. Let's wait until it becomes ready, and now checkpoint it. You see we also logged the resources: there were about 10 file descriptors, and most of them were okay, because the CRaC-modified VM already knows a lot of the file descriptors the VM itself is using, for example for the JAR files or the module files it has opened, and it closes them by itself without any need to register anything. So the checkpoint worked. What's interesting here is that before checkpointing it called the handler I installed in my PetClinic application, so I could do additional stuff before checkpointing. Now we can just restore this frozen process. You see it starts instantly, it calls the after-restore hook I have registered, and we can send a curl request to port 8080 and it basically still works. That's nice, so let's go further.

Now FireCRaCer, which is basically a combination of Firecracker and CRaC. I found it somehow funny that the words are so similar, so it's a play on words. In my opinion it's the best of both worlds to combine these two. Currently, as I said, the CRaC project is based on CRIU, but I think it might be interesting to add support for Firecracker as well, and I'm currently working on that. With Firecracker you can basically checkpoint a plain JDK, even if it's not modified by CRaC, because, as I said, there is no need to worry about file descriptors and so on. One issue with Firecracker is that you cannot trigger the checkpoint from Java. The CRaC implementation in OpenJDK can checkpoint itself because CRIU is running on the same kernel as the Java application, so Java just calls CRIU through JNI and checkpoints itself. That's obviously not possible with Firecracker, because you cannot escape from the guest; that's the whole point of running in a fully virtualized guest. So we need another means of communication, but that's not that complicated. It offers maximum security and speed. As I said before, there is no copy-on-write memory sharing, but you can use ballooning and same-page merging, kernel features which have their pluses and their drawbacks; those are things to investigate.

So let's do a Firecracker demo with Java now. To not bore you any more with all these JSON requests, I've written a shell script which basically does all of that in one script.
Instead of calling bash, it just starts Java as the init process. We can now submit the request, and you see it's working; here is the request. I have still registered these callbacks, although I'm running on a vanilla JDK, by using the org.crac library, so they are empty and won't do anything. I can now snapshot Firecracker, and you see that's also quite quick.
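Taking the snapshot is again a matter of a few API requests. As a hedged sketch: to my knowledge the Firecracker API expects the microVM to be paused before a snapshot is created, and a full snapshot consists of a VM state file and a memory file. The two file names below are assumptions for illustration, not the names used in the demo.

```python
import json


def build_snapshot_requests(snapshot_path, mem_path):
    """Return (method, path, body) triples that pause the guest and
    write a full snapshot (VM state plus guest memory) to two files."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {"snapshot_type": "Full",
                                     "snapshot_path": snapshot_path,
                                     "mem_file_path": mem_path}),
    ]


# Hypothetical file names; the requests are only printed, not sent.
for method, path, body in build_snapshot_requests("snapshot.vmstate", "snapshot.mem"):
    print(method, path, json.dumps(body))
```

The memory file is the part that matters for the later analysis, since it holds the guest's physical pages, including the whole Java heap and the page cache.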
The Firecracker process is still there after the snapshot, so I have to kill it manually. Now if I restart from the snapshot, you will see it takes just a few milliseconds to restart the image, and again I can curl into it and it works. You see the hooks are not being called, because there is no real CRaC implementation in the back end in this case, but checkpointing for Java itself works.

It's also easy to run a second clone. Obviously we cannot run it in the same network namespace, because it would use the same IP address as the first instance, so we start it in a new network namespace. The -n 0 option just creates a new namespace for the clone, and you see it uses ip netns exec to execute Firecracker. It restores just as quickly, and the initial IP address of the process is now, in this namespace, mapped to a different IP address on the host. But you see it's still working. The guest still has the same IP address it had in the first place; it's just running in its own namespace, and inside the guest Tomcat is running on the same port, all no problem. So we just kill the first instance, and we kill the second instance. How much time do I have?
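Restoring a clone in its own network namespace boils down to two things: starting a fresh Firecracker process under `ip netns exec`, and sending it a snapshot-load request. The sketch below only builds the command lines and the request. The namespace name, socket path and file names are hypothetical, the field names follow the Firecracker snapshot API as I know it (recent versions use a `mem_backend` object), and the additional host-side address mapping described in the talk is not shown.

```python
import json


def build_clone_restore(namespace, api_socket, snapshot_path, mem_path):
    """Return the commands and the API request needed to restore one clone.

    The first command creates the network namespace, the second starts
    Firecracker inside it, and the request loads and resumes the snapshot.
    """
    create_ns = ["ip", "netns", "add", namespace]
    start_vmm = ["ip", "netns", "exec", namespace,
                 "firecracker", "--api-sock", api_socket]
    load = ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        "mem_backend": {"backend_type": "File", "backend_path": mem_path},
        "resume_vm": True,
    })
    return create_ns, start_vmm, load


create_ns, start_vmm, load = build_clone_restore(
    "fc0", "/tmp/fc0.socket", "snapshot.vmstate", "snapshot.mem")
print(" ".join(create_ns))
print(" ".join(start_vmm))
print(load[0], load[1], json.dumps(load[2]))
```

Because the guest's own IP address is part of the snapshot, every clone comes up with the same address; the namespace is what keeps the clones from colliding on the host.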
Two minutes? Okay. Just a few words. I realized that the talks which are rated highest usually show some animation, so I decided to do an animation too, because usually I only show console demos. A quick introduction: userfaultfd is a way to handle page faults from user space, and Firecracker offers the possibility, instead of mapping the image file right into Firecracker's memory, to use an external userfaultfd daemon. If we write the userfaultfd daemon ourselves, we can follow page by page which addresses get loaded at restore. I found that interesting, so I created an animation for it.

For that we restart our Firecracker server with native memory tracking enabled, and we now ssh into our Firecracker guest, where Tomcat is running, call the jcmd native memory details command and put the output into a file. We do the same thing with the pmap information; this is just a shell script inside the guest which basically prints all the virtual-to-physical mappings for all processes into a file. Now we can start the visualizer, and it takes the logs of the userfaultfd daemon, the NMT output and the native mappings.

What you see here is basically the physical memory layout of the guest. It starts at memory page zero and ends at one gigabyte, and every square is a 4-kilobyte page. If you select the Java process, for example, you see the dark ones; these are the pages in the RSS of the Java process. The blue ones are occupied by the Java process but are also in the page cache, so that's probably a file, for example, or shared classes. When you look at the NMT output, we see that for the classes, for example (I can hardly read it), it says virtual is 69 megabytes, RSS is 60 megabytes, and the userfaultfd daemon loaded about 10 megabytes of it.

And here's the animation I promised you. This is how the pages got loaded when we did the first curl request on a resumed image. The yellow ones are all the pages which got loaded, and the orange ones belong to the virtual memory region I have selected here. So, for example, all the orange pages are the parts of the class space which got loaded for the first request. There is a lot of room for more investigation. It would be nice to compact this more physically, because you want to prefetch the things which get loaded, especially if you download your images from the network, for example. The problem is that while the virtual address space is contiguous, the physical pages are not, and I'm trying to look into a possibility to do that. So that's it, thank you.

Thank you. 30 seconds for questions.
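The numbers in this part of the talk can be worked through in a few lines. The sketch below assumes a guest with 1 GiB of physical memory and 4 KiB pages, as described, and a hypothetical square grid of 512 columns for the picture; the list of fault addresses stands in for the log of the userfaultfd daemon, whose real format is not shown in the talk.

```python
PAGE_SIZE = 4096        # 4 KiB pages, one square each
GUEST_MEMORY = 1 << 30  # 1 GiB of guest physical memory
GRID_WIDTH = 512        # assumed number of squares per row


def page_index(address):
    """Physical page number that contains the given guest physical address."""
    return address // PAGE_SIZE


def grid_position(page, width=GRID_WIDTH):
    """(row, column) of a page's square in the picture."""
    return divmod(page, width)


def load_order(fault_addresses):
    """Map each page to the step at which it was first faulted in."""
    first_touch = {}
    for step, address in enumerate(fault_addresses):
        first_touch.setdefault(page_index(address), step)
    return first_touch


def loaded_fraction(loaded_mb, rss_mb):
    """Share of a region's resident memory that was actually loaded at restore."""
    return loaded_mb / rss_mb


print(GUEST_MEMORY // PAGE_SIZE)              # number of squares in the picture
print(grid_position(page_index(0x201000)))    # square of one example address
print(load_order([0x1000, 0x1FFF, 0x0]))      # two faults on page 1, then page 0
print(round(loaded_fraction(10, 60), 2))      # class space: 10 MB of 60 MB RSS
```

With these assumptions the picture has 262144 squares, and for the class space only about a sixth of the resident pages were needed to serve the first request, which is why prefetching exactly those pages looks attractive.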
Yes. Unfortunately there is no time in 20 minutes to show that, but you can obviously use the current CRaC implementation inside Firecracker. You use jcmd, and instead of CRIU there is a back end called the pause engine. That's just a small program which, instead of calling CRIU, just suspends the whole process, and then you can send it a signal to restore it. So with Firecracker you basically checkpoint with the pause engine, then do the Firecracker snapshot, then restore Firecracker, and then just do an ssh with a kill signal for the process, and it will restart. That's one possibility. Another one is a JVMTI agent I wrote which basically does the same thing even without CRIU. It suspends all threads, calls System.gc() and then waits on a port, so you just ping it with telnet or whatever. It even calls the hooks by using this customization possibility with the system property: I tell org.crac to use my CRaC implementation to call the hooks. So that all works. It's in the repository; I had a resources slide with all the links, which I didn't get to.