FireCRaCer: The Best Of Both Worlds
Formal Metadata
Title | FireCRaCer: The Best Of Both Worlds
Number of Parts | 542
License | CC Attribution 2.0 Belgium: You may use, modify, and reproduce, distribute, and make publicly available the work or its content, in unchanged or modified form, for any legal purpose, provided that you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/61515 (DOI)
FOSDEM 2023 | 34 / 542
Transcript: English (auto-generated)
00:07
Hi, I will start right away because my talk is quite packed. I'm Volker Simonis, working for Amazon in the Amazon Corretto team. My slides and the examples are all on GitHub. I will show this link one more time at the end of the talk, so you don't have to
00:23
take a copy. I'm a principal engineer in the Amazon Corretto team, have worked on OpenJDK for about 15 years, and have various duties in OpenJDK and the JCP.
00:42
So let's get started with Firecracker. Firecracker is a minimalistic virtual machine monitor. It's KVM-backed. It only supports a limited set of devices: basically block and network devices, which are virtualized through VirtIO, plus a vsock and a serial device. That makes it very fast and also very secure, because it doesn't support any exotic devices
01:03
like, for example, QEMU does. It has a REST-based configuration. It's completely written in Rust, which also makes it kind of safe. It was forked from Google's crosvm, and it's nowadays based on the rust-vmm library, which is like a
01:21
base library for virtual machine monitors, and I think that's also used by crosvm meanwhile. It supports a microVM metadata service, which is basically a JSON store where you can share data between guest and host. With full virtualization it's not easy to exchange data between guest and host, because all the guest applications run on their own kernel, and with this metadata service you don't need, for example, a network
01:46
connection between host and guest. In addition to the security provided by KVM, the Firecracker process itself supports sandboxing: a jailer utility places the Firecracker process on the host into an additional cgroup, chroot and seccomp
02:02
environment. It's all open source, Apache 2 licensed, and it's the technology behind AWS Lambda: every Lambda runs in its own Firecracker-virtualized container. So here's just a picture of what I've just told you. We have the kernel with KVM at the
02:24
bottom, and then we have the Firecracker process, which has a thread for each vCPU you configure in your guest, a special thread to handle I/O, and an API thread, which is low priority, to handle the REST requests. Then it boots the guest kernel, which has the
02:41
VirtIO devices, and the VMM thread handles these VirtIO queues and maps them, for network, to devices on the host, and for the block devices either onto a native block device on the host or onto a file system which is exported as a block device to the guest. And then you can run an
03:01
arbitrary application on the guest, and you can run as many guests as you want; it's basically only limited by your amount of memory. The overhead of Firecracker is just about 50 megabytes... no, it's less, we will see; it's very small. So let's go to a demo. I have to truncate the
03:29
file. So here we just start Firecracker: we specify the API socket through which we communicate with it, we have a log file with log level info, and the boot timer to see the boot time.
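As a rough sketch of this step, the following Python builds a comparable command line and provides a minimal client for the API socket. The flags `--api-sock`, `--log-path`, `--level` and `--boot-timer` are Firecracker command-line options; the helper names `firecracker_argv`, `UnixHTTPConnection` and `api_request` and all paths are made up for illustration, and the exact invocation used in the demo is not shown in the transcript.

```python
import http.client
import json
import socket


def firecracker_argv(api_sock, log_path, level="Info", boot_timer=True):
    """Build a Firecracker command line similar to the one described in the demo."""
    argv = ["firecracker", "--api-sock", api_sock,
            "--log-path", log_path, "--level", level]
    if boot_timer:
        argv.append("--boot-timer")
    return argv


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how the API socket is reached."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # host name is only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def api_request(socket_path, method, path, body):
    """Send one JSON request to the API socket and return the HTTP status code."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().status
    finally:
        conn.close()
```

The argument list can be handed to `subprocess.Popen`; the log file has to exist before Firecracker starts, which is why the file is truncated first in the demo.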
03:43
And now, from another terminal, we start to configure this with JSON data, as I told you before. We configure two vCPUs and 512 megabytes of memory. I have here a root file system, an ext4 root file system, and a
04:06
freshly compiled Linux kernel. I will now use another REST command to configure the Linux image which will be booted, and I pass quite a lot of kernel arguments; it's mostly to switch off devices
04:23
which you don't need anyway and which aren't supported. As init script we just define bash, so the init process will be just a shell. Then we finally have to define a root file system; that's the ext4 file which I showed you before. And now that we've configured everything, we can
04:45
just start the virtual machine, again with a JSON request. When we go back to our window, we see that the virtual machine has now been started, and it took about 200 milliseconds to start bash. It's a fully configured Linux; the image was
05:11
assembled from an Ubuntu 22 image, and the kernel I've compiled myself. You see we have two CPUs and about 512 megabytes of memory. So if we exit the shell,
05:32
the VM will just reboot, because it was our init process. Of these 200 milliseconds which it takes to boot, the serial device alone took about
05:44
100 milliseconds. So if you take that away (usually in production you don't need a serial device), it boots in 100 milliseconds, and that's on my laptop. Okay, so a very quick comparison of
06:01
Firecracker and Docker. Firecracker is fully KVM-virtualized; Docker has only cgroup and namespace isolation. The good thing about cgroup and namespace isolation only is that Docker images run on the same kernel, so they can do copy-on-write and page-cache memory sharing; if you run many of them, they are denser. Whereas if you run several Firecracker
06:24
images, they cannot directly share memory, so you have to use, for example, ballooning devices in the guest to give back memory to the host. On the other side, that's much more secure, because every container has its own memory and its own kernel. Firecracker has snapshot support to checkpoint
06:43
the whole container, with the kernel and everything together, and Docker can use CRIU, Checkpoint/Restore In Userspace, to do basically the same thing: serialize a Docker container with all its processes to a file. We will see examples for that now. So now, what are CRaC and CRIU?
07:03
As was mentioned before, CRaC is Coordinated Restore at Checkpoint. That's a new project in OpenJDK. It has basically three points which are important. The first one is to create a standard checkpoint/restore notification API, because many applications are not aware of being cloned,
07:25
and there is state, security, time, all this kind of stuff which an application might want to react upon, not only when checkpointing and restoring, but especially when cloning the application. Think, for example, of an application
07:41
which logs to a file: you checkpoint it and restart two clones, and they both write to the same file, so they will usually corrupt the file. So you have to take some measures if you run many clones in parallel and the application is not prepared for that. CRaC is currently not part of an official OpenJDK release; it's still mostly a research project
08:03
in OpenJDK, but you can already make your application ready for CRaC now by using the org.crac API that's available on Maven Central. It basically wraps the jdk.crac namespace, which currently lives in the CRaC repository in OpenJDK, but if it finds javax.crac, once that
08:22
should become available, it will switch to that, and it also offers the possibility to pass the implementation through a system property. And then finally, what makes CRaC interesting for many people to experiment with is that it integrates with CRIU: it has a copy of
08:42
CRIU packed with the CRaC distribution, so you can easily checkpoint your Java process and restart it. As I mentioned before, CRIU is Checkpoint/Restore In Userspace. That's a Linux functionality which allows you to serialize a single process to the file
09:00
system. It uses the kernel's cgroup freezer to freeze the process or process tree and then writes all the memory to disk, and so on. Still, CRIU has some issues, because it has to look at all the open file descriptors, shared memory segments and stuff like that, which might not be available again when you restore the image,
09:24
whereas Firecracker, as I said before, restores the whole kernel with all the file systems, everything in place, so it's much simpler from that perspective. So let's do a quick demo of CRaC. I have here OpenJDK 17 with the CRaC extensions,
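The notification idea described above can be modelled in a few lines. The following is a toy Python sketch, not the org.crac API: the class names `Context` and `LogFile`, the method names, and the notification order (reverse registration order before the checkpoint, registration order after the restore) are assumptions made for illustration. It shows a resource that closes its log file before a checkpoint and reopens a per-clone file afterwards, so that two clones never write to the same file.

```python
import os
import tempfile


class Context:
    """Toy registry for resources that want checkpoint/restore notifications."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def checkpoint_restore(self, clone_id):
        # Assumed order: last registered resource is notified first.
        for resource in reversed(self._resources):
            resource.before_checkpoint()
        # A real implementation would freeze and later resume the process here.
        for resource in self._resources:
            resource.after_restore(clone_id)


class LogFile:
    """A resource that must not share one log file between clones."""

    def __init__(self, base_path):
        self.base_path = base_path
        self.path = base_path
        self.handle = open(self.path, "a")

    def write(self, line):
        self.handle.write(line + "\n")

    def before_checkpoint(self):
        # Closing flushes the data and leaves no open descriptor in the image.
        self.handle.close()

    def after_restore(self, clone_id):
        # Each clone continues in its own file.
        self.path = f"{self.base_path}.{clone_id}"
        self.handle = open(self.path, "a")


def demo():
    """Simulate one checkpoint and a restore as 'clone-1'; return both paths."""
    base = os.path.join(tempfile.mkdtemp(), "app.log")
    log = LogFile(base)
    ctx = Context()
    ctx.register(log)
    log.write("before checkpoint")
    ctx.checkpoint_restore("clone-1")
    log.write("after restore")
    log.handle.close()
    return base, log.path
```

In the Java application the same role is played by callbacks registered through the org.crac library; the sketch only illustrates why such callbacks are useful.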
09:45
and then you simply pass the option CheckpointTo, that's a file. This is just the Spring Boot PetClinic example application, and I modified it to
10:00
register the org.crac callbacks. As I said, you can see here that it's registered with org.crac. Now that I have started it, I can use jcmd to checkpoint it, so I send it the checkpoint command. And you see, just out of the box it didn't work: it shows an exception,
10:24
because it found, for example, that port 8080 is open, and this uses a vanilla version of Tomcat, which isn't implementing the CRaC callbacks. But that's not that bad: there is a developer option to ignore such exceptions, so for this simple case
10:48
it will probably work. So let's try it: start it one more time, prepare the checkpoint here, and wait until it becomes ready. And now checkpoint it, and you see we
11:08
also logged the resources. There were about 10 file descriptors, and most of them were okay, because the CRaC-modified VM already knows a lot of the file descriptors the VM is
11:20
using, for example for the JAR files it has opened or for the modules file, and it closes them by itself without any need to register anything. So the checkpoint worked, and what's interesting here is that before checkpointing it called my listener, the handler I
11:41
installed in my PetClinic application, so I could do additional stuff before checkpointing. And now we can just restore this frozen process, and you see it starts instantly. It calls the after-restore hook I have registered, and we can send a curl request
12:07
to 8080 and it basically still works, so that's nice. Let's go further. So now FireCRaCer: basically a combination of Firecracker and CRaC. I found it somehow funny that the words are
12:23
so similar, so it's a play on words. In my opinion it's the best of two worlds to combine these two. Currently, as I said, the CRaC project is based on CRIU, but I think it might be interesting to add support for Firecracker as well, and I'm currently working on that. With Firecracker
12:41
you can basically checkpoint a plain JDK, even if it's not modified by CRaC, because, as I said, there is no need to worry about file descriptors and so on. One issue with Firecracker is that you cannot trigger the checkpoint from Java. The
13:02
CRaC implementation in OpenJDK can checkpoint itself, because CRIU is running on the same kernel as the Java application, so Java just calls CRIU through JNI and checkpoints itself. That's obviously not possible in Firecracker, because you cannot escape from the guest; that's the whole point of running it in a fully virtualized guest. So we need another
13:24
means of communication, but that's not that complicated. It offers maximum security and speed, and, as I said before, no copy-on-write memory sharing, but you can use ballooning or kernel same-page merging, kernel features which also have their pluses and their drawbacks, but are things to
13:43
investigate. So let's do a Firecracker demo with Java now. To not bore you more with all these JSON requests, I've written a shell script which basically does all that in one script,
14:04
and instead of calling bash it just starts Java as the init process. We can now submit the request, and you see it's working; here is the request.
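The shell script itself is not shown in the transcript. What follows is a minimal sketch of the sequence of REST calls such a script would have to issue, using the Firecracker API resources `/machine-config`, `/boot-source`, `/drives/{id}` and `/actions`. The kernel arguments, the `init` value and all paths are placeholders rather than the ones used in the talk, and the function names are made up for this example.

```python
import json


def boot_requests(kernel, rootfs, init="/bin/bash", vcpus=2, mem_mib=512):
    """Return the (method, path, body) calls that configure and start a microVM."""
    # Placeholder kernel command line; the talk passes many more arguments.
    boot_args = f"console=ttyS0 reboot=k panic=1 pci=off init={init}"
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel, "boot_args": boot_args}),
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def describe(requests):
    """Render each request as one readable line, e.g. for logging."""
    return [f"{method} {path} {json.dumps(body, sort_keys=True)}"
            for method, path, body in requests]
```

To start Java instead of a shell, `init` would point at whatever launcher the root file system provides, for example a hypothetical wrapper script that runs the `java` command. Each tuple would be sent as an HTTP request over the API socket.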
14:26
I have still registered these callbacks, although I'm running on a vanilla JDK, by using the org.crac library, so they are empty; they won't do anything. And I can now snapshot Firecracker. You see, that's also quite quick.
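Taking the snapshot is again a matter of REST calls. A minimal sketch, assuming the Firecracker snapshot API as documented for recent releases (pause via `PATCH /vm`, then `PUT /snapshot/create`); the field names should be checked against the API specification of the Firecracker version in use, and the file names and the function name are placeholders.

```python
def snapshot_requests(snapshot_path, mem_file_path):
    """Pause the microVM, then write the VM state and guest memory to two files."""
    return [
        # A microVM has to be paused before a snapshot can be created.
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,   # device and vCPU state
            "mem_file_path": mem_file_path,   # guest memory image
        }),
    ]
```

The memory file has the size of the configured guest memory, so for the 512-megabyte guest from the demo it would be 512 megabytes unless it is stored sparsely or compressed.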
14:45
Firecracker is not resumed automatically, so I have to kill it manually. And now, if I restart from the snapshot, you will see it also takes just a few milliseconds to restart the
15:04
image, and again I can curl into it and it works. You see the hooks are not being called, because there is no real CRaC implementation in the back in this case, but
15:21
checkpointing for Java itself works, and it's also easy to run a second clone now. Obviously we cannot run it in the same namespace, because it would use the same IP address as the first version, so we start it in a network namespace; -n 0 is just
15:45
to create a new namespace for the clone, and you see it uses ip netns exec to execute Firecracker. It restores just as quickly, and
16:01
the initial IP address of the process is now, in this namespace, mapped to a different IP address on the host, but you see it's still working. The guest still has the same IP address it had in the first place; it's just running in its own
16:21
namespace, and inside the guest Tomcat is again running on the same port, all no problem. So we just kill the first instance and we kill the second instance. How much time do I have?
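A sketch of what restoring a clone in its own network namespace may look like. The `ip netns add` and `ip netns exec` commands are standard iproute2; the `/snapshot/load` body follows the Firecracker snapshot API as I understand it for recent versions and should be verified against the version in use. The namespace name, socket path, file names and function names are placeholders, and the tap device and address mapping that the demo script sets up inside the namespace are omitted here.

```python
def clone_commands(netns, api_sock):
    """Commands that create a network namespace and start Firecracker inside it."""
    return [
        ["ip", "netns", "add", netns],
        ["ip", "netns", "exec", netns,
         "firecracker", "--api-sock", api_sock],
    ]


def load_request(snapshot_path, mem_file_path):
    """REST call that loads a snapshot and resumes the guest immediately."""
    return ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        # "File" maps the memory image directly; a "Uffd" backend would instead
        # name the socket of an external page-fault handler.
        "mem_backend": {"backend_type": "File", "backend_path": mem_file_path},
        "resume_vm": True,
    })
```

Because every clone sees the same guest-side network configuration, the isolation has to happen on the host side, which is what the separate namespace provides.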
16:43
Two minutes? Oh, okay. So just a few words. I realized that in the talks which are rated the highest I usually saw some animation, so I decided to do an animation, because usually I only show console demos. So, a quick introduction: userfaultfd is a possibility to handle
17:08
page faults from user space, and Firecracker offers the possibility, instead of mapping the image file right into Firecracker's memory, to use an external userfaultfd daemon.
17:21
And if we write the userfaultfd daemon ourselves, we have the possibility to follow, page by page, which addresses get loaded at restore. I found that interesting, so I created an animation for that. For that we restart our Firecracker server
17:49
with native memory tracking enabled, and we now ssh into our Firecracker guest, where Tomcat is running, and just call jcmd
18:06
native memory details, and put that into a file, and we do the same thing with the pmap information. This is just a shell script inside the guest which basically prints all the
18:24
virtual-to-physical mappings for all processes into a file. And now we can start the visualizer, and it takes the logs of the userfaultfd daemon
18:52
and the NMT output and the native mappings. So what you see here is basically the physical memory layout of the guest: at the start it's memory page zero and at the end it's memory page one gigabyte, and
19:04
every square is a four-kilobyte page. If you go to the Java process, for example, you see the dark ones: these are the pages in the RSS of the Java process. The blue ones are occupied by the Java process, but they are also in the page cache, so that's probably a file, for example,
19:24
or shared classes, for example. When you look at the NMT output, we see that, for example, for the classes we use about 66... I probably cannot read it... it says virtual is 69
19:41
megabytes, RSS is 60 megabytes, and the userfaultfd daemon loaded about 10 megabytes of it. And here's the animation I promised you. This is how the pages got loaded when we did the first curl request on a resumed image. The yellow ones are all the pages
20:02
which were loaded, and the orange ones belong to the virtual memory region I have selected here. So, for example, all the orange pages are the parts of the class space which got loaded for the first request. So there is a lot of room for
20:21
more investigation. It would be nice to compact this more physically, because you want to prefetch the things which get loaded, especially if you download your images from the network, for example. But the problem is that while the virtual address space is contiguous,
20:41
the physical pages are not, and I try to look into a possibility to do that. So that's it, thank you. Thank you. 30 seconds for questions.
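The virtual-to-physical mappings used by the visualizer come from a guest-side script that the transcript does not show. On Linux such mappings can be read from `/proc/<pid>/pagemap`, which holds one 64-bit entry per virtual page. The sketch below decodes such entries following the kernel's pagemap documentation; that the talk's script works this way is an assumption, and the function names are made up for this example.

```python
import struct

PAGE_SIZE = 4096                 # four-kilobyte pages, as in the visualization
PFN_MASK = (1 << 55) - 1         # bits 0-54: page frame number if present
FILE_OR_SHARED = 1 << 61         # bit 61: file-mapped or shared anonymous page
SWAPPED = 1 << 62                # bit 62: page is swapped out
PRESENT = 1 << 63                # bit 63: page is present in RAM


def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """Byte offset of the pagemap entry that describes a virtual address."""
    return (vaddr // page_size) * 8


def decode_entry(raw):
    """Decode one 8-byte pagemap entry into its flags and frame number."""
    (value,) = struct.unpack("=Q", raw)
    present = bool(value & PRESENT)
    return {
        "present": present,
        "swapped": bool(value & SWAPPED),
        "file_or_shared": bool(value & FILE_OR_SHARED),
        "pfn": (value & PFN_MASK) if present else None,
    }


def lookup(pid, vaddr):
    """Read and decode the pagemap entry of one virtual address of a process."""
    with open(f"/proc/{pid}/pagemap", "rb") as pagemap:
        pagemap.seek(pagemap_offset(vaddr))
        return decode_entry(pagemap.read(8))
```

Multiplying the frame number by the page size gives the guest-physical address of the page. Recent kernels report the frame number as zero to unprivileged readers, so `lookup` needs to run as root inside the guest to be useful.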
21:26
Yes. Unfortunately there is no time in 20 minutes to show that, but you can obviously use the current CRaC implementation inside Firecracker: use jcmd, and instead of
21:42
CRIU there is a back end called the pause engine. That's just a small program which, instead of calling CRIU, just suspends the whole process, and then you can send it a signal to restore it. So with Firecracker you basically checkpoint with the pause engine, then do the
22:00
Firecracker snapshot, then restore Firecracker, and then just do an ssh with a kill signal on the process, and it will restart. That's one possibility. Another one is that I wrote a JVMTI agent which basically does the same thing even without CRIU: it suspends all threads, calls System.gc() and then waits on a port, so you just ping it with telnet or whatever,
22:25
and it even calls the hooks, by using this custom possibility with the property: I tell org.crac to use my CRaC implementation to call the hooks. So
22:41
that all works. It's in the repository; I had a resources slide with all the links, which I didn't get to.