Hardware acceleration for Unikernels
Formal Metadata

Title: Hardware acceleration for Unikernels
Number of Parts: 542
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61540 (DOI)
FOSDEM 2023, talk 216 of 542
Transcript: English (auto-generated)
00:05
Hi everyone, it's my pleasure to introduce Babis and Anastassios. They're going to give a talk on using vAccel for hardware acceleration in unikernels. Babis, please. So hello everyone, I'm Babis. My actual name is Charalampos Mainas, but you can just call me Babis.
00:22
We're going to give a talk about hardware acceleration and our effort to support it in unikernels, which we do with vAccel.
00:40
We already heard from Simon, so we don't have to repeat what unikernels are. There are a lot of projects, and it's a promising technology: we can have very fast boot times,
01:00
a low memory footprint and increased security. We also know some of the use cases for unikernels: usually traditional applications such as web servers, but they have also been used for NFV, and we think they are a good fit for serverless and, in general, microservice
01:23
deployments, either in the cloud or at the edge. We also think they can be a good fit for ML and AI applications, and that sounds a bit weird, because as we know ML and AI workloads are quite huge and heavy.
01:42
Maybe you have heard about PyTorch or TensorFlow; we're not going to touch them, don't worry. What we want to say here is that these are very heavy frameworks, very difficult to add support for. Secondly, these kinds of applications are usually compute-intensive
02:02
and can take a lot of resources. For that exact reason we see a shift in the hardware that exists in data centers, and not only there: at the edge too, we see devices equipped with a lot of new processing units. Of course we have the traditional FPGAs and GPUs, but we also have
02:26
specialized processing units like TPUs and some ASICs. First of all, as we know, ML and AI workloads cannot be executed in unikernels, because there is no support for these frameworks. Secondly, there is no support for hardware acceleration,
02:45
so there is not really any benefit if we can only run them on a CPU. I'm going to go through the
03:00
acceleration stack and how we can virtualize it with the current approaches. In general it's pretty simple: you have an application written against an acceleration framework, which can be OpenCL, CUDA, TensorFlow or PyTorch; underneath that you have the vendor runtime for the GPU, or maybe a runtime for FPGAs;
03:26
and then of course a device driver, which resides inside the kernel. This is what we have to virtualize. As we know, unikernels are virtual machines, so the same techniques we have for virtual machines
03:42
can also be used for unikernels. Some of these techniques are hardware partitioning, paravirtualization and remote API. In the case of hardware partitioning, the hardware accelerator has the ability to partition itself, and we
04:02
assign a small part of the accelerator to the VM, which can then access the hardware directly. This has very good performance; on the other hand, we need the entire acceleration stack inside the VM, from the device driver to
04:20
the acceleration framework to the application. I forgot to mention that this has to be supported by the device, and the device driver also needs to be in the VM. In the case of paravirtualization things get a bit better, because we can have a generic
04:43
device, and the hypervisor simply manages the accelerator: requests to the accelerator are handled by the hypervisor, so we don't need a different driver for every
05:02
accelerator inside the VM. On the other hand, we still need the vendor runtime, the application and the acceleration framework. In the case of remote API we have an even lighter approach: everything is managed by a server, which might be local on the same host or a remote server. What happens here is that
05:25
the acceleration framework intercepts the calls from the application and forwards them to the acceleration framework that resides on the server. This has some performance overhead, of course, because of the transport
05:41
involved, and it is also framework-specific, so it has to be supported; there is a remote CUDA, for example, that supports it. Great, but which approach is best for unikernels? In the case of hardware partitioning we would have to port the entire software acceleration stack and every device driver to the unikernel,
06:05
which is not an easy task. With paravirtualization things are a bit better: we have to port only one driver, but we still need to port the whole acceleration stack. The remote API case sounds much more feasible,
06:21
because we could port only, say, remote CUDA, just one framework. But how easy is that? It's not easy, because as I said before these frameworks are huge. They have very big code bases, and they use dynamic linking, which
06:42
is at odds with unikernels, plus a lot of dependencies. So it's not going to be easy to port them to any existing unikernel framework right now. For that reason we think vAccel is suitable for unikernels, and I will now hand over
07:05
to Tasos to present a bit of how vAccel works. Okay, thank you.
07:23
Hi from my side too. I'm going to talk a bit about the framework we're building. We started working on vAccel to handle hardware-acceleration virtualization in VMs, so it's not tailored to
07:43
unikernels; we have been playing with semantically exposing hardware-acceleration functionality from acceleration frameworks to VMs. The software stack is shown in the figure.
08:03
We use a hardware-agnostic API: we expose the whole function call of the hardware-accelerated operation, and we focus on portability and interoperability, meaning that
08:21
the same binary code originating from the application can be executed on many types of architectures, decoupled from the hardware-specific implementation. Taking a closer look at the software stack: we have an application that consumes the vAccel API,
08:43
which supports specific operations. These operations are mapped, through a mapping layer in vAccelRT, to the relevant plugins, shown in
09:03
greenish. They are the glue code between the API calls and the hardware-specific implementation, which in this figure resides in the external-libraries layer; finally, the hardware executes whatever is in those external libraries.
09:28
Digging a bit more into how vAccel works: the core
09:42
component of vAccel exposes the API to the application and maps the API calls to the relevant hardware plugins, which, by the way, are loaded at runtime. These plugins are glue code between the API calls
10:05
and the hardware-specific implementation. For example, we have an API call for image classification, or image inference in general; the only thing the application needs to submit to vAccel is: I want to do image classification, this is the image,
10:23
this is the model, these are the parameters, and so on; this gets mapped to the relevant plugin implementation. For instance, in this figure we can use the jetson-inference image-classification implementation, which translates these arguments and this operation to the actual jetson-inference
10:44
framework provided by NVIDIA, which does the image-classification operation. Apart from the hardware-specific plugins, we also have transport-layer plugins.
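The call-to-plugin mapping described above can be sketched in a few lines of C. This is an illustrative model only, not the real vAccelRT code: the operation table, `register_plugin` and the `dummy_classify` stub are all made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Signature of an image-classification implementation a plugin provides. */
typedef int (*img_classify_fn)(const void *img, size_t len,
                               char *out, size_t out_len);

/* The runtime keeps a table mapping operation IDs to loaded plugins. */
enum { OP_IMG_CLASSIFY = 0, OP_MAX };
static img_classify_fn op_table[OP_MAX];

void register_plugin(int op, img_classify_fn fn) { op_table[op] = fn; }

/* The application-facing call: no CUDA/OpenCL details leak through. */
int image_classification(const void *img, size_t len, char *out, size_t out_len)
{
    img_classify_fn fn = op_table[OP_IMG_CLASSIFY];
    if (!fn)
        return -1;              /* no plugin loaded for this operation */
    return fn(img, len, out, out_len);
}

/* Stand-in for e.g. a jetson-inference backed plugin implementation. */
static int dummy_classify(const void *img, size_t len, char *out, size_t out_len)
{
    (void)img; (void)len;
    strncpy(out, "hedgehog", out_len);
    return 0;
}

int demo(void)
{
    char label[32];
    register_plugin(OP_IMG_CLASSIFY, dummy_classify);
    return image_classification("img-bytes", 9, label, sizeof(label));
}
```

The point of the indirection is that the application only ever links against `image_classification`; which accelerator actually serves the call is decided by whichever plugin was registered at runtime.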
11:02
Imagine the same operation, the image inference, executed in a VM using a virtio plugin: this information, the operation, the arguments, the models, everything, will be transferred to the host machine,
11:21
which will use a hardware plugin. So apart from the glue code for the hardware-specific implementations, we also have the VM plugins. Some of the plugins and API operations also support a
11:45
subset of acceleration frameworks, such as TensorFlow or PyTorch. Regarding what I mentioned earlier about the virtio plugins:
12:01
essentially, the request for the operation and the arguments are forwarded to another instance of the vAccel library, either at the hypervisor layer or over a socket interface.
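The forwarding path can be modeled the same way. Here a `memcpy` stands in for the virtio-queue or socket crossing, and the `struct request` layout and function names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Guest-side transport plugin model: pack the operation ID and arguments
 * into a buffer, hand it across the channel, and let the host side unpack
 * and dispatch it to the local (hardware) plugin. */

struct request {
    uint32_t op;          /* which semantic operation, e.g. image classify */
    uint32_t payload_len;
    uint8_t  payload[64]; /* serialized arguments (model, image, ...) */
};

/* Host side: would dispatch to the real plugin; here it just echoes
 * the payload length as a stand-in result. */
static int host_dispatch(const struct request *req)
{
    return (int)req->payload_len;
}

/* Guest side: the "virtio" plugin forwards instead of executing locally. */
int guest_call(uint32_t op, const void *args, uint32_t len)
{
    struct request req = { .op = op, .payload_len = len };
    memcpy(req.payload, args, len);

    struct request on_host;             /* the channel crossing */
    memcpy(&on_host, &req, sizeof(req));
    return host_dispatch(&on_host);
}
```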
12:21
We currently support two modes of operation. We have a virtio driver, currently supporting Firecracker and QEMU: we load the driver in the VM, and it transfers the arguments and the operation to the QEMU backend or the Firecracker backend,
12:46
which in turn calls the vAccel library to do the actual operation. The other option is using sockets: we load a socket agent on the host, we have the vsock plugin on the guest, and they communicate over
13:05
simple sockets. I'm going to hand over to Babis for the unikernel stuff.
13:26
How can vAccel be used in unikernels? Actually, it's quite easy compared to any other acceleration framework that exists: the only thing we need to do is port vAccelRT, which you see over there.
13:45
That's the only thing we need to port, because it is a very thin layer of C code that can easily be ported to any unikernel that exists. Of course, we also need some kind of transport plugin to
14:01
forward the requests. As Tasos already explained, the application is the same application we can run on the host, in any container or in any VM; it can also be used in the unikernel with no changes, and it simply uses the vAccel API.
14:20
We simply forward the request to the host, where another instance of vAccel maps it to the hardware-acceleration framework implementing the specific function. As I said, this allows us to have the same application running either on the host or in the VM without any changes, so it's easy to debug, easy to
14:46
execute, and we can also access different kinds of hardware and different frameworks without changing our application; we simply change the configuration on the host.
15:02
Yes, it is yet another acceleration framework, and you might think it is not going to be easy to use. But let's take an example and see how we can extend vAccel, and whether it is easy or not. Take a typical vector-addition example in OpenCL, which can be executed on the CPU or on
15:22
the FPGA. The steps that usually happen are: we set up the bitstream on the FPGA, which starts its configuration; we transfer the data to the FPGA; we invoke the kernel as soon as it's ready; and we get the results back to the host.
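Modeled in plain C, the four steps look like this. The "device" is faked with static buffers, and the comments name the OpenCL calls that would appear in the real host code; nothing here is FPGA-specific.

```c
#include <stddef.h>
#include <string.h>

/* The four offload steps of the vector-add example. In a real OpenCL/FPGA
 * flow these would be clCreateProgramWithBinary, clEnqueueWriteBuffer,
 * clEnqueueNDRangeKernel and clEnqueueReadBuffer. */

#define N 4
static float dev_a[N], dev_b[N], dev_c[N];
static int bitstream_loaded;

static void setup_bitstream(void) { bitstream_loaded = 1; }   /* step 1 */

static void transfer_in(const float *a, const float *b)        /* step 2 */
{
    memcpy(dev_a, a, sizeof(dev_a));
    memcpy(dev_b, b, sizeof(dev_b));
}

static void invoke_kernel(void)                                /* step 3 */
{
    for (size_t i = 0; i < N; i++)
        dev_c[i] = dev_a[i] + dev_b[i];
}

static void transfer_out(float *c)                             /* step 4 */
{
    memcpy(c, dev_c, sizeof(dev_c));
}

float vector_add_offload(void)
{
    const float a[N] = {1, 2, 3, 4}, b[N] = {10, 20, 30, 40};
    float c[N];
    setup_bitstream();
    transfer_in(a, b);
    invoke_kernel();
    transfer_out(c);
    return c[3];
}
```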
15:42
This is what the application is already doing, so if you have it running on your machine, the only thing you have to do is "libify" the application: instead of a standalone program, expose an API that does the same thing. The next thing is to integrate
16:02
the library into vAccel as a plugin; we have a very simple API you can use for that, and the application will then be seen as a vAccel plugin. Later, you can also update vAccel by adding one more API call to vAccelRT, so applications can use it directly, with the correct parameters of course.
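The provider side, integrating the libified code as a plugin, might look roughly like this. The `plugin_op`/`rt_register` shape is speculative, and the real vAccel plugin API differs, but the glue code is of about this size.

```c
#include <stddef.h>

/* Speculative sketch: once the vector-add code is "libified", the plugin is
 * little more than a struct tying an operation ID to the library entry
 * point, handed to the runtime at load time. */

enum { OP_VECTOR_ADD = 1, OP_COUNT = 8 };

struct plugin_op {
    int   op;
    int (*impl)(const float *a, const float *b, float *c, size_t n);
};

static struct plugin_op registry[OP_COUNT];

int rt_register(const struct plugin_op *p)
{
    if (p->op < 0 || p->op >= OP_COUNT)
        return -1;
    registry[p->op] = *p;
    return 0;
}

/* Entry point of the libified application code. */
static int my_vector_add(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    return 0;
}

int plugin_demo(void)
{
    struct plugin_op vadd = { OP_VECTOR_ADD, my_vector_add };
    if (rt_register(&vadd))
        return -1;

    /* What the runtime would do when the app calls the new API: */
    const float a[2] = {1, 2}, b[2] = {3, 4};
    float c[2];
    registry[OP_VECTOR_ADD].impl(a, b, c, 2);
    return (int)c[1];
}
```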
16:25
I will give you a short demo of how this works, using Unikraft specifically. We can have an image classification first, and then we can see how a
16:47
cuBLAS operation can be executed on the CPU and on the GPU without any changes, and maybe some FPGA if we have time.
17:00
So we are in a typical working environment for Unikraft: we have created our application, we have newlib, which we're actually not going to use, and we also have Unikraft. Let's go
17:26
here; this is a repo we have created, and I will show it to you later.
17:40
Here you can see that we only expose 9pfs, which we use because we want to transfer the data into the unikernel; we're not going to use any network, we just share a directory with the VM. The only thing we need to do is select vAccelRT, and that's all. As you see, we don't have any libc;
18:00
we don't need it for this specific example. These are all the applications currently running on Unikraft; you can try them out by yourself. We're going to use image classification.
18:22
It will take some time to build, so meanwhile I will try to show you what the application looks like. It should finish almost right now.
18:42
Okay. Now the application. We can skip the part that reads the file. This application is quite simple: we have a session that we have to create with vAccel on the host, then we simply call the function, the vAccel image-classification call;
19:06
it takes the arguments that are needed, and then we simply release the resources we have used.
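The session flow he walks through, create a session, run one classification, release the resources, looks roughly like this in C. The function names imitate vAccel's API, but the bodies below are local stubs, so treat the exact signatures as assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Stubbed model of the demo application's flow. In the real program these
 * functions come from the vAccel library; here they are local stand-ins. */

struct vaccel_session { int id; };

static int vaccel_sess_init(struct vaccel_session *s, int flags)
{
    (void)flags;
    s->id = 1;
    return 0;
}

static int vaccel_image_classification(struct vaccel_session *s,
                                       const void *img, size_t img_len,
                                       char *out_text, size_t out_len)
{
    (void)img; (void)img_len;
    if (s->id != 1)
        return -1;
    strncpy(out_text, "hedgehog", out_len);  /* fake classifier result */
    return 0;
}

static int vaccel_sess_free(struct vaccel_session *s)
{
    s->id = 0;
    return 0;
}

int classify_once(void)
{
    struct vaccel_session sess;
    char label[32];

    if (vaccel_sess_init(&sess, 0))
        return -1;
    int ret = vaccel_image_classification(&sess, "raw-jpeg-bytes", 14,
                                          label, sizeof(label));
    vaccel_sess_free(&sess);
    return ret;
}
```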
19:21
Now I will do an image classification of this beautiful hedgehog we have here; let's see what happens. All these logs you see are from the jetson-inference plugin, and we see that we have a hedgehog:
19:43
it was identified. The thing to note here is that all of these logs are not from the unikernel; they are from the host that is running it. I can also show you
20:00
this small demo with some operations on arrays using CUDA. Same thing here: we're going to export the backend. First we're going to use a no-op
20:22
plugin, which simply doesn't do anything; it's mostly used for debugging. We have here the application, which is an sgemm, and you can see that it does nothing, because the no-op plugin doesn't
20:42
do anything special. Then we can change the configuration on the host and specify that the backend we want to use is the actual CUDA implementation, for the CPU. Then we run it again.
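Switching backends without touching the application comes down to host-side configuration, for example an environment variable read by the runtime. The variable name `ACCEL_BACKEND` below is made up for the sketch; the real vAccel configuration knob may be named differently.

```c
#include <stdlib.h>
#include <string.h>

/* Pick the plugin from an environment variable, falling back to a no-op
 * backend when nothing is configured (useful for debugging, as in the demo). */

enum backend { BACKEND_NOOP, BACKEND_CPU, BACKEND_GPU };

enum backend pick_backend(void)
{
    const char *b = getenv("ACCEL_BACKEND");   /* hypothetical variable name */
    if (b && strcmp(b, "gpu") == 0)
        return BACKEND_GPU;
    if (b && strcmp(b, "cpu") == 0)
        return BACKEND_CPU;
    return BACKEND_NOOP;   /* safe default: do nothing */
}
```

The unikernel image stays identical across runs; only this host-side setting changes which plugin serves the forwarded calls.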
21:05
This one is actually a min-max operation, not the sgemm. Then we will also run the same thing on a GPU: again, on the host, we simply change the configuration, start the unikernel again, and we get
21:28
the result from the GPU. All these debug messages can be removed, of course.
21:40
This is also still the min-max; now we will go to the sgemm, if we still have time. Yes, okay. So we can just run this; again, nothing happens,
22:00
nothing special, with the no-op plugin. We will do the export to specify the CPU plugin again, execute, and see the execution time. It's not very big, but just remember that number.
22:21
Now we will run it on the GPU, and you can see that the execution time is much better than before. Next,
22:44
the FPGA. This is an FPGA, so we need a bitstream, and this is a Black-Scholes application, by the way. We will run it natively in the beginning, and then we will also run it in Unikraft.
23:03
First we just run the application natively; you can see all the logs, and everything is executed on the FPGA. Then we will see how this is executed in a unikernel.
23:30
I will explain later what all of these things are. Usually what we have to do is just export the vAccel backend we want to use;
23:41
that's how we configure the host to use a specific plugin. Then we have the QEMU command, which I can explain in more detail after this video. This is from the unikernel now: we access the FPGA and we have the Black-Scholes operation running there.
24:04
We have one more FPGA application, but I think you got the point. We have all the links for the videos and everything in our FOSDEM talk page, so you can also see them from there.
24:21
Let me talk a bit about QEMU and the QEMU backend that we have. This is just from our repo: here we need a QEMU that has the virtio backend for vAccel.
24:41
If Unikraft, for example, had support for vsock, we wouldn't have to use the virtio backend and wouldn't have to modify QEMU; but since we have no vsock support, we have to use virtio, and therefore we changed QEMU a bit, adding the backend,
25:01
as you can see here. The rest you already know from the previous talk: all the configuration for Unikraft, the command-line options. I will also show you our docs; we have
25:22
extended documentation where you can find how to run a vAccel application in a VM and how to run it remotely.
25:41
Here we also have all the things you need to do to try it out by yourself in Unikraft. All of it is open source; you can check it out and clone it yourself.
26:03
Let me return to the slides. We actually just released version 0.5, and we currently have language bindings for C, C++, Python and Rust, and also for TensorFlow.
26:26
We have the plugin API I talked about before for extending vAccel. These are all the things we have tested and support right now. From the hypervisor
26:44
perspective, we have support for QEMU over virtio and vsock, and for the new Rust VMMs like Firecracker, Cloud Hypervisor and Dragonball. Regarding unikernels, it is
27:02
currently working in Unikraft and in Rumprun, but we also want to support it in OSv and maybe some more unikernel frameworks. We also have integration with Kubernetes, Kata Containers and OpenFaaS for serverless deployments.
27:21
These are all the acceleration frameworks we have tested that work with vAccel: jetson-inference, which you saw when we did the image classification; TensorFlow and PyTorch; TensorRT and OpenVINO; OpenCL and CUDA, which you saw in the other demo. Regarding hardware,
27:42
we have tested with GPUs, edge devices like Coral, and also FPGAs. To sum up: hardware-acceleration stacks are hard; the software stacks of hardware accelerators are huge and
28:01
too complicated to be ported easily to unikernels. We have vAccel, which is able to abstract the heterogeneity both in the hardware and in the software, and it sounds like a perfect fit for unikernels. If you want, you can try it out yourselves; here are all the links you can use to
28:23
test it out. We would like to mention that this work is partially funded by two Horizon projects, SERRANO and 5G-COMPLETE. We would also like to invite you to the Unikraft hackathon that will take place in Athens at the end of March.
28:46
Thank you for your attention; if you have any questions, we will be happy to answer them. Thank you so much, Babis. So, for the third time, you are welcome in Athens in late March for the hackathon.
29:01
Are there any questions from the audience? Yes, please. Thank you, great stuff. I have a question about the potential future, and the performance we are currently losing through the use of the API and the transport. What do you think is the
29:24
potential for further performance increases in the framework? Yes, the transport is indeed a bottleneck, since you have all these transfers taking place, but
29:41
we think that in the end we will still have very good execution times and very good performance. It's also important to mention that you can set up the environment to minimize the transfers. For example, you can have your model,
30:03
say a TensorFlow model, prefetched before you deploy the function on the host, so everything is already there and you don't have to transfer it from the VM to the host and vice versa; we are working on how this can be done. If I may intervene, these are two separate issues.
30:24
The first issue is all the resources, the models, the out-of-band stuff, which you can handle through a separate API in a cloud environment or a serverless deployment. The second is about the actual transfers over
30:42
virtio or vsock. The thing is that, since we semantically abstract the whole operation, you don't have to do cudaMemcpy, cudaMalloc, cuda-something, set kernel, whatever, and you don't have that latency in the transfer. It minimizes the overhead to just the part of copying the data across,
31:04
the actual data, the input and the output, so it is really minimal. We have also tested remotely, but the network there is not that good, so we need to do more tests; in the VMs we have tested, the overhead is less than 5%
31:24
for an image classification of 32 KB up to a megabyte, something like that. So the overhead of the transport layer is really small, for both virtio and vsock; the vsock part is a bit higher because it serializes things through protobufs, and
31:43
Vsock is a bit more complicated, but the virtio transport is really efficient. [Audience] Hi, thank you for the talk. My question is almost on the same topic, but from the security perspective: if we
32:01
offload a lot of computation out of the unikernel to the host, I guess security, and at least isolation, is something to think about. Any words on this topic? Yeah, you can take it. We agree, yes.
32:23
There are issues with security, because you essentially run unikernels to be isolated, and now we push the execution to the host. So one of the things we have thought about is that,
32:41
when you run this in a cloud environment, the vendor should make sure that whatever application is supported to run on the host is secure and audited, so the user doesn't have all possibilities available. They cannot just exec anything on the host;
33:02
they will only be able to exec specific things that are audited, in libraries, in the plug-in system. So that is one approach. Another response to the security implications is that, at the moment, you have no other way to run
33:24
a hardware-accelerated workload from a unikernel. So if you want to be able to deploy such an application somewhere, you can run it isolated, and you can use the whole
33:42
hardware accelerator and have the same binary that you would deploy in a non-secure environment. So you could secure the environment, but keep this compatibility and the same software supply chain,
34:01
using a unikernel, using this semantic abstraction. Any other questions? Yeah, please. [Audience] My question is similar to the first one, but I'm wondering: you can also do GPU passthrough via
34:25
QEMU and KVM and just pass the GPU to a virtual machine, so I'm wondering what the performance difference is between doing that and doing it in vAccel. Yes, actually, we want to evaluate that and see; for example, even
34:45
passthrough, directly exposing the whole GPU to the VM, could be one baseline for the evaluation. Currently, I don't remember if we have any... do we have any
35:00
measurements already? Not that I am aware of. Okay, so, actually, about GPU virtualization, for example:
35:22
I'm not sure how many VMs can be supported on a single GPU. I'm not aware of any solution that can scale to tens of VMs, for example; I'm not sure there is any existing solution for that.
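For reference, the passthrough path the question refers to is usually VFIO device assignment under KVM. A minimal host-side sketch, where the PCI address `0000:01:00.0` and `guest.img` are placeholders (requires root and an enabled IOMMU):

```shell
# Detach the GPU from its host driver and bind it to vfio-pci
# (0000:01:00.0 is a placeholder; use your device's PCI address).
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo vfio-pci    > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

# Boot a KVM guest that owns the whole GPU.
qemu-system-x86_64 -enable-kvm -m 4G \
    -device vfio-pci,host=01:00.0 \
    -drive file=guest.img,format=qcow2
```

This configuration fragment also illustrates the scaling point: with passthrough the device is owned by exactly one VM, whereas an API-remoting transport can multiplex one accelerator across many guests.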
35:47
But yes, we plan to do some extended evaluation, comparing also against, let's say, the virtual GPU solutions that exist, or even passthrough and native execution. We want to do that, and
36:04
hopefully we can also publish the results on our blog. Okay, thank you. Any other questions? [Audience] Yeah, in response to the first security question:
36:24
you are now offloading compute to the hypervisor and the host, so does that imply there is a possibility to break out of the containerization with vAccel?
36:42
Well, yes: this is going to be executed on the host, at the host's privilege level. Yes, but then what is the other option?
37:00
So, yeah, we are actually working on this; we want to see what resources are available there, how we can make it more secure, how we can sandbox it somehow. But on the other hand, in FPGAs for example, there is no MMU, there is nothing: if you run two kernels, and you know what to do,
37:26
one kernel can access all the memory in the whole FPGA, for example. So on the one hand you also need support from the hardware; regarding the software side, we are looking into how we can
37:41
extend it and at least increase the difficulty of an attack. So, for example, in the Kata Containers integration that we have: when you spawn a container, Kata sandboxes
38:02
the container in a VM. Our agent, the host part of vAccel, runs in that same sandbox, not inside the VM but outside it, in the sandbox. So yes, there is code executing on the host, but it is inside the sandbox.
38:27
Anything else? Right, if not: thank you, Anastassios; thank you, Babis.