
Hardware acceleration for Unikernels


Formal Metadata

Title
Hardware acceleration for Unikernels
Subtitle
A status update of vAccel
Title of Series
FOSDEM 2023
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Abstract
Unikernels promise fast boot times, a small memory footprint and stronger security, but they lack manageability. Moreover, unikernels provide a non-generic environment for applications, with limited or no support for widely used libraries and OS features. This issue is even more apparent in the case of hardware acceleration: acceleration libraries are often dynamically linked and have numerous dependencies, which directly contradicts the statically linked nature of unikernels. Hardware acceleration functionality is almost non-existent in unikernel frameworks, mainly due to the absence of suitable virtualization solutions for such devices. In this talk, we present an update on the vAccel framework we have built, which can expose hardware acceleration semantics to workloads running in isolated sandboxes. We go through the components that comprise the framework and elaborate on the challenges of building such a software stack: we first present an overview of vAccel and how it works; we then focus on the effort of porting vAccel to various unikernel frameworks. Finally, we present a hardware acceleration abstraction that exposes semantic acceleration functionality to workloads running as unikernels. We will close with a short demo of some popular algorithms running on top of Unikraft and vAccel, showcasing the merits and trade-offs of this approach.
Transcript: English (auto-generated)
Hi everyone. It's my pleasure to introduce Babis and Anastasios; they're going to give a talk on using vAccel for hardware acceleration in unikernels. Babis, please.

So hello everyone, I'm Babis. My actual name is Charalampos Mainas, but you can just call me Babis. We're going to give a talk about hardware acceleration and our effort to support it in unikernels, and we do that with vAccel.
Okay. So we already heard from Simon, so we don't have to repeat what unikernels are. There are a lot of projects, and we know it's a promising technology: we can have very fast boot times, a low memory footprint and increased security. We also know some of the use cases for unikernels, which are usually traditional applications you may have heard of, like web servers, but they have also been used for NFV, and we think they are also a good fit for serverless and, in general, microservices deployments, either in the cloud or at the edge. We also think they can be a good fit, especially in this case, for ML and AI applications. That may sound a bit weird because, as we know, ML and AI workloads are quite huge and heavy.
You may have heard of PyTorch, you may have heard of TensorFlow. We're not going to touch them, don't worry. What we want to say here is that they are very heavy frameworks, very difficult to add support for. Secondly, we know that these applications are usually compute intensive and can take a lot of resources. For exactly that reason, we see a shift in the hardware that exists in data centers, and not only in the data center but also at the edge: we see devices equipped with a lot of new processing units. Of course we have the traditional FPGAs and GPUs, but we also have specialized processing units like TPUs, and also some ASICs. First of all, as we know, ML and AI workloads cannot be executed in unikernels; that's for sure, because there is no support for these frameworks. And secondly, there is no support for hardware acceleration.
So there is not really any benefit if we can only run them on a CPU. I'm going to go through the acceleration stack and how we can virtualize it with the current approaches. In general, what we have is pretty simple. Usually you have an application written against an acceleration framework; that can be OpenCL, CUDA, TensorFlow, PyTorch, any of these frameworks. Underneath that you have the vendor runtime for the GPU, or maybe a runtime for FPGAs, and then you also have, of course, a device driver, which resides inside the kernel. This is what we have to virtualize, and since unikernels run as virtual machines, the same techniques we have for virtual machines can also be used for unikernels.
Some of these techniques are hardware partitioning, paravirtualization and remote API. In the case of hardware partitioning, the hardware accelerator has the ability to partition itself, and we assign a small part of the accelerator to the VM, which can then access the hardware accelerator directly. This has very good performance; on the other hand, we need to have the entire acceleration stack inside the VM, from the device driver to the acceleration framework to the application. I forgot to mention that this is something that has to be supported by the device, and the device driver also needs to be in the VM. In the case of paravirtualization things get a bit better, because we can have a generic, let's say, device, and the hypervisor simply manages the accelerator; requests to the accelerator are handled by the hypervisor, so we don't need all these different drivers for every accelerator inside the VM. On the other hand, we still need the vendor runtime, the application and the acceleration framework. In the case of a remote API we have an even lighter approach. Everything is managed by a server; this server might even be local, on the same host, or it can be a remote server. What happens here is that the acceleration framework intercepts the calls from the application and forwards them to the acceleration framework that resides on the server. This has some performance overhead, of course, because of the transport involved.
It is also framework specific, so it has to be explicitly supported; there is a remote CUDA, for example, that supports this. Great, but what is best for unikernels? In the case of hardware partitioning, we would have to port the entire software acceleration stack and every device driver to the unikernel, which is not an easy task. With paravirtualization things are a bit better: we have to port maybe only one driver, but we still need to port the whole acceleration stack. The remote API approach sounds much more feasible, because we could port only, let's say, remote CUDA, a single framework. But how easy is that? It's not easy, because as I said before, these frameworks are huge. They have a very big code base, they rely on dynamic linking, which is at odds with unikernels, and they have a lot of dependencies. So it's not going to be easy to port them to any existing unikernel framework right now. For that reason, we think vAccel is suitable for unikernels, and I will hand over to Tasos to present a bit of how vAccel works. Okay, thank you.
Hi from my side too. I'm going to talk a bit about the framework we're building. We started working on vAccel to handle hardware acceleration virtualization in VMs in general, so it's not tailored to unikernels; we have been working on semantically exposing hardware acceleration functionality from hardware acceleration frameworks to VMs. The software stack is shown in the figure. We use a hardware-agnostic API: we expose the whole function call of the hardware-accelerated operation, and we focus on portability and interoperability, meaning that the same binary code originating from the application can be executed on many types of architectures and is decoupled from the hardware-specific implementation.
Taking a closer look at the software stack: we have an application, and this application consumes the vAccel API, which supports specific operations. These operations are mapped, through a mapping layer, vAccelRT, to the relevant plugins, shown in greenish in the figure. The plugins are the glue code between the API calls and the hardware-specific implementations, which in this figure reside in the external-libraries layer; below that is the hardware, where whatever lives in the external libraries actually executes.
Digging a bit more into how vAccel works: the core component of vAccel exposes the API to the application and maps the API calls to the relevant hardware plugins, which, by the way, are loaded at runtime. These plugins are glue code between the API calls and the hardware-specific implementation. For example, we have an API call for doing image classification, image inference in general. The only thing the application needs to submit to vAccel is: I want to do image classification, this is the image, this is the model, and so on with the parameters. This gets mapped to the relevant plugin implementation; in this figure, for instance, we can use the jetson-inference image classification implementation, which translates these arguments and this operation into calls to the actual jetson-inference framework provided by NVIDIA, which performs the image classification.
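To make this concrete, here is a minimal sketch of what such a call looks like from the application side, written in C and modeled on the image-classification example shipped with vaccelrt (the function and struct names below follow that example code; treat the exact signatures as indicative rather than authoritative):

    #include <stdio.h>
    #include <vaccel.h>  /* core vAccel API header */

    int classify_image(const void *img, size_t img_len)
    {
        struct vaccel_session sess;
        char out_text[512], out_imgname[512];

        /* Create a session with the vAccel runtime */
        if (vaccel_sess_init(&sess, 0) != VACCEL_OK)
            return 1;

        /* Semantic call: "classify this image". The runtime picks the
         * plugin (jetson-inference, TensorFlow, ...) that actually runs it. */
        int ret = vaccel_image_classification(&sess, img,
                (unsigned char *)out_text, (unsigned char *)out_imgname,
                img_len, sizeof(out_text), sizeof(out_imgname));
        if (ret == VACCEL_OK)
            printf("classification tag: %s\n", out_text);

        /* Tear down the session and any resources tied to it */
        vaccel_sess_free(&sess);
        return ret;
    }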
Apart from the hardware-specific plugins, we also have transport-layer plugins. Imagine the same operation, the image inference, being executed in a VM using a virtio plugin: the information about the operation, the arguments, the model, everything, is transferred to the host machine, which uses the hardware plugin. So apart from the glue code for the hardware-specific implementations, we also have the VM plugins. Some of the plugins and API operations also support a subset of acceleration frameworks, such as TensorFlow or PyTorch. Regarding the virtio plugins I mentioned earlier: essentially, the request for the operation and its arguments is forwarded to another instance of the vAccel library, either at the hypervisor layer or behind a socket interface. We currently support two modes of operation. The first is a virtio driver; we currently support Firecracker and QEMU. We load the driver in the VM, and this driver transfers the arguments and the operation to the backend, the QEMU backend or the Firecracker backend, which in turn calls the vAccel library to do the actual operation. The other option is using sockets: we run a socket agent on the host, we have the socket plugin in the guest, and they communicate over simple sockets. I'm going to hand over to Babis for the unikernel part.
So how can vAccel be used in unikernels? It's actually quite easy, compared to any other acceleration framework that exists. The only thing we need to port is vAccelRT, which you see over there, and that is a very thin layer of C code that can easily be ported to any unikernel. Of course, we also need some kind of transport plugin to forward the requests. As Tasos already explained, the application stays the same: the application we can run on the host, or in any container or VM, can also be used in the unikernel with no changes; it simply uses the vAccel API. We forward the request to the host, where another instance of vAccel maps it to the hardware acceleration framework that implements the specific function. As I said, this allows us to have the same application running either on the host or in the VM without any changes, so it's easy to debug and easy to execute, and we can also target different kinds of hardware and different frameworks without changing our application: we simply change the configuration on the host.
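(For context: in the vaccelrt releases around the time of this talk, that host-side configuration boils down to an environment variable pointing at the plugin shared object to load, something like VACCEL_BACKENDS=/usr/local/lib/libvaccel-noop.so for the no-op plugin versus the jetson-inference plugin's .so for GPU classification; the variable and library names here are taken from the vAccel docs and may differ between versions.)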
So yes, this is yet another acceleration framework, and you may think it's not going to be easy to use. But let's take an example and see how we can extend vAccel, and whether it is easy or not. Take a typical vector-addition example in OpenCL, which can be executed on the CPU or on the FPGA. The steps that usually happen are: we set up the bitstream on the FPGA, and the FPGA gets configured with it; we transfer the data to the FPGA; we invoke the kernel as soon as it's ready; and we get the results back to the host.
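In host-side OpenCL code, those four steps map roughly onto the following calls. This is a heavily condensed sketch with setup and error handling omitted; the kernel name "vadd" and the buffer variables are illustrative:

    /* 1. Program the FPGA: build the program from the precompiled bitstream */
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &bin_len,
                                                &bitstream, NULL, &err);
    cl_kernel vadd = clCreateKernel(prog, "vadd", &err);

    /* 2. Transfer the input data to the device */
    clEnqueueWriteBuffer(queue, buf_a, CL_TRUE, 0, size, a, 0, NULL, NULL);
    clEnqueueWriteBuffer(queue, buf_b, CL_TRUE, 0, size, b, 0, NULL, NULL);

    /* 3. Invoke the kernel once everything is ready */
    clSetKernelArg(vadd, 0, sizeof(cl_mem), &buf_a);
    clSetKernelArg(vadd, 1, sizeof(cl_mem), &buf_b);
    clSetKernelArg(vadd, 2, sizeof(cl_mem), &buf_c);
    clEnqueueNDRangeKernel(queue, vadd, 1, NULL, &n, NULL, 0, NULL, NULL);

    /* 4. Read the results back to the host */
    clEnqueueReadBuffer(queue, buf_c, CL_TRUE, 0, size, c, 0, NULL, NULL);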
So this is what the application is already doing. If you already have this application running on your machine, the only thing you have to do is, first, "libify" the application, so that instead of being a standalone program it exposes an API for these steps. The next step is to integrate the library into vAccel as a plugin; we have a very simple plugin API you can use, and the application will then be seen by vAccel as a plugin. Later, you can also update vAccel itself, adding one more operation to the vAccelRT API, so the application can use it directly, with the correct parameters of course.
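As a sketch of what that plugin glue can look like, the following is modeled on the no-op plugin in the vaccelrt tree. The registration macros and helper (VACCEL_MODULE, VACCEL_OP_INIT, register_plugin_functions) are the names used in that codebase at the time of writing, the operation signature is simplified, and my_vadd() stands in for the hypothetical entry point of your freshly libified library:

    #include <vaccel.h>
    #include <plugin.h>

    /* Glue: translate the generic vAccel operation into a call into
     * the accelerated library (my_vadd() is hypothetical). */
    static int my_exec(struct vaccel_session *sess, void *args)
    {
        return my_vadd(args);
    }

    static struct vaccel_op ops[] = {
        VACCEL_OP_INIT(ops[0], VACCEL_EXEC, my_exec),
    };

    static int init(void)
    {
        /* Tell the vAccel core which operations this plugin implements */
        return register_plugin_functions(ops, sizeof(ops) / sizeof(ops[0]));
    }

    static int fini(void)
    {
        return VACCEL_OK;
    }

    VACCEL_MODULE(
        .name = "my-vadd-plugin",
        .version = "0.1",
        .init = init,
        .fini = fini
    )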
So I will give you a short demo of how this works, using Unikraft specifically. We can have an image classification at first, and then we can see how a CUDA BLAS operation can be executed on the CPU and on the GPU without any changes, and maybe some FPGA if we have time. Okay, this is not good...
This is better. So we are in a typical working environment for Unikraft: we have created our application, we have a newlib that we're not actually going to use, and we also have Unikraft. Let's go here; this is a repo we have created, I will show it to you later. Here you can see that we only enable 9pfs. We use it because we want to transfer data into the unikernel; we're not going to use any networking, we just share a directory with the VM. The only other thing we need to do is select vAccelRT, and that's all. As you can see, we don't have any libc, because we don't need one for this specific example. These are all the applications that are currently running on Unikraft; you can try them out by yourself. Let's use image classification.
It will take some time to build, so let me show you in the meantime what the application looks like; the build should finish almost right now. As you can see, skipping the file reading, this application is quite simple: we create a session with vAccel on the host, then we simply call the vAccel image classification function with the arguments it needs, and then we release the resources we used. So I will try to do an image classification of this beautiful hedgehog we have here, and let's see what happens. Okay, all the logs that you see here are from the jetson-inference plugin, and we see that we have a hedgehog. It was identified, and the thing to note is that all of these logs are not from the unikernel: they are from the host that is running the actual operation.
I can also show you this small demo with CUDA, with some operations on arrays. It's the same setup here: we're going to export the backend. First we'll use a no-op plugin, which simply does nothing; it's mostly useful for debugging. We have the application here, and you can see that nothing happens, because the no-op plugin doesn't do anything special. Then we can change the configuration on the host and specify that the backend we want to use is the actual CUDA implementation, first for the CPU. We run it, and we get the result; it's a min-max operation, actually. Then we run the same thing on a GPU: again, we simply change the configuration on the host, start the unikernel again, and we get the result from the GPU. All these debug messages can of course be removed.
We also have, yes, this is min-max again; now let's go to sgemm, if we still have time. Yeah, okay. Again, nothing happens with the no-op plugin, nothing really special. We do the export to specify the CPU plugin again and execute, and you'll see that the execution time is not very big, but just remember that number. Now we run it on the GPU, and you can see that the execution time is much better than before. And that's all for this one.
Then there's the FPGA. So this is an FPGA, right, so we need to have a bitstream; and this is a Black-Scholes application, by the way. We will run it natively in the beginning, and then we will also run it on Unikraft. First we just run the application natively, and you can see all the logs; everything is executed on the FPGA. Then we will see how the same thing executes in a unikernel. I forgot to show this, but I will explain later what all of these things are: usually what we have to do is just export the vAccel backend we want to use. That's how we configure the host to use a specific plugin, and then we have the QEMU command, which I can explain in more detail after this video. This output is from the unikernel now: we access the FPGA and we have the Black-Scholes operation running there. We also have one more FPGA application, but I think you got the point. We have all the links for the videos and everything on our FOSDEM talk page, so you can also watch them from there.
Let me talk a bit about QEMU and the QEMU backend we have; this one is from our own repo. Here we need a QEMU that has the virtio backend for vAccel. If Unikraft, for example, had support for VSOCK, we wouldn't have had to use the virtio backend, so we wouldn't have had to modify QEMU; but since there is no VSOCK support, we have to use virtio, and therefore we changed QEMU a bit, adding the backend, as you can see here. The rest you already know from the previous talk: all the configuration for Unikraft, the command-line options. I will also show you our docs; we have extended documentation where you can find how to run a vAccel application in a VM and how to run it remotely. Here we also have all the things you need to do to try it out by yourself on Unikraft, and all of it is open source: you can check it out and clone it yourself.

Let me return. We actually released version 0.5, and we currently have language bindings for C, C++, Python and Rust, and also for TensorFlow. We have the plugin API that I talked about before for extending vAccel. These are all the things we have tested and support right now. From the hypervisor perspective, we have support for QEMU over virtio and VSOCK, and for the new rust-vmm based VMMs like Firecracker, Cloud Hypervisor and Dragonball. Regarding unikernels, it is currently working on Unikraft and Rumprun, but we want to also support OSv and maybe some more unikernel frameworks. We also have integration with Kubernetes, Kata Containers and OpenFaaS for serverless deployments. And these are all the acceleration frameworks we have tested and that work with vAccel: the jetson-inference that you saw when we did the image classification, TensorFlow and PyTorch, TensorRT and OpenVINO, and the OpenCL and CUDA that you saw in the other demos. Regarding hardware, we have tested GPUs, edge devices like Coral, and also FPGAs.
So, to sum up: hardware acceleration stacks are hard; the software stacks of hardware accelerators are huge and complicated to port easily to unikernels. We have vAccel, which is able to abstract the heterogeneity both in the hardware and in the software, and it sounds like a perfect fit for unikernels. If you want, you can try it out by yourselves; here are all the links you can use to test it. We would like to mention that this work is partially funded by two Horizon projects, SERRANO and 5G-COMPLETE, and we would also like to invite you to the Unikraft hackathon that will take place in Athens at the end of March. Thank you for your attention; if you have any questions, we will be happy to answer them.

Thank you so much, Babis. So, for the third time, we welcome you to Athens in late March for the hackathon.
Are there any questions from the audience? Yeah, please.

Thank you, great stuff. I have a question about the potential future and the performance that we are currently possibly losing through the use of the API and the transport. What do you think is the potential for further performance increases with this framework?

Yeah, the transport is indeed a bottleneck, since you have all these transfers taking place, but we think that in the end we still get very good execution times, very good performance. It's also important to mention that you can set up the environment to minimize the transfers. For example, if you have a TensorFlow model or anything like that, we are working on how it can be prefetched before you deploy the function on the host, having everything there, so you don't have to transfer it from the VM to the host and vice versa.

If I may intervene, these are two separate issues. The first issue is all the resources, the models, the out-of-band stuff, which you can handle through a separate API in a cloud environment, in a serverless deployment. The second is the actual transfers over virtio or VSOCK. The thing is that since we semantically abstract the whole operation, you don't have to do cudaMemcpy, cudaMalloc, set kernel and so on, so you don't have that latency in the transport. It minimizes the overhead to just the part of copying the data across: the actual input data and the output. This is really minimal. In the VMs we have tested, the overhead is less than 5% for an image classification of 32K to a meg, something like that. (We have also tested remotely, but the network there was not that good, so we need to do more tests.) So the transport-layer overheads are really small, for both virtio and VSOCK; the VSOCK part costs a bit more, because it serializes everything through protobufs and is a bit more complicated, but the virtio path is really efficient.
Hi, thank you for the talk. My question is almost about the same thing, but from the security perspective. If we offload a lot of computation out of the unikernel back to the host, I guess security, or at least isolation, is a thing to think about. Any words on this topic?

We agree, yes. There are issues with security, because you run unikernels to be isolated, and now we push the execution to the host. One of the things we have thought about is that when you run this in a cloud environment, the vendor should make sure that whatever application is allowed to run on the host is secure and audited. The user doesn't have all the possibilities available: they cannot just exec anything on the host; they are only able to exec specific, audited things, in libraries, through the plugin system. That's one approach.

Another response to the security implications is that at the moment you have no way at all to run a hardware-accelerated workload from a unikernel. So if you want to be able to deploy such an application somewhere, with this approach you can run isolated and still use the whole hardware accelerator, with the same binary that you would deploy in a non-isolated environment. You could secure the environment while keeping this compatibility with your software supply chain, using a unikernel and this semantic abstraction.

Any other question? Yeah, please.
My question is similar to the first one, but I'm wondering: you can also do GPU passthrough via QEMU and KVM and just pass the GPU to a virtual machine. What is the performance difference between doing that and doing it with vAccel?

Yes, actually we want to evaluate that; we need to evaluate it and see, comparing for example even with passthrough, directly exposing the whole GPU to the VM. This could be one baseline for the evaluation. I don't remember if we currently have any measurements at hand. Also, regarding GPU virtualization, for example, I'm not sure how many VMs can be supported on one single GPU; I'm not aware of any existing solution that can scale to tens of VMs. But yes, we plan to do some extended evaluation, compared also to, let's say, the virtual GPUs that exist, or to passthrough and native execution, and hopefully we can also publish the results on our blog. Okay, thank you.

Any other questions?
In response to the earlier security question: we are now offloading compute to the hypervisor and the host, so does that imply there is a possibility to break out of the containerization via vAccel?

Well, yes: this code is going to be executing on the host, at the host's privilege level. But the other option is having nothing at all. We are actually working on this; we want to see what resources are available there, how we can make it more secure, how we can sandbox it somehow to make it look better. On the other hand, in FPGAs for example there is no MMU, there is nothing: if you run two kernels and you know what to do, one kernel can access all the memory in the whole FPGA. So on the one hand you also need support from the hardware, and regarding the software side we are looking into how we can extend it and at least increase the difficulty of breaking out. For example, in the Kata Containers integration that we have, when you spawn a container, Kata sandboxes the container in a VM, and our agent, the host part of vAccel, runs in that same sandbox: not in the VM, but not outside the sandbox either. So yes, there is code executing on the host, but it is inside the sandbox.

Anything else? Right, if not: thank you, Anastasios; thank you, Babis.