Cross-platform/cross-hypervisor virtio vsock use in go

Formal Metadata

Title
Cross-platform/cross-hypervisor virtio vsock use in go
Subtitle
Usermode networking in CodeReady Containers
Title of Series
Number of Parts
287
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
CodeReady Containers runs an OpenShift cluster on a laptop or workstation using virtualization. It's written in Go, and uses KVM, Hyper-V or HyperKit depending on the OS it's running on. External network access is done through gVisor's userland TCP/IP stack, which the virtual machine uses over virtio-vsock. This talk will start with a short presentation of what CodeReady Containers is and explain why it needs a userland TCP/IP stack, but its main focus will be virtio-vsock: how to use it from Go, and the differences to expect on the different hypervisors.
Transcript: English (auto-generated)
Hello everyone, welcome to my presentation about using virtio-vsock in Go on multiple operating systems and on multiple platforms. First, a few words about myself. I'm Christophe Fergeau. I've been working at Red Hat for more than 10 years as a software developer.
I used to work in the virtualization team on SPICE, which is a protocol to remotely access your virtual machines. Nowadays, I'm part of the CodeReady Containers team, which I will give more details about in a few slides. So, what are we going to discuss today? First, CodeReady Containers.
Then, we will go quickly over the user-mode networking that CodeReady Containers is using. Then, we will focus on vsock usage in Go, on Linux, on macOS, and on Windows. So, what is CodeReady Containers? It's a way of running a Red Hat OpenShift 4 cluster on your laptop or on your desktop machine.
So, basically anything you have locally, you can run OpenShift 4 on. What is OpenShift? It's basically an enterprise-ready version of Kubernetes made by Red Hat. So, yeah, it's a Red Hat product, so as a Red Hat employee, it's kind of expected I'm working on that.
So, why do we do that? Basically, it allows you to quickly start a cluster, to break it as much as you want, and then to just drop it and recreate it, which is very useful when you are doing development or debugging and things are not really stable.
You don't really want to break your production cluster or something shared with others, and it's also useful for testing. It can also save you some money, because if your Kubernetes instance is running on AWS or some other cloud provider, it can be a bit expensive to do lots and lots of testing there.
So, it gives you the convenience of running all of that locally, and some flexibility. This is working on Linux, macOS, and Windows. Under the hood, it's a Go binary, which is named crc, which is why the project is quite often known as CRC and not CodeReady Containers.
Together with that binary, we have a big virtual machine image, multiple gigabytes. It's a Linux virtual machine, and in that image, we made an installation of OpenShift which is customized to run on a single node. So, instead of needing lots of machines to have lots of worker nodes
and lots of master nodes, you have just this virtual machine with one node, and it can fit in a virtual machine running on a laptop or on a desktop. So, it's running on multiple platforms, and on each platform, we're trying to use the native hypervisor.
So, this is QEMU/KVM on Linux, managed by libvirt. On macOS, we're using HyperKit, and on Windows, we use Hyper-V. CRC also uses a user-mode networking stack for the VM networking. We will see in a moment why. So, the user-mode networking, why?
It simplifies the virtual machine networking. Because we are running on three different hypervisors, the networking configuration is platform-specific each time. You might need administrative rights on the machine to be able to make these changes. So, it can quickly become a bit messy and harder to test,
because you have to take into account three platforms and the various versions of each platform. So, yeah, user-mode networking made things quite a bit simpler from that perspective. Then there is the consistent IP addressing: the IP address that the VM gets is the same on all platforms, which is also a bit simpler
than what we had before. And most importantly, for our customers and users, it allows us to work around some strict firewall rules and some strict VPNs, which would either block the connection from the host to the virtual machine running locally, or, with some VPN clients, redirect all the network traffic through the VPN,
which is not going to work when this network traffic is supposed to go between the host and your local VM. So, this really helps with all of this. This user-mode networking is handled by gvisor-tap-vsock. I gave a link to the project. It's being used by CRC and podman machine at the moment.
It's based on gVisor, and it's using gVisor for a big part of its code. Basically, gVisor is an application kernel written in Go, and so we reuse the networking stack of this kernel in order to implement a lot of the networking that we need.
So, how does it work? Let's say it's fairly simple, or a bit complicated, but well. In your virtual machine, this starts in the upper left corner: you try to reach fosdem.org with curl. We have a tap adapter in the VM in order to redirect the network traffic
to a helper process running inside the VM. This simple process then sends the network traffic to the host over vsock, which is the part we're going to focus on later. And then the host gets the network traffic. It sends the network packets through gVisor, and eventually they are sent to the fosdem.org web server.
And then the fosdem.org web server sends a reply, and it goes back the same way, but in the opposite direction, until it reaches curl, which gets the response, and then the dialogue can continue between the virtual machine and the remote server.
So, this is implemented on Linux, macOS, and Windows. The vsock bits are also working on macOS, Windows, and Linux. We always have a Linux virtual machine; it's only on the host that we have cross-platform code for the vsock interaction. Regarding the user-mode networking, I will be giving more details tomorrow in the Containers devroom.
And now, we are going to focus on the virtio-vsock communication. We get to the crux of this presentation, which is how we use virtio-vsock on multiple platforms. So, first, what is virtio-vsock? It's detailed in the QEMU wiki, but in short, it's a way of communicating between a guest and a host
using the regular POSIX sockets API. So, you use socket, you use read, you use write. And you don't need to do anything special. You can have multiple channels of communication between the guest and the host.
And some virtio magic allows easy communication between these two. Each machine taking part in the communication has its own address, which is called a CID (context ID). The host is always CID 2, and then the VMs each have a different address.
Guest CIDs start from 3, but otherwise the value can be arbitrary. And then, applications can connect to or listen on a port. You can have multiple ports in a given VM, and then each application can use a different port for the vsock communication.
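As a rough illustration of this addressing scheme, here is a small Go sketch using only the standard library. The constant values are the well-known CIDs from the Linux header `linux/vm_sockets.h`; the `vsockAddr` type and its `String` method are hypothetical helpers written for this example, not part of any library.

```go
package main

import "fmt"

// Well-known context IDs, as defined in Linux's <linux/vm_sockets.h>.
const (
	cidHypervisor uint32 = 0          // VMADDR_CID_HYPERVISOR
	cidLocal      uint32 = 1          // VMADDR_CID_LOCAL (loopback)
	cidHost       uint32 = 2          // VMADDR_CID_HOST
	cidAny        uint32 = 0xFFFFFFFF // VMADDR_CID_ANY (wildcard when binding)
)

// vsockAddr is a hypothetical helper: a vsock endpoint is identified by
// a (CID, port) pair, much like an (IP address, port) pair for TCP.
type vsockAddr struct {
	CID  uint32
	Port uint32
}

// String renders the address with a readable name for the reserved CIDs.
func (a vsockAddr) String() string {
	var who string
	switch a.CID {
	case cidHypervisor:
		who = "hypervisor"
	case cidLocal:
		who = "local"
	case cidHost:
		who = "host"
	case cidAny:
		who = "any"
	default:
		// Anything from 3 upwards identifies a guest.
		who = fmt.Sprintf("vm(%d)", a.CID)
	}
	return fmt.Sprintf("%s:%d", who, a.Port)
}

func main() {
	fmt.Println(vsockAddr{CID: cidHost, Port: 2222}) // the host, port 2222
	fmt.Println(vsockAddr{CID: 3, Port: 2222})       // a guest with CID 3
}
```

Two applications in the same VM would share the CID and differ only in the port, which is why a single vsock device is enough for several independent channels.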
So, on Linux, how do we use virtio-vsock? Assume we have a Linux virtual machine and a Linux host. First, we need to configure the VM to have a vsock device. We'll see in the next slide how it's done. And then, on the host, you need to load the vhost_vsock kernel module,
and you need the /dev/vsock device to be accessible by the user. Once this is done, what is nice is that a lot of low-level networking tools have vsock support. So, we can just use them to create vsock communication, very similarly to what we would be doing for regular TCP/IP communication
or for Unix socket communication. For example, I can use the netcat version from the nmap package: ncat --vsock --listen 2222. This will start listening on the current machine, which can be the host or the virtual machine.
It will start listening on port 2222. And then, on the other side, which in this case is the VM, I can also start netcat, this time with just --vsock and no --listen. I need to provide the CID. Since I'm on the VM, I want to talk to the host. As I said before, the host is CID 2.
So, I put a 2 there, and then the port number, and then the communication is established between the host and the VM, and I can just send data. You cannot see it there because it's not animated, but first, the VM sends "hello host" to the host, and it appears on the other side, and then the host sends back "hello VM", and it appears on the VM side.
I recorded a short demo about that, but I won't have time to show it now. I will put a link in the slides if you want to watch it. So, the virtual machine configuration. With libvirt, it's very simple. You just add a vsock device to the virtual machine. It has to be using virtio, and for the CID,
either you tell libvirt to just auto-assign one, with auto equal to yes, or you can specify the one you want if you want to hard-code it. In Go, there are some quite nice bindings to interact with libvirt XML, the libvirt-go-xml bindings, and so I also put some simple code
showing how you could do that programmatically. You don't need to parse the XML and then patch it manually if you want to do some editing in Go. You can just use this very convenient library.
So, if you are a programmer and you want to do some communication over virtio-vsock, how does it work? We'll be talking mostly about Go here because this is what we are interested in.
If you look at the Go documentation, you can find some low-level AF_VSOCK support: you can use the very low-level Unix API for networking with AF_VSOCK. It's basically a mirror of the C library implementation of sockets. So, you have a socket call, you have read and write calls, but these are not the networking functions that you are used to in Go.
In Go, when you have networking code, usually you use the net package, which has the Conn interface in order to read from and write to the network, and the Listener interface in order to wait for connections from other machines. If you want implementations of these for vsock, you need to use an external package, which is mdlayher/vsock.
Once you have imported that package, you get implementations of these convenient interfaces, and you can use vsock in your Go code as if it was a TCP/IP connection, or as if you had a net.Listener for TCP/IP. So, how does it look?
First you need to know the CID and the port that you are going to be connecting to. In this case, the vsock package has a constant for the host, but I could just use 3, 4 or 5 if I was connecting to a VM. And then the vsock package provides a Dial function, which is very similar to
what you do for TCP/IP or Unix socket communication, where you use Dial as well. So, very similar API. And if the dial succeeds, you have an instance of an object implementing the Conn interface from the net package.
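The practical consequence is that code written against `net.Conn` does not care which transport is underneath. The sketch below is only an illustration of that point: it uses `net.Pipe` from the standard library as a stand-in for the guest/host link so that it runs without a VM, whereas with the vsock package the connection would come from its Dial function instead. The function names `greet` and `demoGreeting` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// greet sends msg over any net.Conn, then reads everything the peer
// sends back until the peer closes its side of the connection.
func greet(conn net.Conn, msg string) (string, error) {
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn)
	return string(reply), err
}

// demoGreeting wires greet to an in-memory connection. net.Pipe is an
// assumption made for this sketch: it stands in for the vsock link.
func demoGreeting() (string, error) {
	guest, host := net.Pipe()
	defer guest.Close()

	// The "host" side: read the greeting, answer, then close.
	go func() {
		defer host.Close()
		buf := make([]byte, 64)
		n, err := host.Read(buf)
		if err != nil || string(buf[:n]) != "hello host" {
			return
		}
		io.WriteString(host, "hello VM")
	}()

	// The "guest" side only ever sees a net.Conn.
	return greet(guest, "hello host")
}

func main() {
	reply, err := demoGreeting()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

Because `greet` only depends on the interface, swapping the in-memory pipe for a real vsock, TCP or Unix socket connection would not require touching it.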
So, you can use it with all the functions which accept a net.Conn object. For example, you can use io.Copy to copy data between the guest and the host, and it's just going to work fine. In any context in Go where you can use a net.Conn,
you can use the one returned from vsock.Dial. So, once you have the connection, it's fairly easy to integrate that with your pre-existing code. And for the listening part, it's almost the same. Once again, the package has a Listen function, which returns a listener implementing
the net.Listener interface. You run vsock.Listen and you specify the port. And that's about it for the vsock-specific code. Once you have that, you can call listener.Accept, which is a method of the net.Listener interface.
This blocks until you get a connection, and it returns a net.Conn when it succeeds. And then you can use that connection wherever you want. Once again, I use io.Copy here. And it all just works as regular Go code. And so, this is it for the Linux part.
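The listening side follows the usual Go accept loop. As a sketch under stated assumptions, the code below takes any `net.Listener`; a loopback TCP listener stands in for the vsock one so that the example runs without a VM, whereas with the vsock package the listener would come from its Listen function. `serveEcho` and `demoEcho` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// serveEcho accepts connections on any net.Listener and echoes back
// whatever each peer sends. It returns once the listener is closed.
func serveEcho(l net.Listener) {
	for {
		conn, err := l.Accept() // blocks until a peer connects
		if err != nil {
			return // the listener was closed
		}
		go func() {
			defer conn.Close()
			io.Copy(conn, conn) // a net.Conn is both a Reader and a Writer
		}()
	}
}

// demoEcho starts serveEcho on a loopback TCP listener (a stand-in for
// a vsock listener, chosen only so the sketch runs anywhere) and sends
// one message through it.
func demoEcho() (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer l.Close()
	go serveEcho(l)

	conn, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	msg := "hello host"
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := demoEcho()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Only the line creating the listener is transport-specific; the accept loop and the per-connection handler stay the same whatever the listener is backed by.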
We can switch to the macOS virtio-vsock support. On macOS, we are using HyperKit for the hypervisor. If we want to use a virtio-vsock device, we have to specify that on the command line. So, we add a -s.
We need to put the PCI slot, then the virtio-sock device. Then we specify the CID that we want to use for the guest. And then we have to specify a path. Why do we need that path? Because with HyperKit, the vsock communication is happening over a Unix socket on the host. And so, this path is the
directory where the Unix socket is going to be put. And then the Unix socket has a magic name. It needs to be named CID.port, in hexadecimal. And then HyperKit will know: OK, if the guest
tries to use this port and has this CID 3, then the communication is going to happen over that Unix socket. If you want to use a different port, you will have to use a different Unix socket. And so, how does it look in Go code? This time we don't need an external package
because Go has out-of-the-box support for Unix sockets. We build the path of the socket that we want to use. And then we can just call net.Listen on this Unix socket. And we get a regular net.Listener
that we can use to wait for connections from the VM to the host. And when that happens, we can do something with the connection that we get. As for the connecting part, if we wanted to connect to the VM over vsock, I didn't put an example for that because we're not using
this in CRC and I didn't really have time to test if it was working as expected. But basically, I would just use the same Unix socket name and use net.Dial to do a connection over this Unix socket.
So, now on Windows. On Windows, it's a bit more complicated, it's a bit different. It's not really virtio-vsock communication which is happening, to be honest. So, what is going on on Windows? Inside the Linux guest, the Go code is exactly the same as on the other platforms.
We use the vsock package we saw before. We create a connection using vsock.Dial. It's all the same from the perspective of the Go code. It's at the kernel level that you have to load a different kernel module, hv_sock. And why? Because
Hyper-V already has something which is a bit similar to virtio-vsock, but which is obviously implemented very differently. And so, this hv_sock kernel module knows how to tunnel
the vsock data over this native Hyper-V VMBus. This is why on Windows, we need to load that kernel module. It can be done with a udev rule saying, OK, if I'm on a Windows hypervisor, just load that kernel module. And so, the communication is going over that
VMBus from Hyper-V. This means that on the host, we are not really going to be writing virtio-vsock code. We are going to write Go code to interact with the Hyper-V VMBus. Once again, we have to use an external Go package for that, which is linuxkit/virtsock,
and there's an hvsock implementation in there. Very similarly to what we had on Linux, we have implementations of net.Conn and net.Listener in that package. So, we just need that package to create the connection, but once we get the connection, we can use all the
usual Go APIs to do stuff with our network connection. One more thing on Windows is that you need to explicitly enable vsock communication in the registry. For that, you have a special registry location, with the path shown there: Virtualization, GuestCommunicationServices.
This registry location needs to contain a key following this template. The end of the template always has to be this way, and at the beginning, you put, in hexadecimal, the port number that you want to use to communicate with the VM.
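The key name can be computed rather than typed by hand. The sketch below assumes the template that Microsoft documents for Linux guests using Hyper-V sockets, `00000000-facb-11e6-bd58-64006a7986d3`, and the `GuestCommunicationServices` registry location under `HKEY_LOCAL_MACHINE`; both come from that documentation rather than from the talk itself, and `serviceGUID` is a hypothetical helper name. The code only builds the strings and does not touch the registry.

```go
package main

import "fmt"

// Registry location (under HKEY_LOCAL_MACHINE) where Hyper-V socket
// services are registered. Assumption: taken from Microsoft's Hyper-V
// socket documentation.
const guestCommunicationServicesKey = `SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\GuestCommunicationServices`

// Fixed tail of the vsock service GUID template; only the first eight
// hexadecimal digits vary, and they carry the vsock port number.
const vsockTemplateSuffix = "-facb-11e6-bd58-64006a7986d3"

// serviceGUID returns the service GUID that corresponds to a vsock port.
func serviceGUID(port uint32) string {
	return fmt.Sprintf("%08x%s", port, vsockTemplateSuffix)
}

func main() {
	// Port 1024 is 0x400 in hexadecimal.
	fmt.Println(guestCommunicationServicesKey + `\` + serviceGUID(1024))
}
```

With port 1024, `serviceGUID` yields `00000400-facb-11e6-bd58-64006a7986d3`, which is the kind of value expected as the key name under that registry location.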
So, you put the vsock port number in there and this will enable the vsock communication for that VM. I put some links giving more details about that because it's a bit magic, but remember that you have to have this registry key. Once you have done that, the Linux VM
is running, and it's trying to do some vsock connections. How do you start listening for these connections? You use this external hvsock package. We can see here the magic GUID I was mentioning. It's used to create the connection between the guest and the host. The 400 at the beginning
corresponds to the port number that we want, in hexadecimal. Then we just call hvsock.Listen on any address, with the service ID corresponding to that GUID. It's a bit magic, but this way, the VM is going to be able to talk to the host. The VM will think it's virtio-vsock and the host
will think it's a Hyper-V VMBus. Once again, I did not put the client code, the dial code. There's an implementation for that in hvsock, but I did not test it. I expect it's going to be working as well as the listen stuff, but I did not test it.
And so, we are getting to the end of this talk. A few more things that we can do. On Linux, it's possible to use systemd socket activation with vsock. It's working very well when you use C code, C implementations. With Go, it's a bit more tricky because there's a Go package to deal with systemd socket
activation, but this package is only using the standard Go packages for networking, which, as I said before, don't know about vsock. So, if you want to use that with vsock, you have to write a bit of
code to special-case the vsock sockets and to be able to deal with them. Otherwise, this systemd socket activation Go package will return an error saying, OK, I don't know about this socket type. Some more work which is in progress on the CRC side
is to support vsock with a fourth hypervisor, which would be the macOS native virtualization framework. With recent macOS versions, you have an API which you can use from Swift or Objective-C to manage virtual machines at a high level. We are currently trying to use
this from Go. There are some Go bindings for it, and we are trying to switch to that. The main benefit is that it supports the M1 out of the box, which is a big missing piece in HyperKit. One last thing. In CRC, since we have all this VM
stuff and the user-mode networking working nicely, we also have work in progress to be able to run podman containers directly on macOS and on Windows. So, something a bit similar to docker-machine, but using the infrastructure that CRC has in order to run
podman containers. So, it would not just be OpenShift, it would also support podman. So, a few useful links there: the two main projects I talked about and my contact information. And we are getting to the end of this talk. So, thanks a lot to everyone for listening. I will be around for a little bit in order to
answer some questions. Thank you.
Sorry, there was one question in the chat, which is asking why we used
HyperKit instead of the native macOS virtualization framework. Actually, there are two virtualization frameworks on macOS. There is the Hypervisor framework, which is very low level, and which is used by QEMU, for example. And there is a much higher-level one, which is called the Virtualization framework.
So, for the Hypervisor framework, we would basically have needed to use QEMU if we wanted to use it. To be honest, the decision to use HyperKit was made before I joined the team. So, we just kept using that. As for the Virtualization framework, it is
only supported on macOS 11 and later, which is a bit too new for what we are doing. I mean, it was too new for what we were doing. But, yeah, we started trying to use it. Actually, we have some code working
with the Virtualization framework to start CodeReady Containers on x86 and on the M1. So, it's very likely that we are going to switch to the Virtualization framework very soon. Yeah, virtio-vsock also works
with the Virtualization framework. Once again, the API is a bit weird, but it can still work. Yeah, HyperKit was a historical decision, but definitely we are going to switch to the Virtualization framework when we can. Are there any more questions?
There is one more minute left. If not, well, thanks everyone for listening.