Virtual Networking in the NFVi
openSUSE Conference 2017
Transcript: English (auto-generated)
00:08
Hi, everyone. Thanks for coming to this talk. Hopefully you had a good lunch, and this is going to be interesting so you don't fall asleep. I'm Marco Varlese. I'm working for the networking team in SUSE.
00:24
And today I'm going to talk to you about virtual networking in the NFVI, which stands for network function virtualization infrastructure. So, I'm going through quite a few slides from the evolution of the data centers to a few
00:41
NFVI concepts, focusing on two main vSwitches that we currently see in open source: Open vSwitch and VPP. And then I'm going to talk to you about a few things around platform awareness. Just out of curiosity, who of you has any idea what NFV stands for and what it is?
01:10
Current trends? Okay. So according to a survey that Cisco ran in 2013, 77% of the data center traffic
01:27
was in the data center itself. Now, if you think about it, it's quite a lot of data that is produced and consumed within the data center itself.
01:44
Which is amazing if you think about the actual use that we do with our PCs when we access the Internet. So we usually consume data, video, news, or upload data. So there's a lot of that going on.
02:02
And still 77% of that traffic is within the data center itself. Now, if you for a second think about what that means, if you want to simply stick with the physical environment, well, you're going to see in the bare metal situation, you're
02:26
going to see one OS per machine. The networking is going to be organized in the usual access, distribution, and core layers. You're going to see all the traditional L2 and L3 problems that you would see in
02:45
the data center. Talking about spanning tree, for example. And what's worse is if you need to offer any type of high availability to your customers, if you're a cloud, for example, cloud provider, that can only be done with physical.
03:02
So that means physical machine to a physical machine or with physical links between them. So the issues obviously connected to this is that it's a constrained environment. It doesn't really scale. And it's really, really expensive.
03:21
And it's complex. The network is obviously going to be underutilized because you cannot really scale over a certain limit. If you have any failures, as I said, the backup or the high availability type of scenario is going to have a very slow recovery.
03:41
And you'll find a situation where you're going to hit pretty soon what are the limits of what you're capable of. Number of MAC addresses, number of VLANs, how you're going to partition your network in the data center to scale to a different size.
04:04
So virtualization is not a new concept, but the use of it in the networking space is pretty new. Talking about four, five years. The thing is, with virtualization we solved a lot of the compute issues.
04:21
So for example, now we can run multiple VMs on a single machine. We can do efficient storage. We can virtualize storage. And obviously we can also virtualize network cards. I'm sure that you have all heard about virtual functions using SR-IOV, or multiple queues,
04:43
virtual queues and all the magic that you can do with that. What that means, though, is that we start seeing a brand new type of issue. We are now talking about not just traffic hitting the server, the compute node, but we're actually seeing traffic that goes east-west, which means VM to VM on the same server
05:08
and still has to be handled by something. We have obviously the introduction of VXLAN, which created a huge number of endpoints that
05:24
are now possible to be reached versus what was possible with VLAN. And obviously we're talking about things like intraserver security. If you are a provider, if you're offering a service, how you make sure that somebody
05:40
that's running a service on one virtual machine, now that virtual machine can talk to another virtual machine on the same host, how you guarantee the right separation between the two of them, talking about network traffic. It was quite clear that there was a need of a new architecture or revisiting what
06:03
was used before and trying to adjust it to these new scenarios. So this picture tries to basically give you an idea of what has been the evolution of a data center. We started with everything being in a physical bare metal environment.
06:23
We then introduced the virtual data center with all the nice aspects of virtualization on the compute side. Then there was the introduction of the vSwitches, which I'm going to talk to you about in a second. And the next evolution of it was the vRouters.
06:42
And then just in recent years, we have also seen more and more use of the extensible data planes. These are things like BPF, eBPF, XDP from the IO Visor community, which offer great
07:06
configurability and great programmability of data planes on Linux. So what is NFV? Well, first of all, NFV stands for network function virtualization.
07:22
And it offers the opportunity to decouple the network function from proprietary hardware and have them run in software. If you think about what was the market in the networking space until a few years
07:41
ago and still predominant, it was all made of, for example, hardware switches where a company like Cisco, Juniper, and others had their big presence in the market. And why was this architecture started and thought about?
08:05
Well, it started with service providers who wanted to basically accelerate the deployment of new services to increase the growth and also to reduce the amount of money that they
08:23
spend on hardware. As a result, ETSI, the European Telecommunications Standards Institute, basically embraced this idea and started supporting it, and a lot of companies joined in, and a lot of experts started
08:41
producing a lot of draft papers, et cetera. So it became really a sort of standard in the way the architecture looks. So NFV basically allows you to reduce your capital expenditure, which here is called CAPEX, so you can reduce the amount of machines that you buy.
09:06
Reduce the OPEX, which stands for operational expenditure, because obviously buying less machines means that in the data center you can use less power, you can use less cooling to keep the temperature okay for the equipment, and it also accelerates the
09:28
time to market, because now everything is virtual, everything can be deployed with a click, you can configure things in an easier way, so people that want to try out new
09:41
things, that want to test new things, have the chance of doing it, and it also offers a great degree of agility, because what you can do, you can easily scale up and down just by turning on a new virtual machine, switching it off based on the needs, it
10:04
would have been much, much harder if you want to basically go and buy a server or configure a server on the fly when you need it. So in the NFV world, the architecture is quite vast, and obviously there are different
10:23
layers, there are different components, a very important part of it is played by the data plane, by the control plane, and there is this concept of vSwitch, which stands for virtual switch, a virtual switch is a software component that basically allows you to run
10:46
traffic between VMs and can also allow you to have that traffic reaching the outside world, so what happens is that the VM can communicate with another VM on the same physical machine, and the same VM, if needed, can reach the internet, for example.
11:10
What's nice about a virtual switch is that it basically lives within the hypervisor itself, so now the hypervisor has network functionality as well, added to the overall
11:23
virtualisation methodologies and techniques that's implemented in there, and basically it's much easier now if you want to roll out a network functionality to roll it out
11:41
on your lab and then in your production environment. You can simply add a new feature in a vSwitch and try it out. And what's nice about it is that you can even be on a machine that is not connected to any other machine and you can still do all your
12:01
stuff with VMs. One of the biggest challenges that are faced by the NFV infrastructure in general is the very different requirements that are present in the different sectors
12:23
of the industry. For example, on this slide I picked requirements from an enterprise data centre versus a telco provider. So if you think about it, in an enterprise data centre most of the traffic is even these days just 10 gigabit, while telco networks have 40 plus
12:46
gigabit network requirements. If you think about the packet sizes, which are really the frames that are sent by the machines and handled by the machines in a data centre,
13:01
in a data centre you would expect this traffic to be a mixed type of traffic. Imagine there are queries coming for web pages, MySQL queries, Oracle queries, intramachine, machine learning type of stuff. While the telco network has to deal with a lot of control packets
13:27
and control packets are usually very, very small packets, we're talking about 64 bytes packet, 96 bytes packet, the average being 72 or 74. Obviously another requirement
13:44
is the expectation of the customers in these two different domains. So an enterprise data centre really wants to get the software out of the box, install it and just run with it. A telco network instead usually focuses a lot on customisation because that's how
14:06
they differentiate between another provider. They can offer you one thing versus another feature. So they customise products a lot and they spend a lot of time and money in doing that. With regards to performance, here I'm using
14:24
the word 'none' for the enterprise data centre, in the sense of 'the more, the better'. On the other hand, the telco providers have very strict requirements on, for example, things like latency
14:41
and jitter. Those requirements are not coming just out of the box, they're coming from standards. So if you implement for example a 3G network you have specific requirements, if you have a 4G network you have much stricter requirements and now with 5G things
15:03
are going just crazy. So it's very hard to pick one solution that can fit all this. Usually they're tailored solutions and people need to really know exactly what they can achieve with each subcomponent in order to meet their requirements.
15:25
So as I said at the beginning, in this presentation I'm going to focus mainly on two projects. One is Open vSwitch and the other one is VPP. On the Open vSwitch side I'm going to be much quicker, because we have another presentation later on today that I actually
15:44
encourage you to attend; it's about OVS and DPDK integration. So here I'll go a little bit more into VPP. What's interesting about these two projects is that they can run and use the standard
16:01
kernel path communication for handling packets, so standard socket based, but they can also integrate with DPDK which stands for Data Plane Development Kit and is the cutting edge in the open source for packet handling and packet processing.
16:27
So what is Open vSwitch, abbreviated OVS? It's a software-based solution; as I said earlier, it's a vSwitch, so it's software, and it offers a flexible controller in user space, where we actually see a daemon running, and you have all the nice tools that
16:46
you can use to basically instruct the switch with specific flows, for example. And it has what's called the fast data path in user space. As I said, I'm going to talk about OVS-DPDK; with the standard OVS this data path is actually implemented within
17:06
the kernel. It also provides an implementation of OpenFlow, so if you want to use OpenFlow to configure and instruct your virtual switch you can do so. And it's based on the Apache 2 license. So, because I said to you that there are two incarnations of
17:26
OVS, I think this picture is quite nice; it basically shows you how the two differ. On the left-hand side there is the standard OVS version, which is the one
17:41
using the kernel, and on the right-hand side there is the one that basically integrates with the Data Plane Development Kit. So on the left-hand side there is another component, a kernel module called openvswitch.ko, that runs in the kernel and communicates with user space through a Netlink infrastructure. What happens is that
18:06
the data plane is handled in the kernel, and any exception to this data plane, which means basically a flow that is not learned or a flow that is not configured, a packet that misses, will be sent to user space where a daemon will handle this
18:26
packet, to basically either be configured or discarded. On the right-hand side instead the kernel module disappears and all you have is a very small interface, which is offered by either the igb_uio, uio_pci_generic, or vfio-pci kernel modules, to expose
18:48
the network card's BARs to the drivers that are actually running in user space. These are called poll mode drivers, the name coming from the fact that they are not the usual drivers driven by interrupts
19:05
but instead they keep polling the NIC for packets coming in. In this case the forwarding plane is sitting in user space itself so everything runs in user space. The way
19:24
that OVS works is that it can run in two different modes, called normal and flow-based. The normal mode basically acts as a standard layer 2 switch. What it basically does is put the
19:42
network card in learning mode so it can learn new flows coming in, and it forwards the frames to the previously learned destination MAC, or it can also flood the frame, depending on the configuration that you set. If you set it to flow mode instead, all of a
20:05
sudden you see your vSwitch behaving as a sort of firewall as well, because anything that is not configured on the vSwitch is basically dropped. So you will have to have flows in your flow table to allow traffic through your vSwitch. Since I mentioned the
20:27
concept of a flow table: what a flow table is, is literally a table composed of a match part and an action part. A match basically allows you to specify
20:45
the fields on the packet that you want to match against. Imagine it could be the source MAC, the DMAC, it could be a source IP or destination IP, you can pretty much match on whatever you want in the packet. When a packet comes in it goes through this flow
21:07
table, and when a match is found we call it a hit, and the action associated with this match will be performed. So you can have, for example, forward, which means the packet will be forwarded to, for example, a given destination MAC address. Interestingly, you can also use wildcards
21:32
for the matches which basically allows the user to simplify some use cases, for example, in many cases you don't care about source port or source MAC might be relevant
21:46
or not, so you can wildcard those in order to reduce the number of flows that you have in your flow table. If you have any questions you can interrupt me any time.
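To make the match/action idea concrete, here is a minimal, purely illustrative Python sketch of such a flow table with wildcard matching; the field names, the linear lookup, and the drop-by-default behaviour mirror what was just described for flow mode, but this is not the actual OVS data structure or API.

```python
# Conceptual sketch of a match/action flow table (not real OVS code).
# Fields omitted from a match act as wildcards.

class Flow:
    def __init__(self, match, action):
        self.match = match      # dict: packet field -> required value
        self.action = action    # callable applied to the packet on a hit

    def hits(self, packet):
        return all(packet.get(field) == value for field, value in self.match.items())

class FlowTable:
    def __init__(self):
        self.flows = []

    def add_flow(self, match, action):
        self.flows.append(Flow(match, action))

    def process(self, packet):
        for flow in self.flows:             # linear walk: entries near the bottom cost more
            if flow.hits(packet):
                return flow.action(packet)  # first hit wins
        return "drop"                       # flow mode: unmatched traffic is dropped

table = FlowTable()
# Wildcard everything except the destination MAC: we only care where the packet goes.
table.add_flow({"dst_mac": "52:54:00:aa:bb:cc"}, lambda p: "forward to port 2")

print(table.process({"dst_mac": "52:54:00:aa:bb:cc", "src_ip": "10.0.0.5"}))  # hit
print(table.process({"dst_mac": "52:54:00:00:00:01"}))                        # miss -> drop
```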
22:03
On the other hand, we have VPP. VPP stands for vector packet processing and, as the name says, it is opposed to the concept of scalar processing. It's an extensible framework
22:21
that offers real production-quality switch and router functionality. The reason why it is production quality is that VPP comes from many years of learning and development at Cisco, who basically donated this software component to the open source community 18 months ago, and it has
22:50
powered their products, so it's pretty stable and has been production tested. VPP is part of a bigger project called FD.io, pronounced 'fido', which is hosted by the Linux
23:08
Foundation. One of the differences of VPP compared to a project like Open vSwitch is that
23:20
OVS implements a flow table and each packet will go through this flow table, and obviously you're going to have a performance hit because of that: depending on where your flow sits in the flow table, you'll be quicker or slower to
23:40
find your match. If you have thousands of entries in your flow table, your packet will go through it and it could basically reach the bottom part before being processed. On the other hand, what VPP does is implement a graph, and not only that, but
24:03
it's also processing packets in a vector based mode. What that means is that instead of having each single packet that comes from the NIC going through the graph one by one, the whole lot of packets in a vector mode will be processed by each node on the
24:25
graph. You may think that it's basically the same thing, it's actually not and it's also different from the concept of pulling from the NIC in batch mode. This is real
24:43
data plane processing. What happens is that because it's being handled as a vector, we see fewer instruction cache misses and fewer data cache misses. There is a real optimisation in terms of cache utilisation, and all of this helps boost the
25:07
performance of your appliance, of your server really, really considerably. It's also very flexible because if you need to add a new functionality you don't have
25:24
to know all the details of the overall design and the overall code, which trust me is quite a bit of code base. What you can do, you can create a new node which in the VPP world
25:42
is a plugin, so you write your new code following the plugin interface, you implement it and you can plug it in at a specific part of the graph where you require it to be. So it's very flexible and it's quite developer friendly.
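As a rough illustration of the scalar-versus-vector distinction and of the graph of nodes just described, the following Python sketch is purely conceptual; real VPP nodes are C plugins with a much richer interface, so the node names and structure here are assumptions for illustration only.

```python
# Conceptual sketch: scalar vs vector packet processing through a graph of nodes.
# Each "node" is just a function here; real VPP graph nodes are C plugins.

def ethernet_input(pkt):    return {**pkt, "l2_done": True}
def ip4_lookup(pkt):        return {**pkt, "next_hop": "vm1"}
def interface_output(pkt):  return {**pkt, "sent": True}

GRAPH = [ethernet_input, ip4_lookup, interface_output]

def scalar_process(packets):
    # One packet traverses the whole graph before the next one starts, so the
    # instruction cache is refilled with every node's code for every packet.
    return [interface_output(ip4_lookup(ethernet_input(p))) for p in packets]

def vector_process(packets):
    # The whole vector passes through one node before moving to the next, so
    # each node's code stays hot in the instruction cache for the entire burst.
    vector = list(packets)
    for node in GRAPH:
        vector = [node(p) for p in vector]
    return vector

burst = [{"id": i} for i in range(256)]
assert scalar_process(burst) == vector_process(burst)  # same result, different cache behaviour
```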
26:04
There are many use cases in the VPP world. The first one is the virtual switch or virtual router. What happens is that together with the library set that you get with VPP, you also get
26:22
a CLI, and this CLI allows you to run, through the command line, the configuration required to basically set up a virtual switch or a router. And in literally five or six
26:42
different commands that you run between the Linux part and the VPP specific part, you have a vSwitch going on between two end points, being them virtual machines or being them containers. It's very, very straightforward and easy.
27:04
At the same time, VPP offers both local and remote programmability, and the way they do it is to basically expose a set of APIs on top of their code.
27:27
The interesting thing is that the APIs available cover a really big set of programming languages. You can use C, C++, Java, Python, Lua, so you can basically pick the language
27:44
you prefer and they all have the same interface. It also offers remote programmability. Basically you can imagine you would like to integrate your virtual switch or virtual router with
28:01
things like OpenDaylight, for example, which is the de facto SDN controller these days. It offers a specific interface, and there are also currently available projects that allow you to control VPP through OpenDaylight.
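For the local programmability just mentioned, something like the following Python fragment gives the flavour of it. This is only a sketch based on the vpp_papi bindings shipped with VPP; the class name, constructor arguments, and message names vary between VPP releases, so every call here should be treated as an assumption to be checked against the installed version.

```python
# Hedged sketch of driving a local VPP instance through its Python bindings (vpp_papi).
# Assumption: a recent release exposing VPPApiClient; older releases use a VPP class instead.
from vpp_papi import VPPApiClient

vpp = VPPApiClient()          # assumption: default constructor locates the installed API files
vpp.connect("osc17-demo")     # attach to the running VPP instance under a client name

reply = vpp.api.show_version()   # roughly what the CLI "show version" maps to
print(reply.version)

vpp.disconnect()
```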
28:26
Since recently enough, six or eight months ago, it also integrates directly with OpenStack through the ML2 mechanism driver plugin. The only thing is the code is not part of
28:43
VPP. You will have to pull it from GitHub, from the OpenStack GitHub and in this way you can basically skip the overhead of integrating VPP with ODL and then ODL with OpenStack.
29:01
You could go straight, directly talking OpenStack to VPP, in a similar way as it's possible for OVS. As I said earlier, VPP is a high performance user space network stack and the
29:23
interesting bit is that it can run on commodity hardware. This is very similar to OVS. You can pick any x86 machine, whatever it is, you can install Linux and then install VPP, and it will run. Obviously you can expect different types of performance based on the machine
29:41
that you're deploying it on. The same code can run on the host or in VMs or Linux containers. In fact, as I said, it's very, very easy also thanks to the nice abstractions done by the API to set up the same type of setup scenario, whether being using VMs for communications
30:08
or containers. As I said, it basically integrates with DPDK
30:21
which is currently the best-of-breed open source driver technology for packet processing, and is extensible using the plugin interface. That's pretty much it.
30:41
If we think about the differences or if we put them side by side, because they use DPDK, they have already a lot of commonalities in terms of what you require on your platform to make this run. For example, you will have to enable the huge page support, whether
31:07
it be two megabyte or one gigabyte pages; you will have to use one or the other. You will have to enable the IOMMU, which, if you want better performance, you can set
31:26
it up in pass-through mode. This is also why you need the Intel IOMMU option (intel_iommu=on) to enable SR-IOV on your system. And obviously you will need things like VT-x and VT-d enabled in
31:44
both cases, which will require you to pick a specific driver, whether it be igb_uio, uio_pci_generic, or vfio-pci, and bind your NIC to those drivers, which are then used by the poll mode drivers in DPDK to take the packets out of the NIC.
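As a quick sanity check of those shared prerequisites, a small script can read the standard Linux interfaces for hugepages, the kernel command line, and the IOMMU groups. This is just a sketch of where to look, not a tool from either project, and paths or hugepage sizes may need adjusting for a given system.

```python
# Sketch: check the platform prerequisites shared by OVS-DPDK and VPP.
# Reads standard Linux interfaces only; run on the host you plan to deploy on.
from pathlib import Path

def hugepages():
    # Per-size hugepage pools, e.g. hugepages-2048kB and hugepages-1048576kB.
    base = Path("/sys/kernel/mm/hugepages")
    return {d.name: int((d / "nr_hugepages").read_text()) for d in base.iterdir()}

def iommu_on_cmdline():
    cmdline = Path("/proc/cmdline").read_text()
    return "intel_iommu=on" in cmdline or "amd_iommu=on" in cmdline

def iommu_groups_present():
    # Populated IOMMU groups mean the IOMMU is actually active (needed for vfio-pci).
    groups = Path("/sys/kernel/iommu_groups")
    return groups.is_dir() and any(groups.iterdir())

if __name__ == "__main__":
    print("hugepages per size:", hugepages())
    print("iommu on cmdline:  ", iommu_on_cmdline())
    print("iommu groups:      ", iommu_groups_present())
```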
32:01
If we instead look at the architecture and the design of the two different components, the two pieces of software, they look very different. In fact,
32:24
we can talk about apples and oranges. Open vSwitch, as I said earlier, is based on a match-action type of approach, while VPP is based on vector packet processing using a graph. Open vSwitch has this concept of the fast path versus the slow
32:45
path. Instead, VPP is more focused on the extensibility of the functionality via plugins. Similarly, VPP has always been thought out and designed with a high level of parallelism,
33:06
and as I said, it doesn't really think about integration with controllers per se, like SDN controllers, it does not implement OpenFlow, as I said. While OVS has been more focusing on the northbound part of the stack, thinking about things like OpenFlow
33:25
integration, and they also support OVSDB, which is another protocol that allows you to basically configure the vSwitch itself. I'm not going through this slide in detail; it's more so that if you download it, you can actually
33:44
see it. They're both very feature-rich; they offer pretty much the same type of functionality in one flavor or the other. Something that I just learned two months
34:03
ago is that in VPP there's going to be, very soon, a full TCP stack implemented in user space, and that's going to open up a lot more scenarios and more things to be done with it, which is going to be very interesting to see.
34:23
With regards to the integration bit, how do these two components talk to other components in the much bigger NFV orchestration architecture? Well, as I said, Open vSwitch is able
34:41
to basically speak many languages. It supports OpenFlow, supports OVSDB, and it has a straight integration with the ML2 mechanism driver in OpenStack. VPP also now has integration with OpenStack directly through the ML2 mechanism driver.
35:05
However, I'm not aware of it being used and deployed heavily. They instead put a little more emphasis on the OpenDaylight integration through Honeycomb.
35:24
Whether that is a positive thing or not, it does not support OpenFlow. As I said, I also wanted to touch a little bit on platform awareness.
35:41
There's a lot going on with hardware in general and what you can do with it. One thing that I would like to stress, because these two words, acceleration and offload, are very often used interchangeably: they actually mean two different things when you think about
36:02
what you do with your hardware or with your software. Acceleration means taking advantage of techniques or methodologies which allow you to improve some aspect of the performance of your software stack, be it throughput, latency, or scalability. With offload, instead, what we mean is to defer to a third-
36:27
party component, which is usually hardware, the full execution of a given functionality. You can imagine, for example, taking advantage of a hardware capability to do checksumming,
36:40
and that's pretty much enabled by default on every single network card driver, or TSO for TCP segmentation offload, or recently VXLAN encap and decap, whether stateful or stateless; many network cards these days offer that functionality as well.
37:01
But what that means is that the software has nothing to do with that execution; it just tells the hardware to take care of it, and usually the software stack just gets a callback to be told what the result of that execution was.
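To see which of those offloads a given NIC driver actually exposes, the usual place to look on Linux is ethtool's feature list. The snippet below just shells out to ethtool and filters a few of the offloads mentioned above; it assumes ethtool is installed and that the interface name is adjusted to the local system.

```python
# Sketch: list a few NIC offload features via "ethtool -k <interface>".
# Assumes ethtool is installed; replace "eth0" with a real interface name.
import subprocess

INTERESTING = (
    "tx-checksumming",
    "rx-checksumming",
    "tcp-segmentation-offload",
    "tx-udp_tnl-segmentation",   # VXLAN (UDP tunnel) segmentation offload
)

def offload_features(ifname="eth0"):
    out = subprocess.run(["ethtool", "-k", ifname],
                         capture_output=True, text=True, check=True).stdout
    features = {}
    for line in out.splitlines():
        if ":" in line:
            name, _, value = line.partition(":")
            features[name.strip()] = value.strip()
    return {k: v for k, v in features.items() if k in INTERESTING}

if __name__ == "__main__":
    for name, value in offload_features("eth0").items():
        print(f"{name}: {value}")
```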
37:26
Another important aspect, because this architecture is more and more present these days, is NUMA, which stands for Non-Uniform Memory Access. Basically, these days most of the servers used in data centers are made of blades, and those blades
37:40
can talk to each other through a backplane, and eventually anyway your OS is deployed on the overall machine. So your OS now can see the different nodes, which obviously already have multiple cores, and for whoever has done multi-threading programming is already
38:02
aware of what multi-threading problems are, and if you scale it up to a NUMA platform then it means that you will also have to take care of and know where those cores with those threads are running. Because if you, for example, start crossing the interconnect
38:22
bus, which on Intel is called QPI, you will start paying in latency, your throughput will decrease, and obviously your system doesn't work as expected. So being aware of, for example, where your PCI devices and your memory are plugged in, and where your software
38:42
is running, on which node, on which cores, and how it communicates with other processes on different cores, is very, very important. I think in this regard a very good tool that we have on Linux is the numactl tool, which very easily can show you the topology
39:04
of your infrastructure, of your machine, can basically show which are the cores numbered on which nodes, how much memory is actually attached to which node, and which node is handling which PCI devices. Because in some cases the PCI connectivity is specific to
39:26
specific nodes. So, as you can see on the right-hand side, there is an example of numactl output which shows, on the machine that I was using, four different nodes; it's
39:43
quite a powerful machine, and on each node 24 cores, and it also shows at the end the topology of it. So basically highlighting what is the gap between each node, and obviously because it's a matrix you would expect the diagonal of the matrix to be constant because
40:05
it refers to the same node, and all the rest with an increased cost. And in fact if you can see it, the diagonal there is shown as 10, meaning that from node 1 to node 1 I'm actually paying nothing, and then I would pay 21, so more than double
40:26
the cost to reach node 2 from node 1. And the rationale is that I'm going through the bus that connects the two nodes.
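Besides numactl, the same topology information is exposed under sysfs, so a few lines of Python can show which cores belong to which node and which node a given NIC's PCI device is attached to. This is only a sketch using standard sysfs paths; the PCI address in the example is hypothetical.

```python
# Sketch: read NUMA topology from sysfs (roughly the data "numactl --hardware" reports).
from pathlib import Path

def node_cpus():
    nodes = {}
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        nodes[node.name] = (node / "cpulist").read_text().strip()
    return nodes

def pci_device_node(pci_addr):
    # -1 means the platform reports no NUMA affinity for this device.
    return int(Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node").read_text())

if __name__ == "__main__":
    for node, cpus in node_cpus().items():
        print(f"{node}: cpus {cpus}")
    # Hypothetical PCI address of a NIC; look yours up with "lspci -D".
    # print("NIC lives on node", pci_device_node("0000:3b:00.0"))
```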
40:41
Another aspect is related to hardware assistance and hardware acceleration for network cards specifically. Now, before I can actually talk about the two acceleration techniques, let me tell you how it works at the basic level. So when you have your hypervisor and it's virtualizing your NIC, which basically
41:03
takes packets from an RXQ and sends packets to a TXQ, that hypervisor has to do two very basic and fundamental things. One is sorting the packets, and the other one is routing the packets. So in order to send the packets to the right VM, what it does
41:24
is basically taking decisions on an L2 basis, which is similar to what a switch does. The first acceleration technique, which is called VMDq, basically
41:41
takes advantage of virtual queues available on the RX side and on the TX side, so that these queues can be directly mapped to a specific VM. You may think that this is a very similar concept, but it's not, because these queues are also connected with IRQs,
42:06
and what you can do is that you can map IRQs to be handled by specific cores. And by pinning specific IRQs to specific cores, you're basically reducing the amount of context switch and overhead of other cores doing other things, and instead they're focusing
42:24
exactly on the particular traffic that is going to hit your VM.
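The IRQ pinning he describes is done through procfs on Linux. The sketch below shows the mechanics with a hypothetical IRQ number; on a real system you would first find the queue's IRQ in /proc/interrupts, run as root, and usually stop irqbalance so it does not rewrite the affinity afterwards.

```python
# Sketch: pin a NIC queue's IRQ to one core via /proc/irq/<n>/smp_affinity_list.
# IRQ 120 and core 4 are hypothetical; look up the real IRQ in /proc/interrupts.
from pathlib import Path

def pin_irq_to_core(irq: int, core: int):
    # smp_affinity_list takes a human-readable CPU list, e.g. "4" or "2-3".
    Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(str(core))

def current_affinity(irq: int) -> str:
    return Path(f"/proc/irq/{irq}/smp_affinity_list").read_text().strip()

if __name__ == "__main__":
    # Requires root; the write fails for IRQs that cannot be moved.
    pin_irq_to_core(120, 4)
    print("IRQ 120 now served by core(s):", current_affinity(120))
```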
42:41
The only issue, although this was the first version of this acceleration, is that there is still one bit missing here to make it better. In fact, in this scenario, the hypervisor still has to do the last memory copy of the packets coming from the virtual queues to the VM, or from the VM to the TX queues in the case of transmit traffic. And that problem
43:04
is actually solved by the other technique, which is called SR-IOV. With SR-IOV we're not talking about virtual queues anymore, we're talking about virtual functions. It's a much more advanced way of dealing with your network card that offers barriers between
43:24
virtual functions and all the security that goes with it. But if we stick with the problem associated with memory copies, then there is no need now for the hypervisor to perform the memory, the last memory copy, or the first memory copy anymore, because
43:43
the virtual function is directly mapped into the VM memory space. It's directly managed by a specific driver, which is a slightly different version of the network driver that you would use on the host, and what it allows you to do is basically deal with the memory
44:07
in the network card itself directly from a VM. So the hypervisor is completely bypassed and does not take care of it anymore. There are some caveats, obviously, around
44:24
the use of SR-IOV. Not all good things come for free. Depending on which network card you use, you may or may not have SR-IOV functionality. Usually all the very expensive network cards have it. Similarly, there is a limit to the amount of virtual functions
44:45
that you can have. I think the one that offers the most has 64 virtual functions available, which means that on one single Ethernet port you can have 64 virtual devices.
45:02
But not more than that. So if you want to have a 65th VM, then you cannot, or that VM cannot talk on the network card, or you have to find different ways. And with this, I'm open to questions if you have any and you're still alive.
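The per-port virtual function limit he mentions is visible in sysfs, and the same files are used to actually create VFs. Below is a small sketch with a hypothetical interface name; it assumes the NIC and its driver support SR-IOV and that the script runs as root.

```python
# Sketch: query and create SR-IOV virtual functions through sysfs.
# "eth2" is a hypothetical interface name; requires root and an SR-IOV-capable NIC.
from pathlib import Path

def sriov_paths(ifname):
    dev = Path(f"/sys/class/net/{ifname}/device")
    return dev / "sriov_totalvfs", dev / "sriov_numvfs"

def max_vfs(ifname):
    total, _ = sriov_paths(ifname)
    return int(total.read_text())

def create_vfs(ifname, count):
    _, numvfs = sriov_paths(ifname)
    numvfs.write_text("0")          # the VF count must be reset to 0 before changing it
    numvfs.write_text(str(count))

if __name__ == "__main__":
    print("VF limit on eth2:", max_vfs("eth2"))   # e.g. 64 on the card mentioned in the talk
    create_vfs("eth2", 4)
```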
45:27
You mentioned that VPP has DPDK support. What about ODP?
45:42
ODP also has an integration with DPDK. Right, so it would still go through DPDK, you wouldn't be talking VPP direct to ODP. Come again? So you've got the VPP interface to DPDK, there's no interface to ODP, you'd still
46:03
go through DPDK to then get to ODP. Yes, there's no VPP, ODP type of scenario. If no more questions, thank you.