PCI SR-IOV on FreeBSD

Video in TIB AV-Portal: PCI SR-IOV on FreeBSD


Formal Metadata

Title
PCI SR-IOV on FreeBSD
Subtitle
Hardware-assisted virtualization of PCI devices
Author
Stone, Ryan
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Publisher
Berkeley System Distribution (BSD), Andrea Ross
Release Date
2015
Language
English

Content Metadata

Abstract
PCI Single Root I/O Virtualization (SR-IOV) is an optional part of the PCIe standard that provides hardware acceleration for the virtualization of PCIe devices. When SR-IOV is in use, a function in a PCI device (known as a Physical Function, or PF) presents multiple Virtual Functions (VFs) on the PCI bus. These VFs are fully independent PCI devices that can use the functionality of the PF without the overhead of synchronizing with the driver for the PF or for other VFs. SR-IOV allows for great improvements in network performance in virtualized environments compared to traditional software-only network virtualization. SR-IOV is an important virtualization technology supported in a number of hypervisors. Although FreeBSD has long had support for acting as a guest OS in an SR-IOV environment, to date it has not been possible to use SR-IOV in combination with native virtualization technologies like vimage jails or bhyve. This talk will cover the new SR-IOV infrastructure added to the FreeBSD PCI subsystem, which allows the use of FreeBSD as an SR-IOV host. Discussion will focus on the use of SR-IOV by system administrators, with the balance of the talk devoted to the kernel API provided to PF driver maintainers.
My name is Ryan Stone, from Sandvine, and I'm here to talk about a virtualization technology called PCI SR-IOV: Single Root I/O Virtualization. I'll jump right into what motivated the technology.

We want VMs to be able to perform I/O. Typically with SR-IOV we're talking about network access, transmitting and receiving packets, but SR-IOV is not strictly limited to that. The typical way we do this is that the hypervisor presents a paravirtualized device to the VM, and through some memory-mapped interface the VM can send and receive packets or perform I/O. But every time you want to do that, you essentially have to make a call into the hypervisor, and then it has to do more work. Taking the example of sending a packet: after putting the packet on the memory-mapped interface, we have to call into the hypervisor. The hypervisor takes the packet (I believe that typically, even with virtio, it has to copy the packet) and then sends it up through the host kernel. Techniques like virtio have been very successful in getting very good performance, but there is still additional overhead: going into the hypervisor, going through the host kernel's IP stack, doing another routing lookup, another ARP lookup, and so on and so forth, when you've already done all of this in the guest. So there is inevitable additional overhead.

Another problem is that taking advantage of advanced offloads can be tricky. Take TSO, for example. For it to work, the NIC has to support TSO, the host kernel has to advertise support for TSO, the hypervisor has to advertise it on the paravirtualized interface to the VM, and the VM's driver for the paravirtualized device has to support TSO. As new offloads come online, like VXLAN offload or what have you, you have to go through that whole process and add support at each level of the stack, so that can get tricky.

Network devices at least have these virtualized equivalents for sending packets. But you also have things like crypto offload and compression offload PCI devices that are just showing up in Intel chipsets now, and there's no point in doing a virtualized version of those, because software can't help: the host can't really do the compression or the encryption any faster than the guest can on the CPU. And if you do have that hardware, there's no way for the VM to get access to it to do the offload through software.

Another technique you can use is called PCI passthrough. With PCI passthrough, we give the VM direct access to the device. Its registers are mapped into the VM's guest-physical address space, and DMA goes through the IOMMU, so that when the device DMAs to memory, the accesses go directly to the VM's address space. Typically, on the fast path of sending or receiving a packet there is no call into the hypervisor at all; it's all done in software in the driver in the guest. But there are several disadvantages to this approach. The biggest one is that it doesn't scale. You give the VM complete control of the PCI device, so if you want four VMs you need four NICs, and if you want eight, you need eight NICs in your chassis. That doesn't fit typical virtualization, where you want to share resources between VMs: now you're not sharing, and you haven't really gotten the full benefits of virtualization. It's a very useful technique in certain cases, but it's not generally applicable.

What we're going to do with SR-IOV is have a single PCI device again, and it presents itself as normal to the host OS as the PF, the physical function; the host OS attaches its driver to it. What SR-IOV then allows us to do is create virtual PCI functions, and those virtual functions are completely independent PCI functions. They show up in the PCI tree just like any normal PCI device, but they are backed by the same PCI device and have access to that device's resources, so we can share the capabilities of the PCI device through the VFs. The advantage here is that if we want to add more VMs, we just create more VFs, up to the limit of the hardware, and then we again use PCI passthrough to give each VM direct access, so again we don't have to make those calls into the hypervisor.
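To make "VFs show up in the PCI tree like any other device" a little more concrete: per the PCIe SR-IOV specification (a detail the talk does not go into), the PF's SR-IOV capability carries First VF Offset and VF Stride fields, and VF number n (counting from 1) is assigned the routing ID PF RID + First VF Offset + (n - 1) * VF Stride. The sketch below computes that; the helper names are hypothetical, and this is not FreeBSD's pci(4) code.

```c
#include <assert.h>
#include <stdint.h>

/* A PCIe routing ID (RID) packs bus:device:function into 16 bits:
 * bus in bits 15..8, device in bits 7..3, function in bits 2..0. */
static uint16_t rid_make(uint8_t bus, uint8_t dev, uint8_t func)
{
	assert(dev < 32 && func < 8);
	return ((uint16_t)((bus << 8) | (dev << 3) | func));
}

static uint8_t rid_bus(uint16_t rid)  { return ((uint8_t)(rid >> 8)); }
static uint8_t rid_dev(uint16_t rid)  { return ((uint8_t)((rid >> 3) & 0x1f)); }
static uint8_t rid_func(uint16_t rid) { return ((uint8_t)(rid & 0x7)); }

/*
 * Routing ID of VF number vf (1-based), derived from the PF's own RID and
 * the First VF Offset and VF Stride fields of its SR-IOV capability.
 * Returns -1 if the result would not fit in a 16-bit routing ID.
 */
static int32_t vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
    uint16_t vf_stride, uint16_t vf)
{
	uint32_t rid;

	assert(vf >= 1);
	rid = (uint32_t)pf_rid + first_vf_offset +
	    (uint32_t)(vf - 1) * vf_stride;
	return (rid > 0xffff ? -1 : (int32_t)rid);
}
```

With a PF at 03:00.0, an offset of 0x80 and a stride of 2, VF 1 lands at 03:10.0 and VF 2 at 03:10.2. A large enough offset or stride carries into the bus field, which is why VFs can occupy bus numbers beyond the PF's own bus.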
There's another advantage I forgot to mention while I was talking about PCI passthrough: security. With PCI passthrough you have to give the VM complete control of the PCI device, and you have no way of restricting what it can do. For example, with recent Intel NICs you can rewrite the device firmware through the PCI device. There are some restrictions, in that the firmware has to be signed and all that, but there are a lot of other things you could do. Say you're in a cloud-type environment: the providers really don't want to give all of their customers, whom they don't really know and can't necessarily trust, direct access to the hardware and let them spoof MAC addresses and capture packets that don't belong to them. You want to restrict that kind of stuff.

With SR-IOV, because we have this PF device that's controlled by the hypervisor, the PCI device can add security functionality: the PF driver can configure and limit what the VFs can do. That's very important. It also gives you flexibility, because in some environments you do want to be able to capture all packets or spoof MAC addresses. It's not the common use case, but it's a legitimate one in certain cases, so we want that possibility even though it's not always applicable.
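As a rough illustration of the kind of enforcement a PF driver can apply on the host's behalf, here is a sketch of a per-VF policy covering MAC anti-spoofing and promiscuous mode, with the secure settings as the defaults. The struct, field and function names are hypothetical, and real NICs typically enforce such checks in their embedded switch rather than in C code like this; the sketch only shows the decision logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-VF policy a PF driver might keep for each VF.
 * Field names are illustrative, not any real driver's. */
struct vf_policy {
	uint8_t mac[6];     /* MAC address the host assigned to this VF */
	bool anti_spoof;    /* drop frames whose source MAC is not mac[] */
	bool allow_promisc; /* may the guest see traffic for other MACs? */
};

/* Secure defaults for an untrusted VM: anti-spoofing on, promiscuous off. */
static struct vf_policy vf_policy_default(const uint8_t mac[6])
{
	struct vf_policy p = { .anti_spoof = true, .allow_promisc = false };

	memcpy(p.mac, mac, sizeof(p.mac));
	return (p);
}

/* Decide whether a frame transmitted by the VF may leave the device.
 * The Ethernet source address occupies bytes 6..11 of the frame. */
static bool vf_tx_allowed(const struct vf_policy *p, const uint8_t *frame,
    size_t len)
{
	assert(p != NULL && frame != NULL);
	if (len < 14)
		return (false);  /* too short to hold an Ethernet header */
	if (!p->anti_spoof)
		return (true);   /* spoofing explicitly permitted by the host */
	return (memcmp(frame + 6, p->mac, sizeof(p->mac)) == 0);
}

/* Decide whether a guest request to enable promiscuous mode is honored. */
static bool vf_promisc_allowed(const struct vf_policy *p)
{
	assert(p != NULL);
	return (p->allow_promisc);
}
```

The point of keeping this state on the PF side is that the guest controls only the VF; it cannot flip `anti_spoof` or `allow_promisc` itself, while an administrator with a legitimate need can relax them.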
The SR-IOV specification is a PCI extension, and it's similar to the PCI specification in that it doesn't say "in order to send a packet, the driver must write this value to register 5 and place the data in this part of memory". All it says is: here's how you map registers, here's how you access devices, and individual implementations give the registers their interpretation. SR-IOV is the same. What this means is that the driver on the host needs to be extended to support SR-IOV, so my work doesn't apply to a given device until the vendor, or whoever writes the driver, extends the driver to add SR-IOV support. The good thing is that this gives card makers a lot of flexibility: they can add new features, like VXLAN offloads, very easily, because they get to define how the device works.

That leads to a bit of a tricky problem when we want to configure SR-IOV, because we have all these different devices with all these different capabilities. There are SR-IOV NICs on the market now, and I have seen, at least on people's roadmaps, compression offload and crypto cards that are SR-IOV capable. If you're running services in the cloud, in a multi-tenant environment or even your own environment, and you want HTTPS, having compression and encryption available to you in hardware could really help performance, so it's interesting to have. So we have a wide variety of devices with all different capabilities, and I don't want the PCI infrastructure to limit or enumerate "these are all of the things you can support", because then, as new offloads come online, you can't take advantage of them without changing the infrastructure. Even worse, if it were hardcoded in the infrastructure, ABI concerns would mean the change couldn't be MFCed to stable, so new features couldn't show up until the next major release. I definitely don't want that.

But I really feel that we want one unified tool for all devices. We don't want a situation where to configure a Chelsio card you use this tool, and to configure an Intel card you use this Intel-provided tool, and so on and so forth. That also puts a lot of burden on the driver maintainers to duplicate each other's work and reimplement the same sort of thing. I'm much happier having driver maintainers write interesting features in their drivers than each writing their own configuration tool. So the solution I eventually came up with is to use key/value pairs.

The individual PF drivers advertise the capabilities of the device through what I call an SR-IOV configuration schema. The schema tells you which names are accepted in the name/value pairs and what type of value each one takes, so you can say that a value is an unsigned integer, a MAC address, a string, or what have you. Parameters can also be required, in which case the administrator has to specify them in the configuration file, or optional, potentially with a default value. For administrators, the advantage is again a single, unified configuration file and configuration tool. For the device drivers, the benefit is that they offload the work of parsing and validating the configuration onto the SR-IOV infrastructure, so it's all done in one place. Parameters can apply to individual VFs or to the whole PF. For example, if you want to put virtual functions behind VLANs, you can put different VFs behind different VLANs, with some VFs on a VLAN and some not, so it's flexible in that way.
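A schema of the kind described above can be sketched as a table of typed entries, each either required or optionally carrying a default, plus a validator that rejects unknown keys, malformed values and missing required parameters, and fills in defaults. This is a simplified, hypothetical model: FreeBSD's actual schema is built as nvlists through the kernel's PCI IOV API, and the three types here (Boolean, unsigned integer, MAC address) are only a subset chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified schema model (FreeBSD itself uses nvlists). */
enum param_type { PT_BOOL, PT_UINT, PT_MAC };

struct schema_entry {
	const char *name;
	enum param_type type;
	bool required;    /* the administrator must supply a value */
	const char *def;  /* default value, or NULL if there is none */
};

struct kv {
	const char *key;
	const char *value;
};

/* Check that a textual value parses as the given type. */
static bool valid_value(enum param_type t, const char *v)
{
	char *end;
	unsigned b[6];
	char extra;

	switch (t) {
	case PT_BOOL:
		return (strcmp(v, "true") == 0 || strcmp(v, "false") == 0);
	case PT_UINT:
		if (*v < '0' || *v > '9')
			return (false);  /* reject empty, signs and spaces */
		errno = 0;
		(void)strtoul(v, &end, 10);
		return (errno == 0 && *end == '\0');
	case PT_MAC:
		/* Exactly six colon-separated hex bytes, nothing trailing. */
		return (sscanf(v, "%2x:%2x:%2x:%2x:%2x:%2x%c", &b[0], &b[1],
		    &b[2], &b[3], &b[4], &b[5], &extra) == 6);
	}
	return (false);
}

/*
 * Validate cfg against schema. On success return 0 and set out[i] to the
 * effective value of schema[i]: the configured value, else the default,
 * else NULL. On failure return -1 and describe the problem in err.
 */
static int validate(const struct schema_entry *schema, size_t nschema,
    const struct kv *cfg, size_t ncfg, const char **out, char *err,
    size_t errlen)
{
	size_t i, j;

	for (i = 0; i < nschema; i++) {
		/* A required parameter with a default could never be missing. */
		assert(!(schema[i].required && schema[i].def != NULL));
		out[i] = schema[i].def;
	}
	for (j = 0; j < ncfg; j++) {
		for (i = 0; i < nschema; i++)
			if (strcmp(cfg[j].key, schema[i].name) == 0)
				break;
		if (i == nschema) {
			snprintf(err, errlen, "unknown parameter '%s'", cfg[j].key);
			return (-1);
		}
		if (!valid_value(schema[i].type, cfg[j].value)) {
			snprintf(err, errlen, "invalid value '%s' for '%s'",
			    cfg[j].value, cfg[j].key);
			return (-1);
		}
		out[i] = cfg[j].value;
	}
	for (i = 0; i < nschema; i++) {
		if (schema[i].required && out[i] == NULL) {
			snprintf(err, errlen, "missing required parameter '%s'",
			    schema[i].name);
			return (-1);
		}
	}
	return (0);
}
```

Returning a human-readable message rather than a bare error code reflects the talk's point that validating in one shared place lets every driver get good error reporting for free.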
That may have been a little abstract, so hopefully this example will clarify things. This is real-world output for the ixl device, the 40-gigabit NIC from Intel. For the physical function, num_vfs is the number of VFs to create on the device. With SR-IOV, creating VFs on a device is a one-shot deal: you have to create all of your VFs at once, which is basically a constraint imposed by the hardware. In the config file you also specify which device the configuration applies to. Then, for each VF, we can set whether it is a passthrough device or is used by the host; by default a VF is not a passthrough device but an ordinary device on the host. Then there's some device-specific stuff, like the MAC address and parameters for MAC spoofing, promiscuous mode and whatnot. You can see that the Boolean parameters all have a default value: you either allow promiscuous mode or you don't, so they aren't optional. The MAC address, however, can be left unspecified, and what winds up happening with this driver is that the PF driver chooses one for you.

The tool that applies this is iovctl(8). You shouldn't really have to invoke it manually, because there's an rc.d script that will run it during boot for you, but it's useful to know what it does. To configure SR-IOV, it gets the schema from the kernel, which the driver provided at attach time, validates the configuration against the schema for you, and passes the configuration to the kernel. The kernel has to revalidate it, because it can't trust userland, but iovctl validates it first because it's much easier to provide good error messages in userland than to just get EINVAL back from the kernel, which is not a particularly nice interface. Then the PCI subsystem creates the virtual functions, and finally it passes the configuration for each VF down to the PF driver, which does the device-specific steps to bring up the VFs. Creating VFs doesn't really do anything until the PF driver has actually allocated resources to them. In the case of ixl, for example, it has to create a virtual switch interface or something like that, which is basically a virtual port on the switch embedded in the NIC; without that, the VF's packets can't go anywhere outside the NIC. That's all very device-specific and has to be done in the PF driver.

The configuration file is in UCL format, the same format used by pkg.conf, and, as mentioned in an earlier talk, other system utilities will potentially be using it too. It's divided into three sections. The PF section is where the PF-global options go; in this example, the first two, the device and the number of VFs. The DEFAULT section lets you set default values for all of the VFs. Finally, you can have per-VF configuration in VF sections, which override default values set either in the schema or in the DEFAULT section. You don't have to write a VF section if it doesn't have any parameters. You have one of these files for each PF device, so if you have multiple devices, you just list all of your configuration files. The names are arbitrary; I think it would be good practice to name each file after its device and put them all in a subdirectory to keep them in one place, but that's up to the administrator. The names and directories don't matter, as long as the partition holding the files is available during early boot.
I went over this a minute ago, but these three parameters are defined by the infrastructure itself, so they apply to any device: device and num_vfs in the PF section, and passthrough in the VF sections. Individual PF drivers then expose more configuration values as they want, and probably will. Here's a sample configuration file. Again, in the PF section we have to specify the device and the number of VFs we want. This shows why you might want to use a DEFAULT section. Say I'm using this for PCI passthrough, which is the main use case, and I'm allocating 20 VFs; I don't want to have to set passthrough to true in 20 VF sections. The DEFAULT section lets you say that all of your VFs are in passthrough mode, or, if you want them all behind the same VLAN and the NIC supports that, you could specify the VLAN there. Then, if you have configuration that should apply to just one VF, for example a MAC address, or, in my case, one VF that I want accessible from the host and not used for passthrough, I can override the default value in that VF's section. Finally, I don't have to write a VF section at all if I don't have any specific configuration for that VF.
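The precedence rules for the sample file (a VF's own section overrides the DEFAULT section, which overrides the driver's schema default, and a VF with no section of its own inherits everything) amount to a simple lookup chain. In the sketch below, the in-memory structures and function names are hypothetical; the real tool works on parsed UCL objects and nvlists instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one parsed section of the file. */
struct kv {
	const char *key;
	const char *value;
};

struct section {
	const struct kv *pairs;
	size_t n;
};

/* Look key up in a section; a NULL section behaves like an empty one. */
static const char *section_get(const struct section *s, const char *key)
{
	size_t i;

	if (s == NULL)
		return (NULL);
	for (i = 0; i < s->n; i++)
		if (strcmp(s->pairs[i].key, key) == 0)
			return (s->pairs[i].value);
	return (NULL);
}

/*
 * Effective value of key for one VF, in order of precedence: the VF's
 * own section (NULL if the file has none), then the DEFAULT section,
 * then the default from the driver's schema (which may itself be NULL).
 */
static const char *vf_effective(const struct section *vf,
    const struct section *dflt, const char *key, const char *schema_def)
{
	const char *v;

	assert(key != NULL);
	if ((v = section_get(vf, key)) != NULL)
		return (v);
	if ((v = section_get(dflt, key)) != NULL)
		return (v);
	return (schema_def);
}
```

Mirroring the sample configuration: with passthrough set to true in DEFAULT and to false in the host-facing VF's own section, that VF resolves to false, every VF without a section resolves to true, and with neither section present the schema default applies.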
So my message to device driver maintainers is that we need to treat the configuration schema as an API. We cannot break people's configuration files in stable. That means you can't add new required parameters, and you can't change defaults, because you can't change default behavior in stable. On head you can, but try to be nice, avoid it, and don't break people's configurations arbitrarily. Obviously it's always a judgment call whether the benefit outweighs the cost, but try not to screw our users over. If you do have to make a change, adding a new required parameter is a little nicer, because the infrastructure will fail loudly during boot and say that you didn't specify the required parameter. I'm not saying you should often want to do this, but at least it's a very obvious failure that anyone paying attention will notice. Changing a default value is a very nasty thing to do, because it can be very subtle: things look like they're working except in this one case.

I'd also like PF drivers to default to the most secure configuration, that is, to the settings for an untrusted VM: don't allow VFs to spoof MAC addresses, capture traffic that isn't theirs, or have access to that kind of stuff. That covers 90% of use cases, if not more, and as long as you provide configuration that lets the other 10% get what they need by specifying it, we're in a much better situation than somebody spinning up a cloud provider, not quite understanding the parameters, and shooting themselves in the foot without even knowing it by deploying an insecure configuration.

(Answering an audience question about sanity-checking a configuration:) There's a dry run. I'm honestly not sure whether it performs all of the checks, but that would be a good thing to have.
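The rule that the schema is an API can be checked mechanically whenever a driver's schema changes. The sketch below assumes a maintainer has the old and new schemas as simple arrays and flags the breaking changes discussed above; the names are hypothetical, and no such checker is claimed to exist in FreeBSD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical schema entry for one parameter. */
struct param {
	const char *name;
	int type;          /* e.g. 0 = bool, 1 = unsigned integer, 2 = MAC */
	bool required;
	const char *def;   /* default value, or NULL if none */
};

/* Ways a newer schema can break config files written for an older one. */
enum {
	COMPAT_OK = 0,
	COMPAT_NEW_REQUIRED = 1 << 0,    /* old files now fail loudly at boot */
	COMPAT_DEFAULT_CHANGED = 1 << 1, /* old files silently change meaning */
	COMPAT_REMOVED = 1 << 2,         /* old files may now set unknown keys */
	COMPAT_TYPE_CHANGED = 1 << 3,    /* old values may no longer parse */
};

static const struct param *find(const struct param *s, size_t n,
    const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(s[i].name, name) == 0)
			return (&s[i]);
	return (NULL);
}

static bool same_default(const char *a, const char *b)
{
	if (a == NULL || b == NULL)
		return (a == b);
	return (strcmp(a, b) == 0);
}

/* Return a bitmask of COMPAT_* problems when moving from older to newer. */
static int schema_compat(const struct param *older, size_t nolder,
    const struct param *newer, size_t nnewer)
{
	const struct param *p;
	int r = COMPAT_OK;
	size_t i;

	for (i = 0; i < nolder; i++) {
		p = find(newer, nnewer, older[i].name);
		if (p == NULL) {
			r |= COMPAT_REMOVED;
			continue;
		}
		if (p->type != older[i].type)
			r |= COMPAT_TYPE_CHANGED;
		if (p->required && !older[i].required)
			r |= COMPAT_NEW_REQUIRED;
		if (!same_default(p->def, older[i].def))
			r |= COMPAT_DEFAULT_CHANGED;
	}
	/* Brand-new parameters are fine unless they are required. */
	for (i = 0; i < nnewer; i++)
		if (newer[i].required && find(older, nolder, newer[i].name) == NULL)
			r |= COMPAT_NEW_REQUIRED;
	return (r);
}
```

A result containing COMPAT_NEW_REQUIRED corresponds to the loud boot-time failure described as the lesser evil, while COMPAT_DEFAULT_CHANGED is the silent case warned against; either one argues against merging the change to stable.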
So, as I mentioned earlier, the infrastructure is only one part, and there's only so much it can do without work in the individual PF drivers. But that's really no different from ordinary PCI devices: if you go buy some random PCI card and slide it into a machine with no driver, it doesn't do anything, and it's exactly the same for SR-IOV. The ixl driver, which is for the Fortville cards from Intel, their 40-gig as well as the new 10-gig NICs, has full SR-IOV support; I'm running it in production right now, so that's fully supported. The ix driver, which is the older 10-gig driver, has technology-preview support. It is compiled in, but what technology preview basically means is that it has not been through the full QA process. I have had a couple of other driver maintainers talk to me about this, and there seems to be a fair amount of interest. I believe Chelsio will be doing work on it relatively soon, though there's no date set. But I also promised them this infrastructure a year ago and it didn't show up, so it's not their fault there's no support yet. [inaudible] Other questions? [Audience: what kind of support is required on the host, in the CPU, BIOS or northbridge?] I do not believe that we require any BIOS support as of now. The one thing the BIOS sometimes does is allocate memory space regions to devices, but with John Baldwin's work a couple of years back, we can reallocate those, so that just works. On the host CPU, if you want PCI passthrough, you need the same things as for any passthrough, but other than that there's really nothing needed. These really look like real PCI devices, so from the chipset's point of view, as far as I know, it doesn't know they exist; it's really the endpoint that does it, because the endpoint just responds to additional routing IDs,
like PCI bus/slot/function triplets. [Audience comment.] Yeah, that's true too. Right, the comment is that BIOS vendors often disable virtualization support by default. Could they do anything similar to the PCI card? I don't believe so: it just shows up as a PCI device, and I don't think there's a way for the BIOS to intervene there. So if the BIOS is disabling it, you can't use it for passthrough, but you can use it natively, like as a new device, though those use cases are a little more specialized. [Audience question, inaudible.] I haven't tried that, but Intel has been testing the VF drivers, because they had to have VF drivers before we had SR-IOV on the host, so a Linux host with a FreeBSD guest. What I haven't tried is FreeBSD as the host with [inaudible] guests. I had the native drivers, and it was easier for me to work that way than to have a hypervisor in between the two sides. So I was going to go over
a couple of potential use cases. The first one is bhyve; this slide is just to refresh people's memories. We have a bhyve hypervisor, the PF driver will attach to the PF, and we'll have VFs created for passthrough. For the configuration, you've seen this before: you set passthrough to true, either in the default section or on individual VFs. Then there's the PCI passthrough flag to bhyve, depending on whether you're running bhyve directly or using a wrapper script; I'll show that in the demo in a minute. It just shows up as a PCI device: you pass the PCI triplet, the bus/slot/function, to bhyve, and it will map it in. Another potential use case would be VIMAGE jails. For those who are unfamiliar with VIMAGE, it's vnet jails. Jails are FreeBSD's container-style virtualization technology, and FreeBSD has had them for years. VIMAGE allows us to create virtual network stack instances that are independent and have their own routing tables; you can have conflicting IPs and conflicting subnets in the different vnets, and FreeBSD handles that. Typically the way you use a vnet jail is you create an epair interface, and the epair has a foot in both vnets, which lets you route packets from the jail's vnet into the host's vnet, and then the host can bridge it with a real NIC. With SR-IOV, the configuration can be a little bit clearer. You're not really going to see a huge performance boost, because it's all running on the host anyway, but it makes for slightly clearer configurations, and if you have the hardware support, maybe you decide to use it. It may also make the firewall rules a little nicer. I'm not a system administrator, I'm a kernel hacker, so I can't really tell people what they do and don't want, but this would work, and I will show some of it in the demo. And so
to configure it, all you really need is the PF section. If you wanted to set VF-specific parameters, of course you could, but that would be sort of the minimal configuration. The key thing is that in your jail specification you have to enable vnet on the jail and then set its interface to the virtual function's interface. In this case you're not using PCI passthrough, so the VF shows up on the host. The next use case, which is getting a little closer to home for us, is using this with netmap. At Sandvine we make equipment for Internet service providers, so we need to test with the type of traffic that you see on the Internet, or specifically on service providers' networks. So we have our own traffic generation solution, and we need to be able to do scale testing with it, so it needs to scale with our product, [inaudible] 40 gigabits per second. We have our own netmap-like kernel-bypass API, but you could accomplish the same thing with netmap. Traditionally, to generate that much traffic you really need multithreading, and typically the way you do that with NICs is multiqueue, and every year the NIC developers add more to make that work better. The other problem we run into is that in some service providers' networks, especially mobile-type networks, you have all these different encapsulation protocols: you'll see [inaudible], VLANs, GRE, and so on and so forth. No modern NIC that I know of is able to unwind through all these tunnels and actually find the 4-tuple of the packet, and that means multiqueue is useless to you, because the NIC hashes the 4-tuple to pick a queue, and when it can't find it, it puts everything on queue 0, which is a very sad situation. So I
mean, you can, if you want, try to spread it out in software, but then you still have a single thread doing that, and it can be a bottleneck; you also have the problem of synchronization between all the different threads on your software queues. So the solution that we landed on is interface virtualization: with SR-IOV, we create multiple VFs and give them unique MAC addresses, then attach different netmap instances, or what have you, to the individual VFs, and pair them up on the server side and client side of our test network, so that each instance sends to its partner. Because we're using MAC addresses, and the layer 2 address is the very first thing in the packet, before any of the encapsulations, the NIC hardware can now distribute our packets across our traffic generator instances, and that has allowed us to scale our traffic generation solution way up. So that's another interesting use case that you can use this for. And that's the last one; I've been flying through my slides.
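The queue-0 problem and the MAC-based workaround can be illustrated with a toy model. In this hypothetical Python sketch, `zlib.crc32` stands in for the NIC's real receive-side-scaling hash (commonly Toeplitz), packets are plain dicts, and `four_tuple` is `None` when the NIC cannot parse through the encapsulation; the MAC addresses and VF names are made up.

```python
# Toy model of hash-based queue selection vs. MAC-based VF steering.
import zlib
from collections import Counter


def rss_queue(pkt, num_queues):
    """Pick a queue from the 4-tuple; fall back to queue 0 if it is hidden."""
    four_tuple = pkt["four_tuple"]
    if four_tuple is None:
        return 0  # NIC could not see through the tunnels
    # crc32 is only a stand-in for the NIC's real hash function.
    return zlib.crc32(repr(four_tuple).encode()) % num_queues


def mac_steer(pkt, vf_by_mac):
    """Deliver by destination MAC, which precedes any encapsulation header."""
    return vf_by_mac.get(pkt["dst_mac"], "pf")


vf_by_mac = {f"02:00:00:00:00:0{i}": f"vf{i}" for i in range(4)}
tunnelled = [
    {"dst_mac": f"02:00:00:00:00:0{i % 4}", "four_tuple": None}
    for i in range(8)
]

print(Counter(rss_queue(p, 4) for p in tunnelled))
# Counter({0: 8})
print(Counter(mac_steer(p, vf_by_mac) for p in tunnelled))
# Counter({'vf0': 2, 'vf1': 2, 'vf2': 2, 'vf3': 2})
```

Because each generator instance owns a VF with its own MAC address, the spread depends only on the outermost header, which the NIC always parses.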
So, on to my thanks slide. A project of this magnitude doesn't just happen in a vacuum, so big thanks to the people who took the time to review it, especially Mark Johnston, who spent time examining it, and [name inaudible], who spent hours reviewing my code and then even more hours locked in a meeting room with me as we walked through the code again line by line. It was long, it was painful, and they don't get any of the credit at the end, but they helped get this to the quality level it needed to be at. Thanks also to John Baldwin, and to Jack Vogel over at Intel. John helped review the PCI subsystem changes, made several very important suggestions, picked up on some very important bugs, some nasty ones, and was a big help. Jack reviewed the patches to the Intel drivers and made sure to get those shepherded into the tree. Several members of the FreeBSD documentation team helped with my man pages and spent a fair amount of time on the process, so thank you. I know I'm forgetting several people here, but many people chipped in here and there and pointed out stuff. And I do need to apologize to all the subscribers of freebsd-net: when I created the 21 reviews in Phabricator, I subscribed net to them. I did not at that time understand the level of spam that Phabricator produces, so I do apologize, and thank you for your patience. I have learned my lesson: now if I have something that net might care about, I will forward the initial e-mail to it, and thereafter, if you care, you can subscribe to it yourself. And finally, thanks to Sandvine, who were very supportive from the start of this work in getting it into FreeBSD. I think it's a great thing for FreeBSD to have.
And now on to my demo. It's a pretty simple setup: I have two machines. The first one is my virtualization host, the one I'll be doing the demo on, and I will show using VFs natively, from a vnet jail, and from bhyve. The second machine is sort of an endpoint. Thanks to Dillon, who was my remote hands at Sandvine yesterday and got the setup working for me. [inaudible] at the conference for actually figuring out how to get the dual output from my FreeBSD boot to work. I have no idea how X works, so here I am in Windows 8 having no idea what I'm doing. Sorry, the question? [inaudible] Yes?
OK, so the question is: how are the names determined? The naming is sort of a cooperation between the individual VF driver and the device subsystem. The device subsystem is going to enumerate the PCI devices, and if there are devices it can attach to, it will attach the driver to them. So the first host-attached instance will show up as drivername0, the next as drivername1, and so on. The Intel VF driver has its own name, ixlv, so alongside the host's ixl0 and ixl1, the native devices, which are the PFs, the VFs don't show up as ixl2 and up: the first VF device is ixlv0, unit 0 of the VF driver. Unfortunately, with my infrastructure it's really up to the device subsystem now. Within the VM, it's going to show up as ixlv0. [Audience follow-up, inaudible.] I have no idea, because that's all controlled by the VF drivers, which were mostly in place before my work was done. I think the way the Fortville driver names its VFs is a nice, easy-to-remember convention, so I would encourage driver writers to stick with that.
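The naming rule in this answer, where each driver numbers its own instances from 0, can be sketched as follows. This is a hypothetical Python model, not newbus code; the driver names `ixl`, `ixlv` and `ppt` are assumed from the demo, and the model assumes an attach takes the lowest unit number not already in use for that driver.

```python
# Hypothetical model of per-driver unit numbering; not newbus itself.


class UnitNumbering:
    """Each driver name has its own units; attach takes the lowest free one."""

    def __init__(self):
        self._in_use = {}  # driver name -> set of unit numbers in use

    def attach(self, driver):
        used = self._in_use.setdefault(driver, set())
        unit = 0
        while unit in used:
            unit += 1
        used.add(unit)
        return f"{driver}{unit}"


host = UnitNumbering()
pfs = [host.attach("ixl") for _ in range(2)]  # native PF ports
# Demo layout: VF-0 and VF-1 stay on the host, VF-2 and VF-3 go to ppt.
vfs = [host.attach(d) for d in ("ixlv", "ixlv", "ppt", "ppt")]
guest = UnitNumbering()  # a bhyve guest runs its own, independent kernel

print(pfs)                   # ['ixl0', 'ixl1']
print(vfs)                   # ['ixlv0', 'ixlv1', 'ppt0', 'ppt1']
print(guest.attach("ixlv"))  # ixlv0
```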
Now, is the text readable from the back? No? I don't think I can make it any bigger; this is at a very high resolution, so it's probably not readable. [inaudible] I could switch [inaudible] if I really need to. So here's
our virtualization host. Now, if you want to see which PF drivers are currently attached, you can list /dev/iov, and we can see that on this machine I actually have [inaudible] 10-gig ports; only ixl0 is actually wired up, but that's the hardware I have. Those are the physical functions, and these names here are exactly what you want to put in the configuration file; I'll show it in a second. But before that, you can look at the capabilities of the device with pciconf, and there's SR-IOV in the middle there. This is just reading in the capabilities of the device, so it doesn't tell you that SR-IOV is supported in the PF driver; it tells you the device supports it. The only thing people typically care about is the second line here, which tells you the number of supported VFs, although even then you have to be a little bit careful. For example, with ixl I believe it advertises something like 64 supported VFs, but if you actually created that many, the PF driver would not be able to allocate any queues for itself, so the driver limits that even further, and how many VFs you can create depends on how many queues you assign to them. This gives you an idea of what the hardware limit actually is, and the rest of this is pretty technical details, just for my own debugging purposes. Now we can look at my configuration file. You've seen this 20 times now, but again, ixl0 is just what you grab from /dev/iov, and as I said, I want to use VF-0 on the host and VF-1 in the jail; those aren't passthrough devices, but everything else I want for bhyve, so I have my default value for passthrough set to true.
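The caveat about the advertised VF count can be turned into rough arithmetic. In this hypothetical Python sketch, `total_vfs` plays the role of the TotalVFs value reported in the SR-IOV capability, while the queue counts are made-up numbers; real PF drivers apply their own allocation rules.

```python
# Hypothetical estimate; real PF drivers apply their own queue rules.


def max_creatable_vfs(total_vfs, total_queues, pf_queues, queues_per_vf):
    """Smaller of the advertised VF limit and what the spare queues allow."""
    if queues_per_vf <= 0:
        raise ValueError("queues_per_vf must be positive")
    spare = max(total_queues - pf_queues, 0)  # queues left after the PF's share
    return min(total_vfs, spare // queues_per_vf)


# Made-up numbers: 64 VFs advertised, 384 queues, 64 reserved for the PF.
for per_vf in (4, 16):
    print(per_vf, max_creatable_vfs(64, 384, 64, per_vf))
# 4 64
# 16 20
```

With few queues per VF the hardware limit is what binds; with more queues per VF the queue pool does, which matches the point that the advertised number is only an upper bound.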
That's exactly what this does, and that's how to read this. [Audience question, inaudible.] Yeah, I thought about that. At a minimum, I want all these options to be documented in the man page for the device, but [inaudible], so I'm not even being a good example here. It would be possible: ultimately this data comes out of the driver, so it's not the greatest place to put it, but on the other hand, other tools like this do print full descriptions, so that does make sense, and I'd have to think about what to do with the API to make that happen. But I do think that makes sense. [inaudible] OK, that's the schema and the config file. Alright, so now, how to create the VFs: you have this rc.conf variable, iovctl_files, that specifies which config files you want, and then either reboot, or, because we don't want to wait five minutes for the BIOS, start it by hand. And now if we go look in pciconf, you can see that these bottom four entries are new PCI devices. The first two are
attached to the host driver, ixlv0 and ixlv1, and the next two use the passthrough driver for bhyve, the ppt driver. You can see these are all actual PCI devices, 5/0/16, 17, 18 and 19, and they share the same PCI bus as their parent device.
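Why new VFs appear at addresses like 5/0/16 can be sketched from the PCIe SR-IOV capability: VF n (counting from 1) gets routing ID PF RID + First VF Offset + (n - 1) * VF Stride. In this Python sketch, the offset of 16 and stride of 1 are hypothetical values chosen to reproduce the demo's addresses, and the ARI interpretation (the whole low byte is the function number, slot 0) is an assumption about how such addresses get displayed; without ARI the same routing ID would decode as slot 2, function 0.

```python
# Sketch of SR-IOV VF routing-ID arithmetic; offset/stride values are made up.


def bsf_to_rid(bus, slot, func, ari=True):
    """Pack bus/slot/function into a 16-bit routing ID."""
    if ari and slot != 0:
        raise ValueError("with ARI the slot (device) number must be 0")
    devfn = func if ari else (slot << 3) | func
    return (bus << 8) | devfn


def rid_to_bsf(rid, ari=True):
    """Unpack a routing ID; with ARI the whole low byte is the function."""
    bus, devfn = rid >> 8, rid & 0xFF
    return (bus, 0, devfn) if ari else (bus, devfn >> 3, devfn & 0x7)


def vf_rids(pf_rid, first_vf_offset, vf_stride, num_vfs):
    """VF n (1-based) is at PF RID + First VF Offset + (n - 1) * VF Stride."""
    # Masking to 16 bits lets large offsets carry into the bus number.
    return [(pf_rid + first_vf_offset + i * vf_stride) & 0xFFFF
            for i in range(num_vfs)]


pf = bsf_to_rid(5, 0, 0)
# Hypothetical offset/stride chosen to land on the demo's 5/0/16..5/0/19.
for rid in vf_rids(pf, first_vf_offset=16, vf_stride=1, num_vfs=4):
    print(rid_to_bsf(rid), rid_to_bsf(rid, ari=False))
# (5, 0, 16) (5, 2, 0)
# (5, 0, 17) (5, 2, 1)
# (5, 0, 18) (5, 2, 2)
# (5, 0, 19) (5, 2, 3)
```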
And of course now I have my devices in ifconfig. This laptop comes with a numeric keypad, which is wonderful; what it doesn't come with is an LED for the Num Lock key. Apparently that was too expensive for a laptop. Yeah, I know, I'm sure they saved themselves all kinds of money. [inaudible] And just to show that this is working [inaudible]; if I could type, it would help. And we can see packets really are coming out of this thing. And
similarly, I have my jail.conf file, with vnet enabled and the interface that's passed in set to ixlv1. You can tell I copied and pasted this from my slides, just to make sure it works. And the rest of this, again, I'm not an administrator; I copied and pasted it from the Internet. I hope it is somewhat correct, or correct enough; the jail seems to exist, so I'm happy. Then I can start my jail, and then I can go into my jail. Now in my jail we can see that I have just the VF; it's not really passed through, but starting the jail has moved ixlv1 into the jail's vnet. If I go back to the host again, you can see that ixlv1 has disappeared: it's no longer in the host's vnet, it's now in the jail's vnet. [inaudible] And finally,
I have bhyve VMs [inaudible]. If I go to the host again, and see if I can remember: it's sh /usr/share/examples/bhyve/vmrun.sh, [inaudible], then my disk, and then it's -p, and then I just choose the PCI device that I actually want to pass in, for example 5/0/18, and then I just give it a name. And now we can see that in the VM, as we just discussed, the virtual function has been attached
as ixlv0, because this is a completely independent kernel, and so it always starts at 0. And once again I have network connectivity, this time through PCI passthrough from bhyve. Thank you. [inaudible] So, for comparison, native is around 90 ms, and let's see if we can [inaudible]... so about a factor of 3. And I bet the jail is about the same speed; yes, the jail and native are about the same speed, which makes sense, because there isn't any additional layer in between. I wouldn't be surprised if the difference is just interrupt latency: in order to work with PCI passthrough, the interrupt has to be, at least with this version of the hardware, actually received by the host OS and then injected into the guest, so there is some additional latency there. Alright, if there are no further questions, thank you.