
Ethernet Switch Framework


Formal Metadata

Title
Ethernet Switch Framework
Subtitle
Fully utilize your WLAN router
Number of Parts
24
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Designing and developing the Ethernet Switch Framework for FreeBSD. FreeBSD is making great strides to be fully functional on many typical WLAN routers. Furthest along is support for devices based on the Atheros series of System-on-a-Chip products. Thanks to Adrian Chadd's relentless work, many devices can be used with FreeBSD-current for routing between the LAN and WLAN interfaces. The Ethernet Switch Framework closes one of the last remaining driver gaps to fully enable building an embedded FreeBSD version for such devices. Currently under development, the Ethernet Switch Framework enables configuration of built-in Ethernet switch controllers. This allows users to create powerful networking setups without any additional hardware. Even though these routers are typically not very expensive, the switch controllers offer a number of features typically only found in more expensive enterprise equipment. This allows users to create interesting and powerful network setups at home or in small offices. This talk will present the current state of development and the architecture of the driver framework, and will detail the implementation of a typical switch driver. It will also go into some of the architectural challenges that needed to be solved to deal with hardware configurations that are typical for embedded systems but uncommon in the world of regular desktop and server systems.
Transcript: English(auto-generated)
All right, welcome. Last December, I read a very exciting post by Adrian
that he had managed to actually boot FreeBSD on a couple of Atheros-based pieces of hardware, one of which I actually owned. And I thought, well, that's great. Finally, I can replace my OpenWrt software
build with the operating system I actually enjoy using, which is FreeBSD. So I immediately started looking at that and noticed that this particular piece of hardware has a switch chip which is not properly
initialized by the bootloader. So in order to really use it as a wireless router, you need to somehow initialize the switch. So I figured, well, I'm somewhat familiar with the OpenWrt code, so why don't I give it a shot and see if I can come up with some kind of driver that does the initialization and maybe
some VLAN type of configuration. That's what led to this talk. So why do I want to do this? Because there's a lot of little devices out there that are really cheap from about $30 that
are somewhat powerful. And being able to run FreeBSD on them enables a couple of things that you wouldn't necessarily get. Because they have ethernet and Wi-Fi, you can use them as routers. And as I'm going to talk about, there's a couple of things
that you can do with the hardware that the stock firmware usually doesn't support. Many of these devices also have a USB port, which means that the limitations of the hardware in terms of storage, for example, can be overcome by plugging in a USB stick or by plugging in a hard drive or attaching
all kinds of interesting hardware to that USB port. So as I said, the firmware these devices usually come with is limited. It's made as a consumer device. It's usually set up in a way that
kind of does the usual stuff you would expect from such a device. But it's of very limited use for the really interesting stuff, like actual configuration management that goes beyond what you can set up over the web interface. You usually cannot log in via SSH.
And of course, if you have a cool idea to do something interesting with some kind of I/O device, USB device, like a camera or an X10 interface, or I have no idea, you can't do that because you cannot actually load your own software on that. So there are a number of Linux-based distributions,
specialized distributions, that already support all these additional applications. But simply, they're not FreeBSD. So I want BSD on these boxes. So I'll point out Adrian.
He has all kinds of hardware there. That's the stuff we're talking about. What Adrian has is some reference designs. So the usual consumer hardware you buy in a store looks slightly different. But the internals are almost identical because almost all manufacturers simply take the reference design and produce their own PCB, and that's about it.
So you can make it better, right? No, you can open yours a little better. Ask a serial port, mine is a bit of port. So as I said, Adrian did most of this work already.
And so we're just talking about adding the last little bits and pieces to actually make this into something that works out of the box. I'm going to talk about the ethernet switch that these devices usually have. Adrian is working on supporting all the different variations
of wireless controllers. I think he has most of the work done, but there might be one or two pieces missing. One big challenge is that most of these devices only have eight megs of flash or even four megs of flash. And I don't know whether you have run FreeBSD on a system that has that little storage.
Last time I tried that, that was a different century. So of course, you can pare it down, but nobody has really looked into trimming FreeBSD, the kernel and the userland, down to such a small size in a long time. So that's something that needs to be addressed in one way or another.
How do you present it? Well, is that still FreeBSD? So well, I mean, there are ways. It's not impossible. It's just that work needs to be done. If you run a make world, it's nowhere near eight megs.
So of course, with the flash-based storage, you probably want to have slightly different configuration mechanisms. You want to run with a read-only file system mostly. You want a way to write that configuration information in a sensible way. That doesn't really work necessarily with the standard rc system.
So that's something we probably want to address in one way or another as well. And of course, one thing that we are apparently going to get pretty soon is an actual flash file system. Right now, we run with just a UFS image that is mounted read-only. And we just rewrite that entire partition
whenever we want to change some configuration data. With a proper flash file system, we can just run basically normal userland with read-write access to that file system. There we go. So I'm going to talk about four main topics.
First of all, what kind of hardware is this actually? What does it consist of? What properties can you expect it to have? I'm going to talk about some aspects of this Ethernet switch framework architecture, which is a pretty big word for actually what is supposed
to be a very small driver. But we ran into a couple of issues, which I find quite interesting and I certainly didn't expect to run into. I'm going to talk about the configuration interface mostly in terms of what API can you use to actually control the switch and what kind of model is behind that.
And then, of course, I'm going to have a quick outlook of what we want to do in the next couple months. So what's in the box? Adrian can maybe pass around one of these things if people want to look at it. This is what a TP-Link 3420 looks
like. That's based on a single-chip design that's actually hidden under this heat sink: a system on a chip with CPU and I/O and all the basic stuff. We have a ROM, a flash ROM, that's
connected to the CPU via SPI. We have one RAM chip, two mix of RAM, and we got the wireless radio chip, which is connected in this case via PCIe, a single lane PCIe interface. Most of what you actually see on the board is more or less passive stuff and like a DC-DC converter,
and that's pretty much it. So all the interesting bits are in very small chips. So conceptually, this looks like this. We have the CPU in the center. We have RAM and some flash for storage. So one or two USB ports. We have some GPIO pins, which are usually just used
for some LEDs to indicate some state of the device, and usually two buttons, one to reset the thing to factory default configuration, and a second one to start WPS. Yes, we have one or two ethernet interfaces
depending on the system on the chip that's in use. To that, usually we have connected a switch chip and, of course, a wireless interface. So in case of this router, the picture that I just showed,
that is quite highly integrated. The white box shows what's in the system on a chip. So the ethernet chip is actually part of the system on a chip. External are only the RAM, the ROM, and the wireless radio chip.
So that's the feature set, pretty common. In this case, we have two ethernet ports. One is actually directly connected to a plug on the device. The other is connected to the built-in switch controller. And the switch controller can do
a number of interesting things, which I'll get into in a minute. There's another device, just to give you some idea of what it could look like as well. There's a different system on a chip that has different functions integrated. And here we have an external switch chip.
And there's actually quite a multitude of these kinds of configurations. There's a lot of switch chips out there. And the frustrating part is that the switch chips are based on kind of the same IP blocks. But of course, they differ in little details. And usually, it's annoying little details
that they differ in. So trying to build a driver that supports an entire class of switch chip might not be that easy. Again, some basic features of this device. Interesting about this one is that it only
has a single ethernet interface. And all the five ports on the back of it are actually connected to the switch. And this is the one that got me started because the port that you plug your cable modem or your DSL modem into is on the switch. And if the switch is not initialized properly,
that port is connected to the four other LAN ports. So if you switch the thing just on and it doesn't get initialized by the OS, your local computers are just connected to the cable modem, which usually leads to things that you don't want to happen.
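To make the risk described here concrete, the following is a minimal sketch of port-based forwarding on a five-port switch with one CPU port. It is not code from the framework: the port numbering, the function names, and the idea of storing a per-port forwarding set are all assumptions made for illustration.

```python
# Hypothetical model of per-port forwarding on a 5-port switch.
# Port numbers and names are assumptions for illustration only.
WAN_PORT = 0
LAN_PORTS = [1, 2, 3, 4]
CPU_PORT = 5


def power_on_default():
    """Uninitialized switch: every port may forward to every other port."""
    ports = [WAN_PORT, *LAN_PORTS, CPU_PORT]
    return {p: set(ports) - {p} for p in ports}


def isolate_wan(forward_map):
    """Return a new map in which the WAN port only talks to the CPU port."""
    new_map = {p: set(peers) for p, peers in forward_map.items()}
    new_map[WAN_PORT] = {CPU_PORT}
    for lan in LAN_PORTS:
        new_map[lan].discard(WAN_PORT)
    return new_map


def can_forward(forward_map, src, dst):
    """True if frames arriving on src may leave through dst."""
    return dst in forward_map[src]


unconfigured = power_on_default()
configured = isolate_wan(unconfigured)
```

In the unconfigured map a LAN port can reach the WAN port directly, which is the situation the talk warns about; after `isolate_wan` that path is gone while LAN-to-LAN traffic still works. A real switch would typically express this with VLAN tables or port masks in chip registers.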
OK, so some architecture. We have hardware-specific drivers for each switch chip. And we're currently hashing out how much commonality there is between the individual ones and whether we have a class that
can handle an entire family or whether we actually have to have different drivers. And Adrian, over the past couple of hours, basically, has done some work on that and is probably doing that right now.
We want to have a generic in-kernel API to do switch configuration and get some information from the switch into the kernel, into other subsystems. Of course, we want to expose an ioctl interface to userland so you can run a command line utility to do whatever
you need to do with the switch. One major thing that we want: all these switches have standard PHYs. And we want to reuse the existing PHY code and the miibus code for the PHYs as much as possible in order to take advantage of all the drivers that are already there.
For most of the integrated switches, it wouldn't absolutely be necessary to do that because they just work fine with the standard ukphy code. But Adrian's employer has a couple of more interesting PHYs that might turn up in these kinds of products.
So it's actually going to be beneficial to be able to use all the PHY drivers that are in the system. Here's one switch chip. And that is pretty standard layout. In the center, you have the switch controller
with a couple of MACs. The switch controller takes care of forwarding stuff between the ports. You have five physical ports with a PHY each. And you have one CPU port where you have a back-to-back MAC connection. The switch controller is hooked up to the CPU via I2C
interface. Almost I2C, but not quite. And on the CPU side, that's actually on just two GPIO pins because the CPU doesn't have an I2C interface. Well, a correction: it does, but TP-Link
wired it to the wrong GPIO pins. OK, so the system on a chip could do it in hardware, but TP-Link, for whatever reason, decided not to do it that way. So when I started out in December,
I thought, OK, how can I do this? And so I started reading code. And I was quite amazed that almost all the bits are there. The only thing that I really needed to even touch is the actual switch driver. So the entire attachment, the entire hardware, is already in the tree.
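As a rough picture of what driving a two-wire bus from two GPIO pins involves, here is a minimal sketch that shifts one byte out over a data line and a clock line. It is not the in-tree bit-banging code and deliberately leaves out start and stop conditions and acknowledge handling; `FakePins` and `shift_out_byte` are hypothetical names.

```python
class FakePins:
    """Stand-in for two GPIO lines; records what a receiver would sample."""

    def __init__(self):
        self.data = 0
        self.clock = 0
        self.sampled = []  # data level seen at each rising clock edge

    def set_data(self, level):
        self.data = level

    def set_clock(self, level):
        # A receiver latches the data line when the clock goes low -> high.
        if level == 1 and self.clock == 0:
            self.sampled.append(self.data)
        self.clock = level


def shift_out_byte(pins, value):
    """Clock out 8 bits, most significant bit first."""
    for bit in range(7, -1, -1):
        pins.set_data((value >> bit) & 1)  # change data while clock is low
        pins.set_clock(1)                  # receiver samples on rising edge
        pins.set_clock(0)


pins = FakePins()
shift_out_byte(pins, 0xA5)
```

After the call, `pins.sampled` holds the eight bit values in the order a receiver would have latched them. Writing and debugging this kind of loop for every transaction type is the tedious part that existing bus code avoids.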
There were just a couple of places that needed slight adjustment. The major thing was the IIC bus. Because the switch chip doesn't implement proper I2C, but some bastardized form of it, I needed to relax the enforcements that
are in the bus code to make sure that the transaction actually works. So that was actually quite nice because doing all this bit banging by hand and doing all the logic behind it and just writing that code is highly annoying and error prone. So being able to just plug this together via hints
is really cool. Leading up to another switch chip, I want to quickly explain how PHYs normally work. Because we're going to get into a situation or into a discussion where we found that our model in the tree
does not work the way we want it for the switch chip. So the idea is that the ethernet card has the transmit and receive logic in a media independent interface form. So you actually don't have to deal
with the specifics of how the bits are actually encoded on the actual transmission medium. And they thought, well, we come up with a system where you can actually have multiple ones on a single interface, because you might want to be able to switch between them. Or it's cheaper to manufacture a single card
that can hook up to multiple things. So you have the actual data transfer lines. Those are here in the back. And those form one bus that goes to all the PHYs. And you have a second mini bus of two lines, MDIO and MDC, which is I/O and clock.
And those go to the PHYs as well. And that is the way the CPU, through the ethernet controller, can tell the PHYs what to do. What's important to remember is that only one of the PHYs can actually be actively involved in any data transmission. So only one of the three can be active at any time.
And that is reflected in our driver model. There's a couple of switches that use our driver model. If you have two ethernet interfaces, each ethernet interface gets its own miibus instance.
And attached to that is some PHY or even multiple PHYs, depending on the actual controller and what PHYs are connected to it. And that even auto-probes; there's up to 32 possible PHYs that can be hooked up to the system or to each interface.
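The auto-probing mentioned here can be pictured as a scan over all 32 PHY addresses on the management bus. The sketch below is a simplified illustration and not how miibus actually probes: it assumes a hypothetical `FakeMdioBus`, reads the two standard PHY identifier registers (2 and 3), and treats all-ones or all-zeros as "nothing there". The ID values used in the usage example are made up.

```python
PHY_ID1, PHY_ID2 = 2, 3   # standard PHY identifier registers
NO_DEVICE = 0xFFFF        # an undriven MDIO data line reads back as all ones


class FakeMdioBus:
    """Hypothetical bus: maps PHY address -> {register: value}."""

    def __init__(self, phys):
        self.phys = phys

    def read(self, phy_addr, reg):
        return self.phys.get(phy_addr, {}).get(reg, NO_DEVICE)


def probe_phys(bus):
    """Scan all 32 addresses and return {address: 32-bit PHY identifier}."""
    found = {}
    for addr in range(32):
        id1 = bus.read(addr, PHY_ID1)
        id2 = bus.read(addr, PHY_ID2)
        if id1 == NO_DEVICE and id2 == NO_DEVICE:
            continue  # nobody answered at this address
        if id1 == 0 and id2 == 0:
            continue  # treat all zeros as invalid as well
        found[addr] = (id1 << 16) | id2
    return found


# Made-up identifier values, for illustration only.
bus = FakeMdioBus({0: {2: 0x0123, 3: 0x4567}, 4: {2: 0x0123, 3: 0x4568}})
phys = probe_phys(bus)
```

A scan like this only makes sense when whatever answers on the bus really behaves like a PHY, which is exactly the assumption that some switch chips break.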
So the miibus interface is, well, first I should say, miibus actually is not FreeBSD specific. It's shared by all BSDs, which immediately leads to somewhat of a challenge to actually integrate that
into miibus. And that means that the normal configuration mechanisms, which other BSDs don't have, are kind of in conflict with miibus, because we're trying to stick them
into miibus in ways that are compatible with miibus but not necessarily so. One way in which you can actually see that is that we have the newbus attachment with our miibus interface methods for accessing the MDIO control
registers, reading and writing registers of the PHYs. And at the same time, some methods by which miibus, when it detects some PHY change, informs the ethernet driver that it
needs to adjust its own MAC, for example, to adjust to a different link speed. At the same time, the ifmedia infrastructure has a number of callbacks that do similar things. And so miibus not only uses some newbus methods
to communicate with the interface, it also directly calls into ifmedia and gets called by ifmedia directly without the interface actually taking part in that.
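The notification path described here, where the bus layer tells the Ethernet driver to reprogram its MAC after a PHY change, can be sketched as a simple callback. This is a toy model under stated assumptions: `FakeMac`, `FakeMiiBus`, `tick`, and the `(speed, duplex)` tuple are hypothetical, and the callback name `statchg` is only meant to suggest a status-change hook, not to reproduce the real interface.

```python
class FakeMac:
    """Hypothetical Ethernet MAC that must match the PHY's link settings."""

    def __init__(self):
        self.speed = None
        self.duplex = None

    def statchg(self, speed, duplex):
        """Callback: reprogram the MAC to match the negotiated link."""
        self.speed, self.duplex = speed, duplex


class FakeMiiBus:
    """Hypothetical bus layer sitting between the PHY and the MAC driver."""

    def __init__(self, mac):
        self.mac = mac
        self.last = None

    def tick(self, phy_status):
        """Poll result from the PHY; notify the MAC only on a change."""
        if phy_status != self.last:
            self.last = phy_status
            self.mac.statchg(*phy_status)
            return True
        return False


mac = FakeMac()
bus = FakeMiiBus(mac)
first = bus.tick((100, "full"))
repeat = bus.tick((100, "full"))
changed = bus.tick((1000, "full"))
```

The point of the sketch is the direction of the call: the MAC driver does not poll the PHY itself, it gets told. That is the path that has to keep working when the PHY sits behind a switch.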
So in order to be able to use the existing PHY code for our switch chips, we needed some way, or I needed at that point, some way
to fake up an interface. Because miibus expects to be talking to a standard ifnet and it expects it to be there. So I figured I'd just try, and surprisingly that actually works. So you can initialize an ifnet and you just
don't hook it into the rest of the system. It sits there. Nobody knows about it except our own private code. And miibus is fine with that, so we can actually use the PHY driver, have callbacks into the switch code, and do all the things that you expect to do with the PHY,
like change the speed and duplex settings and shut down the port or stuff like that. So that was quite nice. One question for you guys might be, is that actually OK? Can I initialize an ifnet and expect it to work?
Or is there somewhere where that might be kept track of and which might get into the way of things because we suddenly have an ifnet that's not linked anywhere? Switch controllers that are connected to MDIO.
Earlier switch chips presented themselves as if they were PHYs. You have a question? OK, so you were raising your finger. So from a software perspective, they
look like a standard PHY. And we actually have a driver for that for some Realtek chip in there. I don't remember quite what the model number is, but there's some stubs to actually do some initialization of the switch and have it work as a PHY.
That's fine, but we don't have the infrastructure to really talk to the switch part and configure it. It's somehow hard-coded in that driver. There's other models of switches, some of them in the Atheros line, that are using the MDIO bus
but don't look like PHYs at all. They just reuse this register access space, and they also do not present a single PHY. They use the entire address space of the 10 bits of address that are there. And of course, that plays havoc with miibus detection
because there's worst case scenarios. It detects that there should be a PHY, but in fact, it isn't one, and it's not working like one at all. So that's one problem. So how does that actually look?
That's a model of the switch that is in one of the embedded chips, and we have on the right-hand side our PHYs that are connected to the physical ports. And we have built into the system on a chip
Ethernet ports, and one thing that is interesting and gave us great reason for debate is this little thing here. So it has some control registers for an MDIO interface, but they're actually
not hooked up to anything. So no problem. Well, you just use that from GE1, right? But this one here is not talking to the PHYs. It's talking to the switch controller. So GE0 is directly connected to a single PHY. So that's the WAN port on that router.
So the data flows directly between the PHY and the gigabit controller. But if this PHY sends us a link event, that interface needs to know about that. So it can adjust its settings. How can those two actually communicate?
Well, the only way they can do that is through the switch controller onto a second MDIO controller that is implemented in the switch, in the switch register space, and then get onto this MDIO bus to talk to this PHY. That doesn't really work well with our existing code.
So the device attachment tree is something like this, where arge0 somehow needs access to a miibus that's actually attached to the switch, because it hangs off
the MDIO control registers that are in the switch. So we need something in here that somehow enables this kind of communication. And to my great surprise, this has never come up before.
This apparently was the first time that there's any piece of hardware that cannot be modeled by a tree. There's actually a need for having some additional communication between nodes in the device tree. It's been known for a long time. OK. I asked on IRC, and nobody could tell me
of an example where that problem came up and how it was solved then. The problem has come up multiple times, and how it was solved, I think, during this time. OK. Because it's invented in the last decade. Sure. Of course.
So I tried a couple of things. I wrote like five different prototype implementations of how to deal with this. And we eventually ended up with two possible ways of doing this. So Alexander decided he's going to write a special PHY driver
that takes care of this. So this PHY driver has some internal knowledge of what switch it actually wants to talk to in order to issue register reads on the real PHY. But it's going to present itself as just a standard PHY on a miibus.
And that miibus is just normally attached to GE0. So in terms of the tree, you see that's very nice and very clean. It has a couple of drawbacks. First, it needs to have access to that other node. It needs to find that. That needs to be solved.
So there's different ways on how to do this. He found a very simple way that worked in his case. So one problem with this is that it replaces the existing PHY drivers because to miibus, it's just a PHY driver.
So whatever features the actual PHY requires need to be re-implemented in this driver. Plus, I'm not entirely certain that all the features that a PHY driver can present are actually going to work in this way because the miibus generic code also accesses the MDIO registers directly, bypassing the PHY.
So I'm not sure how to deal with that. But he did get it to the point where he can actually get link status of the ports. And that in itself is very useful to have. The second option that I decided to implement
is a bit more complicated, so I split this up a bit. So we have an attachment to GE0. It has an MII proxy connected to it, which is the new piece. And connected to the proxy is the miibus and some PHY
driver according to the normal mechanism that miibus uses. So all the interesting bits happen here. So how does that get to actually talk to the right MDIO lines? Well, we have a new driver that only implements the MDIO
register access. It's the exact same interface as the miibus readreg and writereg methods, but it splits that out from miibus. So it exports the generic MDIO bus. And connected to that is the actual switch driver. I added that in here because that
is the address of the MDIO controller on GE1, not GE0. And because that is kind of generic, this driver does this very same thing.
It has its own MDIO access to the switch hardware that controls the MDIO bus in the switch and then exports another MDIO bus to any consumer that might be interested in that. And we have a second piece, which is an MDIO proxy.
Of course, these two are connected. That's the whole trick. So the GE0 interface gets a hint, has additional code. If that hint is present, it instantiates the MII proxy and tells it to which MDIO proxy it wants to connect.
That turned out to be actually a lot more complicated than I first thought, basically for two reasons. Newbus has an API that guarantees that you cannot obtain device references outside of the context in which
it can guarantee that they won't go away. So you cannot simply ask newbus, oh, give me the pointer to the device with, I don't know, arge1. That call doesn't exist. It exists internally, but not externally. And at first, I thought, oh, that's annoying.
I'll just add that. And then I thought about it and realized, no, no, that's probably on purpose. Once I have that pointer in hand, I will never get notified when that device goes away. So I would have a dangling pointer if that device ever gets unloaded. So I need something that actually
takes care of dynamic loading and unloading and attaching and detaching of drivers. So that's why there are these two halves that have an internal connection. And what the hint does is actually tell this device, look out and see when that attaches,
create that connection. And when one of them detaches, that connection is broken. And then calls to this proxy simply return an error because the connection has been broken. So there will be no dangling pointers. Of course, in the embedded world, those drivers will never be unloaded, or I cannot really foresee that happening.
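The two-halves idea can be sketched as a pair of objects that only hold references to each other while both are attached. This is a toy model, not the driver code: `MiiProxy`, `MdioProxy`, `attach_peer`, and `detach` are hypothetical names, and raising an exception stands in for returning an error code.

```python
class ProxyError(Exception):
    """Raised when the other half of the proxy pair is not attached."""


class MdioProxy:
    """Half that lives next to the switch's MDIO bus (hypothetical)."""

    def __init__(self, mdio_read):
        self._mdio_read = mdio_read  # function (phy, reg) -> value
        self._peer = None

    def attach_peer(self, peer):
        """Create the connection once both halves exist."""
        self._peer = peer
        peer._target = self

    def detach(self):
        """Break the connection so the peer holds no stale reference."""
        if self._peer is not None:
            self._peer._target = None
            self._peer = None

    def read(self, phy, reg):
        return self._mdio_read(phy, reg)


class MiiProxy:
    """Half that lives under the Ethernet interface (hypothetical)."""

    def __init__(self):
        self._target = None

    def readreg(self, phy, reg):
        if self._target is None:
            raise ProxyError("MDIO side not attached")
        return self._target.read(phy, reg)


mii_side = MiiProxy()
mdio_side = MdioProxy(lambda phy, reg: 0x1234)
mdio_side.attach_peer(mii_side)
value = mii_side.readreg(4, 1)
mdio_side.detach()
```

After `detach`, a further `mii_side.readreg(...)` raises `ProxyError` instead of following a dead reference, which mirrors the "no dangling pointers" property described above.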
But why not do it right if you can? OK. Great. OK. OK. So the main feature of this is it splits up the MDIO access,
the register access to the PHYs, from the notification, the communication between the PHYs and the MAC of the Ethernet interface. It does that by having two attachment points,
one for the interface, one for the MDIO register driver, and it has one main feature. It's completely transparent to miibus. So miibus does actually not see that there's anything different.
OK. What was I going to say with this? So at that point, the ukphy magically gets attached.
Yeah. The missing mark just solved.
One point I forgot, actually, to mention, and that is probe order. So newbus actually has an order parameter in its API, but nobody uses it. And one thing we ran into is that we actually need to make sure that this probes first,
because once arge0 gets into its attach routine, it just expects to be able to fully initialize. And I tried delaying the actual attachment of the interface, but that didn't really work. So one hack is actually to add a parameter to the MIPS
nexus, have an additional hint that makes sure that this gets probed first and attached first because of that. So any better suggestions? I'm very happy to hear them. But somebody told me, oh, that's a crude hack. You cannot do that.
My justification is it's already in the API. It's just not exposed in any way. So it's a one-line change or a two-line change to the nexus attach that attaches children. And whoever told you that it's not an approved API, I'll use it elsewhere. OK.
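The effect of an order parameter can be shown with a tiny sketch: children are attached in ascending order value, and children with the same value keep the order in which they were listed. The device names and order numbers below are hypothetical, and `mdio_side` simply stands for whatever has to be ready before the Ethernet interface attaches.

```python
def attach_children(children):
    """Return child names in attach order.

    `children` is a list of (name, order) pairs. Lower order attaches
    first; Python's sort is stable, so ties keep their listed order.
    """
    return [name for name, _order in sorted(children, key=lambda c: c[1])]


# Hypothetical hints: give the MDIO side a lower order than the interfaces.
children = [("arge0", 10), ("arge1", 10), ("mdio_side", 0)]
attach_order = attach_children(children)
```

With these values `mdio_side` comes out ahead of `arge0` and `arge1`, so the interface's attach routine can rely on it already being there.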
We'll do the train now and we'll give you a hand. OK.
OK. Now remember, OK, there's another point of contention between Alexander and me. Basically, how should the hardware-specific switch driver talk to the hardware? And how should various features be
exposed in terms of the API? So Alexander's idea was, oh, I'm going to write a special generic switch driver that presents a generic register interface for any and all switch chips out there because he
has some hardware that can actually attach through different buses. I don't remember. I think it's a Broadcom device. Yeah, that can either have memory-mapped IO register access or have some I2C or MDIO register access. So he wanted to make sure that there's actually
a way to have a single switch driver that is abstracted from how that register access actually happens. So he decided he will have this generic interface that
presents this generic register access, a couple of shim devices or, yeah, driver shims that attach to the actual bus device that then translate the generic API into the specific bus calls. The switch driver then attaches
to this generic driver. When I finally understood what he was trying to do, I figured, well, I thought I had read something about bus space. Isn't that exactly what he's trying to do?
And so one thing that, well, I'm getting ahead of myself, sorry. So that's what that attachment would look like. And the interesting bit basically is here, the ioctl cdev that
exports the generic functions. And as you can see, basically, you can only attach one hardware-specific driver at this point. You would need multiple switch drivers to have multiple drivers attached here. And also, it's not obvious to me how you would actually connect up an internal driver that
wants to talk to the switch. Because this interface basically is, or the hardware-specific driver, do you attach that here as well? And what kind of interface would it use? Would it be the newbus methods for the switch driver or some other set of drivers?
And I haven't heard an answer to that question yet. Like he, I don't know. He needs to tell that himself. The model that I came up with, I think, is straightforward how the APIs are supposed to be used.
So we have a hardware-specific driver that attaches to whatever bus the hardware is actually attached to, like I2C or MDIO or memory-mapped IO or whatever. Each of the switch drivers exports a generic API that translates from a generic control model
into the hardware-specific settings. And the way it does that, it exposes a set of newbus methods to do the actual configuration. And then we have a generic driver that basically just translates ioctls
into these newbus methods. And of course, this interface is available to other drivers as well. So one thing that could be implemented in the future might be some spanning tree implementation for one of these switches. And that could be attached at this point and could use the very same configuration interface to configure the switch in the appropriate way.
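To make that layering concrete, here is a rough model of it in Python. It is a sketch under assumptions, not the committed code: the method names, the command strings, and the `InKernelClient` stand-in are all invented for illustration, and real newbus methods and ioctls look quite different.

```python
class ExampleChipDriver:
    """Hypothetical hardware-specific driver; owns the chip's register state."""

    def __init__(self, nports=5):
        # Pretend per-port control registers; bit 0 set means forwarding enabled.
        self.port_ctrl = [1] * nports

    # The generic method set every switch driver would export (names invented).
    def get_port_count(self):
        return len(self.port_ctrl)

    def set_port_forwarding(self, port, enabled):
        if enabled:
            self.port_ctrl[port] |= 1
        else:
            self.port_ctrl[port] &= ~1

    def get_port_forwarding(self, port):
        return bool(self.port_ctrl[port] & 1)


class GenericSwitchDevice:
    """Generic layer: translates ioctl-style requests into driver methods."""

    def __init__(self, driver):
        self.commands = {
            "GET_PORT_COUNT": lambda a: driver.get_port_count(),
            "SET_FORWARDING": lambda a: driver.set_port_forwarding(
                a["port"], a["enabled"]),
            "GET_FORWARDING": lambda a: driver.get_port_forwarding(a["port"]),
        }

    def ioctl(self, cmd, **args):
        if cmd not in self.commands:
            raise ValueError(f"unsupported command: {cmd}")
        return self.commands[cmd](args)


class InKernelClient:
    """Stand-in for another driver (say, spanning tree) that uses the
    same method set directly instead of going through ioctls."""

    def __init__(self, driver):
        self.driver = driver

    def block_port(self, port):
        self.driver.set_port_forwarding(port, False)
```

The point of the design is visible in the last class: the userland path and an in-kernel client end up calling exactly the same methods on the hardware-specific driver.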
OK, configuration interface. These switches are all over the place. Some are very simple, can do very few things.
Others have very advanced features. So it is unclear how we can actually put that into a single model that can be presented as a single utility to do configuration on. Because even the way VLANs are configured can vary widely between these things.
Like, can you mix port-based and tagged VLANs on the same device? And if you can mix them, how does that actually work? Like, what is the precedence on a single port? Or what gets discarded? What gets added automatically or de-tagged?
So there's a couple of things that almost all of them can do that we looked at. And that is basically all of them have one form of PHY or another. So we can do link management, including shutting down ports. There's support for tagged VLANs, usually 16 entries.
Some have limitations on the range of VLAN IDs that can actually be used. But many can be freely configured over the full 12-bit range.
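For reference, the 12-bit range comes from the 802.1Q tag itself: the 16-bit tag control field carries a 3-bit priority, a 1-bit drop-eligible flag, and a 12-bit VLAN ID, with IDs 0 and 4095 reserved. A small Python sketch of packing and unpacking that field (the function names are just for this example):

```python
def pack_tci(vid, pcp=0, dei=0):
    """Build the 16-bit 802.1Q tag control field from its three parts."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("drop-eligible flag is a single bit")
    return (pcp << 13) | (dei << 12) | vid


def unpack_tci(tci):
    """Split a tag control field into (priority, drop-eligible, VLAN ID)."""
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF


def is_usable_vid(vid):
    """VLAN IDs 0 and 4095 are reserved, leaving 1..4094 for configuration."""
    return 1 <= vid <= 4094
```

A chip that restricts the usable range would simply apply a tighter check than `is_usable_vid` does here.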
And there is more or less some way to manage the MAC table. So there could be a way to disable learning and hard code which ports forward for which MACs. So we really want to have, and we're
very close to actually having that in the tree, at least for the first two parts, is initialization. We want to bring up the switch in a sensible default configuration depending on the device. We want to provide register access to any client so that we can do things in userland
until we have fleshed out the actual API that we want. We will probably need some capability API so the utility in userland can figure out what kind of switch it is it's trying to talk to and what it can or cannot do. So as an administrator, you can actually figure out whether something is supposed to work or not.
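One way such a capability API could look from the utility's side is a set of flags the driver reports, which the tool checks before attempting a configuration. The sketch below is purely hypothetical: the flag names and the `missing_caps` helper are invented here and are not part of any committed interface.

```python
from enum import Flag, auto


class SwitchCaps(Flag):
    """Hypothetical capability flags a switch driver might report."""
    PORT_VLAN = auto()
    DOT1Q_VLAN = auto()
    MAC_TABLE = auto()
    PRIORITY_QUEUES = auto()


def missing_caps(reported, needed):
    """Return the names of needed capabilities the switch does not report."""
    return [cap.name for cap in SwitchCaps
            if cap in needed and cap not in reported]
```

For example, a tool asked to set up tagged VLANs with a static MAC table on a chip that only reports the two VLAN modes would get `["MAC_TABLE"]` back and could tell the administrator up front that the request cannot work.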
And I think the next step in terms of actual configuration of this would be port-based and tagged VLAN configuration. We've decided that we will switch modes. It's either going to be port-based VLANs or tagged
VLANs because, in a sense, tagged VLANs are a superset of port-based VLANs, at least for most configurations. As I said, these switch devices can mix the two modes together, but neither Alexander nor I really understand how that works.
It's very confusing in the data sheets, if it's described at all. There's differences in how these switch chips decide what to tag and untag on which ports, like on egress and ingress. So we decided we just have a single switch per port.
It's either going to be all tagged or none of it is going to be tagged. That's a certain limitation, but I think for the kind of switches we're talking about with only four ports, that is a semi-sensible restriction. There's a default VLAN ID, which for the untagged ports
decides what VLAN ID to assign frames on ingress. And, of course, each of the VLAN configuration entries has a VLAN ID and a list of the member ports.
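The configuration model described here (one global mode, a default VLAN ID per port, and a small table of VLAN entries with member ports) can be sketched as a data structure. This is a simplified Python model under assumptions: the 16-entry limit follows the "usually 16 entries" remark, the class and field names are invented, and the rule that port-based mode ignores tags is a simplification for the example.

```python
from dataclasses import dataclass, field

MAX_VLAN_ENTRIES = 16  # typical table size mentioned in the talk


@dataclass
class VlanConfig:
    nports: int
    mode: str = "port"                           # "port" or "dot1q", switch-wide
    pvid: dict = field(default_factory=dict)     # port -> default VLAN ID
    entries: dict = field(default_factory=dict)  # VLAN ID -> set of member ports

    def add_vlan(self, vid, members):
        if not 1 <= vid <= 4094:
            raise ValueError("VLAN ID out of range")
        if vid not in self.entries and len(self.entries) >= MAX_VLAN_ENTRIES:
            raise ValueError("VLAN table full")
        if any(not 0 <= p < self.nports for p in members):
            raise ValueError("no such port")
        self.entries[vid] = set(members)

    def set_pvid(self, port, vid):
        if vid not in self.entries:
            raise ValueError("unknown VLAN")
        self.pvid[port] = vid

    def egress_ports(self, ingress_port, frame_vid=None):
        """Ports a frame may leave on. Untagged frames (and, in this
        simplified model, all frames in port-based mode) are classified
        by the ingress port's default VLAN ID."""
        if self.mode == "port" or frame_vid is None:
            vid = self.pvid.get(ingress_port)
        else:
            vid = frame_vid
        members = self.entries.get(vid, set())
        if ingress_port not in members:
            return set()
        return members - {ingress_port}
```

With VLAN 10 on ports 0, 1, 4 and VLAN 20 on ports 2, 3, 4, an untagged frame arriving on port 0 with default VLAN 10 can only leave on ports 1 and 4, which is the usual "LAN ports plus CPU port" split on a small router.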
Okay, so what's still to be done? Basically, Adrian has made the call and has decided this is what's going into the tree and he's committed the last bits yesterday.
There's lots of code, especially that Alexander has, that we want to bring into what we have now decided is going to be the base version. So that's going to require a lot of work. I think he has around, I don't know, 15 or 20 different switch chips, something in that range.
Yeah, but it's a lot. It's a lot across different vendors, so we'll see how that goes. And then pick up whatever is left in common hardware.
To give you an idea of the space we are talking about, Atheros-based designs alone, there's, I don't know, 80 or 100 different models of routers by like 20 vendors. So dealing with all that is hard enough, but then there's, of course, Broadcom-based designs
and Realtek and I don't know what. So it's hundreds, hundreds of different models. Sorry, what? More than I can remember. So we'll look into good things to support. Like, of course, they should be not too expensive,
they should be powerful enough. Yeah. So what can be done in the future with this? So, of course, all these switches, since they do support tagging,
almost all of them support some form of priority queues. All of them support more or less deciding what gets forwarded where. Some of them actually have quite advanced things in there up to actual packet filtering.
And I think the newest Atheros switch chip actually does IPv6. I was really surprised. The old one also can do some form of NAT, IPv4.
So it might be interesting to see how to integrate that and what to do with that, because, of course, the CPU in these things is somewhat limited, so you can maybe do 100 megabit wire speed, but above that, it gets hairy. And you might want to use the CPU for interesting bits
like actually talking to the wireless chip and shuffling data back and forth between that and the Ethernet interface. So if the hardware can actually do the NAT for you, that's one less thing you need to deal with in software. One thing that I think might make a really great
Google Summer of Code project is trying to figure out if the existing Spanning Tree implementation that we have in the tree can be hooked up to one of the switch controllers, because almost all of them support punting the management frames to the CPU port. So they don't implement Spanning Tree themselves, but you could implement it on the CPU
and tell the switch to enable forwarding on the ports or disable it. And, of course, you can also do things like port security or stuff like that if you're interested. And I'm sure there's plenty more things that can be done in terms of interesting networking stuff, but that's going to be available then.
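The split of work described here, where the switch hands spanning tree frames to the CPU and software only tells the switch which ports may forward, might look roughly like this. It is a toy sketch in Python: the `FakeSwitch` class and the `decide` callback are placeholders invented for the example, and no actual spanning tree logic is implemented. The one fixed fact used is that spanning tree BPDUs are sent to the group address 01:80:C2:00:00:00.

```python
STP_GROUP_MAC = bytes.fromhex("0180c2000000")  # destination address of BPDUs


def is_bpdu(frame):
    """True if an Ethernet frame is addressed to the spanning tree group."""
    return len(frame) >= 14 and frame[:6] == STP_GROUP_MAC


class FakeSwitch:
    """Stand-in for a switch driver's per-port forwarding control."""

    def __init__(self, nports):
        self.forwarding = [True] * nports

    def set_forwarding(self, port, enabled):
        self.forwarding[port] = enabled


def handle_cpu_frame(switch, port, frame, decide):
    """Process a frame the switch delivered to the CPU port.

    decide(port, frame) -> bool is a placeholder for the real spanning
    tree state machine; it returns whether the port should forward.
    Returns True if the frame was a BPDU and was consumed here.
    """
    if not is_bpdu(frame):
        return False
    switch.set_forwarding(port, decide(port, frame))
    return True
```

Everything protocol-specific would sit behind `decide`; the switch driver only needs to expose the per-port forwarding control.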
One thing that, I'll leave that off, that's fine. Okay, so here's the people involved and a couple links.
The Wiki page I have set up currently is more of a brain dump than a tutorial, so I need to work on that and bring it in line with what we now have in the tree. ZRouter is Alexander's project, which is FreeBSD-based,
but it's his attempt together with a number of other people to build ready-to-use firmware images for many of these devices, including their own web-based configuration in standard ways. So basically, it's to replicate a standard firmware
for standard use cases like OpenWrt or DD-WRT or something like that. So that might be very interesting as well to look at. All right, questions?
I haven't done IPsec. I'm more, I'm using OpenVPN and current hardware, like the 400 megahertz CPUs that are typically in there
can do around, I don't know, six megabits per second of encryption in software, yeah, with the right encryption algorithm. So that's completely CPU bound, but it's a very interesting question.
Adrian, are there any CPUs that have hardware acceleration? The trouble with this new stuff and so to see, imagine building a chip like this, enable the so-to-see to say the feature F, the chip is embedded to the chip.
I don't know, I don't know if it's a chip, like a chip, I can't remember. The short answer is, if there's a privilege and someone's really a driver, sure. These little bits, I don't think the ones I have here have the hardware encryption and hardware acceleration.
And if they do, it's hopefully, but yes, I'm sure the opposite stuff is having stuff like hardware encryption. Yes?
So the problem of marrying a PHY with a MAC, so far from the slides, I get the sense that people still think in terms of hierarchies to try to solve this. I thought of that, all right. Sure, what could potentially be the advantage
just in your sense of the discipline, is if you could create a full bison one and the full maxima other one, and the other one, the present device tree, to find out how they are going to solve it. That would be outside of newbus then.
The big problem, we all did it really nicely, but the problem is the transport,
and that's why when, and there are example proxies, much more complicated than most proxies, if you, in a better world, you can search a market for them. Put that up, it doesn't work, because you now have the worst of transport from the tree.
There might be other ways to accomplish that support, whether that or optimization, you're gonna have a significant support factor behind it, and you're not gonna be able to do that now. So what- Right, right. It wouldn't necessarily be adding to it,
unlike more of that, it's lacking more than what we- I think that the tree should replace the mechanism, or is it new? It's not new. But the other problem here is, when you pull them out, they'll grow.
It's actually five at that point. You just have to cut it. There's no mechanism right now for you to go. Notify right now that, at some point in the future, five may come along, at which point I will, late with all of this stuff, and make a five here. A five here is a- What rate is the code for this?
What by, is the internet driver to expect one to appear? I mean, that's exactly what he's saying. It's in the background, and then, by the fault reports, as Kevin has put it, fault reports take our faults back, and we'll all feel it. If there's nothing attached to me, I'm this big, two legs,
I'm this four feathers, and if it gets cold, it goes out and it does a device lookup to say, is here, there, yet? Is my PHY there yet? And you have to go through this thing, and it's all outside of newbus, because newbus, and the MII bus attachment handling, in particular, expects everything to be there at startup time. So, if we wanted to divorce the more,
then all of this crap suddenly goes away. If we could just say, there'll be a PHY there at some point. Give us the call back, call back, a five day would be a deal. The first bus pass will be able to move as well. So, the operation,
don't fall into the integration, you'll hear my I-bus, and new bus, kind of in the track, and somebody else can do something better. You know, in a four leg time, right, they grew up in that time, and it's been so long ago. And one of the things that I've learned,
is the ability to help, you hear my I-bus, hey, I want a greater MII bus. I don't want you to fail, if you don't have the children right now. Because, I had a system to go, where I wanted to, I wanted to succeed in growth,
but then I wanted to look at an FPGA, somewhere else in the three. Would, you know, clock, turn it on, you know, five, and then broke it. That's how I know that's the problem. So, for example, one chip that we,
I don't know if that's important, maybe a part of that. You have to switch, turn it on to the switch drive, before it comes on. Right? And so, you squeeze, finding the switch drive, and the chip, on kernel land, before the, you bring it up to root, or you use the, you use the user,
and like, at one point during the device, the device profile, you actually, bring up the switch chip, squeeze it from root, make sure that happens before you even have the internet controller attached to it. And if you have to go through the bus of the ethernet card and you can bring up the first before you talk to it, what do you do? Which is one of the reasons you split
MDIO, the MDIO bus from the ethernet platform. So if I wanted to support some of our legacy, real legacy boards with these specific firmware files and you have to bring up the MDIO01 first, squeeze a bunch of firmware over it and magically reach for a VMI bus and ship it to 5 AM. Right, right.
I think I'm going to help you with reach for a VMI bus. I haven't been happy with it. Well, that already exists.
I think you like work that fits into it. I broke, I found nothing, I'm going to use it for myself.
That's the part that's broken and wrong. That's what I volunteered for. I volunteered to help you. So I can shoot you a board, right?
Yes. Adrian asked him to just swap boards so you can get one of theirs. No. I'm going to need some other paperwork. He doesn't need an export.
As long as we do any. He's going to ship the paperwork, he's giving the board now. He's going to have a few of them, but I trust him. If I'm pointed out, then it's 30 bar. I'm going to keep an eye on him. Oh, no, no, no, no. All I'm saying is, he's doing 74K.
He's doing 74K board. No, not 34. Not yet. By the way, the offer is extended to any net end of the year to be available. I'm going to email him. One quick project. Perfect. What do you do?
I have a question. In our area, we're going to ask you, why can't you just go to MII? At this point, you're aware that all the hardware is in place.
If you attach an MII bus without any problems in ethernet, actually, I can't do anything with it anyway. True. I tried deferring completing the ethernet driver attach
until such time as I have the MDIO register access to the right MDIO bus. I ran into lots of problems. I just don't understand the code well enough. All the ethernet drivers expect to complete their attach routine and have the ifnet completely initialized and hooked into everything,
including the ifmedia structures. At the attach point, you have to decide, is this going to be an ethernet interface with ifmedia or not? You have to defer attaching the ethernet interface until a later time. And I couldn't get that to work
with arge. The other reason we catch it quick, it's very desirable to know if you switch kernel drivers to be able to KLD load and unload. You might not have thought this was a good product, but if you require all the hardware to be there, it will never all be there because you're loading and unloading and
the code that you pass to the other device, maybe you're doing something like that. It's thrilling. But that works. I work like that.
Yes? Sorry, the data sheets for the hardware?
Yeah, there's two sources of information. I actually started out by reverse engineering the Linux code, and then Adrian said, why don't you look at the data sheet after you've signed this NDA? And interestingly enough, I hope I am not violating my NDA by saying sometimes it's actually better
to look at the Linux code because you know it's working to some degree, and the data sheet can be very confusing at times. The NDA is more than NDA. Yes, yeah. Paul Klima and Ferris have an NDA, you sign it and you do that stuff.
It's more than NDA. We call you reading NDA. I guess that's a good question. Yes, the algorithm is nice, because you won't disclose the data sheet. Yes, that's all I'm saying. Yes. Don't say that you won't disclose the data on the data sheet.
The NDA has this other nice thing where if you've got access to driver source, they say you can't share the driver source to open source any code. The theory is that if you've signed an open source NDA and you've given you all this data and all these things and driver code is free for NDA, then you have to put
an open source implementation for anything you do. You can't sign the source NDA and then provide total source for the product. So if you've got a bunch of, especially in the Linux world, a bunch of copies, you sign the open source NDA and keep the license to the driver, and they contribute to the NDA collection.
So the same deal is available. Can you get a switch? The problem is that 90% of the work that is already done in the driver that you've captured, right? And that driver GPL, if you know where you're looking on, say PDWRT, you can find out really
how many other people have been vended for the switch in the PDWRT stuff, right? So what we did is we used the data sheet to figure out a big list of magic records with random registers in the file. What do they mean, right? Let's write a switch, right?
That tends to be the way, that's how the Linux guys rewrote their files. They took the existing stuff that worked, they mapped all the magic numbers, they figured out that's the way to do it. The 90% of the work, the 10% of the work was all the weird things popped up weird ways that vendors decided to push their switch files up.
So for these things, in the SAC, there's one way it's hooked up. You bloody well know what's that way, right? But for other things, where there's multiple disturbing files attached to files, God knows what, you actually need to look at vendors privately to see,
all right, so this switch file is hooked up on GTO, this is how they represent the SDI file. How about in the vendor, you know there's a switch file, you know there's... So that's the other good thing about it, you don't get the data sheet. Especially if there's more than one external switch file,
or what vendors do. Is there a platform, a common API, or a common way to support switching?
I don't have one right now, so I looked at the Linux API, and there's a couple, you could do it like that, but then again, OpenWrt doesn't really support anything but VLAN configuration.
And there's a couple things that we're interested in that go beyond that, so I think we actually might come up with our own API there. I explicitly committed no APIs for free.
There's a VLAN group API that's not used for one of the switches, Ray has this... API, I don't want to forget it, the bare minimum to get the MPA robust to the MII... the API discussion is laid out. We have a file for the patch, we initialize the switch,
we can write all the files for that, we can have this long-form limited byte sheet of what we want to support, but we'll have a discussion about that over the next... Green, I want it green. I want it green. From an end-user perspective,
is configuring the individual switch ports always in the process? Yes and no. So I wrote a command line utility that uses the same keywords to do the individual port configuration,
so I think that's a natural way of expressing that on the command line, but as Adrian said, we haven't really committed anything to the tree yet, so we'll see how that works out, but one specific goal is we are using the existing infrastructure for ifmedia,
I expect the configuration to be similar to what ifconfig does. Given the lack of switchboards, do you have something a little...
No. No, ifmedia actually is very expressive, it tells you exactly what kind of standard you want to apply to that port. Just a quick question in terms of development tools...
I'm actually just using a serial console, and that's it. If you buy something off the shelf, you need to open up the case and solder in a cable, but that's about it, so I'm just doing this with DDB and lots of printfs.
Most of them have it on there.
There are people in the OpenWrt community who have taken, say, a D-Link off the shelf, and I'll just use a lot of the shelf JTAG thing to speak to the device, so I haven't done JTAG to the off-the-shelf yet. It's like the other events. There's three or four people in the previous people who are using OpenWrt rules.
I'm using this. I'm using that. So Adrian, you're using something commercial? Yeah, we've got the big WAP commercial JTAG development tools. The other WAT guys, they have said that they do JTAG debugging.
There's certainly a lot of useful stuff to get on the off-the-shelf hardware. Okay.
It's an interesting question. The standard setup of these switches is that it's actually a switch, like as if you had plugged in through a cable a standard desktop switch into the one port on the router.
Almost all of these switch chips support a configuration in which they prepend received frames with the port number they were received on and forward that to the CPU port. So you could write up a completely separate driver model
where every single port on the device is represented as a standard interface. That's something we have talked about, but I don't think anybody has any idea how to actually do that properly.
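To illustrate what such a per-port model would have to do on receive, here is a small Python sketch. The header format is an assumption made up for the example (a single leading byte holding the port number); real chips each use their own header layout, and the class and function names are hypothetical.

```python
from collections import defaultdict


def split_cpu_frame(raw, nports):
    """Split a frame from the CPU port into (ingress port, Ethernet frame).

    Assumes a made-up one-byte header carrying the port number.
    """
    if len(raw) < 1:
        raise ValueError("frame too short to carry a port header")
    port = raw[0]
    if port >= nports:
        raise ValueError(f"port {port} out of range")
    return port, raw[1:]


class PerPortDemux:
    """Delivers each received frame to a per-port queue, as if every
    switch port were its own network interface."""

    def __init__(self, nports):
        self.nports = nports
        self.queues = defaultdict(list)

    def receive(self, raw):
        port, frame = split_cpu_frame(raw, self.nports)
        self.queues[port].append(frame)
        return port
```

Receive is the easy half; the open question raised in the talk is how to fit the rest (transmit, link state, media handling per port) into the existing driver model.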
Yeah, you can do lots and lots of things. Yeah, just write the driver. As of now, as of last night, you can actually start doing that if you know the switch chip and you can just peek and poke at the registers of the switch and put it into whatever configuration you like.
All right, thank you very much.