
Groking the Linux SPI Subsystem


Formal Metadata

Title: Groking the Linux SPI Subsystem
Number of Parts: 611
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2017

Content Metadata

Abstract
The Serial Peripheral Interface (SPI) bus is a ubiquitous de facto standard found in many embedded systems produced today. The Linux kernel has long supported this bus via a comprehensive framework which supports both SPI master and slave devices. The session will explore the abstractions that the framework provides to expose this hardware to both kernel and userspace clients. The talk will cover which classes of hardware are supported and which use cases are outside the scope of the subsystem today. In addition, we will discuss subtle features of the SPI subsystem that may be used to satisfy hardware and performance requirements in an embedded Linux system.
Transcript: English (auto-generated)
Thank you. Okay, cheers everybody. Before we get into this: the SPI subsystem. My name's Matt Porter, as he said, and let's jump into it.
Of course now it quits working. How about that? Well, it worked just a second ago. Alright. Exactly.
Alright, just to de-obfuscate
the name of the thing: our community has gotten more diverse, so people might not get Heinlein references. So we're going to understand the SPI subsystem intuitively, hopefully,
by the time we get done with this. If you haven't read it, read Stranger in a Strange Land. Awesome book. Alright, a little overview. We'll talk about what SPI is and go over some SPI fundamentals, because we can't truly understand the subsystem unless we understand how SPI works first. So we've got to go through that
first. We'll talk about some fundamental Linux SPI concepts, so how the subsystem takes the concepts of SPI itself and translates them into the Linux world, and we'll look at some use cases to drive us through the subsystem, right? Adding a device, doing a
protocol driver (we'll talk about what that is and how it relates to the real world of SPI), controller drivers, user space drivers, and then we'll talk a little bit about performance and what's coming up in the future. So, getting into what SPI is:
Serial Peripheral Interface. Motorola developed this. It's a de facto standard, so if you're hoping for a written spec with a committee, you won't find it here. It's a master/slave bus with four wires, and we'll talk a little bit about
the signals coming up here. Except when it's not. Four-wire sounds like the easy case, and everybody talks about the standard case, but there are a few others. We'll get into those a little bit: options where you don't need to use all the wires, or where you use more.
There's no maximum clock speed. Obviously there are practical limits that chips run into. And this really poorly formatted URL is the usual place you can go to get some information on it, so
if I lie about anything, you can go there and find the truth, hopefully. One of the jokes about SPI is that everybody makes a big deal about it, but at the end of the day it's just a glorified shift register. Alright, so what are some common uses? Where do we see
SPI used? Well, pretty much everywhere, but just to highlight a few: you have things like flash memory. You often find serial flash devices. Why do people design them in that way? Low pin count, right?
Same reason you see I2C EEPROMs for your non-volatile storage. That low pin count is a huge advantage in embedded systems. The same reasons drive its use with analog-to-digital converters and a number of different sensors. Thermocouples,
for example. You might say, well, that's just a temperature sensor, there are lots of I2C ones and they're low speed, right? But in industrial use, thermocouples have to be sampled at very high rates in process control that permits very little
deviation in the temperature. LCD controllers: you start seeing a theme here. And of course, as we saw this morning, the Chromium embedded controllers can use SPI as one of the communication channels. One of the themes when we look at all of these
is that they're all fairly high-speed devices, right? ADCs, thermocouples in the sense I gave where we sample very quickly, and LCD controllers, which can need a very high rate if they're
a color controller and so forth. Alright, now that we've got that little overview out of the way, let's hit our fundamentals. We start with our standard signals, and here it can get a little bit ugly when you're looking at data sheets. We're going to talk a little bit
about data sheets and extracting the information you need out of them. You start with MOSI, as I pronounce it: master out, slave in. What I show here is that you're going to find a lot of different names for this. Because it's
a de facto standard, every vendor seems to have picked something different, and I've even seen data sheets that use I2C names for the data line on a half-duplex device. That's why SDA is on there as well. So you'll see all of these; it depends
on the vendor. Same thing with MISO, master in, slave out: you'll see that same set of variations. Then your serial clock, which is an output from the master. You'll see those different monikers for it. And then you have the
concept of a slave select. That is a master output, and it's how a SPI device is actually selected should there be more than one on the bus. So clock, data out, data in, and select: those are the four wires we talked about. That's our common case.
And keep in mind that if we've got a dedicated channel for output and a dedicated channel for input, it's a full-duplex protocol by its very nature. So this is what it looks like in our really, really trivial example.
We've got our SPI master and the flows for each of those signals. If the master wants to do a transaction, it asserts the slave select, and if it wants to do a write, it puts data on the line
and clocks it. Now we're going to look in detail at how that works, because you can't explain it verbally and do it justice. So let's look at some timing. With these wonderful timing diagrams that WaveDrom helped me do (don't worry about
the write mode zero label yet, we'll explain modes), we have a typical write cycle in mode zero. What you're seeing here is just an 8-bit transfer. And in this case, what is usually the de facto standard
is MSB first on the wire. That's why you're seeing D7 first down here. What you see is the data stable here. I can't really hold this steady, I must be nervous.
Thank you for that. As you'll see, it's latching that data on the rising edge in a write, and you see it on master out, slave in. The names make it easy, if you use those signal names, to know which direction you're going. And then in the read case
you have it latching on the rising edge here. And you can see that the chip select went low before the first clock edge for that first bit. So that's your very basic case. We'll get to the modes here in a second.
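The "glorified shift register" view of a single 8-bit, MSB-first, full-duplex cycle can be sketched in a few lines of Python. This is only an illustration, not kernel code: `spi_exchange` is a made-up name, and it models the master and the slave as two 8-bit shift registers that trade one bit per clock.

```python
def spi_exchange(master_byte: int, slave_byte: int) -> tuple:
    """Model one 8-bit SPI cycle: two shift registers swap contents, MSB first.

    Hypothetical helper for illustration; returns (master_reg, slave_reg)
    after eight clocks.
    """
    master = master_byte & 0xFF
    slave = slave_byte & 0xFF
    for _ in range(8):
        mosi = (master >> 7) & 1  # master presents its top bit (D7 first) on MOSI
        miso = (slave >> 7) & 1   # slave presents its top bit on MISO
        # On the latching edge each side shifts left and takes in the other's bit.
        master = ((master << 1) & 0xFF) | miso
        slave = ((slave << 1) & 0xFF) | mosi
    return master, slave


if __name__ == "__main__":
    m, s = spi_exchange(0xA5, 0x3C)
    print(f"master now holds {m:#04x}, slave now holds {s:#04x}")
```

After eight clocks each side holds what the other started with, which is why every write is also a read on a four-wire bus, whether or not anyone looks at the received byte.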
And here we are. SPI modes: you have lots of options. There are several modes, but it's a pretty simple truth table. The way it breaks down is that a mode is composed of two characteristics of your clocking relative to the data. You will see
CPOL, the abbreviation you'll typically see for clock polarity in most data sheets and in the original SPI description from Motorola, and CPHA, the clock phase. Right here we see how those break down. So
if the polarity is zero, CPOL is zero, then the idle state of your clock is low, and it transitions from low to high in the active state. The inverse is true if CPOL is one. That allows options on the clock polarity for devices.
And then the phase. Very simple, though some people forget it: if CPHA is zero, your data is latched on the leading clock edge and output on the trailing edge. If CPHA is one, it's latched on the trailing edge. So this is going to
vary with the type of peripheral you deal with. You end up having to program your controller to operate with the proper phase for whatever peripheral you want to talk to. And what they've come up with is a way to encode these, a de facto standard as well, that's simply
modes zero through three. If you're used to seeing modes zero through three, that's how they translate back: the mode just tells you the polarity of the clock and which phase you're latching data on. So now let's look at these modes. I showed you the basic example; it was writing and reading in mode zero.
Now let's look at each one individually. We'll just look at writes, because reads don't add anything for illustrating this concept. What you see first is our chip select going low to activate the chip at this edge here. We don't care what the data looks like here.
Then I have my first clock edge. This is just like the original example we looked at, and you see it's latching data on each rising edge. And notice that it was
an idle-low clock, so that's clock polarity zero, and with phase zero the rising edge is where it's latching. Then we still have the clock idle low in mode one, if you remember our truth table.
So in mode one we're just changing the phase. Now you see that instead of latching on the rising edge, it's latching on the falling edge each time. You see how that lines up with each stabilized data bit.
And then when we go to write mode two, CPOL is one, so our clock is idle high. That's why you see this waveform: it's high and then goes to a falling edge. And in this case
I show falling edge, falling edge, latched there. Remember, they're all writes, so that's why it's MOSI. And so on down the line. Same thing here, except in mode three it's on the rising edge. So that's all the modes.
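The four write waveforms just walked through reduce to a small lookup. The sketch below assumes the usual encoding in which the mode number is the two-bit value CPOL:CPHA; `decode_spi_mode` and the dictionary keys are names invented for this illustration.

```python
def decode_spi_mode(mode: int) -> dict:
    """Translate SPI mode 0..3 into clock polarity, phase, and latching edge.

    Assumes mode = (CPOL << 1) | CPHA. Hypothetical helper, not a kernel API.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("SPI mode must be 0, 1, 2, or 3")
    cpol = (mode >> 1) & 1
    cpha = mode & 1
    idle = "high" if cpol else "low"
    # The leading edge is the first transition away from the idle level.
    leading = "falling" if cpol else "rising"
    trailing = "rising" if cpol else "falling"
    # CPHA = 0 latches on the leading edge, CPHA = 1 on the trailing edge.
    sample_edge = trailing if cpha else leading
    return {"cpol": cpol, "cpha": cpha, "idle": idle, "sample_edge": sample_edge}


if __name__ == "__main__":
    for mode in range(4):
        print(mode, decode_spi_mode(mode))
```

This reproduces the walkthrough: mode 0 idles low and latches on rising, mode 1 idles low and latches on falling, mode 2 idles high and latches on falling, and mode 3 idles high and latches on rising.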
But we only looked at this really trivial example, with a single slave and so forth. Now it gets complicated, because we live in the real world. First off, the reason we have the chip select in the first place is so we can have
multiple slaves on a bus. The way to think about a chip select from a software standpoint is that it gives us an address. Every device has a unique chip select that activates the chip and its buffers, and that becomes a convenient,
abstract way to reference multiple slaves connected to one master. The next thing that gets complicated is daisy chaining. The very common case that they'll show with daisy chaining
is chaining MOSI to MISO from one device to the next. But there are other daisy-chaining cases. In fact, the Anadigm field-programmable analog arrays do chip select chaining during configuration. So
they shift in so many bits and then the device asserts the next chip select. Now you don't have the master controlling the chip selects down the line, and that can get ugly as well. The next area: I promised you there are lots of ways you can abuse the fact that it's a four-wire bus. Well,
high-speed flash devices start adding more lanes. More lanes, more throughput. So there's a balance between pin count and your ability to burst-read data off your SPI flash. We'll see that there are
SPI flash devices that do dual or quad; quad is popular. Essentially, instead of one MISO we have N of them, and there's nothing preventing more. I can't remember if there's an eight-lane one yet, but that probably goes beyond the
point of doing it, and I know the quad ones are very close to, or right around, parallel NOR flash speed now anyway. So, like I said, every time you add another lane you increase your throughput. Alright, and then the last
variant here is the three-wire variant, Microwire. What they do is combine MISO and MOSI, and it operates in half duplex. You can also have, say, a write-only peripheral. Some LCD controllers,
that's all they are. You don't even need MISO hooked up for those if you're just caching the contents of the registers and operating from reset. A lot of designers just won't hook up a pin if they don't need to read back from the device. So essentially you're back to a three-wire bus. Or maybe
it's your only device, so you drop the chip select and it's always asserted. There are designs like that as well. Alright, so looking at that more complex case, which is really the more common one that you'll run into, what you see is the same
type of flow: MOSI, MISO, and the clock are all routed to all three of these devices together. But you'll see that each one of them has a unique slave select, or chip select, now: SS1, 2, and 3.
And if you can read this eye chart (it's hard to fit this stuff on there), what you see here, and I know it's probably hard to see, I knew this one was going to be tough, is slave select 1, 2, and 3. This is just showing
three 8-bit writes, one to each of those slaves. When you look at the timing, the way you manage that is you drop the chip select for SS1 (this is write mode 0), and what you're seeing is that when that chip
select is asserted, the first slave latches the data. Then you see slave select 2 get asserted, it goes low, and now slave 2 is
receiving 8 bits of data, and the same thing for the third. That just shows you what it would look like if you were dumping that on your logic analyzer. Okay, we're through that background now, right? Now that we understand those concepts, we can talk a little bit about the Linux SPI subsystem.
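The three-slave timing described above, with shared MOSI and clock and one chip select per device, can be modeled roughly as follows. This is a toy sketch: the `Slave` class and `write_to` function are hypothetical names, and the chip select is reduced to an index that acts as the software "address".

```python
class Slave:
    """A write-only peripheral that latches bytes only while selected."""

    def __init__(self, name: str):
        self.name = name
        self.received = []

    def clock_in(self, selected: bool, byte: int) -> None:
        # Every slave sees MOSI and the clock; only the selected one latches.
        if selected:
            self.received.append(byte)


def write_to(slaves: list, select_index: int, byte: int) -> None:
    """Assert one chip select and clock a byte out on the shared bus."""
    for index, slave in enumerate(slaves):
        slave.clock_in(index == select_index, byte)


if __name__ == "__main__":
    bus = [Slave("SS1"), Slave("SS2"), Slave("SS3")]
    for i, value in enumerate((0x11, 0x22, 0x33)):
        write_to(bus, i, value)
    for slave in bus:
        print(slave.name, [hex(b) for b in slave.received])
```

Each device ends up with only the byte that was clocked while its own select was asserted, which is the pattern a logic analyzer capture of the three writes would show.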
Every subsystem in the kernel has to translate its hardware into some sort of concepts that we can use. With the device model in the kernel, the way it's broken out in the SPI subsystem is that we have a concept of
controller drivers and protocol drivers. Those match up one to one with the hardware: the controller is the master that we saw, and the protocol driver is whatever protocol is needed on top of just shifting bytes out to actually do something with your peripheral.
The controller driver supports the SPI master controller. Both of these run on your SoC, if you will. As for what the controller driver does, the subsystem very carefully splits the protocol for the
peripheral from the actual shifting of bits out of the controller. So all the controller driver is doing is controlling when the clock comes on and when the chip selects (or slave selects, if you will) are asserted, shifting the bits, and configuring those characteristics so that you get the right
mode, the right clock frequency, and so forth, and of course the polarity and phase, which, as we showed, are encoded in the modes. An example would be, for the Raspberry Pi 2 and 3, the BCM2835 driver.
Now protocol drivers. As we said, the subsystem very carefully splits that concept away. If you're writing a protocol driver, you don't have to think about what controller you're on or what its characteristics are, to a point.
That's all abstracted for you. The way to think about protocol drivers is that they're your slave-specific functionality. If I've got an ADC, the driver talks whatever that ADC's data sheet specifies: these are my registers, this is how I get samples started, this is how I read them out. That's what you're coding up in a protocol driver.
Fundamentally, every operation in the SPI subsystem is built on the concept of messages and transfers. We'll talk about those in detail here and really understand how they relate
to the APIs, but that's a fundamental concept the subsystem adds that's somewhat outside the realm of the low-level SPI hardware, yet maps well to how peripherals actually use SPI. So remember, the protocol driver
relies on you having a controller driver. It doesn't care which controller driver; everything is abstracted for you. An example would be a driver for one of these old MCP3008 ADCs. So, as promised, communication:
the subsystem breaks everything up into transfers and messages. Think back to those timing diagrams I showed. Say my device has 8-bit registers that it
implements behind the interface. A transfer might be a single 8-bit read or a single 8-bit write. In generic terms, it's a single operation between the master and slave. As for the transfer data structure and
what's carried in there: remember, it's a full-duplex wire protocol at the low level, so each transfer structure carries both TX and RX buffer pointers. One can be consumed and the other filled within the same
operation. You can make either one null if you're just doing a half-duplex operation. And each transfer can have its own optional parameters, so you can control what happens with the chip select after that operation ends. And that
matters a lot when you go look at your peripheral's data sheet. It may say, oh, you can't drop the chip select after a certain sequence of things. So you can define per transfer how that's going to behave. And then you also may need to insert some
delays in there to satisfy the timing, and there's the ability to do that. Now, that said, a message is just an atomic sequence of transfers. When you build up a message, you can add n transfers, all of which can have their own individual behavior for how they
manage the chip select and the delay in between. So whatever sequence your peripheral's data sheet says you need, you can build it all up, add the delays, and so forth. And the message itself is the fundamental argument to all the SPI subsystem
transfer calls, the read/write APIs. There's a bunch of them and we'll talk about that. But just to give you an idea, if it wasn't clear verbally, this is the best way to look at a SPI message, right? You've got a chain of transfers inside of it, n of those,
pretty simple. Alright, so let's use some use cases to walk ourselves through the subsystem now. I want to hook up a SPI device, right? I know there's a kernel driver for it, let's say the mcp320x driver
that handles the MCP3008, but how do I hook the damn thing up and get it working on my board? Next, I want to write a protocol driver, I want to write a controller driver, or I want to write a user space driver, because I've got something odd or I want to do
something different than a kernel driver for some reason, or I'm lazy. Alright, so adding a SPI device to the system. The first thing is you need to know the characteristics of your device. You need to understand its timing in order to properly add it, because you're going to have to set some parameters.
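Before moving on to hooking up a device, the message-and-transfer model described a little earlier can be sketched as plain data structures. This is a Python toy, not the kernel API: `Transfer`, `Message`, `run_on_loopback`, and all field names are invented here as simplified stand-ins, and the "controller" is an imaginary one with MOSI wired straight back to MISO.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transfer:
    length: int                     # number of bytes clocked in this transfer
    tx: Optional[bytes] = None      # None models a null TX buffer: zeros go out
    want_rx: bool = False           # False models a null RX buffer: input dropped
    toggle_cs_after: bool = False   # pulse chip select before the next transfer
    delay_us: int = 0               # pause requested after this transfer
    rx: bytes = b""                 # filled in by the controller when want_rx


@dataclass
class Message:
    """An atomic sequence of transfers handed to the controller as one unit."""
    transfers: List[Transfer] = field(default_factory=list)


def run_on_loopback(msg: Message) -> List[str]:
    """Run a message on a pretend loopback controller and log bus events."""
    log = ["cs asserted"]
    last = len(msg.transfers) - 1
    for i, t in enumerate(msg.transfers):
        out = t.tx if t.tx is not None else bytes(t.length)
        if len(out) != t.length:
            raise ValueError("tx buffer does not match transfer length")
        if t.want_rx:
            t.rx = out  # loopback: whatever went out on MOSI comes back on MISO
        log.append(f"clocked {t.length} byte(s)")
        if t.delay_us:
            log.append(f"delay {t.delay_us} us")
        if t.toggle_cs_after and i < last:
            log += ["cs deasserted", "cs asserted"]
    log.append("cs deasserted")
    return log


if __name__ == "__main__":
    msg = Message([Transfer(1, tx=b"\x9f"), Transfer(3, want_rx=True, delay_us=10)])
    for event in run_on_loopback(msg):
        print(event)
```

The point of the shape is that chip select handling and delays hang off individual transfers, while the message as a whole is what gets submitted, so a data sheet sequence such as "write a command byte, then read three bytes without releasing chip select" becomes one message with two transfers.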
So learn how to read a data sheet if you're going to do that. Even if you're using an off-the-shelf device, if you don't understand what the max frequency is, you're not going to be able to hook it up right. There are three methods to do this. Device tree, obviously pretty ubiquitous.
We have it everywhere now. The board file method, which is basically deprecated now, except x86 people tend to use it sometimes. And then ACPI; a lot of drivers are getting ACPI ID tables now to support that.
Alright, wow. So, learning to read data sheets for SPI, there are a few important things. First off, you might not even notice when you look at the data sheet that it's SPI; sometimes they don't even say SPI. They'll just say,
ah, it's got a serial interface, and you have to go look at it. Maybe they're masking the name because they don't want to use the Philips I2C name or something, so it might be I2C. You've got to look a little bit. Usually they'll say this is
a SPI device. So here are a few clippings out of an ST7735 LCD controller data sheet. What's important: this was one of those examples, and I was not kidding you, where they show SCL and SDA, but it's SPI when
they show the timing. And that's also an example of what I said about write-only devices (I know this part well): you have a situation where it's essentially a three-wire setup. There's no need to have the
read path. It's optional. They actually show a four-wire hookup as well, if you do want to read certain status things, but most designs don't even hook that up. So, the important things here: identify at the beginning that it's a SPI device. Okay, that's pretty obvious. But then
be aware of your timing diagrams. You have to be able to read them. They're not much different than the stuff I drew up, though they've got a lot more detail, right? So if you're a software person who hasn't gotten comfortable with data sheets, now's the time to do so if you want to do something with hardware.
You need to look at things like this time here, which is the time between when you assert the chip select and when that first clock edge starts. And then they're showing things like the
clock period, by way of this high time and low time. Now you have the period and thus the frequency: if you can do simple math, you can figure out your maximum frequency. That's generally how they communicate that information to you. You'll see they've got setup times and hold
times and so forth. So that's where you get your frequency from. Just a little bit more on that. All they were doing in that timing diagram is showing you a very low level. Later in the data sheet you're going to see
more information on how their protocol actually works. This part has the concept of a data or command state, so you'll find tables like this where now you're talking protocol. It's showing me how, in this case, on the data line,
there's a data-or-command flag, and everything that follows in those eight bits is either a command, which could be a register access, or data, where you may just be streaming your frame buffer
if you're using the fbtft stuff or the new DRM stuff they're working on for that. So this is where you start seeing that. You need to look for how those registers are mapped and so forth, and that's where you get what you need to write the actual protocol driver, or the guts of it.
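The data-or-command flag followed by eight bits can be pictured as a 9-bit word on the data line. The sketch below is an assumption-laden illustration: the helper names are invented, and the polarity (1 for data, 0 for command) is assumed here rather than stated in the talk, so it should be checked against the controller's data sheet.

```python
def frame_9bit(is_data: bool, byte: int) -> int:
    """Pack a data/command flag and one byte into a 9-bit word, flag first.

    Assumes flag = 1 means data and flag = 0 means command (an assumption).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    return (int(is_data) << 8) | byte


def frame_bits(is_data: bool, byte: int) -> list:
    """Return the nine bits in the order they would appear on the wire."""
    word = frame_9bit(is_data, byte)
    return [(word >> shift) & 1 for shift in range(8, -1, -1)]


if __name__ == "__main__":
    print(frame_bits(False, 0x2A))  # a command byte: flag bit is 0
    print(frame_bits(True, 0xFF))   # a data byte: flag bit is 1
```

A protocol driver for such a part would wrap every register access and every pixel byte this way, which is the kind of detail that comes from the protocol tables rather than from the low-level timing diagram.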
Alright, so let's look at another example real quick. That last one was an LCD controller. I'm going to use this next example when we go through hooking things up a little bit. But again, it tells you it's a SPI device somewhere at the beginning of the document;
these are just clippings I snapshotted out. Again, the same type of timing: they show t_high and t_low here, you add them together to get your period, and you find out that it can do a max of four megahertz. Well, nominally. People find that things often work out
of spec, but in reality that's your safe limit. Same thing there, and again, just reiterating: it's one thing to have that low-level timing, it's another to go further in the data sheet, where you start seeing how the A/D converter goes
when it starts doing its conversion and how soon after that the data is available on DOUT here. If you remember back to when I described the signals and all the different names, that's MISO: master in, slave out.
No, this is a four wire, it's just using the alternative naming for the things so just don't get confused by that. If you don't see MOSI and MISO on the data sheet, D out is
also MISO. That was back on that earlier slide on all the different crazy names that we see used. Oh, this turned out terrible. Contrast is not good, huh? Well, can you guys read that at all?
Yeah, you guys can. Can we turn more of the lights off because I think it will be legible enough, otherwise I don't have any examples.
Now this
is an intimate setting. So, this is not grokking DT, so I expect you to know it, but this is just a reminder of where to find reference information for SPI stuff. If you're going to hook things up, so we're back to let's hook this thing up, right?
Via DT, go look at the device tree binding under Documentation/devicetree/bindings, and sure enough we find this. I clipped a little bit out of it for the example, but one of the compatible strings is microchip,mcp3008.
There are some deprecated ones in there as well, and then there's a little example of how you would hook it up. That's for a 3002, we have a 3008, and so let's say, again I'm not showing every variant of how you can actually implement this, but
just as an example, a DTS fragment for this device would look something like this. If you don't understand DTS fragments or DT fragments and overlays, that's a different talk, but the operative piece in the overlay is that this
reg here is your chip select, or your slave select, so that's your unique ID. I said you can use it like an address, and that's how the subsystem uses it. You've got that compatible string that we saw in the
binding on the previous slide, and then we figured out what our max frequency was from our data sheet, because we're smart, and we're able to plug it in there. So 4 MHz there. Now, this is how you do it the wrong way. So if you've got a board file, you're on
something like maybe a MinnowBoard, a typical developer board where you don't have DT handy and doing ACPI overrides is really a pain. You'd do this type of thing, which is well documented
in the subsystem docs. This is the old way. Normally you'd see this in a board port; in the old days before DT you'd see all the board files in the kernel registering all their devices this way, just in a
big array here. So this is just an example of one. Again, same data that we needed, right? We know we're bus number one, chip select zero, and this 4 MHz rate, and we also need to tell it what module. Doing it by ACPI is almost unreadable,
as with most ACPI stuff. But you see there that basically the same things are there. It mixes things up a bit here. You've got the clock polarity and so forth. You've got the max frequency,
well, that's this one here. So that's your 4 MHz in machine-readable form. So that's what that looks like. So three ways to get there; it just depends on your platform. You'll see these types of constructs on
Bay Trail type platforms; if you dump their ACPI tables you'll see some of this stuff for SPI devices. It looks mysteriously like that. Alright, so that's how we hook one up. Basic example. Time check, alright, we're good. So now let's talk about protocol drivers.
And now we're getting into the meat of things. We're gonna come back to the whole messages and transfers and so forth, because we're now gonna use some of that as we get into these. So a protocol driver follows the standard Linux driver model; you find these same
patterns throughout the kernel. So you have to instantiate an spi_driver. There are a few important fields. Again, we assume that you know the driver model first off, I'm not gonna cover all that
here, but just as a highlight of what you've gotta fill out to do your protocol driver: you have your optional PM ops that you probably want, and you've got probe and remove, which are all standard hooks you're gonna find in everything that works in the driver model. The important thing operationally with an SPI protocol
driver is that as soon as it probes you can start using the kernel APIs to do messages. So that's immediate; once you implement that probe you can start banging on the device using the kernel APIs. So what are these
APIs? These are the APIs you're gonna use to build a protocol driver. We said the subsystem's carefully architected to separate out that controller thing, so all we have available to us are these neat little transfer APIs.
So the first is spi_async, and it's not really an obfuscated name, it means what it says: that's an asynchronous message request. When that message completes you get a callback executed. The key with
async is that it can be issued in any context. I'll explain what the other side of that is in a moment, but you can execute it, for example, in IRQ context, since it does not sleep. I will say that when you look at the
type of drivers that use spi_async, you find it's not heavily used directly, and we'll get back to that in a second. So now let's talk about spi_sync. Again, synchronous message request. It does exactly what it says. One implementation note
is that all of these synchronous family calls are wrapped around spi_async. So the same back end is used, but it guarantees that it will return on completion. So you can only issue this in a context
that can sleep. If you need to be in an IRQ context you're gonna have to use spi_async instead. As I already said, it's a wrapper around that. So you can build up complex messages and then issue them with spi_sync.
Then there's some helper calls where you can just hand in a buffer and do a write or a read. They're just wrapped around that. What you'll find is that things like some of the network drivers will use spi_async. They're more optimized around throughput,
so they can drive everything off of callbacks that way. Things that want low latency are gonna want to use something based on spi_sync. They may have a short little command to send, and they don't want to wait some arbitrary amount of time and handle the callback coming back. They want to do it sequentially.
Then you have some specialized APIs. Well, just this first one. This is kind of out of order, unfortunately. So spi_flash_read: you've got an API for when you deal with SPI flash devices, since they have
a standardized set of commands. So that's actually optimized for that set of commands. It's also optimized around specialized engines like some of the quad-SPI controllers on the newer SoCs. They'll IPL
out of that quad-SPI flash, and so they have memory-mapped I/O, and accessing that actually triggers SPI transactions on the bus. They'll demand-page chunks of that SPI flash device into that MMIO
area and execute the code XIP there. So it supports those types of things. And then when you do need to build up a message with all these transfers, there are APIs to do that. So not surprisingly you can call
spi_message_init, and that initializes an empty message. So you deal with that kernel structure, and then you can call spi_message_add_tail and just add transfers onto that list that we saw in that diagram earlier. So you can put any arbitrary types of transfers together, with different characteristics, how they leave their
chip select, everything, by using that API there. Alright, that covers the APIs that you're going to use with protocol drivers. With a controller driver, again standard driver model; not everybody is going to be writing one of these every day. Usually you're going to be touching protocol drivers,
for most people. But it works, again, like a lot of other subsystems. We have that strong concept of a master device, also called a controller in the subsystem. And so you're going to call spi_alloc_master, and then once you've done that
you need to set a whole bunch of methods. So setup configures SPI parameters, cleanup prepares you for removing the driver, and then you have four different ones that are the meat of it. One is prepare_transfer_hardware.
You may need to do some preparation before a message goes out. So when the message pump in the core SPI subsystem is about to send you a message to be processed by your driver, it'll let you know
before it calls transfer_one_message, so that you can get ready. So you get a hint there that a message is going to arrive in your driver. And transfer_one_message does exactly what it says: you dispatch a message when you get called by that. And then transfer_one is just
dispatching a single transfer. So you implement all those and then you register it with the subsystem using spi_register_master. Okay, so user space drivers. Obviously this is
a popular one with all the developer type boards, and the maker community likes to do user space drivers, with WiringPi and stuff like that over top of things. The way that's done is through a protocol driver, a special one called spidev, that exposes a simple character device
out to user space. So when you bind that spidev driver to an actual device, using the methods that we showed earlier, you get these instantiated, so you have spidev followed by your bus number, dot, your chip select. So that gives you that
unique address per bus, remember. And so, same thing, you'll see information on it under sysfs, and then you see that character device there. And usage of it is very straightforward. So simple open and close, and you've got
half duplex read and write, just by their nature, using the standard calls. And then if you want to do more complex things... yes? So half duplex in user space, I didn't realize that, but what actually happens in that case to
the actual transfer, does it just get discarded, like the other half of it? So if you look at the actual transfer struct when you're just doing a write, being half duplex just means that you can't set up a full duplex transfer using that API.
Does the other end just get thrown away? So that would be the case. You would do this very commonly even in kernel space, where, say you want to do a write, you would set the rx buffer to null. So that's a common pattern if you were building it up
from scratch. So that's what happens underneath those types of calls. Good question. So to get more complicated, spidev does support very complex drivers. To get at that you have to go beyond read and write, which aren't sufficient, and use ioctls,
and there's a few. I don't show all of them here, and you can read the docs on it yourself, but the important thing you've got there is SPI_IOC_MESSAGE. You can do raw messages, full duplex, you can control the chip select, everything, and then you have
a whole family of read and write ioctls where you can set all the same SPI parameters that you would in the kernel APIs. So you have full control; you can build up complex messages with lots of different transfers, just like you can in kernel space.
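To make the SPI_IOC_MESSAGE path a bit more concrete, here is a minimal user space sketch that does one full duplex transfer through spidev to read a channel of the MCP3008 from the earlier example. The helper names (`mcp3008_build_frame`, `mcp3008_decode`, `mcp3008_read`) are hypothetical, and the three-byte framing is the commonly used single-ended read sequence for that part, so treat it as an assumption to check against your data sheet rather than as the talk's own code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Build the 3-byte single-ended read command for channel 0..7. */
void mcp3008_build_frame(unsigned channel, uint8_t tx[3])
{
    assert(channel < 8);
    tx[0] = 0x01;                              /* start bit */
    tx[1] = (uint8_t)(0x80 | (channel << 4));  /* single-ended + channel */
    tx[2] = 0x00;                              /* padding to clock out data */
}

/* Extract the 10-bit conversion result from the received bytes. */
unsigned mcp3008_decode(const uint8_t rx[3])
{
    return ((unsigned)(rx[1] & 0x03) << 8) | rx[2];
}

/*
 * One full duplex transfer via SPI_IOC_MESSAGE on an open spidev fd.
 * Returns the 10-bit reading (0..1023), or -1 if the ioctl fails.
 */
int mcp3008_read(int fd, unsigned channel, uint32_t speed_hz)
{
    uint8_t tx[3];
    uint8_t rx[3] = { 0 };
    struct spi_ioc_transfer xfer;

    mcp3008_build_frame(channel, tx);

    memset(&xfer, 0, sizeof(xfer));
    xfer.tx_buf = (uintptr_t)tx;      /* bytes shifted out on MOSI */
    xfer.rx_buf = (uintptr_t)rx;      /* bytes shifted in on MISO */
    xfer.len = sizeof(tx);
    xfer.speed_hz = speed_hz;
    xfer.bits_per_word = 8;

    if (ioctl(fd, SPI_IOC_MESSAGE(1), &xfer) < 0)
        return -1;
    return (int)mcp3008_decode(rx);
}
```

Usage would be along the lines of opening the character device for your bus and chip select with `open()`, calling `mcp3008_read(fd, 0, 4000000)`, and closing it again. Leaving `rx_buf` at zero in the same struct gives you the write-only case discussed in the question above.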
Best place to go to get more details on that is the spidev doc. And then what's great is the kernel has, and sometimes people aren't aware of this, a couple of great examples in the tools directory. So if you want to see that thing used in a raw way, there's a full duplex example, and there's
a test piece that a lot of people use in their examples. And then once you move beyond that, I can recommend Jack Mitchell's libsoc library. It's higher level, and it fits into his whole libsoc API, so you have a uniform API that makes it easier to work with
spidev. A lot of times you want to use GPIO too if you're doing user space stuff, so you have a common API. So that's helpful; it's a very familiar, standard C library. And then a lot of people, I think Leon Anavi was talking about Python and stuff and the maker community in his
talk, and a lot of people are using this Python binding to spidev. Alright, we're almost there; we'll talk a little bit about performance. So this is all well and good, you've got all this abstraction, you've got these standard
APIs and everything, but at the end of the day you need to understand how to use them properly. And so the first thing that happens when you have bad performance is you've probably missed some characteristic of the controller driver. We've got all this nice separation, but there are always going to be some things that aren't
fully abstracted. You find one of those down in a controller driver: the McSPI driver on OMAP, which is used on pretty much all the TI parts, has it hardcoded that, because of DMA overhead, it doesn't use DMA until you get beyond
a 160 byte threshold. It doesn't use heuristics or anything, so you have to be aware. Maybe you're doing a simple test and you're like, why is this taking so long, or you're looking at it and there's a pause, but that's because it's using PIO. So if you do a simple 128 byte thing
you won't see the DMA routines fire, and so forth. So that's a specific example, but you do need to know some of the characteristics of what's going on underneath to understand how that got translated into the real world from those abstracted APIs.
The other big thing, and we kind of touched on that already, is you need to know where to use sync versus async. So I said network drivers typically are already driving off of callbacks, where they're recycling SKBs and so forth. You already have that model where you're optimizing for throughput,
so you typically use async there. You don't care exactly when it goes, as long as it completes, and you do your housekeeping off of the callbacks, but you're going to have some latency there. With the synchronous calls,
one of the nice things now, and, which kernel did that come in? It tries to execute immediately. Early 4.x. I was too lazy to go run git and check it, but it was somewhere in there.
It's old, unless you're on some crappy vendor kernel. Exactly. So the point there is, if you're on a newer kernel rather than stuck on an older one, there are some great optimizations in there. Where you want to have
very low latency, it will try to execute in the caller's context, and you can avoid a context switch. You avoid the sleep and reduce latency. Now here's the eye chart, and I want everybody to see this, and probably
somebody would take a patch to maybe get it into the docs themselves. I've had to explain this to a couple of people before. You will find this in include/linux/spi/spi.h, but the SPI overview docs are a little bit more terse about this behavior, and this is very, very good.
One of the optimization things when you're working on protocol drivers is that you want to be very careful about the behavior of the chip select, because the back-end controller will often have a very long delay if your chip select is deasserted and then you do another transfer
and it's got to reassert it, when the device can handle just having the chip select asserted the whole time. You can change some flags in the transfer to keep it asserted all the time. There's a default case, and you use cs_change to change that. This is where you
can go read the full explanation of that. You can also use it as a hint, if you're dispatching a lot of transfers, that the next transfer is going to the same device, and so it doesn't have to drop the chip select. That can save you,
if you go look at it on your logic analyzer or scope, a lot of time in between those, so when you're optimizing you need to be aware of these things. Quick thing on tools and then we're done. If you want a logic analyzer, go take a look
at these things. sigrok has a great comparison of everything and what it supports, and it's pretty comprehensive. The spi-loopback-test module is a nice little tester if you want to look at your performance with something that's a canned test. Then
the SPI subsystem statistics that came in a couple of years ago or so, really great stuff. You'll find them under your bus.chipselect statistics, and what's great about those is when you're thinking about that spi_sync stuff,
like how many times am I actually sleeping versus actually executing in the context of the caller, you start having visibility of that per device: how many messages you sent, transfers you sent, how many times I did spi_sync immediate, which executed in my caller's context, versus how many times I fell asleep.
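As a rough sketch of getting at those counters from user space, the helpers below build a path to a per-device statistics attribute and read it as a number. The function names are hypothetical, and the `/sys/bus/spi/devices/spiB.C/statistics/` layout and the `spi_sync_immediate` attribute name are assumptions to verify against your own kernel's sysfs before relying on them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the sysfs path of one statistics attribute
 * for the device at bus.chipselect. Returns 0 on success, -1 if the
 * buffer is too small. The path layout is an assumption; check your kernel.
 */
int spi_stat_path(char *buf, size_t len, unsigned bus, unsigned cs,
                  const char *name)
{
    int n;

    assert(buf != NULL && name != NULL && strlen(name) > 0);
    n = snprintf(buf, len, "/sys/bus/spi/devices/spi%u.%u/statistics/%s",
                 bus, cs, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single decimal counter from a sysfs file; -1 on any failure. */
long long spi_read_counter(const char *path)
{
    FILE *f = fopen(path, "r");
    long long value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%lld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Comparing the `spi_sync` and `spi_sync_immediate` counters for a device before and after a test run gives a feel for how often the synchronous path ran in the caller's context.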
Timeouts, so forth, and then, this is really awesome, there are histogram sysfs attributes now. So you have buckets by size of transfer, so you can actually look and see with a histogram
what your data looks like. So that will help you out a lot when you're optimizing. Last thing, future: slave support is coming, and it's useful in some limited use cases, because we have the hard real-time nature of slave support, where it's a full duplex bus and you're going to have to respond with something
as soon as you get that first bit clocked in. A lot of cases you can't do, but you can do things with pre-existing responses, or things that are just a one-way command to your Linux system. So Geert Uytterhoeven has an RFC v2 patch series, he's working on some bug fixes,
and he's got a couple of examples in that. Basically, when you register a slave you get this sysfs class slave device, and you can at runtime bind a slave protocol driver to that slave
controller. He's got two example ones: one that reports the latest uptime, so it's got a pre-existing cached value and it can respond back with a time packet, and another one where you can power off, reboot, or halt the system, just a one-way command. So that's coming. I think he's got a new
version coming soon, real soon now. I put him on the spot. That's it. So I think we do have time for a question, if people could be courteous to Matt,
who's given us a great talk, hopefully we can be still just for this one question if there is one, go ahead.