
Displayport Compliance

Formal Metadata

Title: Displayport Compliance
Subtitle: Fixing Black Screens on Linux
Alternative Titles: A Journey through Upstream Atomic KMS to achieve DP compliance; Black Screens and how to prevent them from upsetting Linux Users
Number of Parts: 644
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Transcript: English (auto-generated)
So, hi again. My name is Manasi Navare, and I work on Intel's i915 graphics kernel driver team. I've been on that team for about two and a half years, and today I'm going to talk about my very first project on the team,
which was DisplayPort compliance. From the user's point of view, it meant fixing the black screens with DisplayPort, which may have happened just now. I don't know. It shouldn't have, because I have the latest kernel with the fixes.
So has anybody else experienced this before with DisplayPort or Mini DisplayPort, just a black screen? Okay, good. So I guess you can relate to the problem that I'm talking about here. When I joined the team, I think almost 97 percent of
the freedesktop.org bugs that we had on the display side were because of black screens or hot plug issues: you connect multiple monitors, and there's just no display. So the goal of my project was to make the DisplayPort driver DP compliant and upstream the solution.
So let's start with the basics. What happens when you connect a DP cable? You have the DP source on one end, which is your PC, and the DP monitor on the other end. When you connect them with the DisplayPort cable, the first thing that happens is
the hot plug detect signal that the sink device sends to the source device. It's just an interrupt signal saying, hey, there's a new connection. After that, the source is going to initiate the DisplayPort Configuration Data (DPCD) register reads and writes, through which it starts reading the capabilities supported by the monitor.
It will know the resolutions supported and the link parameters that are supported by the sink device. Once it has negotiated these parameters between the source and the sink device, then it's ready to encode the data and start sending it on the actual cable.
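As an illustration of the capability read described above, here is a minimal Python sketch that decodes the first three receiver-capability bytes of the DPCD. The register offsets and link-rate codes follow the DisplayPort specification; the function name and the example bytes are hypothetical and do not come from the talk or from the i915 driver.

```python
# DPCD receiver-capability offsets and link-rate codes (DisplayPort spec).
DPCD_REV = 0x000
MAX_LINK_RATE = 0x001
MAX_LANE_COUNT = 0x002

# Per-lane link rates in Gbit/s: RBR, HBR, HBR2.
LINK_RATE_GBPS = {0x06: 1.62, 0x0A: 2.7, 0x14: 5.4}


def parse_sink_caps(dpcd: bytes) -> dict:
    """Decode the capability bytes a source would read from the sink."""
    rev = dpcd[DPCD_REV]
    return {
        "dpcd_rev": f"{rev >> 4}.{rev & 0x0F}",
        "max_link_rate_gbps": LINK_RATE_GBPS[dpcd[MAX_LINK_RATE]],
        # Bits 4:0 hold the lane count; the upper bits are feature flags.
        "max_lane_count": dpcd[MAX_LANE_COUNT] & 0x1F,
    }


# Hypothetical bytes for a DP 1.2 sink: HBR2, four lanes, one feature flag set.
example = bytes([0x12, 0x14, 0x84])
caps = parse_sink_caps(example)
print(caps)
```

The maximum link rate and lane count decoded here are the starting point for the link training negotiation.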
The very first thing that happens when you connect the DisplayPort cable is this negotiation sequence between the source and the sink device, which is called DisplayPort link training. So the first signal is the hot plug detect signal, then the source will start the link training. First, it has to go through the clock recovery sequence,
where it sends known training data patterns to lock the clocking information on the DisplayPort receiver end. Then it's going to send another known training pattern sequence, but this time it's going to send it
with a known skew between the lanes. This data is used on the other end to find the inter-lane alignment and get the mapping of how the data symbols are sent across the lanes on the DisplayPort cable. After these two sequences succeed, the source and the sink device lock
a particular link rate and lane count, and then they can start sending the data at that specific link rate and lane count. We say that the link is ready, and that's when the source starts sending the data and you see the display on the monitor. So what was our plan for testing
DisplayPort compliance? The plan was to use a third-party device, the DPR120, which is certified by VESA, to run the exhaustive compliance test suite, which makes sure that the driver is compliant with the specification.
It basically acts as a reference sink device. You connect the device under test, the laptop, to the DisplayPort input of the DPR120, and the DPR120 is going to say, okay, display this specific data pattern at this resolution. The laptop will start sending the data,
the DPR120 taps the information on the DisplayPort cable and compares it with the reference CRC values and reference data patterns, and if it matches, then it passes the test. A lot of these tests try to verify that link training is going to pass in different scenarios.
For example, the DPR120 is going to induce a failure in the first phase, clock recovery, and see if the device can recover from that and still send the data within the number of retries
specified by the DisplayPort spec. So the goal was to run all these tests and make sure that the driver was actually passing all those link training tests and was able to recover from the failures. So going back to the basics, right? What was the existing state
of the atomic kernel mode setting? What does it do when you connect the DisplayPort cable? The user space is going to form the list of parameters, the properties, and send that to the kernel. Then the first thing the kernel does is form the state of the device.
It forms the state for the different DRM mode objects depending on the requested mode. This is the atomic check phase, and in this phase, it's going to try to validate the mode that is being requested. So in this phase, it will, let's say, get the 4K mode, and it's going to see whether that mode can be supported by the hardware,
by the available clock, and by the link parameters of the DisplayPort cable. If it can actually support that resolution, then it will go to the commit phase. In this phase, it's going to take all the data and write it to the hardware registers. Since this is the phase where it does all the writes
or the hardware update, this is where the link training is going to happen, because it has to actually send the data symbols. And that's when you're going to start seeing the display on the monitor. So I thought, okay, the atomic mode setting is validating the state in the atomic check phase, and then it's going to write everything to the hardware.
Because it's already validated, everything should just work fine. So I thought, okay, yes, problem is solved. But did we actually fix the black screens at this point?
So I ran the test suite, and sure enough, when the DPR120 tried to induce link training failures, the driver was not able to recover from them, and it was just a black screen. So let's see what the problem was. Just a simple scenario again.
You have the sink device connected to the source device. It sends the hot plug detect signal. Let's say this mode is requested at 60 hertz. In the first phase, the kernel is going to set up the CRTC, set up the pipe for a specific configuration.
It will start with the optimum link rate and lane count at this point, see if that supports the requested mode, and configure the pipe for that specific link rate and lane count. But it hasn't actually sent any symbols on the cable, so it hasn't validated whether it's going to be able
to send those symbols on the cable, whether it's going to work or not. That's going to happen in the commit phase, where it actually sends those training pattern symbols and goes through clock recovery and channel equalization. And what happens if at that point the link training fails? That's when you see a black screen.
With the existing state of the driver at that time, you would just get a black screen, and the dmesg logs would just say, okay, error, link training failed. But there was literally no information going back to the user space, and the kernel wasn't doing anything to recover from this mode set failure. And that's when we were getting those black screens.
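The gap between the two phases can be sketched in a few lines of Python. This is a simplified model, not the i915 code. It assumes a commonly used 594 MHz pixel clock for 3840x2160 at 60 Hz, 24 bits per pixel, and 8b/10b link coding. The `cable_max_gbps` parameter is a hypothetical stand-in for what the physical link can carry, which is exactly what the check phase cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    pixel_clock_khz: int
    bpp: int = 24


@dataclass(frozen=True)
class LinkConfig:
    rate_gbps: float  # per-lane link rate
    lanes: int

    def payload_gbps(self) -> float:
        # 8b/10b coding: 8 payload bits for every 10 bits on the wire.
        return self.rate_gbps * self.lanes * 8 / 10


def atomic_check(mode: Mode, link: LinkConfig) -> bool:
    """Check phase: pure arithmetic, no symbols are sent on the cable."""
    required_gbps = mode.pixel_clock_khz * 1000 * mode.bpp / 1e9
    return required_gbps <= link.payload_gbps()


def atomic_commit(link: LinkConfig, cable_max_gbps: float) -> str:
    """Commit phase: link training meets the real cable (simulated here)."""
    if link.rate_gbps > cable_max_gbps:
        # Before the fix this was only logged, and the screen stayed black.
        return "link training failed"
    return "ok"


uhd60 = Mode("3840x2160@60", pixel_clock_khz=594_000)
hbr2_x4 = LinkConfig(rate_gbps=5.4, lanes=4)

print(atomic_check(uhd60, hbr2_x4))                # the mode fits on paper
print(atomic_commit(hbr2_x4, cable_max_gbps=2.7))  # a marginal cable still fails
```

The check passes because 14.256 Gbit/s of pixel data fits into the 17.28 Gbit/s payload of four HBR2 lanes, yet the commit fails on a cable that cannot sustain 5.4 Gbit/s per lane.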
So what did we do? How was this fixed in the driver so that it can actually recover from such failures, which happen only at the last stage of the atomic kernel mode setting? So we were here: a specific mode was requested,
and let's say, first it tried to link train at the maximum link rate and lane count, 5.4 gigabits per second and four lanes, and that failed. For that case, we introduced a new property for the DRM connector, which indicates
the status of that specific link. So at that point, the kernel knows that link training failed at 5.4 gigabits per second, four lanes, so it's going to immediately fall back to the next lower value of the link rate, which is HBR, 2.7 gigabits per second,
and four lanes. So first it will fall back on the link rate and try again. And it keeps doing that until the link training succeeds. So it falls back, and at the same time, it's going to set the link status property to bad and then send a uevent notification to the user
space saying that, OK, something went wrong in the configuration, and that specific requested mode did not work. At that point, user space retries the mode set at the same resolution first, but it might happen that, because we fell back to the lower link rate and lane count values,
that mode gets pruned. So at that point, the user space has to get the connector information again, get the new modes, and then try the mode set at the next lower available resolution. This time, when the kernel gets the mode set request, it's going to go through the same phases, but now it's training at the lower values.
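The fallback loop can be sketched as follows. This is a simplified Python model, not the driver code: it lowers the link rate first and reduces the lane count only once the lowest rate has failed. Resetting the rate to the maximum whenever the lane count drops is an assumption of this sketch, and `try_train` is a hypothetical stand-in for the hardware link training.

```python
LINK_RATES_GBPS = [5.4, 2.7, 1.62]  # HBR2, HBR, RBR
LANE_COUNTS = [4, 2, 1]


def fallback_sequence(max_rate=5.4, max_lanes=4):
    """Yield (rate, lanes) pairs in the order they would be tried."""
    rates = [r for r in LINK_RATES_GBPS if r <= max_rate]
    for lanes in (n for n in LANE_COUNTS if n <= max_lanes):
        for rate in rates:
            yield rate, lanes


def train_with_fallback(try_train):
    """try_train(rate, lanes) -> bool simulates one link training attempt."""
    uevents_sent = 0
    for rate, lanes in fallback_sequence():
        if try_train(rate, lanes):
            return {"rate_gbps": rate, "lanes": lanes, "uevents_sent": uevents_sent}
        # Training failed: the link status is marked bad, user space is
        # notified, and the next mode set uses the next lower configuration.
        uevents_sent += 1
    return None  # nothing trained, not even RBR on one lane


# Hypothetical marginal cable that only trains at HBR or below.
result = train_with_fallback(lambda rate, lanes: rate <= 2.7)
print(result)
```

In the real stack each retry is driven by a new mode set from user space rather than by a loop inside one function; the loop here only shows the order of the attempts.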
And the link training is hopefully going to succeed, and we get a successful mode set. This happens for all the combinations: it does this for the link rates all the way down to RBR, 1.62 gigabits per second. If that doesn't succeed, then it starts reducing the lane count, and it
goes all the way to one lane. So it basically tries all the combinations and keeps reducing the resolution until it can show something on the screen. Even a really small resolution is better than a black screen. So why was this needed? The basic loophole here in the atomic kernel mode setting
was that failure is always an option, yet it was assumed that the atomic commit phase will always pass, and that the mode was always guaranteed at that point. With link training, when we are dealing with the actual hardware, the check does guarantee the requested mode, yes.
But at that point, it's only checking against the GPU parameters, not the actual physical cable. So link training can still fail, and the atomic commit can still fail. So we do need to handle this case. This link training failure is also asynchronous, because the link might be working and ready
and transmitting symbols, but it can fail at some point while it's up and running, displaying something. So you need a way to asynchronously send a notification to the user space at any time the kernel detects that the link is not working correctly.
It also helps because atomic allows non-blocking commits. What that means is it's going to do the mode set and return control to the user space, but we don't know whether the atomic commit has completed. So the user space still doesn't know whether it has successfully been able to display something
and whether the atomic commit phase was successful. So in that case too, we need some way to asynchronously notify the user space. So this is how we tested the entire stack and made sure that the newly introduced property
and the way of handling the atomic failures was actually validated through the entire stack, and the whole stack was able to recover from this mode set failure. We used the DPR120 device to induce those failures
in the link training. First, it sends the long pulse. It requests a specific mode. The kernel at that point is going to validate the mode and link train at a certain rate. Once that fails, it's going to fall back, set the link status property to bad, and send the uevent. At that time, we did make changes
to both xf86-video-intel and the modesetting driver to actually handle this newly introduced property. The driver keeps looking at the link status property, and as soon as it sees that it's bad, it's going to request a new mode set at the same resolution.
But like I said, that resolution might actually get pruned, because now the link parameters have changed, and the link might not support that same resolution. So it sends the XRandR event up to the desktop environment. And Martin actually wrote this app, AutoRandR, to constantly listen to those RandR events.
Every time it gets that event, it's going to re-probe the connector, get the mode information, and then redo the mode set at the next available resolution. So this was the way we were able to test it across the stack.
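The user-space side of this handshake can be sketched as below. This is a minimal Python model of the logic only. `FakeConnector` and its methods are hypothetical stand-ins, not a real DRM or RandR API, and the per-mode bandwidth figures are illustrative values for 60 Hz modes at 24 bits per pixel.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnector:
    """Hypothetical stand-in for a DRM connector as seen from user space."""
    link_status: str          # "Good" or "Bad"
    link_payload_gbps: float  # what the fallen-back link can carry
    all_modes: dict = field(default_factory=lambda: {
        # mode name -> required Gbit/s, largest first (illustrative values)
        "3840x2160": 14.256,
        "2560x1440": 5.796,
        "1920x1080": 3.564,
    })

    def probe_modes(self):
        """Re-probe: modes the current link cannot carry are pruned."""
        return [m for m, need in self.all_modes.items()
                if need <= self.link_payload_gbps]

    def modeset(self, mode):
        ok = mode in self.probe_modes()
        if ok:
            self.link_status = "Good"
        return ok


def on_hotplug_uevent(conn, current_mode):
    """Retry the current mode if it survived the re-probe, else step down."""
    if conn.link_status != "Bad":
        return current_mode
    modes = conn.probe_modes()
    if not modes:
        return None
    target = current_mode if current_mode in modes else modes[0]
    return target if conn.modeset(target) else None


# The link fell back from HBR2 to HBR on four lanes: 2.7 * 4 * 0.8 = 8.64 Gbit/s.
conn = FakeConnector(link_status="Bad", link_payload_gbps=8.64)
new_mode = on_hotplug_uevent(conn, "3840x2160")
print(new_mode, conn.link_status)
```

Here the 4K mode is pruned after the fallback, so the handler steps down to the largest mode the link can still carry and the link status returns to good.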
All the changes are upstreamed for i915 as well as the X server, but I still need to connect with the GNOME and KDE folks to make these changes in the desktop environments, because otherwise the whole stack is not going to be compliant.
I also wrote a tool. It's upstreamed in intel-gpu-tools now. It's a DP compliance tool for fully automating this compliance testing. It does need a DPR120 connected to the DUT,
but after that, it just gets the test requests from the DPR120, handles them, and runs the entire suite. And it keeps a log of which tests passed and which failed. So that was a good way for us to do our pre-merge
and post-merge testing at this point. But one of the next steps for us is to move this whole testing infrastructure to open source hardware, Google's Chameleon board. With that, the idea is that we will be able to test a lot more corner
cases with different types of external displays. So we'll replace the DPR120 with the Google Chameleon board. We would have to move the entire compliance test suite onto the Chameleon board, which is going to be a big effort, because it
has to be certified by VESA to say that, OK, this is a reliable test suite that you can run and say that the driver is DisplayPort compliant. And then eventually, the goal is that once we are able to move everything to the Chameleon boards, we can have that as part of our CI system, so that for
every laptop with a DisplayPort, we have this connected, and we always run the compliance suite. So any patch that gets submitted will run against the suite, and we'll make sure that it's not breaking the link training.
So that's all I had. And this was definitely not a one-man project. I wouldn't have been able to finish this without help from the community, from reviews from everyone. So yeah, thanks to everyone for reviews and suggestions.
Thank you. Questions? Any questions? So what Linux versions do you talk about here? So for i915?
So which version has the fixes? Oh, so it's in 4.12 and X.Org, yeah. I didn't hear the answer. Yep. What's the Wayland answer to this? So it is not there on Wayland. So that's one of the goals, to actually scale the solution to Wayland and even to Android's SurfaceFlinger.
Because until then, this property is going to be there, but if the user space does not look at the property and does not handle it, then it's still going to basically fail at that time.
So yeah, I'm trying to reach out to other user space communities and help out to scale this feature.
So far, we have just ordered the Google Chameleon boards. We have the budget for that, right? Yes. But no resources are allocated to that.
So yeah, it's going to take a while. But we will work on that in 2018. And again, I would need help from the community. I know Lyude Paul from Red Hat, she's done a lot of work on testing DP MST and other corner
cases with hot plugs using the Chameleon board. So I'm going to reach out to her. Yeah, anybody can order it. I think it's worth it.
And replacing the DPR120 with Chameleon boards is way cheaper, because the DPR120 licensing fees are ridiculous, $5,000 or something. So we definitely can't have more than two DPR120s.
So with the DPR120, there's no way we can integrate that with CI. So we would need a solution on Chameleon boards. DP 1.2 and 1.3; we haven't tested DP 1.4 yet.
For the DPR120, they are adding new tests for DP 1.4 support. But we haven't run that compliance test suite yet.
So they didn't have hardware RCTS support? Yes, yes. Yeah. For the newer platforms too. OK? Thank you.