Setting up OpenQA testing for GNOME
Formal Metadata

Title: Setting up OpenQA testing for GNOME
Title of Series: FOSDEM 2023
Number of Parts: 542
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61904 (DOI)
Language: English
FOSDEM 2023, talk 430 of 542
Transcript: English (auto-generated)
00:09
All right, well, welcome to the next talk. I'm going to be talking about OpenQA testing of a pretty complex graphical desktop environment.
00:21
So, I'm an operating systems developer. I've been involved in GNOME for a long time, possibly too long, and I've also been involved in CodeThink for maybe 10 years off and on. We're like a consultancy firm based in Manchester, and we work a lot with the automotive industry, helping them with testing.
00:42
So that's how we got an interest in OpenQA, and that led on to trying to set it up for GNOME OS as well. The mic should be on, but maybe it's not. It is green, but, yeah, can nobody hear me?
01:02
Okay, so there's no room speakers. All right, I'll try and talk a bit louder. So, GNOME is a desktop environment. How many GNOME users do we have in the room? Quite a few. KDE users? Nice. Other desktop environments? Tiling window managers, etc. Quite an even mix, actually.
01:23
Everybody is welcome. So, GNOME is quite an old project, right? GNOME predates FOSDEM and Git; it's older than Greta Thunberg. It's older than some of its contributors. It's older than Myspace, and this leads to some technical challenges that have built up over the years.
01:42
So, the GNOME designers design a cohesive experience of everything working together, but then we release more than 200 individual modules as tarballs, and distributions get to integrate those back together to produce something that hopefully works. So, it's difficult to test those 200 modules. It's difficult to test what we release.
02:03
And maybe you've heard of Conway's law, the rule that a project's source code will mirror the structure of the organization that makes it. So, this is a rough diagram of how GNOME development works. Most of the work is done by module teams who focus on one or two individual modules.
02:23
So, things are tested well in isolation, and then the release team tries to build everything, and they get to the point of, like, okay, everything builds, so we'll release it, and they give this to packagers, who give it to users. So, the question is, which of these groups is responsible for integration testing?
02:44
The maintainers are working on isolated components, the release team are very busy, and distro developers are also very busy. So, certainly when the project started, the users were responsible for integration testing. You got to use Linux for free, and you got to report bugs if it broke.
03:05
I mean, maintainers would give the "works on my machine" certificate. And you get a lot of crazy bugs at integration time, like, oh, this feature doesn't work, because it turns out the code isn't broken, but you passed the wrong configure flags, so you don't get the feature that you wanted.
03:20
Time has passed since then. There has been lots of development. Git and GitLab arrived; GitLab was a huge help for GNOME, and it means that we can now do CI quite easily. So, the situation now looks kind of like this. Module maintainers generally have unit tests, and those check that the module works in isolation.
03:42
The release team have an integration repo that says these are the versions of the components that make up this GNOME release. So, we know what we're releasing. And distributions have started doing some downstream testing as well. At least some distributions have. There's a lot of good work going on testing that the released software is good.
04:01
But there's still a gap, because from landing a commit into the main branch of your module to actually having integration testing run by a distribution, there could be months. There could be months from you making that change to someone cutting a beta release and actually testing it. So, there's still a lot of time for problems to appear. So, the question we tried to answer
04:23
over the last sort of 10 years within the GNOME project is, what if we built our own distro just for testing? And so we did. It was a long job, but GNOME OS exists. Lots of people worked on this over the 10 years, and it exists specifically for testing. So, some people say, can I use it? And well, you can, but it's designed to be broken, right?
04:45
So, don't use it unless you want something that breaks every day, has no security updates, and doesn't support most hardware. But what it is good for is testing the up-to-date latest in-development version, and for seeing how new designs might work as they're being developed.
05:05
And a goal was always automated regression testing, but that's kind of been the last piece of the puzzle. And that's the thing I'm showing off today: we now have automated regression testing of GNOME OS. You can get it from here if you want and, like I say, use it for testing.
05:23
And it works for manual testing, but it's quite boring, right? People don't spend their weekends going, oh, I think I'll download this image and, you know, just test and report bugs. And it's not quite suitable for pre-merge gating yet, because it takes hours to build the image. So, we can't gate every merge request and say, well, the OpenQA tests have
05:43
to pass, because it can take hours before the new OS image is built, right? So, what we're doing at the moment is we've set up an OpenQA instance. So, OpenQA, I haven't introduced OpenQA yet. OpenQA is a test tool developed by SUSE.
06:02
How many people are familiar with OpenQA, actually? We're in the testing room, so hopefully some people are. Maybe half the room. Okay, well, I'm going to do kind of a deep dive into how it's set up for GNOME and how it works. There are three components. The web interface is the thing you look at, and this is called OpenQA.
06:23
The thing that actually does the work is a test driver called os-autoinst. It's a less catchy name, and it supports multiple backends, right? So, in GNOME, we use the QEMU backend, but you can also use backends that run on real hardware.
06:40
I think some distros are doing this. Some of the CodeThink projects use this. In GNOME, we only use emulation at the moment, because it's kind of the simplest option. And then, we have a library of tests. So, actually, most of the fun stuff in OpenQA lives in OpenSUSE's test repo. And when we want to do more advanced stuff for the GNOME test, we go in there and copy stuff out of it,
07:04
and use it sort of like a library in the traditional sense of something that we copy from. There are some built-in utilities as well, but a lot of the good stuff is in the OpenSUSE tests. Lots of people are using it these days. SUSE, of course, having invented it.
07:21
Fedora is using it. I found an article about EuroLinux, which is using it. Various car companies are using it. Maybe you're using it; hopefully, you will be after this talk. CodeThink is also using it to test Linux kernel master branches on some ARM hardware, using Lava as well.
07:43
That's a whole separate talk, which I'm not going to go into. But if you're interested, find someone with this T-shirt on, and they can talk about it. So, let's be adventurous, right? Here's a screenshot of OpenQA, but hopefully, the Wi-Fi is going to work. And I can show you the real thing. Here's the front page of the GNOME OpenQA website.
08:03
And it doesn't tell you much, right? Actually, we don't use this front page. We go via GitLab. So, here's the gnome-build-meta repo. And this defines what goes into GNOME.
08:21
And this has a CI pipeline set up. Ah, the internet is not working. Let's see. Aha, you got me, FOSDEM Wi-Fi. Okay, so let's go back to the screenshots. I did anticipate this happening. Here's some CI pipelines that I prepared earlier. These are the tests running on master.
08:41
And these do various things. You know, they build all the components using a build tool called BuildStream. The interesting one for our purposes is a job called S3 image. So, this builds an ISO installer and pushes it to Amazon S3, which is a good place to store these kind of 5GB ISO images.
09:02
And then, we have another job called Test S3 image. And that's the fun one, right? That goes to S3, downloads the image and runs the OpenQA tests. There's a long explanation of how it works. But actually, I'm going to see if I can show you the job log.
09:22
I did load one earlier. No, I can't load the job log either. So, I'm going to show you the long explanation of how it works. In brief, the design of OpenQA initially was that you'd have like a separate machine or a farm of machines. And the test would run on one of those machines.
09:43
So, that's a perfectly fine model. But it involves maintaining quite a lot of infrastructure. And we're trying to do this in the easiest possible way, right? Because we don't have a big team working on this. So, we kind of inverted the design and we use the GitLab runner as the worker.
10:01
So, the GitLab runner uses the OpenQA worker container image. It calls the OpenQA web UI and says, hi, I'm a new machine, send me your job. And then, it queues a job and adds a flag saying, oh, by the way, this can only run on the machine that I just created. So, the effect is then the GitLab runner becomes the OpenQA worker,
10:24
runs os-autoinst, runs the tests, and communicates the results back to the OpenQA web UI. It's maybe a little unsupported, but it's actually working quite nicely, and there are just a couple of caveats in the web UI from doing things that way. But it means we only have one big build server, which
10:42
is configured as a GitLab runner and we don't have any other infrastructure apart from the web UI, which is fairly simple to maintain. So, that's why we do it that way. And now I'm going to go through what you can see in the web UI. First, I'm going to drink some water, actually.
11:05
I've got a lot of talking to do today. So, each test run gets an ID. This is, I can't see the ID, but it'll be some long number. And we have one long test job, which tests everything we care about.
11:21
So, actually, I think this one I loaded. Yeah. So, here's the real thing. And we test all the way from taking the OS image on a bare machine, running the installer. Can I open that? No, I can't open that. You'll have to look at the tiny screenshots. Running the initial setup.
11:41
This is the GNOME initial setup. We poke at the serial console a little bit once we've created a user. So, once we've created a user account, we can log in over serial and we just enable journal logging to the serial console to make things a bit easier to debug. And then the fun starts. We start poking around at the desktop and we run each of the GNOME core apps.
12:08
And at the moment, we just check that it starts and then it looks the same as it did the day before. So, the core of OpenQA is doing screenshot matching. And it has some tools for making that a little bit nicer than it would be if it was just pixel-by-pixel comparisons.
12:26
But the core of it is screenshot matching. And so, we have a screenshot of each app and we say, this is how it should look. And as long as it looks the same or sort of within 95% the same, then the test passes. And if it looks different, then the test fails.
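To make that concrete, here is a minimal sketch of what such a check looks like in test code; assert_screen is the real os-autoinst test API call, while the tag name and timeout are invented for illustration:

    use testapi;

    # Wait up to 30 seconds for any needle tagged
    # 'app-nautilus-launched' to match the current screen; the test
    # fails if no needle matches in time. The similarity threshold
    # itself lives in each needle's metadata.
    assert_screen('app-nautilus-launched', 30);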
12:41
This one, I guess you can't see, but this one has failed because a pop-up has appeared over the top. Which is pretty annoying and one of the things that we still need to sort out. Most of these have passed. This one has failed because the icons change size slightly. So again, the image match is maybe 95% and the threshold is 96%, so it hasn't quite passed.
13:08
In most cases, the solution to these failures is just to update the screenshot. And that's quite an easy process. Let me show you how. Well, this is a gallery of some tests for you to see closer up.
13:22
When you click on one of the screenshots, you get to see like the before and after or rather the golden screenshot and the real screenshot. And you can drag this slider across and go, OK, this is, you know, here's the difference. These areas are the actual match zones which are defined in the screenshot.
13:41
So OpenQA calls these needles. A needle is like a screenshot plus metadata. And we define zones that we want to match against. And it uses OpenCV to do the matching. So that's what this percentage means. It's saying, you know, it's 99% the same. A cool thing about using OpenCV is that the match can move around the screen, right?
14:04
So this window might have popped up in a different place. But OpenQA would, you know, if the match was over here, 20 pixels to the right, it would detect that. And the test would still pass. So that's that's pretty useful.
14:21
And you can also lower this threshold. The manual says don't lower the threshold below 90%, I guess because maybe anything will pass at that point. I haven't played with it too much. I tend to go with between, you know, 95 and 100%. Your tests can input text. So here we're creating a user.
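As a rough sketch of what that looks like in a test, with invented values (type_string and send_key are the genuine test API calls that drive QEMU's virtual keyboard):

    use testapi;

    # Fill in the username field of gnome-initial-setup and move on.
    # These synthetic key events travel through QEMU's virtual input
    # hardware, so they exercise the kernel, the compositor and the
    # toolkit exactly as a physical keyboard would.
    type_string('testuser');
    send_key('tab');
    type_string('insecure-test-password');
    send_key('ret');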
14:40
All this is done by the QEMU API. So it's simulating a real mouse and a real keyboard to do this. It's going really through the whole, you know, stack, the whole software stack from the kernel through the graphics drivers right into everything in user space into GNOME. So the ultimate in integration testing. Here's some more screenshots of needles.
15:02
This is an exclusion. So I excluded the version number, so that when we bump the version number, the tests don't fail. This is the needle editor, right? So the web UI lets you edit these needles. They're stored in a git repo, but not everyone wants to dig around in a git repo.
15:20
So there's also a web UI to edit them. And you can drag and drop. This is a screenshot, but the green is like, let's match this area. And the brown is an exclusion because this is a process list, right? So it's going to be different every time. So we exclude that from the match. When a test fails because the screenshots changed and you want to update the screenshot, which is a really common case.
15:43
You can go in here, change the screenshot, reuse the existing matches, and then you can commit your changes under a new name to the needles git repo. So it's like a two-click process. It's pretty straightforward.
16:01
And here's the actual needles repo, right? So it's nothing too complicated. Each needle is a PNG file and some JSON metadata. And here's a really simple example of what the JSON metadata looks like. This has one match area and it has a tag. So the important thing here is the tag. In your tests, you would say, assert screen 'app-baobab-home', and it'll match against any needle that has that tag.
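For illustration, a needle's JSON in that style might look like the following; 'area', 'tags', the per-area 'match' threshold and the 'exclude' type are all part of the os-autoinst needle format, while the coordinates and tag here are invented:

    {
      "area": [
        { "x": 32,  "y": 64, "width": 320, "height": 200,
          "type": "match", "match": 96 },
        { "x": 400, "y": 10, "width": 120, "height": 20,
          "type": "exclude" }
      ],
      "tags": [ "app-baobab-home" ]
    }

The per-area 'match' value is the similarity threshold discussed a moment ago, and the 'exclude' area is how zones like version numbers or process lists are left out of the comparison.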
16:30
So you build up this collection: maybe in version 40 it looks like this, and then in version 42 the design changes. So you make a new needle with the same tag.
16:42
And OpenQA will now accept either of those needles. So the old design would still pass. And if your application randomly regresses to the old design, it actually wouldn't catch that case, unless you've deleted the old needle. Seems kind of limiting, but actually OpenSUSE have built an enormous library of tests using this method.
17:01
So I trust that it works well. I think some people have actually improved on that on one of the CodeThink projects, but I don't know the details. And the last thing I wanted to mention was the tests. So this is the fun part. You get to write your tests in Perl.
17:20
It's like a trip back in time, but it's not super complicated. I don't know much about Perl, but I can figure out most of what's going on. This is the main entry point. So we import a couple of helpers, we set some constants, please don't steal my password, but this is a constant that we reuse.
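In outline, that entry point looks something like this; autotest::loadtest and the $testapi::password variable are the real os-autoinst mechanisms, but the module file names and the password are placeholders, so GNOME's actual main.pm will differ:

    use strict;
    use warnings;
    use testapi;
    use autotest;

    # A constant shared by every test module that needs to log in.
    $testapi::password = 'not-the-real-password';

    # Each file is one test module; they run in the order loaded.
    autotest::loadtest 'tests/gnome_install.pm';
    autotest::loadtest 'tests/gnome_initial_setup.pm';
    autotest::loadtest 'tests/gnome_desktop.pm';
    autotest::loadtest 'tests/gnome_apps.pm';

    1;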
17:41
And then we load each of these; each one is a specific test in its own Perl module. So then, if we go look at one of those, here's a test case. And the meat of the tests is calling these functions. So assert that this needle matched, with a timeout, and then we click on an area that's defined in the needle.
18:06
So in this case, it's the next button, and that area is defined in the needle. We also eject the CD-ROM after the install, and then we reset the machine. So a lot of this is building up libraries of useful functions and calling them.
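A minimal sketch of such a test module, with invented needle tag names (assert_and_click, assert_screen, eject_cd and power are genuine test API functions):

    package gnome_install;
    use base 'basetest';
    use strict;
    use warnings;
    use testapi;

    sub run {
        # Wait for the installer's "Next" button needle, then click
        # the click area defined inside that needle.
        assert_and_click('installer-next-button', timeout => 120);

        # Give the installer up to ten minutes to finish.
        assert_screen('installer-finished', 600);

        # Eject the install medium and reset the virtual machine.
        eject_cd;
        power('reset');
    }

    1;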
18:24
So your tests end up being quite readable. All right, I've got a couple more things, and then I'm going to open up to questions. Things that I've learned: OpenQA is very good; it's probably the best open-source tool available for doing this. So use it, contribute to it.
18:40
On the other hand, don't go crazy with it, because it will, you know, be slow; stick to testing integration. You know, don't do unit tests in there. The main documentation doesn't list the actual API that the tests use. So look here, look for the test API docs, and also look at OpenSUSE's tests, which have loads of examples of good things that you can copy from.
19:06
You don't have to run the tests in CI. When you're developing tests, you can run them locally. It's a little bit of a faff. You have to work out the right container command line, but you can run the container locally with os-autoinst, and you can then see the logs locally and iterate much faster than having to push your changes to CI every time.
19:24
That's a great help. Uh, I've messed up the numbering slightly. I guess you can follow it. There's a couple of errors that only appear in the logs, so the web UI is great, but occasionally you see like a 0% match and you're like, but these are the same. But it turned out there was something invalid in the needle and you get a 0% match, but it's only reported in the logs, so don't lose time to that.
19:45
I've lost probably hours collectively to forgetting this and then remembering it again. Also, the upstream containers are useful and they're usable, but it's a very rolling release process, so you probably want to pin a specific version and update it when you're ready to deal with the new changes.
20:01
Don't just pull the latest one every time. Um, so within GNOME, this is kind of ready, it's working, but it has a bus factor of one at the moment. So before we can declare it stable, we need more people to get involved, both maintaining the infrastructure, maintaining the tests.
20:20
And these are some credits of people that have worked on this over the last 10 years. Apologies if I missed anyone, but I wanted to make it clear this is not something that I've done myself in my free time. I'm really adding the finishing touches to a huge amount of work that's taken a decade to get to this point. Um, on the topic of OpenQA, separately to GNOME, we're quite interested in it
20:44
and at CodeThink we're continuing the OpenQA and Lava testing of the Linux kernel. We've written a small tool to control hardware from within OpenQA tests, and we've built this USB switcher which, if you have lots of test rigs and lots of USB hardware connected to them, is very useful.
21:03
If you want to see it, they have some real ones over there. Find these people and they can tell you about how great it is. Um, so that's everything from me. Please, you know, if you want to get involved with this GNOME initiative, it's a good time. I'll help you to learn, um, everything about OpenQA you ever wanted to know and more.
21:25
You can get involved on Discourse or on Matrix, and I'm going to leave it there. I think we have a few minutes for questions.
21:42
Uh, we've got one here already. Okay. The question is how many bugs has this caught, and how many false positives? It's caught some real bugs. Um, we're not testing a huge amount of stuff really at the moment. I mean, we're just testing that every app starts, but that can already
22:01
catch some interesting breakages, because in GNOME we use 12 different programming languages. So if any of the bindings break, then one of the apps will stop working. But it's found at least two, like, real bugs, where I reported it to the app maintainer and they said, oh wow, yeah, that's broken. And, I don't know, probably 20 or 30 false positives, in the sense that the test has failed,
22:24
I've had to go in, update the screenshot, and the tests pass again. Um, but that's quite an easy process. So at the moment, I'd say it's worth the effort. We're still going to evaluate over time if it's really worth the effort. But, you know, I think it's promising; as long as we keep the test suite small and we don't
22:42
have to keep updating, like, a million screenshots every time they change the desktop background, then I think it's going to be useful. Um, one at the back first.
23:04
That's a really good idea. Yeah. So the question was testing other locales, like Chinese or German; or, I hear, Turkish is always a fun one: you put in the Turkish translation and things always break. Um, it's not an immediate plan, but it definitely would be good to do that. And another thing is testing themes. We have a high-contrast theme and a large-text theme for accessibility,
23:22
and those often also break, because they're not widely used, but they're very important. So yeah, in the distant future, we want to do that. Uh, one here. Have you thought about some process where, when maintainers of apps make visual changes (they usually know that they are making visual changes),
23:46
they can add some comment in the CI, some process by which that information can be injected into OpenQA? Yeah. So the question is whether we can get app developers to, um, notify the tests of changes somehow.
24:02
And yeah, I think the solution is to get app developers interested in actually using this. Um, so my goal is to have the developers of these apps actually finding the tests useful, um, maintaining their own small set of tests, and then, yeah, they will know at that point: okay, I just changed this, so obviously we're going to have to update the tests.
24:22
Uh, it's certainly not going to scale if it's just always me doing it. So hopefully at GUADEC, the GNOME conference, this year, I can get everyone excited about this. Uh, we have one more question over here. Say, uh, there's, like, an actual bug in an app that's only happening there, and you give them
24:46
the reproducer, like how easy is it for a developer to reproduce the environment? Okay. Yeah. Good question. So the question is how easy is it to reproduce this environment? It's actually quite easy because it's GNOME OS.
25:00
So the developer can go to os.gnome.org. They can download a VM image and run it in GNOME Boxes, and so they can boot the virtual machine, and it's exactly the same code, like, right down to the kernel and systemd and everything. So, you know, it's an effort: they have to download this image, but they get exactly the same environment, and they can reproduce exactly the bug.
25:25
Most of them don't. They just install into /usr and try to reproduce it there, but it's possible to do it this way. Um, I think from what we saw, OpenQA is focused on, like, visual testing, on comparing screenshots.
25:40
Is there a way to mix that, or perhaps only do, like, headless testing, testing things using commands from the CLI or from a console? Yeah, that's a good question. So the question is whether it can do more than just screenshot testing. Um, it can. I mean, we've got a serial console, so we can run arbitrary commands.
26:01
I think if we were going to do that, we wouldn't write the tests in Perl in OpenQA. What we'd probably do is write the tests in Python or in C, inject them into the OS image, and then just run the test program over the serial console and check that it outputs pass. So it's definitely possible. And I think that's how we would do it.
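A rough sketch of that approach, assuming a terminal is focused in the guest and the serial console is wired up (the journal logging mentioned earlier already relies on it); wait_serial is the real test API call, and the test binary path is hypothetical:

    use testapi;

    # Run a hypothetical pre-installed test program in the guest and
    # report its exit status over the serial port instead of matching
    # screenshots.
    type_string("/usr/libexec/installed-tests/example-test; "
              . "echo TEST-DONE-\$? > /dev/ttyS0\n");

    # Fail the OpenQA test unless the pass marker arrives in time.
    wait_serial('TEST-DONE-0', 60) || die 'test program did not pass';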
26:22
So how are we for time? No more time. Okay, well, thanks everyone for watching. Very much appreciate it.