Linux Kernel Functional Testing
Formal Metadata
Title: Linux Kernel Functional Testing
Number of Parts: 542
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/61883 (DOI)
Transcript: English (auto-generated)
00:05
Welcome to this session about LKFT, the Linux Kernel Functional Testing project. My name is Rémi Duraffort. I'm a principal tech lead at Linaro. I've been working on open source projects since 2007, and I've been the LAVA architect and main developer for eight years now, so quite some time.
00:26
So I will speak today about LKFT because it's a project I'm working on. So what is LKFT? The goal of LKFT is to improve the Linux kernel quality on the ARM architecture by performing regression testing and reporting on selected Linux kernel branches and the
00:44
Android Common Kernel, in real time. That's what is written on the website. So it's a project that is led by Linaro. The goal is to build and test a set of Linux kernel trees. We care mainly about the LTS trees, mainline and next.
01:00
For LTS in particular, we have a 48-hour SLA, which means that we have to provide a full report in less than 48 hours for any change on LTS. If you look at the numbers for 2023, we tested 465 RCs. As we test mainline and next, we also built and tested 2,628 different commit versions,
01:27
which means that we built 1.6 million kernels and ran 200 million tests in a year. That's only for Linux. If you look at the Android Common Kernel, just for the tests, that's 558 million tests,
01:41
580 million tests, mainly VTS and CTS. And this is all done by only three people. So the question is, how do we manage to build and test that many kernels with only three people? Obviously, automation. So my goal today is to show you the architecture of LKFT and also to show you the
02:01
different tools that we created and maintain to make that possible, because I'm sure you can go back home with some of these tools and they might be useful for you. So let's look at the architecture now. This is a really simple view. We have a set of trees in GitLab. These are just simple mirrors in GitLab of the official trees.
02:21
We just use GitLab as a scheduling mechanism. It will pull the new changes and run a GitLab CI pipeline. But we don't do anything heavy in the GitLab CI pipeline; we don't build or test inside it. It's too slow and costly. So we just use it for submitting a plan to our system that will do the build and
02:41
test and reporting. At the end, we just get a report that three engineers will look at and decide if we have to report something to the kernel developers or if we can find the faulty commit ourselves and send a patch. Let's dig in a bit now. As I said, we don't use GitLab CI for building. From GitLab CI we only submit a build request to our system.
03:03
So for building, we created a tool which is called TuxMake. I will explain the different tools later on; I'm just showing the architecture right now. So we use a tool called TuxMake that allows for building the kernel with different combinations of options. And we created a software-as-a-service that allows us to use TuxMake at a large scale
03:26
in the cloud. So we can build something like 5,000 kernels in parallel in the cloud in a few minutes. When a build is finished, so when TuxMake finishes a build, the artifacts are sent to storage; it's an S3-like bucket somewhere.
03:41
And a result is sent to SQUAD, which is a second project that we also maintain. That will be our data lake, where everything is stored. As we send results really early, if there is a build failure, a build regression, you will notice it in some minutes or hours, depending on how long the build takes. Because, for example, if you do an allmodconfig build with Clang,
04:02
it will take up to one or two hours easily. But this way, we can catch regressions early and report them immediately to the mailing list, saying that it's failing to build on this architecture with this toolchain. That's for building. I will explain TuxMake a bit later on.
04:21
So as I said, when a TuxMake build finishes, we send the result to SQUAD. We store the artifacts in the storage. And we also submit multiple test runs that will be done in the cloud. So we test in the cloud and on physical devices. For the cloud, we have a project called TuxRun that allows testing on
04:41
virtual devices, so QEMU and FVP. And the same, we have a system that allows scaling the TuxRun processes in the cloud. So you can spawn thousands of TuxRun processes in parallel in the cloud. And they will send the results to SQUAD also. Testing in virtualization is nice.
05:02
You find a lot of bugs because you can test a lot of different combinations. But that's not enough. You have to test on real devices. That's where a second piece of software comes in, which is LAVA. It allows testing on real devices. So the same, when TuxMake finishes a build, it will submit a set of test requests to LAVA that will run on real hardware in this case.
05:23
So obviously, we run fewer tests on real devices than on virtual devices, because we don't have enough boards; it's always boards that you're missing. And the same, results are sent to SQUAD. And then when everything is finished, we have a full report that we can provide to the developers. We run something like thousands of tests, thousands of builds, and
05:45
everything is working, or we found some regressions. That's the overall architecture. I will now look at the different projects so you can see if something can be useful for you. So let's look at the build part. As I said before, we use TuxMake. It's a project that we created to make building easy and
06:03
reproducible. It's an open source command-line application. It allows for portable and repeatable Linux kernel builds. For that, we use containers. We provide a set of containers with all the tools you need inside, and everything is done inside a container, so it can be reproduced from one machine to another.
06:20
Because that's often a problem when you report a build failure: it's always a nightmare to know the exact toolchain that you're using, everything. As everything is inside a container, you can just reproduce it on another machine. We support multiple toolchains: GCC from 8 to 12, Clang from 10 to 15. In fact, Clang 16 has been added this week.
06:41
We also have a Clang Android version and a Clang nightly. Clang nightly is specific because we rebuild the Clang toolchain every night and push it to our system, so we can test with the latest Clang. We also support multiple target architectures: all the ARM versions, Intel and AMD, and then some MIPS, PowerPC,
07:04
RISC-V, and some exotic ones like s390, SH4, things like that. So building is really simple. You just specify the target architecture, so x86_64 in this case. You specify the toolchain, so I want to use GCC 12. You just need to have TuxMake installed on your computer,
07:22
because everything will then be done inside a container where you will have the GCC 12 toolchain for x86_64. If you want to build with GCC 13, just change the toolchain to GCC 13, and it will use another container to build it. As I said before, we have a private software service that allows
07:40
us to run TuxMake at a large scale in the cloud, but I'm not presenting that; it's closed-source software. So just to explain how it's working: TuxMake will pull the right container for you. For this specific target-arch/toolchain couple, it will be the x86_64 GCC 12 container. We have hundreds of containers.
08:01
It will create a unique build directory, so it's reproducible from one build to another. And then we just start a Podman container, jump into it, and just build. We advise using Podman, obviously, and not Docker, because it will be a rootless container, so at least you don't run your build as root.
08:20
And then it will invoke a set of different make commands, depending on what you want to build. And then it will move everything to a specific output directory that will be kept on the machine. There you will have all the artifacts: kernel, headers, et cetera. And also a metadata.json file that will include a lot of metadata about your build, like the version of your tool
08:40
chain and of different utilities on the machine, the time taken by the different steps, the size of everything. It is really useful also for debugging what's going on if something breaks. And yeah, we provide multiple containers that you can reuse. And it's an open-source project, so you can contribute to it.
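To give an idea, a minimal sketch of the kind of invocation described above (option names as in current TuxMake releases; the kconfig and toolchain versions are just examples, check tuxmake --help for what your version supports):

    # Build the kernel source in the current directory for x86_64 with GCC 12,
    # inside a rootless Podman container that TuxMake pulls itself.
    tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-12 \
            --kconfig defconfig

    # Same build with another toolchain: only --toolchain changes, and
    # TuxMake picks a different container image.
    tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-13 \
            --kconfig defconfig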
09:01
And you can just use it right now. And some kernel developers use it for reproducing builds, build failures. In fact, as I said, we have a Clang nightly toolchain that is rebuilt nightly. That's because the Clang project asked us to do that, because they use TuxMake with Clang nightly for validating their Clang version against different
09:23
kernel versions, to see if Clang is not regressing. That's for building. So now, how do we test? As I said, we test on virtual devices with TuxRun and on physical devices with LAVA. So for TuxRun, it's the same: it's an open-source command-line application.
09:41
It's the same as TuxMake, but for running. It allows for portable and repeatable kernel tests. We support multiple devices: FVP AEMvA, which is an ARMv9.3 simulator, the latest version that you can try for ARM, and then multiple ARM versions with multiple QEMU devices,
10:05
many ARM, Intel, MIPS in many different versions, PPC, et cetera, and multiple test suites, so LTP, KUnit, kselftest, et cetera. And adding one is quite easy to do. The same, the command line is quite simple.
10:20
We also use Podman for containerizing everything. You specify the device that you want to use, and the kernel that you want; it can be a URL, obviously, and a root file system also, if you want. And again, we have a SaaS that allows running that at large scale in the cloud. When you call that command line,
10:41
TuxRun will download all the artifacts that you need, so kernel, DTB, root file system, modules. It will inject the modules inside the root file system for you, so that they will be used at boot time. And it will start the container, start qemu-system, so AArch64 in this case, look at the output, et cetera, all the classical things, and store the results.
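As a rough sketch of such a TuxRun call (the URLs are placeholders, and the available device and test suite names depend on your TuxRun version):

    # Boot the kernel on an emulated arm64 machine inside a Podman container;
    # TuxRun fetches the artifacts, injects the modules into the root file
    # system, drives qemu-system-aarch64 and collects the results.
    tuxrun --runtime podman --device qemu-arm64 \
           --kernel https://example.com/builds/arm64/Image.gz \
           --modules https://example.com/builds/arm64/modules.tar.xz \
           --tests ltp-smoke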
11:03
As I said, we provide a lot of root file systems, because we know it's painful to build your root file system for multiple architectures. So we do that work for you. We use Buildroot and Debian. Buildroot allows us to have the 19 supported architectures, one root file system for each.
11:21
And for the main ones, the ones supported by Debian, we do provide Debian root file systems that we build. And obviously, if you build your own, you can use it if you want. And we do the job of rebuilding the Buildroot and Debian file systems regularly. And in fact, it's a fun thing: we've actually found bugs in QEMU because, before pushing the new file systems,
11:44
we test our system with the new root file systems. And the last time we did that, we found issues in QEMU 7.2 that are currently being fixed by the QEMU developers. Something fun: TuxMake and TuxRun have been made by the same team.
12:01
So we did the work to combine the two tools together. Obviously, doing a bisection of a build failure is quite easy; you just need a lot of CPU time. The same for a runtime issue, which is when you find a regression where a test fails on a specific architecture.
12:21
For example, when you run an LTP test on QEMU arm64, it's failing. And you want to bisect that, so find the faulty commit. You have a good commit and a bad commit, and you want to find the faulty commit. Git bisect can help you with that. And thanks to TuxMake and TuxRun, we can automate all that testing job.
12:41
So with this command line, git bisect will call TuxMake on different commits to try to find the faulty one. And TuxMake will just build. And at the end of the build, thanks to the results-hook option, it will exec the command behind it, which will run TuxRun with the kernel that has just been built.
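A sketch of what that bisection loop can look like. The good and bad tags are placeholders, and both the exact spelling of the hook option and the assumption that the hook command runs inside the TuxMake output directory should be checked against your TuxMake and TuxRun versions:

    git bisect start
    git bisect good v6.1        # placeholder: last known good commit
    git bisect bad  v6.2-rc1    # placeholder: first known bad commit

    # For each commit that git bisect picks, build with TuxMake; when the
    # build is done, the results hook runs TuxRun on the freshly built
    # kernel, and its exit code (0 = pass, non-zero = fail) is what
    # git bisect sees.  The --kernel path assumes the hook runs in the
    # TuxMake output directory.
    git bisect run tuxmake --runtime podman --target-arch arm64 \
        --toolchain gcc-12 --kconfig defconfig \
        --results-hook 'tuxrun --device qemu-arm64 --kernel "$PWD/Image.gz" --tests ltp-smoke'

In a real bisection of a test regression you may also want a small wrapper script that returns 125 for unrelated build failures, so that git bisect skips those commits instead of marking them bad.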
13:01
So it will build with TuxMake and, at the end, run with TuxRun the exact LTP test that fails. If it's passing, it will return zero; if it's failing, it will return one. Based on that, git bisect will be able to find the faulty commit for you. We find a lot of build or test regressions and find the faulty commit thanks to just that command line,
13:21
which is really cool; thanks to Anders for the idea. So that was all virtual: builds in containers, tests on virtual devices. But as I said before, we have to test on physical devices, because multiple bugs are only found on physical devices, as they are caused by drivers failing
13:42
and things like that. So for that, we use LAVA, like some people in this room. LAVA stands for Linaro Automated Validation Architecture. It's a test execution system. It allows testing software on real hardware automatically for you. So it will automatically deploy,
14:00
boot and test your software on your hardware. It's used by KernelCI a lot, and by LKFT obviously. And you can do system-level testing, boot-level testing. You can also do bootloader testing; you can test your bootloader and firmware directly. And it currently supports 356 different device types.
14:20
From IoT to phones, Raspberry Pi-like boards and servers, so multiple different device types. For example, if you want to test on a Raspberry Pi without LAVA, you will have to power on the board, download the artifacts, so kernel, rootfs files, DTBs, place them in a specific directory,
14:40
like an NFS or TFTP directory, connect to the serial console, type a lot of commands, boot the board, watch the boot output, type at the login prompt, et cetera. So it's really painful to do that manually. LAVA will do exactly what I just listed, automatically for you.
15:01
You just provide a job definition, which is a YAML file with links to all the artifacts that you want to test. You specify the kind of board that you have, so a Raspberry Pi 4B, and LAVA will then know how to interact with that board. And you will say: I have U-Boot installed on it, and I have a TFTP server; just use that and test what I want to test on it.
15:23
And LAVA will do that automatically for you. Obviously you can have multiple boards attached to the same worker, and you can have multiple workers on a LAVA instance. So as a user, it's really an abstraction of the hardware: you just send a YAML file and you get results.
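To make that concrete, here is a heavily trimmed sketch of such a job definition and its submission with lavacli. The device-type name, URLs and deploy/boot parameters are placeholders, and a real job will need whatever extra fields and test actions your lab expects:

    # Write a minimal job definition (YAML) and submit it to the LAVA
    # server configured in your lavacli identity.
    cat > rpi4-boot-test.yaml <<'EOF'
    device_type: bcm2711-rpi-4-b        # placeholder device-type name
    job_name: lkft-style boot test
    priority: medium
    visibility: public
    timeouts:
      job: {minutes: 30}
      action: {minutes: 10}
    actions:
      - deploy:
          to: tftp
          kernel: {url: https://example.com/builds/arm64/Image}
          dtb: {url: https://example.com/builds/arm64/bcm2711-rpi-4-b.dtb}
          nfsrootfs: {url: https://example.com/rootfs/debian-arm64.tar.xz, compression: xz}
      - boot:
          method: u-boot
          commands: nfs
          prompts: ["root@"]
    EOF
    lavacli jobs submit rpi4-boot-test.yaml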
15:40
All the hardware part is handled automatically by LAVA for you. So as I said, maybe you remember the first LKFT diagram; I'm sure you don't. There was a small box called KissCache. When we submit jobs to LAVA,
16:02
we submit multiple jobs for the same artifacts at the same time. We have multiple devices, so the scheduler will start the jobs for the same artifacts all at the same time, and they will download the same artifact multiple times at the same time. So we should be able to cache that
16:20
and decrease network usage. So we tried Squid, and the short answer is that Squid is not working for that use case, for different reasons. The first one is that, as I said before, all the artifacts are stored in an S3-like bucket, so somewhere on the internet, and obviously we use SSL, HTTPS, to download them.
16:41
And Squid and HTTPS are not really working well together. You have to fake SSL certificates; it's all creepy things to do. And also, as I said, LAVA will start all the jobs at the same time, so they will more or less download the same artifacts at exactly the same time. And if you do that with Squid,
17:02
if you ask Squid n times for the same file and it's not already cached, Squid will download it n times. And only when one download is finished will the next request use the cached version. So it's just pointless for us, just not working. So we created a tool called KissCache.
17:22
So KISS is for "keep it simple, stupid". It's a simple and stupid caching server. It's not a proxy, it's a server, which means that it can handle HTTPS, it will only download once when you have multiple clients, and it will stream to the clients while downloading. It's not transparent, because it's not a proxy.
17:41
And because it's not transparent, it can do HTTPS, because you will have to prefix your URL with the KissCache instance that you have, and you will talk to KissCache directly. It also automatically retries on failures, because we found multiple failures; all the HTTP codes that you can get when you request from an S3-like bucket, it's just insane.
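In practice that prefixing looks something like this (the fetch endpoint path follows the KissCache README, and both host names are placeholders):

    # Ask the local KissCache instance to fetch and cache the upstream URL;
    # concurrent clients asking for the same URL share a single download.
    curl -o Image.gz \
      "https://kisscache.example.com/api/v1/fetch/?url=https://storage.example.com/arm64/Image.gz"
    # If the upstream URL itself contains query parameters, percent-encode
    # it before passing it as the url= argument.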
18:03
And sometimes also the connection will finish as if everything was done correctly, and in fact the file is not complete: it's a partial download and you don't get any errors. So KissCache will detect that for you. It will detect that it's a partial download and it will retry and download only the remaining part for you. And it's fully transparent for the user.
18:21
It will do that in the background and still stream your data to you. We've been using it for 2.5 years now. In the graph, green is what we serve locally from KissCache and red is what we download from the internet. So we downloaded 25 terabytes of data from the internet
18:40
and we served 1.3 petabytes of data on the local network, which is a 52 times expansion ratio. So it's quite useful and it also improves stability. It's a good tool for your CI if you don't use it already. And last but not least,
19:00
we store all the job results in SQUAD, which is the Software Quality Dashboard. It's a data lake; it will store all the results for you in different categories and it will allow you to create reports, so failures, regressions, et cetera. Everything is stored in this one,
19:20
and then we extract data and make reports based on SQUAD. And that's all, that's what I just explained. If you have any questions, I have some time for questions. Five minutes, perfect.
19:43
Oh yeah, that's a good question. Testing methods, which ones: we use LTP, KUnit, kselftest, all the kernel test suites;
20:04
we are not creating new test suites. We are using test suites that already exist, and we build for the community and test for the community, and then we provide reports. We obviously interact a lot with the test suite maintainers,
20:21
because we find bugs in the test suites too, and we have to report to them; we report a lot to them. And one of our projects is to test kselftest in advance, to test kselftest master, to find bugs in kselftest before they are actually run in production later.
20:41
If you find any problems, you then report them. Are the kernel developers actually looking into them, or do you have to ping them and make sure they take care of the problem?
21:00
Okay, so we have an SLA with Greg Kroah-Hartman, so he's waiting for our results. So they will look at it for LTS. And for mainline and next, we are used to reporting; we report a lot of issues, so they know us. If you look at the LWN articles,
21:21
they classify the different contributions to the kernel, and Linaro is at the top of the Tested-by list. So they know us a lot, and they know that we provide good results. And when we send a mail, there is everything they need, every tool they need, for reproducing a build.
21:40
So we provide all the binaries that they need for reproducing it. If it's a build failure, we provide the TuxMake command line that you can use, and they are now used to using TuxMake for rebuilding things. And if it's a test failure, we provide the logs, obviously, as well as the job definition and all the binaries they need for reproducing it.
22:01
Do you actually check that every problem you found is actually fixed? Are all the bugs that we found fixed? No, not all of them. Of course, there are some bugs they don't care about. If you find some bugs on SH4,
22:22
no one will care, for example. Like QEMU 7.2 that has been released recently: it's just not working on SH4.
22:43
I cannot answer that. We use AWS. No, it's not AWS, no, no. We have, yeah, we built a dynamic system, which means that we do not rent 5,000 machines full time. Obviously not; it's just impossible for us.
23:00
We are a small company. Everything is dynamic. So from one second to another, if you look at the usage graph, when Anders submits a plan for testing, in one minute we will book 5,000 machines for building it. Well, more likely 1,500 machines to build it. They will build and they will just stop at the end.
23:27
So no, we don't have 5,000 machines.
23:43
So for LKFT, we have multiple LAVA instances in Linaro. In LKFT, how many devices? I don't know, about 20. Yeah, and about five different device types: Raspberry Pis, Dragonboards, Junos, X8, X15.
24:02
Yeah, X15, yeah. But you can have really large labs in LAVA. We have another one just for Linaro usage, where we have something like 100 boards, I think, the main one. Yeah.
24:20
Thanks.