Open Source Firmware Testing at Facebook
Formal Metadata

Title: Open Source Firmware Testing at Facebook
Number of Parts: 490
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47377 (DOI)
Transcript (English, auto-generated)
00:05
Hi, everyone. I'm here with my colleague Marco today to talk about how we are doing open source firmware testing. Here's what we are going to talk about today. I will start with the problem statement and clarify why we need firmware testing at all.
00:22
What do we want from a firmware testing system? Then Marco is going to go into detail about the architecture of what we are actually using, and some details about how we use interfaces and plugins in this testing system. Starting from the problem statement,
00:41
it's not news that we run open source firmware in our data center infrastructure, and we've been talking extensively about it in the past. Even at FOSDEM, there was a talk about the Facebook Minipack platform, which runs Open System Firmware,
01:01
and all the development happens upstream, so we are using coreboot and LinuxBoot for the firmware that powers this platform. Since it is coreboot and LinuxBoot, everything happens upstream on GitHub and Gerrit. This means that we are using
01:21
a very common workflow for our development process, as you can see here. We go through the development phase, we add new features, we make a fix or whatever change, and then we build and test it. It's very similar to any software development process. Then we go through integration and end-to-end tests,
01:42
and when we feel comfortable enough that our change is good, we submit it for code review. A human colleague or somebody outside in the community will look at the code, will validate whether it's good or not, and will give us a green light or a red light on whether we can continue or not. Once that is done, we release the project,
02:02
we merge it on GitHub or on Gerrit, and then we run it and eventually debug it and find the issues in production, restart the development cycle, etc. So this is how it would basically work out there. Today's talk is focused on the integration and end-to-end testing side.
02:23
I'm not sure if you can read this slide, but I will read all the boxes anyway. Since we have external code repositories with GitHub and Gerrit, et cetera, but we also have internal repositories and machines, we have to split the development process in two parts.
02:42
The top part is the external process, where we go through the usual development: building, unit tests, integration tests, code review, and merging. But then we also have another part inside. After all of this is done, we import the code into our internal repositories. We again build and run unit tests. We may have some additional component that is internal only
03:02
because it wouldn't work with outside infrastructure, not because there is anything confidential or secret. Then we run integration and end-to-end tests, again, against our infrastructure, so we can test on the actual hardware that we run in production. Eventually, we will do a release candidate.
03:20
We will deploy on a canary, which means on a subset of machines where we run some tests, we validate that everything is good on a smaller scale, and then eventually we'll go into mass production. Of course, after releasing, we will do debugging of issues in case something arises. Now, given that this development process is pretty simple,
03:43
and you're probably all familiar with this kind of process, even if it's split into internal and external, let's talk about why we need testing of firmware at all. It's pretty obvious with software, but usually with firmware, we just take whatever the vendors give us, we just flash it,
04:02
update it, and hope for the best. Hope is not a strategy. So let's say that hardware is hard and firmware is not easy at all either. When we are talking about firmware, bugs can be really nasty in the sense that if they break a device, which means if your device is not capable
04:23
of running in production and taking traffic, you are basically running at reduced capacity. With large numbers, this can be very impactful to your business. Rolling out firmware can also take much longer. For example, if you're running a web service or anything that is software only,
04:40
you can usually update in minutes. Worst case, it could be hours. With firmware, it takes much longer than that because you may have to reboot the machine. In most cases, you have to. So the problems arising from bad firmware can take a much bigger toll on our operations.
05:03
Additionally, firmware is very critical to a machine because it influences features and performance. So this is a very delicate part of everybody's infrastructure. So it's pretty clear that we need firmware testing and we cannot just update and hope for the best.
05:23
Now, I hope at this point we agree that we need to test firmware before it goes to production, and that we need a system to do that. Let's see what requirements we have in terms of how we test firmware, and hopefully this will be very similar to everybody's requirements.
05:43
The firmware testing system that we want, first of all, has to be open source, because the firmware that we run is open source. We strongly believe in the open source community. Everything has been developed upstream and we want the testing system to be open source as well, and community first.
06:02
It doesn't have to be some company's project that requires some employee to approve the code. The owners will be from the community, not from a company. It has to work with coreboot and LinuxBoot, which are our primary use case, but of course not the only one.
06:22
We wanted the testing system to be robust. When you work with many systems, if you're hit by stupid mistakes, they can slow you down a lot. So we don't want the testing system itself to fail, just like we don't want machines to fail in production as much as possible.
06:42
We desire the same robustness from the testing system itself. Because if you've been testing for 12 hours on a machine, you don't want to find out that your testing configuration was wrong, that your credentials were not accessible, or that there was some other trivial error. We want to detect errors way before the job even starts.
07:02
This testing system has to be generic as well, because it doesn't have to run only on our infrastructure; it has to run on everybody's infrastructure. In fact, we have been developing this on very public infrastructure, on very accessible hardware like a Raspberry Pi.
07:22
It has to be scalable because, of course, we also have to run it in our infrastructure, so we need to run it at much larger scale. Another important requirement for us was human friendliness. We don't want the system to be hard to use; we want the system to be human oriented. So it has to be simple by design.
07:41
We need the system to be easy to reason about; it has to be obvious, anytime you're doing something, what it is doing and how. There should be no magic at all. We wanted this to be flexible in terms of assembling the system from multiple independent components, just like bricks that you can put together and they will work together.
08:03
And it must be easy to set up and maintain, because not everybody has to be an expert in order to maintain a firmware testing system. And as I said, human friendly, we don't want people to have to be developers in order to use this system.
08:21
So we are focusing on using configuration, not code, to define your jobs. If you're a developer, of course you're welcome to develop components, and that will be very useful. But mostly, we want this to be something we give to the community in a way that they don't have to understand how the system works internally in order to use it.
08:44
So of course, writing a new system is rarely a good idea. So we did our due diligence and looked into existing systems to see whether they were matching all the requirements we had. We were not satisfied with the status of the internal and external systems that we observed.
09:03
They were all good in a sense, but they were not checking all the boxes that we just discussed. The main reasons were that they were either too complex to maintain or to use, more complex than what we were designing. They were possibly only capable of running stuff
09:23
on the device under test, and they were not capable of using external orchestrators, for example even just going through SSH. And in general, they were not generic enough; they had too narrowly scoped functionality. Eventually, we decided to write our own and make it available to the community.
09:41
And we call it ConTest, which stands for Continuous Testing. In its public version, it's a single binary with a SQL database, automated through a Docker container. So the setup is very easy: you just have to run one command. It will build one binary and connect to one database with a relatively simple schema.
10:01
And it's written in pure Go, because u-root, which is one of the main components of LinuxBoot, is written in Go. We are happy with the language and familiar with it, so we decided to go with the same language, which gives us a lot of advantages in terms of memory safety, compile-time checks, et cetera. So we can actually check a lot of things before we even run the system.
10:24
And being generic, it can do a lot more than just firmware testing, even if firmware testing is the primary goal. It's available on GitHub under facebookincubator/contest. And very quickly, on what ConTest can do and cannot do: it's capable of running independent execution
10:43
flows on the targets that we are running on, like the device under test. We can have many of them, and they can all flow independently through their testing; some of them may be faster or slower. It can talk to external services. You can use jump hosts, for example via SSH, to execute operations on the target through something else.
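To make the jump-host idea concrete, here is a minimal sketch, assuming a hypothetical lab setup: it runs a single command on a device under test by tunnelling SSH through a jump host with golang.org/x/crypto/ssh. The host names, user, key path, and the command itself are illustrative and not part of ConTest.

```go
// Sketch only: reach a DUT through an SSH jump host and run one command.
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/crypto/ssh"
)

func main() {
	key, err := os.ReadFile("/home/tester/.ssh/id_ed25519") // hypothetical key path
	if err != nil {
		log.Fatal(err)
	}
	signer, err := ssh.ParsePrivateKey(key)
	if err != nil {
		log.Fatal(err)
	}
	cfg := &ssh.ClientConfig{
		User:            "tester",
		Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
		HostKeyCallback: ssh.InsecureIgnoreHostKey(), // acceptable only in a lab sketch
	}

	// First hop: the jump host that can reach the lab network.
	jump, err := ssh.Dial("tcp", "jumphost.example.com:22", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer jump.Close()

	// Second hop: tunnel a TCP connection to the DUT through the jump host.
	conn, err := jump.Dial("tcp", "dut-01.lab.example.com:22")
	if err != nil {
		log.Fatal(err)
	}
	ncc, chans, reqs, err := ssh.NewClientConn(conn, "dut-01.lab.example.com:22", cfg)
	if err != nil {
		log.Fatal(err)
	}
	dut := ssh.NewClient(ncc, chans, reqs)
	defer dut.Close()

	// Run a command on the DUT, e.g. read a firmware version string.
	sess, err := dut.NewSession()
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()
	out, err := sess.CombinedOutput("cat /sys/class/dmi/id/bios_version")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("DUT firmware version: %s", out)
}
```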
11:02
It can run microbenchmarks. So, for example, you can measure how much time a machine takes to boot, and whether it's improving over time. You can measure and benchmark pretty much every micro operation that is executed. And it can also report back custom results, in case your interest is about regressions, about success rate,
11:21
or whatever you are interested in, or historical data; you can do that. And as I said, it can operate from a Raspberry Pi up to worldwide, data-center-scale infrastructure. What it cannot do is run background processing. We intentionally want to keep every execution flow scoped.
11:42
So every step is independent, and there is no follow-up from one step to the other. It's possible to do that, but we don't recommend it. There is no dependency tree, so executing something does not implicitly trigger something else. You have to be very explicit about every single operation you execute, so there are no surprises.
12:01
And we intentionally didn't want any complex control flow. Either a target goes to the next step, or it doesn't. If you need more complex logic, you can implement your own plugins. I can skip this slide, because it doesn't really say anything useful. And I want to hand off to Marco, who will talk about the architecture.
12:28
So we'll go through the architecture of the system so that if you decide that you need to implement plugins, you have an idea of what the lifecycle of the job looks like. First, Andrea mentioned several times that the framework is meant to be pluggable.
12:42
So you will see in the diagram blocks that are light colored, and those are the ones that are supposed to be swapped at runtime, if necessary. And you will also see dark blocks, and those are part of the core framework and are static for the whole job duration.
13:00
So the entry point into ConTest is a job request, which comes with an attached JSON description that basically describes which plugins to load at runtime and which parameters to pass to these plugins. The submission is targeted to a listener plugin that can implement any interface that
13:22
might be relevant for your infrastructure, so HTTP or RPC, whatever. Then there's a translation layer in between the listener and the first core component, which is the API. The job manager then takes charge of the job request. The job manager is basically responsible
13:41
for orchestrating the job through its lifecycle. Every single component in ConTest has access to a storage layer, and this is roughly divided into two parts: the job API, which is used mostly by the job manager to persist changes in the status of the job, and the events API.
14:01
This is an important concept in ConTest that enables some of the use cases that Andrea explained. An event is basically a text payload attached to a name that you can fetch later, either at the end of the job or even offline for post-processing. You can fetch and aggregate multiple events
14:21
to calculate metrics or statistics on how the job performed. The first step of the job is the acquisition of the targets, which are the devices on which the test will run. And on top of this, there's also the concept of locking, so that multiple workers do not race on the same devices. Then we'll fetch a test definition.
14:42
So I mentioned a descriptor at the beginning. That will not include a description of the testing pipeline. Instead, it will include a pointer to another description that can live in a different back end. So you can implement custom logic to go fetch this further description.
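As an illustration of this two-level descriptor idea, here is a minimal sketch of what such a job descriptor could look like, embedded in a small Go program that parses it. The JSON field names and plugin names are hypothetical and do not come from the actual ConTest schema; the point is only that the job names its plugins and parameters, and points at a separately fetched test definition.

```go
// Illustrative only: a hypothetical job descriptor in the spirit described
// above (plugins to load plus their parameters, and a pointer to where the
// test definition lives). The field names do not come from ConTest.
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

const jobDescriptor = `{
  "JobName": "firmware-boot-check",
  "Reporter": {"Name": "noop"},
  "TargetManager": {
    "Name": "csvfile",
    "Parameters": {"FileURI": "targets.csv"}
  },
  "TestFetcher": {
    "Name": "uri",
    "Parameters": {"URI": "https://example.com/tests/boot-check.json"}
  }
}`

// jobRequest mirrors the hypothetical descriptor above.
type jobRequest struct {
	JobName       string
	Reporter      plugin
	TargetManager plugin
	TestFetcher   plugin
}

type plugin struct {
	Name       string
	Parameters map[string]string
}

func main() {
	var req jobRequest
	// Reject malformed descriptors before any job is started,
	// in line with the "fail early" requirement discussed earlier.
	if err := json.Unmarshal([]byte(jobDescriptor), &req); err != nil {
		log.Fatalf("invalid job descriptor: %v", err)
	}
	fmt.Printf("job %q would fetch its pipeline from %s\n",
		req.JobName, req.TestFetcher.Parameters["URI"])
}
```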
15:03
And following that, we will instantiate the test pipeline. So we will lay down the steps, and we will orchestrate the targets through each single step. The final stage is the reporting one. And this is where you will go back and fetch all the events that have been emitted, and you will aggregate metrics and produce
15:22
a JSON report that is persisted and that you can consume later for offline processing. Now, as a developer, where you see most of the variety is in the test steps, because this is where the actual logic to implement a test resides.
15:42
So it's probably worth zooming in and understanding what is required to implement a step. Each step is associated with a control block, which is a core framework component that orchestrates the targets and has lots of responsibility in terms
16:01
of enforcing a correct behavior of the plugin. First and foremost, the ingress and egress happen via channels. So it enforces that the step acquires targets within a certain timeout and returns all the targets that have been acquired. Then the step has access to events
16:22
and can emit events for reporting. And basically, you will see this architecture replicated for each stage of your testing pipeline. As I mentioned, the responsibilities of the control block are to enforce that the API we ask plugins to implement is respected.
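To illustrate this channel-based contract, here is a minimal sketch of a test step; the type and method names (Target, EventEmitter, Run) are hypothetical and not the real ConTest TestStep interface. The step takes targets from an input channel, emits an event per target, forwards every target to the output channel, and returns promptly on cancellation.

```go
// Illustrative sketch of a channel-based test step, not the actual ConTest
// interface: targets come in on one channel, go out on another once the
// step is done with them, and events are emitted along the way.
package step

import (
	"context"
	"fmt"
)

// Target identifies a device under test.
type Target struct {
	ID   string
	Host string
}

// EventEmitter is a hypothetical stand-in for the events API.
type EventEmitter interface {
	Emit(name string, target *Target, payload string) error
}

// RebootCheck is a hypothetical step that processes each incoming target
// and forwards it to the next step via the out channel.
type RebootCheck struct{}

func (s *RebootCheck) Run(ctx context.Context, in <-chan *Target, out chan<- *Target, ev EventEmitter) error {
	for {
		select {
		case <-ctx.Done():
			// Cancellation from the framework: stop promptly, as required.
			return ctx.Err()
		case t, ok := <-in:
			if !ok {
				// No more targets: the step's contract is fulfilled.
				return nil
			}
			// ... perform the actual check against t.Host here ...
			if err := ev.Emit("RebootCheckDone", t, "ok"); err != nil {
				return fmt.Errorf("emitting event for %s: %w", t.ID, err)
			}
			// Every target that came in must also go out.
			select {
			case out <- t:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
	}
}
```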
16:44
So it will, first and foremost, record when a target goes in and comes out together with timestamps. It will track that everything that goes in also comes out. And it also enforces other requirements.
17:01
For example, the step needs to accept targets within a certain timeout, and if it goes unresponsive, it's flagged as misbehaving and the whole test is terminated. Now, this is possible because we define clear interfaces that need to be implemented, and we clearly state which requirements
17:22
we want those interfaces to meet. The test step type is the most relevant example. So you have requirements in terms of how the channels should be managed, return values, timeouts. And the idea is that each interface should be, first
17:40
and foremost, simple to implement, but also simple to enforce on the framework side. Interfaces are also designed so that we can validate as much as possible at submission time. Ideally, everything should be validated at compile time, and you have no surprises at runtime.
18:01
But I mentioned that the job descriptor contains parameters for each plugin. So you might have parameters that actually cannot be consumed because there's a semantic or even a syntactic error. We want each plugin to define rules and give us a way to say: yes, I would be able to consume those parameters, or no, this doesn't make sense for me.
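Here is a minimal sketch of what such per-plugin validation could look like in Go; the ParameterValidator interface and the uriFetcher plugin are hypothetical, not the ConTest API. The idea is simply that each plugin can veto a job descriptor before anything is scheduled.

```go
// Illustrative sketch of per-plugin parameter validation, in the spirit of
// what is described above; names are hypothetical, not the ConTest API.
package validate

import (
	"errors"
	"net/url"
)

// Params is the raw parameter map a plugin receives from the job descriptor.
type Params map[string]string

// ParameterValidator is what every plugin would implement so that a bad job
// can be rejected at submission time, before anything runs.
type ParameterValidator interface {
	ValidateParameters(p Params) error
}

// uriFetcher is a hypothetical test fetcher that needs a valid "URI" parameter.
type uriFetcher struct{}

// Compile-time check that uriFetcher satisfies the interface.
var _ ParameterValidator = uriFetcher{}

func (uriFetcher) ValidateParameters(p Params) error {
	raw, ok := p["URI"]
	if !ok || raw == "" {
		return errors.New("uri fetcher: missing required parameter URI")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return errors.New("uri fetcher: URI is not a valid URL")
	}
	if u.Scheme != "https" && u.Scheme != "file" {
		return errors.New("uri fetcher: only https:// and file:// URIs are supported")
	}
	return nil
}
```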
18:23
And if the outcome is negative, we will reject the job straight away. With the interface approach, what we can do is then make sure that our logic that enforces the requirement is not subject to regressions.
18:42
So we make changes to the framework, and we want to make sure that the enforcement is still correct. So we have developed a set of integration tests that operate on plugins that we have written to deliberately contain bugs: they
19:03
either do not respect timeouts, or panic, or return values that do not make sense for what they should be doing. What the framework has to do is detect this misbehavior and flag it properly.
19:21
Most of the time, the reaction of the framework will be to trigger a cancellation signal so that the whole job is interrupted. And this is basically what you are requested to do if you want to implement a plugin to be used by ConTest.
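As a sketch of that kind of integration test, here is a hypothetical example (not the ConTest test suite): a deliberately buggy step that swallows its targets, and a small test harness that asserts the misbehavior is caught within a timeout, which is where a real framework would trigger the cancellation.

```go
// Illustrative sketch: detect a step that never forwards its targets.
// The harness and step here are hypothetical.
package harness

import (
	"context"
	"testing"
	"time"
)

type target struct{ ID string }

// stuckStep is the "buggy plugin": it consumes targets but never returns them.
func stuckStep(ctx context.Context, in <-chan *target, out chan<- *target) error {
	for range in {
		// Bug: the target is swallowed and never sent to out.
	}
	return nil
}

func TestStuckStepIsDetected(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	in := make(chan *target, 1)
	out := make(chan *target, 1)
	in <- &target{ID: "dut-01"}
	close(in)

	go stuckStep(ctx, in, out)

	// The harness expects every ingressed target to egress before the deadline.
	select {
	case <-out:
		t.Fatal("buggy step unexpectedly returned the target")
	case <-ctx.Done():
		// Misbehavior detected: this is where the real framework would
		// trigger a cancellation signal and interrupt the whole job.
	}
}
```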
19:41
After this overview, we are done. So if you have any questions, we are happy to take them.
20:01
There are a couple of questions. Yeah, so I would think this contributes to even more fragmentation of validation frameworks, because there are already so many frameworks for validation, and they lack the foundation and the support to unify things so that they are able to automate the testing
20:22
and feed results into other systems that are out there. OK, so the question is: aren't we contributing to more fragmentation of testing frameworks? I'm not entirely sure what you mean exactly.
20:42
I mean, how to say? We checked what the existing frameworks were, and we realized that we were not able to find something that was matching all the requirements. We also looked for systems that we could improve rather than rewriting, et cetera. And we actually approached and tried
21:02
to collaborate with other projects, but there were different reasons why we were not able to improve them either, because, sorry, they were too difficult to adapt to make them work in an environment like ours while at the same time staying generic,
21:21
because they were either too focused or something else. So I agree that this adds one more system, but the reason why we published it is that we were trying to make something generic enough to avoid having narrowly focused systems that would only do one kind of thing
21:42
and not be broad enough to cover many use cases. One of the things that we are also trying to do is to integrate this with the coreboot project, so that every commit on Gerrit would automatically trigger a test and report back. And looking at existing systems, it was difficult to make them work with every possible piece of hardware
22:05
that we could find in the wild, like a person's laptop at home, the machines in the data center, and so on. So I agree that this is adding a new system, but we are hoping that it would actually bring more consistency with adoption.
22:20
And it's open source, so at some point, if this is not adding any value, the best one will survive. You could say that. And the requirements for the framework were developed in the open, so there were multiple companies contributing and saying, we have the same needs, and this
22:42
is how we can meet those needs. Do you, by chance, have an overview of the automated runners that are available, like GitLab CI runners, et cetera, that you've already integrated this in,
23:00
or that others have integrated it in, and you have failure and success stories, like don't do that, GitLab totally fails and breaks? So the question is whether we have already integrated this system into other CI workflows. We are in the process of doing so. We have some integration with GitLab at the moment. Yeah, exactly.
23:21
Andrea specifically is working on the external side, so GitLab. I am mostly focused on the internal side. So for examples out in the open, I think it's still a bit too early. We are finalizing the GitLab integration.
23:55
All right, so the main issue that we found, or actually that was reported to us by people using Linaro.
24:02
Yes, sorry, what's the comparison with Linaro's LAVA? So we don't use it internally, but when we did the survey and talked with other companies that use it at some scale, their main issue was about the maintenance, the usability, the possibility to do migrations, et cetera.
24:23
So this is not first-hand experience with LAVA. But when we laid down the requirements, they were pretty unhappy with, let's say, the suboptimal usability and the ability to maintain and migrate the system.
24:40
But again, this is not first-hand experience. All right, thank you very much. Thank you. Thank you.