Testing microcontroller firmware with Python
Formal Metadata
Number of Parts | 160
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/33667 (DOI)
EuroPython 2017 | 155 / 160
Transcript: English (auto-generated)
00:08
Hello, I'm Alexander, and I work as an embedded software developer, so I write firmware for microcontrollers. Today I'm going to show you how you can test such firmware much faster by not running it on the real device but integrating it with Python.
00:23
I already gave a talk in a similar direction last year. It was called "Writing unit tests for C code in Python", where I used the CFFI library to extract single functions or single modules from your C source code and build a Python module out of it,
00:40
so that you could load it into your Python process and then use all of Python's power to write unit tests. I'm building on this idea today. You might have already seen this hierarchy of tests in another context. With the unit tests last year we were at the bottom level, where we only look at individual modules and individual functions and try to test them.
01:02
Of course it's also important to test your code in integration with all the other parts, so this year we're going to move up a layer into the integration tests and try to make sure that basically all of our firmware really works. The motivation for that is that the firmware that we write is rather complex,
01:25
so in the end we might have half a megabyte of compiled code on our microcontroller. For this we've written thousands of test cases, and when we run those against the real device it takes several hours for all those tests to complete. As a developer that's not really what I want,
01:42
because when I make a change to the software I need to know fast whether this change is good or bad. Maybe a quarter of an hour is the upper limit for me; I don't want to wait hours before I can tell. So what we did in the past was to select a subset of those test cases that tried to cover as much as possible,
02:00
but of course you can't guarantee that it really catches every corner case that you have in your code base, so there might still be errors that slip through. This is what I want to avoid, and it is how this project started. I'm first going to show you the basic concept and then give you a complete demonstration
02:21
based on a firmware example, to really show you the code that does all that. If you look at your typical microcontroller application, it might look something like this. You've got a large application code base that's pretty standard C code that you could compile for any architecture, but of course you've also got hardware-specific parts,
02:41
and if you've structured your firmware in some way you might have a hardware abstraction layer that really interfaces with the hardware and provides a nice and clean C interface to your application. This is what we base this approach on, because we want to make it look like this: we keep the application code and just replace the abstraction layer beneath it
03:02
with some Python code. The approach for this will be similar to what I showed last year with the CFFI library. But first, when we are talking in the context of microcontroller firmware that's already written in C, you might wonder why we use Python at all:
03:21
we could just replace this hardware abstraction layer with a different hardware abstraction layer for another, faster machine and just use C for that. Why Python? Now, we are at a Python conference here, so I don't need to tell you much about the general advantages that Python has over other languages. When you compare it with C code,
03:41
then you can easily see that you need to write less code to achieve the same results. And it's also usually easier to use. For example, our microcontrollers have cryptographic functionality built into the hardware, so we have, for example, an AES peripheral in there where we can just pass in some data;
04:00
it does the AES encryption in hardware and returns the result, so this is something that we have to re-implement in our Python code for this to work. There are libraries in C where you can do that and there are libraries in Python where you can do that, but the Python ones are usually easier to use and easier to get around with. And in the end Python is also very powerful for this approach.
04:21
The hardware abstraction layer might contain functions that, for the simulation that we're going to build here, can work similarly and don't need different implementations, so you can just use a single template in Python and let Python generate the code for all the functions that you need. You don't need to specify each and every function in your C code just for the
04:41
program to compile. And this is now what I'm going to show you. The general approach is that we'll collect all the application C source code, so all the implementation of the application, and we'll collect all the header files of this hardware abstraction layer, so everything that specifies the interface of the hardware abstraction layer,
05:00
and when we've got both of those parts we can pass them on to CFFI. CFFI will use this information to generate a Python-loadable module that we can then run from our Python interpreter, and then we have our application running inside a Python process on a normal machine, not on our microcontroller,
05:20
and since the normal machine is much faster than the microcontroller, hopefully our application will also be much faster and our tests can execute faster. As an example, I unfortunately cannot show you our real code, so I looked for a different project, and I chose the MicroPython project because it's also very complex
05:41
and has a lot of code, so you really get an impression of a real-life application of this approach and not some artificial example that I just constructed for this talk. You might have already heard about the MicroPython project; if not, a quick explanation is that it's a reimplementation of the Python programming language
06:02
that can run directly on a microcontroller. It started several years ago, also with a hardware device; maybe you've seen it in a previous talk in this room, where you've got a little board with a small controller on it. It has a lot of hardware peripherals that you can access more or less directly from your Python code, and it has
06:21
basically full compatibility with standard CPython 3.5, so they don't provide all the features, but most that you want to use. First we'll have a look at the structure of the source code. All the source code is open source; you can find it on GitHub, and if you look at the repository you'll find a structure that looks like this.
06:42
There are some files containing documentation and then a lot of folders, and many of those folders contain the code that is specific to one MicroPython port. MicroPython already supports not only a single platform but multiple platforms; there are, for example, ports even for Windows and for Unix systems,
07:03
but the initial port was this one here, the ST port for an ST-based microcontroller. In other folders, for example the py folder, there's the generic code that can run in every port, so the py folder contains the Python interpreter,
07:20
for example. For this example I will choose the minimal port. It is similar to the ST port but very stripped down in functionality; it just contains the bare essentials. It gives you a Python shell that can run code, but it doesn't give you any further hardware access. For this demonstration that should
07:42
be sufficient. If we look at this minimal port, these are all the files that are contained in there. You see only two C files: the main C file contains the basic application startup code that initializes everything, and you see this uart_core file at the end; this is the implementation of the
08:02
hardware abstraction layer for this project. It contains some functions for input and some functions for output, so that we can provide the Python shell. This is what the relevant functions from this file look like: you've got one function that reads a single character of input and does something with that, and you've got another function
08:22
that can print strings to standard output. In the case of this minimal port, if you really run it on the pyboard, it just uses UART communication for that, so you see some accesses to the UART registers in this code. If we tried to compile this file for our normal machine, this wouldn't work because there are no such registers
08:42
that you could write to. So these are the functions that we want to replace with Python code so that we can execute them. All the rest of the code that is contained in the minimal port, and also what's imported from the py folder, should run on our architecture without any problems.
09:01
Then there's another project that I need to talk about quickly, and that's called PyMake. It's a re-implementation of the make utility, and I want to use it in this demonstration to parse the makefiles that MicroPython uses for its build process, because for this approach to work we need to know which source code
09:21
files to integrate into our binary: where do we find the header files, and where do we find the source code files? Of course I could just hard-code that in this example, but if you wanted to use this productively it makes more sense to keep this information in one place, and the place that was already chosen here is the makefile. So I just want to parse the makefile and extract the
09:41
relevant information from there, so that I can still keep all the information in this one place and don't have to adapt many places just for this whole process to work. PyMake gives me such a makefile parser in Python, so I'll build on that. When we look at the MicroPython makefile, one bit of interesting information in there is
10:01
the compiler options, for example for the include directories. It just builds a list of those here, where it specifies some directories where we can find the include files, the header files. In order to extract that using PyMake, I can tell PyMake just to parse the makefile
10:21
that I have without really executing it; it just parses all the data structures. Afterwards I can ask: give me the contents of this variable INC, where the include directories are contained. What I get back is not a string but an object, the representation
10:41
you can see here. It's actually not bad to get back an object like this and not the raw contents, because if you look at the beginning there is this value here that contains a reference to another variable. So I'm not interested in the raw string value; I need
11:01
to have this value resolved to its actual value in order for this process to work. This is what can be done with the expansion object that the last call here returns: there's a method on it that resolves the string, and this then returns the final string value that I'm interested in. So I can just hide
11:21
all this code in a simple function that I can use to resolve variables. Now, looking at the cleaned-up example, we can just call this function and get back the strings that were declared in the makefile. Everything seems to work, so we store this value in a variable
11:40
for later use and now start with the real process, collecting the source code. For collecting the source code, we'll just change into the MicroPython minimal port directory, so all paths are relative to this directory, and again look at the makefile. There's a variable called SRC_C that lists all
12:01
the source code files. At the beginning you see two that I've already shown to you, the main file and the uart_core file, and then there are some references to other files in the lib directory, again a directory that's shared by multiple ports. So we can just extract this list of source files, again using
12:22
the function that we've already created, and again you can see that the last variable contains a reference, which is resolved to its actual value. Now, if we want to create a list of source files, we can again use the function and convert the result into a set.
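To make the reference resolution described above concrete, here is a minimal sketch in plain Python. It deliberately does not use PyMake's API: the `resolve` helper, the regular expression, and the variable contents are assumptions for illustration, and only simple `$(NAME)` references are handled.

```python
import re

# Hypothetical, simplified makefile variables; the real file defines more.
VARIABLES = {
    "BUILD": "build",
    "INC": "-I. -I.. -I$(BUILD)",
}

_REFERENCE = re.compile(r"\$\((\w+)\)")


def resolve(name, variables):
    """Return the value of `name` with all $(VAR) references expanded."""
    def expand(match):
        # Recurse so that references inside referenced variables resolve too.
        return resolve(match.group(1), variables)
    return _REFERENCE.sub(expand, variables[name])


include_options = resolve("INC", VARIABLES).split()
print(include_options)
```

A real makefile parser also has to deal with functions, conditionals and includes, which is presumably why the talk relies on PyMake instead of a regular expression.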
12:41
Then we need another variable from the makefile that I haven't shown you so far. It contains a list of all the source code from the py folder, not as C files but as object files, so we just extract the names so that they match the file system locations that we're interested in and add them to the set.
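The set building described above could look roughly like the following sketch. The two variable values, the `build` prefix, and the `..` location of the shared sources are hypothetical stand-ins, not the actual contents of the MicroPython makefile.

```python
# Hypothetical resolved values of two makefile variables (shortened).
SRC_C = "main.c uart_core.c lib/utils/stdout_helpers.c"
PY_O = "build/py/gc.o build/py/obj.o"


def object_to_source(object_path, build_dir="build", top_dir=".."):
    """Map an object file below the build directory to its C source file."""
    prefix = build_dir + "/"
    if not (object_path.startswith(prefix) and object_path.endswith(".o")):
        raise ValueError("unexpected object path: " + object_path)
    # Strip the build prefix and the ".o" suffix, then point at the source.
    return top_dir + "/" + object_path[len(prefix):-2] + ".c"


source_files = set(SRC_C.split())
source_files |= {object_to_source(path) for path in PY_O.split()}
print(sorted(source_files))
```

Using a set keeps the collection free of duplicates when several variables name the same file.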
13:01
In the end there's one source file that we have to remove again: that's the uart_core file that I showed you in the beginning, because this isn't really source code of the application; it's the source code of the hardware abstraction layer. We don't need that now, so we remove it here. And then there's one more thing
13:21
that's special about MicroPython here. If you look again at the paths that are contained in here, the last one refers to a directory called build, and if you try to find that in the source code you won't find it on GitHub, because it's not contained in any of the commits. It's just a file that's generated during the build process
13:42
and contains information that MicroPython extracts from its own source code. So we just tell the MicroPython build environment: please build this file for us, so that we can also compile it into our extension module. Then we have a list
14:01
of all the files, so we can just open all those files and collect the source code into one large string that we later pass on to CFFI. Before we do that, we make one more modification: the last line here just renames the existing main function to mp_main. Of course the MicroPython port assumes that it is
14:20
the only application that is running on this machine, so it has its own main function. When we import it into the Python interpreter, there is already a main function, so we rename it just to avoid any name conflicts here. With this, step one is complete: we have collected all the application source code. Now step two is to
14:41
collect all the hardware abstraction layer header files, and for this minimal port that's rather easy. There is only one header file that we need to include. There were only those two functions I showed you in the beginning; the header file defines some more functions that are not really used by the code. So we only need this header file.
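Step one, as described above, ends with one large source string in which `main` has been renamed. The following is a hedged sketch of that idea; the helper names, the regular expression, and the in-memory file contents are assumptions, and the talk's own script may well use a plain string replacement instead.

```python
import re


def collect_sources(paths, hal_files=("uart_core.c",), read=None):
    """Concatenate C sources into one string, skipping the HAL files."""
    if read is None:
        def read(path):
            with open(path, encoding="utf-8") as handle:
                return handle.read()
    wanted = sorted(set(paths) - set(hal_files))
    return "\n".join(read(path) for path in wanted)


def rename_main(source, new_name="mp_main"):
    """Rename the C function `main` so it cannot clash with the host's main."""
    # \b keeps identifiers that merely end in "main" (e.g. do_main) untouched.
    return re.sub(r"\bmain\s*\(", new_name + "(", source)


# Hypothetical stand-in for files on disk, so the sketch runs anywhere.
FILES = {
    "main.c": "int main(int argc, char **argv) { return 0; }",
    "uart_core.c": "/* register accesses, replaced by Python later */",
}
source_content = rename_main(collect_sources(FILES, read=FILES.__getitem__))
print(source_content)
```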
15:00
Unfortunately, we cannot pass it directly to CFFI, because CFFI's parser for this information doesn't understand everything that the C standard allows; it just understands a subset. For example, it has no idea of preprocessor directives, and it doesn't understand some attribute annotations on
15:20
the source code. So we need to clean up the source code in order to make CFFI understand it. This is something that I've already shown last year in the example with the unit tests, and I'm going to use similar code this year. What we're going to do is this: we add some
15:40
definitions for the C preprocessor to the content of the header files. For example, this attribute definition just tells the C preprocessor to discard all this information; CFFI doesn't need to know about it, and if it's not there CFFI can't get confused by it. Afterwards we run the C preprocessor over the source code.
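The cleanup described here can be sketched as follows. This assumes `gcc` is available on the PATH; the function names and the exact define are illustrative rather than the talk's actual code.

```python
import subprocess

# Make the preprocessor drop attribute annotations that cdef() cannot parse.
CLEANUP_DEFINES = "#define __attribute__(x)\n"


def preprocessor_command(include_options):
    """Build the GCC call: -E preprocesses only, -P omits line markers."""
    return ["gcc", "-E", "-P", *include_options, "-x", "c", "-"]


def preprocess(header_text, include_options=()):
    """Run the C preprocessor over header_text and return its output."""
    result = subprocess.run(
        preprocessor_command(include_options),
        input=CLEANUP_DEFINES + header_text,  # fed to GCC via standard input
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `preprocess('#include "some_hal.h"', ["-I."])` with a hypothetical header name would then return the expanded declarations as one string.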
16:00
That takes care of everything that's included, of all ifdefs and other things, so that CFFI can understand the result. The preprocess function that's used here looks like this: it just calls GCC's preprocessor and uses its output for the further steps. You can also see here
16:21
a reference to the include options variable from the beginning, where we specify all the include directories; of course the preprocessor needs to know about them. Afterwards everything is contained in the string we get here. This is now an extract from the string that we've produced so far. There are three function prototypes;
16:41
for one of them, the one in the middle, I showed you the implementation: it can output a string of arbitrary length. We can pass this code to CFFI, but what we want is to have Python implementations for those functions, so we want to tell CFFI:
17:00
these are functions that C code can call but that we want to implement in Python. For that, we need to prefix those prototypes with extern "Python+C". Then CFFI knows: OK, I need to generate some glue code in order to make that work. Again, the simple solution that you might come up with
17:21
in the beginning might be to just use search and replace and add the string there, but depending on how complex your code gets, it's again better to use a real parser that understands the C code and can make this modification. This is based on the implementation that I showed last year. It uses pycparser,
17:41
which is also used by CFFI internally, and it parses all your C code into a Python data structure. Then you can modify that data structure and write it out again. In this case we do that in two functions. We have one function that's called for every declaration
18:01
that we find in the C source code, so that's the first one. Whenever we hit a function declaration, and it's for a function that we haven't seen already, we prefix it with extern "Python+C" and return the complete result; otherwise we just ignore it. The second function
18:21
takes care of all the function definitions that we might hit. There might be inline functions that are specified in the header files, and of course we don't want to create a Python implementation for something that's already there, so we just remove them from the output as well. We can simply run that
18:41
on the header content that we've collected so far and get back a new string. If we look at that string, we find the same functions as before, but now prefixed with extern "Python+C", so CFFI should be happy with that. But there's one more modification that we need to make, and it is this: we have this
19:01
mp_main function, already renamed in the C source code. Since we want to call it later from the Python code, we need to tell CFFI that this function exists and that it should provide some way for Python code to call it. In this case it's the same function prototype as before, but there's no extern "Python+C" prefix.
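To illustrate what the prefixed header content ends up looking like, here is a deliberately naive sketch. Note that it swaps in the simple line-based search-and-replace variant rather than the pycparser approach the talk uses, so it only copes with one-line prototypes; the HAL function names and the `mp_main` signature are assumptions.

```python
import re

# Matches simple one-line prototypes such as "int f(void);".
_PROTOTYPE = re.compile(
    r"^\s*[\w\s\*]+?\b(?P<name>\w+)\s*\([^(){};]*\)\s*;\s*$")


def mark_python_callbacks(header_text):
    """Prefix each function prototype once with extern "Python+C"."""
    seen = set()
    lines = []
    for line in header_text.splitlines():
        match = _PROTOTYPE.match(line)
        if match is None or line.lstrip().startswith("typedef"):
            lines.append(line)  # keep everything that is not a prototype
        elif match.group("name") not in seen:
            seen.add(match.group("name"))
            lines.append('extern "Python+C" ' + line.strip())
        # repeated prototypes of an already seen function are dropped
    return "\n".join(lines)


# Hypothetical preprocessed header content.
HEADER = """\
typedef unsigned int mp_uint_t;
int mp_hal_stdin_rx_chr(void);
void mp_hal_stdout_tx_strn(const char *str, mp_uint_t len);
int mp_hal_stdin_rx_chr(void);
"""

# The renamed main function is a plain C function callable from Python.
header_content = (mark_python_callbacks(HEADER)
                  + "\nint mp_main(int argc, char **argv);")
print(header_content)
```

A real parser is more robust against multi-line declarations, inline function definitions and function pointers, which is the reason given in the talk for preferring it.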
19:21
CFFI will therefore assume that it's an existing C function that we want to call from Python and not something new. With that, step two is complete: we have collected all the header contents and can now move on
19:41
to CFFI. The CFFI source code is this; it's only four lines. We first create the CFFI object to build our module. We pass in the header content that we collected before, and CFFI will generate the Python interface
20:01
out of this header content information. Then we pass in all the source code that we collected, and CFFI will pass that on to a compiler to build our extension module, which in this case will be called MPSim. Again we pass in the include directories that we collected in the beginning.
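A hedged sketch of what those few CFFI lines might look like, wrapped in a function. The module name spelling `mpsim` and both helper names are assumptions; `FFI.cdef`, `FFI.set_source` and `FFI.compile` are CFFI's documented out-of-line API, and the third-party import is kept inside the function so the helper above it can be used without CFFI installed.

```python
def include_dirs_from_options(options):
    """Turn ['-I.', '-I..'] into ['.', '..'], the form set_source() expects."""
    return [option[2:] for option in options if option.startswith("-I")]


def build_simulation(header_content, source_content, include_options,
                     module_name="mpsim"):
    """Generate and compile the extension module; returns the built file."""
    from cffi import FFI  # third-party package: pip install cffi

    ffibuilder = FFI()
    ffibuilder.cdef(header_content)          # what Python gets to see
    ffibuilder.set_source(                   # what the C compiler gets to see
        module_name,
        source_content,
        include_dirs=include_dirs_from_options(include_options),
    )
    return ffibuilder.compile()
```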
20:21
Afterwards we tell CFFI to compile all this into a loadable module, and then the next step is completed and we have a loadable module. Now we can run it. To run it we simply import that module, and then we need to define the functions
20:40
that we wanted to replace with Python code. CFFI provides a decorator for that; it will just match on the function name, so if we define a function that has the same name as one of those extern "Python+C" functions, CFFI will know to call this implementation whenever the C code calls the function of this name.
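As a rough sketch of such decorated functions: `ffi.def_extern()` and `ffi.unpack()` are real CFFI calls, while the wrapper `register_hal`, the two HAL function names, and the stream handling are assumptions made for illustration.

```python
import sys


def register_hal(ffi, stdin=None, stdout=None):
    """Define Python implementations for the two HAL functions."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer

    @ffi.def_extern()
    def mp_hal_stdin_rx_chr():
        # Read one byte; report 0 at end of input.
        data = stdin.read(1)
        return data[0] if data else 0

    @ffi.def_extern()
    def mp_hal_stdout_tx_strn(text, length):
        # text is a C char pointer, so copy `length` bytes out of it.
        stdout.write(ffi.unpack(text, length))
        stdout.flush()


# Hypothetical usage with the generated module:
#   from mpsim import ffi, lib
#   register_hal(ffi)
#   lib.mp_main(0, ffi.NULL)
```

Passing the streams in as parameters is a design choice of this sketch; it makes it easy to feed scripted input to the firmware from a test.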
21:02
This is the implementation that reads a single character from standard input, and this then is the implementation that writes out the contents of a string. With that our implementation is complete; we have everything we need. So I'm going to
21:21
try to show you now that this really works. I've prepared a small script that contains basically this code. I can run it, and then I'm dropped into a MicroPython shell and I can execute MicroPython code in here. I have the usual features of tab completion
21:41
that MicroPython provides, I can call some of those functions, I can look at the objects; everything seems to work as it should. In order to demonstrate to you that this really uses the functions that we've defined before, I can just modify that code and tell it to print everything
22:01
twice, and then you can see that every output that we get is there twice. Everything that I type is printed twice, and it really executes our Python implementation of those C-level functions.
22:20
Then I want to talk about some of the challenges that you might face and that we faced when we invented this approach for our source code. First of all, your code should follow a certain structure in order for this to work easily. If you've just got a single file that contains everything, it's hard to separate the hardware-dependent parts from the
22:41
general source code, so what you really want to have is a clear distinction between the hardware abstraction layer and the application code. Then you can just match on the folders, for example, and collect the one part from the one folder and the other part from the other folder. This is what we do in our example,
23:01
or you have some other mechanism like the makefiles that I showed you before. Then there's the problem of namespaces. It's perfectly valid C code to have two files that contain static functions with the same name, but since this example
23:21
collects all the source code into one large string, everything ends up in the same namespace and this won't really work. So you'd need something like this, where you prefix every function, for example, with the name of the module, so that you end up with a unique name.
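One way to spot such clashes before concatenating the files is a small check like the sketch below. It is not part of the talk: the helper name and the regular expression are assumptions, and the pattern only recognises simple `static` function definitions.

```python
import re
from collections import defaultdict

# Naive pattern for "static <type> name(" at the start of a line.
_STATIC_FUNCTION = re.compile(
    r"^\s*static\s+[\w\s\*]*?\b(\w+)\s*\(", re.MULTILINE)


def find_static_name_clashes(sources):
    """Map each static function name defined in several files to those files.

    `sources` maps a file name to its C source text.
    """
    owners = defaultdict(list)
    for filename, text in sources.items():
        for name in set(_STATIC_FUNCTION.findall(text)):
            owners[name].append(filename)
    return {name: sorted(files)
            for name, files in owners.items() if len(files) > 1}


# Hypothetical example files.
SOURCES = {
    "a.c": "static int checksum(const char *p) { return 0; }\n",
    "b.c": "static int checksum(const char *p) { return 1; }\n"
           "static void b_init(void) {}\n",
}
print(find_static_name_clashes(SOURCES))
```

Any name reported here would need a module prefix (for example `a_checksum` and `b_checksum`) before the files can share one translation unit.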
23:40
Another problem is platform-dependent code. I've prepared a small example that looks innocent but contains multiple problems when you try to run it on different architectures. What we do here is: we have defined a structure, we fill some values into this structure and afterwards calculate a checksum
24:00
over that structure, and of course the checksum should always be the same, no matter on what platform this code runs, if the data in the structure is the same. The problems that you have here (I'll show you the corrected version already) are, first, the data types in the structure: if you just use shorts or ints,
24:21
there's no specification that defines what byte size you have here, so you should use types that really specify that. Then you might get problems with padding that the compiler inserts into your structure, so we tell it to avoid this padding with the packed attribute. And last but not least you need to consider
24:42
the endianness of your data, the byte order of your data, if you've got multi-byte values. In the second example I use some standard functions to always convert the endianness of those values to network byte order, which is big-endian byte order, so the structure should always contain the same
25:02
values here and the checksum should really be identical. Another problem you might get is with code that relies on interrupts, because that's not really supported on this platform. You might get something like this if you use threads to really achieve some parallel events, but we didn't have a use for that.
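The three portability points from the structure example (fixed-size types, no padding, fixed byte order) have a direct counterpart in Python's `struct` module, which is handy when the simulation or a test has to build the same bytes. The record layout and the use of CRC-32 as the checksum are assumptions for illustration; the talk does not say which fields or checksum its example uses.

```python
import struct
import zlib

# Hypothetical record: a 16-bit id followed by a 32-bit counter.
record_id, counter = 0x1234, 1

# "@" uses the host's native sizes, alignment and byte order, so the
# result can differ between platforms (padding after the 16-bit field).
native = struct.pack("@HI", record_id, counter)

# "!" means network (big-endian) byte order, standard sizes, no padding.
portable = struct.pack("!HI", record_id, counter)

print(len(native), native.hex())
print(len(portable), portable.hex())
print(hex(zlib.crc32(portable)))  # identical on every platform
```

In C, the same effect is what fixed-width integer types, the packed attribute and byte-order conversion functions such as `htons` and `htonl` are typically used for.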
25:21
So I haven't tried that yet. And last but not least, let me talk about the external interface for your code. If we look again at this picture, what's beneath your hardware abstraction layer in your usual application is the actual hardware, and when we take away the abstraction layer we also
25:41
take away the hardware, so you need to replace that with something else. One solution would be to use just Python code running against your application; what we use in our environment is just a network interface that can be used by our existing test cases, so they deliver their input there and get their output back,
26:00
so the test case doesn't even need to know whether it talks to the real device or to our simulated device. OK, now that you've done all of that, you'll also get some benefits out of it. The first benefit, and the reason why we did all that, is the fast execution. I've collected all the test cases
26:20
that I can run against the simulation, and they were executed in roughly five minutes. If I run the same set of test cases against the real device it takes one and a half hours, so that was already a huge speed-up. In fact these are the numbers from the first prototype that could execute everything; we didn't invest any more
26:40
effort in optimizing that any further, because it was already fast enough for everything we wanted. Another benefit that you get out of this is dynamic program analysis. You might know about static analysis tools, the warnings that the compiler gives you or that special linters give you, but there are also dynamic program analysis tools that look
27:02
at your code, or rather that don't look at your source code but at your binary code while it's being run, and that can give you more information. One tool that we've integrated easily is the address sanitizer. That's just some extra compile options that you include in your compiler calls,
27:21
and then the compiler will add extra code that checks for invalid memory accesses and out-of-bounds accesses, and if it detects something like that it will just abort at this point. A second tool that we use is a fuzzer. It's called American Fuzzy Lop, and it tries to be a bit more intelligent than other fuzzers
27:41
by trying to find new code paths automatically, and you can use it with Python code as well. In our case we just use a wrapper provided by AFL to compile the extension module. It's called afl-gcc; internally it calls GCC, but in a way that
28:01
AFL support is integrated, so this is all that you need in your code for the AFL support to be present. Then there's another nice tool called python-afl that's actually intended to run Python code with this fuzzer, not extension code but Python code, but it also supports this use case.
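How these tools hook into the build can be sketched as follows. Both helper names are hypothetical; `-fsanitize=address` is the GCC/Clang switch for the address sanitizer, `extra_compile_args` and `extra_link_args` are keyword arguments that CFFI's `set_source()` forwards to the compiler setup, and the assumption that the build honours the `CC` environment variable usually holds on Unix-like systems but should be verified for a given setup.

```python
import os


def sanitizer_options():
    """Extra keyword arguments for set_source() to enable AddressSanitizer."""
    compile_flags = ["-fsanitize=address", "-fno-omit-frame-pointer", "-g"]
    return {
        "extra_compile_args": compile_flags,
        "extra_link_args": ["-fsanitize=address"],
    }


def afl_environment(base=None):
    """Return a copy of the environment that selects the afl-gcc wrapper."""
    environment = dict(os.environ if base is None else base)
    environment["CC"] = "afl-gcc"
    return environment


# Hypothetical usage when building the extension module:
#   ffibuilder.set_source("mpsim", source_content, **sanitizer_options())
#   subprocess.run([sys.executable, "build_sim.py"], env=afl_environment())
```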
28:20
Then there's a small script that in this case reads fuzzer input from standard input and runs it against the application in a loop. We did this with our code for some, I don't know, 7 billion executions, which fortunately or unfortunately didn't find any problems, but
28:40
it works; not with the highest speed, but you can use it. The last benefit that you gain from this approach is a certain kind of hardware independence: you can do your development without having access to the real hardware. So maybe at the beginning of your project, when the real hardware isn't really available yet, or even later on when the real
29:02
hardware is just too expensive or you just have a few of them, with this approach you can easily scale and do your tests in parallel on many devices, because you just need a standard PC; you don't need any complex setup for your hardware. With that my talk ends, and thank you for your attention.
30:10
Is Python the outside world, or how did you manage this? In our case the outside world is really just a communication channel: we get some input there, we have to process that and generate the correct output.
30:22
Of course you could do something like I said in the interrupt example: you use some threads that, every five minutes, change the simulated value of some sensor, or do whatever you want there. But this wasn't necessary for our use case.