Testing microcontroller firmware with Python

Video in TIB AV-Portal: Testing microcontroller firmware with Python

Formal Metadata

Testing microcontroller firmware with Python
Title of Series
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date

Content Metadata

Subject Area
Testing microcontroller firmware with Python [EuroPython 2017 - Talk - 2017-07-10 - Arengo] [Rimini, Italy] Last year's talk (https://ep2016.europython.eu/conference/talks/writing-unit-tests-for-c-code-in-python) showed you how to use CFFI (https://cffi.readthedocs.io/) to write unit tests for C code in Python. This year we will take the concept one step further and create integration tests covering (almost) the whole firmware of a microcontroller, again leveraging the power of CFFI. But instead of running the firmware on the controller, it will be executed on the development machine (that is, a standard x86 architecture), allowing for much faster test execution, without requiring the target hardware. For this to work, all the hardware-dependent parts of the firmware code need to be replaced by Python code simulating the hardware functionality, so that all the firmware above this hardware abstraction layer can be executed unmodified. In addition, this allows the use of advanced security testing tools like AddressSanitizer (https://github.com/google/sanitizers/wiki/AddressSanitizer) and american fuzzy lop (http://lcamtuf.coredump.cx/afl/) that would not be able to run directly on the microcontroller.
As was mentioned, I work as an embedded software developer, so I write firmware for microcontrollers, and today I'm going to show you how you can test such firmware much faster by not running it on the real device but integrating it with Python. I already gave a talk in a similar direction last year. It was called "Writing unit tests for C code in Python", where I used the CFFI library to extract the functions of single modules from your C source code and build a Python module out of them, so that you could load it into your Python process and then use all of Python's power to write unit tests.
I'm building on this idea today. You might have already seen, in another context, this hierarchy of tests. With the unit tests last year we were at the bottom level, where we only look at individual modules and individual functions and try to test them. Of course it is also important to test your code in integration with all the other parts, and so this year we're going to move up a layer to the integration tests and try to make sure that basically all of the firmware really works. The motivation for that is that the firmware we write is rather complex, so in the end we might have half a megabyte of compiled code on the microcontroller. For this we've written a large number of test cases, and when we run those against the real device, it takes several hours for all those tests to complete. As a developer that's not really what I want, because when I make a change to the software, I need to know fast whether this change is good or bad. Maybe a quarter of an hour is the upper limit for me; I don't want to wait hours before I can tell. So what we did in the past was to select a subset of all test cases that tries to cover as much as possible. Of course you can't guarantee that you really hit every corner case that you have in your code base, so there might still be errors in there. This is what I want to avoid, and this is how this project started. I'm first going to show you the basic concept now and afterwards give you a complete demonstration based on an example firmware, to show you the code that really does all that. If you look at your typical microcontroller application, it might look something like this: you've got a large application code base that's pretty standard C code that you could compile for any architecture. Of course you've also got hardware-specific parts, and if you have structured your firmware in some way, you might have a hardware abstraction layer that really interfaces with the hardware and provides a nice and clean C interface to your application, and
this is what we replace in this approach, because we want to make it look like this: we keep the application code and just replace the abstraction layer beneath it with some Python code. The approach for this will be similar to what I showed you last year with the CFFI library. But first, since we are talking in the context of microcontroller firmware that's already written in C, you might wonder why we use Python at all. We could just replace this hardware abstraction layer with a different abstraction layer for another machine that's faster and just use C for that. So why Python? Now, we are at a Python conference, so I don't need to tell you much about the general advantages Python has over other languages. If you compare it with C code, you can easily see that you need to write less code to achieve the same results, and it is also usually easier to use. For example, our microcontrollers have cryptographic functionality built into hardware, so we have for example an AES peripheral in there where we can just pass in some data; it does the AES encryption in hardware and returns the result. This is something that we have to reimplement in our Python code for this to work. There are libraries in C that can do that and there are libraries in Python that can do that, but the Python ones are usually easier to use and to get along with. In the end, Python is also very powerful for this approach: the hardware abstraction layer might contain functions that, for the simulation we're going to build here, can be very similar and don't need different implementations. So you can just use a single template in Python and let Python generate the code for all those functions that you need; you don't need to specify each and every function in your C code just for the program to compile. And this is now what I'm going to show you. The general approach is that you collect all the application's C source code, all the implementation of the application, and you collect all the
header files of the hardware abstraction layer, so everything that specifies the interface of the hardware abstraction layer. When you've got both of those parts, you can pass them on to CFFI. CFFI will use this information to generate a Python loadable module that we can then run from our Python interpreter, and then we have our application running inside a Python process on a normal machine, not on a microcontroller. And since the machine is much faster than the microcontroller, hopefully our application also runs much faster and our tests can execute faster. As an example, I can unfortunately not show you our real code, so I looked for a different project, and I chose MicroPython because it's also a very complex project. It has a lot of code, so you really get an impression of a real-life application of this approach and not some artificial example that I just constructed for this talk. You might have already heard about the MicroPython project. If not, the quick explanation is that it's a reimplementation of the Python programming language that can run directly on a microcontroller. It started several years ago, also with a hardware device that we've seen in a previous talk in this room: a little board with a small controller on it and a lot of other peripherals that you can access from your Python code more or less directly. It targets basically full compatibility with standard Python 3.5 code, so they don't provide all the features, but most that you would use.
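The idea mentioned earlier, of generating many similar hardware functions from a single Python template, can be sketched in plain Python. This is only an illustration under assumptions: the function names below are hypothetical and not taken from the talk, and no CFFI is involved here. In a CFFI-based setup, each generated function would additionally be registered as the implementation of the matching C prototype, for example with CFFI's `def_extern` decorator.

```python
def make_stub(name, return_value=0, log=None):
    """Create a Python stand-in for one hardware function from a template."""

    def stub(*args):
        if log is not None:
            log.append((name, args))  # record the call for later assertions
        return return_value

    stub.__name__ = name
    return stub


# Hypothetical names of hardware functions that all behave alike in simulation.
SIMILAR_FUNCTIONS = ["led_on", "led_off", "watchdog_kick"]

calls = []
stubs = {name: make_stub(name, log=calls) for name in SIMILAR_FUNCTIONS}

stubs["led_on"](3)
stubs["watchdog_kick"]()
```

Recording the calls in a list is a design choice made for this sketch; it lets a test check which hardware functions the firmware touched without writing one function body per prototype.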
First we'll have a look at the structure of the source code. All the source code is open source; you can find it on GitHub. If you look at the repository, you find a structure that looks like this: there are some files containing documentation and then a lot of folders, and many of those folders contain the code that is specific to one MicroPython port. MicroPython already supports not only a single platform but multiple platforms; there are for example ports even for Windows and for Unix systems. The initial port was this one here, the ST port, for an ST-based microcontroller. Then there are other folders, for example the py folder. This is the generic code that can run in every port, so the py folder contains the Python interpreter, for example. For this example I will choose the minimal port. It is similar to the ST port but with very stripped-down functionality: it just contains the bare essentials that give you a Python shell that can run code, but it doesn't give you any further hardware access. For this demonstration that should be sufficient. If we look at this minimal port, these are all the files contained in there. You see only two C files. The main.c file contains the basic application start-up code that initializes everything, and then you see this uart_core.c file, and this is the implementation of the hardware abstraction layer for this project. It contains some functions for input and some functions for output, so that they can provide the Python shell. This is what the relevant functions from this file look like: you've got one function that reads a single character of input and does something with it, and you've got another function that can print strings, just sending them out. In the case of this minimal port, if you really run it on the board, it just uses UART communication for that, so you see some accesses to the UART registers in this code. And if we try to
compile this file for our normal machine, this wouldn't work, because there are no such registers that we could write to. So these are the functions that we want to replace with Python code so that we can execute them. All the rest of the code that is contained in the minimal port, or that is imported from the py folder, should run on our architecture without any problems. Then there's another project I need to talk about quickly, and that's pymake. It's a reimplementation of the make utility, and I want to use it in this demonstration to parse the makefiles that MicroPython uses for its build process. For this approach to work, we need to know which source code files to integrate into our binary: where do we find the header files, where do we find the source code files? Of course I could just hardcode that in this example, but if you want to use this productively, it makes more sense to keep this information in one place, and the place that was already chosen here is the Makefile. So I just want to parse the Makefile and extract the relevant information from there, so that I can still keep all the information in this one place and don't have to adapt many places just for this whole process to work. pymake is such a make tool in Python, so I'll build on that. When we look at the MicroPython Makefile, one bit of interesting information in there are the compiler options, for example for the include directories. It just builds up a list of those here, where it specifies some directories in which we can find the include files, the header files. In order to extract that using pymake, I can tell pymake to just parse the Makefile that I have without really executing it; it just builds all the data structures. Afterwards I can ask pymake: give me the contents of this variable INC, where all the include directories are contained. What I get back is not a string but the object representation you can see here, and it's actually not bad to get back
an object like this and not the raw content, because if you look at the beginning, there is something contained in this value that is a reference to another variable. I'm not interested in this intermediate string value; I need to have this reference resolved to its actual value in order for this process to work. This is what can be done with the Expansion object that the last call here returns: there's a resolvestr method on it, and this returns the final string value that I'm interested in. So I can wrap this code in a simple function that I can use to resolve variables. Looking at the example again, we can just call the function and get back the strings that were declared in the Makefile. Everything seems to work, so we store this value in a variable for later use and start with the real process, now collecting the source code. For collecting the source code, we just change into the MicroPython minimal port directory, so all paths are relative to this directory, and again look at the Makefile. There's a variable called SRC_C that lists all of the source code files. At the beginning you see the two that I've already shown to you, the main.c file and the uart_core.c file, and then there are some references to other files in a directory that's shared by multiple ports. So we can just extract this list.
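The resolution step described above can be illustrated without pymake. The following is a minimal sketch and not pymake's API: it assumes the Makefile variables have already been read into a plain dict (the values shown are illustrative, not copied from the MicroPython Makefile) and recursively expands `$(NAME)` references until only plain text remains.

```python
import re

# Hypothetical, already-parsed Makefile variables (values are illustrative only).
MAKE_VARS = {
    "BUILD": "build",
    "INC": "-I. -I.. -I$(BUILD)",
    "SRC_C": "main.c uart_core.c $(BUILD)/_frozen_mpy.c",
}

_REF = re.compile(r"\$\((\w+)\)")


def resolve(name, variables, _seen=()):
    """Return the value of `name` with all $(VAR) references expanded."""
    if name in _seen:
        raise ValueError("recursive variable reference: " + name)
    raw = variables.get(name, "")  # like make, unknown variables expand to ""
    return _REF.sub(
        lambda match: resolve(match.group(1), variables, _seen + (name,)), raw
    )


include_options = resolve("INC", MAKE_VARS).split()
source_files = set(resolve("SRC_C", MAKE_VARS).split())
```

A real Makefile has many more features (functions, conditionals, pattern rules), which is why the talk relies on an actual make implementation instead of a regular expression like this one.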
We get the list of source files again using the function that we've already created, and again you can see here that the last entry contained a reference, and the reference is resolved to its actual value. Now, to create the list of source files, we can again use the function and convert the result into a set. Then we need another variable from the Makefile that I haven't shown you so far. It contains a list of all the source code from the py folder, not as C files but as object files. So we just adjust the names so that they match the paths and locations we're interested in and add them to the set. In the end there's one source file that we have to remove again: the uart_core.c file that I showed you in the beginning, because this isn't really source code of the application, it's the source code of the hardware abstraction layer. We don't need that now, so we remove it here. Then there's one more thing that's special about MicroPython. If you look again at the paths that are contained in here, the last one refers to a directory called build, and if you try to find that in the source code on GitHub, you won't find it, because it's not contained in any of the commits. It's just a file that's generated during the build process and contains information that MicroPython extracts from its own source code. So we just tell the MicroPython build environment: please build this file for us, so we can compile it into our extension module. Then we have a list of all the files, so we can just open all the paths and collect the source code into one large string that we later pass on to CFFI. Before we do that, we make one more modification: the last line here just renames the existing main function to mp_main. The MicroPython port of course assumes that it is the only application running on this machine, so it has a main function. When we import it into the Python interpreter, there is already a main function, so we rename it just to avoid conflicts here. With this, step 1 is complete.
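The collection step can be sketched as follows. This is a simplified illustration, not the code from the talk: it assumes the list of files is already known, uses two tiny stand-in files instead of a MicroPython checkout, and renames `main` with a plain regular expression (the helper names `collect_sources` and `rename_main` are made up for this sketch).

```python
import re
import tempfile
from pathlib import Path


def collect_sources(paths, exclude=()):
    """Concatenate the given C files into one string, skipping excluded names."""
    chunks = []
    for path in sorted(paths):
        if Path(path).name in exclude:
            continue  # e.g. the hardware abstraction layer implementation
        chunks.append(Path(path).read_text())
    return "\n".join(chunks)


def rename_main(source, new_name="mp_main"):
    """Rename the C function `main` so it cannot clash with the host's main."""
    return re.sub(r"\bmain\s*\(", new_name + "(", source)


# Tiny stand-in files so the sketch can run without a MicroPython checkout.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp, "main.c")
    hal = Path(tmp, "uart_core.c")
    app.write_text("int main(int argc, char **argv) { return 0; }\n")
    hal.write_text("void mp_hal_stdout_tx_strn(const char *s, unsigned n) {}\n")
    combined = rename_main(collect_sources({app, hal}, exclude={"uart_core.c"}))
```

The word-boundary match keeps identifiers such as `domain(` or an already renamed `mp_main(` untouched, but a regular expression cannot tell code from comments or string literals, so this is only adequate for a sketch.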
We have now collected all the application source code. Step 2 is to collect all the hardware abstraction layer headers. For this minimal port that's rather easy: there's only one header file that we need to include. It contains not only those two functions I showed you in the beginning but also some more functions that are not really used by the code. So we only need this header file, but unfortunately we cannot pass it directly to CFFI, because CFFI, although it is built for this kind of information, doesn't understand everything that the C standard allows; it just understands a subset. For example, it has no idea of preprocessor directives, and it doesn't understand some attribute annotations in the source code. So we need to clean up the source code in order to make CFFI understand it. This is something that I've already shown last year in the example of the unit tests, and I'm going to use similar code this year. What we are going to do is prepend some definitions for the C preprocessor to the content of the header files, for example this attribute definition, which just tells the C preprocessor to discard all this information. CFFI doesn't need to know about it, and if it's not there, it can't get confused by it. Afterwards we run the C preprocessor over the source code so that it takes care of everything, the includes, the defines and other things, and CFFI can understand the result. The preprocess function that's used here looks like this: it just calls GCC's preprocessor and uses its output as data for the further steps. You can also see here a reference to this include options variable from the beginning, where we specified all the include directories; of course the preprocessor needs to know about those. After that, everything is contained in the string we get here. This is now an extract from the string produced so far: there are three function prototypes, and for one of them I showed you the implementation.
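A preprocessing helper along the lines described above might look like the sketch below. The exact definitions and compiler flags used in the talk are not visible in the transcript, so the `__attribute__` define and the flags `-E -P -x c -` are assumptions chosen for illustration; running `preprocess` requires a `gcc` executable on the path.

```python
import subprocess

# Definitions prepended to the header text; this one makes the preprocessor
# drop GCC attribute annotations that CFFI's parser does not understand.
PREAMBLE = "#define __attribute__(x)\n"


def cpp_command(include_options, compiler="gcc"):
    """Build the command line that preprocesses C code read from stdin."""
    # -E: preprocess only, -P: omit line markers, "-x c -": read C from stdin
    return [compiler, "-E", "-P", *include_options, "-x", "c", "-"]


def preprocess(source, include_options=()):
    """Run the C preprocessor over `source` and return the resulting text."""
    result = subprocess.run(
        cpp_command(include_options),
        input=PREAMBLE + source,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Called as `preprocess(header_text, ["-I.", "-Ibuild"])`, this would return the header with includes and macros expanded and attribute annotations removed, which is the form the next step works on.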
This one can print a string of arbitrary length. We could pass this code to CFFI as it is, but what we want is to have Python implementations for those functions, so we want to tell CFFI: these are functions that C code can call but that we want to implement in Python. For that we need to prefix those prototypes with extern "Python+C". Then CFFI knows: OK, I need to generate some glue code in order to make that work. Again, the simple solution that you might come up with in the beginning might be to just search and replace in the string, but depending on how complex the code gets, it's better to use a real parser that understands the C code and can make this modification. This is based on the implementation I showed last year. It uses pycparser, which is also used by CFFI internally. It parses all your C code into a Python data structure; then you can modify the data structure and write it out again. In this case we do that in two functions. We have one function that's called for every declaration that we find in the C source code; that's the first one. Whenever we hit a function declaration, and it's for a function that we haven't seen already, we prefix it with extern "Python+C" and return the complete result; otherwise we just ignore it. The second function takes care of all the function definitions that we might hit. There might be inline functions that are specified in the header files, and of course we don't want to create a Python implementation for something that's already there, so we just remove them from the output. We can simply run that on the content that we've collected so far and get back a new string, and if we look at that string, we find the same functions as before, but now prefixed with extern "Python+C", so CFFI should be happy with that. But there's one more
modification that we need to make, and it is this: we have this mp_main function, already renamed in the C source code. Since we want to call it later from the Python code, we need to tell CFFI that this function exists and that it should provide some way for our Python code to call it. In this case it's the same function prototype as before, but there's no extern "Python+C" prefix, so CFFI will assume that it's an existing function that we want to call from Python and not something new. With that, step 2 is complete: we have collected all the header content and can now move on to CFFI. The CFFI source code for this is only four lines. We first create the FFI object to build our module. We pass in the header content that we collected before, from which CFFI will generate the Python interface, and we pass in all the source code that we collected; CFFI will pass that on to a compiler to build our extension module, which in this case will be called mpsim. Again we pass in the include directories that we collected in the beginning, and afterwards we tell CFFI to compile all this into a loadable module. When this is done, we have a loadable module, so now we can run it. To run it, we simply import that module, and then we need to define the functions that we want to replace with Python code. CFFI provides a decorator for that. It will just match the function name, so if we define a function that has the same name as one of those extern "Python+C" functions, CFFI will know to call this implementation whenever the C code calls the function of this name. This is the implementation that reads a single character from standard input, and this is the implementation that writes out the contents of the string. With that our implementation is complete; we have everything we need. So I'm going to try to show you now that it really works.
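The two Python-side console functions can be sketched without CFFI as a small class working on ordinary Python streams. This is an illustration under assumptions: the class and method names are invented for this sketch, and in the CFFI-based setup from the talk they would instead be module-level functions named after the C prototypes and registered with CFFI's `def_extern` decorator. The `repeat` parameter is also an addition of this sketch.

```python
import io
import sys


class SimulatedConsole:
    """Python stand-in for the two console functions of the hardware layer."""

    def __init__(self, stdin=None, stdout=None, repeat=1):
        self.stdin = stdin if stdin is not None else sys.stdin
        self.stdout = stdout if stdout is not None else sys.stdout
        self.repeat = repeat  # 2 makes every output appear twice

    def rx_chr(self):
        """Read one character of input and return its code (-1 on end of input)."""
        char = self.stdin.read(1)
        return ord(char) if char else -1

    def tx_strn(self, data, length):
        """Write the first `length` bytes of `data` to the output."""
        text = bytes(data[:length]).decode("utf-8", errors="replace")
        self.stdout.write(text * self.repeat)
        self.stdout.flush()


# Exercise the stand-in with in-memory streams instead of a real terminal.
out = io.StringIO()
console = SimulatedConsole(stdin=io.StringIO("a"), stdout=out, repeat=2)
first = console.rx_chr()
console.tx_strn(b"hi!", 2)
```

Passing the streams in explicitly is a choice made here so that a test can feed input and inspect output; setting `repeat=2` is one simple way to confirm that the firmware's output really goes through the Python implementation.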
I've prepared a small
script that contains basically this code. I can run it, and then I'm dropped into a MicroPython shell and can execute MicroPython code in here. I have the usual features, like the tab completion that MicroPython provides; I can call some of those functions, I can look at the objects; everything seems to work as it should. In order to demonstrate to you that this really uses the functions that we've defined before, we can just modify that code and
tell it to print everything twice,
and you can see, OK, every output that we get is there twice; everything that I type is printed twice. So it really executes our Python implementation of the C-level functions. OK, now I want to talk
about some of the challenges that you might face, and that we faced when we implemented this approach for our source code. First of all, your code should follow a certain structure in order for this to work easily. If you've just got a single file that contains everything, it's hard to separate the hardware-dependent parts from the general source code. What you really want is a clear distinction between the hardware abstraction layer and the application code. Then you can just use separate folders, for example: collect one part in one folder and the other part in another folder. This is what we do; or you have some other mechanism, like the Makefile I showed you before. Then there's the problem of namespaces. It's perfectly valid C code to have two files that contain static functions with the same name, but since this approach collects all the source code into one large string, everything ends up in the same namespace, and so this won't really work. You need something like a prefix on every function, for example the name of the module, so that you end up with unique names. And another problem
is platform-dependent code. I have prepared a small example that looks innocent but contains multiple problems when you try to run it on different architectures. What we do here is: we have defined a structure, we fill some values into the structure, and afterwards we calculate a checksum over that structure. Of course the checksum should always be the same, no matter on what platform this code runs, if the data in the structure is the same. The problems that you have here (to show you the corrected version already) are these. First, the data types in the structure: it just uses types like short, and there's no specification that defines what size in bytes you get here, so you should use types that really specify that. Then you might get problems with padding that the compiler inserts into your structure, so we tell it to avoid this padding with the attribute packed. And last but not least, you need to consider the endianness of your data, that is, the byte order of your data if you've got multi-byte values. So in the second example I use some standard functions to convert the endianness of those values always to network byte order, which is big-endian byte order. That way the structure should always contain the same values and the checksum should really be identical.
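The three pitfalls named above (type sizes, padding, byte order) can be observed from Python with the standard `struct` module, which serves here as an analogy to the C structure from the talk. The field layout and the use of CRC-32 as the checksum are assumptions made for this sketch; the talk does not say which fields or which checksum its example uses.

```python
import struct
import zlib

# "!" selects network (big-endian) byte order, standard sizes and no padding:
# B = 1 byte, H = 2 bytes, I = 4 bytes on every platform.
PORTABLE_LAYOUT = "!BHI"
# "@" uses the host's native byte order, sizes and alignment instead.
NATIVE_LAYOUT = "@BHI"


def portable_checksum(kind, flags, counter):
    """Pack the fields in a fixed layout and return a CRC-32 over the bytes."""
    return zlib.crc32(struct.pack(PORTABLE_LAYOUT, kind, flags, counter))


packed = struct.pack(PORTABLE_LAYOUT, 1, 0x0203, 0x04050607)
portable_size = struct.calcsize(PORTABLE_LAYOUT)
native_size = struct.calcsize(NATIVE_LAYOUT)  # may include padding bytes
```

With the portable layout the packed bytes, and therefore the checksum, are the same on every machine, whereas the native layout may differ in size and byte order between a microcontroller and a development PC.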
Another problem you might get is with code that relies on interrupts, because that's not really supported on this platform. You might get something like interrupts if you use threads to achieve some parallel events, but we didn't have a use for that, so I haven't tried it. Last but not least, we have to talk about the external interface for your code. Again in this picture: what is beneath your hardware abstraction layer in your usual application is the actual hardware, and when we take away the abstraction layer, we also take away the hardware, so you need to replace that with something else. One solution would be to just use Python code running against your application. What we use in our environment is a network interface that can be used by our existing test cases, so they deliver their input there and get their output back, and the test case doesn't need to know whether it talks to the real device or the simulated device. Now, you don't only have challenges, you also get some benefits out of it. The first benefit, and the reason why we did all that, is the fast execution. I have collected all the test cases that I can run against the simulation, and they are executed in roughly 5 minutes. If I run the same set of test cases against the real device, it takes one and a half hours. So that was already a huge speed-up. In fact, these are all numbers from the first prototype that could execute everything; we didn't invest any more effort in optimizing it any further because it was already fast enough for everything we wanted. Another benefit that you get out of this is dynamic program analysis. You might know about static analysis tools: the warnings that the compiler gives you, or what special linters give you. But there are also dynamic program analysis tools that don't look at the source code but look at your binary code while it's being run and can give you more information. One tool that we've integrated easily is AddressSanitizer; that's just some
extra compile options that you include in your build, and then the compiler adds extra code that checks for invalid memory accesses, out-of-bounds accesses, and if it detects something like that, it will just abort at this point. The second tool that we use is a fuzzer called american fuzzy lop. It tries to be a bit more intelligent than others by trying to find new code paths automatically, and you can use it with Python code as well. In our case we just use the wrapper provided by AFL to compile the extension module. It's called afl-gcc; it passes control on to GCC, but in a way that AFL support is integrated. So this is all that you need in your code for the AFL support to be present. Then there's another nice tool called python-afl. It's actually intended to fuzz Python code, not extension code but pure Python code, but it also supports this use case. You just need a small script that in this case reads fuzzed input from standard input and runs it against the application. We did this with our code for roughly a billion executions, which fortunately, or unfortunately, didn't find any problems in our code. But it works, and with this high speed you can use it. The last benefit that you gain from this approach is a certain kind of hardware independence: you can do your development without having access to the real hardware. Maybe in the beginning of the project, when the real hardware isn't really available yet, or even later on, when the real hardware is just too expensive or you just have a few devices. With this approach you can easily scale and run your tests in parallel on many machines, because you just need a standard PC; you don't need any complex setup for your hardware. And with that I'm at the end. Thank you for your attention. Thank you. We've got time for maybe one very quick question.
In general you have to simulate the outside world with embedded systems: there are some inputs you're waiting on in your C code, maybe you check something in a control loop or something like this, and you have to simulate this. So do you simulate this in Python as well? In our case the outside world is really just a communication channel: we get some input there, we have to process that and generate the correct output. Of course you could do something like I said in the interrupt example, that you use some threads that, say, every 5 minutes change the simulated value of some sensor or whatever you want, but this wasn't our use case.
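The thread-based idea mentioned in the answer, which the speaker says was not tried in the project, could be sketched like this. Everything here is hypothetical: the class name, the list of values and the interval are illustration only, and a Python-side hardware function would call `read()` whenever the C code asks for the sensor value.

```python
import threading


class SimulatedSensor:
    """Holds a value that a background thread changes at a fixed interval."""

    def __init__(self, values, interval):
        self._values = list(values)
        self._interval = interval
        self._lock = threading.Lock()
        self._current = self._values[0]
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def read(self):
        """What a Python-side hardware function would return to the C code."""
        with self._lock:
            return self._current

    def _run(self):
        index = 0
        # Event.wait returns False on timeout, True once stop() was called.
        while not self._stop.wait(self._interval):
            index = (index + 1) % len(self._values)
            with self._lock:
                self._current = self._values[index]


sensor = SimulatedSensor(values=[20, 21, 22], interval=0.01)
initial = sensor.read()
sensor.start()
sensor.stop()
final = sensor.read()
```

Using an `Event` for the wait keeps shutdown immediate: `stop()` does not have to sit out the rest of the interval, which matters if the interval is minutes rather than milliseconds.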