
Programming Reconfigurable Devices via FPGA Regions & Device Tree Overlays


Formal Metadata

Title
Programming Reconfigurable Devices via FPGA Regions & Device Tree Overlays
Subtitle
A User View Benchmark on a Declarative Reconfiguration Framework
Title of Series
Number of Parts
611
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year
2017

Content Metadata

Subject Area
Genre
Abstract
We share our experiences with a new framework in the Linux kernel for programming reconfigurable devices, namely MPSoC-FPGAs. Our example use case integrates reconfigurable hardware accelerators into the Crypto API. We apply a new, declarative and device-tree-driven reconfiguration framework within the Linux kernel as proposed and implemented by Alan Tull. The implemented concept maps reconfigurable regions within the FPGA to device tree nodes. The insertion of a device tree overlay triggers the reconfiguration of the corresponding reconfigurable region. The reconfiguration process consists of the scheduling, descheduling and execution phase. Based on our use case, benchmark results for the scheduling phases are shared. We present the bottlenecks revealed by our benchmark and show currently missing components of this approach. We conclude that the current implementation is already in a usable state for developing and deploying MPSoC-FPGA based heterogeneous systems.
Transcript: English (auto-generated)
So, our next speaker is Ulrich Lande-Barr, and he will speak about programming reconfigurable devices. Let's applaud.

Yeah, thank you. So, I'm going to give a short talk about using reprogrammable or reconfigurable devices, namely FPGAs, within Linux-based systems. We use the recently mainlined FPGA framework that has been written by Alan Tull, who is working for Intel. I think last year, for the first time, there was already a talk about that, a longer one, but sadly that is lost, so the video is not available anymore.

Why we came across this topic: we implemented a system that accelerates cryptography engines. This is the traditional one, the software driver running on a CPU, and we moved that into an FPGA here. That's basically what we did. We needed some more subsystems: the hardware driver that talks to the AES hardware engine via DMA and actually integrates it into the crypto API within the Linux kernel. That leverages the accelerator, the hardware functionality, and exposes it to all parts of the system that want to use it. That could be user space tools as well as kernel space tools, because all of them can talk to the crypto API, and this way we have a very flexible interface to our new hardware.

As cryptography algorithms advance and systems change, you also want to be able to change the underlying cryptography algorithm. That's why we use a reconfigurable system, so we can swap out the old algorithm and swap in a new algorithm, or more of them.
For doing that we use the Linux kernel FPGA framework, and that's shown here. The internals of this framework are shown in the slightly darker blue, and they handle all the FPGA-specific parts. The user actually influences what happens via device tree overlays; I'll come back to that on the next slide. You obviously also need some drivers that interact with the physical device, and those are the ones here that are specific to whatever FPGA device you use. In our case this was a multiprocessor system-on-chip type of platform, so we have the CPUs, actually two of them, and the FPGA part on one single chip, but other configurations work the same way.

So you have the FPGA region, which represents a physical region within the reconfigurable device, and this one is configured by the user, as mentioned, via the device tree overlays. The FPGA manager over here manages which firmware, or in this case actually bitstream, is loaded. It leverages the standard Linux interface for firmware loading, the same subsystem that is also used by USB devices, for example. Then the FPGA bridge part abstracts all the device-specific things and uses the decouplers and the configuration access port. This port actually loads the firmware into the FPGA configuration memory. The decoupler is associated with those little devices, and during a reconfiguration process it decouples what is inside the region from what is outside of the region, because the behavior of the logic within the region isn't specified during the reconfiguration process.

So how do we actually trigger a reconfiguration, and how do we separate what we want to specify: what goes in there, parameters, platform drivers, or the bitstream itself? That's done via the device tree overlay. Every region within the FPGA has a stub representation in the overall system device tree, and by loading the overlay onto these addresses, the actual reconfiguration process is triggered. So we load the device tree overlay into the currently present device tree, and that triggers the reconfiguration process, which is shown on the next slide. The AES bitstream is configured into the FPGA, and afterwards the driver is loaded and the system is fully operational and can use the new hardware.

So that's the process of reconfiguration. We have the configuration part and, I would say, the deactivation, and it consists of multiple steps.
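As a rough illustration of what such an overlay contains, the following Python sketch renders overlay source text for one region. The `firmware-name` property is the one the mainline FPGA region binding uses to name the bitstream. Everything else here is a hypothetical placeholder and not taken from the talk: the region label `fpga_region0`, the file name `aes.bit`, the node name, the compatible string `example,aes-accel`, and the address and size.

```python
def render_overlay(region_label, bitstream, node_name, compatible, base, size):
    """Return device tree overlay source that targets one FPGA region.

    The overlay names the bitstream to load and describes the hardware
    that becomes available once the region has been reconfigured.
    """
    lines = [
        "/dts-v1/;",
        "/plugin/;",
        "/ {",
        "\tfragment@0 {",
        f"\t\ttarget = <&{region_label}>;",
        "\t\t__overlay__ {",
        "\t\t\t#address-cells = <1>;",
        "\t\t\t#size-cells = <1>;",
        # Bitstream file, looked up through the firmware subsystem.
        f'\t\t\tfirmware-name = "{bitstream}";',
        # Child node for the accelerator; a driver can bind to "compatible".
        f"\t\t\t{node_name}@{base:x} {{",
        f'\t\t\t\tcompatible = "{compatible}";',
        f"\t\t\t\treg = <0x{base:x} 0x{size:x}>;",
        "\t\t\t};",
        "\t\t};",
        "\t};",
        "};",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # All arguments below are made-up example values.
    print(render_overlay("fpga_region0", "aes.bit", "aes",
                         "example,aes-accel", 0x43C00000, 0x10000))
```

The generated text would still have to be compiled with the device tree compiler and loaded through whatever overlay interface the kernel in use provides; that step depends on the platform and is left out here.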
The coloring I will come back to later on. We start with loading the device tree overlay, as already presented, and this triggers the bitstream loading part, which interacts with the firmware subsystem. When the firmware is available, the devices are decoupled, to make the FPGA ready for reconfiguration and to not disturb any other parts of the FPGA that keep on working. Then the actual reconfiguration is executed. It uses the configuration access port and loads the bitstream into the FPGA, which is one of the steps that contributes a larger amount of time to the overall process. Once this configuration is completed, the region is coupled again, so from this point on the hardware that has just been loaded into the FPGA can be accessed. After that, the change that has just been made to the hardware needs to be reflected in the device tree, so that all other subsystems are aware of this new hardware. This is the application of the device tree overlay. After that, all the other subsystems that might be involved and might use this new hardware get triggered: the driver that is specific to this new hardware is loaded, and all the subsystems are initialized, in our case the support for the crypto API. Then the system is ready and can be used.

That's the cycle people are usually interested in. But as you have reconfigurable systems, you also want to know how fast you can reconfigure a system, and how long your, let's say, dead time is, during which you cannot actually use the resources you have within the system because they are bound within this process. That's why we started to have a closer look at which steps are executed and how long each takes; that's coming later on. So let's say the execution phase ends at some time: you have encrypted your file or whatever you have been doing. Then you go on: you want to reuse this region, you want to put another algorithm into the device. The platform driver is unloaded, the DTO, the device tree overlay, is removed from the currently present device tree, and the region is decoupled again from the system. After this step we are ready again, and the complete process can start all over again.

Let's come to some of the results that we obtained while measuring the system. As there are a couple of steps involved, with drivers from the chip vendor and also some more subsystems within the kernel, we used ftrace to gather information about when functions were entered and exited during the overall process. Originally the intention was to see what the performance of our accelerator was, but then we discovered this interesting process, and now we present these results because they are of more general interest than our specific ones. Our bitstream size, so the size of our accelerator, was almost 6 megabytes that have been loaded into the FPGA, and the overall process for all of that took about 135 milliseconds.
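The cycle walked through above, from loading the overlay to decoupling the region again, can be written down as an ordered list of steps. The Python sketch below is only a model of that ordering with a simple per-step timer; the step and phase names are illustrative choices, not identifiers from the kernel framework, and the measurements in the talk were taken with ftrace rather than with code like this.

```python
import time

# Steps in the order described in the talk. Names are illustrative only.
CYCLE = [
    ("configure", "load_overlay"),      # user loads the device tree overlay
    ("configure", "fetch_bitstream"),   # firmware subsystem provides the bitstream
    ("configure", "decouple_region"),   # isolate the region from the rest of the FPGA
    ("configure", "program_fpga"),      # write bitstream via the configuration access port
    ("configure", "couple_region"),     # new hardware becomes accessible
    ("configure", "apply_overlay"),     # device tree now reflects the new hardware
    ("configure", "load_driver"),       # driver and dependent subsystems initialize
    ("execute", "use_hardware"),        # e.g. encrypt data through the crypto API
    ("deactivate", "unload_driver"),
    ("deactivate", "remove_overlay"),
    ("deactivate", "decouple_region"),  # region is ready for the next cycle
]


def run_cycle(actions=None):
    """Run every step in order and return (phase, step, seconds) tuples.

    `actions` maps a step name to a callable; steps without an entry are
    treated as no-ops so the ordering can be exercised without hardware.
    """
    actions = actions or {}
    timings = []
    for phase, step in CYCLE:
        start = time.perf_counter()
        actions.get(step, lambda: None)()
        timings.append((phase, step, time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    for phase, step, seconds in run_cycle():
        print(f"{phase:<10} {step:<16} {seconds * 1e3:.3f} ms")
```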
So you see, if you want to change the encryption algorithm of your hardware accelerator to, I don't know, encrypt one email and then use another one for another email, that probably doesn't lead to a very efficient system, because you spend more time reconfiguring it than actually computing. An interesting part was to see that the second largest contributor to the overall time is actually loading the bitstream, and that doesn't involve the maintenance, but let's come back to that later. What we had originally also expected to be the main contributor is the actual configuration process itself: that's the time needed to transport the bitstream from RAM to the configuration memory on the FPGA through various interconnect interfaces on the chip.
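The two figures given in the talk, a bitstream of almost 6 megabytes and about 135 milliseconds for the overall process, allow a rough back-of-the-envelope check of these statements. The sketch below assumes decimal megabytes and rounds the size up to 6 MB; the function names are made up for this example.

```python
BITSTREAM_BYTES = 6_000_000  # "almost 6 megabytes", rounded up, decimal MB assumed
CYCLE_SECONDS = 0.135        # "about 135 milliseconds" for the overall process


def throughput_estimate(size_bytes=BITSTREAM_BYTES, seconds=CYCLE_SECONDS):
    """Bytes per second if the whole measured time were spent moving the bitstream.

    Only part of the time is the actual transfer, so the configuration
    port itself must be faster than this estimate.
    """
    return size_bytes / seconds


def reconfiguration_share(compute_seconds, cycle_seconds=CYCLE_SECONDS):
    """Fraction of total time spent reconfiguring for one job of given length."""
    return cycle_seconds / (cycle_seconds + compute_seconds)


def compute_time_for_share(max_share, cycle_seconds=CYCLE_SECONDS):
    """Compute time per reconfiguration needed to keep the share at max_share."""
    return cycle_seconds * (1.0 - max_share) / max_share


if __name__ == "__main__":
    print(f"{throughput_estimate() / 1e6:.1f} MB/s")
    print(f"{reconfiguration_share(0.135):.0%} of the time for a 135 ms job")
    print(f"{compute_time_for_share(0.10):.3f} s of work for a 10% share")
```

Under these assumptions the estimate comes out at roughly 44 MB/s, a job that computes for 135 ms spends half of its total time on reconfiguration, and about 1.2 seconds of computation per reconfiguration are needed to push that share down to 10%.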
The third largest contributor is the driver itself. That's also interesting, as this is not part of the framework, so anybody who implements drivers for reconfigurable systems should think about their setup times. The good news is that only 1% of the overall time is spent within the FPGA framework on everything else that we haven't covered on this slide. So the framework itself seems to be quite well implemented, with little overhead, and shows good performance.

That also leads to the next slide. We see performance bottlenecks, especially with respect to the FPGA reconfiguration interface. That's nothing we can change, as the FPGA vendors have to support that. Also, some issues within the fabric are root causes for that, so communication with the configuration access port is actually a bit slower than it should be. Firmware caching has not yet been used in the system; that could be implemented within the FPGA framework in the kernel, which currently does not leverage the caching mechanism that is already built into the firmware subsystem. Additional components that have traditionally been built into reconfigurable system support are schedulers and governors, or whatever else is needed to make efficient use of the hardware that has just been enabled. That's hopefully intentionally left out of the scope of the FPGA manager, as it is very use-case specific, and we can implement it in user space anyway.

Using device tree overlays relies on interfaces that are more or less stable currently. They have been developed for supporting shields and modular embedded systems like the Raspberry Pis out there, and we just reuse them for reconfigurable systems, and that works quite well for now. What would be interesting is getting automatically generated device tree overlays from the vendor tools, as they usually do generate generic drivers for the hardware parts that have been created. Some of the mappings, like addresses or offsets and such things, are known to those tools, so that could actually help with using this approach.
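Returning to the point that scheduling is left to user space: one simple policy such a scheduler could apply is to group pending jobs by the bitstream they need, so that each bitstream is loaded once instead of once per job. The Python sketch below is a hypothetical illustration of that idea; the job identifiers and bitstream file names are invented, and nothing like it is part of the kernel framework described in the talk.

```python
def count_reconfigurations(jobs):
    """Count how often the region must be reprogrammed for a job sequence.

    `jobs` is a list of (job_id, bitstream) pairs processed in order; a
    reconfiguration is needed whenever the required bitstream changes.
    """
    loaded = None
    count = 0
    for _job_id, bitstream in jobs:
        if bitstream != loaded:
            count += 1
            loaded = bitstream
    return count


def batch_by_bitstream(jobs):
    """Reorder jobs so that jobs sharing a bitstream run back to back.

    Bitstreams keep the order in which they were first requested, and
    jobs keep their relative order within each bitstream.
    """
    batches = {}
    for job_id, bitstream in jobs:
        batches.setdefault(bitstream, []).append(job_id)
    return [(job_id, bitstream)
            for bitstream, job_ids in batches.items()
            for job_id in job_ids]


if __name__ == "__main__":
    # Invented example: alternating algorithms for successive emails.
    pending = [("mail-1", "aes.bit"), ("mail-2", "other.bit"), ("mail-3", "aes.bit")]
    print(count_reconfigurations(pending))
    print(count_reconfigurations(batch_by_bitstream(pending)))
```

Batching trades latency for throughput: a job may wait behind others that were submitted later, which is one reason such policies are use-case specific.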
To get to our last slide: it is a good reconfigurable system, and we can use it. It efficiently supports using reconfigurable systems; the reconfiguration times are fast and the overhead is low. We can efficiently develop heterogeneous systems with it, because we don't need reboots when we change our hardware, which is a traditional problem, so you have fast test cycles. You also have a compatibility layer for static and reconfigurable systems, so it's not only for reconfiguration; it can also be used for static systems. That concludes. That's it, thank you.