GPU-Enabled Polyphase Filterbanks

FOSDEM VZW

Kraemer, Jan

Formal Metadata

Title

GPU-Enabled Polyphase Filterbanks

Subtitle

Everyday I'm Shuffling

Title of Series

FOSDEM 2017

Number of Parts

611

Author

Kraemer, Jan

License

CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/42065 (DOI)

Publisher

FOSDEM VZW

Release Date

2018

Language

English

Production Year

2017

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Frequency Division Multiple Access (FDMA) schemes are widely used in many existing communication systems and standards. On Software Defined Radio (SDR) platforms, however, separating the channels can prove more difficult because of the high requirements placed on the digital filters. This talk will showcase an implementation of a polyphase filterbank on a graphics processor unit (GPU) that can help overcome the heavy computational load of those filters. In the software, all the partitioned filters can run in parallel. Each of these filters produces output samples for numerous input samples simultaneously, thus providing an additional level of parallelism. Furthermore, several rational oversampling factors are supported by this implementation. Operations for oversampling can likewise be implemented to run in parallel, owing to the massive number of usable hardware threads in a GPU. Hence, the effects of oversampling on the throughput can be reduced. On an Nvidia GTX970 GPU, this implementation achieved a throughput of 67.43 MSamples per second, 12 times higher than the (optimized) general purpose processor (GPP) version.
Separating information using the frequency space is rather simple if one can do it with appropriate hardware. Cycle-accurate operations executed in FPGA and ASIC fabric allow one to carefully create the desired waveform and then control filterbanks and oscillators. This way, the desired information can get into the air at the exact time and frequency one fancies. Unfortunately, controlling a software defined radio (SDR) in that manner can prove to be quite a challenge. Timing constraints when changing the center frequency often turn out to be the main limiting factor [1]. Instead, a pure SDR approach is often the desired way of generating multifrequency content. To cope with hard latency/timing constraints, the solution is to generate the whole spectrum at once and to position the desired information digitally into the time/frequency matrix.
But this poses quite a challenge, as one usually has to oversample the signal tremendously, depending on the number of channels, to generate the aggregated bandwidth. The anti-imaging filter needed at the transmitter side and the separation/anti-aliasing filters on the receiver side can grow to obscene numbers of filter taps. Coping with this computational load can be demanding, even for high-end general purpose processors (GPP). Using a polyphase filterbank (PFB) to do the synthesis/separation of the waveform can help greatly with reducing the computational load. PFBs do this by breaking down the needed filter into several polyphase partitions and doing the filtering on these partitions. Dividing a filter into M polyphase partitions can already reduce the theoretical computational load by exactly a factor of M [2]. Additionally, the Fast Fourier Transform (FFT) can be used to extract or generate all needed channels at once, using just one filtering operation. Still, the challenge of separating the channels can prove to be too much for the GPP, even with the help of a PFB channelizer/synthesizer.
Using a graphics processor unit (GPU) can help immensely with offloading the critical task of separating the dedicated channels, providing some headroom for the GPP to perform the remaining task of decoding the information imprinted on the individual channels. Filtering itself is an operation that can be mapped pretty well onto many-core architectures, especially if the constraints on latency and buffer sizes are not too stringent. But the operations inside a PFB really seem to be made for over-the-top parallelization. The notion of performing several independent polyphase partitioned filtering operations at once is something that can be easily mapped onto common GPU architecture abstractions, as every polyphase filter partition can run simultaneously.
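The polyphase decomposition and the transform across partitions described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the talk's CUDA implementation: the function names (`polyphase_partitions`, `channelize_polyphase`, `channelize_direct`) are hypothetical, the channelizer is critically sampled (decimation by M for M channels), and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath

def sample(x, i):
    """Return x[i], treating samples outside the buffer as zero."""
    return x[i] if 0 <= i < len(x) else 0.0

def polyphase_partitions(h, M):
    """Split prototype filter h into M partitions: h_p[m] = h[m*M + p]."""
    return [h[p::M] for p in range(M)]

def channelize_polyphase(x, h, M):
    """Critically sampled M-channel channelizer via polyphase partitions.

    Each partition filters its own decimated input stream; a DFT across
    the M partition outputs then yields all M channels at once.
    (Assumption: a naive O(M^2) DFT is used here in place of an FFT.)
    """
    parts = polyphase_partitions(h, M)
    n_out = len(x) // M
    out = [[0j] * n_out for _ in range(M)]
    for n in range(n_out):
        # Partition outputs v_p[n] = sum_m h_p[m] * x[(n - m)*M - p]
        v = [sum(hp[m] * sample(x, (n - m) * M - p) for m in range(len(hp)))
             for p, hp in enumerate(parts)]
        for c in range(M):
            out[c][n] = sum(v[p] * cmath.exp(2j * cmath.pi * c * p / M)
                            for p in range(M))
    return out

def channelize_direct(x, h, M):
    """Reference: mix each channel to baseband, filter with h, decimate by M."""
    n_out = len(x) // M
    out = []
    for c in range(M):
        mixed = [x[i] * cmath.exp(-2j * cmath.pi * c * i / M)
                 for i in range(len(x))]
        out.append([sum(h[k] * sample(mixed, n * M - k) for k in range(len(h)))
                    for n in range(n_out)])
    return out
```

Both functions compute the same channel outputs up to rounding. The direct version applies the full prototype filter once per channel, while the polyphase version applies it once in total, split across the partitions, and leaves the channel separation to the transform.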
Most GPUs have even more computing capabilities, and their resources would lie idle if one only parallelized the PFB algorithm in terms of polyphase partitions. Additionally, the computations of consecutive output samples are independent of one another. So one can further parallelize the algorithm by computing several output samples concurrently. If the need for oversampling of the signal arises, that too can be exploited to produce the additional output samples in parallel, thereby minimizing the additional computational stress caused by oversampling.
This talk will showcase how all operations of the PFB channelizer/synthesizer algorithm can be mapped to a GPU using the CUDA framework. Examples of the code will be shown to highlight some of the oddities one can encounter when developing code for GPUs or many-core architectures in general. Benchmarks and results of the current implementation will be presented and discussed. As of now, on an Nvidia GTX970, the implementation reaches a throughput of 67.43 MSamples per second (45 channels, 5x oversampling, 1318-tap prototype filter), which is 12 times higher than an optimized GPP (Intel I7 6850K)…
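The independence of output samples mentioned in the abstract can be illustrated with a small sketch. It is written in plain Python rather than CUDA and is not taken from the talk: `output_sample`, `run_grid`, `grid` and `shuffled_grid` are hypothetical names, the per-sample formula is the direct-form definition of a channel output rather than an optimized polyphase kernel, and the stride `D` (input samples consumed per output sample) is an assumed way to express oversampling by a factor of M/D.

```python
import cmath
import random

def output_sample(x, h, M, D, c, n):
    """One output sample of channel c at output index n.

    D is the number of input samples advanced per output sample:
    D == M is critical sampling, D < M oversamples by a factor M / D.
    The result depends only on the inputs, never on other outputs.
    """
    acc = 0j
    for k, tap in enumerate(h):
        i = n * D - k
        if 0 <= i < len(x):
            acc += tap * x[i] * cmath.exp(-2j * cmath.pi * c * i / M)
    return acc

def grid(x, M, D):
    """All (channel, output index) pairs, like the index space of a thread grid."""
    return [(c, n) for c in range(M) for n in range(len(x) // D)]

def shuffled_grid(x, M, D, seed):
    """The same pairs in an arbitrary order, mimicking unordered thread execution."""
    order = grid(x, M, D)
    random.Random(seed).shuffle(order)
    return order

def run_grid(x, h, M, D, order):
    """Evaluate every (channel, output index) pair in the given order."""
    out = [[0j] * (len(x) // D) for _ in range(M)]
    for c, n in order:
        out[c][n] = output_sample(x, h, M, D, c, n)
    return out
```

Because each pair writes only its own output slot, evaluating the pairs in any order gives the same result, which is the property that would let a GPU assign one thread per output sample. Lowering `D` below `M` only enlarges the grid of independent work items.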