
Building a real-time embedded audio sampling application with MicroPython


Formal Metadata

Title
Building a real-time embedded audio sampling application with MicroPython
Number of Parts
160
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Building a real-time embedded audio sampling application with MicroPython [EuroPython 2017 - Talk - 2017-07-10 - Arengo] [Rimini, Italy] While demonstrating the pyboard to a group of colleagues, a challenge was set to produce a practical demonstration of the device that would provide automatic and continuous voice recording and playback of short spoken phrases similar to that found in a number of talking toys. This talk covers the process of designing and testing the embedded real-time Python solution and includes the architecture, test methodologies and recordings as the stages progressed to the final source code. The talk concludes with a live demonstration of the final application. The solution uses MicroPython (an embedded implementation of Python 3), the pyboard and its AMP Audio skin. MicroPython is a lean implementation of Python 3 that is optimised to run in a very small footprint on micro-controllers and in constrained environments. It was created by the Australian programmer and physicist Damien George, after a successful Kickstarter backed campaign in 2013. The pyboard is the original reference hardware created to host MicroPython. It is a compact low-power board based on an ARM processor with a heap of approximately 100kBytes that can run at 168MHz. It has sufficient hardware services and real-time capabilities to control all kinds of electronic projects. The AMP Audio skin is a small additional module that attaches to the pyboard that adds a small power amplifier, speaker and a microphone with a pre-amp
Transcript: English (auto-generated)
Thank you very much, I assume you can all hear me. Okay, so this session is a beginner session. We're gonna walk through the design of an audio application using MicroPython and the pyboard.
So we'll be constructing a continuous listen-and-repeat audio application, something like you might find in a toy: you talk to it and it talks back to you. This is for beginners, so I'm not gonna assume anything in terms of programming and stuff like that. And we're using, obviously, MicroPython, but also the pyboard and its audio skin.
So MicroPython, unfortunately there's another track at the moment which is introducing you to MicroPython, so most of the experts are probably there. But it's a lean implementation of Python 3, optimized for microcontrollers, but with numerous modules for hardware control. The board itself is really quite small.
I've got a copy here, but luckily I've brought a bigger version. So it's basically just a microcontroller with a few additional bits and pieces like LEDs and switches. It's very small in terms of capabilities in terms of RAM and processing power, but it does come with really cool bits of hardware.
And in Python, all you have to do is import pyb and you've got access to pretty much all the hardware, quite simply. Christine Spindler is running a poster session in the sponsors' hall, so she can tell you a lot more about the hardware and its capabilities. We also need an audio skin,
the ability to record and play back. The board itself doesn't come with a microphone or a loudspeaker, but you can basically attach an audio skin. I've got one here on an aluminum casing that gives you basically a microphone, a loudspeaker, and the ability to add your own microphone
if you want to. But be prepared. The audio skin comes in bits, so in order to use it, you're going to need good soldering skills and a very good soldering iron. What we're going to be doing is recording and playing back speech. So I suppose we need to understand a little bit about what speech is.
I have a spectrogram, a frequency domain recording of the utterance "six" here, and basically you've got an axis which is showing you frequency, so there are your high frequency parts of your "six", and this axis is effectively time in terms of frames,
which are about 10 milliseconds long. So that's 350 milliseconds worth of speech. If you want to record high quality speech, you need to sample at a very high frequency because human speech goes up to 14 kilohertz. But over the phone, you basically are restricted to something like three to four kilohertz because they discovered that's all you need
to intelligibly understand what's being spoken. So how do we go from analog to digital? Well, we're going to use an ADC, of course, which is on the board, and we have to consider a few items, and the first is effectively the sampling frequency,
and there are basic standards here. Nyquist-Shannon tells us that in order to capture an analog signal accurately, we have to sample at at least twice the maximum frequency. If we don't, what we've got here is sampling at 1.5 times the frequency, and you can see the sample points
basically give you an indication it's actually a lower frequency, and that's called aliasing. So you have to sample quite quickly. The second consideration, of course, is the resolution, how many bits you're going to store for each individual sample, and the pyboard gives you devices which can record at eight bits or 12 bits.
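The aliasing effect described above can be checked with a little arithmetic. The following is a minimal sketch, not code from the talk: the function names and the 4 kHz test tone are illustrative assumptions.

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a pure tone f_signal when sampled at f_sample."""
    # Sampling folds every frequency back into the band 0 .. f_sample / 2.
    nearest_multiple = round(f_signal / f_sample) * f_sample
    return abs(f_signal - nearest_multiple)


def min_sample_rate(f_max):
    """Nyquist criterion: sample at least twice the highest frequency present."""
    return 2 * f_max


# A 4 kHz tone sampled at 1.5 times its frequency shows up as a 2 kHz tone.
print(alias_frequency(4000, 6000))   # 2000
# Sampled well above twice its frequency, the tone keeps its true frequency.
print(alias_frequency(4000, 10000))  # 4000
print(min_sample_rate(4000))         # 8000
```

This is why telephone-quality speech, limited to roughly 4 kHz, is normally sampled at 8 kHz.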
Bear in mind that a compact disc, the original format, was 16 bits at 44.1 kilohertz. 44.1 kilohertz is an odd number, but there's a very good reason for it. So you need to understand the frequencies you're gonna be capturing, the capture resolution,
and how much time you need to record speech. So how do we record on a pyboard? Well, we basically create an ADC object, and then we can use one of two read methods. We can call read_timed or just read. So we just import, after we've imported our pyb module,
we create our ADC object. It's on a particular pin on your board, so you just connect it to pin X22, and once you've done that, you can hand your reading, the process of capturing samples, over to MicroPython, and it will effectively read a block of data for you
at a frequency you dictate. So you can set up a buffer and say capture at 6,000 samples per second. That's okay, but you've got no control over the real-time aspect of this. It captures, and you have to wait till it's finished. So building a device that you want to be continuous
and be interactive, that's not the best thing to use. So in my particular case, I'm forced to read the ADC samples manually, and it's my responsibility then to record them at a designated rate. So we can capture. How do we replay?
Well, that's using a digital-to-analog converter, and again, there's one of those on the board. It's quite simple to set up. There are two parts to the DAC. You need to set the volume, and there's a potentiometer on the I2C bus, and then once you've set the volume, you just provide the DAC with your data buffer, and it will play it back out for you.
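As a rough illustration of the two playback steps just described, here is a minimal sketch. It assumes the AMP skin's potentiometer takes a volume byte at register 0 of I2C address 46 and that `DAC.write_timed` defaults to its normal (play once) mode. The `play` helper, the volume value of 64, and the 8 kHz rate are illustrative choices, not the speaker's code. The hardware objects are passed in so the helper can also be exercised off the board.

```python
AMP_SKIN_POT_ADDR = 46   # I2C address of the volume potentiometer (from the talk)


def play(buf, rate_hz, i2c, dac, volume=64):
    """Set the amplifier volume, then hand the whole buffer to the DAC."""
    # Write the volume byte to register 0 of the digital potentiometer.
    i2c.mem_write(volume, AMP_SKIN_POT_ADDR, 0)
    # write_timed() plays the buffer at rate_hz samples per second.
    dac.write_timed(buf, rate_hz)


def play_on_pyboard(buf, rate_hz=8000):
    """Hardware wiring; call this on the pyboard only."""
    import pyb  # MicroPython-only module
    i2c = pyb.I2C(1, pyb.I2C.MASTER)
    dac = pyb.DAC(1, bits=8)   # 8-bit playback from a bytearray
    play(buf, rate_hz, i2c, dac)
```

On the board, `play_on_pyboard(bytearray(8000))` would play one second of a constant level at 8 kHz.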
So setting the volume is as simple as connecting to address 46 on the I2C bus, and once you've done that, you can create a DAC object. You define the bit resolution that you want to play back at. You have a choice of eight or 12 here, and then you can use its write_timed method,
provide a buffer, provide a timer, a built-in object in MicroPython, and tell it how to play the audio. In this particular case, normal means just play the buffer till it's finished and then stop. I think they have a circular option, so you can fill a buffer with a repeating pattern
and just get it to play continuously. So I have mechanisms to record and play back. How do I know that it's working sufficiently quickly? It has to be real time because I have to capture the audio at a designated rate. So I'm gonna rely on hardware components to do this,
but I'm also gonna have software to do the actual signal processing. I say signal processing there. There's not a lot you can do on the pyboard at this stage. But there are two ways to do it. You can either use an oscilloscope, attach it to a pin on the board, and you can use, oops,
you can use a pyboard pin and give it a symbolic name, and you can just set that pin high or low, and on your oscilloscope, you can see, you can measure the duration of the code that you've surrounded with the pin control. Alternatively, you can use timer objects.
So you can create a timer to count at one microsecond rate and just read it. So the advantage of the oscilloscope is it gives me the opportunity to put a little picture up. So here's a screen grab of the oscilloscope on the board that I've been using,
and this low period here, this trough, is effectively the duration of the capture function. This is the method that is running at eight kilohertz. In this particular case, it was a recording done at six kilohertz. So this is all the time it takes to read from the ADC and put that sample into memory and do some crude calculations with it.
As you can see, it actually takes quite a bit of time, 104 microseconds, but you can do clever things with the pin control and discover that using a timer costs you 20 microseconds and actually doing the read itself is a relatively expensive 50 microseconds. So that gives you a clear indication
of how fast you could actually sample this at. You're not gonna get 20 kilohertz because it's gonna take you 50 microseconds just to get the value into your board. So initial setup is basically I need a buffer to write my data into, I need a function to collect the data, and I need a play mechanism.
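The timing figures quoted a moment ago (about 104 microseconds for the whole callback, of which roughly 50 is the ADC read) translate directly into a ceiling on the sampling rate. A small sketch of that arithmetic follows; the function names are illustrative.

```python
def max_sample_rate_hz(callback_us):
    """Highest timer rate at which a callback taking callback_us still fits."""
    return 1000000 / callback_us


def fits_budget(callback_us, rate_hz):
    """True if the callback finishes before the next timer tick is due."""
    period_us = 1000000 / rate_hz
    return callback_us <= period_us


print(fits_budget(104, 8000))        # True: 104 us fits in a 125 us period
print(max_sample_rate_hz(50))        # 20000.0: the ADC read alone caps the rate
print(int(max_sample_rate_hz(104)))  # 9615
```

These numbers apply to the callback as measured in the talk; a leaner callback would raise the ceiling.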
So a crude diagram here. Here's my capture function. Its responsibility is just to read from the ADC. It's connected to a timer, and I simply create a timer and attach a callback function. So here I could have a sample frequency of eight kilohertz. So this function will get called at 8,000 times a second.
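A capture function of this shape might look like the sketch below. It is an assumption-laden illustration rather than the published source: the `Capture` class, timer number 4, and the two-second buffer are hypothetical, while pin X22 and the 8 kHz rate come from the talk. The buffer is preallocated so that the callback itself only reads and stores.

```python
from array import array


class Capture:
    """Timer-driven sampler that fills a preallocated buffer."""

    def __init__(self, read_sample, n_samples):
        self.read_sample = read_sample   # e.g. adc.read, bound once up front
        # 16-bit slots, large enough for 12-bit ADC values.
        self.buf = array('H', (0 for _ in range(n_samples)))
        self.index = 0
        self.done = False

    def on_tick(self, timer):
        """Timer callback: one ADC read, one store, nothing else."""
        if self.index < len(self.buf):
            self.buf[self.index] = self.read_sample()
            self.index += 1
        else:
            self.done = True   # the main loop can now play the buffer back


def start_on_pyboard(rate_hz=8000, seconds=2):
    """Hardware wiring; call this on the pyboard only."""
    import pyb  # MicroPython-only module
    adc = pyb.ADC(pyb.Pin('X22'))      # microphone pin named in the talk
    cap = Capture(adc.read, rate_hz * seconds)
    tim = pyb.Timer(4, freq=rate_hz)   # timer number is an arbitrary choice
    tim.callback(cap.on_tick)          # on_tick now runs rate_hz times a second
    return cap, tim
```

The main loop would poll `cap.done` and then pass `cap.buf` to the playback routine.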
It simply does a read, puts it in a buffer. When that's finished, my play function goes and provides that data to the DAC. So I have some recordings that I've made earlier. I wish I'd said something a little more impressive than one, two, three in Italian,
but hopefully this will work. I'll just play back an initial recording at eight kilohertz. So that's uno, due, tre, but you'll notice it's very, very noisy.
And that's at 12 bits, eight kilohertz. And when you analyze the noise, you get really quite disturbed and disappointed that of your 4,000 samples, 300 or so are just noise. It's a very noisy signal on this board for a number of reasons.
But I'm losing quite a bit of information just in the noise, and it's unpleasant to listen to. So there are quick ways in order to reduce the noise, and the simple one I put together is something that just searches for periods of silence in the recorded speech and just sets them to zero. So to do that, I basically record into my speech buffer,
as I call it, and on a second pass, I go through this speech buffer, knowing roughly where zero is. Again, this has to adapt. This is not a constant. Depending on your hardware, zero might not be in the middle and I go through the buffer looking for periods with zero
and mark their positions. And then that gives me the opportunity to do two things. I can then adjust my estimate for zero because it might have changed, but I can also now go through the buffer and set all of the noisy silence to real silence. And I'll play back, if we hear the original,
and then we apply this simple algorithm. Yep, so a lot cleaner. There's still noise in the speech signal, and you can't simply just take away those bits because the speech in the noise is still valuable.
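The second-pass clean-up described above can be sketched as follows. This is a minimal interpretation, not the speaker's algorithm: the threshold, the minimum run length, and the use of a simple mean to re-estimate the zero level are all assumptions.

```python
def find_silence(buf, zero, threshold, min_run):
    """Return (start, end) index pairs of runs that stay within threshold of zero."""
    runs = []
    start = None
    for i, s in enumerate(buf):
        if abs(s - zero) <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(buf) - start >= min_run:
        runs.append((start, len(buf)))
    return runs


def clean(buf, zero, threshold=8, min_run=4):
    """Flatten noisy silence in place and return the updated zero estimate."""
    runs = find_silence(buf, zero, threshold, min_run)
    total = count = 0
    for start, end in runs:
        for i in range(start, end):
            total += buf[i]
            count += 1
    if count:
        zero = total // count   # re-estimate the resting level from silence only
    for start, end in runs:
        for i in range(start, end):
            buf[i] = zero
    return zero


speech = [128, 130, 126, 129, 200, 60, 190, 127, 131, 128, 129]
new_zero = clean(speech, zero=128, threshold=4, min_run=3)
print(new_zero)   # 128
print(speech)     # [128, 128, 128, 128, 200, 60, 190, 128, 128, 128, 128]
```

Samples inside the spoken region are left untouched, which matches the remark that noise within the speech cannot simply be removed this way.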
So I had recordings at eight kilohertz at 12 bits, so I decided, well, there is so much noise, so how about recording at eight bits rather than 12? If I play that.
So eight bits gives me an added advantage in that I can store more speech. On the board, I'm very limited in the heap. I've probably got about 100K bytes minus whatever the operating system and the Python modules take away from me. So I can only record very short periods of speech, but at eight bits, I can get about eight to 10 seconds.
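The storage trade-off is simple arithmetic. The sketch below assumes, purely for illustration, that 80,000 bytes of the roughly 100 KB heap are free for samples, and that a 12-bit sample occupies a 16-bit slot.

```python
def record_seconds(free_bytes, rate_hz, bits):
    """Seconds of audio that fit in free_bytes at the given rate and resolution."""
    bytes_per_sample = 1 if bits <= 8 else 2   # 12-bit samples need a 16-bit slot
    return free_bytes / (rate_hz * bytes_per_sample)


FREE_BYTES = 80000   # hypothetical figure, not a measured value

print(record_seconds(FREE_BYTES, 8000, 8))    # 10.0
print(record_seconds(FREE_BYTES, 8000, 12))   # 5.0
```

Under the same assumption, CD-rate sampling at 44.1 kHz would fill the buffer in under two seconds even at eight bits.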
And there you've probably heard there's a lot of problems with six and seven, sei and sette, and that's probably because I'm recording at eight kilohertz so we're right on the edge of bandwidth here. So a lot of the high frequency stuff just gets lost or disturbed.
So in order to compensate for that, I decided, well, how about increasing the capture rate from eight kilohertz to 10 kilohertz? So we'll hear the eight again,
and then I move up to 10 kilohertz. Hopefully the sei and sette will be a little better. Four, three, three, one, two. So I've made some improvements. Why can't I sample at 44.1 kilohertz like I have with the compact discs? Well, as we've seen, it takes me too long to read from the ADC,
and also the power of the device, sorry, the memory of the device just won't let me capture much data at 44 kilohertz even if I could sample at that rate. I wanted some application refinements, so it's okay just recording and playing back, but the purpose of this was to impress my nephew,
look, here's a little device you could put in your cuddly toy, talk to it, and it talks back to you, so it has to have some sort of automatic, whoops, wrong button, automatic speech detection, so you don't have to press a button for it to listen. I wanted it to be continuously listening. I wanted to be able to record to the SD card.
This device gives you the ability to plug in a fast SD card. I wanted to play with the LEDs to give you some indication that it's listening, it's recording, it's playing back, and I wanted to be able, at the very least, to disable the device by pressing the user button so it stopped listening. So, how do we interact with the user button?
Well, there are two switches, a reset button on the board, which reboots it, obviously, and a user switch, and these are brought out as little brass buttons on your aluminum casing, so in order to use a button, all I have to do is provide a handler function, the work I want to do when the button's pressed,
I need a switch object which represents the switch, and then I just attach the handler via a callback method, so there's my work. In this particular case, I'm just setting a control variable which just prevents the recording from taking place. I create a switch object from the pyb module, and then I invoke the callback method
providing my function, so every time I press that button, that function runs, so it's quite straightforward. Driving the LEDs, well, that's pretty cool. We like things that light up and flash, so it's got four LEDs and four colors, so you can create, much like the switch,
you can create an LED object, and you can switch them on, off, toggle them, and so on, so again, pyb.LED(2) for the green LED, and I can switch it on, I can switch it off, I can toggle it. LED four supports an intensity method which allows you to provide 256 levels of brightness.
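The switch handler and LED control described here could be sketched as below. The `Controls` class and the decision to toggle the flag on each press are illustrative assumptions; the talk only says the handler sets a control variable that prevents recording.

```python
class Controls:
    """Shared state that the main loop checks before it starts recording."""

    def __init__(self):
        self.listening = True

    def on_switch(self):
        """Switch callback: flip the flag on every press of the user button."""
        self.listening = not self.listening

    def show(self, led):
        """Mirror the flag on an LED: lit while listening, dark while disabled."""
        if self.listening:
            led.on()
        else:
            led.off()


def attach_on_pyboard(controls):
    """Hardware wiring; call this on the pyboard only."""
    import pyb  # MicroPython-only module
    sw = pyb.Switch()
    sw.callback(controls.on_switch)   # runs on every press of the user switch
    green = pyb.LED(2)                # LED 2 is the green LED
    controls.show(green)
    return sw, green
```

Because the handler only flips a boolean, it does no allocation and returns almost immediately.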
They all support the intensity call, but only the blue LED allows you to do the cool throbbing type of effect. Automatic speech detection. Okay, so this requires a little bit of thought. The pyboard doesn't give you this. This is all part of my application domain code,
so basically, I want to capture some speech continuously and only start recording when I think something's being spoken, so this requires, obviously, the ability to determine whether there's speech or not, and that's relatively simple in this implementation,
so I have two buffers. I have my original speech buffer that I'm recording to, but I have a very small buffer that's treated like a circular memory, so as the capture function is running, it's in two modes. One, it's listening for speech, and two, it's then recording, so while it's listening,
it's writing to this circular buffer and doing some very crude analysis on the signal level to determine whether it's silence or noise, and because I've done some analysis of the noise, I can understand roughly at what level these samples start to look like speech, so once it's discovered that there's speech, it then switches to a straight record
into this main buffer and then stops. Writing to an SD card is useful for storing data in recordings. The device does have some flash on board, but that's very, very small and extremely slow, so you have to insert an SD card,
and discovering an SD card is, my approach is really quite crude. Some MicroPython guru might say, oh, no, you don't want to do that, so I basically just look for SD in the system path. If there's an SD, there's an SD device, and then I just do my basic Python open write close.
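A minimal sketch of that approach follows, assuming the card is mounted at `/sd` and therefore shows up in `sys.path`. The helper names and the `speech.raw` file name are hypothetical.

```python
import sys


def find_sd(paths=None):
    """Return the SD mount point if one appears in the module search path."""
    paths = sys.path if paths is None else paths
    for p in paths:
        if p == '/sd' or p.startswith('/sd/'):
            return '/sd'
    return None


def save_recording(buf, directory, name='speech.raw'):
    """Plain open, write, close of the raw sample bytes; returns the file path."""
    path = directory + '/' + name
    with open(path, 'wb') as f:
        f.write(buf)
    return path


sd = find_sd()
if sd is not None:
    save_recording(bytearray(16), sd)   # skipped when no card is present
```

On a desktop Python there is normally no `/sd` entry, so the final lines do nothing there.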
So finally, putting it all together, I have sketched this application diagram. The green parts are basically the MicroPython hardware services. The blue parts are the software modules I've provided,
and the red is basically just data. So when we initialize the board, the capture playback main loop essentially controls the capture function with some control variables. So it'll tell it, right, start listening. If I've not pressed the please stop listening hold button, the capture function then essentially listens for speech.
It's capturing all the time. Once it's detected speech, it writes into this long buffer. It conveniently lights the amber LED for me to tell me it's recording. Once that is done, it then unlights that LED and sets a control variable.
So at least my outer capture loop now understands a recording has been made, so it can set the blue playback LED and does one of a number of things. Now remember that I have a little bit of speech, the speech detection buffer, so I need to attach that to my original buffer
because in my crude DAC method, I can only give it the address of a buffer and play it, and also that is circular, so we don't know where the start of it is. So the copy function unrolls this, untangles this buffer, puts it into there. I then, once the copy's done, I then run my crude clever attenuation method,
which adjusts that buffer. It adjusts the value of zero for the next capture, et cetera, et cetera, and then calls the play function. Once that's done, it optionally dumps to the SD card if it exists, and then the whole thing starts all over again. And that effectively is the entire application.
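The listen-then-record logic and the unrolling of the circular detection buffer can be modelled off the board as follows. This is a simplified sketch under stated assumptions: speech is detected from a single sample exceeding a threshold, plain lists stand in for the preallocated buffers, and all names are illustrative rather than taken from the published source.

```python
def is_speech(sample, zero, threshold):
    """Crude level test: far enough from the resting level counts as speech."""
    return abs(sample - zero) > threshold


def unroll(ring, next_write):
    """Return the ring contents oldest first; next_write is the slot written next."""
    return ring[next_write:] + ring[:next_write]


def capture_utterance(samples, zero, threshold, ring_size, record_len):
    """Listen via a small ring buffer, then record record_len samples after onset."""
    ring = [zero] * ring_size
    w = 0
    main = []
    recording = False
    for s in samples:
        if not recording:
            ring[w] = s                  # listening mode: overwrite the oldest slot
            w = (w + 1) % ring_size
            if is_speech(s, zero, threshold):
                recording = True         # switch to straight recording
        else:
            main.append(s)
            if len(main) >= record_len:
                break
    if not recording:
        return []
    # Put the unrolled lead-in in front of the main recording.
    return unroll(ring, w) + main


stream = [128, 129, 127, 128, 130, 200, 60, 190, 140, 128]
print(capture_utterance(stream, zero=128, threshold=20, ring_size=4, record_len=3))
# [127, 128, 130, 200, 60, 190, 140]
```

Keeping the lead-in means the first syllable is not clipped, since the samples just before detection are already in the ring.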
The code is all Python, and I've published it onto GitHub. So all I've got to say now is thank you very much for listening. I hope this has been useful. You've got my contact details there, and the address of the source code,
and I'll probably publish these slides up to that address later on. If I just play. Thank you, Alan, for that wonderful talk.
So he automated away my job. That's not good. Okay, yeah, thank you. I hope there are some questions. Are there? Okay, in the back.
Hi, so you have a very small budget in terms of time to process. Does that mean there are certain things that you shouldn't do, like probably import modules? How about calling functions? And in general, is there any way to make sure that no matter what path your code takes, it would still be done within the budget you've allocated?
Yeah, indeed. So the overriding rule is inside your capture function, you have to do the absolute minimum that's required, and my capture function essentially just calls the ADC, checks the sample value, and puts it in the buffer.
All the processing that you would ever want to do to that signal, I can't do in the buffer because I've measured the time. You're in a callback function. Apart from other interrupts that are going on the board, you probably are fairly safe in assuming you've got control of the CPU here. You're Python on metal, so apart from other interrupts that are occurring,
you probably don't have a lot to do. You could, yes, I don't know what would happen if your capture function lasted too long. You could probably detect that by setting some sort of lock variable. The one thing you can't do in a callback is you can't create any Python objects, and you can't do floating point processing in Python
because that in itself creates Python objects. So you really are at the bottom level of code, just moving bits and bytes around memory. But that's why I had to put in the pin function and the timers to make sure. There is some jitter. You can see that when the board is running, but as long as you're within quite a percentage point
of your allowed period, you're probably quite safe. Thank you, that was fascinating. You mentioned your nephew, I think, at the beginning. Have you used this with children or young people in an educational context, and if so, how did it work out?
Not yet. The problem that the pyboard has is that when I pull the cable, there's no built-in battery backup, so it has to be powered by something. So I'm waiting for them to produce a battery module so I could put this into the stuffed toy to allow them to do that sort of thing.
I mean, these things are incredibly popular now. They're all over the place. You know, your squeezy toy, you press the hand and talk to it. But no, I haven't. I haven't got back to him yet. I've just released version one, so thank you. So some questions over there.
Have you done any testing with clear signals so you could tell which part of the noise that we're hearing comes from the recording and which one comes from the playback? Because usually those speakers are fairly low quality. Yeah, so no, I haven't,
but the pyboard module does come with schematics and diagrams, so one of the things you can do is obviously not use their built-in microphone, and there's an ability to provide your own signal. The path to the analog components here is very short,
but that is the next step, is to provide a reference signal on here to see where the noise is coming from. It's not unusual to find noise in devices like this. They're not exactly professional quality analog components. They need proper grounding and earth planes to separate them from the digital noise,
but no, that's the next step, is to provide some sort of reference signal in at this point. Okay, so first here. Thanks for the talk. It seems like a really good way of not only getting into embedded, but also learning Python, because I'm quite new to Python, so I'm gonna be playing around with this. I've got two quick questions, hopefully. Firstly, coming from an embedded background,
can you do the ADC in the background? So, for example, in your code, you were waiting for the ADC to get back to you, and then you were buffering that data. Can you just poll that and then have an interrupt come back to you, or, for example, use a double-buffered approach like you would do with old audio?
Yeah, sorry, and the question was, do you know if there's anywhere in Rimini we can buy these whilst we're here? You've got one here. No, that's a good point. The ADC read method, obviously, MicroPython's open source, so we could go in there and have a look at what's going on, but the read_timed is not very useful,
because you don't know where it is, and it's real time, so at the end of the day, you have to do whatever you want 8,000 times a second. If you can't guarantee that, then there's a maximum length of time that you can record, so what I've done is the absolute minimum,
which is read from the ADC and essentially put it into memory, but I am checking the signal level. I could connect two boards together, maybe connect this to something more powerful like a Raspberry Pi, and do a lot more signal processing, because you get some really clever things when you turn your continuous time domain signal
into a frequency domain, because then you can walk into speech recognition and actually understand what people are saying. But no, you really have, you're very limited in what you can do. But yeah, not as cheap as the Pi Zero, and I don't know whether MicroPython runs on the Pi Zero. I would guess that it does, but the beauty of this board is that it's got so much hardware ready,
and you just have a microphone and a loudspeaker. Okay. Yeah, so actually my question is related to what you said just now. I wonder, what's the advantage of using MicroPython on the pyboard versus using one of the,
you know, BeagleBone, Raspberry Pi, or one of these boards that are more powerful and allow you to use the full-blown Python, you know, and all the hardware that you can really work with? So the main reason I looked at this device is its physical size. It comes with headers, but you can buy the board without any headers. So if you're gonna attach this to a smaller component,
like a toy, its size is important. And also it's the power consumption. I think this is only drawing a fraction, you know, tens of milliamps, whereas the Raspberry Pi just consumes too much power. It's too bulky. So it's a trade-off between power in CPU
and battery consumption and things like that. So I mean, this is about half the size of a Pi Zero. But it's loaded with quite a bit of hardware to play with. So the reason why it's not cheap, I think it's 35 euros or something, so it's not terribly cheap compared to the Pi Zero, but it's got some really cool hardware features.
And it's Python on Metal, so that's cool. And it makes noises. Okay, so maybe I have a last question. So when is Alexa running on this thing, and I can say okay? Yes, indeed. Well, as I say, the problem is getting from the continuous domain signal to the frequency domain.
So you need to get to something like that. Once you're able to get the data into that, you can do your own speech recognition, and I've done that in my past life. But Alexa, they have an API, so you can give it data. So it just needs time. Okay, so let's thank him again.
Thank you.