Building a real-time embedded audio sampling application with MicroPython

Video in TIB AV-Portal: Building a real-time embedded audio sampling application with MicroPython

Formal Metadata

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.

Content Metadata

Subject Area
Building a real-time embedded audio sampling application with MicroPython [EuroPython 2017 - Talk - 2017-07-10 - Arengo] [Rimini, Italy] While demonstrating the pyboard to a group of colleagues, a challenge was set to produce a practical demonstration of the device that would provide automatic and continuous voice recording and playback of short spoken phrases similar to that found in a number of talking toys. This talk covers the process of designing and testing the embedded real-time Python solution and includes the architecture, test methodologies and recordings as the stages progressed to the final source code. The talk concludes with a live demonstration of the final application. The solution uses MicroPython (an embedded implementation of Python 3), the pyboard and its AMP Audio skin. MicroPython is a lean implementation of Python 3 that is optimised to run in a very small footprint on micro-controllers and in constrained environments. It was created by the Australian programmer and physicist Damien George, after a successful Kickstarter backed campaign in 2013. The pyboard is the original reference hardware created to host MicroPython. It is a compact low-power board based on an ARM processor with a heap of approximately 100kBytes that can run at 168MHz. It has sufficient hardware services and real-time capabilities to control all kinds of electronic projects. The AMP Audio skin is a small additional module that attaches to the pyboard that adds a small power amplifier, speaker and a microphone with a pre-amp
Thank you very much. I hope you can all hear me OK. So this session is a beginners' session, and I'm going to walk through the design of an audio application using MicroPython and a pyboard. We'll be constructing a continuous listen-and-repeat audio application, something like you might find in a toy that you talk to and it talks back to you. This is pitched at beginners, so I'm not going to assume anything in terms of programming and stuff like that. We're using, obviously, MicroPython, but also the pyboard and its audio skin. So, MicroPython and the pyboard: there's another track at the moment which is introducing you to MicroPython, so the experts are probably there, but it's a lean implementation of Python 3, optimised for microcontrollers, with numerous modules for hardware control. The board itself is really quite small. I've got a copy here, but luckily I've brought a bigger version. It's basically just a microcontroller with a few additional bits and pieces like LEDs. It's very small in terms of capabilities and in terms of RAM and processing power, but it does come with some really cool bits of hardware, and in Python all you have to do is import pyb and you've got access to pretty much all of the hardware quite simply. Christine Spindler is running a poster session in the sponsors' hall, so she can tell you a lot more about the hardware and its capabilities. We also need an audio skin for the ability to record and play back. The board itself doesn't come with a microphone or a loudspeaker, but you can attach an audio skin (I've got one here in an aluminium casing) that gives you a microphone and a loudspeaker, and the ability to add your own microphone if you want. But be prepared: the skin comes in bits, so in order to use it you're going to need good soldering skills and a very good soldering iron. What we're going to be doing is recording and playing back speech, so I suppose we need to understand a little bit about what speech is. I have up here a spectrogram, a frequency-domain recording of the utterance "six", and basically you've got this axis, which is
showing you frequency, so up there are the high-frequency "s" parts of your "six", and this axis is effectively time, in terms of frames which are about 10 ms long, so that's 350 ms of speech. If you want to record high-quality speech you need to sample at a very high frequency, because human speech goes up to 14 kilohertz, but over the phone you're basically restricted to 3 to 4 kilohertz, because they discovered that's all you need to intelligibly understand what's being spoken. So how do we go from analog to digital? Well, we're going to use an ADC, of course, which is on the board, and we have to consider a few items. The first is the sampling frequency, and there are basic standards here. Nyquist-Shannon tells us that in order to capture an analog signal accurately we have to sample at at least twice the maximum frequency. What we've got here is sampling at 1.5 times the frequency, and you can see the sample points give you an erroneous indication of a lower frequency, and that's called aliasing. So you have to sample quite quickly. The second consideration, of course, is the resolution: how many bits you're going to store for each individual sample. The pyboard gives you devices which can record at 8 bits or 12 bits. Bear in mind that the original compact disc format was 16 bits at 44.1 kilohertz; 44.1 kilohertz is an odd number, but there's a very good reason for it. You need to understand the frequencies you're capturing, the capture resolution, and how much time you need to record speech. So how do we record on the pyboard?
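The aliasing effect described above can be checked numerically. The following is a minimal sketch, not taken from the talk: the function names and the 4 kHz test tone are assumptions chosen for illustration. It computes the apparent frequency of a pure tone after sampling and confirms that a 4 kHz tone sampled at 1.5 times its frequency produces exactly the same samples as a 2 kHz tone.

```python
import math

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone once it is sampled at sample_rate_hz."""
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)

def sample_tone(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a cosine at freq_hz taken at sample_rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

tone_hz = 4000                             # hypothetical test tone
fast = alias_frequency(tone_hz, 10000)     # sampled at 2.5x: tone is preserved
slow = alias_frequency(tone_hz, 6000)      # sampled at 1.5x: tone folds down
print(fast, slow)
```

Sampling at 6 kHz makes the 4 kHz tone indistinguishable from a 2 kHz one, which is the lower-frequency indication visible in the sample points on the slide.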
Well, we basically create an ADC object, and then we can
use one of two read methods: we can call read_timed or just read. So once we've imported pyb we create our ADC object; it's on a particular pin on your board, connected to pin X22. Once you've done that you can hand the process of capturing samples over to MicroPython, and it will read a block of data for you at a frequency you dictate. So you can set up a buffer and say capture at 6 thousand samples per second. That's OK, but you've got no control over the real-time aspect of this: it captures, and you have to wait until it's finished. For building a device that you want to be continuous and interactive, that's not the best thing to use. So in my particular case I'm forced to read the ADC samples manually, and it's my responsibility then to record them at a designated rate. So we can capture.
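A minimal sketch of the two capture styles just described, assuming the standard pyb API (pyb.ADC, ADC.read, ADC.read_timed). The function names and buffer size are illustrative, not the speaker's code. The pyb import sits inside the function so the file still loads on desktop Python, where pyb does not exist.

```python
def record_blocking(buf, rate_hz):
    """Fill buf with ADC.read_timed; this returns only when buf is full."""
    import pyb                        # pyboard-only module
    adc = pyb.ADC(pyb.Pin('X22'))     # pin named in the talk for the microphone
    adc.read_timed(buf, rate_hz)      # a bytearray buffer stores 8-bit samples
    return buf

def read_one_sample(adc):
    """Manual read of one 12-bit value (0..4095); the caller owns the timing."""
    return adc.read()

# One second of audio at 6000 samples per second, one byte per sample.
buf = bytearray(6000)
```

The blocking call is the simpler of the two, but nothing else can happen while it runs, which is why the manual read is the one suited to a continuously listening device.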
How do we replay? Well, that uses a digital-to-analog converter, and again there's one of those on the board. It's quite simple to set up. There are two parts to the DAC: you need to set the volume, and there's a potentiometer on the I2C bus, and then once you've set the volume you just provide the DAC with your data buffer and it will play it back out for you. Setting the volume is quite simple: you're connecting to address 46 on the I2C bus. Once you've done that you can create a DAC object. You define the bit resolution that you want to play back at (you have a choice of 8 or 12 here), and then you can use its write_timed method: provide a buffer and provide a timer, a built-in object in MicroPython, and that tells it how to play the audio. In this particular case normal means just play the buffer until it's finished and then stop. I think they have a circular option, where you can fill a buffer with a repeating pattern and just get it to play continuously.
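The playback path might look roughly like this. It is a sketch assuming the standard pyb API (pyb.I2C.mem_write, pyb.DAC, DAC.write_timed); the function names, the 0..127 volume clamp and the use of a plain frequency instead of a Timer object are assumptions for illustration.

```python
def set_volume(level):
    """Write a volume level to the potentiometer at I2C address 46 on bus 1."""
    import pyb                              # pyboard-only module
    level = max(0, min(127, level))         # assumed valid range for the pot
    pyb.I2C(1, pyb.I2C.MASTER).mem_write(level, 46, 0)

def play(buf, rate_hz, bits=8):
    """Play buf once through DAC 1 and stop at the end (NORMAL mode)."""
    import pyb
    dac = pyb.DAC(1, bits=bits)             # 8-bit or 12-bit playback
    # write_timed accepts either a frequency in Hz or a pyb.Timer object.
    dac.write_timed(buf, rate_hz, mode=pyb.DAC.NORMAL)
```

Passing `pyb.DAC.CIRCULAR` as the mode would give the repeating playback mentioned above.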
So I have mechanisms to record and play back. How do I know that it's working sufficiently quickly? It has to be real time, because I have to capture the audio at a designated rate. So I'm going to be reliant on hardware components to do this, but I'm also going to have software to do the actual signal processing. I say signal processing; there's not a lot you can do on a pyboard at this stage. But there are two ways to measure it. You can use an oscilloscope and attach it to a pin on the board: you take a pyboard pin, give it a symbolic name, and just set that pin high or low, and on your oscilloscope you can see and measure the duration of the code that's surrounded by the pin control. Alternatively you can use timer objects: you can create a timer that counts once per microsecond and just read it. So the advantage of the
oscilloscope is that it gives me the opportunity to put up a little picture. So here's a screen grab of the oscilloscope on the board that I've been using, and this low period here, this trough, is effectively the duration of the capture function. This is the method that is running at 8 kilohertz; in this particular case it was a recording done at 6 kilohertz. So this is all the time it takes to read from the ADC, put that sample into memory and do some crude calculations with it. As you can see, it actually takes quite a bit of time, 104 microseconds. But you can do clever things with the pin and discover that using a timer costs you 20 microseconds, and actually doing the read itself is a relatively expensive 50 microseconds. So the oscilloscope gives you a clear indication of how fast you can actually sample. You're not going to get 20 kilohertz, because it's going to take you 50 microseconds just to get the value.
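The measured figures translate directly into a ceiling on the sample rate. A small worked calculation using only the numbers quoted above (104 microseconds for the whole capture function, 50 microseconds for the ADC read alone); the helper names are illustrative.

```python
def sample_period_us(rate_hz):
    """Time available per sample, in microseconds, at a given sample rate."""
    return 1000000 / rate_hz

def max_rate_hz(callback_us):
    """Highest sample rate at which a callback of this duration still fits."""
    return 1000000 / callback_us

capture_us = 104     # measured duration of the whole capture function
adc_read_us = 50     # measured duration of the ADC read alone

print(sample_period_us(8000))      # time budget per sample at 8 kHz
print(max_rate_hz(capture_us))     # ceiling with the full capture function
print(max_rate_hz(adc_read_us))    # ceiling if the callback did nothing else
```

At 8 kHz the budget is 125 microseconds per sample, so a 104 microsecond callback already uses most of it, and the 50 microsecond read alone caps the rate at 20 kHz before any other work is done.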
So the initial setup is basically: I need a buffer to write my data into, I need a function to collect the data, and I need a playback mechanism. So, a crude diagram here: my capture function's responsibility is just to read from the ADC. It's connected to a timer; I simply create a timer and attach a callback function. So here I could have a sample frequency of 8 kilohertz, so this function will get called 8 thousand times a second. It simply does a read and puts it in a buffer. Once that's finished, my play function goes and provides that data to the DAC.
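A sketch of the timer-driven capture described above, assuming the standard pyb.Timer API (Timer(n, freq=...), Timer.callback). The Recorder class, the choice of timer 4 and the 12-to-8-bit shift are assumptions for illustration, not the speaker's published code. The callback only reads, stores and increments, so it allocates no Python objects.

```python
class Recorder:
    """Collects one ADC sample per timer tick into a preallocated buffer."""

    def __init__(self, adc, size):
        self.adc = adc
        self.buf = bytearray(size)    # allocated once, outside the callback
        self.index = 0
        self.full = False

    def capture(self, timer):
        """Timer callback: read one sample and store its top 8 bits."""
        if self.index < len(self.buf):
            self.buf[self.index] = self.adc.read() >> 4   # 12-bit to 8-bit
            self.index += 1
        else:
            self.full = True

def start(recorder, rate_hz=8000):
    """Attach the capture callback to a hardware timer (pyboard only)."""
    import pyb
    tim = pyb.Timer(4, freq=rate_hz)
    tim.callback(recorder.capture)
    return tim
```

On the board, the main loop would poll `recorder.full`, stop the timer with `tim.callback(None)` and hand `recorder.buf` to the play function.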
So I have some recordings that I made earlier. I wish I'd said something a little more impressive than "one, two, three" in Italian, but hopefully this will work. Here's the playback of an initial recording at 8 kilohertz (recording plays). So that's not too bad, but you'll notice it's very, very noisy (recording plays). And that's 12 bits at 8 kilohertz. When you
analyze the noise you get really quite disturbed and disappointed: out of your 4 thousand or so sample values, 300 or so are just noise. It's a very noisy signal on this board, for a number of reasons, so I'm losing quite a bit of information just in the noise, and it's unpleasant to listen to. There are quick ways to reduce the noise, and the simple one I put together is something that just searches for periods of silence in the recorded speech and sets them to zero. To do that I basically record into my speech buffer, as I call it, and on a second pass I go through this speech buffer knowing roughly where zero is. Again, this has to adapt; it's not a constant, and depending on your hardware zero might not be in the middle. So I go through the buffer looking for periods of silence and mark their positions, and that gives me the opportunity to do two things. I can adjust my estimate of zero, because it might have changed, but I can also now go through the buffer and set all of the noisy silence to real silence. I'll play it back. Here's the original one (recording plays), and then we apply this simple algorithm (recording plays). So that's a lot cleaner. There's still noise in the speech signal, and you can't simply take away those bits, because the speech in the noise is still valuable.
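The two-pass clean-up described above could be sketched as follows. This is an illustration under assumptions, not the speaker's implementation: the frame length, the threshold, the 8-bit samples and the function name are all hypothetical. A frame counts as silence when every sample lies within the threshold of the current zero estimate.

```python
def attenuate_silence(buf, zero, threshold, frame=80):
    """Flatten quiet frames in place and return the refined zero estimate."""
    silent = []          # (start, length) of every frame judged to be silence
    total = count = 0
    for start in range(0, len(buf), frame):
        chunk = buf[start:start + frame]
        if all(abs(s - zero) <= threshold for s in chunk):
            silent.append((start, len(chunk)))
            total += sum(chunk)
            count += len(chunk)
    if count:
        zero = total // count        # adapt zero to what silence looked like
    for start, length in silent:
        for i in range(start, start + length):
            buf[i] = zero            # noisy silence becomes real silence
    return zero
```

The returned value can be fed into the next capture as its starting zero estimate, matching the adaptive behaviour described in the talk. Frames that contain speech are left untouched, noise included.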
So I have these recordings at 12 bits, and I decided, well, there is so much noise, how about recording at 8 bits rather than 12? If I play that (recording plays). So 8 bits gives me an added advantage in that I can store more speech on the board. I'm very limited in the heap; I've probably got about 100 kilobytes, minus whatever the operating system and the Python modules take away from me, so I can only record very short periods of speech, but at 8 bits we're talking about 8 to 10 seconds. And if you heard, there are a lot of problems with "six" and "seven", "sei" and "sette", and that's probably because I'm recording at 8 kilohertz, so we're right on the edge of the bandwidth here and a lot of the high-frequency stuff just gets lost. In order to compensate for that I decided, well, how about increasing the capture rate from 8 kilohertz to 10 kilohertz? So here again (recording plays). It tends to be the same, maybe a little bit better, so that made some improvement. Why can't I sample at 44.1 kilohertz, like with compact discs? Well, we've seen that it takes me too long to read from the ADC, and also the memory of the device just won't let me capture much data at 44.1 kilohertz, even if I could sample at the rate I wanted. I wanted some application refinements.
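The memory trade-off can be put into numbers. A worked calculation in which the 80,000 usable bytes is an assumption (roughly the 100 kilobyte heap minus system overhead), as is storing each 12-bit or 16-bit sample in two bytes; the helper name is illustrative.

```python
def seconds_of_audio(free_bytes, rate_hz, bytes_per_sample):
    """How many seconds of audio fit in the given amount of memory."""
    return free_bytes / (rate_hz * bytes_per_sample)

usable = 80000   # assumed: ~100 KB heap minus what the system and modules use

print(seconds_of_audio(usable, 8000, 2))     # 12-bit samples, 2 bytes each
print(seconds_of_audio(usable, 8000, 1))     # 8-bit samples at 8 kHz
print(seconds_of_audio(usable, 10000, 1))    # 8-bit samples at 10 kHz
print(seconds_of_audio(usable, 44100, 2))    # CD format, 16 bits at 44.1 kHz
```

Under these assumptions, 8-bit samples give 10 seconds at 8 kHz and 8 seconds at 10 kHz, in line with the 8 to 10 seconds quoted, while CD-quality audio would fill the same memory in under a second.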
OK, just recording and playing back is fine, but the purpose of this was to impress my nephew: look, here's a little device you could put in your cuddly toy; you talk to it and it talks back to you. So it has to have some sort of automatic speech detection, so that you don't have to press a button for it to listen; I want it to be continuously listening. I wanted to be able to record to the SD card; this device gives you the ability to plug in a fast SD card. I wanted to play with the LEDs to give you some indication that it's listening, it's recording, it's playing back. And I want to be able, at the very least, to disable the device by pressing the user button so that it stops listening. So how do
we interact with the user button? Well, there are two switches: a reset button on the board, which reboots it, obviously, and the user switch, and these are brought out as little brass buttons on your aluminium casing. In order to use the button, all I have to do is provide a handler function, the work I want done when the button's pressed. I need a Switch object, which represents the switch, and then I just attach the handler via its callback method. So there's my work: in this particular case I'm setting a control variable which just prevents the recording from taking place. I create a Switch object from the pyb module and then I invoke the callback method, providing my function, so every time I press that button that function runs. It's quite straightforward. Driving the LEDs is pretty cool; we like things that light up and flash. It's got four LEDs in four colours. So, much like the switch,
you create an LED object, and you can switch it on, switch it off, toggle it and so on. So again, pyb.LED(2) for the green LED, and I can switch it on, I can switch it off, I can toggle it. LED 4 supports an intensity method which allows you to provide 256 levels of brightness. They all support the intensity call, but only the blue LED allows you to do that cool throbbing effect.
Automatic speech detection. OK, so this requires a little bit of work; the pyboard doesn't give you this, it's all part of my application domain code. Basically I want to capture speech continuously and only start recording when I think something's being spoken. This requires, obviously, the ability to determine whether there's speech or not, and that's relatively simple in this implementation. I have two buffers: I have my original speech buffer that I'm recording to, but I also have a very small buffer that is treated like circular memory. The capture function runs in two modes: one is listening for speech and two is recording. While it's listening it's writing to the circular buffer and doing some very crude analysis on the signal level to determine whether it's silence or noise, and because I've done some analysis of the noise I understand roughly at what level the samples start to look like speech. Once it's discovered that there's speech, it switches to a straight recording into the main buffer, and then it stops. Writing to an SD card is useful for storing data and recordings. The device does have some flash on board, but that's very, very small and extremely slow, so you have to insert an SD card. Discovering an SD card is perhaps quite crude; some MicroPython guru might say, oh no, you don't do that. I basically just look for "sd" in the system path: if there's an "sd" there, there's an SD device, and then I just do my basic Python open, write, close.
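The two-buffer detection scheme can be sketched in plain Python. This is an illustration under assumptions rather than the published code: the class name, the single-sample threshold test and the buffer sizes are hypothetical. While listening, samples go round a small circular buffer; the first loud sample switches the state to recording into the main buffer, and the circular part is later unrolled into time order so the start of the utterance is not lost.

```python
class Detector:
    LISTENING, RECORDING, DONE = 0, 1, 2

    def __init__(self, pre_size, main_size, zero=128, threshold=20):
        self.pre = bytearray([zero] * pre_size)   # small circular buffer
        self.main = bytearray(main_size)          # main speech buffer
        self.pre_pos = 0                          # next write slot in pre
        self.main_pos = 0
        self.zero = zero
        self.threshold = threshold
        self.state = self.LISTENING

    def feed(self, sample):
        """Process one sample; intended to be called from the capture function."""
        if self.state == self.LISTENING:
            self.pre[self.pre_pos] = sample
            self.pre_pos = (self.pre_pos + 1) % len(self.pre)
            if abs(sample - self.zero) > self.threshold:
                self.state = self.RECORDING       # looks like speech
        elif self.state == self.RECORDING:
            self.main[self.main_pos] = sample
            self.main_pos += 1
            if self.main_pos == len(self.main):
                self.state = self.DONE

    def unrolled(self):
        """Circular buffer in time order, followed by the main recording."""
        oldest_first = self.pre[self.pre_pos:] + self.pre[:self.pre_pos]
        return oldest_first + self.main[:self.main_pos]
```

`feed` only indexes preallocated buffers, whereas `unrolled` allocates and so belongs in the main loop, after capture has finished. On the board, the SD check described above would amount to testing whether an entry containing "sd" appears in `sys.path` before opening a file for writing.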
Finally, putting it all together, I have sketched this application diagram. The green parts are basically the MicroPython hardware services, the blue parts are the software modules I've provided, and the red is basically just data. When we initialize the board, the capture/playback main loop essentially controls the capture function with some control variables, so it will tell it: right, start listening, or, if I've pressed the button, please stop listening. The capture function then essentially listens for speech; it's capturing all the time. Once it's detected speech it writes into this long buffer, and it conveniently lights the amber LED to tell me it's recording. Once that is done it unlights that LED and sets a control variable, so my outer capture loop now understands a recording has been made. It can then set the blue playback LED and do one of a number of things. Now remember that I have a little bit of speech in this speech detection buffer, so I need to attach that to my original buffer, because with my crude DAC method I can only give it the address of a buffer and play it. But that buffer is also circular, so we don't know where the start of it is. So the copy function unrolls, untangles, this buffer and puts it at the head. Once the copy is done I then run my crude attenuation method, which adjusts that buffer, adjusts the value of zero for the next capture, et cetera, and then, of course, the play function. Once that's done it optionally dumps to the SD card if it exists, and then the whole thing starts all over again. And that effectively is the entire application. The code is all Python and I've published it on GitHub. So all I've got to say now
is thank you very much for listening; I hope it's been useful. You've got my contact details there and the address of the source code, and I assume I'll probably publish the slides to that address later on. Thank you for that wonderful talk. So
we've automated away my job; that's not good. Thank you. Are there some questions? You said you have a very small budget in terms of time per sample. Does that mean there are certain things that you shouldn't do, for example with modules or calling functions? And in general, is there any way to make sure that, no matter what path the code takes, it will still be done within the budget you allocated? Yes indeed. So the
overriding rule is that inside the capture function you have to do the absolute minimum that's required. My capture function essentially just calls the ADC, checks the sample volume and puts it in the buffer. All the other processing that you'd ever want to do to that signal I can't do in the callback, because I've measured the time. In your callback function, apart from other interrupts that are going on on the board, you're probably fairly safe in assuming you've got control of the CPU; it's Python on bare metal. So apart from other interrupts occurring, you probably don't have a lot to worry about. If your capture function lasted too long, you could probably detect that by setting some sort of variable. The one thing you can't do in a callback is create any Python objects, and you can't do floating-point processing in Python because that in itself creates Python objects. So you really are at the bottom level of code, just moving bits and bytes around memory. That's why I had to put the pin function and the timers in, just so you can see that when the board is running. As long as you're within a comfortable percentage of your allowed period you're probably safe. Thank you, that was fascinating. You mentioned toys at the beginning; have you used this with children or young people in an educational context? Not yet. The problem the pyboard has is that when I pull the cable there's no built-in battery backup, so it has to be powered by something. I'm waiting for them to produce a battery module so I could put this in, you know, the stuffed toy. These things are incredibly popular now, they're all over the place: you squeeze the toy, you press the hand. But no, I haven't. Thank you. So the question is
have you done testing with clean signals, so you can tell which part of the noise that we're hearing comes from the recording and which comes from the playback, because usually those speakers are awfully noisy? Yeah, so no, I haven't, but the pyboard audio module does come with schematics and diagrams, so one of the
things you can do is obviously not use the built-in microphone; there's the ability to provide your own signal that's passed to the components. So the next step is to provide a reference signal in here to see where the noise is coming from. It's not unusual to find noise in devices like this; they're not exactly professional quality, and audio components need proper grounding and earth planes to separate them from the digital noise. But no, the next step is to provide some sort of reference signal at this point. OK. So
it seems like a really good way of getting into embedded, but also of learning Python. A quick question, coming from an embedded background: can you run the ADC in the background? So for example in your code you were waiting for the ADC to get back to you and you're buffering that data. Can you just start it and then have an interrupt call you back, or use a double-buffered approach like you do with the DAC? Yeah, so, no. That was the point: the ADC read method, obviously, MicroPython is open source, so we could go in there and have a look at what's going on, and read_timed just is not very useful because you don't know where it is, and it doesn't suit real time. At the end of the day you have to do whatever you want 8 thousand times a second; if you can't guarantee that, then there's a maximum length of time that you can record. So what I've done is the absolute minimum, which is read from the ADC and essentially put it into memory, although I am checking the signal level. I could connect two boards together, or maybe connect this to something more powerful like a Raspberry Pi and do a lot more signal processing, because you can do really clever things when you turn your continuous time-domain signal into the frequency domain; then you can get into speech recognition and actually understand what people are saying. But no, you really are very limited in what you can do. A Pi Zero? I don't know whether MicroPython runs on the Pi Zero; I would guess that it does. The beauty of this board is that it's got so much hardware ready, and you just add a microphone and a loudspeaker. OK, maybe more questions? Related to what you said just now: what's the advantage of using MicroPython on the pyboard versus using one of the, you know, the
Raspberry Pi, one of these boards that's more powerful, where you don't use MicroPython and you have all the power? So the main reason I looked at this device is its physical size. It comes with headers, though you can buy the board without any headers, so if you're going to attach this to a small component like a toy, its size is important. And also there's the power consumption: I think this is only drawing tens of milliamps, whereas the Raspberry Pi just consumes too much power and is too bulky. So it's a trade-off between power and CPU and battery consumption and things like that. This is about half the size of the Pi Zero, but it's loaded with quite a bit of hardware to play with. It's not cheap; I think it's 35 euros or something, so it's not terribly cheap compared to the Pi Zero, but it's got some really cool hardware features and it's Python on bare metal, so it's cool and it makes noises. OK, so maybe a last question: could you have Alexa running on this? Well, OK, as I say, the problem is getting from the continuous time-domain signal to the frequency domain, so you need to get to something
like that. Once you're able to get that data you can do your own speech recognition, and I did that in my past life. But Alexa, they have an API, so you can give it data; it just needs time. OK, so let's thank him again.