Web Audio API
Formal Metadata

Title: Web Audio API
Series: FOSDEM 2014
Number of Parts: 199
Part: 27
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/32657 (DOI)
Transcript: English (auto-generated)
00:24
Okay, so now let's talk about Web Audio. Thank you for coming. The presenter is simply one of my favorite employees at Mozilla, because he's very, very nice.
00:41
And I just regret the fact that he has decided to have a beard, but that's his choice. So yes, beard, sorry, it's not obvious for me. So Paul is a media developer at Mozilla. He primarily works on the audio output backends for various platforms and on the Web Audio API.
01:04
And he's a big, big fan of everything that deals with music. So please welcome Paul. Let me check for a second if my audio is working, because I'm supposed to make sound.
01:27
I mean, I'm playing an MP3, so yeah, that's the problem. Can you hear something?
01:45
Yeah, so this was VLC. It's brilliant. So today I'm going to talk about Web Audio because that's basically what I've been doing
02:03
since a year ago, I guess. I've been doing mostly that. So a year ago I was there, in this exact same room, at the exact same place. And I was saying stuff like: so, we lost the spec game, we have to implement. Oh, no, God.
02:26
Yeah, don't do video. Sorry, I'm not too good. Yeah, Linux that works. Brilliant. No, actually it doesn't.
02:42
Oh, yeah, full screen. All right, so I was in this room and I was telling everybody that we lost the spec game. Basically, people write a spec, you also write a spec, but people don't like your spec,
03:03
so you lose. So you have to implement the other guy's spec. So we had the Audio Data API spec, which was extremely simple: you could write PCM data in JavaScript to the speakers, basically, nothing more. People kind of liked it, but people preferred Google's API, which was the Web Audio API.
03:24
So we were like, oh, yeah, so we have to implement it. So we looked at the spec, it was extremely sad because it was the user manual. It was not a spec. So it was like, oh, there is a function, it does kind of that. We had no clue how we should do it.
03:40
So we had to read a lot of WebKit source code and stuff like that to figure out and reverse-engineer the spec, which was interesting. So we had it behind a pref. It was doing absolutely nothing a year ago. I said, like, we were close to actually outputting sound, which seems pretty important,
04:01
but it didn't work. The DOM plumbing was all right, was kind of working. We could pass buffers around and it did nothing. Oh, no, it's not the right slide deck. All right. So when was it? Friday, August.
04:21
Yeah, last day of August 2012, Ehsan pushed the first commit, which was basically: oh, there is a new API you can call, and it's called mozAudioContext. 800 commits later, we have a feature-complete implementation that works and runs a bunch
04:43
of tests and most of the demos out there. So yeah, as you can see, this is the timeline of the commits. This is where we really started working on it. And this is where we shipped it, like, landed everything and went home.
05:01
So yeah, that was pretty cool. So it's been shipped for, I guess, two releases, maybe three now, and it's pretty good. It runs most of the demos without a problem, most video games. So talking about games, what are the use cases for the Web Audio API, before talking about
05:22
how to do it? First and foremost, game developers were frustrated using only audio tags, because it was not great. Like, you couldn't do spatial positioning effects and stuff like that. I mean, you couldn't write a decent game without such an API.
05:42
Music visualization is basically you play an MP3 and you have a little graph moving and say, oh, there is a lot of bass and stuff like that. Application feedback, we use that in Firefox OS. For example, when you hit the keyboard, it goes click, click, click, click, click, like a typewriter.
06:02
Musical applications: people have done crazy things where you have browsers and they make music like you would on other software. But since you also have WebRTC and servers and networking and, like, everything, you can do crazy stuff. So, fancy streaming music players.
06:22
So maybe you have heard of MEGA, the old MegaUpload, the new MegaUpload kind of thing. So they have a new music streaming service that uses Web Audio in interesting ways to be able to play FLAC on the web, which is not normally possible. They run the decoder in a worker and then pipe that to the Web Audio API.
06:45
It's insane, but it works. It's probably not nice on the battery, though, but who cares. So yeah, the proper way to do that would be to use Media Source Extensions, but
07:01
it's only available in Chrome, I guess, maybe Safari or WebKit stuff, but I'm not too sure. We have most of Media Source Extensions, but it's not enabled right now, because it's not finished. And the spec isn't finished either. And ideally, anything that should make noise, but there might be some problems along
07:23
the way. So what can you do with the Web Audio API? You can take an array buffer that contains an Ogg file or an MP3 file or whatever format your browser supports, and decode it,
07:40
so you can get the raw PCM data out of an MP3 file, which is pretty cool. Precise timing of sound clips: say you're in a game and you want to trigger the sound of, like, a shotgun or whatever. With the Web Audio API, you have extremely accurate control over the
08:00
timing, which is important. You can also layer sounds in a simple, accurate manner. So if you need exactly two sounds to be triggered at the exact same time, it's very easy to do. Arbitrary audio routing and mixing, as we will see; I've got demos coming up.
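A minimal sketch of that decode-then-schedule flow, assuming a hypothetical shotgun.ogg URL; older builds of that era may need a prefixed constructor such as webkitAudioContext:

```js
var ctx = new AudioContext();

// Fetch a compressed file (Ogg, MP3, ...) as an ArrayBuffer.
var xhr = new XMLHttpRequest();
xhr.open("GET", "shotgun.ogg", true); // hypothetical URL
xhr.responseType = "arraybuffer";
xhr.onload = function () {
  // Decoding is asynchronous and happens off the main thread.
  ctx.decodeAudioData(xhr.response, function (buffer) {
    // Trigger two copies at exactly the same time, one second from now.
    var t = ctx.currentTime + 1.0;
    for (var i = 0; i < 2; i++) {
      var source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(ctx.destination);
      source.start(t); // sample-accurate scheduling
    }
  });
};
xhr.send();
```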
08:20
Effects transition scheduling: if you have, for example, a gain node (we'll see in a minute; a gain node lets you change the volume of something), you can say: from time T to T plus 30 seconds, I want this gain to ramp in an exponential fashion from zero to one, to make a sound that goes up with
08:42
a nice smooth curve. So think of it as CSS transitions or CSS animations, but for audio.
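That ramp maps onto the AudioParam scheduling methods; a rough sketch, reusing the ctx and a source from the earlier example (times and values are made up, and exponential ramps cannot start from exactly zero, hence the small initial value):

```js
var gain = ctx.createGain();
source.connect(gain);
gain.connect(ctx.destination);

// Ramp from (near) silence to full volume over 30 seconds.
var t = ctx.currentTime;
gain.gain.setValueAtTime(0.0001, t); // exponential ramps can't start at 0
gain.gain.exponentialRampToValueAtTime(1.0, t + 30);
```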
09:03
You can analyze audio: get a Fourier transform, for those in the know, and display it like a little bar graph, or display the average volume and stuff like that. Integration with the web platform: we will see how to hook up WebRTC or getUserMedia or MediaRecorder to the Web Audio API; it's extremely easy, thanks to people who designed nice APIs. And low-latency playback: admittedly, game developers and basically everybody need low-latency
09:26
playback at some point. And it was kind of possible before, but not really. The Web Audio API ensures that the sound goes from the buffer to your speakers as fast as possible.
09:42
So how does it work? Here I'm quoting the spec. Basically, you have nodes, you connect the nodes, and the output of one node goes into the input of the next node. And at the end, you have a sink, which is the speakers. We will see that there are other types of sinks, but yeah, basically,
10:03
you have sources, you have processing in the middle, sinks at the end, and then you hear sound, or you can record it or send it to WebRTC or something. So it goes like this: I've got a Web Audio graph on the left, and the code that produces it on the right. So it's extremely easy, always the same stuff.
10:21
You create an audio context, like you would for any API, basically. Here I show how to decode an Ogg file. You got it from an XHR or whatever; you pass it to decodeAudioData. It does the magic, gives you a buffer back using a callback. So it's asynchronous: it happens on another thread, doesn't kill your performance.
10:45
So it's just nice. Then you create a buffer source. You put the buffer in the buffer source, and then you do a mishmash of connecting and setting parameters, and you end up with this graph. And at the end, you can say start, and it will start to output sound and it will
11:02
go and get processed by the delay node, go into the gain node, go back into the delay node, so you have a feedback loop. And then it will eventually, at some point, go into the destination node, and at this point you will have sound. Actually, I'm not sure what this will sound like; there is a feedback loop in there, so it will probably be terrible, it'll hurt, but no matter.
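Reconstructed as code, a graph like the one on the slide might look as follows; the delay time and feedback gain are illustrative values, and the gain is kept below one so the echoes decay instead of blowing up:

```js
var source = ctx.createBufferSource();
source.buffer = buffer; // a previously decoded AudioBuffer

var delay = ctx.createDelay();
delay.delayTime.value = 0.5; // half a second between echoes
var feedback = ctx.createGain();
feedback.gain.value = 0.4;   // < 1, so each echo is quieter

// source -> delay -> destination, with delay -> gain -> delay as the loop.
source.connect(delay);
delay.connect(feedback);
feedback.connect(delay);         // the feedback loop
delay.connect(ctx.destination);
source.connect(ctx.destination); // dry signal as well

source.start();
```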
11:23
But anyway, there is a demo I can show you after that. So, source nodes: where the sound can come from, basically. It can come from WebRTC, getUserMedia. So you can take, for example, your microphone, or the microphone of the guy at the other end of the WebRTC connection, and pipe it into Web Audio. For example, you can give yourself a robot voice, which is kind of cool.
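A sketch of the microphone case; getUserMedia was still prefixed at the time (mozGetUserMedia, webkitGetUserMedia), and the "robot voice" here is a crude ring modulation, just one way to get that kind of effect:

```js
navigator.getUserMedia({ audio: true }, function (stream) {
  var mic = ctx.createMediaStreamSource(stream);

  // Ring-modulate the mic: an oscillator drives a gain's value.
  var ring = ctx.createGain();
  ring.gain.value = 0; // the oscillator's output is added to this
  var osc = ctx.createOscillator();
  osc.frequency.value = 50; // modulation frequency in Hz
  osc.connect(ring.gain);   // node-to-AudioParam connection
  osc.start();

  mic.connect(ring);
  ring.connect(ctx.destination);
}, function (err) {
  console.error(err);
});
```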
11:44
You can play a buffer. So this is basically: you have a sample and you want to play it. This is the easiest stuff. Because the buffer is decoded in advance, it will be extremely fast.
12:05
We don't have to do anything; we just play the samples. Oscillator node: it basically gives you a sine wave, square wave, sawtooth, whatever. You can also set an arbitrary waveform on it.
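An oscillator sketch; the arbitrary waveform goes through createPeriodicWave, and the harmonic amplitudes below are made up:

```js
var osc = ctx.createOscillator();
osc.type = "square"; // also "sine", "sawtooth", "triangle"

// Or build a custom waveform from Fourier coefficients:
var real = new Float32Array([0, 0, 1, 0, 1]); // cosine terms, arbitrary
var imag = new Float32Array(real.length);     // sine terms, all zero here
osc.setPeriodicWave(ctx.createPeriodicWave(real, imag));

osc.frequency.value = 440; // A4
osc.connect(ctx.destination);
osc.start();
osc.stop(ctx.currentTime + 2); // play for two seconds
```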
12:20
Script processor node lets you write arbitrary JavaScript that writes into a buffer, which then gets played; it gets called back whenever new data is needed. But we will see why it is broken. And media element audio source node, which is a mouthful: you basically take a video or audio element and pipe it into Web Audio. When you do that, the sound is not output to the speakers.
12:43
You basically bypass the normal output of the media element and pipe it into Web Audio, so you can actually apply processing to it.
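Hooking up an element might look like this; note that once connected, the element is only audible through the graph:

```js
var audioEl = document.querySelector("audio"); // some <audio src="..."> element
var elSource = ctx.createMediaElementSource(audioEl);

// Route it through whatever processing you like, here a simple gain.
var gain = ctx.createGain();
gain.gain.value = 0.5; // half the volume
elSource.connect(gain);
gain.connect(ctx.destination);
audioEl.play();
```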
13:00
Then you can process the inputs you have. The easiest is to apply a gain: if you have a gain of 0.5, it's like you have half the volume, basically. You can set it to more than one if you want to boost the volume. The delay node basically delays the input in time. So if you have a delay of one second and you write some sound, the delay node waits one second and then outputs the sound,
13:21
which is extremely useful, as we will see. Script processor node, again: it's broken, so don't use it too much. I mean, it's broken per spec; it's not an implementation problem. But the script processor node lets you write arbitrary JavaScript to process data. Panner node, extremely useful for games, lets you set two 3D points,
13:42
the listener and the sound, and their velocities as well. The sound gets panned in your speakers or headphones according to the relative position of the two points. And if the points are moving, you get the Doppler effect for free, that kind of stuff. It's extremely interesting.
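A panner sketch with made-up coordinates; the listener lives on the context, the source position on the node, and in the API of that era the velocities are what drive the Doppler effect:

```js
var panner = ctx.createPanner();
source.connect(panner);
panner.connect(ctx.destination);

ctx.listener.setPosition(0, 0, 0); // where your ears are
panner.setPosition(10, 0, -5);     // sound to the right, in front

// Moving points give you Doppler for free.
panner.setVelocity(-20, 0, 0);     // sound moving left at 20 units/s
ctx.listener.setVelocity(0, 0, 0);
```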
14:01
And that's basically for games. And then channel splitter, channel merger. That's for when you have a stereo file but you want to process the left and right channels in two different ways. So you split the stream and you can hook each channel up to different parts of the graph.
14:20
So yeah, that's pretty useful. Utility nodes. Convolver node: one-dimensional convolution. Basically, it's like if you're in a church and you talk, and then there is reverb. That's basically what it is. You can do all sorts of stuff with it, but that's the most natural way of putting it. Again, there will be demos.
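A convolver sketch: the impulse response (say, one recorded in a church) is loaded and decoded like any other sample, then set as the convolver's buffer; the file name is made up:

```js
var convolver = ctx.createConvolver();

var xhr = new XMLHttpRequest();
xhr.open("GET", "church-impulse.ogg", true); // hypothetical impulse response
xhr.responseType = "arraybuffer";
xhr.onload = function () {
  ctx.decodeAudioData(xhr.response, function (impulse) {
    convolver.buffer = impulse; // everything routed through now gets reverb
  });
};
xhr.send();

source.connect(convolver);
convolver.connect(ctx.destination);
```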
14:40
A wave shaper: you can do distortion with it. The fancy way of saying it is non-linear waveshaping, but again, it's pretty hard to understand what it is if you don't already know what it is. Biquad filter is a low-pass, high-pass, and so on; you can implement a graphic equalizer with it. Basically, if you want to remove the bass, you can cut the bass with this node.
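Cutting the bass as described is a high-pass biquad; a graphic equalizer would chain several of these (typically "peaking" filters) at different frequencies. Values here are illustrative:

```js
var filter = ctx.createBiquadFilter();
filter.type = "highpass";     // pass the highs, cut the bass
filter.frequency.value = 200; // cutoff frequency in Hz
filter.Q.value = 1;

source.connect(filter);
filter.connect(ctx.destination);
```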
15:03
Dynamics compressor node: it's again a bit hard to explain, but it makes the quiet sounds louder and the loud sounds quieter. So it kind of evens out the audio stream.
15:24
That's extremely useful on voices, drums, a bunch of stuff. And then output nodes. Once you've got your processing and it's all nice, you can have a media stream destination node. A MediaStream is basically the object in the modern web platform
15:41
that has video or audio in it; you can consume it and pipe it to a lot of different APIs. For example, you can pipe it into a MediaRecorder and get it encoded to Opus or Ogg or MP3 or whatever. You can pipe it into WebRTC, or pipe it into another audio context.
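A sketch of the recording path: route the graph into a media stream destination node and hand its stream to MediaRecorder; which codec you get back depends on the browser:

```js
var streamDest = ctx.createMediaStreamDestination();
source.connect(streamDest); // anything routed here ends up in the stream

var recorder = new MediaRecorder(streamDest.stream);
recorder.ondataavailable = function (e) {
  // e.data is a Blob of encoded audio (Opus, Ogg, ... browser-dependent).
  console.log("got", e.data.size, "bytes");
};
recorder.start();
setTimeout(function () { recorder.stop(); }, 5000); // record five seconds
```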
16:03
So that's basically the way to connect all the APIs on the web platform. Audio destination node: that's basically your speakers or headphones. So if you want sound to be output, you connect to the audio destination node. Script processor node: you can basically use it without outputs, to get the input data
16:25
and do some calculations on it. That can be pretty useful sometimes, to measure stuff or to drive animation. And the analyser node basically gives you the characteristics of your input data, maybe using a transform like the Fourier transform. So that's also very useful for doing visualization and measuring what's actually going on in your audio graph.
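An analyser sketch for the bar-graph case: tap the signal and poll the frequency bins from an animation frame; the FFT size and the drawing are left illustrative:

```js
var analyser = ctx.createAnalyser();
analyser.fftSize = 256; // gives 128 frequency bins
source.connect(analyser);
analyser.connect(ctx.destination); // the analyser passes audio through

var bins = new Uint8Array(analyser.frequencyBinCount);
function draw() {
  analyser.getByteFrequencyData(bins); // magnitudes from the Fourier transform
  // ...draw `bins` as a bar graph on a canvas here...
  requestAnimationFrame(draw);
}
draw();
```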
16:41
So, demos. I had an awesome demo, a 2D game where there is a little guy running,
17:01
and whenever you guys screamed into my microphone, he would jump. But unfortunately, it's on my desk at Mozilla in Paris. I forgot the code, so I'm pissed, but anyway. So, I've got this pretty cool demo instead; sorry, this is the demo you get right now.
17:22
And maybe we can show it like in the corridor right after that. So, let's do that. So, we're going to get an audio buffer source. We're going to set a drum sound and then we're going to do the easiest stuff. Put a gain node here, connect this there and put that there.
17:43
And then I will boost the gain maybe and make it so it loops and play. And I can change the volume if I want. So, as you can see, it loops perfectly,
18:01
like there is no gap in the sound, unlike the media element. I can put a delay node, so I will have another copy of the sound.
18:22
So, it sounds gross, but it works. Anyway, what else can I do? I can add a convolver. We're actually going to get a little reverb effect.
18:50
What can I do? A bit of biquad. Oh, sorry. Gonna do that, and then biquad that.
19:05
Oops. So, you see, there is no high frequency at the beginning and this is the normal sound.
19:24
Think of it as a DJ effect, kind of. That's what DJs do all the time. What can I do? I can put an oscillator.
19:42
It's boring. I can get an analyser. Naming problems: the engine works perfectly, but they don't call the right function, so it doesn't work. So I'm personally crawling GitHub manually and opening issues on people's projects to say,
20:04
oh, your demo is great, but it doesn't work in Firefox. Just do this and this and it will work. And most of the time they fix it because people are nice. A bunch of bugs to fix in Gecko. Exactly 96, last I checked. Some bugs to fix in WebKit.
20:21
And Blink, because we changed the spec. And yeah, a massive amount of stuff to fix in the spec, because it is far from done. And we would very much like to have a decent spec, so people can actually make other implementations without running into trouble. Like, I had to read a lot of Blink's code, and it wasn't great.
20:43
So, possible directions. Audio contexts in workers: as we heard in previous talks, we want to do everything in workers, so why not Web Audio? It would fix the script processor node, because it is broken per spec.
21:03
Why it's broken is written all over the web, so I won't repeat it right now. Maybe we want promises for all the asynchronous operations, because we are modern people. And, who knows, contribute to the spec. And to the code. Write demos and tell us if it breaks.
21:21
All right, I think I'm pretty much done. So, if you have maybe one or two questions. Otherwise, I will hang out in the corridor and we can talk more. Sure, where is the microphone?
21:45
Yes, I have a question. Where are you? Yes, over here. All this audio processing is amazing and really looks cool, but how will the system respond if you ask too much of it, like with all the filters you put on it, all the processing?
22:01
How will it respond on a mobile device, if the processing power isn't there? Yeah, so basically you can do a lot of stuff before it breaks, because it's all written in optimized C++, with SSE2 code and NEON on ARM.
22:21
It can break, though, because it's pretty easy to reach the limit with a large number of nodes. If it breaks, what you will hear is glitches in the audio output. We are trying to devise a plan to automatically reduce the needed processing power without glitching, but, like, nobody does that.
22:43
Blink doesn't do that, we don't do that. But we have bugs open to decide on a plan, basically a spec paragraph that says: if you are short on resources, back off this stuff first, and then this stuff, and then this stuff. For example, if you do panning using HRTF, it's very expensive,
23:03
but you can probably get away with panning using equal power, so you back off in quality and you won't glitch. Maybe you will glitch for one second. There are plans to do that, but nothing is implemented yet.
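Nothing automatic is implemented, but a page can already make that trade-off itself by picking the cheaper panning model (property values as in the spec of the time):

```js
var panner = ctx.createPanner();
panner.panningModel = "equalpower"; // cheap: simple equal-power stereo panning
// panner.panningModel = "HRTF";    // expensive: filtered through measured head responses
```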
23:20
Yeah, it's just glitching. We have plans to do that as well, to have a little indicator, say this is costing you like 30% of the graph or something like that.
23:41
So profiler, if you will. Hey, I have a question. I'm way back here. Yeah, hello. I was just wondering about the choice to include the nodes with specific filtering capabilities and so on.
24:04
As opposed to asking people to implement JavaScript libraries to do the same thing. If I want to implement my own filter, it's going to perform a lot worse than the ones in the API. What was the thought process there? So what happened? What happened is that there were two specs.
24:21
There was Robert O'Callahan's MediaStream Processing spec, which had absolutely no leverage despite having a working implementation. And the Web Audio API had a lot of leverage from, basically, game developers. So that's the way it works, right? We have a bunch of code in Firefox we don't use because of that.
24:42
Well, you can't access it, but anyway. So, before getting the API sorted out, there was a call for use cases at the W3C. People told other people the use cases they wanted.
25:01
And it was very user-driven, and so that's what we have now. So, the last question; after that, if you have another question, Paul is going to be outside waiting for you and answering everything you want. Do you plan to support multi-channel output?
25:22
Sorry? Do you plan to support multi-channel output in the API, more than stereo? This is not supported right now, right? No, this is supported only if you have a Canary build of Chrome on Mac OS.
25:43
Yes, that's extremely specific, but that's the way it is. We have plans to do that if people ask for it. Right, that's basically what it is. But you can mix and process an arbitrary number of channels inside the graph. The output has to be stereo for now. That's the thing.
26:00
You can also output a multi-channel Opus file, for example. Thank you, thank you so much.