
Lightning talk - Speech Recognition in the Browser

Formal Metadata

Title
Lightning talk - Speech Recognition in the Browser
Title of Series
Number of Parts
133
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Mouse and keyboard are not the only ways we can communicate with our devices. We already talk to our phones, gaming consoles, and some desktop applications. However, there are very few web applications that understand our speech. The bleeding-edge versions of the most popular web browsers are starting to support this most natural form of user input. Let’s take a look at the Web Speech API and learn how to add voice commands to web applications using pure JavaScript. Maybe it would also be worth responding to users with voice using the Speech Synthesis API?
Transcript: English (auto-generated)
I work on the Azure management portal. I had a session yesterday about how we built this, one of the largest single-page applications in the world, so you can check out the video if you didn't have a chance
to attend my session. You can find more information about me on my blog, jg09.net. I encourage you to check it out and subscribe. And today I want to tell you about speech recognition in the browser, because probably many of you are already using Siri on your iPhones or the more legit Cortana on a Windows phone, but you
might not realize that in Google Chrome you can actually do speech recognition in the browser without any libraries. So I'll just open Google Chrome and I'll just open the command line.
Some error here, and that's okay. And I'll create a new webkitSpeechRecognition object. And this object has an onresult callback function that I need to define.
And this function has an event parameter, and the event parameter has a results property on it. And results is an array of the results of the recognized speech. And I need to say transcript at the end.
And then if I say start, hi, my name is Jacob. Ooh, that's bad. Transcripts. Thank you.
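The steps typed into the console up to this point can be sketched roughly as follows. This is a reconstruction, not the speaker's exact code: the helper name firstTranscript is made up for illustration, and the browser part is guarded so the snippet does nothing where webkitSpeechRecognition is unavailable.

```javascript
// Pure helper (hypothetical name): read the best transcript from a result event.
// event.results is indexed first by result, then by alternative.
function firstTranscript(event) {
  return event.results[0][0].transcript;
}

// Browser-only part: runs only where Chrome's prefixed constructor exists.
if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = new webkitSpeechRecognition();
  recognition.onresult = function (event) {
    console.log(firstTranscript(event));
  };
  recognition.start(); // the browser asks for microphone permission here
}
```

Keeping the lookup in a small function makes it possible to try it with a hand-built event object before speaking into the microphone.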
Oh, yeah. You're right. Script. All right. Thank you very much. And this prints out whatever I said. And you might notice that there is a two-dimensional array, and you might wonder what
is that, and I will get to that. But in addition to transcript, you also have confidence. So for example, you can say confidence here. And if I say something now, it says I don't know what you already said.
My name is Jacob. It will give me like 95% confidence that this is whatever I said. Another thing I can do on the webkitSpeechRecognition object: it has a property called interimResults, and by default it's false.
So it's saying that it will only call this callback function when there is a final result, but I can also see the intermediate results. So if I say interimResults true, and if I say something now, it should print out whatever
I'm saying at the time. So as you see, at some point in time it only says "and", and then there are other words, and at the end, yeah, it sort of recognizes it, yeah, 89% confidence. Let me switch it back to false.
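A minimal sketch of the confidence and interimResults behaviour described above. The function name describeResult and the output format are assumptions for illustration. The isFinal flag is part of the Web Speech API result object and distinguishes intermediate guesses from the final one.

```javascript
// Hypothetical helper: summarize the first result as "<kind>: <text> (<n>%)".
function describeResult(event) {
  var result = event.results[0];
  var best = result[0]; // first (best) alternative
  var kind = result.isFinal ? 'final' : 'interim';
  return kind + ': ' + best.transcript + ' (' + Math.round(best.confidence * 100) + '%)';
}

// Browser-only part, guarded so it is skipped outside Chrome.
if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = new webkitSpeechRecognition();
  recognition.interimResults = true; // default is false: only final results are reported
  recognition.onresult = function (event) {
    console.log(describeResult(event));
  };
  recognition.start();
}
```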
There's another thing called maxAlternatives. By default it's one, so it's giving only one alternative, the best guess. But I can say maxAlternatives 10, and then I need to modify my callback a little bit because the alternative is actually this index of the array. So here, I can say event, and because this is actually not a real array, I need to
do this trick with forEach and call, and evt.results[0], and here I need to create
a callback and say transcript, and I will print out confidence as well, and I hope I have enough brackets here.
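The forEach and call trick mentioned here can be sketched as below. This assumes the speaker borrowed Array.prototype.forEach, which is the usual way to iterate an array-like object. The helper name listAlternatives and the output format are illustrative only.

```javascript
// Hypothetical helper: collect "transcript (confidence)" for every alternative
// of the first result. The result object is array-like but not a real Array,
// so Array.prototype.forEach is borrowed with call().
function listAlternatives(event) {
  var lines = [];
  Array.prototype.forEach.call(event.results[0], function (alternative) {
    lines.push(alternative.transcript + ' (' + alternative.confidence + ')');
  });
  return lines;
}

// Browser-only part, guarded so it is skipped outside Chrome.
if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = new webkitSpeechRecognition();
  recognition.maxAlternatives = 10; // default is 1: only the best guess
  recognition.onresult = function (event) {
    listAlternatives(event).forEach(function (line) {
      console.log(line);
    });
  };
  recognition.start();
}
```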
So now it should give me the alternatives, and you see I have like different things that I said. I wasn't really clear, and I guess none of these are right. NDC London rocks, boom, okay, so 78%, I said NDC London rocks, you could say NDT London,
or whatever. What's that? NDC? Is it bad?
Okay, I will turn this off for now. I'll say one, and if you notice, you know, when I say something here, let me actually revert my callback to the original one, okay, this one is good, so you know, if I say something,
for example, I start the sentence, oops, I'm fine, what did I do wrong, transcript, if I start the sentence, okay, that's cool, but you could see that, after it recognized
something, it stopped listening, so there's also a flag called continuous, which is set to false by default. If I set it to true, then it will append the results to the array, and this is the first index, so I didn't want to do that, so the second
index is for alternatives, and the first index is for the successive recognized parts of whatever I said, so if I want to, for example, listen for the last thing that I said, I can say
here evt.results.length minus one, and I'll put the same in here, boom, I hope it's true,
and I start listening, now I start my sentence, and then I can continue talking, and it will be still recognizing my speech, until I say stop. So this is like another mode, yeah, sometimes it's funny, probably there is some issue
about it here, let's hope for that, it's still an experimental feature. So this is speech recognition, but what you can also do is emit voice, so we can create a new SpeechSynthesisUtterance, and I can say here NDC London is great, and then
I need to say speechSynthesis.speak, and pass this message into here, click enter, do we have audio?
Do we have audio? Okay, one more time. And you could notice some German accent here, right? So I can modify the message, there is lang, and you see the language is not set, so, for the sake of correctness, I will set it to British, much better, right?
You can also do US accent, and what is even cooler, you can also do Polish accent. So you can play around with that.
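The speech synthesis steps above, sketched under some assumptions: the helper names langFor and say are made up, and the language tags en-GB, en-US and pl-PL are standard BCP 47 tags that are assumed to match the accents used in the demo. Which voices are actually available depends on the browser and operating system.

```javascript
// Hypothetical mapping from the accents mentioned in the talk to language tags.
function langFor(accent) {
  var tags = { british: 'en-GB', us: 'en-US', polish: 'pl-PL' };
  return tags[accent] || '';
}

// Speak a text with the given accent; returns false where synthesis is unavailable.
function say(text, accent) {
  if (typeof speechSynthesis === 'undefined' ||
      typeof SpeechSynthesisUtterance === 'undefined') {
    return false;
  }
  var message = new SpeechSynthesisUtterance(text);
  message.lang = langFor(accent); // an empty lang lets the browser pick its default
  speechSynthesis.speak(message);
  return true;
}

say('NDC London is great', 'british');
```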
And, you know, I was playing mainly with speech recognition, because I was inspired by this idea that it would be so cool to add voice commands to a website, and I put up a small library called voice commander, and you can find it on GitHub, it's just
a wrapper on top of the webkitSpeechRecognition API, and what it allows you to do, you can install it with npm or Bower, you can just add a command to your code with simple JavaScript, a one-liner, so you just say the command is whatever I said, right? And the callback function is whatever you have to do when you hear this command.
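To illustrate the idea of such a wrapper, here is a small sketch. It is not the voice commander library's API, which the talk does not show in detail: createCommander, addCommand and handle are hypothetical names. It combines the continuous flag with the "last result" index (results.length - 1) described earlier.

```javascript
// Hypothetical command registry: maps spoken phrases to callbacks.
function createCommander() {
  var commands = Object.create(null); // no inherited keys such as "constructor"
  return {
    addCommand: function (phrase, callback) {
      commands[phrase.trim().toLowerCase()] = callback;
    },
    // Run the callback registered for the spoken text; report whether one matched.
    handle: function (transcript) {
      var callback = commands[transcript.trim().toLowerCase()];
      if (!callback) return false;
      callback();
      return true;
    }
  };
}

var commander = createCommander();
commander.addCommand('top ten', function () {
  console.log('showing the top ten');
});

// Browser-only part, guarded so it is skipped outside Chrome.
if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true; // keep listening after each recognized phrase
  recognition.onresult = function (event) {
    var latest = event.results[event.results.length - 1]; // most recent phrase
    commander.handle(latest[0].transcript);
  };
  recognition.start();
}
```

Matching on the trimmed, lower-cased transcript is a simplification. A real wrapper would probably need fuzzier matching, since recognition results vary.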
I also put up a small website called book slip, and I hope you can see it. So if I say start, you see? It's asking me if I'm giving permission to use the microphone. If you're running the website on HTTP, it will ask you every time you
reload. If you don't want that, you can run over HTTPS, and it will just ask for permission once. So here I will say allow. Books, favorites, favorites, top ten, search JavaScript.
So this is, like, an idea of how you can put this into your website, and here I have, you know, on top you can notice I have this start and stop, and I also have get single command, so if I click get single command, it will just get one command and stop listening, so I say get single command, and it will ask me to allow again, unfortunately.
Home. Okay. Probably didn't recognize very well. Top ten. And go to command and stop listening.
So please check out my voice commander library, you can find it on GitHub, maybe you'll find it useful, and there are some links where you can see demos on websites about the Web Speech API, there is even the Web Speech API spec, for now it's supported only by Google
Chrome. If you want to have this in Microsoft Edge, you can do it. All you have to do is search for Microsoft, Microsoft Edge, speech recognition API, UserVoice,
I think, Google for this, and there is a UserVoice on the Windows website, and you can see we have, like, 500 votes now, so if you add all of your votes, you can give a maximum of three votes per person, so you can, like, get another 100 out of this room,
I think, and I think if we get closer then the Microsoft Edge team will be more encouraged to implement this feature in the Microsoft Edge browser, too. So here is a link to my voice commander library on GitHub, and there is also this
book slip website I showed you, so thank you very much, and send me an email if you have any questions about that. Thank you.