Lightning talk - Speech Recognition in the Browser
Formal Metadata
Number of Parts | 133
License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/48812 (DOI)
Transcript: English (auto-generated)
00:10
I work on the Azure management portal. I had a session yesterday about how we build it, one of the largest single-page applications in the world, so you can check out the video if you didn't have a chance
00:23
to attend my session. You can find more information about me on my blog, jg09.net. I encourage you to check it out and subscribe. And today I want to tell you about speech recognition in the browser, because probably many of you are already using Siri on your iPhones or, more legit, Cortana on a Windows phone, but you
00:47
might not realize that in Google Chrome you can actually do speech recognition in the browser without any libraries. So I'll just open Google Chrome and I'll just open the console.
01:03
Some error here, and that's okay. And I'll create a new webkitSpeechRecognition object. And this object has an onresult callback function that I need to define.
01:23
And this function has an event parameter, and the event parameter has a results property on it. And results is an array of the results of the recognized speech. And I need to say transcript at the end.
01:44
And then if I say start, hi, my name is Jacob. Ooh, that's bad. Transcripts. Thank you.
02:01
Oh, yeah. You're right. Script. All right. Thank you very much. And this prints out whatever I said. And you might notice that there is a two-dimensional array, and you might wonder what
02:21
that is, and I will get to that. But in addition to transcript, you also have confidence. So for example, you can say confidence here. And if I say something now, it says I don't know what you already said.
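What has been typed into the console so far can be sketched roughly as follows. This is a minimal sketch, assuming Chrome's prefixed webkitSpeechRecognition constructor is available; the helper name bestGuess is made up for illustration and does not come from the talk.

```javascript
// Hypothetical helper (name not from the talk): read the top alternative
// of the first result. event.results is indexed [result][alternative].
function bestGuess(event) {
  const alternative = event.results[0][0];
  return {
    transcript: alternative.transcript,
    confidence: alternative.confidence, // a number between 0 and 1
  };
}

// Only runs where the prefixed constructor exists (e.g. Chrome).
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.onresult = (event) => {
    const { transcript, confidence } = bestGuess(event);
    console.log(transcript, confidence);
  };
  recognition.start(); // the browser asks for microphone permission
}
```

Keeping the lookup in a small function makes it possible to try it with a hand-built object that has the same shape as a recognition event, without a microphone.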
02:42
My name is Jacob. It will give me like 95% confidence that this is whatever I said. Another thing I can do on the webkitSpeechRecognition object: it has a property called interimResults, and by default it's false.
03:03
So it's saying that it will only call this callback function when there is a final result, but I can also see the intermediate results. So if I set interimResults to true, and if I say something now, it should print out whatever
03:22
I'm saying at the time. So as you see, at some point in time it only has "and", and then there are other words, and at the end, yeah, it sort of recognizes it, yeah, 89% confidence. Let me switch it back to false.
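One way to tell intermediate guesses apart from the final one is the isFinal flag that each result carries. The following is a sketch under that assumption; the helper name splitTranscript is hypothetical and not part of the talk's demo.

```javascript
// Hypothetical helper: separate text that is already final from text
// that the recognizer may still revise.
function splitTranscript(event) {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;
    } else {
      interimText += result[0].transcript;
    }
  }
  return { finalText, interimText };
}

// Only runs where the prefixed constructor exists (e.g. Chrome).
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.interimResults = true; // the default is false
  recognition.onresult = (event) => console.log(splitTranscript(event));
  recognition.start();
}
```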
03:41
There's another thing called maxAlternatives. By default it's one, so it's giving only one alternative, the best guess. But I can set maxAlternatives to 10, and then I need to modify my callback a little bit because the alternative is actually this index of the array. So here, I can say event, and because this is actually not a real array, I need to
04:10
do this trick with forEach and call on evt.results[0], and here I need to create
04:24
a callback and say transcript, and I will print out confidence as well, and I hope I have enough brackets here.
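The trick described here, borrowing forEach from Array.prototype because a recognition result is array-like rather than a real array, might look roughly like this. It is a sketch, not the exact code typed in the demo; the helper name listAlternatives is an assumption.

```javascript
// Hypothetical helper: format every alternative of the first result.
function listAlternatives(event) {
  const lines = [];
  // A result has numeric keys and a length but no forEach of its own,
  // so borrow the method from Array.prototype.
  Array.prototype.forEach.call(event.results[0], (alternative) => {
    lines.push(`${alternative.transcript} (${alternative.confidence})`);
  });
  return lines;
}

// Only runs where the prefixed constructor exists (e.g. Chrome).
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.maxAlternatives = 10; // the default is 1
  recognition.onresult = (event) => {
    listAlternatives(event).forEach((line) => console.log(line));
  };
  recognition.start();
}
```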
04:44
So now it should give me the alternatives, and you see I have, like, different things for what I said. I wasn't really clear, and I guess none of these are right. NDC London Rocks, boom, okay, so 78%, I said NDC London Rocks, you could say NDT London,
05:12
or whatever. What's that? NDC? Is it bad?
05:23
Okay, I will turn this off for now. I'll say one, and if you notice, you know, when I say something here, let me actually revert my callback to the original one, okay, this one is good, so you know, if I say something,
05:41
for example, I start the sentence, oops, I'm fine, what did I do wrong, transcript, if I start the sentence, okay, that's cool, but you could see that, after it recognized
06:02
something, it stopped listening, so there's also a flag called continuous, which is set to false by default; if I set it to true, then it will append the results to the array, and this is the first index, so I didn't want to do that, so the second
06:26
index is for alternatives, and the first index is for the successive recognized parts of whatever I said, so if I want to, for example, listen for the last thing that I said, I can say
06:43
here evt.results.length minus one, and I'll put the same in here, boom, I hope it's true,
07:01
and I start listening, now I start my sentence, and then I can continue talking, and it will be still recognizing my speech, until I say stop. So this is like another mode, yeah, sometimes it's funny, probably there is some issue
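Reading only the most recent phrase in continuous mode can be sketched as below. This assumes the same prefixed constructor as before; the helper name latestTranscript is hypothetical.

```javascript
// Hypothetical helper: in continuous mode each new phrase is appended to
// event.results, so the last entry holds the most recent one.
function latestTranscript(event) {
  const last = event.results[event.results.length - 1];
  return last[0].transcript; // best alternative of the latest phrase
}

// Only runs where the prefixed constructor exists (e.g. Chrome).
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true; // the default is false
  recognition.onresult = (event) => console.log(latestTranscript(event));
  recognition.start();
  // Calling recognition.stop() later ends the listening session.
}
```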
07:26
about it here, let's hope for that, it's still an experimental feature. So this is speech recognition, but what you can also do is emit voice, so we can create a new SpeechSynthesisUtterance, and I can say here NDC London is great, and then
08:04
I need to say speechSynthesis.speak, and pass this message into here, hit enter, do we have audio?
08:20
Do we have audio? Okay, one more time. And you could notice some German accent here, right? So I can modify the message, there is lang, the language, and you see the language is not set; for example, for the sake of correctness, I will set it to British, much better, right?
08:44
You can also do US accent, and what is even cooler, you can also do Polish accent. So you can play around with that.
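The speaking side might be sketched like this. It is a minimal sketch assuming a browser that exposes speechSynthesis and SpeechSynthesisUtterance; the wrapper name speak and the specific language tags are illustrative choices, and which voices are actually installed depends on the browser and operating system.

```javascript
// Hypothetical wrapper: speak a text with an optional language tag.
// Returns false when the speech synthesis API is not available.
function speak(text, lang) {
  if (
    typeof speechSynthesis === "undefined" ||
    typeof SpeechSynthesisUtterance === "undefined"
  ) {
    return false;
  }
  const message = new SpeechSynthesisUtterance(text);
  if (lang) {
    message.lang = lang; // BCP 47 tag, e.g. "en-GB", "en-US", "pl-PL"
  }
  speechSynthesis.speak(message); // queues the utterance for playback
  return true;
}

speak("NDC London is great", "en-GB");
```

Leaving lang unset lets the browser pick a default voice, which may explain an unexpected accent like the one heard in the demo.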
09:02
And, you know, I was playing mainly with speech recognition, because I was inspired by this idea that it would be so cool to add voice commands to a website, and I put up a small library called voice commander, and you can find it on GitHub, it's just
09:21
a wrapper on top of the webkitSpeechRecognition API, and what it allows you to do, you can install it with npm or Bower, you can just add a command to your code with a simple JavaScript one-liner, so you just say the command is whatever I said, right? And the callback function is whatever you have to do when this command is recognized.
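The idea of mapping a spoken phrase to a callback can be illustrated with a small registry. This is a hypothetical sketch of the concept only: createCommander, addCommand, and handle are invented names and are not claimed to be the voice commander library's actual API.

```javascript
// Hypothetical command registry; not the voice commander library's API.
function createCommander() {
  const commands = new Map();
  const normalize = (phrase) => phrase.trim().toLowerCase();
  return {
    // Register a callback for a spoken phrase.
    addCommand(phrase, callback) {
      commands.set(normalize(phrase), callback);
    },
    // Run the callback matching a transcript; report whether one matched.
    handle(transcript) {
      const callback = commands.get(normalize(transcript));
      if (!callback) return false;
      callback();
      return true;
    },
  };
}

const commander = createCommander();
commander.addCommand("home", () => console.log("navigating home"));

// Only runs where the prefixed constructor exists (e.g. Chrome).
if (typeof webkitSpeechRecognition !== "undefined") {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.onresult = (event) => {
    const last = event.results[event.results.length - 1];
    commander.handle(last[0].transcript);
  };
  recognition.start();
}
```

Normalizing case and whitespace is a design choice made here because recognized transcripts often arrive with a leading space or different capitalization.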
09:43
I also put up a small website called book slip, and I hope you can see it. So if I say start, you see? It's asking me for permission to use the microphone. If you're running the website on HTTP, it will ask you every time you
10:03
reload. If you don't want that, you can run over HTTPS, and it will ask for permission just once. So here I will say allow. Books, favorites, favorites, top ten, search JavaScript.
10:43
So this is, like, an idea how you can put this into your website, and here I have, you know, on top you can notice I have this start and stop, and I also have get single command, so if I click get single command, it will just get one command and stop listening, so I say get single command, and it will ask me to allow again, fortunately.
11:03
Home. Okay. Probably didn't recognize very well. Top ten. And go to command and stop listening.
11:21
So please check out my voice commander library, you can find it on GitHub, maybe you'll find it useful, and there are some links where you can see the demos on websites about the Web Speech API, there is even a Web Speech API spec; for now it's supported only by Google
11:42
Chrome. If you want to have this in Microsoft Edge, you can do it. All you have to do is Microsoft, Microsoft Edge, speech recognition API, user voice,
12:02
I think, Google for this, and there is a UserVoice on the Windows website, and you can see we have, like, 500 votes now, so if you add all of your votes, you can give a maximum of three votes per person, so you can, like, get another 100 out of this room,
12:21
I think, and I think if we get closer, then the Microsoft Edge team will be more encouraged to implement this feature in the Microsoft Edge browser, too. So here is a link to my voice commander library on GitHub, and there is also this
12:40
book slip website I showed you, so thank you very much, and send me an email if you have any questions about that. Thank you.