Accessible input for readers, coders, and hackers
Formal metadata

Title: Accessible input for readers, coders, and hackers
Number of parts: 275
Author: David Williams-King
License: CC Attribution 4.0 International: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify.
Identifiers: 10.5446/52101 (DOI)
Language: English
Transcript: English (automatically generated)
00:03
All right, so again, let's introduce the next talk, Accessible Inputs for Readers, Coders and Hackers, the talk by David Williams-King about custom, well, not off the shelf, but
00:27
custom accessibility solutions. He will give you some demonstrations, and that includes his own custom-made voice input and eyelid blink system. Here is David Williams-King.
00:40
Thank you for the introduction. Let's go ahead and get started. So yeah, I'm talking about accessibility, particularly accessible input for readers, coders and hackers. So what do I mean by accessibility? I mean people that have physical or motor impairments. This could be due to repetitive strain injury, carpal tunnel, all kinds of medical conditions.
01:06
If you have this type of thing, you probably can't use a normal computer keyboard, computer mouse, or even a phone touchscreen. However, technology does allow users to interact with these devices just using different forms of input.
01:21
And it's really valuable to these people because, you know, being able to interact with a device provides some agency, they can do things on their own, and it provides a means of communication with the outside world. So it's an important problem to look at. And it's one I care about a lot. Let's talk a bit about me for a moment.
01:41
I'm a systems security person. I did a PhD in cybersecurity at Columbia. If you're interested in low-level software defenses, you can look that up. And I'm currently the CTO at a startup called Alpha Secure. I started developing medical issues in around 2014.
02:02
And as a result of that, in an ongoing fashion, I can only type a few thousand keystrokes per day. Roughly 15,000 is my maximum. That sounds like a lot, but imagine you're typing at 100 words per minute. That's 500 characters per minute, which means it takes you 30 minutes to hit 15,000
02:21
characters. So essentially, I can work the equivalent of a fast programmer for half an hour. And then after that, I would be unable to use my hands for anything, including preparing food for myself or opening and closing doors and so on. So I have to be very careful about my hand use.
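The kind of keystroke counter mentioned just below can be sketched in a few lines. This is only an illustration, not the speaker's actual tool; it assumes Python and the pynput library, with the budget figure taken from the talk.

```python
# Minimal keystroke-budget counter (illustrative sketch, not the speaker's tool).
# Counts key presses and warns as a daily budget is approached.
from pynput import keyboard

DAILY_BUDGET = 15_000               # roughly the maximum mentioned in the talk
WARN_AT = int(DAILY_BUDGET * 0.8)

count = 0

def on_press(key):
    global count
    count += 1
    if count == WARN_AT:
        print(f"Warning: {count} keystrokes today, approaching the budget")
    elif count == DAILY_BUDGET:
        print("Daily keystroke budget reached, switch to another input method")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()                  # run until interrupted (Ctrl+C)
```

A real tool would persist the count across sessions and reset it each day; this sketch only tracks a single run.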
02:40
And I actually have a little program that you can see on the slide there that measures the keystrokes for me so that I can tell when I'm going over. So what do I do? Well, I do a lot of pair programming, for sure. I log into the same machine as other people and we work together. I'm also a very heavy user of speech recognition. And I gave a talk about voice coding with speech recognition at the Hope
03:05
11 conference. So you can go check that out if you're interested. So when I talk about accessible input, I mean different ways that a human can provide input to a computer. So ergonomic keyboards are a simple one.
03:21
Speech recognition, eye tracking or gaze tracking. So you can see where you're looking or where you're pointing your head and maybe use that to replace a mouse. That's head gestures, I suppose. And there's always this distinction between bespoke, like custom input mechanisms and somewhat mainstream ones.
03:40
So I'll give you some examples. You've probably heard of Stephen Hawking. He's a very famous professor and he was actually a bit of an extreme case. He was diagnosed with ALS when he was 21. So his physical abilities degraded over the years because he lived for many decades
04:00
after that. And he went through many communication mechanisms. Initially, his speech changed so that it was only intelligible to his family and close friends, but he was still able to speak. And then after that, he would work with a human interpreter and raise his eyebrows to pick various letters. And then keep in mind, this is like the 60s or 70s, right?
04:22
So computers were not really where they are today. Later he would operate a switch with one hand, just like on off, on off kind of Morse code and select from a bank of words. And that was around 15 words per minute. Eventually, he was unable to move his hand. So a team of engineers from Intel worked with him and they figured out they were trying
04:44
to do like brain scans and all kinds of stuff. But again, this was like in the 80s. So there was not too much they could do. So they basically just created some custom software to detect muscle movements in his cheek. And he used that with predictive words, the same way that a smartphone keyboard will predict
05:04
like which word you want to say next. Stephen Hawking used something similar to that, except instead of swiping on a phone, he was moving his cheek muscles. So that's obviously a sequence of like highly customized input mechanisms for someone and very, very specialized for that person.
05:22
I also want to talk about someone else named Professor Sang-Mook Lee, whom I've met. That was me when I had more of a beard than I do now. He's a professor at Seoul National University in South Korea. And he's sometimes called like the Korean Stephen Hawking because he's a big advocate
05:43
for people with disabilities and whatnot. Anyway, what he uses is you can see a little orange device near his mouth there. It's called a sip and puff mouse. So he can blow into it and suck air through it and also move it around. And that acts as a mouse cursor on the Android device in front of him.
06:00
It will move the cursor around and click when he blows air and so on. So that combined with speech recognition lets him use mainstream Android hardware. He still has access to, you know, email apps and like web browsers and like maps
06:21
and everything that comes on a normal Android device. So he's way more capable than Stephen Hawking was because Stephen Hawking could communicate but just to a person at a very slow rate, right? Part of it's due to the nature of his injury, but it's also a testament to how far the technology has improved.
06:42
So let's talk a little bit about what makes good accessibility. I think performance is very important, right? You want high accuracy, you don't want typos, low latency, I don't want to speak and then five seconds later have words appear, it's too long. Especially if I have to make corrections, right? And you want high throughput, which we already talked about.
07:03
Oh, I forgot to mention, Stephen Hawking had like, you know, 15 words per minute. A normal person speaking is 150. So that's a big difference. The higher throughput you can get, the better. And for input accessibility, I think, and this is not scientific, this is just what I've learned from using it myself and observing many of these systems.
07:22
I think it's important to get completeness, consistency, and customization. For completeness, I mean, can I do any action? So Stephen, or rather Professor Sang-Mook Lee: his orange mouth input device, the sip and puff, is quite powerful, but it doesn't let him do every action.
07:43
For example, for some reason, when he gets an incoming call, the input doesn't work. So he has to call over a person physically to like tap the accept call button or the reject call button, which is really annoying, right? If you don't have completeness, you can't be fully independent. Consistency, very important as well.
08:00
The same way we develop motor memory for a muscle memory for a keyboard, you develop memory for any types of patterns that you do. But if the thing you say or the thing you do keeps changing in order to do the same action, that's not good. And finally, customization. So the learning curve for beginners is important for any accessibility device,
08:21
but designing for expert use is almost more important because anyone who uses an accessibility interface becomes an expert at it. The example I like to give is screen readers, like a blind person using a screen reader on a phone. They will crank up the speed at which the speech is being produced.
08:40
And I actually met someone who made his speech 16 times faster than normal human speech. I could not understand it at all. It sounded like brrr, but he could understand it perfectly. And that's just because he used it so much that he's become an expert at its use. Let's analyze ergonomic keyboards just for a moment, because it's fun. You know, they are kind of like a normal keyboard.
09:02
They'll have a slow pace when you're starting to learn them. But once you're good at it, you have very good accuracy, like instantaneous low latency, right? You press a key, the computer receives it immediately, and very high throughput. As high as you are on a regular keyboard. So they're actually fantastic accessibility devices, right?
09:20
They're completely compatible with original keyboards. And if all you need is an ergonomic keyboard, then you're in luck because it's a very good accessibility device. I'm going to talk about two things, computers, but also Android devices. So let's start with Android devices. Yes, the built-in voice recognition in Android is really incredible.
09:42
So even though the microphones on the devices aren't great, Google has just collected so much data from so many different sources that they've built better than human accuracy for their voice recognition. The voice accessibility interface is kind of so-so. We'll talk about that in a bit. That's the interface where you can control the Android device entirely by voice.
10:02
For other input mechanisms, you could use like a sip and puff device, or you could use physical styluses. That's something that I do a lot, actually, because for me, my fingers get sore, and if I can hold a stylus in my hand and kind of not use my fingers, then that's, you know, very effective. And the Elecom styluses from a Japanese company are the lightest I've found,
10:23
and they don't require a lot of force. So the ones at the top, there are like 12 grams, and the one at the bottom is 4.7 grams, and you require almost no force to use them, so very nice. On the left there, you can see the Android speech recognition. It's built into the keyboard now, right? You can just press that and start speaking, and it supports different languages,
10:43
and it's very accurate, it's very nice. And actually, when I was working at Google for a bit, I talked to the speech recognition team and I was like, why are you doing on-server speech recognition? You should do it on the devices. But of course, Android devices are, they're all very different, and many of them are not very powerful, so they were having trouble getting satisfactory speech recognition on the device.
11:05
So for a long time, there's some server latency, server lag, right? You do speech recognition and you wait a bit. And then sometime this year, I just was using speech recognition, and it became so much faster. I was extremely excited, and I looked into it, and yeah, they just switched, on my device at least, they switched on the on-device speech recognition model,
11:22
and so now it's incredibly fast and also incredibly accurate. I'm a huge fan of it. On the right-hand side, we can actually see the voice access interface. So this is meant to allow you to use a phone entirely by voice. Again, while I was at Google, I tried the beta version before it was publicly released, and I was like, this is pretty bad.
11:40
Mostly because it lacked completeness. There would be things on the screen that would not be selected. So here we see show labels, and then I can say like four, five, six, whatever to tap on that thing. But as you can see at the bottom there, there's like a Twitter web app link, and there's no number on it. So if I want to click on that, I'm out of luck. And this is actually a problem in the design of the accessibility interface.
12:05
It doesn't expose the full DOM. It exposes only a subset of it. And so an accessibility mechanism can't ever see those other things. And furthermore, the way the Google speech recognition works,
12:21
they have to re-establish a new connection every 30 seconds. And if you're in the middle of speaking, it would just throw away whatever you were saying because it just decided it had to reconnect, which is really unfortunate. They later released the app publicly, and then sometime this year they did an update, which is pretty nice. It now has like a mouse grid, which solves a lot of the completeness problems.
12:41
Like you can use a grid to narrow down somewhere on the screen and then tap there. But the server issues and the expert use is still not good. Okay, if I want to do something with the mouse grid, I have to say mouse grid on, six, five, mouse grid off. And I can't combine those together.
13:01
So there's a lot of latency, and it's not really that fun to use, but better than nothing, absolutely. I just want to really briefly show you as well that this same feature of like being able to select links on a screen is available on desktops. This is a plugin for Chrome called Vimium, and it's very powerful because you can then combine this with keyboards
13:22
or other input mechanisms. And this one is complete. It uses the entire DOM and anything you can click on will be highlighted. So very nice. I just want to give a quick example of me using some of these systems. So I've been trying to learn Japanese, and there's a couple of highly regarded websites for this, but they're not consistent when I use the browser show labels,
13:40
like the thing to press next page or something like that, or like I give up or whatever, it keeps changing. So the letters that are being used keep changing, and that's because of the dynamic way that they're generating the HTML. So not really very useful. What I do instead is I use a program called Anki, and that has very simple shortcuts in its desktop app,
14:02
one, two, three, four. So it's nice to use and consistent. And it syncs with an Android app, and then I can use my stylus on the Android device. So it works pretty well. But even so, as you can see from the chart in the bottom there, there are many days when I can't use this even though I would like to because I've overused my hands
14:21
or I've overused my voice. When I'm using voice recognition all day, every day, I do tend to lose my voice. And as you can see from the graph, I lose it for a week or two at a time. So same thing with any accessibility interface. You've got to use many different techniques, and it's never perfect. It's just the best you can do at that moment.
14:44
Something else I like to do is read books. I read a lot of books, and I love e-book readers. The dedicated e-ink displays, you can read them in sunlight, they last forever battery-wise. Unfortunately, it's hard to add other input mechanisms to them. They don't have microphones or other sensors,
15:01
and you can't really install custom software on them. But for Android-based devices, and there are also e-book reading apps for Android devices, they have everything. You can install custom software, and they have microphones and many other sensors. So I made two apps that allow you to read e-books with an e-book reader. The first one is Voice Next Page. It's based on a speech recognition engine of mine
15:23
called Silvius, and it does do server-based recognition. So you have to capture all the audio, use 300 kilobits a second to send it to the server, and recognize things like next page, previous page. However, it doesn't cut out every 30 seconds. It keeps going. So that's one win for it, I guess.
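A client for this kind of server-based setup can be sketched roughly as follows: capture microphone audio, stream it to a recognition server, and turn recognized phrases into page-turn actions. The endpoint and message format below are assumptions for illustration, not Silvius' actual protocol; the sketch uses the sounddevice and websockets Python libraries.

```python
# Sketch of a Voice-Next-Page-style client (illustrative; not the app's actual code).
# Streams microphone audio to a speech server and reacts to recognized phrases.
import asyncio
import sounddevice as sd
import websockets

SERVER = "ws://speech.example.org/recognize"        # hypothetical endpoint
COMMANDS = {"next page": +1, "previous page": -1}   # phrase -> page delta

async def run():
    loop = asyncio.get_running_loop()
    audio_queue = asyncio.Queue()

    def on_audio(indata, frames, time, status):
        # 16 kHz, 16-bit mono is ~256 kbit/s raw, in the ballpark of the
        # ~300 kbit/s figure mentioned in the talk.
        loop.call_soon_threadsafe(audio_queue.put_nowait, bytes(indata))

    async with websockets.connect(SERVER) as ws:
        with sd.RawInputStream(samplerate=16000, channels=1, dtype="int16",
                               callback=on_audio):
            async def send_audio():
                while True:
                    await ws.send(await audio_queue.get())

            async def handle_results():
                async for message in ws:             # assumed: plain-text hypotheses
                    text = message.strip().lower() if isinstance(message, str) else ""
                    for phrase, delta in COMMANDS.items():
                        if phrase in text:
                            print("turn page", delta)   # e.g. inject a key event here

            await asyncio.gather(send_audio(), handle_results())

asyncio.run(run())
```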
15:41
And it is published in the Play Store. Huge thanks to Sarah Leventhal, who did a lot of the implementation. Very complicated to make an accessibility app on Android, but we persevered and it works quite nicely. So I'm going to actually show you an example of Voice Next Page.
16:01
Over here, this is my phone on the left-hand side, just captured so that you guys can see it. So here's the Voice Next Page. And basically, the connection's green, I can see the server's up and running and so on. I just press start, and then I'll switch to an Android reading app and say next page, previous page.
16:20
I won't speak otherwise because it will pick up everything I'm saying. Next page. Next page. Previous page.
16:41
Center. Center. Foreground. Stop listening. So that's a demo of the Voice Next Page, and it's extremely helpful. I built it a couple of years ago along with Sarah,
17:02
and I use it a lot. So yeah, you can go ahead and download it if you guys want to try it out. And the other one is called Blink Next Page. So the idea for this, I got this idea from a research paper this year that was studying eyelid gestures. I didn't use any of their code, but it's a great idea.
17:20
So the way this works is you detect blinks by using the Android camera, and then you can trigger an action like turning pages in an ebook reader. This actually doesn't need any networking. It's able to use the on-device face recognition models from Google. And it is still under development, so it's not on the Play Store yet,
17:40
but it is working. And please contact me if you want to try it. So just give me one moment to set that demo up here. And so, the main problem with this current implementation
18:00
is that it uses two devices. So that was easier to implement. And I use two devices anyway, but obviously I want a one-device version if I'm actually going to use it for anything. Here's how this works. This device, I point at me, at my eyes. The other device I put wherever it's convenient to read.
18:23
And if I blink my eyes, the phone will buzz once it detects that I've blinked my eyes, and it will turn the page automatically on the other Android device. Now I have to blink both my eyes for half a second. If I want to go backwards, I can blink just my left eye.
18:41
And if I want to go forwards, like quickly, I can blink my right eye and hold it. Anyway, it does have some false positives. That's why I like, you can go backwards in case it detects that you've accidentally flipped the page. And lighting is also very important.
19:00
Like if I have a light behind me, then this is not going to be able to identify whether my eyes are open or closed properly. So it has some limitations, but very, very simple to use. I'm a big fan. Okay, so that's enough about Android devices. Let's talk very briefly about desktop computers.
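The blink gestures described above amount to a small state machine. The sketch below is only an illustration under my own thresholds, not the app's actual code; it assumes a face detector that reports, once per camera frame, a probability that each eye is open, which is the kind of signal on-device face detection exposes.

```python
# Sketch of the blink-paging gestures described above (illustrative only).
# Call on_frame() once per camera frame with eye-open probabilities in [0, 1].
import time

CLOSED_BELOW = 0.3    # treat an eye as closed below this probability (assumption)
HOLD_SECONDS = 0.5    # both eyes closed this long -> turn one page forward

class BlinkPager:
    def __init__(self, turn_page):
        self.turn_page = turn_page       # callback: turn_page(+1) or turn_page(-1)
        self.both_closed_since = None
        self.fired = False
        self.was_left_wink = False

    def on_frame(self, left_open, right_open, now=None):
        now = time.monotonic() if now is None else now
        left_closed = left_open < CLOSED_BELOW
        right_closed = right_open < CLOSED_BELOW

        if left_closed and right_closed:
            # Deliberate blink: both eyes held closed for HOLD_SECONDS.
            if self.both_closed_since is None and not self.fired:
                self.both_closed_since = now
            elif (self.both_closed_since is not None
                  and now - self.both_closed_since >= HOLD_SECONDS):
                self.turn_page(+1)
                self.both_closed_since = None
                self.fired = True        # require eyes to reopen before re-arming
        else:
            self.both_closed_since = None
            self.fired = False
            # Left-eye-only wink goes back a page (handy after a false positive).
            left_wink = left_closed and not right_closed
            if left_wink and not self.was_left_wink:
                self.turn_page(-1)
            self.was_left_wink = left_wink
        # A held right-eye wink could drive repeated forward turns in the same way;
        # a real implementation would also debounce winks against noisy detections.
```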
19:24
So if you're going to use a desktop computer, of course, try using that show labels plugin in a browser. For native apps, you can try Dragon NaturallySpeaking, which is fine if you're just, like, using basic things. But if you're trying to do complicated things, you should definitely use a voice coding system. You could also consider using eye tracking
19:41
to replace a mouse. Personally, I don't use that. I find it hurts my eyes, but I do use a trackball with very little force and a Wacom tablet. Some people will even scroll up and down by humming, for example, but I don't have that set up. There's a bunch of nice talks out there on voice coding. The top left is Tavis Rudd's talk from many years ago
20:02
that got many of us interested. Emily Shea gave a talk there about like best practices for voice coding. And then I gave a talk a couple of years ago at the Hope 11 conference, which you can also check out. It's mostly out of date by now, but it's still interesting. So there are a lot of voice coding systems.
20:24
The sort of grandfather of them all is Dragonfly. It's become a grammar standard. Caster is one where, if you're willing to memorize lots of unusual words, you can become much better, much faster than I currently am at voice coding.
20:40
Aenea is how you originally used Dragon to work on a Linux machine, for example, because Dragon only runs on Windows. Talon is a closed source program, but it's very, very powerful, has a big user base, especially for Mac OS. There are ports now. And Talon used to use Dragon, but it's now using a speech system from Facebook.
21:04
Silvius is the system that I created. The models are not very accurate, but it's a nice architecture where there's client server. So it makes it easy to build things like the Voice Next page. So the Voice Next page was using Silvius. And then the most recent one, I think on this list is Kaldi Active Grammar,
21:21
which is extremely powerful and extremely customizable. And it's also open source. It works on all platforms. So I really highly recommend that. So let's talk a bit more about Kaldi Active Grammar. But first for voice coding, I've already mentioned you have to be careful how you use your voice, right? Breathe from your belly. Don't tighten your muscles and breathe from your chest.
21:40
Try to speak normally. And I'm not particularly good at this. Like you'll hear me when I'm speaking commands that my inflection changes. So I do tend to overuse my voice, but yeah, I just have to be conscious of that. The microphone hardware does matter. I do recommend like a Blue Yeti on a microphone arm that you can pull and put close to your face like this. I'll use this one for my speaking demo.
22:02
And yeah. And the other thing is your grammar is fully customizable. So if you keep saying a word and the system doesn't recognize it, just change it to another word. And it's complete in the sense you can type any key on the keyboard. And the most important thing for expert use or customizability is that you can do chaining.
22:21
So with a voice coding system, you can say multiple commands at once. And it's a huge time savings. You'll see what I mean when I give a quick demo. When I do voice coding, I'm a very heavy Vim and Tmux user. You know, there've been, I've worked with many people before, so I have some cheat sheet information there.
22:41
So if you're interested, you can go check that out. But yeah, let's just do a quick demo of voice coding here. Turn this mic on. Desk left two. Control Delta. Open new terminal. Charlie Delta space slash Tango Mike Papa enter. Command Vim. Hotel, hotel point Charlie Papa Papa enter.
23:04
India hash word include space Langle. India Oscar words stream wrangle. Enter, enter. India noise Tango space word main. Nope. Mike arch India noise. Len Ren space lace enter, enter race up tab.
23:20
Word print Fox. Scratch nope. Code standard. Charlie Oscar. Uniform Tango space. Langle, Langle space. Quote. Sentence hello voice coding bang. Scratch six. Delta India. Noi golf bang backslash. Noi quote.
23:40
Semi-colon act. Sky Fox. Mike Romeo. Noi Oscar. Word return space number zero. Semi-colon act. Vim save and quit. Golf plus plus space. Hotel, hotel tab minus Oscar space. Hotel, hotel enter point slash hotel, hotel enter. Desk right two.
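Those spoken commands come from a user-defined grammar that maps words and phrases to keystrokes. As a rough illustration (not the speaker's actual grammar, with word choices assumed from the demo), a Dragonfly-style grammar looks something like this:

```python
# Minimal Dragonfly-style grammar sketch (illustrative; not the speaker's grammar).
# Each spoken phrase maps to keystrokes or text. Running it needs a Dragonfly-
# compatible speech engine, e.g. kaldi-active-grammar. Chaining several commands
# in one utterance is done by wrapping rules like this in a Repetition element,
# omitted here for brevity.
from dragonfly import Grammar, MappingRule, Key, Text, Dictation

class DemoRule(MappingRule):
    mapping = {
        # customized spoken alphabet -> single letters
        "arch": Key("a"),
        "charlie": Key("c"),
        "hotel": Key("h"),
        # punctuation and structure
        "langle": Key("langle"),                  # "<"
        "lace": Key("lbrace"),                    # "{"
        "semi-colon": Key("semicolon"),
        # dictated words and editor commands
        "word <text>": Text("%(text)s"),          # type a dictated word literally
        "vim save and quit": Key("escape") + Text(":wq\n"),
    }
    extras = [Dictation("text")]

grammar = Grammar("voice coding demo")
grammar.add_rule(DemoRule())
grammar.load()
```

Because every mapping is just Python, a word the recognizer keeps mishearing can simply be renamed, which is the customization point mentioned earlier.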
24:03
So that's just a quick example of voice coding. You can use it to write any programming language. You can use it to control anything on your desktop. It has a bit of a learning curve, but it's very powerful. So the creator of Kaldi Active Grammar is also named David.
24:20
I'm named David, but that's just a coincidence. And he says of Kaldi Active Grammar that I haven't typed with a keyboard in many years, and Kaldi Active Grammar is bootstrapped in that I have been developing it entirely using the previous versions of it. So David has a medical condition that means he has very low dexterity.
24:43
So it's hard for him to use a keyboard. And yeah, he basically got Kaldi Active Grammar working by the skin of his teeth or something and then continues to develop it using it. And yeah, I'm a huge fan of the project. I haven't contributed much, but I did give some of the hardware resources
25:02
like GPU and CPU compute resources to allow training to happen. But I would also like to show you a video of David using Kaldi Active Grammar just so you can see it as well. So the other thing about David is that he has a speech impediment
25:20
or a speech, I don't know, an accent or whatever. So it's difficult for a normal speech recognition system to understand him. And you might have trouble understanding him here, but you can see in the lower right what the speech system understands that he's saying. Oh, I realized that I do need to switch something in OBS so that you guys can hear it. There we go.
25:56
Anyway, you get the idea and hopefully you guys are able to hear that.
26:06
If not, you can also find this on the website that I'm going to show you at the end. Oh, one other thing I want to show you about this is David has actually set up this humming to scroll, which I think is pretty cool.
26:24
Of course, I've gone and turned off the OBS there, but he's just humming, and it's understanding that and scrolling down. So something that I'm able to do with my trackball, he's using his voice for. So pretty cool. So I'm almost done here.
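The humming trick can be approximated with a simple level detector: scroll while sustained sound from the microphone stays above a threshold. The sketch below is only an illustration under that assumption (a real setup would distinguish humming from speech, for example by pitch), using the sounddevice, numpy, and pynput libraries.

```python
# Rough "hum to scroll" sketch (illustrative; not the setup shown in the video).
# Scrolls down while the microphone level stays above a threshold.
import numpy as np
import sounddevice as sd
from pynput.mouse import Controller

mouse = Controller()
THRESHOLD = 0.02                      # RMS level that counts as humming (tune per mic)

def on_audio(indata, frames, time, status):
    rms = float(np.sqrt(np.mean(indata[:, 0] ** 2)))
    if rms > THRESHOLD:
        mouse.scroll(0, -1)           # one notch down per ~100 ms of sustained sound

with sd.InputStream(samplerate=16000, channels=1, blocksize=1600,
                    callback=on_audio):
    sd.sleep(60_000)                  # listen for one minute
```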
26:42
In summary, good input accessibility means you need completeness, consistency, and customization. You need to be able to do any action that you could do with the other input mechanisms. And doing the same input should have the same action. And remember, your users will become experts. So the system needs to be designed for that.
27:01
For ebook reading, yes, I'm trying to allow anyone to read, even if they're experiencing some severe physical or motor impairment, because I think that gives you a lot of power to be able to turn the pages and read your favorite books. And for speech recognition, yeah, Android speech recognition is very good. Silvius accuracy is not so good,
27:20
but it's easy to use quickly for experimentation and to make other types of things like Voice Next Page. And please do check out Kaldi Active Grammar if you have some serious need for voice recognition. Lastly, I put all of this onto a website, voxhub.io, so you can see Voice Next Page, Blink Next Page, Kaldi Active Grammar, and so on,
27:41
just instructions for how to use it and how to set it up. So please do check that out. And tons of acknowledgments, lots of people that have helped me along the way, but I want to especially call out Professor Sang-Mook Lee, who actually invited me to Korea a couple of times to give talks, a big inspiration. And of course, David Zurow has actually been able to bootstrap a full voice coding environment.
28:02
So that's all I have for today. Thank you very much. All right, I suppose I'm back on the air. So let me see. I want to remind everyone before we go into the Q&A that you can ask your questions for this talk on IRC.
28:23
The link is under the video, or you can use Twitter or the Fediverse with the hashtag rc32. Again, I'll hold it up here, rc3two. And wow, thanks for your talk, David. That was really interesting. Thanks for your talk, David.
28:42
I think we have a couple of questions from the Signal Angels. Before that, I just want to say, I recently spent some time playing with the VoiceOver system in iOS, and that can now actually tell you what is on a photo, which is kind of amazing. Oh, by the way, I can't hear you here on the Mumble.
29:04
Yeah, sorry, I wasn't saying anything. Yeah, no, so I focus mostly on input accessibility, which is like how do you get data to the computer, but there's been huge improvements in the other way around as well, the computer doing voiceover. We have about, let's see, five, six minutes left at least for Q&A.
29:21
We have a question by Toby++. He asks, your NextPage application looks cool. Do you have statistics of how many people use it or found it on the App Store? Not very many. The Voice NextPage was advertised only so far as a little academic poster.
29:40
So I've gotten a few people to use it, but I run eight concurrent workers, and we've never hit more than that. So not super popular, but I do hope that some people will see it because of this talk and go and check it out. That's cool. Next question. How error-prone are the speech recognition systems at all? E.g. Can you do coding while doing workouts?
30:05
So one thing about speech recognition is that it's very sensitive to the microphone. So when you're doing it, you don't see any mistakes, right?
30:40
That's the thing about having low latency. You just say something and you watch it and you make sure that it was what you wanted to say. I don't know exactly how many words per second, words per minute I can say with voice coding, but I can say it much faster than regular speech. So I would say at least like 200, maybe 300 words per minute. So it's actually a very high bandwidth. That's pretty awesome. Question from Peppy J. N. Divos.
31:01
Any advice for software authors to make their stuff more accessible? There are good web accessibility guidelines. So if you're just making a website or something, I would definitely follow those. They tend to be focused more on people that are blind,
31:21
because that is, you know, it's more of an obvious fail. Like they just can't interact at all with your website. But things like, you know, if Duolingo, for example, had used the same accessibility access tag on their next button, then they would always be the same letter for me.
31:40
And I wouldn't have to be like, Fox Charlie, Fox Delta, Fox something. Changes all the time. So I think consistency is very important. And integrating with any existing accessibility APIs is also very important. Web APIs, Android APIs, and so on. Because, you know, we can't make every program out there like voice compatible. We just have to meet in the middle where they interact at the keyboard layer or the accessibility layer.
32:02
We just have to meet in the middle where they interact at the keyboard layer or the accessibility. Merrick N. has a question. Wonders if these systems use similar approaches like stenography with mnemonics or if there's any projects working having that in mind.
32:21
A very good question. So the first thing everyone uses is the NATO phonetic alphabet to spell letters, for example. Alpha, Bravo, Charlie. Some people will then substitute shorter words for the ones that are too long; for November, I use noi. Sometimes the speech system doesn't understand you.
32:41
Whenever I said alpha, Dragon was like, oh, you're saying offer. So I changed it. It's arch for me. Arch, brav, char. And also, most of these grammars are in a common grammar format. They are written in Python and they're compatible with Dragonfly. So you can grab a grammar for, I don't know, for Aenea and get it to work with Kaldi Active Grammar with very little effort.
33:02
I actually have a grammar that works with both Aenea and Kaldi Active Grammar, and that's what I use. So there's a bit of lingua franca. I guess you can kind of guess what other people are using, but at the same time, there's a lot of customization, you know. Because people change words, they add their own commands,
33:21
they change words based on what the speech system understands. Ellie B asks, is there an online community you can propose for accessibility technologies? There's an amazing forum for anything related to voice coding.
33:40
All the developers of new voice coding software are there. Sorry, just need a drink. So it's a really fantastic resource. I do link to it from voxhub.io. I believe it's at the bottom of the Kaldi Active Grammar page. So you can definitely check that out.
34:02
For general accessibility, I don't know, I could recommend the accessibility mailing list at Google, but that's only if you work at Google. Other than that, yeah, I think it depends on your community, right? I think if you're looking for web accessibility, you could go for some Mozilla mailing lists and so on. If you're looking for desktop accessibility,
34:22
then maybe you could go find some stuff about the Windows speech API. One last question from Joe Nielsen. Could there be legal issues if you make an ebook into audio? I'm not sure what that refers to. Yeah, so if you're using a screen reader
34:43
and you try to get it to read out the contents of an ebook, right? So most of the time there are fair use exceptions for copyright law, even in the US, and making a copy for yourself for personal purposes
35:03
so that you can access it is usually considered fair use. If you were trying to commercialize it or make money off of that, or like, I don't know, you're a famous streamer and all you do is highlight text and have it read it out, then maybe, but I would say that that definitely falls under fair use.
35:22
All right, so I guess that's it for the talk. I think we're hitting the timing mark really well. Thank you so much, David, for that. That was really, really interesting. I learned a lot, and thanks everyone for watching. Stay on, I think there might be some news coming up. Thanks, everyone.