
Generate a DeepSpeech model with the help of your community


Formal Metadata

Title: Generate a DeepSpeech model with the help of your community
Subtitle: How to get fun with teamwork
Number of Parts: 490
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
The story of how Mozilla Italia added the Italian language to Common Voice and, after a year, generated the language model, with the help of a lot of people in the various related projects: developing tools and scripts, finding and gathering sentences, doing promotion and finally generating the model for Italian. A common issue in Common Voice is how to involve a community instead of doing all the tasks alone. Discourse is full of the same questions, but there is no story or tutorial showing how this can be a way to work together toward the same result, to benefit a whole country or region. Mozilla is an inclusive community, but the use cases and stories behind a specific community are often hidden, and this is a problem: Mozilla is open to everyone, but bureaucracy or missing experience can be a blocker. Please note that this replaces the talk '(re)Activating the Common Voice project at a local level' by Redon Skikuli.
Transcript: English (auto-generated)
Hey, welcome everybody. So let's talk about a story: how we did it in my community, without much confidence, I could say, that we could do it.
So before I start, I want to talk a little about me, just to say that I now have a bit of experience inside the Mozilla community. This is not my first FOSDEM, and not my first FOSDEM as a speaker; I usually talk in the Mozilla room, every time about a different topic, because I'm involved in different kinds of things. I have a lot of experience in community management, also because I'm involved in the WordPress community as a core contributor and so on.
All this experience helped us achieve what I'll show you from our two years of work. I've also written a book, free and open source, about my experience of contributing to open source, and this talk uses some of the material already written there. So let's talk about what happened.
So first of all, we need to understand that this is a project of the Mozilla Italia community. And what is the Mozilla Italia community? Well, it's like many of the other Mozilla communities that exist around the world. There are a few of us here, but you can find us online. In our case, we work on everything that involves the Italian language, not just the local country, because there are communities involved with it in many other countries too, since the language is spoken in different countries.
So we work on localization, support, participation, development and many other things, and, as you can see from the photo below, we do a lot of things together.
So maybe next year it will be your face on this slide, or your community's, so that people know it exists. We love to work together.
To work together, we followed a two-year plan, because speech recognition in the Mozilla world is split mainly into two steps. First of all, the data: Mozilla gathers it with the Common Voice project, voice.mozilla.org, where communities or individuals can unlock their language. Then people can start recording and reviewing sentences, reading them aloud, of course.
So in the future we will have a public domain dataset for any language. It is a big project, of course, and it requires a lot of time in a lot of ways. When you have the data, the next step is to use it. Jumping to the last part: DeepSpeech is another Mozilla project, a speech-to-text engine with a lot of bindings, etc.
I don't need to explain what it is; you can find everything online. It just needs your data to generate a speech recognition model. Mozilla has now also released a text-to-speech project, so you can use the data in a lot of ways.
So the first year we worked on starting up the project to gather data, and the next year we started working on using it. What did we do the first year to reach the first goal? Well, we unlocked the language on the Common Voice portal, which means we gathered a lot of sentences and defined some rules, for example how long a sentence may be.
We discovered this when we tested it: we had gathered around 11,000 sentences, and at the first recording events we saw that we had problems with long sentences, because Italian has long words.
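A length rule like this can be checked automatically before sentences are submitted. The sketch below is a hypothetical filter, not the tooling Mozilla Italia used: the 14-word and 100-character caps are illustrative assumptions (a character cap matters for a language with long words), and rejecting digits assumes the common advice that numbers can be read aloud in more than one way.

```python
import re

# Hypothetical limits for this sketch; tune them for your language.
MAX_WORDS = 14
MAX_CHARS = 100


def is_acceptable(sentence: str, max_words: int = MAX_WORDS, max_chars: int = MAX_CHARS) -> bool:
    """Return True if the sentence looks short enough to read in one clip."""
    text = sentence.strip()
    if not text:
        return False
    # Digits are rejected because "1492" can be read aloud in several ways.
    if re.search(r"\d", text):
        return False
    return len(text.split()) <= max_words and len(text) <= max_chars


def filter_sentences(sentences):
    """Keep only the sentences that pass is_acceptable, preserving order."""
    return [s for s in sentences if is_acceptable(s)]


if __name__ == "__main__":
    corpus = [
        "Il gatto dorme sul divano.",
        "Nel 1492 Colombo attraversò l'oceano.",
        "Questa frase è decisamente troppo lunga perché contiene moltissime "
        "parole che nessuno riuscirebbe a leggere con calma in pochi secondi.",
    ]
    for s in filter_sentences(corpus):
        print(s)
```

Checking both words and characters is a deliberate choice: a word cap alone would let through a sentence of a dozen very long Italian words that still takes too long to read.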
When we tested them, we saw that there wasn't enough time to read everything. So at an Italian event three years ago, with a lot of the Mozilla swag we had, we offered prizes to get as many people as possible testing it: for example, for 20 recordings you would get a Mozilla poster. The problem was that so many people wanted swag that we had to double that, and we ran out of everything anyway. So we tested what we did and validated our rules.
The next step was experimenting with promotion with zero funding, because we are a community of volunteers. We do other things in life, so we don't have much money to put into it. So we experimented in different ways, like organizing events in different cities with different people as speakers, so we could have someone on the ground who knows the place better and knows the best way to explain the project to get feedback from people.
For that, we made an Italian slide deck of about 40 slides covering a lot of things, so that each speaker can use only the slides they need. For example, if I go to speak at a university, in computer science, machine learning and DeepSpeech are very interesting to explain, but if I go to a makerspace, maybe they don't care about that, or they have no knowledge about it, so we need something simpler. With these modular slides we got a lot of speakers, also from outside the community, promoting what we were doing.
These are some examples of the slides. This one is about the Common Voice website and just explains where to press the button, these kinds of things. This one explains what machine learning is. We had more of them, but you have to consider that we were thinking from the bottom to the top, for any kind of audience, so in this case we explained machine learning in the easiest way.
So what did we do the next year? We planned to work a lot on marketing and promotion, because we are a small community and we need feedback. Italian has about 60 million speakers, roughly the population of the country, so we thought there might be people out there we don't know.
So we tested new social tactics to get promotion. First of all, we wrote a big weekly post on the forum with a lot of updates, international and national, about Common Voice and DeepSpeech. We did this because new people kept joining our channels and always asking the same questions.
So we said: maybe we need to save the time spent replying to these people, to do something else, because we have too many things to do. This worked very well: over time people appreciated it a lot, and when we reached the new year we could see how many things we did. We also shared this post in a lot of new places we had never tried before, like Reddit, in Italian subreddits about IT, and specific Italian Facebook groups where we had never been present.
So not just developers and open source people: we needed to find new people, to get as much feedback as we could.
Also, with the help of Mozilla, we got promotion on the about:home page of Firefox, where the snippets are. I don't know how many of you use it, but there was a specific snippet for Italian users asking them to try Common Voice. For us it was kind of incredible: on Common Voice we had collected 10 hours of recordings in five months, and in two weeks with this promotion we got 40 hours. Just to say that what we achieved with a bit of promotion was incredible for us.
We also created a category on Mozilla's Discourse, which is public and international. You can find Mozilla employees and the whole community there, and there is a Common Voice category and also a DeepSpeech one. We got an Italian category too, where we can speak Italian among ourselves, because the Common Voice website points to Discourse, so it was the only place where people would look for us.
We also tested specific kinds of tweets, not only from our official Mozilla Italia social channel but also from our own accounts, with different keywords and hashtags, just to see what worked better. One of the things we did was to keep the slides up to date, because as time passes there are changes and the project evolves, so we need to update them. New people were using them, and they contributed ideas back to improve them.
Next, we created specific kinds of activities for people with zero knowledge of Mozilla, how it works or what the project is, and with little time to contribute. We looked at the people who were joining us: what interested them most, what their less common backgrounds were, these kinds of things, to understand who was joining and what we could offer them to keep them part of what we are doing.
The second part was thinking about documentation and planning, so mainly community management. We drafted a document, in English this time, not Italian, and public of course like the rest of our material, about our needs from Mozilla and mainly from the Common Voice project, and we kept updating it during the year: what we did, what we could not do, the problems, etc.
At the same time, since it was public, we shared it outside the community through the new channels we had found, again to make sure everyone was informed, but also to get as much feedback as possible and to explain what we were doing, because it was all kind of new.
We also planned meetings with the Mozilla employees who follow Common Voice, to share our needs as a community and those of the people outside who knew what Common Voice is, and to get answers and fix some things.
For us it was very important to have this one-on-one relationship with Mozilla, and it is possible. We also reviewed and gathered ideas from the whole community, not just me or another volunteer saying "OK, we have to do this." No, we were open to everyone, to get the best result we could with what we had as a tiny community.
What we do is usually a monthly video call that is open to everyone, and we publish it on YouTube afterwards. Since it's open to everyone, people joined just to see what we were doing and to get to know us.
We defined goals for every area where people wanted to do something. Later, when we saw what our volunteers could do, we set the priorities. So we say: these are the areas, for every area this is our priority, and let's see who can do it.
For us it was very important that every decision was open, so everyone could say something, because at Mozilla we need transparency and the inclusion of everyone. That's how we extended our internal ideas to the community. We also opened up by reading the international category on Discourse to see how other communities were doing things, because when we have an idea, maybe someone has already done it. That often happens, but nobody communicates about it.
The third part was maybe the one that most interests developers like me: development and experimentation.
First of all, we created a repo based on the scripts from the French Mozilla community, because they have a Mozilla employee, Alexandre, who is probably watching me right now, who works on DeepSpeech and has written a lot of bash scripts and a Docker image to generate the model for French.
So we said: well, we can fork it and change what we need to use Italian material, like the text corpus. This was done by just one person, me, without much experience of how DeepSpeech works.
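One concrete part of adapting a French pipeline to Italian is the set of characters the model can output, since Italian uses accented vowels such as à, è and é. DeepSpeech training reads this from an alphabet file; the sketch below assumes the layout of DeepSpeech's `data/alphabet.txt` (one character per line, `#` comment lines) and is a hypothetical helper, not the script from the Mozilla Italia repo. It also assumes that punctuation and digits were already removed from the corpus, so it keeps only letters, the apostrophe and the space.

```python
import unicodedata

# Characters kept besides letters (assumption for this sketch).
ALLOWED_EXTRA = {"'", " "}


def normalize(text: str) -> str:
    """Lowercase, map the typographic apostrophe to ASCII, compose accents (NFC)."""
    text = text.lower().replace("\u2019", "'")
    return unicodedata.normalize("NFC", text)


def build_alphabet(sentences):
    """Return the sorted set of characters the model should be able to output."""
    chars = set()
    for sentence in sentences:
        for ch in normalize(sentence):
            if ch.isalpha() or ch in ALLOWED_EXTRA:
                chars.add(ch)
    return sorted(chars)


def to_alphabet_file(chars) -> str:
    """Format one character per line, preceded by a comment line."""
    lines = ["# Alphabet generated from a text corpus (sketch)"]
    lines.extend(chars)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    corpus = ["Perché l'Italia è bella.", "Città e caffè!"]
    print(to_alphabet_file(build_alphabet(corpus)), end="")
```

NFC normalization matters here: without it, "è" typed as "e" plus a combining accent would appear as two separate characters in the alphabet.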
I am a developer, I know bash and these kinds of things, but I had no idea how machine learning works in detail. I started just because it was time. The cool part is that when I had done everything and adapted the bash scripts for Italian, I told the whole community and all these new social groups that we needed testers.
When I said that, a lot of people joined in testing it at their companies, on AWS or Azure servers, which are very costly, just to try everything, and they contributed back to fix what wasn't working and what could be improved, these kinds of things.
In that way we got new, valid maintainers, more skilled than me in these things, who are now pushing this work forward. For me this was also very important, and really intriguing and thrilling, because after seven years in the Mozilla community, for the first time companies were contributing to our projects, in this case with patches, server support and experimentation.
The simplest example was a Telegram bot that compares the transcription of a recording from Google's speech recognition with the one from our model. For us this was really something: now we can play with the model without installing DeepSpeech, because that's not easy.
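Comparing two transcriptions of the same clip, as the bot does, is usually quantified with the word error rate (WER): the word-level edit distance between a hypothesis and a reference, divided by the number of words in the reference. The sketch below is a plain-Python illustration of that metric, not the bot's code; treating Google's output as the reference, and the example sentences, are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the dynamic programming table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    google = "il gatto dorme sul divano"                # hypothetical reference
    deepspeech = "il gato dorme sul divano rosso"       # hypothetical model output
    print(f"WER: {word_error_rate(google, deepspeech):.2f}")  # one substitution + one insertion
```

Here the result is 2 / 5 = 0.40. Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is a distance rather than a percentage of mistakes.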
And now I, who started all this, am mostly doing project management, without following the development part. I want to say this because getting a model is not so scary even for someone without the knowledge: you just need to start and ask, because the community around a specific language is big, and you can find help more easily.
Later, talking with our community, we built an Android app for Common Voice, because there wasn't one, only the website, for various reasons.
Also, to spread the word as much as we can, we made a new leaflet. We have a repo of leaflets in Italian about Mozilla, what Firefox is, these kinds of things, so when there is an event, for example by a Linux user group in a city where we don't have anyone, they can print them and hand them out. We made one about Common Voice and DeepSpeech.
During these years we also contributed to a lot of different Common Voice tools and projects, based on our needs and the problems we found.
So this was the first big achievement for us: the first public, official Italian DeepSpeech model, released online in October, with the Docker image to generate the model, so you can do it yourself. But it is trained on two different datasets, and there are not many hours. This is a problem we are addressing. What do we need to do now that we have the model? Well, we have the model, but we don't have any infrastructure to use it.
So we need to write a bit of documentation. Also, a week ago there was a new release of the Common Voice data, so our dataset is now 90 hours and still growing. So we need to update the model, with a lot of things like transfer learning, and test it.
A cool thing was this: I attended KDE Akademy in Milan, because I'm a KDE user and sometimes a contributor, and there I met the chief editor of this printed magazine. He interviewed me about Common Voice and what we were doing, and for the first time since the early days of Firefox we got something in a magazine about what the community was doing.
There is also a video interview, uploaded on YouTube and also on the DVD attached to the magazine. For us this was very interesting, because this kind of marketing took no effort; just going around was very helpful, because every idea I got was for the community.
So what's next? This year will be very important for Italy, because we will have two big national and international events, while last year there was nothing. We will join them with different ideas about testing, improving, etc.
We want to create Italian videos about how to use Common Voice, because people always ask the same questions, like what the best microphone is or how to review a sentence. After a while, a video is easier to follow than long documentation.
We also need to write tutorials about using our model, and we want to do them in Italian. There are a lot of projects that already implement DeepSpeech and have released their own tutorials, but we want to do it in Italian, for our model; that way we can gather more people.
And we're getting ready for Linux Day, because in Italy we don't have Software Freedom Day, we have Linux Day. It usually takes place in 70 cities on the same day, and as the Mozilla community we usually have five or six cities with Mozilla talks. We want to make sure we extend our reach there.
So, closing: there is a long blog post version of this talk, with all the links, on the forum, because the Mozilla employees working on Common Voice asked me to write up my experience. I edited it, and the slides are there too, so you can find a lot of links, ideas, etc. there.
For any questions, of course, I'm here. If you're Italian and want to know more, we have Telegram groups where you can reach us, and for non-Italian speakers, of course, I'm here. That's all.
Do people have questions? OK, go ahead. So the question is:
how do you manage variation within a language? The example used was French.
Well, when there are accents between regions, and people don't use the same expressions and so on. This is a good example, because if you look at the French scripts, they use different datasets, including African ones. But the good part is that in Common Voice, when you register a profile, you can set your accent for specific languages, mainly French and English; for Italian it's complicated.
So the dataset reports the accent for every recording: the user is anonymous, but you know the accent. So if you want, for example, to use Common Voice for French but only from speakers with one particular accent, you can do it.
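Because each clip's metadata carries the accent the speaker declared, selecting one accent is a simple filter over the dataset's tab-separated files. The sketch below assumes the TSV layout of Common Voice releases of that period, with `path`, `sentence` and `accent` columns; column names have changed between releases, so check the header of your download. The sample rows and the accent labels are invented for illustration.

```python
import csv
import io

# Invented sample in the TSV layout assumed for this sketch.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\tage\tgender\taccent\n"
    "a1\tclip1.mp3\tBonjour à tous\t2\t0\ttwenties\tmale\tfrance\n"
    "b2\tclip2.mp3\tMerci beaucoup\t3\t0\tthirties\tfemale\tcanada\n"
    "c3\tclip3.mp3\tÀ demain\t2\t1\t\t\t\n"
)


def clips_with_accent(tsv_text: str, accent: str):
    """Yield (path, sentence) for every row whose declared accent matches."""
    # QUOTE_NONE: sentences may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("accent", "") == accent:
            yield row["path"], row["sentence"]


if __name__ == "__main__":
    for path, sentence in clips_with_accent(SAMPLE_TSV, "france"):
        print(path, sentence)
```

For a real release, read the file with `open(tsv_path, encoding="utf-8", newline="")` and pass its contents (or the file object, wrapped the same way) to the reader; an empty accent string selects the speakers who did not declare one.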
And what is the quality when you mix different datasets, as in this case? Well, this depends on the number of hours: the more hours you put in, the better the quality may be. So it's kind of an experiment. Part of the answer is: you need diversity in the recordings.
If you participate a bit and go to voice.mozilla.org, and you don't want to use your voice, you can just listen. You'll hear that it's mainly male voices of my age talking; there's little variation there. So if you have a girlfriend who wants to participate, that already helps.
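Whether a dataset is skewed this way can be checked from the per-clip metadata. The sketch below tallies the share of clips for each value of a demographic column; it assumes a Common Voice-style TSV with `age` and `gender` columns where an empty value means "not declared", and the sample rows are invented.

```python
import csv
import io
from collections import Counter

# Invented sample in an assumed Common Voice-style TSV layout.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\n"
    "u1\ta.mp3\tCiao a tutti\ttwenties\tmale\n"
    "u1\tb.mp3\tBuongiorno\ttwenties\tmale\n"
    "u2\tc.mp3\tBuonasera\tthirties\tmale\n"
    "u3\td.mp3\tArrivederci\tfifties\tfemale\n"
    "u4\te.mp3\tGrazie mille\t\t\n"
)


def share_by(tsv_text: str, column: str) -> dict:
    """Return the fraction of clips for each value of `column`.

    Empty values are grouped under 'not declared'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter((row[column] or "not declared") for row in reader)
    total = sum(counts.values())
    # most_common() orders the result from the largest group to the smallest.
    return {value: n / total for value, n in counts.most_common()}


if __name__ == "__main__":
    for column in ("gender", "age"):
        print(column, share_by(SAMPLE_TSV, column))
```

This counts clips, not people: one prolific speaker (like `u1` above) weighs more than someone who recorded once. Counting distinct `client_id` values per group instead gives a speaker-level view of the same imbalance.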
If you have a girlfriend from the south with a proper southern accent, that helps even more, etc. The more diverse the training data, the more the model will be able to generalize and understand every accent; not necessarily speak these accents, but at least understand them. So it's about the volume of samples: the more the merrier.
Exactly. One of the experiments we want to do is transfer learning: using the Spanish model that a volunteer built and seeing how much it changes the quality of the Italian one. It's not the same language, but there are similarities, and we are now in a phase of experimentation, so with transfer learning we can test this kind of thing. And of course, again, the more hours you can put into DeepSpeech, the better.
We need to test everything to assess the quality. The quality of the Italian model right now is kind of horrible, because we built it just in time for Linux Day, and we need more people who can help us out. You can try the French one and see what happens; I don't speak French, so I don't know the quality of that one. So in any case we need more testing and experimenting, because right now I think we are at the first steps, to see what will happen.
The good part is that everything is open source or public domain, so you can use it for whatever you want. Other questions? Thank you very much. That's it.