Translate All The Things!
Formal Metadata
Number of Parts | 542 | |
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/61404 (DOI) | |
FOSDEM 2023, part 370 / 542
Transcript: English (auto-generated)
00:05
So, let's dive right into it. LibreTranslate is software that's a bit like Google Translate, but open source. It is licensed under the AGPLv3, so it's strongly open source. In fact, we're going to keep it that way forever,
00:20
and it lets you do natural language translation. It runs on your computer. This is one of the goals of the project. There are several other projects in the open source realm that have aimed to provide natural language translation, except sometimes they require very large servers or a lot of memory, and our goal is to have this running on something as small as a Raspberry Pi.
00:43
So, that is very important to the project. The program has lots of clients and integrations. We'll cover some of those in the upcoming slides. And like many projects, it's available on GitHub, so you can go and check it out. But we're going to give you today a brief overview of how to
01:01
get started and start using it today. Let's talk briefly about why we decided to create it and why there was a need for the project to exist. We could not find a project that had all the features that LibreTranslate can offer. These are: a simple and open REST API that you can use to programmatically do translations,
01:22
so it helps automate part of the translation work that we need to do. It offers pre-trained and openly licensed language models. There are other projects that do machine translation, but again, sometimes they do not make the models, the AI models for the translation,
01:41
openly available, and we do. It also runs, again, on commodity hardware, so it does not require server-scale power to make the software work. And finally, it is very easy to get started, as you will see. So, talking about getting started, there are primarily two ways that you can get LibreTranslate to work on your computer.
02:01
The first one is, if you have Python, you can simply run pip install libretranslate, and afterwards you run the program. And that's it. If you have Docker, which many developers like to use, we also have an option for that. We pre-build images for LibreTranslate that you can use, and we have a convenient script that will run it for you
02:24
and takes care of a few details that let you have things like persistent volumes for downloading language models and some technical stuff. But to get started, all you need to do is go on GitHub, get a copy of our source code, and press run. We also have scripts for Windows and macOS and Linux,
02:44
so we try to support all major platforms. We're hoping to get other platforms in there as well, so things like FreeBSD and others are on the to-do list. So, we'll get there. So, let's actually try to run it. And I'm always a little scared of doing live demos,
03:01
but bear with me, we're gonna try it. What could go wrong? So, here is a console. I'm gonna quickly activate a Python environment where I have LibreTranslate already installed, and I'm gonna try to run it. And on macOS, I have to specify a different port than the default 5000.
03:22
I'm gonna try to run it. Okay, it seems to be working. So, I'm gonna jump back right into Chrome, and if I refresh the page, you will be presented with a friendly user interface that you can use to test the system, and even use it. It allows programmatic access to the software via an API,
03:44
but you can also use it as an alternative to Google Translate if you want to. So, we're gonna try to say something. Okay, so obviously, English to English is not gonna be helpful. How about French? Okay, so we translated hello world, bonjour le monde,
04:03
and it worked. But that's not too impressive, right? Like, okay, hello world. Let's try to look at something a little more realistic. And you can, before looking at something more realistic, you can also, of course, use it from an API.
04:23
In this case, I can invoke a curl command and ask LibreTranslate to perform a translation. I want it to automatically detect the language that the text is coming from. And finally, I want to translate into something,
04:41
the target language. And I get a JSON response. Everything in the API is JSON-based. So, that will be familiar to many developers. But let's look at a more realistic example. In this case, we have a longer piece of text, and it also contains HTML. And the software is capable of translating
05:02
the parts that need translation while leaving the HTML part intact. So, things like hyperlinks do not get mistakenly translated, which would be really bad. And this code that we saw here roughly gets represented as this piece of HTML in a browser.
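As an illustration of the API call described above, here is a minimal Python sketch that builds the same kind of JSON request as the curl command. It assumes a LibreTranslate instance listening on `http://localhost:5000` (the default port mentioned in the talk), the `/translate` endpoint with the JSON fields `q`, `source`, `target` and `format`, and a `translatedText` key in the reply; check these against the API documentation of the version you run. The helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: a LibreTranslate instance is listening locally on the default port.
BASE_URL = "http://localhost:5000"


def build_translate_request(text, target, source="auto", fmt="text", base_url=BASE_URL):
    """Build (without sending) a POST request for the /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": fmt}
    return urllib.request.Request(
        url=f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, target, source="auto", fmt="text", base_url=BASE_URL):
    """Send the request and return the translated string from the JSON reply."""
    request = build_translate_request(text, target, source, fmt, base_url)
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["translatedText"]
```

With a server running, `translate("Hello world", target="fr")` would send the text with automatic source detection, and passing `fmt="html"` asks the server to treat the input as markup so that tags and link targets are left alone.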
05:22
And the translation is pretty good, kind of. This word should have been filidai. It decided to keep the translation in French. We will improve that with time. But otherwise, the context and the meaning of the sentence is pretty darn good.
05:42
We will look at accuracy in the upcoming slides. So, as an overview of the list of features, it can do text translation. It can do markup translation. That includes HTML, XML, and other formats that use markup. It can do several formats for file translation.
06:01
So, you can upload things like OpenOffice, LibreOffice, Word documents, and PowerPoint slides, and it'll be able to translate those as well. It can perform language detection. So, you give it a piece of text, and it will give you an estimate
06:21
of which language the program thinks it is. It also has a built-in system for doing rate limiting. If you're planning to host this on a public server, you will find out that it's a very useful feature because people really like free resources, and it's difficult to give everything for free
06:42
without some limits. So, if your translation instance up in the cloud gets really popular, having some sort of limit by saying, do a maximum of 60 translations per minute, will come in really handy, and it's all built into the software. You can further issue API keys to people, which can change those limits.
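To make the idea of per-minute limits and API-key overrides concrete, here is a small sliding-window limiter in Python. It is only a sketch of the general technique, not LibreTranslate's own implementation; the class name, the `key_limits` mapping, and the convention that `None` means "unlimited" are all assumptions made for this example.

```python
from collections import defaultdict, deque


class PerMinuteLimiter:
    """Sliding-window limiter: at most `limit` accepted requests per window per caller."""

    def __init__(self, default_limit=60, key_limits=None, window_seconds=60.0):
        self.default_limit = default_limit
        # Hypothetical mapping of API key -> custom limit (None means unlimited).
        self.key_limits = dict(key_limits or {})
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # identity -> timestamps of accepted requests

    def allow(self, caller_id, now, api_key=None):
        """Return True and record the request if the caller is under its limit."""
        known_key = api_key in self.key_limits
        identity = api_key if known_key else caller_id
        limit = self.key_limits[api_key] if known_key else self.default_limit
        hits = self._hits[identity]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if limit is not None and len(hits) >= limit:
            return False
        hits.append(now)
        return True
```

Anonymous callers are tracked by `caller_id` (for example an IP address) and get the default limit, while a caller presenting a known API key is tracked by that key and gets whatever limit the operator assigned to it. The clock is passed in as `now` so the behaviour is easy to test.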
07:02
So, you can set up the system in a way where you allow anonymous users to translate up to, I don't know, 20 translations per minute, and you can allow a subset of people that you've issued API keys to, to have however many they want. You decide those limits. It also has a localized UI.
07:22
We're using Weblate to do that, which is awesome, and it has currently been translated into four languages, and we're looking to verify and add more. One cool, neat feature is that LibreTranslate has the ability to translate itself, roughly.
07:41
So, we have done that, of course, but we haven't displayed all the languages that it has tried to translate itself into. We are waiting for a native speaker to review the actual translation and correct it. But if you run it in debug mode, you will see all the work that it has done, which is kind of neat. So, it translates itself, kind of,
08:01
or at least it helps. Finally, it has the ability to monitor itself. So, it can generate usage metrics, so you can monitor the usage of the server using Prometheus and Grafana. These are tools to do monitoring that are very popular. Inside the software, there are really just a few packages,
08:22
so it's very lightweight. Most of the translation work is done by another package called Argos Translate. This is really the core engine that performs the hard work in the translation, which is an awesome project, and we collaborate with them on LibreTranslate.
08:41
Inside Argos Translate, there is also other software; it is built on the shoulders of giants. There is CTranslate2, which is an inference engine that does neural translation using transformer models, which is the state of the art. It's the same type of architecture that ChatGPT uses. There is SentencePiece,
09:00
which is a piece of code from Google that does word tokenization, and Stanza, which comes out of Stanford and does sentence analysis. And Argos Translate uses all three of these to perform the translation work. Now, that's not all it does. Argos Translate also takes care of the very important Argos package manager index.
09:23
This is where all the language models are handled, installed, and distributed. So, the first time that you run LibreTranslate, Argos Translate will take care of querying the Argos package manager index, and will download the languages that you need. This allows us to also create instances
09:42
where, say, you only need to translate between French and English. You do not need to download the entire 26 gigabytes of models. You can simply say, I just need those two models, and the program will download simply those two models. We also have a small module that does the file translation,
10:03
which connects, again, to Argos Translate, that's the argos-translate-files package, and then some common Python packages that allow us to put up the web interface and coordinate the application as a whole. So, it's really an ecosystem that's built with other open source software,
10:21
and together it creates this complete translation solution. Talking about language models, we have 58 of them. That gives you translation support for about 30 languages. It does automatic pivot via English. We are currently looking to transition
10:41
to using multi-language models, but for the moment when you translate, say, from Italian to French, the program will automatically do the pivoting via English. So, it will translate Italian to English, and English to French. If there is a language missing, there is a very cool repository
11:01
under the Argos Open Tech organization, which builds Argos Translate, called Argos Train, and that is a repository that has very good instructions on how you can train your own models. So, if a language is missing, go check it out. It has very clear instructions,
11:20
and you could contribute a language that is missing, and that you want to see integrated into the software. Speaking of the models, when a model is downloaded, it has the .argosmodel extension, and these are simply zip files. Inside, each zip file has a little bit of metadata.
11:43
It has a folder that contains the CTranslate2 model. It has the SentencePiece model, and finally the Stanza model. So, it has the information for all three of the packages that we discussed earlier to perform the translation. It's very interesting to check it out. Let's talk a little bit about accuracy, right?
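Because a model package is described above as a plain zip file, it can be inspected with Python's standard `zipfile` module. The sketch below only lists entries and reads one of them as text; it makes no assumption about the exact file names inside a real package, and the file and entry names in the usage note are hypothetical.

```python
import zipfile


def list_package_contents(package):
    """Return the sorted entry names inside a .argosmodel file (a zip archive).

    `package` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as archive:
        return sorted(archive.namelist())


def read_text_entry(package, entry_name):
    """Return one entry of the archive decoded as UTF-8 text (e.g. the metadata)."""
    with zipfile.ZipFile(package) as archive:
        return archive.read(entry_name).decode("utf-8")
```

For example, `list_package_contents("some-model.argosmodel")` would show the metadata file and the folders for the three model components, and `read_text_entry` can then print the metadata entry once you know its name from that listing.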
12:01
Like the question, like, okay, translation's something, but how good is it, really? And for that, there is a metric that can be used to assess roughly the accuracy of the translation, and it's called the BLEU score, an acronym for bilingual evaluation understudy. And it measures the similarity of text to a reference corpus.
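To give a feel for what "similarity to a reference" means here, the following Python sketch computes a simplified, unsmoothed BLEU for a single tokenized sentence against a single reference: the geometric mean of clipped n-gram precisions (n = 1 to 4) times a brevity penalty. Real evaluations are normally done at corpus level with an established tool and multiple details (tokenization, smoothing) that this sketch ignores; the function names are made up for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) occurring in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU (0..1) of one tokenized candidate against one reference."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clipped matches: an n-gram counts at most as often as it appears in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty punishes candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_precision_sum / max_n)
```

Multiplying the result by 100 gives the percentage-style scale used for the model scores discussed in the talk. A candidate identical to the reference scores 1.0, and changing a single word in a six-word sentence already drops the score to roughly 0.54, which is one reason even very good translations never reach 100.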
12:26
And it has values that go from zero to one or, if you express it as a percentage, from zero to 100. The best translators in the world, human translators, do not get a score of 100, ever. So, anything that is above a 40
12:42
is considered understandable to good. And something that is above 50 tends to be very high quality, sorry, up to 50 is high quality, and above 60 is very high. And we had a community contributor actually go, and a few weeks ago he ran the evaluation
13:04
on our different models, and we found that 83% of the models currently in LibreTranslate are scoring above 40%. So, 83% of them are good. Now, to put it into perspective,
13:20
when people ask me directly how good LibreTranslate is, I like to tell them that it's roughly as good as Google Translate was four years ago. So, I wanna make the expectations clear at this stage in the project: it is not as good as some of the proprietary alternatives. But we are improving, and we will continue to improve.
13:41
And yes, the way to improve it lies mostly in getting better training data. So, as we find more and more sources of open data that can be used for translation, we include those in the training of the models, and that results in better models. This is also an interesting point to note,
14:03
is that because the project is open source, and we have a way to train models, you can also train models that are specific to a certain domain. For example, in the context of software translation, you could imagine the case where, instead of training the data on a general corpus,
14:20
like Wikipedia, or the EU Parliament translation documents, you could train a model that is specific to software. For example, you could take a set of existing translations from existing software that has licensed the translation work under an open, permissive license,
14:43
and train a model on those existing translations. Because, as we know, a lot of software has terms in common. When you have a file menu, it's always called file, and then edit. So those menus are specific to a context,
15:00
and by training models that are specific to a context, you could get a, for example, software translation model that is more accurate in the context of software, rather than, say, poetry. So it's a very interesting thing to think about. One more thing about accuracy,
15:21
we do have the occasional rare quirk. This is something that we're aware of, and we are working to fix it. We like to call it the salad issue. And we joke, I will demonstrate this slide, because it always sparks a little bit of a giggle. And it's a little bit rare, but it happens.
15:43
So in Spanish, the word for salad is ensalada. Now, let's try to translate the word for salads, plural. So I'm gonna type ensaladas, okay? So in French, that's salades, is that correct? Any French people in the room?
16:00
Fantastic. Okay, now let's try the singular form, okay? I'm gonna remove the S, and it crunches for a little bit. And in a second, hmm. So it really likes salad. Salad, salad, salad, salad, salad.
16:22
This is a quirk. We are aware of it. It's very rare, but we've found a few reports here and there, and we're working to fix it. Just something to be aware of. But yes, we really like salad. Me too. Let's talk a little bit about integrations.
16:40
You can find client libraries for about 11 programming languages. That includes the most common ones, like Java and Python. Whatever your favorite language is, it's probably in the list of bindings. And if it's not there, adding new bindings for LibreTranslate is fairly easy. So we welcome contributions, of course.
17:01
As far as software goes, LibreTranslate has found adoption in several existing open source projects that you may recognize. Mastodon recently added support for translating toots using LibreTranslate. Weblate has the ability to use LibreTranslate to suggest and help translators perform translations as an alternative to using proprietary software.
17:22
The forum software Discourse has a plugin that lets you make your forum accessible from different locales and lets you translate the posts on the fly. LibreOffice, I found, has an extension. I didn't know this until a week ago when I was looking at who has integrated stuff with LibreTranslate.
17:41
But somebody wrote an extension to LibreOffice where you can translate documents on the fly using LibreTranslate. There is an add-on for the multimedia software, Kodi. There is an add-on also for Firefox. And there's probably a lot of other things that I haven't found myself, but a lot of people seem to be finding the API useful and they're doing integration work, which is fantastic.
18:02
And finally, there are client applications that you can use LibreTranslate with, without using the web UI. And we have clients for Android, iOS, and desktop. And there's more being built by the week. As far as comparison to proprietary alternatives,
18:21
you can see that there is a clear monetary advantage aside from the philosophical reason for why you might want to use open-source software, of course. But it also could be a really sustainable way to perform translations. In that, people often ask me,
18:40
why should I use Libre Translate? I can use Google Translate for free. I just go on translate.google.com and it doesn't charge me anything. So why should I care? Google Translate is free so long as you're using it by hand. If you want to do any automation work and you have to tap into their API, you're gonna pay dearly. And you can see here a list of the prices.
19:00
And I can assure you that one million characters seems like a lot, that's six zeroes, but they actually run out pretty fast, and the bill on your credit card can run up just as fast. So if you have a lot of text to translate, LibreTranslate could really help in that regard.
19:21
As far as funding goes, the project is on the path to become fully self-funded. And we really care about this because we want the project to continue living on. We of course accept sponsorships and donations, but honestly, we would prefer that you get something back if you decide to contribute financially to the project.
19:41
This is why, if you are in the position where you say, I have some finances to spare and want to help support the project, you also get something back. And we do that in the form of offering you an API key to use a hosted instance at libretranslate.com. So you are free to run the infrastructure
20:01
on your own server, on your Raspberry Pis, on any machine that you'd like. If you don't want to handle that, you can just get an API key and you can support the project at the same time. So it's really a good way to contribute back. And we found that that model has been helping us grow and sustain the project.
20:21
So we hope to continue growing as much next year. Again, to get involved, I'll give you a few quick numbers. We've had about 70 people contribute to the code base over the last few years. The project is still very young, but it has really received a lot of attention, so we're very excited about that.
20:41
You can help with code. If you're a Python programmer, if you know HTML, CSS, any of the technologies that we use, you're welcome to contribute. We are open to everybody and all ideas. You can also help us translate. If you understand English and you don't see your language in the list of languages that we currently support
21:02
for your user interface, you are welcome to contribute. It's on Weblate. You can simply translate and it will get included into the project every 24 hours. So that is really amazing. You can also help us train more language models. If your language is not available or a language that you care about is not available,
21:21
you can yourself create a new model for a language and add that into the list. So that is also another way that people can help. You can report bugs, of course. Salad, don't report salad, we're aware of it. Or just come say hi. We have a community forum that is quickly growing
21:41
and we love to hear what you're building with it, what you're using, or if you have any questions. So we're very open and we're excited to hear what you will do with it. That's it. This was the last slide. I think we have some time left over, right? So I will.
22:02
So thank you very much. I will open the floor for questions and discussion. So yes.
22:26
We're glad we could help. You're welcome. So how many unemployed, how many people do we have here?
22:49
70. Should be all over Europe, at least. Maybe South America. If we multiply this, it can go viral. Thank you very much. This is awesome work.
23:01
Thank you, appreciate it. Yes. So, but you speak of our language that have the same link, the same structure of the language. We have French, English, Spanish, Portuguese, maybe Russian, I don't know. We also have the same structure, but in a language not far away from here.
23:21
Dutch has a different structure. German also, just a similar structure for German. And so taking this into account, it's not easy for one translator, an automatic translator. It is not. So there are also maybe problems with Chinese or Japanese. Correct. There are other problems.
23:41
I thought there was a thing I want to say. It's a dictionary in line or in the program to have the good word because it's not translated every time, the good word. And so I thought or showed them the most efficient remote language is Esperanto, not English. Oh, that is very interesting. Yes. Okay.
24:00
Yeah, that's a great insight. Yeah, thank you for sharing that. And you're completely right. Some languages don't share the same semantic structure. And Dutch, for example, currently doesn't score super high. It's actually one of the bottom 17% of the language models. In the BLEU score, Dutch scored around 38%.
24:21
So it's almost good, but we've had some Dutch-speaking people come to us and say, you know, it could use improvement. So Dutch, yes, it is a language that needs improvement. And I talked to the maintainer of Argos Translate about the languages that need improvement. And he pretty much suggested that better training data
24:45
will help greatly. So it is mainly a problem, not of the architecture of the AI. It's that we don't have sufficient high-quality data between, say, English and Dutch to get above 38% currently.
25:02
But again, nobody has really focused on Dutch as a language. If anybody has an interest in improving Dutch, we can do better. Surprisingly- You all speak English, yeah. Oh, fantastic. But as far as, for example, languages like German, LibreTranslate currently does very well with German.
25:22
It's above 50, if I remember correctly. Is that because German is a similar language to Dutch? It is. That is because I believe, and I think PJ, that's the name of the maintainer of Argos Translate, because the German model has had a larger amount
25:41
of training data, and so it tends to perform better. Yes, that's a very good question. Dialects would probably, and that's my guess
26:03
because I've never looked into this myself, but I believe that for a dialect to perform well as a target or source language for translation, it would also need its fair amount of training data, and that is the problem with dialects. I actually speak a local Italian dialect.
26:21
That is my first language, and I wanted to make a model for my dialect, and I started looking online for references of data that I could use to create a model for my dialect because it would be cool, and it was really challenging. Not being an official language, it really lacks the status of official languages,
26:40
and finding training data is extremely difficult. But it could be possible, right? If you gather enough people that can create a ground truth data set of examples in the dialect with sufficient samples, you could get good results, I believe, so it's a matter, again, of training data.
27:03
Yes? In terms of computing power, or in terms? Yeah, computing power. Computing power, so if I remember correctly what PJ told me about the cost of training the models,
27:20
it costs maybe a few, between $12 and $30. You can rent instances on several cloud providers. You do need a GPU to train these models, and it might take a few days for it to crunch and get sufficient number of iterations to train the model, but it's absolutely affordable.
27:41
Anybody can do it, and if you are willing to wait, and you just have a gaming laptop sitting at home, if you're okay waiting 20 days for it to finish, it will train the model for you. So I guess it could be free to you if you're willing to wait a sufficient amount of time, and if you have a gaming laptop laying around. Yes?
28:17
It is, yes. It has to be licensed under a permissive license,
28:30
so Creative Commons, that also includes commercial use, and we give references and we give attribution
28:41
to all the sources that we use. If you go to the Argos package manager repository, where all the models are hosted, we do give the appropriate licensing credits to all those, but yes, we cannot go on, say, the internet and start scraping results, because everything, you just have to assume
29:02
that everything is covered by copyright until they tell you that you can use it freely, so it's only trained on openly available and freely licensed sources. It has to be translated, yeah, so very briefly,
29:22
the format of the input that goes into the training is a file that has the, say, the English sentences, and a separate file that has the translation on the same line, so it's very basic. Yeah, and somebody could do the work by hand, right? You start from the English translation,
29:41
and you start doing a translation, so it will take a lot of work, but it's doable, especially in a crowd form. Are we out of time? Okay, I'll be around if you have other questions. Our time is up, unfortunately. They're kicking me out, but the next speaker will deliver something awesome as well next talk,
30:00
so thank you again.