
Tower of Babel or How to turn an elephant into a polyglot

Formal Metadata

Title
Tower of Babel or How to turn an elephant into a polyglot
Title of Series
Number of Parts
35
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This talk is about how the community handles i18n and l10n right now and how we can extend the translation experience with new online platforms. These platforms are absolutely harmless to the current processes: they may be used together with good old tools, e.g. Poedit, KBabel etc. This is an interesting topic that is poorly covered at conferences and meetups. I'm talking about localization of PostgreSQL and other community software (possibly even commercial). Right now we're using https://babel.postgresql.org/ to manage translations, but I have had excellent experience with online translation tools such as transifex.com and crowdin.com, which are absolutely free for open source. I've even created a PostgreSQL organization already, so I'm the boss ;) https://crowdin.com/project/postgresql There are a lot of benefits, e.g. painless introduction of new languages, one translation memory for all products, the ability to pay for translation (commercial software), team collaboration (translators vs reviewers) etc. With online translation, you may involve new people in the community, e.g. a new Ukrainian translation is in progress by high-school teenagers under my mentoring. But we also have a GSoC project for students. Translation is easy to mentor and fun to do. Agenda: 1. Intro to l10n and i18n 2. Current state of affairs 3. Demo of the localization process 4. Overview of the online platforms for open source 5. Demo of the online localization process 6. Questions
Transcript: English(auto-generated)
So hello everybody, my name is Pavlo Golub. I live in Ukraine, and nowadays probably everyone knows where it is thanks to the HBO TV series Chernobyl. I work at Cybertec, and today I want to talk about translation, the internationalization and localization process, and how it is done.
And in what way we may improve our processes.
How did I get to it? First, this year we started to build our Ukrainian community, and I think that the task of translating Postgres itself, its infrastructure and other programs
may help in building our community. So here I am. About my company: we do everything. Our headquarters is in Austria, but we have offices in Estonia, Switzerland and Uruguay.
Our clients, some of them. Our services. So let's go. Some facts.
72% of consumers say they want to buy or use products with information available in their own language. If content is offered in only one language, usually English, it can address at most 30% of the total online population.
And to cover the whole Earth, we would need about 7,000 languages. But if we want to cover 80% of the population, 83 languages are enough.
I'm using Wikidata here, the top 10 internet languages. I really like these precise numbers, you know.
Well, it's not easy to get an idea of how it looks, so I created a chart. As you see, 25% of internet users use English, then 20% Chinese, 8% Spanish, 5% Arabic, Portuguese.
That means Brazilian Portuguese and European Portuguese together. Then Malaysian, French, Japanese, Russian and German, and other languages are 22%. Why would anybody want to translate a product, especially open-source Postgres?
First of all, you might want your product in the government sector, in the military sector, in healthcare. In that case it must be translated; for example, Postgres Pro translated their product to fit such requirements.
Also, if you want to teach Postgres to your students, you should translate it as well. Some corporations have restrictions on using untranslated applications.
And some kinds of licensing may require this. You probably want to translate if you want to build new local communities, or if you want to improve user adoption, especially among students, among high school students.
If you're not open source but enterprise, you probably want to open new markets in other countries. Or you want to build credibility: if people see an application in their native language, they feel more comfortable.
So, why would anybody want localization? Localization helps us spread Postgres. It helps us build a community.
And this process may be done by non-developers without any knowledge of the internal architecture or structure, without any knowledge of programming languages. And this is an excellent area for beginners: our Ukrainian localization is done by a team of four high school students under my mentorship.
But it's working. What is internationalization? It is the process of designing software so that it can be adapted to various languages and regions without any engineering changes.
And localization is the process of adapting already internationalized software for a
specific region or language by translating text and adding locale-specific content. So an application may be internationalized but not yet localized.
And when we talk about regions and languages, we mean that, for example, Canada has three languages. There's English, French and I forgot the third.
For example, we have different variants for Australia, the USA and England. They are all English, but with specific differences. Postgres and most of the applications in our infrastructure use the gettext library.
Gettext is a library for internationalization and localization. It supports many languages. I think that even Klingon may be imported there.
It supports plurals and genders, and it supports context comments. So developers may note what exactly a string means, to help the translator understand what exactly the developer wants to say.
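To make the plural support a bit more concrete, here is a minimal Python sketch of how a three-form plural rule for Ukrainian could be applied. The function names and the sample strings are hypothetical and are not taken from the Postgres catalogs; the rule itself is the usual East Slavic one that a PO file header would declare in its Plural-Forms line.

```python
def plural_index_uk(n):
    """Pick which of the three Ukrainian plural forms applies to the number n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0                                  # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1                                  # 2-4, 22-24, ...
    return 2                                      # 0, 5-20, 25-30, ...

# Hypothetical catalog entry: one source string, three translated forms.
FORMS = ["%d таблиця", "%d таблиці", "%d таблиць"]

def ngettext_uk(n):
    """Return the translated string with the right plural form for n."""
    return FORMS[plural_index_uk(n)] % n

for count in (1, 2, 5, 11, 21, 22):
    print(ngettext_uk(count))
```

English needs only two forms (singular and plural), which is why the source code alone cannot carry this rule and the catalog has to.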
When we use gettext, our translators do not need any sources. All strings are gathered into a PO file. PO stands for portable object. It's just a text file.
You may edit it with any editor. After the work on the PO file is done, it is compiled to an MO file. MO stands for machine object. It's a binary file. Then at run time the application loads the MO file into memory, and each translated string is looked up by hash and used.
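The PO-to-MO step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_mo` is a hypothetical helper, the two sample strings are invented, and the optional hash table of the MO format is left out (Python's loader keeps the strings in a dictionary instead). Real projects would use `msgfmt` or an editor such as Poedit for this.

```python
import gettext
import io
import struct

def build_mo(catalog):
    """Serialize {msgid: msgstr} into the binary GNU MO layout (no hash table)."""
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}  # header entry
    entries.update(catalog)
    # msgids are stored sorted, as the MO format expects
    items = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())
    n = len(items)
    orig_table_off = 7 * 4                 # right after the 7-word header
    trans_table_off = orig_table_off + 8 * n
    data_off = trans_table_off + 8 * n
    table, blob = [], b""
    for column in (0, 1):                  # first all msgids, then all msgstrs
        for item in items:
            text = item[column]
            table.append(struct.pack("<2I", len(text), data_off + len(blob)))
            blob += text + b"\0"
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table_off, trans_table_off, 0, 0)
    return header + b"".join(table) + blob

# Hypothetical two-string catalog, loaded back the way an application would.
mo_bytes = build_mo({"table": "таблиця", "index": "індекс"})
uk = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(uk.gettext("table"))   # translated
print(uk.gettext("view"))    # not in the catalog, so the source string comes back
```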
As an example, here is psql. As we see, fprintf wants a localized string.
The underscore is a macro for the gettext function call, just to make it shorter. So underscore means gettext. Gettext accepts a string.
Then, if the string is available in the MO file, gettext finds it and replaces it with the translation. We use xgettext to generate PO files.
This is a POT, which means template. When we first generate a PO file, it is called a POT, portable object template. That means the message strings are empty. Then, if we want to introduce a new translation, we rename this file to psql underscore whatever: uk if it's Ukrainian, or ro if it's Romanian.
And we work with it, filling in these message strings using a simple text editor or special editors.
So, what is under the hood? Postgres uses the gettext library. Developers call the gettext function through the underscore; the underscore is a macro for the gettext function.
Then the strings are parsed out of the source code and gathered into one PO file. Translators then work with these PO files. Then the PO files are compiled into MO files, and the MO files are used at run time.
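The extraction step can be imitated with a toy extractor. This is a rough sketch, not how xgettext is implemented: the regular expression, the `extract_pot` helper and the C snippet are all made up for illustration, and the real tool also records the source file and line of each string.

```python
import re

# Hypothetical C fragment with two marked strings and one unmarked string.
C_SOURCE = r'''
fprintf(stderr, _("could not open file \"%s\"\n"), name);
puts(_("Password: "));
puts("not marked, so not extracted");
'''

# Matches _("...") and allows escaped characters inside the literal.
MARKED = re.compile(r'\b_\(\s*"((?:[^"\\]|\\.)*)"\s*\)')

def extract_pot(source):
    """Collect marked strings into template entries with empty msgstr."""
    seen = []
    for match in MARKED.finditer(source):
        msgid = match.group(1)
        if msgid not in seen:          # each string appears once in the template
            seen.append(msgid)
    return "".join('msgid "%s"\nmsgstr ""\n\n' % msgid for msgid in seen)

print(extract_pot(C_SOURCE))
```

Only strings wrapped in the underscore call end up in the template, which is why the marking in the source code matters.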
Okay, so what do we have now? The same top 10 internet languages, but now let's check, for each version of Postgres, how much is translated into that language.
So, English is 100%, no problem. Chinese 90 plus, not bad. Spanish 100%. Arabic zero for all. Portuguese, but this is not European Portuguese. This is the Brazilian Portuguese version, so 86%.
Indonesian, French, Japanese, Russian are 100% and German 99, not bad. Except for the Arabic language. I have no idea if we have many customers or users in the Arabic world, but I think not.
Yes, oh yes, yes. Probably. We have Hebrew translated to 20%, but yes, RTL, yes.
Okay, to start working on translation right now, these three links are enough.
First is the wiki, where the whole scenario I will show is described. The second is babel.postgresql.org, where one can download any PO file for translation.
Or, if we are talking about a new language, we download the POT file, rename it to the proper PO file name and work with it. And the third is the pgsql-translators mailing list, where all this is discussed and where patches may be proposed.
However, we also have a Redmine ticketing system where one may upload patches too. So the workflow is simple. We go to babel.postgresql.org, we choose a language, choose a release.
It should be the latest stable release if we are starting from scratch for a new language. Or it may be the master branch, which is uploaded to babel.postgresql.org after the beta process has started for the new release.
We download the POT or PO file. If our language is already partly translated, we download the PO file.
If we are starting from scratch, we download the POT file, rename it and work with it. There are several editors. The most famous is Poedit. It's fine.
KBabel, Emacs, Sublime. Sublime probably needs some plugins. Then we review, check it and submit it to the mailing list or to Redmine.
Then it depends on how good our translation is. Is it compatible? Someone from the committers may apply the patch. Okay. So let me show you how this...
So we are starting Poedit. Let's find... This is how our babel.postgresql.org looks.
It's just a table. As I said, the beta for version 12 has started, so we have master here. As you can see, it's absolutely white. White means that there is no translation or only a little.
Yellow means that this particular file is 90% or more translated. And green means that the file is translated completely. And the red ones, in red color, for example pg_rewind for Japanese, contain some errors.
They need to be fixed. Okay. So let's... 27. Yes, it's 27.
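The color coding of that table can be approximated with a short Python sketch. Everything here is an assumption for illustration: the tiny PO text, the single-line entry parser and the thresholds (anything under 90% is lumped into "white", and the red error state is not modelled) are not taken from the Babel site's code.

```python
import re

# Hypothetical three-entry PO file: header, one translated and one empty string.
PO_TEXT = '''
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

msgid "table"
msgstr "таблиця"

msgid "index"
msgstr ""
'''

# Only handles single-line msgid/msgstr pairs, which is enough for a sketch.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def translated_percent(po_text):
    """Share of non-header entries that have a non-empty msgstr."""
    entries = [(msgid, msgstr) for msgid, msgstr in ENTRY.findall(po_text) if msgid != ""]
    if not entries:
        return 0.0
    done = sum(1 for msgid, msgstr in entries if msgstr)
    return 100.0 * done / len(entries)

def status_color(percent):
    """Map a completion percentage to the table colors described in the talk."""
    if percent >= 100:
        return "green"
    if percent >= 90:
        return "yellow"
    return "white"

percent = translated_percent(PO_TEXT)
print(percent, status_color(percent))
```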
Yes. Right now we have... But some languages are not well represented. For example, Hungarian, Afrikaans, Dutch. I have no idea how they appeared here.
So anyway, let's try to download. I don't know. Let it be pltcl. Okay. So I save it. I save it as a PO file and, okay, let it be uk.
So I save it. Then we should open it in Poedit. And that's how the user interface looks.
Oh, very bad. So we have a list of strings. Then we have a translation memo field where we can enter our translation. And on the right, you can see that we may use translation suggestions from Microsoft Bing translation, I think.
But I know that we may use Google as well. And in the Pro version of it, probably we may use the DeepL translator and some others.
So, okay, let's... Processing parameter. So I don't know.
Oh, yes, maybe. No, no, no. Much better.
Okay. So processing parameter. Yeah. Okay. Something like this.
And as you see, we have %s, which will be replaced by some argument when this function is called. Okay. And we move on to another string.
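Because the %s placeholder has to survive translation, a reviewer could run a quick check like the following Python sketch. The helper names and sample strings are hypothetical, and the check is deliberately strict: it expects the same conversions in the same order and does not handle positional forms such as %1$s.

```python
import re

# printf-style conversions such as %s, %d, %lu, %05.2f; "%%" is a literal percent sign.
SPEC = re.compile(r'%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?[diouxXeEfgGcsp])')

def placeholders(text):
    """List the conversions in a message, ignoring literal %%."""
    return [spec for spec in SPEC.findall(text) if spec != "%%"]

def same_placeholders(msgid, msgstr):
    """True when the translation keeps the source's conversions in order."""
    return placeholders(msgid) == placeholders(msgstr)

print(same_placeholders('processing parameter "%s"', 'обробка параметра "%s"'))
print(same_placeholders("%d rows", "рядків"))
```

A translation that drops or changes a conversion would make the program print garbage or crash at run time, so this kind of check is worth running before submitting a PO file.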
And we repeat this process until the end. Then we save the file. Then we try to compile it; Poedit compiles it automatically. So let's see if it's... Yes. Now you can see we have the PO file and we have the MO file, because Poedit compiles it automatically.
Okay. Then we may take this PO file and send it to the translators mailing list or to the Redmine ticket system.
Okay. Okay. That process works well and it's professional enough and it's cool.
Probably all the translation we did for Postgres was done this way. But when we started with our young community, I understood that high school kids cannot use such a complicated process
and they need something lighter or something more familiar. And I remembered that nowadays we have several online translation environments
which may be used for this purpose. Also, some of these files are too big to be translated by one person in a reasonable time. And we agreed that every day, every student will translate at least 10 strings.
So we will finish by September, before Postgres 12 is released. So the proposed workflow is that we use the crowdin.com translation service.
I registered the project there. So we go to this service, we choose a language, we choose a Postgres release, the same as on Babel.
We work with it online using only our browser. Then after translation, someone, me in this case, will review these strings. And after review, I download the PO file and upload it to the Redmine ticket system or send it to the mailing list.
So let me show how this should look.
Okay, so here we have Crowdin. Can you see it? Maybe I should... it's better. I logged in under my account, so this is the admin view. The view for a translator is a little bit simpler, but the main idea is the same.
So we have a home dashboard where our available languages are listed. We see two numbers, percentages.
For example, for Ukrainian, 49 and 14. 49 means that 49% of all strings are translated. And the second one, 14, means that 14% of all strings are reviewed. So let's see how the user interface looks.
When we choose a language, we should then choose the file on which we will work. As you can see, the structure is very similar to what we have in the Postgres sources.
That is because I uploaded these files using import from GitHub. And when you import from GitHub, Crowdin preserves the whole structure of directories, folders, etc.
But if someone wants to upload just a list of files,
then the information about the source structure will be lost and we will see only a list of files. Okay, let's try it on.
Okay, so now we are in the translation user interface. It's pretty much the same as Poedit, except that all strings are on the left side.
And in the middle you have the information about the string currently being translated. You see that some of the words are highlighted.
If a word is highlighted, that means that this particular term is in the glossary, and it should be translated as the glossary says. Moreover, the glossary has a short description of what this term means.
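The glossary highlighting can be pictured as a simple term lookup. The Python sketch below is an assumption about the general idea, not Crowdin's implementation; the glossary entries and the `glossary_hits` helper are hypothetical.

```python
import re

# Hypothetical glossary: source term -> agreed translation.
GLOSSARY = {
    "table": "таблиця",
    "index": "індекс",
}

def glossary_hits(source, glossary):
    """Return the glossary terms that occur as whole words in the source string."""
    hits = []
    for term, translation in glossary.items():
        if re.search(r"\b%s\b" % re.escape(term), source, re.IGNORECASE):
            hits.append((term, translation))
    return hits

print(glossary_hits('could not open table "%s"', GLOSSARY))
print(glossary_hits("tablespace is full", GLOSSARY))   # "tablespace" is a different term
```

Matching on whole words keeps "tablespace" from being flagged as "table", which is the kind of consistency the glossary is there to protect.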
Then we see context information: the source file and line, and what format it is. Then here we have our edit field where we should put our translation.
And let's see, untranslated something. Okay, so here we have an untranslated "table". As you see in the glossary, it has a description, it has a translation.
And here at the bottom in the middle, there are three items. First, if there are some similar translations...
Okay, let's return to the show O. Okay, if there are some similar translations, they will be shown here, along with the translator who proposed them.
And is it approved or not?
Is it reviewed or not? You may delete it, but that is for an admin or a reviewer. Then we have TM, which means translation memory. Translation memory is a table or database which is used to store all translated strings.
And it looks in its table to see if the string currently being translated is similar to one already translated.
And if it is, then it is shown here. And for this particular one, we see that it's a perfect match, 100%. That means that this particular string has already been translated and it is in the translation memory.
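A translation memory lookup can be sketched with Python's standard difflib. This is a stand-in under stated assumptions: the talk does not say how Crowdin scores similarity, and the stored strings, the `tm_suggestions` helper and the 70% threshold are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source string -> stored translation.
TM = {
    'table "%s" does not exist': 'таблиця "%s" не існує',
    "Password: ": "Пароль: ",
}

def tm_suggestions(source, memory, threshold=0.7):
    """Return (match percent, known source, translation), best match first."""
    scored = []
    for known, translation in memory.items():
        ratio = SequenceMatcher(None, source, known).ratio()
        if ratio >= threshold:
            scored.append((round(ratio * 100), known, translation))
    return sorted(scored, reverse=True)

print(tm_suggestions('table "%s" does not exist', TM))   # exact hit, a perfect match
print(tm_suggestions('index "%s" does not exist', TM))   # similar string, partial match
```

An exact hit can be reused as is, while a partial match gives the translator a starting point that only needs the differing word changed.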
So we may use it and that's all. And last but not least, we may use other languages to see how this string was translated in other languages.
For example, if we are working with Ukrainian, we may use Russian, Czech or Polish or other Slavic languages to get an idea of how other translators translated this string.
And we may be more consistent with other translations. And on the right, we have comments for each string. So if a translator has no idea what this means, or has some ideas or whatever,
he or she may leave a comment for this particular string. If the translator thinks that the string is wrong, is incorrect or has some typo or whatever,
he or she should just mark it as an issue, and the reviewers and the administrator will be notified and may check what exactly is wrong with this particular string and fix it.
Here we have the translation memory, so let's try to find some "table". Yes, we have table, and we have plurals for tables, and we have all tables, child tables, foreign tables.
So if you're stuck with some word, you may go through the list in this translation memory, see in what context the term or word was used, and decide how to translate it. And the last is the glossary.
It's the same as this floating window, the same on the right. So you may see exactly which terms from this particular string are in the glossary,
what they mean and how they are translated. So that was the translator view. Let's switch to proofreading mode.
Proofreading mode, not approved, okay.
As you can see, it's a much simpler interface for the reviewer. So you go through all translated strings, and if you are fine with one, you just mark it as approved, approved, approved. There are hotkeys for this.
If you think that something is wrong, you may change it and then save,
and then your version will be saved as the translation and automatically marked as reviewed. Okay, so let's quit the editor. After we have worked on our file, we may download it.
Let's save it. Yes, I want to replace it locally. And the process is the same as in our old scheme. We send it to the translators mailing list or we add it to our Redmine ticket system.
Okay, as for administration, Crowdin has integration with GitHub and other source control systems.
But since the Postgres GitHub is only a mirror of the real source control system, we cannot fully integrate it with the Crowdin translation environment.
However, I think that using files from babel.postgresql.org is a much better approach, since we may be sure that our files are consistent.
So, for example, the admin may update the Crowdin translation files weekly
from Babel and be sure that there will be no forks between these files. What are the pros of an online translation platform?
First of all, you don't need special editors, you don't need any third-party software, you don't need to know how to work with it. Second, it is a unified translation environment.
So every translator, every reviewer has the same user interface. And if someone is stuck, you may help them because you have the same user interface in front of you. In an online translation environment, we have a dashboard and notifications.
As I said, if someone, a translator or a reviewer, marks a string as an issue, then all reviewers and the administrator will be notified about it.
You may be notified about the completion of some file's translation. You may chat with each other inside this environment about particular strings or files.
It keeps the history of all translations, so at any moment we know who translated this particular string, when and how, and how it changed over time.
Of course, it has a glossary; it's a huge help for everybody. Using the glossary, your translation is consistent and you spend much less time
remembering, oh, how did I translate that specific term, which I met only once and now need to remember. And of course, this environment has a system of suggestions. Crowdin may use Google, Crowdin may use Microsoft Translator, and Crowdin may use DeepL Translator.
However, DeepL is not free, so I don't know what to do here.
But it's the best, I tried it. So, it has instant saving, instant renewing on the same screen.
Yes, Crowdin is a paid platform for enterprise applications, but it's absolutely free for open source. So if you are an open source project, you get all the functionality of the most costly plan. So we may translate Postgres, we may translate pgAdmin, we may translate
Patroni, we may translate whatever open source projects we have in one place. And all these projects will share the same translation memory and the same glossary.
So, if it's, okay. And the role system: we have admins, translators and reviewers, and for each project we may specify who will be in charge and who will be a translator.
We may grant permission to translate or we may not. So not everybody may translate this file, for example. Before a translator is able to translate, they ask the admin: may I translate this particular file for that particular language?
And if the administrator agrees, then he grants permission for translation. This is a message from Louis Trindat.
And he asks, is it possible to translate Postgres into the European Portuguese locale?
But this is our conversation. But what about Tomas van der Rohe? Accept. So, you have notifications.
Okay. New join request for the PostgreSQL project, Taki. He wants to participate in the Hebrew translation. "I would like to help translate PostgreSQL in Hebrew", and there will be accept, decline or chat.
So this is just like a social platform. Some of them are employees of some companies and they translate on this site.
Some of them are professional translators and they are looking to help or to be paid for this. Just like Transifex, the same thing.
So Crowdin is interested in having as many translators as possible, because each translator has their own rank or, I don't know how to call it, karma or whatever. And the cooler you are, the more work you have and the more you are paid.
So if you translate an open source project, it will raise your karma and you become a more valuable translator on this platform.
Yeah. Yes.
I think that it's even more convenient because this system supports not only the PO format, but XLIFF, DOC and some more.
So probably we may just upload SGML or HTML or whatever we have, get it translated as is, and download it back. But I didn't check this. I only worked with PO files.
Okay. And the last: we may hire pro translators if we want, or if our sponsors want to hire anybody.
As I said, this is a social platform where translators live. And if someone thinks that paying someone to translate into a particular language is okay,
then it's fine, I think. Okay. Cons of online translation. Of course, you have to be online. The browser is not the fastest environment.
Hotkeys may interfere with the browser and the extensions you have. And of course, the free translation plan is only available for open source. So if we want to translate EnterpriseDB or Postgres Pro, we should pay money.
Okay. Questions? No questions? Then I'll show you what I discovered about Poedit. It turned out that Poedit in the latest version, oh, sorry,
can connect to Crowdin directly. So you may use whatever file you want from your Crowdin account, your Crowdin project.
Set okay. And this file is translated; you work with it offline or online. And after you hit Ctrl-S, save, all these translated strings go to Crowdin automatically.
And I think this is fine. You download some file before your flight, you work on the airplane, then you land, hit Ctrl-S, and all your translations are on the server. Yes.
Well, that's it. Thank you.