
The Road to Intl.MessageFormat


Formal Metadata

Title
The Road to Intl.MessageFormat
Title of Series
Number of Parts
542
Author
Contributors
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
The internet is global, and its future is by no means written or spoken only in English. With the upcoming Intl.MessageFormat addition to JavaScript, we're making it easier than ever to write and maintain apps and systems that not only speak your language, but also the languages of your users. To do that, we're redefining how localisation really works, and building a system that's interoperable with all existing data formats, workflows and processes, as well as (hopefully!) all the ones we can't even imagine yet. Let me show you this new world, and where it might lead us.
Transcript: English (auto-generated)
So hello everyone, and thank you very much for staying this late in the evening. We have our last talk: we have Eemeli Aro, who works on localisation systems and toolchain management, and he's going to give a talk on the road to Int MessageFormat. Intl dot MessageFormat. Oh, Intl.MessageFormat, I should have checked that. It's okay. Hi.
So, the last talk by Mathias, if you were here, was a lot about where we are now: what we can already provide in Pontoon, and how localisation is happening at Mozilla today. What I'm going to be talking to you about is what's coming up: some of the next things in localisation that we're working on and that we think are really quite important.
So, yeah, I'm on the same team as Mathias. I'm a staff software engineer, but I've been doing this sort of stuff for fun for ages, it feels like. It turns out that when you get really into localisation, in JavaScript in particular, there aren't too many other people who are that into it, and then somehow you might end up hired by Mozilla to do for pay the things you were doing for fun. So that's kind of nice. Hint hint, you know, it's a good company.
In addition to working on code at Mozilla, I spend a lot of time in a bunch of different standards bodies working on the standards for localisation in particular, and some of the work I'm presenting here is really work that's going on elsewhere than just at Mozilla, because we fundamentally want to make the world a better place, the internet a better place, for everyone, not just Firefox users. But, you know, everyone's internet is better if they use Firefox. You're here, so you might have heard this one before.
But yeah, on localisation. This is again covering a bit of what Mathias was saying: quite often localisation is one of those aspects of building an application or a site or anything that comes up way too late. You end up making some choices early on, and then you end up needing to live with those choices later, and they might not be the best ones.
The need for localisation comes after you've made the choices, or you discover that, oh good grief, we need to support Arabic now, that will be interesting. And the scope of localisation is interesting because there isn't necessarily one right answer.
So, of course, we're working on a new right answer. There's an xkcd comic on that; I don't have it on these slides, don't worry, but you know the one I'm talking about. Things could definitely be better, so we're trying to make some of these improvements happen. It should be easier to localize content, and there should be a common way of doing this, so that the experience and the benefits that you get from using software and libraries in one place can map to elsewhere.
Right now there are a lot of differences in how localization ends up working depending on the format you use, the toolchains you use, and all of this, and that is not optimal. And fundamentally, when you start getting deep into it, UI and UX design ends up being limited to some extent by the fact that most localization work is working around strings rather than the complex structures that something like HTML allows us to represent, along with other aspects that make life more complicated. So we want to improve all of that.
So let's start with this. This is nominally something simple. Hopefully most of you can read HTML well enough to figure out that here we have this small span that says that Brussels is the capital of Belgium. I've lived here; I know it's more complicated than that. Let's just go on. Brussels here happens to be a link. So how do we make this localizable? How do we actually localize this in a way that really works, in the end, for everyone?
One way that we're trying to build towards is something a little bit like this: you could add an identifier to the element, where you say that this is the Brussels message that we're dealing with, and include in the HTML something like what we have for CSS now: a link to a resource that's necessary for figuring out what's really the content of this page. And then separately you have a message, here in Finnish, because, you know, I can, and I could not pick between French and Flemish because it gets complicated. I've lived here, I know. "Bryssel on Belgian pääkaupunki." As for the format that we're using here, I'm going to get to that later.
But there are a couple of interesting things here, in particular the fact that we're marking up the "Bryssel" text there as the contents of a link, so that we'll be able to map that to the link, the a href, that we have in the source document in English. And because it's of course a little bit more complicated than this, the link happens to go to Wikipedia. So in this particular case, but not usually at all, we could allow the translator to say: hang on, this link in Finnish should really go to the Finnish Wikipedia page on Brussels rather than the English one.
I can present this to you; you can see the screen and you can kind of get what you're looking at here. But honestly, getting this to a state where a translator who's not a developer can see this, understand what they're supposed to do, not screw it up, and provide useful content in all the languages, well, the languages that this translator is working on, gets kind of hard. So we're trying to make that a thing.
The rest of this presentation is really going to answer three questions that I would have hoped some of you would be asking. You're not; they're really questions in my head that I wish you would be asking. You might have them as well. These are the questions the theoretical guy in my head might be asking: What's the format of this thing that we just saw? Is this really going to work everywhere? And how is this going to make my life better now, or do I need to start using this whole new thing? Because that's going to be a pain, so I don't want to do that.
To tackle the first one: the answer, ultimately to all of that, is to standardize everything. And the first thing we're going to talk about standardizing is the message itself.
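As a rough illustration of the Brussels link example, and not the syntax or API shown on the slides: assume a translated message has already been parsed into text parts and markup parts. The type names, the `renderMessage` function and the URLs below are all hypothetical. The sketch shows how a markup part can be mapped back to the link in the source document, with the translator optionally overriding the link target.

```typescript
// Hypothetical, simplified shapes: this is not the MessageFormat 2 data model.
type TextPart = { kind: "text"; value: string };
type MarkupPart = { kind: "markup"; name: string; text: string; href?: string };
type MessagePart = TextPart | MarkupPart;

// Attributes of the elements in the source document, keyed by markup name.
type SourceElements = { [markupName: string]: { href: string } };

function escapeHtml(raw: string): string {
  return raw
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build localized HTML: markup parts become links, using the translator's
// href when one is given and the source document's href otherwise.
function renderMessage(parts: MessagePart[], source: SourceElements): string {
  return parts
    .map((part) => {
      if (part.kind === "text") return escapeHtml(part.value);
      const fallback = source[part.name];
      const href = part.href !== undefined ? part.href : fallback ? fallback.href : "#";
      return `<a href="${escapeHtml(href)}">${escapeHtml(part.text)}</a>`;
    })
    .join("");
}

// Illustrative data: the English source links to the English Wikipedia page.
const sourceLinks: SourceElements = {
  link: { href: "https://en.wikipedia.org/wiki/Brussels" },
};

// The Finnish translation points the same link at the Finnish page instead.
const finnishParts: MessagePart[] = [
  { kind: "markup", name: "link", text: "Bryssel", href: "https://fi.wikipedia.org/wiki/Bryssel" },
  { kind: "text", value: " on Belgian pääkaupunki" },
];

const finnishHtml = renderMessage(finnishParts, sourceLinks);
```

If the translator leaves out `href`, the link falls back to the one in the source document, which matches the "but not usually at all" case in the talk.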
One particular thing that some of you might have noticed is that it had curly braces around the text there, around the "Bryssel on Helsingin pääkaupunki", I'm sorry, "Bryssel on Belgian pääkaupunki". Sorry. And this is because it turns out that when you're building a message formatting language like this, oh good grief, are there corner cases. Oh good grief, is it hard, like proper hard, because you're trying to write a formatting language that developers understand, and then get the developers to write content in that language that translators understand, without needing to have the developers necessarily understand how translators think.
So you need to find an intermediate language for the communication to happen, one that explicitly limits and forces the communication to work in a way that works.
And this is one of the reasons why some parts of this work have been active in a standards body for like three years so far.
But yeah, one reason for those curly braces is that quite often messages get complicated because you need to vary different parts of them depending on different variables.
In English, for instance, it matters whether it's a he or a she or a they who has done the action here of sending an invite to a party. So we need a language, MessageFormat 2, which I'm presenting to you here, to enable this sort of communication.
And of course it gets more complicated than this, because you can have stuff like here, where we need to include something more in the message: a relative time like, say, "three days ago". So the language needs to allow for internal variables of the message to be definable in a way that lets translators see what's going on, hopefully not touch it too much, because hopefully they don't need to, but still be able to do so if they really, really need to.
So this is about the space of what's possible in most current, no, in some of the current message formatting languages at least: Project Fluent, which we maintain and work with, and maybe one or two others.
But when it gets more complicated than that, this gets to the edge of what's really supported anywhere.
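The invitation example, with one selector (the host's gender) and one internal variable (the relative time), can be sketched roughly as follows. This is plain TypeScript rather than MessageFormat 2 syntax, and the type names, function names and English-only date wording are assumptions made for illustration.

```typescript
// Hypothetical input shape for the party-invitation example.
type Gender = "feminine" | "masculine" | "other";
type InviteInput = { hostName: string; hostGender: Gender; daysAgo: number };

// Internal variable: derived once from the input, then reused by every variant.
// English-only simplification; real code would use a locale-aware formatter.
function relativeDays(daysAgo: number): string {
  if (daysAgo === 0) return "today";
  if (daysAgo === 1) return "yesterday";
  return `${daysAgo} days ago`;
}

// One full sentence per variant, so a translator never has to assemble fragments.
const inviteVariants: { [key in Gender]: (host: string, when: string) => string } = {
  feminine: (host, when) => `${host} invited you to her party ${when}.`,
  masculine: (host, when) => `${host} invited you to his party ${when}.`,
  other: (host, when) => `${host} invited you to their party ${when}.`,
};

function formatInvite(input: InviteInput): string {
  const when = relativeDays(input.daysAgo); // the message's internal variable
  return inviteVariants[input.hostGender](input.hostName, when);
}

const inviteText = formatInvite({ hostName: "Maria", hostGender: "feminine", daysAgo: 3 });
```

The design point is that each variant is a complete sentence, which is what lets a language with different grammar restructure or collapse the variants.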
Here what we have are multiple different variables being defined, and then the matching on which of these messages is really the message we're building depends on how many people there are as well as the gender of the host. This isn't even a full listing of the whole set of possible "when" cases that could be selected here. But this is all possible, and it quite often happens when you really want to formulate a user experience that approaches natural language.
And this again refers to what I mentioned earlier. A lot of the choices that people are making now about how to formulate messages are driven by the limitations of the technologies that we have available. So UX itself is being driven in certain directions because message formatting is hard, and you don't end up having messages like this in your UI if you care about localization, because whoever is filtering your messages before they go to the translators, the localizers, is going to tell you: "Yeah, no, you can't do that. They're never going to be able to work with it. So please fix." And then you end up maybe even building the UI differently in order to accommodate these needs.
With MessageFormat 2, which this is, I kind of hope we can get beyond that and have the possibility of even richer content in everything that we're working with.
But the second question was: is this really going to work everywhere? And yes, and we're doing that by trying to make as much of the work as possible happen at the lowest possible appropriate level.
So a lot of this is happening in the Unicode Consortium, and then we've got work going on in TC39 for JavaScript. It's being added to the ICU libraries provided by Unicode as well, and eventually we're hoping to get, probably, WHATWG discussions going about the structure of the HTML stuff that I was showing you earlier, because that doesn't exist yet either.
One particular part of this, and my background is as a JavaScript developer, is that this is the first time we're really adding something to the JavaScript language itself at the level of something like JSON.parse, where you have a string representation of a thing that's not JavaScript, and you get an object or a thing out of it. I think that's really cool, but we're still working on that.
The part here that makes this extra interesting is that we're not just talking about a new syntax. Effectively, through the work we've been doing, it's looking an awful lot like everything in every single message formatting language that currently exists and is in use somewhere, that we can know about, that is not closed and proprietary, is supported in the data model that we end up with for MessageFormat 2.
So, for example, to answer the earlier talk's questions about how you get support for something like Fluent into software like Translate Toolkit: one quite probable answer for the general case is that you'll be able to take messages that you have in .properties files, Fluent, gettext, XLIFF, pretty much anything, and parse them into the defined data model structure for MessageFormat 2, then work with that using tools, a runtime, whatever, and possibly from there get it out in a different format altogether that's then supported by other tooling.
So a lot of this work is trying to figure out that, hang on, messages aren't really all that complicated as data structures in the end, or we can at least express the level of their complexity. So we should enable... hello again.
So, yeah, I think I was about done with this slide, and going on.
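The conversion path described on this slide (parse an existing format into a shared data model, then write it out in another format) can be sketched in miniature. The `MessageModel` type here is a hypothetical, heavily simplified stand-in, not the actual MessageFormat 2 data model, and the parser only handles single-line `key=value` entries with positional `{0}`-style placeholders.

```typescript
// Hypothetical, heavily simplified message data model.
type PatternPart = { type: "text"; value: string } | { type: "variable"; name: string };
type MessageModel = { id: string; pattern: PatternPart[] };

// Parse one line of a .properties-style file, e.g. "greeting=Hello, {0}!",
// where {0}, {1}, ... are positional placeholders.
function parsePropertiesLine(line: string): MessageModel {
  const eq = line.indexOf("=");
  if (eq < 0) throw new Error("Expected key=value, got: " + line);
  const id = line.slice(0, eq).trim();
  const source = line.slice(eq + 1).trim();
  const pattern: PatternPart[] = [];
  const placeholder = /\{(\d+)\}/g;
  let last = 0;
  let match = placeholder.exec(source);
  while (match !== null) {
    // Keep any literal text that precedes this placeholder.
    if (match.index > last) pattern.push({ type: "text", value: source.slice(last, match.index) });
    pattern.push({ type: "variable", name: "arg" + match[1] });
    last = match.index + match[0].length;
    match = placeholder.exec(source);
  }
  if (last < source.length) pattern.push({ type: "text", value: source.slice(last) });
  return { id, pattern };
}

// Serialize the same model as a Fluent-style line: "greeting = Hello, { $arg0 }!".
function toFluentLine(message: MessageModel): string {
  const body = message.pattern
    .map((part) => (part.type === "text" ? part.value : "{ $" + part.name + " }"))
    .join("");
  return message.id + " = " + body;
}

const greetingModel = parsePropertiesLine("greeting=Hello, {0}!");
const greetingFluent = toFluentLine(greetingModel);
```

Once every format has a parser into the model and a serializer out of it, tools only need to understand the model, which is the point being made about Translate Toolkit.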
One key part here is that all of this is already real. What I showed you in HTML is not exactly what we use internally at Mozilla, but it's effectively the same as how Firefox is already translated now. We have by now literal years of experience of working with tooling like this and seeing how it empowers the UX development of a relatively complicated piece of software like Firefox to improve itself, and to enable easier and better communication between developers and translators.
So we're bringing a lot of that knowledge and experience into what we're doing in the Unicode Consortium when designing MessageFormat 2, which is taking inspiration, yes, but also learnings from Fluent and many other systems, and those honestly make it better than Fluent currently is, for instance. Which is why we're not pitching Fluent as the really cool sexy thing, even though, if you're interested, it is currently the coolest thing around that's real. This is still in progress. So, you know, you could be interested in that.
As I mentioned, the syntax itself for messages is getting defined under the Unicode Common Locale Data Repository technical committee. There's a working group; it gets complicated in these things. There's an implementation available in ICU 72 for Java, and the JavaScript proposals, there are two of them currently at stage one, are progressing in TC39, which is the body that effectively defines JavaScript.
And there's a polyfill package for JavaScript if you want to start playing around with what MessageFormat 2 looks like and how you can work with it. But yeah, all of this is of course completely public.
All of the repositories and all of the standards work are being developed completely in the open. And honestly, localization is one of those weird places where we don't need to filter anyone on credentials for anything, because in terms of who gets to actually participate in the standards work, it's enough that you show up and show some level of interest, and we let you into all the inside clubs, because they're all tiny. It's a community where, if you're interested, you should not be afraid of someone saying "no, you don't belong here", because you do. We always need more people participating.
Yeah, there are links to me as well, and this talk is available at the URL there at the bottom. It's also attached to the talk on Pentabarf.
Yeah, that was me. Are there any questions?
The question was: what really makes MessageFormat 2 better than Fluent?
One particular example is, when you get to complicated stuff like this, effectively enforcing the data structure that you end up getting from this to be one that contains full messages that you end up presenting to translators.
Other than this, it gets into really nitty-gritty details. The other big benefit of MessageFormat 2 over Fluent is that MessageFormat 2 is becoming a Unicode standard rather than effectively a project built entirely from within Mozilla.
So the question here is about the sort of typing that you see, the colon number and the colon relative time, and actually the colon gender is the same sort of thing here. What are those, and are they custom or centrally defined?
And the answer is kind of yes and no, and it's complicated, because what you're looking at here are effectively functions that act a little bit like types, but they're not exactly like types.
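To make "functions that act a little bit like types" more concrete, here is a small sketch of the two kinds of function in that example: one that treats its input as a number and applies an offset, and one that picks the gender out of a larger object. Everything here (`numberWithOffset`, `genderOf`, the variant table, the `"*"` catch-all key and the English-only plural handling) is an assumption for illustration and is not taken from the MessageFormat 2 specification.

```typescript
// Hypothetical sketch; none of these names come from the MessageFormat 2 spec.
type PartyHost = { name: string; gender: "feminine" | "masculine" | "unknown" };

// A number-like function: treats the input as a number, applies an offset,
// and exposes both the adjusted value and a key to select on.
function numberWithOffset(raw: number, offset: number): { value: number; key: string } {
  const value = raw - offset;
  // English-only simplification: exact match for 0, then "one" or "other".
  const key = value === 0 ? "0" : value === 1 ? "one" : "other";
  return { value, key };
}

// A gender-like function: picks one piece of information out of a larger object.
function genderOf(host: PartyHost): string {
  return host.gender;
}

type PartyVariant = { keys: [string, string]; text: (host: string, others: number) => string };

// "*" is a catch-all key; the first variant whose keys all match wins.
// Masculine variants are omitted for brevity and fall through to the catch-all.
const partyVariants: PartyVariant[] = [
  { keys: ["feminine", "0"], text: (h) => `${h} invited you to her party.` },
  { keys: ["feminine", "one"], text: (h) => `${h} invited you and one other person to her party.` },
  { keys: ["feminine", "other"], text: (h, n) => `${h} invited you and ${n} other people to her party.` },
  { keys: ["*", "0"], text: (h) => `${h} invited you to their party.` },
  { keys: ["*", "one"], text: (h) => `${h} invited you and one other person to their party.` },
  { keys: ["*", "other"], text: (h, n) => `${h} invited you and ${n} other people to their party.` },
];

function formatParty(host: PartyHost, guestCount: number): string {
  const count = numberWithOffset(guestCount, 1); // guests other than "you"
  const gender = genderOf(host);
  for (const variant of partyVariants) {
    const [genderKey, countKey] = variant.keys;
    const genderMatches = genderKey === "*" || genderKey === gender;
    const countMatches = countKey === "*" || countKey === count.key;
    if (genderMatches && countMatches) return variant.text(host.name, count.value);
  }
  return host.name; // not reached while the catch-all variants cover every count key
}

const partyText = formatParty({ name: "Maria", gender: "feminine" }, 3);
```

Outside a sketch like this, plural categories would come from locale data (in JavaScript, for example, via `Intl.PluralRules`) rather than from a hard-coded English rule.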
They're declaring, for example, that the count that we're getting should be handled as a number, but also that the value we end up assigning to count should use an offset of one. So it's an operation happening on the input argument count. And on the third line, in the match, for the host's gender, we could imagine host being some complicated object that defines a whole person, and we're picking the gender information from that more complex thing.
But yes, in many cases they work kind of like types. In Fluent, these are the NUMBER, DATETIME and PLATFORM functions, which can be used in this sort of way as well.
Just be loud, I'll repeat your question.
So, if I've understood, the question is what happens when you have a complicated thing like a whole page that you're translating, and in comparing the source locale and the target locale, the target locale ends up having a very different structure that might go much deeper, I suppose, than just the simple link that I'm showing in this example. How does this really work?
The answer is: it's complicated and it depends on your use case. This work in particular is trying to build tools that could enable that sort of representation within MessageFormat 2, so you could end up somewhere really complicated, but you probably don't want to. In that sort of situation you probably need to build more tools that are more specific to the use case that you have.
When you need to reformat a whole page in order to work with a specific locale, there is no universal answer. This is the closest thing, but I don't know where it's really going to go.
We have a question in the live stream.
"Translators often are not programmers. They already struggle when translating strings with HTML tags and other technical terms. The MessageFormat curly-braces syntax might be difficult to understand and error-prone."
So here we're talking about something like, let's take this example: what if you put this in front of a translator? Yeah, you don't. This is not really what we want to do. What we want to do is create a format that enables, like HTML, a representation of something like a message in a way that is relatively readable, but is not necessarily easy to edit and modify for someone who doesn't exactly know what they're dealing with.
It's a little bit like what happens if you take JSON and put it into a Word document, and then you start editing it, and then you have to figure out that, oh, there's a curly quote somewhere that ended up screwing things up. This sort of thing can entirely well happen when you end up dealing with complicated messages like this.
So the answer here is that you end up using tooling that gets this to be presented to a translator not as one thing but as, in this case, three or more different messages, where you end up asking a translator once to translate "name invited you to her party on relative date", and then a second time to translate "name invited you to his party on relative date".
And in Finnish, you allow for the fact that "he" and "she" translate to the same word, so in Finnish the equivalent of this message would end up being effectively just a third case, without the whole matching, because the structure of the language works differently.
So when working with messages of this level of complexity, you do end up effectively needing to rely on tooling. But the wonderful thing about MessageFormat 2 is that we can transform this representation of the message into any other representation of it that's hopefully going to work with whatever tooling is available for the actual translation work to happen in: XLIFF 2, for instance, or other targets that are commonly supported by software used for translation, or some really simple representation that can then be mapped back to this but still allows a translator to see a simpler thing at once rather than a really complicated thing.
I think there are more questions, but are we out of time?
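The tooling step described in this answer, where one message with several variants is shown to a translator as several plain strings, might look roughly like this. The `SelectMessage` and `TranslationUnit` shapes, the `toTranslationUnits` function and the placeholder notation inside the strings are all hypothetical.

```typescript
// Hypothetical shape for a message with selectors and variants.
type SelectMessage = {
  id: string;
  selectors: string[];
  variants: { keys: string[]; text: string }[];
};

// What a translation tool might show instead: one plain string per variant,
// with a note explaining when that string is used.
type TranslationUnit = { unitId: string; note: string; text: string };

function toTranslationUnits(message: SelectMessage): TranslationUnit[] {
  return message.variants.map((variant) => {
    const conditions = message.selectors.map((selector, i) => selector + " = " + variant.keys[i]);
    return {
      unitId: message.id + "[" + variant.keys.join(",") + "]",
      note: "Used when " + conditions.join(" and "),
      text: variant.text,
    };
  });
}

const partyInvite: SelectMessage = {
  id: "party-invite",
  selectors: ["hostGender"],
  variants: [
    { keys: ["feminine"], text: "{$name} invited you to her party {$relativeDate}" },
    { keys: ["masculine"], text: "{$name} invited you to his party {$relativeDate}" },
    { keys: ["*"], text: "{$name} invited you to their party {$relativeDate}" },
  ],
};

const inviteUnits = toTranslationUnits(partyInvite);
```

Because each unit keeps the variant keys in its identifier, the translated strings can be mapped back into the structured message, and a target language such as Finnish can supply a single catch-all variant instead of three.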
Two minutes. The guy in front, in yellow.
If I understand the question right, you're asking how we make sure that this isn't just something that works for English and a couple of languages around English, but hopefully for all languages, or a sizable number of languages.
The short answer is that with Fluent we're already doing exactly this, using a representation of messages that is very close to this. So at Mozilla, from this experience, we can say that the structure we have for Fluent, which is simpler than this, ends up working in all of the languages that we need to deal with through Fluent, which is about a hundred for Firefox and 200 overall across all of the different projects that we are currently translating.
Separately from this, the work being done for MessageFormat 2 is by no means done purely from an English-language point of view. Of the main contributors currently working on the specification, my background is Finnish, there's a Polish guy, there's a Romanian, and there's a Sri Lankan, and there are a couple of others on the periphery of this who are from a much wider variety of backgrounds. So we are ensuring that these sorts of considerations are actively remembered and taken care of.
So to some extent we are relying on the expertise that we have, and to some extent on the experience we have of working with formats similar to what we're presenting here. But we're also trying to build a core specification for message formatting that is sufficiently small, but modular and powerful enough to enable later the support that is required by human languages.
We're trying to limit it to just being able to support human languages, but it might go a little bit beyond that too.
I think we're at time. I'm very happy to have people come and ask me questions afterwards. Thank you.