Data internationalization in Django

Formal Metadata

Title
Data internationalization in Django
Title of Series
Number of Parts
50
Author
Contributors
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
DjangoCon US 2018 - Data internationalization in Django by Raphael Michel. There is a multitude of options to translate database data in Django, for example django-parler, django-modeltranslation, django-nece, django-hvad, and django-i18nfield (which is my own). The interesting thing is that these libraries are not multiple implementations of the same thing, but they are all radically different in their design and there are good reasons for every one of them. The sometimes subtle differences might not be obvious to a beginner in the Django world. This talk will help them navigate through different solutions and make an informed decision.
Transcript: English (auto-generated)
Good afternoon from my side as well. My name is Raphael, I'm a software developer from Heidelberg in Germany, and I've been the co-chair of DjangoCon
Europe this year. As this conference is coming to a close shortly, I want to take the opportunity to thank the organizers for putting on this conference, because once you've done something like this, you really get to appreciate not having to do it and just sitting here
and enjoying the wonderful conference that they created for us. Thank you so much for bringing me here to see this. Now let's get to the topic I want to talk about, which is data internationalization in Django. Before we do that, let's briefly recap what internationalization in Django
without the data could mean. We've had a talk, I think yesterday, on the features that allow you to translate your application into other languages.
This means that you can use certain functions to mark strings in your code or in your templates to say: okay, this is a string that is different when my application is not used in English but in another language. Then you can use the gettext toolkit, which is the standard translation toolkit in the whole Unix world, to extract all of
those strings, create a file with all the translations, send it out to translators, collect the translations again, compile them and use them. That's great, but sometimes it's not enough. One of the open source projects I'm maintaining is pretix, which is an open source ticket shop for events just
like this one, and it allows you to create a shop that speaks to a multilingual audience. So if you have attendees coming from different countries, speaking different languages, you might want to present your shop in
multiple languages at the same time. That means it's not sufficient if only the application is translated; you need to translate data as well. By data I mean things that are entered by the administrator of the shop, for example the names of the products or the description of how to get to the event and so on. That's all something that needs to be entered in multiple
languages and then must be shown in the output in the language of the respective user. So we need forms, or some kind of input method, that allow us to store data in multiple languages. It's not really suitable to use gettext
for that, because every time someone changed something we would need to generate certain files, send them to the translators and so on, and that just doesn't work. So we cannot really use the tools provided by Django. Surely there is a third-party library that we just need
to install and that will solve this problem for us? I've got good news and bad news for you. The good news is: there is such a library. The bad news is: I counted 23 of them until I stopped. So who in this room has ever used one of these libraries? Okay, that's not that many. Is there someone here who has
written one of these libraries? Okay, that was more people at DjangoCon Europe last year. Disclaimer: I'm the author of django-i18nfield. I put those seven at the top that appear to be actively maintained. By that I
mean they at least have a development branch that is compatible with Django 2.1 and they had commits in the last six to twelve months. So I will focus on giving you a brief overview of those top seven libraries, because the problem is that there is not a single best library for
this use case: they are fundamentally different in their approach and differently appropriate for different use cases. If I get something wrong about one of these libraries, please feel free to correct me later on Slack. I haven't used most of them in an actual
product, but I've played around with all of them. To compare these, we want to look at different categories: we want to look at how the data is stored in the database, we want to know what their Python API looks like and how easy
it is to work with, what other features they might provide, for example integration with the Django admin, integration with forms and so on, and we are interested in whether they have a significant performance impact and how large that is. To have an example to work with, let's use a model where we
store a list of movies, and for every movie we want to store the title of the movie and the year the movie was released. Obviously the year is something that is not really locale-dependent, although it might be, but the
title is certainly something that is different around the world even though it's the same movie. So let's look at how different libraries try to represent that in a database. The first approach, which we see for example in django-hvad and django-parler, is to have a separate table that contains the translated strings. In our main table, movies, we have just the
untranslated attributes, with the ID of the movie and the year, and then we have a second table where every row references an object in the main table and says: okay, for English this is the name of the movie, and for
Italian this is the name of the movie. In terms of relational databases this is a very clean approach; it fits the "normalize until it hurts" rule that we learned yesterday. And if we want to build our shop front end, or rather our movie
list, and we want a list of movies with the Italian title for every movie, then this is also very efficient, because modern databases are very good at performing joins. However, in the back end, for example, where we want a list of the movies that shows every language per movie, this gets really expensive, because we need to
do either a lot of queries or a lot of work on the queried data. A different approach, seen in django-modeltranslation or django-translated-fields, is to just have separate columns per language. This way you don't need any joins and it's very cheap to get all languages at the same time.
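As a rough illustration of the two layouts described so far, the following sketch uses Python's built-in sqlite3 module rather than any of the libraries mentioned. The table and column names (movie, movie_translation, movie_wide, title_en, title_it) and the sample row are assumptions chosen for this example; the actual libraries generate their own schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schemas for illustration only.
conn.executescript("""
    -- Layout 1: main table plus a translation table (one row per language).
    CREATE TABLE movie (id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE movie_translation (
        movie_id INTEGER REFERENCES movie(id),
        language_code TEXT,
        title TEXT,
        PRIMARY KEY (movie_id, language_code)
    );
    INSERT INTO movie VALUES (1, 1966);
    INSERT INTO movie_translation VALUES
        (1, 'en', 'The Good, the Bad and the Ugly'),
        (1, 'it', 'Il buono, il brutto, il cattivo');

    -- Layout 2: one wide table with a title column per language.
    CREATE TABLE movie_wide (
        id INTEGER PRIMARY KEY, year INTEGER, title_en TEXT, title_it TEXT
    );
    INSERT INTO movie_wide VALUES
        (1, 1966, 'The Good, the Bad and the Ugly',
         'Il buono, il brutto, il cattivo');
""")

# Front-end case: one language for every movie takes a single join.
italian_titles = conn.execute(
    """
    SELECT m.id, m.year, t.title
    FROM movie AS m
    JOIN movie_translation AS t ON t.movie_id = m.id
    WHERE t.language_code = ?
    """,
    ("it",),
).fetchall()

# Back-end case: with per-language columns, all titles sit in one row.
all_titles = conn.execute(
    "SELECT title_en, title_it FROM movie_wide WHERE id = ?", (1,)
).fetchone()

print(italian_titles)
print(all_titles)
```

In the first layout a new language is just a new row in movie_translation, while in the second layout it means a new column on movie_wide.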
However, every time you add a new language you need to do a database migration, which can be very annoying. The third style, which we see used in django-i18nfield, django-nece and django-modeltrans, is to use a JSON-like field. This is less clean in terms of database normalization, but we
don't need to do any joins, we don't need to make any changes to our schema when we add languages, and it's all contained in one field as we had it before. If we're on Postgres and if we use the Postgres
native JSON data type, we can still retain the ability to filter by the translations, search for the name in a specific language, or sort by name. If we're not on Postgres, we lose the ability to index or query that data. That might be a problem or might be totally fine for your use
case. django-modeltrans is a bit different from the other two: it uses not one JSON column per translated column but only one JSON column for the whole table, no matter how many fields you translate. But apart from that, they're pretty similar. django-nece and django-modeltrans only
work on Postgres; django-i18nfield drops the indexing and filtering possibilities but works on all databases. I'll be working at the sprints on something that uses the Postgres data type when you are on Postgres and
gracefully falls back to a text field on all other databases. Next we want to look at how you define your models, and there are again a couple of different styles. For example, in django-hvad, django-parler and django-nece you have a custom base class that you inherit your models from,
and they will change your query manager and a lot of things about how your model works in order to build those joins for you, or translate your queries for you, as automagically as possible. Sometimes you need to wrap your fields in some wrapper object to tell the library which ones you
translate, sometimes you have an additional meta option, but in the end it's the same thing. The other style is that you have a custom field type and do not change the way the model itself works at all. For example, in
django-i18nfield or django-translated-fields you just have a custom type that is a translated character field or translated text field, whereas in django-modeltrans you have one field per model that is
called i18n, or whatever you want to call it, and that stores the translations for all other fields. The third style is to decouple it from the model definition process completely and have a separate registry where you register those options. This is, I would call it, the most
unclean style of doing it, because it's not obvious where your code lives. On the other hand, this allows you to enable translations for models that are not in code that you control, which might be
something you need, and that is where such registration patterns come in. The third thing I want to look at in detail is how to interact with your model objects. In some of the libraries you can only interact with one language at
a time. This is mostly because of the joins that they're performing: if you pull the object from the database, it will pull the information for one language, which you need to specify within your query, and then the title attribute of the object will be populated with the Italian title, and
then if you want to access the English title, you need to change the language, and it will perform a new query, depending on the implementation; in the case of django-nece it will not, but it will change the state, and title will now contain the English title. The other option is to be
able to access all properties at once. This comes naturally to the libraries where you have separate columns per language, because you have your main attribute, title, that usually evaluates lazily to the value for the
currently active locale, and you can directly access every other language by just using title_ plus the language code, for example title_it, either because that's just the field that the library creates or because it
virtually creates it for you. In django-i18nfield it's a bit different: the title attribute will always contain a special data type, a lazy internationalized string, which is in some ways like what you get back from ugettext_lazy. Whenever you cast it to a string,
it will of course resolve to the currently active locale, but it is a special data object that contains the information for all languages, so you can pass that around as one object. So to recap, and to add the other features: we have a couple
of different database layouts that are in use. We have the version where the translations live in their own table, so we have multiple tables to
store our model; we have the version where we have a very wide table with multiple columns for every language that we have; and we have the embedded version within one column. We have database support for most of the
libraries for all databases that Django supports, but django-nece and django-modeltrans only run on PostgreSQL. We have different levels of support for filtering the objects: for example, in those that use a normalized database layout it is really easy, although it
might be computationally expensive, whereas in those that use the PostgreSQL JSON field it should be easy, but they don't provide any utilities to make it even easier, like querying in the current default language; you need to do that on your own, but it's conceptually
possible, whereas in django-i18nfield it's currently not really possible: searching somehow works, but ordering or indexing is not possible. We have the separate styles of defining the model, either by defining a base class, or by registration, or by a custom field type, and we have the separate styles of
object operation, where you can access either one language at a time or all of the languages at once. I didn't talk in detail about form support. For most of them form support comes naturally, in that
when they generate model fields with a separate column per language, a model form will just automatically generate that number of fields. So in some of them
you will get a field in a model form that just allows you to edit the currently active language, which I don't think is useful in very many use cases. In some you will get one form field per language. django-nece doesn't have form support at all, it just gives
you a text widget where you can edit the JSON blob. And in django-i18nfield you will get a special form field type with a special widget that works like the compound date-time widget: it just has compound input fields within one widget to ask for the different languages. Some of them have very
elaborate support for the Django admin and do that very nicely; others just present you with the JSON blob or just different fields below each other. I did a small benchmark to have a look at their performance. The benchmark is of course
not representative of a real-life application: it just stores a lot of objects into the database, pulls them out again and tries to access the attributes in various languages. That obviously is rather slow on those that need to do joins and need to refetch, although at least django-parler can
make use of caching to reduce this. Quite unsurprisingly, the implementations backed by the PostgreSQL JSON field are very, very fast, and the others are also reasonably fast. I've created a demo app that uses all of
the seven libraries and also contains the benchmark code, in case you're interested in that. And with that I'm at the end of my talk, a bit faster than I expected, and I would be happy to take any questions on that subject.
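To make the JSON-style storage and the "one object that carries all languages" idea from the talk more concrete, here is a minimal plain-Python sketch. The I18nText class, its method names and the active_language variable are hypothetical stand-ins invented for this example; they are not the API of django-i18nfield or of any other library mentioned, and the fallback rule is an assumption.

```python
import json
from contextvars import ContextVar

# Hypothetical stand-in for the "currently active locale"; a Django project
# would ask django.utils.translation.get_language() instead.
active_language = ContextVar("active_language", default="en")


class I18nText:
    """Illustrative value object holding one string per language."""

    def __init__(self, data):
        self.data = dict(data)

    def localize(self, language):
        # Assumed fallback: use the first stored translation if the
        # requested language is missing.
        if language in self.data:
            return self.data[language]
        return next(iter(self.data.values()), "")

    def __str__(self):
        # Casting to a string resolves to the currently active language.
        return self.localize(active_language.get())

    def to_json(self):
        # This is the text that would be stored in a single column.
        return json.dumps(self.data, sort_keys=True)

    @classmethod
    def from_json(cls, raw):
        return cls(json.loads(raw))


title = I18nText({
    "en": "The Good, the Bad and the Ugly",
    "it": "Il buono, il brutto, il cattivo",
})
stored = title.to_json()              # one JSON blob for all languages
restored = I18nText.from_json(stored)

token = active_language.set("it")     # switch the active language
italian = str(restored)
active_language.reset(token)          # back to the default, "en"
english = str(restored)

print(italian)
print(english)
```

Because the object keeps every translation, it can be passed around and rendered later in whatever language is active at that point, without another database query.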
That was a great talk, thank you. Just one question: where did you get the inspiration to do the emoji feature comparison?

I don't remember. I've seen a lot of emojis at the Django conferences I've attended, so it's maybe Katie's fault.

Hi, great, thanks for the presentation. Right-to-left
support: how easy is that to implement, and how do the libraries you mentioned handle it in the context of storing the data?

I haven't tried. I don't see any problems ahead
there, because in the end those libraries are just about how to store these Unicode strings that users input and that we output at a later stage. So I think on the model level it's not that important. It might get more interesting if you're rendering a form and you need to render some of the widgets with left-to-right CSS and
others with right-to-left CSS; that might get interesting, but on the database level I don't think it should be a problem.

Let's give a big round of applause to Raphael.