Guidelines for Developing a Multilingual Interface

Gary Perlman

I wrote this page of guidelines to explain some translation problems that come up periodically. The target audience includes designers, developers, and product managers, particularly if they are unilingual. These guidelines use French examples because I know some French, but they apply to other languages. Some guidelines may refer to HTML, but still apply to other markup languages. A general guideline is to try to avoid making assumptions.

Further reading: Globalization book | Translation issues with OCLC FirstSearch | Wikipedia on Language Localisation

  1. Number and Gender Agreement for Articles and Adjectives
    Translate whole phrases, and sometimes, whole sentences.
  2. Word Order Variation
    Translate whole phrases, and maybe even whole sentences.
  3. Punctuation Variation
    Include punctuation in text to be translated because it can vary across language and location.
  4. Inserting Variable Values
    Do not concatenate text with variables to form continuous text. Allow translators to move variables.
  5. Inserting Variables by Appending
    Use "label : value" format when possible.
  6. Meaningful Variable Names
    Adopt a naming convention to convey to translators what a variable means and how it will be used.
  7. Embedding Code
    Try to avoid embedded code (e.g., HTML) because it might get altered.
  8. International Formats
    A simple universal date format might be easier and no worse than something more elaborate.
  9. Text on Images Considered Harmful
    Text "burned into" images creates an initial development hardship and eventually a maintenance nightmare.
  10. Language Length Variation
    Assume that non-English languages will take as much as 50% more space on average.
  11. Identification of Languages
    Present the option for a language in the language.

Number and Gender Agreement for Articles and Adjectives

Guideline: Translate whole phrases, and sometimes, whole sentences.

English has plurals, but many other languages also have grammatical genders that change the form of a noun and/or adjective. In French, a web site (un site Web) is masculine, but a web page (une page Web) is feminine. Adjectives must agree with the number and the gender of what they modify.
English               French
the good web page     la bonne page Web
the good web site     le bon site Web
the good web pages    les bonnes pages Web
the good web sites    les bons sites Web

Note that the word "good" assumes four different forms in French, depending on gender and number. Translating the word "good" by itself is "bad". And, note that the articles with the nouns also change.

From the English column, it looks like some money could be saved by translating "the good web", "page", and "site" and simply appending an "s" when needed. In reality, it would drive translators crazy, the translation would be awful, and rewording would require code changes.
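One way to follow this guideline is to store each complete phrase under its own key, so that the translator sees and translates the whole phrase. The following is a minimal Python sketch, not a description of any real system: the `CATALOG` table, its key names, and the `phrase` helper are all hypothetical.

```python
# Hypothetical message catalog: one entry per complete phrase, per language.
CATALOG = {
    "en": {
        "good_page_one": "the good web page",
        "good_site_one": "the good web site",
        "good_page_many": "the good web pages",
        "good_site_many": "the good web sites",
    },
    "fr": {
        "good_page_one": "la bonne page Web",
        "good_site_one": "le bon site Web",
        "good_page_many": "les bonnes pages Web",
        "good_site_many": "les bons sites Web",
    },
}


def phrase(lang, key):
    """Return the whole translated phrase; it is never assembled from parts."""
    return CATALOG[lang][key]


print(phrase("fr", "good_site_many"))
```

The catalog has more entries than a word-by-word approach would need, but rewording a phrase in any language is then a change to data, not to code.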

Word Order Variation

Guideline: Translate whole phrases, and maybe even whole sentences.

Word order is not necessarily the same in different languages. For example, English adjectives usually precede the nouns they modify (e.g., the large green icon), but French adjectives usually follow them, the exceptions being some of the most common adjectives (e.g., la grande icône verte, where grande precedes the noun and verte follows it).

In French, some adjectives can go before or after the noun they modify, with occasionally amusing / unfortunate results:

French: Notre ancienne interface avait le menu du côté gauche.
English meaning: Our old interface had the menu on the left.

French: Notre interface ancienne avait le menu du côté gauche.
English meaning: Our antiquated interface had the menu on the left.

Punctuation Variation

Guideline: Include punctuation in text to be translated because it can vary across language and location.

In French, there is conventionally a space before a colon:
Auteur : Kurt Vonnegut
If the colon is appended by application code, then the application needs to know that a space is required. Many punctuation characters are handled differently. It might be easier to simply include the punctuation so that the translator can take care of any details.

In French, it is common to use italics to show an untranslated name (e.g., WorldCat), so some care must be taken when displaying marked-up translations because the markup might not get interpreted, e.g., in the window's title bar or in hover text, where users may see the uninterpreted HTML tags.

Inserting Variable Values

Guideline: Do not concatenate text with variables to form continuous text. Allow translators to move variables.

Parts of phrases cannot simply be concatenated. For example, the English text that might accompany paginated results:
1-10 of 75
is translated into Chinese as:
Total is 75 and here is 1-10 of them
When values cannot be appended, they can be inserted by allowing translators to move variables. To translate
Records 11-20 of about 526 for "multilingual user interfaces"
variables can be defined elsewhere and inserted when the text is displayed.
Here is an example with Perl variables:
Records ${'recno'}-${'recno2'} of about ${'numrecs'} for ${'query'}
and here is one for an SGML-entity format:
Records &recno;-&recno2; of about &numrecs; for &query;
It is critical that the translators recognize the syntax of variables and know how to move them correctly. Otherwise, translators may translate a variable name or cause a syntax error. While the order may be the same for English and French, other languages (e.g., Chinese) might display the information more like:
Records returned &numrecs; Records displayed &recno;-&recno2; Query &query;
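The same idea can be sketched in Python with the standard library's `string.Template`, whose `$name` placeholders stand in here for the Perl and SGML-entity syntaxes shown above. The `TEMPLATES` table, the `render` helper, and the "reordered" key are hypothetical, and the reordered text is left in English for readability.

```python
from string import Template

# Hypothetical translated templates. Translators may move the $variables,
# but must not rename or translate them.
TEMPLATES = {
    "en": Template('Records $recno-$recno2 of about $numrecs for "$query"'),
    # A language that needs a different order rearranges the same variables.
    "reordered": Template(
        "Records returned $numrecs Records displayed $recno-$recno2 Query $query"
    ),
}


def render(lang, **values):
    # substitute() raises KeyError if a template names a variable
    # that the application did not supply, e.g. a translated variable name.
    return TEMPLATES[lang].substitute(values)


print(render("en", recno=11, recno2=20, numrecs=526,
             query="multilingual user interfaces"))
```

Because the application passes the values by name, it does not need to know where in the text each language places them.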

Inserting Variables by Appending

Guideline: Use "label : value" format when possible.

Inserting variables into a sentence can be tricky. To indicate "Your search found 10 records", it will not work to translate "Your search found" and "records" and insert the number found. Even in English, the method will produce grammatically incorrect text like "Your search found 1 records", or require the awkward term "record(s)". An alternative is to use a label and a value:
Records found: 1
This translated well into all languages (at least left-to-right languages), but note that the colon is preceded by a space in French, unlike other languages, so even that part must be translated.
Notices repérées : 1
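A minimal Python sketch of the label and value approach follows. It assumes a hypothetical `LABELS` table and `label_value` helper; the point is only that the colon and its spacing live in the translated label, not in the code.

```python
# Hypothetical translated labels. Each label carries its own punctuation,
# so the French entry can include the space before the colon.
LABELS = {
    "en": {"records_found": "Records found:"},
    "fr": {"records_found": "Notices repérées :"},
}


def label_value(lang, key, value):
    """Append a value to a translated label that already ends in punctuation."""
    return f"{LABELS[lang][key]} {value}"


print(label_value("en", "records_found", 1))
print(label_value("fr", "records_found", 1))
```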

When inserting values (numbers, dates, currency), keep in mind that different locales will expect them in different formats.

Meaningful Variable Names

Guideline: Adopt a naming convention to convey to translators what a variable means and how it will be used.

In OCLC FirstSearch language files, the name of a variable conveys its meaning, and the section of the language file conveys how it will be used. For example, the labels for fields (e.g., Title, Author, Year, Publisher) for bibliographic records are all in the same section (the [field] section) of the language file. The same variable names in an [index] section could show the names of search indexes (which might not match the field labels). Yet another section might give search examples for the named indexes.
[field]
title     = Title:
author    = Author(s):
year      = Date of Publication:
publisher = Publication
[index]
title     = Title
author    = Author
year      = Year
publisher = Publisher
[examples]
title     = Harry Potter and the Philosopher's Stone
author    = J. K. Rowling
year      = 1998
publisher = A. A. Levine Books
Avoid names that provide no information beyond the English version, especially if the word conveys little information, such as prepositions.
of = of
Instead, use names that are descriptive, possibly much longer than the values in a particular language, so that the meaning is clearer to the translator.
DurationOfVideo = Time:
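A sectioned language file like the one above can be read with Python's standard `configparser` module. This is only a sketch under stated assumptions: it is not how OCLC FirstSearch reads its files, and the `LANGUAGE_FILE` string and `text` helper are hypothetical.

```python
import configparser

# A shortened copy of the language file shown above.
LANGUAGE_FILE = """\
[field]
title     = Title:
author    = Author(s):
[index]
title     = Title
author    = Author
[examples]
title     = Harry Potter and the Philosopher's Stone
author    = J. K. Rowling
"""

# Only "=" separates names from values, so a colon inside a label is kept.
# Interpolation is disabled so a literal "%" in a translation is harmless.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(LANGUAGE_FILE)


def text(section, name):
    """Look up one translated string; the section says how it is used."""
    return parser[section][name]


print(text("field", "title"))
print(text("index", "title"))
```

The same variable name yields different text depending on the section, which is what lets the name convey meaning and the section convey usage.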

Embedding Code

Guideline: Try to avoid embedded code (e.g., HTML) because it might get altered.

Sometimes it is necessary to embed some markup in the text so that the translator can move the marked-up text in a phrase.
Your search, <tt>&terms;</tt>, matched no records.
or
Hello, <strong>&username;</strong>, welcome to our service!
Even simple markup might get mangled a small percentage of the time, but including a lot of markup can make unintended changes more likely, which can result in a feature not working. For example:
<label for="terms">Find</label> <input type="text" name="terms" id="terms" onchange="validate(this)" class="text" title="terms" /> in results.
Translators may not know much, if anything, about coding and might inadvertently break the code by modifying or translating some markup. In the example above, it may not be clear that, within the input tag, the only text that should be translated is the value of the title attribute, "terms", and that translating other "terms" strings will break the code. And frankly, reordering complex HTML to match the target language grammar might be error prone even for programmers. Variables like the following might make the translator's job easier:
FindInResults = Find in results:
TermBoxHoverText = terms
and then the code can be hidden from the translator:
<label for="terms">&FindInResults;</label> <input type="text" name="terms" id="terms" onchange="validate(this)" class="text" title="&TermBoxHoverText;" />

System-wide variables: A special case of code to hide from translators is system-wide variables, such as locations of files. These are best defined in one place and inserted as variables so that if they need to be changed, the change is made in one place, not in many language files, where they might be mangled by translators. And if the values of the variables vary by language, then a variable can be used to adapt them (assuming a regular naming convention; otherwise, there can be a separate store of variables for each language).

For the translator: helplabel = Help
Initialization file: helpurl = http://www.oclc.org/worldcat/help/&language;/
For the code: <a href="&helpurl;">&helplabel;</a>
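The three pieces above can be combined by expanding variables repeatedly, so that a value such as helpurl may itself contain a variable such as &language;. The Python sketch below is an illustration only: the `expand` helper and the `variables` table are hypothetical, "Aide" is used as a sample French value for helplabel, and no HTML escaping of values is attempted.

```python
import re

# Matches SGML-entity style variables such as &helpurl;
ENTITY = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*);")


def expand(text, variables, depth=5):
    """Replace &name; with variables[name]; repeat so values may hold variables."""
    for _ in range(depth):
        # Unknown names (e.g. &amp;) are left exactly as they were.
        expanded = ENTITY.sub(
            lambda m: variables.get(m.group(1), m.group(0)), text
        )
        if expanded == text:
            break
        text = expanded
    return text


variables = {
    "language": "fr",      # chosen by the user
    "helplabel": "Aide",   # from the translator's language file
    "helpurl": "http://www.oclc.org/worldcat/help/&language;/",  # initialization file
}

print(expand('<a href="&helpurl;">&helplabel;</a>', variables))
```

The translator only ever sees the helplabel entry; the URL pattern is maintained in one place.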

If there is any code in translated text, it is a good idea to validate what comes back from the translator (valid HTML tags, correct syntax for entities). And from experience, it is a good idea to validate what is sent to the translators before it is sent, because any errors will probably be dutifully maintained in all translated languages, and then need to be fixed in many places.
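A simple check of this kind can be sketched in Python by comparing the variables and tags in the source text with those in the translation. The `markup_problems` helper and the French sentences are hypothetical, and the regular expressions are a rough approximation, not a full HTML validator.

```python
import re
from collections import Counter

ENTITY = re.compile(r"&[A-Za-z_][A-Za-z0-9_]*;")  # e.g. &terms;
TAG = re.compile(r"</?[A-Za-z][^>]*>")            # e.g. <tt> or </tt>


def markup_problems(source, translation):
    """List variables and tags that were lost or added during translation."""
    problems = []
    for label, pattern in (("variable", ENTITY), ("tag", TAG)):
        before = Counter(pattern.findall(source))
        after = Counter(pattern.findall(translation))
        for item in before - after:
            problems.append(f"missing {label}: {item}")
        for item in after - before:
            problems.append(f"unexpected {label}: {item}")
    return problems


source = "Your search, <tt>&terms;</tt>, matched no records."
# The variable name was translated and the closing tag was damaged.
damaged = "Votre recherche, <tt>&termes;</tt, n'a trouvé aucune notice."
for problem in markup_problems(source, damaged):
    print(problem)
```

The same function can be run on the source text against itself or against a known-good template before anything is sent to translators.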

International Formats

Guideline: A simple universal date format might be easier and no worse than something more elaborate.

Unless there is a compelling reason to do otherwise, use ISO 8601 date format (e.g., 2008-05-12). Besides being an international standard, it has the virtues of being unambiguous (unless you think someone would display years, then days, then months), relatively compact, and easy to implement. Note that hyphens in HTML can be broken across lines, so instead of using the effective but invalid <nobr> tag, you can use CSS:
<span style="white-space: nowrap">2008-08-08</span>
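In Python, for example, the standard `datetime.date.isoformat()` method produces this format directly. The `nowrap_date_html` helper below is a hypothetical name for a small wrapper that adds the CSS shown above.

```python
from datetime import date


def nowrap_date_html(d):
    """Return an ISO 8601 date wrapped so it will not break across lines."""
    return f'<span style="white-space: nowrap">{d.isoformat()}</span>'


print(date(2008, 5, 12).isoformat())
print(nowrap_date_html(date(2008, 8, 8)))
```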

If you want to display more "natural" dates, please note that the punctuation, order of values, and value display all vary by locale (combination of language and location), so translating dates requires more than just translating months and maybe weekdays. For more natural date and time displays, see the Java formatting for Dates and Times. In addition to date formatting, there are Java classes for numbers and currencies (not every locale uses a point as the decimal separator or a comma to delimit thousands).

While on the subject of international standards, it should be noted that the United States is practically the only country in the world that is not on the metric system. With geographic information systems becoming more popular, distances should be reported in kilometers (or kilometres) as well as miles. The units can be based on the browser locale setting, with miles reported for en-US, and metric for others. Note that the United Kingdom (en-GB), while officially metric, conventionally uses miles for travel distances, as well as some other Imperial measures such as pints of beer.
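Choosing units from a locale tag might look like the Python sketch below. The `MILE_LOCALES` set and the `distance_for_locale` helper are hypothetical, and the sketch returns a number and a unit code without formatting them, because the display of the number and the unit label would still need to be localized.

```python
KM_PER_MILE = 1.609344  # one international mile in kilometres

# Locales assumed here to prefer miles for travel distances.
MILE_LOCALES = {"en-US", "en-GB"}


def distance_for_locale(km, locale_tag):
    """Return (value, unit) with miles for the locales above, else kilometres."""
    if locale_tag in MILE_LOCALES:
        return (round(km / KM_PER_MILE, 1), "mi")
    return (round(float(km), 1), "km")


print(distance_for_locale(10, "en-US"))
print(distance_for_locale(10, "fr-FR"))
```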

Text on Images Considered Harmful

Guideline: Text "burned into" images creates an initial development hardship and eventually a maintenance nightmare.

Making a button with some text can be an attractive choice for full control over graphic design. The font, color, spacing, and so on are fully controlled. But translating text on images requires work beyond translating the text itself. The whole translation process often takes about a week, if all goes well, but if there are several languages to translate, then there might be problems. A common bad result is that an image with English text is shown to all users. Even if the initial costs are acceptable, when it becomes desirable to change the text, the costs make text-on-image maintenance so unattractive that it is usually avoided.
If an interface must contain text on images, use the language code (e.g., ISO 639 codes such as en, fr, es, or a language-region tag such as zh-CN) in the file name or path so a single expression can be used for all languages.
splashpage_en.jpg
or
/images/en/splashpage.jpg
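Either convention lets one expression serve every language. A minimal Python sketch, with hypothetical helper names `image_file` and `image_path`:

```python
def image_file(lang, name="splashpage", ext="jpg"):
    """Language code in the file name, e.g. splashpage_en.jpg."""
    return f"{name}_{lang}.{ext}"


def image_path(lang, name="splashpage", ext="jpg"):
    """Language code in the path, e.g. /images/en/splashpage.jpg."""
    return f"/images/{lang}/{name}.{ext}"


for lang in ("en", "fr", "zh-CN"):
    print(image_file(lang), image_path(lang))
```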

In addition to translation problems with text on images, there can be cultural issues with graphics on images, particularly body parts such as hands and eyes.

Language Length Variation

Guideline: Assume that non-English languages will take as much as 50% more space on average.

English text is shorter than its equivalent in some languages and longer than in others. Don't expect that careful editing to fit onto one line in English will translate to the same effect in other languages.

The acceptability of abbreviations varies across languages, locales, and translators, so you might not be able to count on translators using them. In WorldCat.org lists, pagination looks like this in different languages:

 首页    前页   4 5 6 7 8   下页   尾页
 First    Prev   4 5 6 7 8   Next   Last
 Eerste    Vorige   4 5 6 7 8   Volgende   Laatste
 Première    Précédente   4 5 6 7 8   Suivante   Dernière

Identification of Languages

Guideline: Present the option for a language in the language.

Flags are not languages, wrote Steven Pemberton, when he explained that flags represent countries, not languages. UK flags and American flags and even combined UK/US flags have been used for English, although there are several languages with origins in the UK, and there is an official flag of England. Spanish is from Spain but it is spoken in many countries, and even within Spain, the Spanish language is not universal. Arabic is spoken in many countries, but it may be difficult and even politically naive to try to select a flag to represent a language.
When displaying a language option to a user, the user needs to be able to understand the option, so these would not be a good set of links to select a language:
A native Chinese speaker might not even recognize that the options are for languages, but most people can recognize their own language in their own language. In the following list, the languages are ordered by their two-character ISO 639 language codes.
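A list of this kind could be generated as in the Python sketch below. The `LANGUAGE_NAMES` table and the `language_options` helper are hypothetical, and the six languages are chosen only for illustration.

```python
# Each language is named in that language, keyed by its ISO 639 code.
LANGUAGE_NAMES = {
    "zh": "中文",
    "en": "English",
    "fr": "Français",
    "nl": "Nederlands",
    "de": "Deutsch",
    "es": "Español",
}


def language_options():
    """Return (code, name) pairs ordered by two-character language code."""
    return [(code, LANGUAGE_NAMES[code]) for code in sorted(LANGUAGE_NAMES)]


for code, name in language_options():
    print(code, name)
```

Ordering by code gives a stable order that does not depend on the collation rules of any one language.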