Users expect our software to speak their language. Unfortunately for us, there will likely be more than one language involved. While doing simple string replacement isn’t too involved, correctly dealing with all the grammar issues can be tricky. After all, who wants to see “List 1 file(s)” from a program output?
But a real i18n solution needs to do more than just provide a means of achieving the correct output. It needs to make this process relatively error-proof, and easy for both the programmer and the translator. Yesod’s answer to the problem gives you:
Intelligent guessing of the user’s desired language based on request headers, with the ability to override.
A simple syntax for giving translations that requires no Haskell knowledge. (After all, most translators aren’t programmers.)
The ability to bring in the full power of Haskell for tricky grammar issues as necessary, along with a default selection of helper functions to cover most needs.
Absolutely no issues at all with word order.
-- @messages/en.msg Hello: Hello EnterItemCount: I would like to buy: Purchase: Purchase ItemCount count@Int: You have purchased #{showInt count} #{plural count "item" "items"}. SwitchLanguage: Switch language to: Switch: Switch
-- @messages/he.msg Hello: שלום EnterItemCount: אני רוצה לקנות: Purchase: קנה ItemCount count: קנית #{showInt count} #{plural count "דבר" "דברים"}. SwitchLanguage: החלף שפה ל: Switch: החלף
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
import
Yesod
data
App
=
App
mkMessage
"App"
"messages"
"en"
plural
::
Int
->
String
->
String
->
String
plural
1
x
_
=
x
plural
_
_
y
=
y
showInt
::
Int
->
String
showInt
=
show
instance
Yesod
App
instance
RenderMessage
App
FormMessage
where
renderMessage
_
_
=
defaultFormMessage
mkYesod
"App"
[
parseRoutes
|
/
HomeR
GET
/
buy
BuyR
GET
/
lang
LangR
POST
|
]
getHomeR
::
Handler
Html
getHomeR
=
defaultLayout
[
whamlet
|
<
h1
>
_
{
MsgHello
}
<
form
action
=@
{
BuyR
}
>
_
{
MsgEnterItemCount
}
<
input
type
=
text
name
=
count
>
<
input
type
=
submit
value
=
_
{
MsgPurchase
}
>
<
form
action
=@
{
LangR
}
method
=
post
>
_
{
MsgSwitchLanguage
}
<
select
name
=
lang
>
<
option
value
=
en
>
English
<
option
value
=
he
>
Hebrew
<
input
type
=
submit
value
=
_
{
MsgSwitch
}
>
|
]
getBuyR
::
Handler
Html
getBuyR
=
do
count
<-
runInputGet
$
ireq
intField
"count"
defaultLayout
[
whamlet
|<
p
>
_
{
MsgItemCount
count
}
|
]
postLangR
::
Handler
()
postLangR
=
do
lang
<-
runInputPost
$
ireq
textField
"lang"
setLanguage
lang
redirect
HomeR
main
::
IO
()
main
=
warp
3000
App
Most existing i18n solutions out there, like gettext
or Java message bundles,
work on the principle of string lookups. Usually some form of
printf
interpolation is used to interpolate variables into the strings. In
Yesod, as you might guess, we instead rely on types. This gives us all of our
normal advantages, such as the compiler automatically catching mistakes.
Let’s take a concrete example. Suppose our application needs to accomplish two simple tasks: saying “hello,” and stating how many users are logged into the system. This can be modeled with a sum type:
data
MyMessage
=
MsgHello
|
MsgUsersLoggedIn
Int
We can also write a function to turn this data type into an English representation:
toEnglish
::
MyMessage
->
String
toEnglish
MsgHello
=
"Hello there!"
toEnglish
(
MsgUsersLoggedIn
1
)
=
"There is 1 user logged in."
toEnglish
(
MsgUsersLoggedIn
i
)
=
"There are "
++
show
i
++
" users logged in."
We can write similar functions for other languages, too. The advantage to this inside-Haskell approach is that we have the full power of Haskell for addressing tricky grammar issues, especially pluralization.
The downside, however, is that you have to write all of this inside of Haskell, which won’t be very translator-friendly. To solve this problem, Yesod introduces the concept of message files. We’ll cover those in the next section.
You may think pluralization isn’t so complicated: you have one version for one item, and another for any other count. That might be true in English, but it’s not true for every language. Russian, for example, has six different forms, and you need to use some modulus logic to determine which one to use.
Assuming we have this full set of translation functions, how do we go about using them? What we need is a new function to wrap them all up together, and then choose the appropriate translation function based on the user’s selected language. Once we have that, Yesod can automatically choose the most relevant render function and call it on the values provided.
As we’ll see shortly, in order to simplify things a bit Hamlet has a special interpolation syntax, _{…}
, which handles all the calls to the render functions. To associate a render function with your application, you use the YesodMessage
typeclass.
The simplest approach to creating translations is via message files. The setup is simple: there is a single folder containing all of your translation files, with a single file for each language. Each file is named based on its language code (e.g., en.msg), and each line in a file handles one phrase, which correlates to a single constructor in your message data type.
The scaffolded site already includes a fully configured message folder.
So first, a word about language codes. There are really two choices
available: using a two-letter language code or a language-LOCALE
code. For
example, when I load up a page in my web browser, it sends two language codes:
en-US
and en
. What my browser is saying is, “If you have American English, I
like that the most. If you have English, I’ll take that instead.”
So which format should you use in your application? Most likely two-letter codes, unless you are actually creating separate translations by locale. This ensures that someone asking for Canadian English will still see your English. Behind the scenes, Yesod will add the two-letter codes where relevant. For example, suppose a user has the following language list:
pt-BR, es, he
What this means is “I like Brazilian Portuguese, then Spanish, and then
Hebrew.” Suppose your application provides the languages pt
(general
Portuguese) and en
(English), with English as the default. Strictly following the
user’s language list would result in the user being served English. Instead,
Yesod translates that list into:
pt-BR, es, he, pt
In other words, unless you’re giving different translations based on locale, just stick to the two-letter language codes.
Now what about these message files? The syntax should be very familiar after your work with Hamlet and Persistent. The line starts off with the name of the message. Because this is a data constructor, it must start with a capital letter. Next, you can have individual parameters, which must be given as lowercase. These will be arguments to the data constructor.
The argument list is terminated by a colon, and then followed by the translated
string, which allows usage of our typical variable interpolation syntax
#{myVar}
. By referring to the parameters defined before the colon, and using
translation helper functions to deal with issues like pluralization, you can
create all the translated messages you need.
We will be creating a data type out of our message specifications, so each
parameter to a data constructor must be given a data type. We use @-syntax
for this. For example, to create the data type data MyMessage = MsgHello |
MsgSayAge Int
, we would write:
Hello: Hi there! SayAge age@Int: Your age is: #{show age}
But there are two problems with this:
It’s not very DRY (Don’t Repeat Yourself) to specify this data type in every file.
Translators will be confused by having to specify these data types.
So instead, the type specification is only required in the main language file.
This is specified as the third argument in the mkMessage
function. This also
specifies what the backup language will be, to be used when none of the
languages provided by your application match the user’s language list.
Your call to mkMessage
creates an instance of the RenderMessage
typeclass,
which is the core of Yesod’s i18n. It is defined as:
class
RenderMessage
master
message
where
renderMessage
::
master
->
[
Text
]
-- ^ languages
->
message
->
Text
Notice that there are two parameters to the RenderMessage
class: the master
site and the message type. In theory, we could skip the master type here, but
that would mean that every site would need to have the same set of translations
for each message type. When it comes to shared libraries like forms, that would
not be a workable solution.
The renderMessage
function takes a parameter for each of the class’s type
parameters: master
and message
. The extra parameter is a list of languages the
user will accept, in descending order of priority. The method then returns a
user-ready Text
that can be displayed.
A simple instance of RenderMessage
may involve no actual translation of
strings; instead, it will just display the same value for every language. For
example:
data
MyMessage
=
Hello
|
Greet
Text
instance
RenderMessage
MyApp
MyMessage
where
renderMessage
_
_
Hello
=
"Hello"
renderMessage
_
_
(
Greet
name
)
=
"Welcome, "
<>
name
<>
"!"
Notice how we ignore the first two parameters to renderMessage
. We can now
extend this to support multiple languages:
renderEn
Hello
=
"Hello"
renderEn
(
Greet
name
)
=
"Welcome, "
<>
name
<>
"!"
renderHe
Hello
=
"שלום"
renderHe
(
Greet
name
)
=
"ברוכים הבאים, "
<>
name
<>
"!"
instance
RenderMessage
MyApp
MyMessage
where
renderMessage
_
(
"en"
:
_
)
=
renderEn
renderMessage
_
(
"he"
:
_
)
=
renderHe
renderMessage
master
(
_
:
langs
)
=
renderMessage
master
langs
renderMessage
_
[]
=
renderEn
The idea here is fairly straightforward: we define helper functions to support
each language. We then add a clause to catch each of those languages in the
renderMessage
definition. We then have two final cases: if no languages
matched, continue checking with the next language in the user’s priority list; or, if we’ve exhausted all languages the user specified, then use the default
language (in our case, English).
Odds are that you will never need to worry about writing this stuff manually, as the message file interface does all this for you. But it’s always a good idea to have an understanding of what’s going on under the surface.
One way to use your new RenderMessage
instance would be to directly call the
renderMessage
function. This would work, but it’s a bit tedious: you need to
pass in the foundation value and the language list manually. Instead, Hamlet
provides a specialized i18n interpolation, which looks like _{…}
.
Why the underscore? The underscore is already a well-established character for i18n, as it is used in the gettext
library.
Hamlet will then automatically translate that to a call to renderMessage
.
Once Hamlet gets the output Text
value, it uses the toHtml
function to
produce an Html
value, meaning that any special characters (e.g., <
, &
, >
) will be automatically escaped.
As a final note, I’d just like to give some general i18n advice. Let’s say you have an application for selling turtles. You’re going to use the word “turtle” in multiple places, like “You have added 4 turtles to your cart.” and “You have purchased 4 turtles, congratulations!” As a programmer, you’ll immediately notice the code reuse potential: we have the phrase “4 turtles” twice. So, you might structure your message file as:
AddStart: You have added AddEnd: to your cart. PurchaseStart: You have purchased PurchaseEnd: , congratulations! Turtles count@Int: #{show count} #{plural count "turtle" "turtles"}
Stop right there! This is all well and good from a programming perspective, but translations are not programming. There are a many things that could go wrong with this, such as:
Some languages might put “to your cart.” before “You have added”.
Maybe “added” will be constructed differently depending on whether the user added one or more turtles.
There are a bunch of whitespace issues.
So the general rule is: translate entire phrases, not just words.