NOTE: This is currently a WIP and several things have changed since its initial writing.
Internationalization of the lessons has always been a priority for The Carpentries because we are a global organization. There are three levels at which translations can be added:
The first issue is common to all websites, and solutions for it exist in several programming languages.
At the moment, The Carpentries website supports l10n in a low-rent manner by
including a yaml dictionary in the _data directory and switching languages via
the site.data.language variable (as shown in this example that translates
"This content is open source"). How exactly this is exposed on the main site,
though, is not clear.
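The dictionary approach can be sketched roughly as follows. This is only an illustration in Python, not the site's actual code: the TRANSLATIONS table stands in for a yaml dictionary that has already been loaded, and the key name, the Spanish string, and the translate() helper are all hypothetical.

```python
# Hypothetical stand-in for a yaml dictionary under _data, already loaded
# into Python. The key name and language codes are illustrative only.
TRANSLATIONS = {
    "en": {"open_source": "This content is open source"},
    "es": {"open_source": "Este contenido es de código abierto"},
}


def translate(key, language, fallback="en"):
    """Return the string for `key` in `language`, else in `fallback`."""
    table = TRANSLATIONS.get(language, {})
    if key in table:
        return table[key]
    return TRANSLATIONS[fallback][key]


print(translate("open_source", "es"))
print(translate("open_source", "fr"))  # no French table, so English is used
```

Falling back to a default language keeps a page readable when a translation is missing, which is one reasonable choice among several.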
To make sure that the translations are compatible no matter what tooling we use
(R, Python, JavaScript, PHP), we should store the translations in the *.po
(portable object) format so that each language can use its own gettext()
utility to swap out the translations.
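To make the format concrete, here is a minimal sketch of reading msgid/msgstr pairs from *.po text and looking up a message the way a gettext() call would. It is an assumption-laden illustration: the entries are made up, the parse_po() and gettext_like() helpers are hypothetical, and only single-line entries are handled (real po files also allow multi-line strings, plural forms, and contexts, which a proper gettext library covers).

```python
import ast

# Hypothetical po entries, for illustration only.
PO_TEXT = '''
# Translator comment
msgid "Key Points"
msgstr "Puntos clave"

msgid "Overview"
msgstr "Resumen"
'''


def parse_po(text):
    """Collect single-line msgid/msgstr pairs into a dict."""
    catalog = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            # The quoted string uses C-style escapes, which Python's
            # literal syntax reads correctly for simple cases.
            msgid = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            catalog[msgid] = ast.literal_eval(line[len("msgstr "):])
            msgid = None
    return catalog


def gettext_like(catalog, message):
    """Return the translation, or the original message if none exists."""
    return catalog.get(message) or message


catalog = parse_po(PO_TEXT)
print(gettext_like(catalog, "Overview"))
print(gettext_like(catalog, "Episodes"))  # untranslated, returned as-is
```

Returning the original message when no translation exists mirrors how gettext behaves, so untranslated strings degrade to the source language instead of failing.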
Because they will be associated with the lesson template itself, the *.po files
will live in the {varnish} package and be used from R to translate messages
when the website is being generated.
References for definitions are handled via the {glosario} project, where the glossary is formatted as a yaml file and there are Python and R libraries that can be used to extract specific translations from it.
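The idea of pulling one translation out of a glossary can be sketched as below. This does not use the {glosario} libraries or reproduce their API: the entry layout (a slug plus one block per language code with a term and a definition), the sample entry, and the define() helper are assumptions made for illustration.

```python
# Hypothetical glossary, shaped like a yaml file that has already been
# loaded: one entry per slug, one block per language code.
GLOSSARY = [
    {
        "slug": "example_term",
        "en": {"term": "example term", "def": "An English definition."},
        "es": {"term": "término de ejemplo", "def": "Una definición en español."},
    },
]


def define(slug, language):
    """Return the term/definition block for `slug` in `language`, or None."""
    for entry in GLOSSARY:
        if entry["slug"] == slug:
            return entry.get(language)
    return None


print(define("example_term", "es"))
print(define("example_term", "fr"))  # no French block for this entry
```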
This is a topic that is currently not well addressed and is quite hard to do:
translating prose is much harder than translating individual messages because
the context of an individual paragraph within its section is important. David
Pérez-Suárez has proposed a {gettext} solution because gettext is a standard
for translating messages in many computer programs. He found a Python + Bash
project called po4gitbook that converts markdown content to po files for
translation and back again. However, he is finding that it often breaks down
when parsing markdown elements such as lists and R chunks. I'm thinking that a
solution is to parse the markdown into the commonmark XML representation,
extract the paragraph elements from it, recast them into markdown, and use
those as the basis for the translated messages. This way, parsing won't be an
issue. The big challenge is that the library has to be rewritten for that to
happen.
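A rough sketch of the paragraph-extraction idea, in Python, might look like this. It assumes the markdown has already been converted to commonmark XML by some other tool; the XML string here is hand-written in that shape, and the to_markdown() and extract_messages() helpers are hypothetical and cover only a handful of inline node types.

```python
import xml.etree.ElementTree as ET

# Namespace used by the commonmark XML representation.
NS = "{http://commonmark.org/xml/1.0}"

# Hand-written sample: a paragraph, an R chunk, and a one-item list.
XML = """<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph><text>Load the </text><strong><text>data</text></strong><text> first.</text></paragraph>
  <code_block info="r">x &lt;- 1
</code_block>
  <list type="bullet" tight="true">
    <item><paragraph><text>Use </text><code>read.csv()</code></paragraph></item>
  </list>
</document>"""


def to_markdown(node):
    """Recast a paragraph (or inline) node back into markdown text."""
    tag = node.tag.replace(NS, "")
    inner = "".join(to_markdown(child) for child in node)
    if tag == "text":
        return node.text or ""
    if tag == "code":
        return "`" + (node.text or "") + "`"
    if tag == "softbreak":
        return " "
    if tag == "emph":
        return "*" + inner + "*"
    if tag == "strong":
        return "**" + inner + "**"
    return inner


def extract_messages(xml_text):
    """Return one markdown string per paragraph, in document order."""
    root = ET.fromstring(xml_text)
    return [to_markdown(p) for p in root.iter(NS + "paragraph")]


for message in extract_messages(XML):
    print(message)
```

Because only paragraph nodes are collected, the R chunk never reaches the translator, and the paragraph inside the list item is extracted without its list markup. Each resulting string could then serve as a msgid.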