Nothing
#' Interactively provide translations for a package's messages
#'
#' @description
#' This function handles the "grunt work" of building and updating translation
#' libraries. In addition to providing a friendly interface for supplying
#' translations, some internal logic is built to help make your package more
#' translation-friendly.
#'
#' To get started, the package developer should run `translate_package()` on
#' your package's source to produce a template `.pot` file (or files, if your
#' package has both R and C/C++ messages to translated), e.g.
#'
#' To add translations in your desired language, include the target language:
#' in the `translate_package(languages = "es")` call.
#'
#' @section Phases:
#' `translate_package()` goes through roughly three "phases" of translation.
#'
#' 1. Setup -- `dir` is checked for existing translations
#' (toggling between "update" and "new" modes), and R files are parsed and
#' combed for user-facing messages.
#'
#' 2. Diagnostics: see the Diagnostics section below. Any
#' diagnostic detecting "unhealthy" messages will result in a yes/no prompt to
#' exit translation to address the issues before continuing.
#'
#' 3. Translation. All of the messages found in phase one are iterated over --
#' the user is shown a message in English and prompted for the translation
#' in the target language. This process is repeated for each domain
#' in `languages`.
#'
#' An attempt is made to provide hints for some translations that require
#' special care (e.g. that have escape sequences or use templates). For
#' templated messages (e.g., that use `%s`), the user-provided message
#' must match the templates of the English message. The templates *don't*
#' have to be in the same order -- R understands template reordering, e.g.
#' `%2$s` says "interpret the second input as a string". See
#' [sprintf()] for more details.
#'
#' After each language is completed, a corresponding \file{.po} file is written
#' to the package's \file{po} directory (which is created if it does not yet
#' exist).
#'
#' There are some discrepancies in the default behavior of
#' `translate_package` and the translation workflow used to generate the
#' \file{.po}/\file{.pot} files for R itself (mainly, the suite of functions
#' from `tools`, [tools::update_pkg_po()],
#' [tools::xgettext2pot()], [tools::xgettext()], and
#' [tools::xngettext()]). They should only be superficial (e.g.,
#' whitespace or comments), but nevertheless may represent a barrier to
#' smoothly submitting patchings to R Core. To make the process of translating
#' base R and the default packages (`tools`, `utils`, `stats`,
#' etc.) as smooth as possible, set the `use_base_rules` argument to
#' `TRUE` and your resulting \file{.po}/\file{.pot}/\file{.mo} file will
#' match base's.
#'
#' @section Custom translation functions:
#'
#' `base` R provides several functions for messaging that are natively equipped
#' for translation (they all have a `domain` argument): `stop()`, `warning()`,
#' `message()`, `gettext()`, `gettextf()`, `ngettext()`, and
#' `packageStartupMessage()`.
#'
#' While handy, some developers may prefer to write their own functions, or to
#' write wrappers of the provided functions that provide some enhanced
#' functionality (e.g., templating or automatic wrapping). In this case,
#' the default R tooling for translation (`xgettext()`, `xngettext()`
#' `xgettext2pot()`, `update_pkg_po()` from `tools`) will not work, but
#' `translate_package()` and its workhorse `get_message_data()` provide an
#' interface to continue building translations for your workflow.
#'
#' Suppose you wrote a function `stopf()` that is a wrapper of
#' `stop(gettextf())` used to build templated error messages in R, which makes
#' translation easier for translators (see below), e.g.:
#'
#' ```R
#' stopf = function(fmt, ..., domain = NULL) {
#' stop(gettextf(fmt, ...), domain = domain, call. = FALSE)
#' }
#' ```
#'
#' Note that `potools` itself uses just such a wrapper internally to build
#' error messages! To extract strings from calls in your package to `stopf()`
#' and mark them for translation, use the argument
#' `custom_translation_functions`:
#'
#' ```R
#' get_message_data(
#' '/path/to/my_package',
#' custom_translation_functions = list(R = 'stopf:fmt|1')
#' )
#' ```
#'
#' This invocation tells `get_message_data()` to look for strings in the
#' `fmt` argument in calls to `stopf()`. `1` indicates that `fmt` is the
#' first argument.
#'
#' This interface is inspired by the `--keyword` argument to the
#' `xgettext` command-line tool. This argument consists of a list with two
#' components, `R` and `src` (either can be excluded), owing to
#' differences between R and C/C++. Both components, if present, should consist
#' of a character vector.
#'
#' For R, there are two types of input: one for named arguments, the other for
#' unnamed arguments.
#'
#' * Entries for __named__ arguments will look like `"fname:arg|num"` (singular
#' string) or `"fname:arg1|num1,arg2|num2"` (plural string). `fname`
#' gives the name of the function/call to be extracted from the R source,
#' `arg`/`arg1`/`arg2` specify the name of the argument to
#' `fname` from which strings should be extracted, and
#' `num`/`num1`/`num2` specify the *order* of the named
#' argument within the signature of `fname`.
#'
#' * Entries for __unnamed__ arguments will look like
#' `"fname:...\xarg1,...,xargn"`, i.e., `fname`, followed by
#' `:`, followed by `...` (three dots), followed by a backslash
#' (`\`), followed by a comma-separated list of argument names. All
#' strings within calls to `fname` *except* those supplied to the
#' arguments named among `xarg1`, ..., `xargn` will be extracted.
#'
#' To clarify, consider the how we would (redundantly) specify
#' `custom_translation_functions` for some of the default messagers,
#' `gettext`, `gettextf`, and `ngettext`:
#' `custom_translation_functions = list(R = c("gettext:...\domain",
#' "gettextf:fmt|1", "ngettext:msg1|2,msg2|3"))`.
#'
#' For src, there is only one type of input, which looks like
#' `"fname:num"`, which says to look at the `num` argument of calls
#' to `fname` for `char` arrays.
#'
#' Note that there is a difference in how translation works for src vs. R -- in
#' R, all strings passed to certain functions are considered marked for
#' translations, but in src, all translatable strings must be explicitly marked
#' as such. So for `src` translations, `custom_translation_functions`
#' is not used to customize which strings are marked for translation, but
#' rather, to expand the set of calls which are searched for potentially
#' *untranslated* arrays (i.e., arrays passed to the specified calls that
#' are not explicitly marked for translation). These can then be reported in
#' the [check_untranslated_src()] diagnostic, for example.
#'
#' @section Diagnostics:
#'
#' ### Cracked messages
#'
#' A cracked message is one like:
#'
#' ```r
#' stop("There are ", n, " good things and ", m, " bad things.")
#' ```
#'
#' In its current state, translators will be asked to translate three messages
#' independently:
#'
#' - "There are"
#' - "good things and"
#' - "bad things."
#'
#' The message has been cracked; it might not be possible to translate a string
#' as generic as "There are" into many languages -- context is key!
#'
#' To keep the context, the error message should instead be build with
#' `gettextf` like so:
#'
#' ``` r
#' stop(domain=NA, gettextf("There are %d good things and %d bad things."))
#' ```
#'
#' Now there is only one string to translate! Note that this also allows the
#' translator to change the word order as they see fit -- for example, in
#' Japanese, the grammatical order usually puts the verb last (where in
#' English it usually comes right after the subject).
#'
#' `translate_package` detects such cracked messages and suggests a
#' `gettextf`-based approach to fix them.
#'
#' ### Untranslated R messages produced by `cat()`
#'
#' Only strings which are passed to certain `base` functions are eligible for
#' translation, namely `stop`, `warning`, `message`, `packageStartupMessage`,
#' `gettext`, `gettextf`, and `ngettext` (all of which have a `domain` argument
#' that is key for translation).
#'
#' However, it is common to also produce some user-facing messages using
#' `cat` -- if your package does so, it must first use `gettext` or `gettextf`
#' to translate the message before sending it to the user with `cat`.
#'
#' `translate_package` detects strings produced with `cat` and suggests a
#' `gettext`- or `gettextf`-based fix.
#'
#' ### Untranslated C/C++ messages
#'
#' This diagnostic detects any literal `char` arrays provided to common
#' messaging functions in C/C++, namely `ngettext()`, `Rprintf()`, `REprintf()`,
#' `Rvprintf()`, `REvprintf()`, `R_ShowMessage()`, `R_Suicide()`, `warning()`,
#' `Rf_warning()`, `error()`, `Rf_error()`, `dgettext()`, and `snprintf()`.
#' To actually translate these strings, pass them through the translation
#' macro `_`.
#'
#' NB: Translation in C/C++ requires some additional `#include`s and
#' declarations, including defining the `_` macro.
#' See the Internationalization section of Writing R Extensions for details.
#'
#' @section Custom diagnostics:
#'
#' A diagnostic is a function which takes as input a `data.table`
#' summarizing the translatable strings in a package (e.g. as generated by
#' [get_message_data()]), evaluates whether these messages are
#' "healthy" in some sense, and produces a digest of "unhealthy" strings and
#' (optionally) suggested replacements.
#'
#' The diagnostic function must have an attribute named `diagnostic_tag`
#' that describes what the diagnostic does; it is reproduced in the format
#' \code{Found {nrow(result)} {diagnostic_tag}:}. For example,
#' [check_untranslated_cat()] has `diagnostic_tag =
#' "untranslated messaging calls passed through cat()"`.
#'
#' The output diagnostic result has the following schema:
#'
#' * `call`: `character`, the call identified as problematic
#' * `file`: `character`, the file where `call` was found
#' * `line_number`: `integer`, the line in `file` where `call` was found
#' * `replacement`: `character`, *optional*, a suggested fix to make the call
#' "healthy"
#'
#' See [check_cracked_messages()],
#' [check_untranslated_cat()], and
#' [check_untranslated_src()] for examples of diagnostics.
#'
#' @param dir Character, default the present directory; a directory in which an
#' R package is stored.
#' @param languages Character vector; locale codes to which to translate.
#' Must be a valid language accepted by gettext. This almost always takes
#' the form of (1) an ISO 639 2-letter language code; or (2) `ll_CC`, where
#' `ll` is an ISO 639 2-letter language code and `CC` is an ISO 3166 2-letter
#' country code e.g. `es` for Spanish, `es_AR` for Argentinian Spanish, `ro`
#' for Romanian, etc. See [base::Sys.getlocale()] for some helpful tips
#' about how to tell which locales are currently available on your machine, and
#' see the References below for some web resources listing more locales.
#' @param diagnostics A `list` of diagnostic functions to be run on the
#' package's message data. See Details.
#' @param custom_translation_functions A `list` with either/both of two
#' components, `R` and `src`, together governing how to extract any
#' non-standard strings from the package. See Details.
#' @param max_translations Numeric; used for setting a cap on the number of
#' translations to be done for each language. Defaults to `Inf`, meaning
#' all messages in the package.
#' @param use_base_rules Logical; Should internal behavior match base behavior
#' as strictly as possible? `TRUE` if being run on a base package (i.e.,
#' `base` or one of the default packages like `utils`,
#' `graphics`, etc.). See Details.
#' @param copyright Character; passed on to [write_po_file()].
#' @param bugs Character; passed on to [write_po_file()].
#' @param verbose Logical, default `TRUE` (except during testing). Should
#' extra information about progress, etc. be reported?
#' @return This function returns nothing invisibly. As a side effect, a
#' \file{.pot} file is written to the package's \file{po} directory (updated if
#' one does not yet exist, or created from scratch otherwise), and a \file{.po}
#' file is written in the same directory for each element of `languages`.
#' @author Michael Chirico
#' @seealso [get_message_data()], [write_po_file()],
#' [tools::xgettext()], [tools::update_pkg_po()],
#' [tools::checkPoFile()], [base::gettext()]
#' @references
#' <https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Internationalization>
#' \cr
#' <https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Internationalization>
#' \cr
#' <https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Internationalization-in-the-R-sources>
#' \cr <https://developer.r-project.org/Translations30.html> \cr
#' <https://web.archive.org/web/20230108213934/https://www.isi-web.org/resources/glossary-of-statistical-terms> \cr
#' <https://www.gnu.org/software/gettext/> \cr
#' <https://www.gnu.org/software/gettext/manual/html_node/Usual-Language-Codes.html#Usual-Language-Codes>
#' \cr
#' <https://www.gnu.org/software/gettext/manual/html_node/Country-Codes.html#Country-Codes>
#' \cr <https://www.stats.ox.ac.uk/pub/Rtools/goodies/gettext-tools.zip>
#' \cr <https://saimana.com/list-of-country-locale-code/>
#' @examples
#'
#' pkg <- system.file('pkg', package = 'potools')
#' # copy to a temporary location to be able to read/write/update below
#' tmp_pkg <- file.path(tempdir(), "pkg")
#' dir.create(tmp_pkg)
#' file.copy(pkg, dirname(tmp_pkg), recursive = TRUE)
#'
#' # run translate_package() without any languages
#' # this will generate a .pot template file and en@quot translations (in UTF-8 locales)
#' # we can also pass empty 'diagnostics' to skip the diagnostic step
#' # (skip if gettext isn't available to avoid an error)
#' if (isTRUE(check_potools_sys_reqs)) {
#' translate_package(tmp_pkg, diagnostics = NULL)
#' }
#'
#' \dontrun{
#' # launches the interactive translation dialog for translations into Estonian:
#' translate_package(tmp_pkg, "et_EE", diagnostics = NULL, verbose = TRUE)
#' }
#'
#' # cleanup
#' unlink(tmp_pkg, recursive = TRUE)
#' rm(pkg, tmp_pkg)
#'
#' @export
translate_package = function(
dir = '.',
languages = NULL,
diagnostics = list(check_cracked_messages, check_untranslated_cat, check_untranslated_src),
custom_translation_functions = list(R = NULL, src = NULL),
max_translations = Inf,
use_base_rules = package %chin% .potools$base_package_names,
copyright = NULL,
bugs = '',
verbose = !is_testing()
) {
result <- check_potools_sys_reqs()
if (!isTRUE(result)) stop(result) # nocov
stopifnot(
'Only one package at a time' = length(dir) == 1L,
"'dir' must be a character" = is.character(dir),
"'languages' must be a character vector" = is.null(languages) || is.character(languages),
"'diagnostics' should be empty, a function, or list of functions" =
is.null(diagnostics)
|| is.function(diagnostics)
|| (is.list(diagnostics) && all(vapply(diagnostics, is.function, logical(1L))))
)
# en-list singleton diagnostic for convenience
if (is.function(diagnostics)) diagnostics = list(diagnostics)
dir = get_directory(dir)
desc_data = get_desc_data(dir)
package <- desc_data['Package']
is_base <- package == 'base'
version <- desc_data['Version']
po_dir <- file.path(dir, 'po')
r_potfile <- file.path(po_dir, glue("R-{package}.pot"))
src_potfile <- file.path(po_dir, sprintf("%s.pot", if (is_base) 'R' else package))
update = file.exists(po_dir) && (file.exists(r_potfile) || file.exists(src_potfile))
if (verbose) {
if (update) {
# is it worthwhile to try and distinguish the creation time of the
# R pot file and the src pot file? probably not...
messagef(
"Updating translation template for package '%s' (last updated %s)",
package,
format(file.info(r_potfile)$atime)
)
} else {
messagef("Starting translations for package '%s'", package)
}
}
if (!update) dir.create(po_dir, showWarnings = FALSE)
message_data = get_message_data(dir, custom_translation_functions, verbose=verbose)
if (!nrow(message_data)) {
if (verbose) message('No messages to translate; finishing')
return(invisible())
}
# for testing, we need a connection that stays open so that readLines(n=1L)
# reads successive lines. originally tried passing a test connection as
# an argument to prompt(), but that closed the connection each time -->
# only the first line is ever read.
set_prompt_conn()
on.exit(unset_prompt_conn())
if (verbose) message('Running message diagnostics...')
for (diagnostic in diagnostics) {
diagnostic <- match.fun(diagnostic)
result <- diagnostic(message_data)
if (!nrow(result)) next
show_diagnostic_results(result, diagnostic)
responded_yes = tolower(prompt('Exit now to repair any of these? [y/N]')) %chin% c('y', 'yes')
# length() required for use in batch mode
if (length(responded_yes) && responded_yes) return(invisible())
}
po_params = list(package = package, version = version, copyright = copyright, bugs = bugs)
write_po_files(message_data, po_dir, po_params, template = TRUE, use_base_rules = use_base_rules)
if (l10n_info()[["UTF-8"]]) {
# on UTF-8 machines we install the en@quot messages too
# TODO: streamline this -- en_quote is definitely doing some redundant stuff
message('Generating en@quot translations')
update_en_quot_mo_files(dir, verbose)
}
if (is.null(languages)) {
if (verbose) message('No languages provided; finishing')
return(invisible())
}
for (language in languages) {
metadata = get_lang_metadata(language)
po_params$language = language
# overwrite any existing translations written in previous translation.
# set blank initially (rather than deleting the column) to allow
# for interrupting the translation -- if unset, write_po_files will
# fail if both these columns are not yet present.
message_data[type == 'singular', 'msgstr' := ""]
message_data[type == 'plural', 'msgstr_plural' := .(list(rep("", metadata$nplurals)))]
lang_file <- file.path(po_dir, glue("R-{language}.po"))
if (update && file.exists(lang_file)) {
if (verbose) {
messagef(
'Found existing R translations for %s (%s/%s) in %s. Running msgmerge...',
language, metadata$full_name_eng, metadata$full_name_native, lang_file
)
}
run_msgmerge(lang_file, r_potfile)
find_fuzzy_messages(message_data, lang_file)
} else {
message_data[message_source == "R", 'fuzzy' := 0L]
}
lang_file <- file.path(po_dir, glue("{language}.po"))
if (update && file.exists(lang_file)) {
if (verbose) {
messagef(
'Found existing src translations for %s (%s/%s) in %s. Running msgmerge...',
language, metadata$full_name_eng, metadata$full_name_native, lang_file
)
}
run_msgmerge(lang_file, src_potfile)
find_fuzzy_messages(message_data, lang_file)
} else {
message_data[message_source == "src", 'fuzzy' := 0L]
}
new_idx = message_data[
is_marked_for_translation & (
fuzzy == 1L
| (type == 'singular' & !nzchar(msgstr) & nzchar(msgid, keepNA = TRUE))
| (type == 'plural' & !vapply(msgstr_plural, function(x) all(nzchar(x)), logical(1L)))
),
which = TRUE
]
if (!length(new_idx)) {
if (verbose) messagef('Translations for %s are up to date! Skipping.', language)
next
}
if (verbose) {
messagef(
'Beginning new translations for %s (%s/%s); found %d untranslated messages',
language, metadata$full_name_eng, metadata$full_name_native, length(new_idx)
)
message("(To quit translating, press 'Esc'; progress will be saved)")
}
po_params$author = prompt('Thanks! Who should be credited with these translations?')
po_params$email = prompt('And what is their email?')
# on.exit this to allow ESC to quit mid-translation. the intent is for the
# on.exit command to be overwritten on each iteration over languages --
# only one partially-finished language should be written at a time.
INCOMPLETE = TRUE
on.exit({
if (INCOMPLETE) write_po_files(message_data, po_dir, po_params, use_base_rules = use_base_rules) # nocov
# since add=FALSE, we overwrite the above call; duplicate it here
unset_prompt_conn()
})
if (verbose) {
# nolint start: line_length_linter.
message(
"***************************\n",
"** BEGINNING TRANSLATION **\n",
"***************************\n\n",
"Some helpful reminders:\n",
" * You can skip a translation by entering nothing (just press RETURN)\n",
" * Special characters (like newlines, \\n, or tabs, \\t) should be written just like that (with an escape)\n",
" * Be sure to match message templates. The count of templates (%s, %d, etc.) must match in all languages, as must initial and terminal newlines (\\n)\n",
" * While the count of templates must match, the _order_ can be changed by using e.g. %2$s to mean 'use the second input as a string here'\n",
" * Whenever templates or escaping is happening in a string, these will be 'highlighted' by carets (^) in the line below"
)
# nolint end: line_length_linter.
}
# NB: loop over rows to facilitate quitting without losing progress
for (ii in head(new_idx, max_translations)) {
if (message_data$type[ii] == 'plural') {
translation = read_translation(
message_data$msgid_plural[[ii]][1L],
'plural',
message_data$file[ii],
message_data$call[ii],
message_data$fuzzy[ii],
message_data$msgstr_plural[[ii]],
metadata
)
set(message_data, ii, 'msgstr_plural', list(translation))
} else if (!message_data$is_repeat[ii]) {
translation = read_translation(
message_data$msgid[ii],
'singular',
message_data$file[ii],
message_data$call[ii],
message_data$fuzzy[ii],
message_data$msgstr[ii],
metadata
)
set(message_data, ii, 'msgstr', translation)
}
}
# set INCOMPLETE after write_po_files for the event of a process interruption
# between the loop finishing and the write_po_files command executing
write_po_files(message_data, po_dir, po_params, use_base_rules = use_base_rules)
INCOMPLETE = FALSE
}
# TODO: reinstate source marker tags, at least for src .pot file & maybe for R .pot file too?
po_compile(dir, package, verbose = verbose)
return(invisible())
}
# take from those present in r-devel:
# ls -1 ~/svn/R-devel/src/library/*/po/*.po | \
# awk -F"[./]" '{print $10}' | \
# sed -r 's/^R(Gui)?-//g' | sort -u | \
# awk '{print " ", $1, " = ,"}'
# alternatively, a more complete list can be found on some websites:
# https://saimana.com/list-of-country-locale-code/
# nplurals,plural info from https://l10n.gnome.org/teams/<language>
# NB: looks may be deceiving for right-to-left scripts (e.g. Farsi), where the
# displayed below might not be in the order it is parsed.
# assign to .potools, not a package env, to keep more readily mutable inside update_metadata()
.potools$KNOWN_LANGUAGES = fread(system.file('extdata', 'language_metadata.csv', package='potools'), key='code')
# the 'plural' column above is designed for computers;
# translate that to something human-legible here.
# NB: 'plural' is 0-based (like in the .po file), but
# 'plural_index' is 1-based (to match the above R-level code).
# assign to .potools, not a package env, to keep more readily mutable inside update_metadata()
.potools$PLURAL_RANGE_STRINGS = fread(
system.file('extdata', 'plurals_metadata.csv', package='potools'),
key = c('plural', 'plural_index')
)
# for testing; unexported
# nocov start
reset_language_metadata = function() {
.potools$KNOWN_LANGUAGES = fread(
system.file('extdata', 'language_metadata.csv', package='potools'),
key='code'
)
.potools$PLURAL_RANGE_STRINGS = fread(
system.file('extdata', 'plurals_metadata.csv', package='potools'),
key = c('plural', 'plural_index')
)
}
# nocov end
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.