about_locale | R Documentation |
In this section we explain how we specify locales in stringi. Locale is a fundamental concept in ICU. It identifies a specific user community, i.e., a group of users who have similar culture and language expectations for human-computer interaction.
Because a locale is just an identifier of a region, no validity check is performed when you specify a locale. ICU is implemented as a set of services. If you want to verify whether particular resources are available in the locale you asked for, you must query those resources. Note: when you ask for a resource for a particular locale, you get back the best available match, not necessarily precisely the one you requested.
ICU services are parametrized by locale, to deliver culturally correct results. Locales are identified by character strings of the form Language code, Language_Country code, or Language_Country_Variant code, e.g., 'en_US'.
The two-letter Language code uses the ISO-639-1 standard, e.g., 'en' stands for English, 'pl' for Polish, 'fr' for French, and 'de' for German.
Country is a two-letter code following the ISO-3166 standard. It distinguishes different conventions within the same language, for example US-English ('en_US') and Australian-English ('en_AU').
Differences may also appear in language conventions used within the same country. For example, the Euro currency may be used in several European countries while the individual country's currency is still in circulation. In such a case, ICU Variant '_EURO' could be used for selecting locales that support the Euro currency.
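The structure of an identifier can be inspected with stri_locale_info, which is listed among the locale management functions of this package. The following is a minimal sketch; the three identifiers are chosen only for illustration, and the component names (Language, Country, Variant) are assumed to be those of the list returned by that function.

```r
library(stringi)

# Example identifiers in the Language, Language_Country, and
# Language_Country_Variant forms (chosen for illustration only).
ids <- c("pl", "en_AU", "en_US_POSIX")

# stri_locale_info() describes a single locale, so apply it to each identifier.
parts <- lapply(ids, stri_locale_info)
names(parts) <- ids

# Print the components of each identifier side by side.
for (id in ids) {
  p <- parts[[id]]
  cat(sprintf("%-12s Language=%s Country=%s Variant=%s\n",
              id, p$Language, p$Country, p$Variant))
}
```

Components that are absent from an identifier (e.g., the country in 'pl') are expected to come back as empty strings.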
The final (optional) element of a locale is a list of keywords together with their values. Keywords must be unique. Their order is not significant. Unknown keywords are ignored. The handling of keywords depends on the specific services that utilize them. Currently, the following keywords are recognized: calendar, collation, currency, and numbers, e.g., fr@collation=phonebook;calendar=islamic-civil is a valid French locale specifier together with keyword arguments. For more information, refer to the ICU user guide.
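As an illustration of keyword arguments, the sketch below sorts the same words with the plain German locale and with a collation keyword attached. It assumes that the ICU data available to stringi includes the phonebook collation for German; the word list is made up for the example, and no particular output is claimed here.

```r
library(stringi)

# Illustrative German surnames; \u00f6 is the letter o with diaeresis.
x <- c("Goldmann", "G\u00f6the", "Goethe", "G\u00f6tz", "G\u00f6bel")

# Standard German collation.
standard <- stri_sort(x, locale = "de")

# Same language, but with the collation keyword set to phonebook.
phonebook <- stri_sort(x, locale = "de@collation=phonebook")

print(standard)
print(phonebook)
```

If the requested collation variant is not available, the best available match is used instead, so the two results may coincide on some installations.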
For a list of locales that are recognized by ICU, call stri_locale_list.
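A short sketch of how the list might be queried; the identifier 'en_AU' is used only as an example, and the number of locales depends on the ICU version in use.

```r
library(stringi)

# All locale identifiers known to the ICU library used by stringi.
available <- stri_locale_list()

# How many there are, and the first few of them.
length(available)
head(available)

# Check whether a specific identifier is on the list.
"en_AU" %in% available
```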
Note that in stringi, 'C' is a synonym of 'en_US_POSIX'.
Each locale-sensitive function in stringi selects the current default locale if an empty string or NULL is provided as its locale argument. Default locales are available to all the functions; initially, the system locale on that platform is used, but it may be changed by calling stri_locale_set.
Your program should avoid changing the default locale. All locale-sensitive functions may request any desired locale per-call (by specifying the locale argument), i.e., without referring to the default locale. During many tests, however, we did not observe any improper behavior of stringi while using a modified default locale.
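The per-call approach can be sketched as follows. The example uses stri_compare (listed among the locale-sensitive functions) and stri_locale_get, which is assumed here to return the current default locale; the two words are chosen because Polish and Slovak order them differently.

```r
library(stringi)

# Remember the default locale so we can confirm it is left untouched.
before <- stri_locale_get()

# Request a locale per call instead of changing the default one.
pl <- stri_compare("hladny", "chladny", locale = "pl_PL")  # 1: in Polish, "ch" sorts before "h"
sk <- stri_compare("hladny", "chladny", locale = "sk_SK")  # -1: in Slovak, "ch" sorts after "h"

# No locale given, so the current default locale applies.
by_default <- stri_compare("hladny", "chladny")

after <- stri_locale_get()
identical(before, after)  # TRUE: the default locale was never modified
```

The value of by_default depends on the system locale, which is one reason to pass the locale explicitly whenever reproducible results are needed.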
One of many examples of locale-dependent services is the Collator, which performs a locale-aware string comparison. It is used for comparing, ordering, sorting, and searching strings. See stri_opts_collator for a description of how to tune its settings, and its locale argument in particular.
A warning is emitted when a resource bundle is available neither in the explicitly requested locale nor in any of its more general variants (e.g., 'es' for 'es_ES'). No such warning is given when the default locale is used.
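A minimal sketch of tuning the Collator is given below. It assumes the strength and numeric settings of stri_opts_collator, and that stri_cmp_equiv and stri_sort accept the resulting list through their opts_collator argument; the sample strings are illustrative.

```r
library(stringi)

# Collator settings: English (US) rules, comparison strength 2,
# at which case differences are ignored.
opts <- stri_opts_collator(locale = "en_US", strength = 2L)
ignore_case <- stri_cmp_equiv("hello", "HELLO", opts_collator = opts)   # TRUE

# With the default strength, case differences matter.
respect_case <- stri_cmp_equiv("hello", "HELLO", locale = "en_US")      # FALSE

# Numeric collation orders runs of digits by their numeric value.
sorted <- stri_sort(c("a10", "a2", "a1"),
                    opts_collator = stri_opts_collator(locale = "en_US",
                                                       numeric = TRUE))
print(sorted)  # "a1" "a2" "a10"
```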
Other locale-sensitive functions include, e.g., stri_trans_tolower (that does character case mapping).
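Case mapping is a convenient way to see locale sensitivity in action. The sketch below contrasts English and Turkish rules; it assumes that stri_trans_toupper is available alongside stri_trans_tolower and that both accept a locale argument.

```r
library(stringi)

# English rules: "I" and "i" map to each other.
lower_en <- stri_trans_tolower("I", locale = "en_US")   # "i"
upper_en <- stri_trans_toupper("i", locale = "en_US")   # "I"

# Turkish rules distinguish dotted and dotless i.
lower_tr <- stri_trans_tolower("I", locale = "tr_TR")   # dotless i, U+0131
upper_tr <- stri_trans_toupper("i", locale = "tr_TR")   # dotted capital I, U+0130

print(c(lower_en, upper_en, lower_tr, upper_tr))
```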
Marek Gagolewski and other contributors
Locale – ICU User Guide, https://unicode-org.github.io/icu/userguide/locale/
ISO 639: Language Codes, https://www.iso.org/iso-639-language-codes.html
ISO 3166: Country Codes, https://www.iso.org/iso-3166-country-codes.html
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other locale_management: stri_locale_info(), stri_locale_list(), stri_locale_set()
Other locale_sensitive: %s<%(), about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Other stringi_general_topics: about_arguments, about_encoding, about_search_boundaries, about_search_charclass, about_search_coll, about_search_fixed, about_search_regex, about_search, about_stringi