stri_trans_general: General Text Transforms, Including Transliteration
In stringi: Fast and Portable Character String Processing Facilities

stri_trans_general

R Documentation

General Text Transforms, Including Transliteration

Description

ICU general transforms provide different ways of processing Unicode text. They are useful for a variety of tasks, including:

locale-independent upper case, lower case, title case, full/halfwidth conversions,
normalization,
hex and character name conversions,
script to script conversion/transliteration.
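Here is a minimal sketch of three of the categories above, assuming stringi is installed. `Title`, `Fullwidth-Halfwidth` and `NFC` are standard ICU transform identifiers, but the set available on a given system is best checked with `stri_trans_list()`.

```r
library(stringi)

# Locale-independent title case: the first letter of each word is upper-cased
title <- stri_trans_general("hello world", "Title")

# Full/halfwidth conversion: fullwidth Latin letters U+FF21..U+FF23 to ASCII
half <- stri_trans_general("\uFF21\uFF22\uFF23", "Fullwidth-Halfwidth")

# Normalization: "a" followed by a combining acute accent, composed into one code point
nfc <- stri_trans_general("a\u0301", "NFC")

print(c(title, half, nfc))
```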

Usage

stri_trans_general(str, id, rules = FALSE, forward = TRUE)

Arguments

`str`	character vector
`id`	a single string: a transform identifier (see `stri_trans_list`) or, if `rules` is `TRUE`, custom transliteration rules
`rules`	if `TRUE`, treat `id` as a string with semicolon-separated transliteration rules (see the ICU manual)
`forward`	transliteration direction (`TRUE` for forward, `FALSE` for reverse)
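The following sketch illustrates the `forward` argument and assumes stringi is installed. Running `Any-Hex/C` in reverse (`forward = FALSE`) applies its inverse, which decodes the escape sequences produced in the forward direction. The round trip should therefore return the input unchanged.

```r
library(stringi)

x <- "gro\u00df"  # "groß"

# Forward: every character becomes a C-style escape such as \u0067
escaped <- stri_trans_general(x, "Any-Hex/C")

# Reverse: the inverse transform decodes the escapes back into characters
restored <- stri_trans_general(escaped, "Any-Hex/C", forward = FALSE)

print(escaped)
print(identical(restored, x))
```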

Details

ICU transforms were mainly designed to transliterate characters from one script to another (for example, from Greek to Latin, or from Japanese Katakana to Latin). However, these services can also handle a much broader range of tasks. In particular, the transforms include prebuilt transformations for case conversion, normalization, the removal of given characters, and a variety of language and script transliterations. Transforms can be chained together to perform a series of operations, and each step of the process can use a UnicodeSet to restrict the characters that are affected.
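Below is a short sketch of chaining and UnicodeSet filtering, assuming stringi is installed. The per-step filter syntax `[set] ID` follows the form used in the examples below (`[^\\p{L}] latin-ascii`). The input strings are arbitrary.

```r
library(stringi)

# A UnicodeSet filter in front of an ID restricts which characters that step touches:
# only the vowels a, e, i, o, u are upper-cased here
vowels_up <- stri_trans_general("banana split", "[aeiou] Upper")

# Chaining with ';': transliterate to ASCII first, then upper-case the result
chained <- stri_trans_general("gro\u00df", "Latin-ASCII; Upper")

print(vowels_up)
print(chained)
```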

To get the list of available transforms, call stri_trans_list.

Note that transliterators are often combined in sequence to achieve a desired transformation. This is analogous to the composition of mathematical functions. For example, given a transform that converts only lowercase ASCII Latin characters to Katakana, it is convenient to first (1) separate base characters from their accents and then (2) convert uppercase to lowercase. To achieve this, a compound transform can be specified as `NFKD; Lower; Latin-Katakana;` (with the default `rules = FALSE`).
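This sketch makes the function-composition analogy concrete, assuming stringi is installed. The compound ID is applied in a single call, and then the same three steps are applied one after another with base R's `Reduce`. The two results are expected to agree. The input "Tokyo" is an arbitrary choice; its capital "T" is what the `Lower` step is for.

```r
library(stringi)

x <- "Tokyo"

# The compound ID from the text, applied in one call
compound <- stri_trans_general(x, "NFKD; Lower; Latin-Katakana")

# The same steps applied in turn, i.e. Latin-Katakana(Lower(NFKD(x)))
steps <- c("NFKD", "Lower", "Latin-Katakana")
sequential <- Reduce(function(s, id) stri_trans_general(s, id), steps, x)

print(compound)
print(identical(compound, sequential))
```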

Custom rule-based transliteration is also supported; see the ICU manual and the examples below.
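As a minimal sketch of a custom rule set (hypothetical, chosen only for illustration, and assuming stringi is installed), a bidirectional ICU rule `a <> o` maps "a" to "o" in the forward direction and "o" to "a" when `forward = FALSE`.

```r
library(stringi)

# Hypothetical rule set: one bidirectional rule (<>) relating "a" and "o"
id <- "a <> o;"

fwd <- stri_trans_general("banana", id, rules = TRUE)                # a -> o
back <- stri_trans_general(fwd, id, rules = TRUE, forward = FALSE)  # o -> a

print(c(fwd, back))
```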

Transliteration is not dependent on the current locale.
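For contrast with a locale-aware function, the sketch below compares the general `Upper` transform with stringi's `stri_trans_toupper` under the Turkish locale, assuming stringi is installed. Turkish maps "i" to a dotted capital I (U+0130), while the locale-independent transform is expected to give a plain "I".

```r
library(stringi)

x <- "i"

# General transform: does not depend on the current locale
general <- stri_trans_general(x, "Upper")

# Locale-aware case mapping, for comparison: Turkish rules apply
turkish <- stri_trans_toupper(x, locale = "tr")

print(c(general, turkish))
```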

Value

Returns a character vector.

Author(s)

Marek Gagolewski and other contributors

References

General Transforms – ICU User Guide, https://unicode-org.github.io/icu/userguide/transforms/general/

Examples

library(stringi)

stri_trans_general('gro\u00df', 'latin-ascii')
stri_trans_general('stringi', 'latin-greek')
stri_trans_general('stringi', 'latin-cyrillic')
stri_trans_general('stringi', 'upper') # see stri_trans_toupper
stri_trans_general('\u0104', 'nfd; lower') # compound id; see stri_trans_nfd
stri_trans_general('Marek G\u0105golewski', 'pl-pl_FONIPA')
stri_trans_general('\u2620', 'any-name') # character name
stri_trans_general('\\N{latin small letter a}', 'name-any') # decode name
stri_trans_general('\u2620', 'hex/c') # to hex
stri_trans_general("\u201C\u2026\u201D \u0105\u015B\u0107\u017C",
    "NFKD; NFC; [^\\p{L}] latin-ascii")

x <- "\uC885\uB85C\uAD6C \uC0AC\uC9C1\uB3D9"
stringi::stri_trans_general(x, "Hangul-Latin")
# Deviate from the ICU rules of romanisation of Korean,
# see https://en.wikipedia.org/wiki/Romanization_of_Korean
id <- "
    :: NFD;
    \u11A8 > k;
    \u11AE > t;
    \u11B8 > p;
    \u1105 > r;
    :: Hangul-Latin;
"
stringi::stri_trans_general(x, id, rules=TRUE)

stringi documentation built on May 29, 2024, 8:16 a.m.
