char_tolower: Convert the case of character objects


Description

char_tolower and char_toupper are replacements for the base R functions tolower and toupper, built on the stringi package. The stringi case-conversion functions are preferred over the base functions because they handle case conversion correctly for Unicode text. In addition, the *_tolower functions provide an option for preserving acronyms.
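As an illustration of the Unicode handling described above, the following minimal sketch calls stringi directly rather than going through quanteda. It assumes only that the stringi package is installed. It shows a full case mapping in which one character expands to two on uppercasing.

```r
library(stringi)

# German sharp s (U+00DF), written as an escape so the file encoding does not matter
word <- "stra\u00dfe"

# stringi uses ICU full case mappings, so the sharp s becomes "SS"
upper <- stri_trans_toupper(word)
print(upper)  # "STRASSE"
```

The result of base toupper on the same input can vary by platform and locale, so it is not shown here.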

Usage

char_tolower(x, keep_acronyms = FALSE, ...)

char_toupper(x, ...)

Arguments

x

the input object whose character/tokens/feature elements will be case-converted

keep_acronyms

logical; if TRUE, do not lowercase any all-uppercase words (applies only to *_tolower functions)

...

additional arguments, such as locale, passed to the underlying stringi functions (e.g. stri_trans_tolower)
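A short sketch of why a locale argument matters for case conversion. It calls the stringi functions directly and assumes only that stringi is installed. Whether a given quanteda release forwards locale through ... should be checked against that release's signature.

```r
library(stringi)

# Turkish distinguishes dotted and dotless i, so the mapping depends on the locale
lower_en <- stri_trans_tolower("I", locale = "en_US")  # "i"
lower_tr <- stri_trans_tolower("I", locale = "tr_TR")  # dotless i, U+0131
upper_tr <- stri_trans_toupper("i", locale = "tr_TR")  # dotted capital I, U+0130

print(c(lower_en, lower_tr, upper_tr))
```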

Examples

txt1 <- c(txt1 = "b A A", txt2 = "C C a b B")
char_tolower(txt1) 
char_toupper(txt1)

# with acronym preservation
txt2 <- c(text1 = "England and France are members of NATO and UNESCO", 
          text2 = "NASA sent a rocket into space.")
char_tolower(txt2)
char_tolower(txt2, keep_acronyms = TRUE)
char_toupper(txt2)
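
The acronym-preserving behaviour in the examples above can be approximated with stringi alone. The sketch below is not quanteda's implementation. lower_keep_acronyms is a hypothetical helper, and it rests on simplifying assumptions: words are separated by single spaces, and an acronym is two or more uppercase letters, optionally followed by punctuation.

```r
library(stringi)

# Hypothetical helper for illustration only (not part of quanteda)
lower_keep_acronyms <- function(x) {
  vapply(x, function(s) {
    # Split on single spaces (a simplifying assumption)
    words <- stri_split_fixed(s, " ")[[1]]
    # Treat a word as an acronym if it is 2+ uppercase letters,
    # optionally followed by punctuation such as a trailing period
    is_acronym <- stri_detect_regex(words, "^\\p{Lu}{2,}\\p{P}*$")
    # Lowercase only the words that are not acronyms
    words[!is_acronym] <- stri_trans_tolower(words[!is_acronym])
    paste(words, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

res <- lower_keep_acronyms(c("England and France are members of NATO and UNESCO",
                             "NASA sent a Rocket into space."))
print(res)
```

Under these assumptions a single capital letter such as "A" is lowercased, since it does not meet the two-letter minimum.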

quanteda/quanteda documentation built on June 15, 2019, 8:36 a.m.