replace_non_ascii: Replace Common Non-ASCII Characters

Description

replace_non_ascii - Replaces common non-ASCII characters.

replace_non_ascii2 - Replaces all non-ASCII characters (defined as '[^ -~]+'). This provides a subset of the functionality of replace_non_ascii; it is faster but likely less accurate.

replace_curly_quote - Replaces curly single and double quotes. This provides a subset of functionality found in replace_non_ascii specific to quotes.
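To make the '[^ -~]+' definition concrete, here is a minimal base-R sketch of what that pattern matches. It is an illustration only, not the package's implementation; the sample vector and the use of perl = TRUE (so the range is compared by code point) are assumptions made for this example.

```r
# Illustration only: base R, not the package's own code.
# Sample strings are hypothetical; \u escapes keep the source file ASCII.
x <- c("J\u00f6reskog", "30\u00a2", "plain ASCII")

# '[^ -~]+' matches any run of characters outside the range from space
# to tilde, i.e. everything that is not printable ASCII.
# perl = TRUE is an assumption made here so the range is code-point based.
stripped <- gsub("[^ -~]+", "", x, perl = TRUE)
stripped
```

Note that the same pattern also matches tabs and newlines, since they fall outside the space-to-tilde range.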

Usage

replace_non_ascii(x, replacement = "", remove.nonconverted = TRUE, ...)

replace_non_ascii2(x, replacement = "", ...)

replace_curly_quote(x, ...)

Arguments

x

The text variable.

replacement

Character string, either equal in length to pattern or of length one, used as the replacement for the matched pattern.

remove.nonconverted

logical. If TRUE, unmapped encodings are deleted from the string.

...

ignored.

Value

Returns a text variable (character string) with non-ASCII characters replaced.

Examples

x <- c(
    "Hello World", "6 Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher",
    'This is a \xA9 but not a \xAE', '6 \xF7 2 = 3', 
    'fractions \xBC, \xBD, \xBE', 'cows go \xB5', '30\xA2'
)
Encoding(x) <- "latin1"
x

replace_non_ascii(x)
replace_non_ascii(x, remove.nonconverted = FALSE)

z <- '\x95He said, \x93Gross, I am going to!\x94'
Encoding(z) <- "latin1"
z

replace_curly_quote(z)
replace_non_ascii(z)
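
For comparison, the kind of substitution replace_curly_quote performs can be approximated in base R. This is a rough sketch under stated assumptions, not the package's method: the sample string is hypothetical, only the four common Unicode curly quotes are handled, and perl = TRUE is a choice made for this example.

```r
# Rough base-R approximation; not how the package does it.
# Hypothetical sample with Unicode curly double and single quotes.
z2 <- "\u201cGross,\u201d he said, \u2018no.\u2019"

# Map curly double quotes to a straight double quote ...
straight <- gsub("[\u201c\u201d]", "\"", z2, perl = TRUE)
# ... and curly single quotes to a straight apostrophe.
straight <- gsub("[\u2018\u2019]", "'", straight, perl = TRUE)
straight
```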

trinker/textmod documentation built on Nov. 3, 2021, 7:20 p.m.