View source: R/testCharSystem.R
This function detects characters that originate from different Unicode blocks within the same text, such as a Latin letter embedded in a Cyrillic word.
testCharSystem(dat, addCharSys = NULL, markword = TRUE, autochange = FALSE)
dat
data vector
addCharSys
character vector of Unicode block names to check. If NULL (default), c("Latin", "Cyrillic") is used.
markword
if TRUE (default), detect each word that contains an anomalous character and mark the whole word. If FALSE, detect the anomalous character within the word and mark only that character.
autochange
if TRUE, replace anomalous characters according to the coding rules proposed in "charcodescheme.csv". When more than one replacement character is possible, the candidates are separated by "|"; an unknown character is replaced with "?". Default is FALSE.
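The replacement step described for autochange can be pictured with a small lookup table. The sketch below is only an illustration: the mapping `latin_to_cyrillic` and the helper `replace_lookalikes()` are hypothetical, not part of HooverArchives, and the actual rules in "charcodescheme.csv" are not shown on this page. It assumes the input is meant to be entirely Cyrillic, so every listed Latin letter is treated as a look-alike.

```r
library(stringi)

# Hypothetical lookup table (an assumption for illustration only);
# names are Latin look-alikes, values are their Cyrillic counterparts.
latin_to_cyrillic <- c(a = "\u0430", x = "\u0445", o = "\u043E")

# Hypothetical helper: replace each listed Latin letter with its
# Cyrillic counterpart and surround the replacement with asterisks.
replace_lookalikes <- function(x, map = latin_to_cyrillic) {
  stri_replace_all_fixed(
    x,
    pattern = names(map),
    replacement = paste0("*", unname(map), "*"),
    vectorize_all = FALSE  # apply every pattern to every element in turn
  )
}

fixed <- replace_lookalikes(
  "\u0418\u043D\u0444\u043E\u0440\u043C\u0061\u0446\u0438\u044F"
)
print(fixed)
```

Because the replacements are Cyrillic, applying the patterns one after another cannot trigger a second replacement on text that was already fixed.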
Returns an altered data vector with anomalous words/characters/replacements surrounded by asterisks (*).
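To make the word-marking behaviour concrete, here is a minimal sketch of one way mixed-script words could be detected with stringi. It is not the package's implementation: `mark_mixed_words()` is a hypothetical helper, it assumes words are separated by whitespace, and it tests the Unicode script property (for example `\p{Script=Latin}`) rather than block membership.

```r
library(stringi)

# Hypothetical helper, not part of HooverArchives: surround with asterisks
# every whitespace-separated word that contains characters from more than
# one of the given scripts.
mark_mixed_words <- function(x, scripts = c("Latin", "Cyrillic")) {
  vapply(x, function(s) {
    words <- stri_split_regex(s, "\\s+")[[1]]
    # For each word, count how many of the scripts occur in it
    n_scripts <- vapply(words, function(w) {
      sum(vapply(scripts, function(sc) {
        stri_detect_regex(w, paste0("\\p{Script=", sc, "}"))
      }, logical(1)))
    }, integer(1))
    idx <- which(n_scripts > 1)  # which() drops NA, so missing values pass through
    words[idx] <- paste0("*", words[idx], "*")
    paste(words, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# First word hides a Latin "a" (\u0061); second word is entirely Cyrillic
x <- c("\u0418\u043D\u0444\u043E\u0440\u043C\u0061\u0446\u0438\u044F",
       "\u0410\u0440\u0445\u0438\u0432\u044B")
marked <- mark_mixed_words(x)
print(marked)
```

Only the first element is wrapped in asterisks, since the second contains a single script.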
library(HooverArchives)
library(stringi)
# Each word is Cyrillic except for one Latin look-alike: "\u0061" (a) and "\u0078" (x)
dat_vectorR <- c("\u0418\u043D\u0444\u043E\u0440\u043C\u0061\u0446\u0438\u044F", "\u0410\u0440\u0078\u0438\u0432\u044B")
dat_vector <- stri_unescape_unicode(dat_vectorR)
# Mark the word
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=FALSE, markword=TRUE)
# Mark the anomalous character
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=FALSE, markword=FALSE)
# Replace the anomalous character with the correct character and mark it
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=TRUE, markword=FALSE)