testCharSystem: testCharSystem function
In kkalininMI/HooverArchives: HooverArchives

View source: R/testCharSystem.R

This function detects characters that originate from different Unicode blocks, for example a Latin letter embedded in an otherwise Cyrillic word.

testCharSystem(dat, addCharSys = NULL, markword = TRUE, autochange = FALSE)

`dat`	data vector
`addCharSys`	the character blocks to check. If not defined, c("Latin", "Cyrillic") is used.
`markword`	if TRUE (default), detect each word that contains an anomalous character and mark the whole word. If FALSE, detect the anomalous character within a word and mark only that character.
`autochange`	if TRUE, replace characters according to the coding rules proposed in "charcodescheme.csv". When more than one replacement character is possible, the candidates are separated by "|"; an unknown character is replaced with "?". Defaults to FALSE.

Returns an altered data vector with anomalous words/characters/replacements surrounded by asterisks (*).

library(HooverArchives)
library(stringi)

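# Two Cyrillic words, each containing one Latin look-alike letter:
# U+0061 (a) in the first word and U+0078 (x) in the second
dat_vectorR <- c("\u0418\u043D\u0444\u043E\u0440\u043C\u0061\u0446\u0438\u044F", "\u0410\u0440\u0078\u0438\u0432\u044B")
dat_vector <- stri_unescape_unicode(dat_vectorR)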

# Mark the word
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=FALSE, markword=TRUE)

# Mark the anomalous character
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=FALSE, markword=FALSE)

# Replace the anomalous character with the correct character and mark it
testCharSystem(dat_vector, addCharSys=c("Latin", "Cyrillic"), autochange=TRUE, markword=FALSE)
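
To illustrate the underlying idea independently of the package, the following is a minimal sketch that uses only stringi and the Unicode script property. It is not the implementation of testCharSystem: it assumes the text is mostly Cyrillic with stray Latin letters, it ignores "charcodescheme.csv", and the variable names (words, mixed, latin_chars, marked_words, marked_chars) are hypothetical.

```r
library(stringi)

# The two sample words from the examples, plus an all-Cyrillic control word
words <- c("\u0418\u043D\u0444\u043E\u0440\u043C\u0061\u0446\u0438\u044F",
           "\u0410\u0440\u0078\u0438\u0432\u044B",
           "\u0410\u0440\u0445\u0438\u0432\u044B")

# A word is treated as mixed when it contains letters from both scripts
has_latin    <- stri_detect_regex(words, "\\p{Script=Latin}")
has_cyrillic <- stri_detect_regex(words, "\\p{Script=Cyrillic}")
mixed <- has_latin & has_cyrillic

# Assumption: Latin is the anomalous script, so list the Latin characters
latin_chars <- stri_extract_all_regex(words, "\\p{Script=Latin}")

# Word-level marking: wrap every mixed word in asterisks
marked_words <- ifelse(mixed, paste0("*", words, "*"), words)

# Character-level marking: wrap only the Latin characters in asterisks
marked_chars <- stri_replace_all_regex(words, "(\\p{Script=Latin})", "*$1*")

print(data.frame(words, mixed, marked_words, marked_chars))
```

The word-level and character-level variants loosely mirror markword=TRUE and markword=FALSE; the exact output format of testCharSystem may differ from this sketch.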