sub(nonStandardNames[, 1], nonStandardNames[, 2], x)
Accented characters common in non-English languages often
get mangled in different ways by different software. For
example, the "e" in "Andre" may carry an accent that
gets replaced by other characters by different software.
This function first converts "Andr*" to "Andr_" for
any character "*" not in standardCharacters.
It then looks for "Andr_" in nonStandardNames.
By default, it will find that and replace it with "Andre".
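The two-step idea above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `toyNames` is a hypothetical one-row stand-in for Ecdat::nonEnglishNames, `standard` is a shortened stand-in for the default standardCharacters, and `maskNonStandard` is a hypothetical helper that masks each nonstandard character individually.

```r
# Hypothetical lookup table standing in for Ecdat::nonEnglishNames:
# column 1 = masked form, column 2 = plain-ASCII replacement.
toyNames <- matrix(c("Andr_", "Andre"), ncol = 2)

# Shortened stand-in for the default standardCharacters.
standard <- c(letters, LETTERS, 0:9, " ", ".", ",", "-", "_")

# Step 1 (assumed simplification): replace every character that is
# not in 'standard' with "_".
maskNonStandard <- function(s, replacement = "_") {
  vapply(strsplit(s, ""), function(ch) {
    ch[!(ch %in% standard)] <- replacement
    paste(ch, collapse = "")
  }, character(1))
}

x <- c("Andr\u00e9", "Andre")   # first element carries an accented "e"
masked <- maskNonStandard(x)    # "Andr_" "Andre"

# Step 2: look each masked pattern up in the table and substitute.
restored <- masked
for (i in seq_len(nrow(toyNames))) {
  restored <- gsub(toyNames[i, 1], toyNames[i, 2], restored, fixed = TRUE)
}
restored                        # "Andre" "Andre"
```

Matching with fixed = TRUE treats the table entries as literal strings, which keeps characters such as "." from being read as regular-expression syntax in this sketch.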
subNonStandardNames(x,
standardCharacters=c(letters, LETTERS, ' ','.', '?', '!',
',', 0:9, '/', '*', '$', '%', '\"', "\'", '-', '+', '&', '_', ';',
'(', ')', '[', ']', '\n'),
replacement='_',
gsubList=list(list(pattern='\\\\\\\\|\\\\',
replacement='\"')),
removeSecondLine=TRUE,
nonStandardNames=Ecdat::nonEnglishNames,
namesNotFound="attr.replacement", ...)
x
    character vector or matrix or a data.frame
standardCharacters, replacement, gsubList, ...
    arguments passed to subNonStandardCharacters
removeSecondLine
    logical: If TRUE, delete anything following "\n" and return it as an attribute "secondLine".
nonStandardNames
    data.frame or character matrix with two columns: Replace any
    substring of x matching the first column with the corresponding
    element of the second.
namesNotFound
    character vector describing how to treat substitutions not found
    in nonStandardNames[, 1].
    NOTE: x = "_" will be identified by "attr.replacement" but
    not by "attr.notfound" assuming the default value for
    replacement.
1. If removeSecondLine is TRUE, delete anything following "\n" and keep it as the "secondLine" attribute.
2. x. <- subNonStandardCharacters(x, standardCharacters, replacement, ...)
3. Loop over all rows of nonStandardNames, substituting anything matching nonStandardNames[i, 1] with nonStandardNames[i, 2].
4. Eliminate leading and trailing blanks.
5. if(is.matrix(x)) return a matrix; if(is.data.frame(x)) return a data.frame(..., stringsAsFactors=FALSE)
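The steps above can be sketched in base R for the character-vector case. This is a rough illustration under stated assumptions rather than the package source: `sketchSubNames`, `lookup`, and `standard` are hypothetical names, the masking step is a simplified stand-in for subNonStandardCharacters, and the matrix and data.frame handling of step 5 is omitted.

```r
# Hypothetical sketch of steps 1-4 for a character vector.
sketchSubNames <- function(x, lookup, replacement = "_",
                           standard = c(letters, LETTERS, 0:9,
                                        " ", ".", ",", "-", "_")) {
  # 1. Move anything after the first "\n" into 'secondLine'.
  pos <- as.integer(regexpr("\n", x, fixed = TRUE))
  hasNL <- pos > 0
  secondLine <- rep(NA_character_, length(x))
  secondLine[hasNL] <- substring(x[hasNL], pos[hasNL] + 1)
  x[hasNL] <- substring(x[hasNL], 1, pos[hasNL] - 1)

  # 2. Mask characters not in 'standard' (simplified stand-in for
  #    subNonStandardCharacters).
  x <- vapply(strsplit(x, ""), function(ch) {
    ch[!(ch %in% standard)] <- replacement
    paste(ch, collapse = "")
  }, character(1))

  # 3. Loop over the rows of the lookup table.
  for (i in seq_len(nrow(lookup))) {
    x <- gsub(lookup[i, 1], lookup[i, 2], x, fixed = TRUE)
  }

  # 4. Eliminate leading and trailing blanks.
  x <- trimws(x)

  # Attach the second lines only if any were found.
  if (any(hasNL)) attr(x, "secondLine") <- secondLine
  x
}

lookup <- matrix(c("Ra_l", "Raul"), ncol = 2)  # hypothetical table
out <- sketchSubNames(c("Ra\u00fal", "Ed \n --Vacancy", " "), lookup)
out
```

In this sketch the text after the line break survives only as the "secondLine" attribute, so the returned names can be compared directly while the removed text stays available for inspection.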
NOTE: On 13 May 2013 Jeff Newmiller at the University of California, Davis, wrote, 'I think it is a fools errand to think that you can automatically "normalize" arbitrary Unicode characters to an ASCII form that everyone will agree on.' (This was a reply on r-help@r-project.org, subject: "Re: [R] Matching names with non-English characters".) Doubtless someone has software to do a better job of this than what this function does, but I've so far been unable to find it in R. If you know of a better solution to this problem, I'd be pleased to hear from you. Spencer Graves
a character vector with all nonStandardCharacters replaced
first by replacement and then by the second column of
nonStandardNames for any that match the first column. If a
secondLine is found on any elements, it is returned as a "secondLine"
attribute. If any names with nonStandardCharacters are not found
in nonStandardNames[, 1], they are identified in an
optional attribute per the namesNotFound argument.
Spencer Graves
sub
nonEnglishNames
subNonStandardCharacters
stripBlanks
##
## 1. Example
##
tstSNSN <- c('Raul', 'Ra`l', 'Torres,Raul', 'Torres, Ra`l',
"Robert C. \\Bobby\\\\", 'Ed \n --Vacancy', '', ' ')
# confusion in character sets can create
# names like tstSNSN[2]
##
## 2. subNonStandardNames(vector)
##
SNS2 <- subNonStandardNames(tstSNSN)
SNS2
# check
SNS2. <- c('Raul', 'Raul', 'Torres,Raul', 'Torres, Raul',
'Robert C. "Bobby"', 'Ed', '', '')
attr(SNS2., 'secondLine') <- c(rep(NA, 5), ' --Vacancy', NA, NA)
all.equal(SNS2, SNS2.)
##
## 3. subNonStandardNames(matrix)
##
tstmat <- parseName(tstSNSN, surnameFirst=TRUE)
submat <- subNonStandardNames(tstmat)
# check
SNSmat <- parseName(SNS2., surnameFirst=TRUE)
all.equal(submat, SNSmat)
##
## 4. subNonStandardNames(data.frame)
##
tstdf <- as.data.frame(tstmat)
subdf <- subNonStandardNames(tstdf)
# check
SNSdf <- as.data.frame(SNSmat, stringsAsFactors=FALSE)
all.equal(subdf, SNSdf)
##
## 5. namesNotFound
##
noSub <- subNonStandardNames('xx_x')
# check
noSub. <- 'xx_x'
attr(noSub., 'namesNotFound') <- 'xx_x'
all.equal(noSub, noSub.)