fixName | R Documentation |
Standardize name notation
fixName( nomes, sep.in = c(";", "&", "|", " e ", " y ", " and ", " und ", " et "), sep.out = "|", bad.comma = TRUE, special.char = FALSE )
nomes |
a character string or a vector with names. |
sep.in |
a vector of the symbols separating multiple names. Defaults to: ";", "&", "|", " e ", " y ", " and ", " und ", and " et ". |
sep.out |
a character string with the symbol separating multiple names in the output string. Defaults to "|". If a character vector of length 2 or more is supplied, the first element is used with a warning. |
bad.comma |
logical. Should cases in which the source data use commas both to separate last names from first names/initials and to separate multiple people's names be isolated and, where possible, fixed? Defaults to TRUE. |
special.char |
logical. Should special characters be maintained? Defaults to FALSE. |
The function fixes small problems in name notation (e.g. orphan spaces) and standardizes the separation between multiple authors and between initials and/or prepositions within the same name. It also standardizes the notation of some compound names (e.g. Faria Jr. to Faria Junior). In addition, the function removes numbers, some unwanted expressions (e.g. 'et al.') and symbols (e.g. ? or !).
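The kind of cleanup described above can be illustrated with a few base R substitutions. This is only a minimal sketch: `clean_name_sketch` is a hypothetical helper, not part of the package, and its regular expressions are assumptions that cover just a handful of the cases fixName handles.

```r
# Hypothetical helper for illustration only; NOT the fixName implementation.
clean_name_sketch <- function(x) {
  x <- gsub(",? *\\bet al\\.?", "", x, perl = TRUE)  # drop 'et al.' and a leading comma
  x <- gsub("[0-9]+", "", x, perl = TRUE)            # drop numbers
  x <- gsub("[?!]", "", x, perl = TRUE)              # drop the symbols ? and !
  x <- gsub("\\bJr\\.?", "Junior", x, perl = TRUE)   # expand the 'Jr.' abbreviation
  x <- gsub(" +", " ", x, perl = TRUE)               # collapse repeated spaces
  trimws(x)                                          # remove orphan leading/trailing spaces
}

clean_name_sketch(c("Gert G. Hatschbach, et al.", "J.E.Q. Faria Jr.", " Maria  Silva 123?"))
```

The order of the substitutions matters: spaces are collapsed last, so gaps left behind by removed numbers or symbols are also cleaned up.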
The function was created to deal with people's names, so input separators composed only of letters (e.g. " and ") should be surrounded by spaces. Separators that are non-alphabetic characters (e.g. semicolons, ampersands) are recognized regardless of the presence of spaces nearby.
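The distinction between alphabetic and non-alphabetic separators can be sketched in base R as follows. The helper `normalize_seps_sketch` is hypothetical and is not how the package is known to implement this step; it only mirrors the rule stated above, reusing the documented defaults of sep.in and sep.out.

```r
# Hypothetical helper for illustration only; NOT the fixName implementation.
normalize_seps_sketch <- function(x,
                                  sep.in = c(";", "&", "|", " e ", " y ",
                                             " and ", " und ", " et "),
                                  sep.out = "|") {
  # Separators made only of letters are matched literally, spaces included,
  # so the "e" inside a word such as "Pedro" is never taken as a separator.
  is_word <- grepl("^ *[[:alpha:]]+ *$", sep.in)
  for (s in sep.in[is_word]) {
    x <- gsub(s, sep.out, x, fixed = TRUE)
  }
  # Non-alphabetic separators are matched with or without spaces around them.
  # \Q...\E quotes the symbol so that "|" is not read as regex alternation.
  for (s in sep.in[!is_word]) {
    x <- gsub(paste0(" *\\Q", s, "\\E *"), sep.out, x, perl = TRUE)
  }
  x
}

normalize_seps_sketch(c("Karl Emrich & Balduino Rambo",
                        "A. Silva;B. Souza and C. Lima",
                        "Pedro Alexandre"))
```

Keeping the spaces as part of the alphabetic separators is what protects names containing the letters "e", "y" or "et" from being split.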
By default, commas are not among the symbols separating multiple people's
names, because commas are often used to separate people's last names
from their first names or initials. There are cases when the name notation
uses commas to separate last names and first names/initials, as well as
multiple people's names (which is not at all encouraged). For some of these
cases (e.g. "M. Costa, J. Ribeiro"), but not for all of them (e.g. 'Costa,
M., Ribeiro, J.'), the function tries to isolate and solve the separation
between multiple people's names. This procedure is currently very
preliminary and may introduce noise into the name notation. If this is the
case, it can be skipped by setting the argument bad.comma to FALSE.
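To make the difficulty with commas concrete, here is a minimal sketch of one possible heuristic. It is an assumption made for illustration and is not the procedure used by the package: the hypothetical helper `split_bad_comma_sketch` treats commas as separators between people only when every comma-delimited piece already looks like a complete "initials plus surname" name.

```r
# Hypothetical heuristic for illustration only; NOT the fixName implementation.
split_bad_comma_sketch <- function(x, sep.out = "|") {
  # One or more initials ("M." or "J. E."), then a capitalized surname.
  full_name <- "^([[:upper:]]\\. ?)+[[:upper:]][[:alpha:]]+$"
  vapply(x, function(s) {
    pieces <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
    if (length(pieces) > 1 && all(grepl(full_name, pieces))) {
      paste(pieces, collapse = sep.out)  # commas were separating people
    } else {
      s                                  # ambiguous: leave the string untouched
    }
  }, character(1), USE.NAMES = FALSE)
}

split_bad_comma_sketch(c("M. Costa, J. Ribeiro", "Costa, M., Ribeiro, J."))
```

Under this heuristic the first string is split into two names, while the second is left as it is, because its commas play both roles and the pieces ("Costa", "M.", "Ribeiro", "J.") cannot be told apart without further rules.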
Due to common encoding problems related to Latin characters, names are
returned without accents by default. Users can choose between outputs
with and without accents and special characters; to keep them, set the
argument special.char to TRUE.
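As an illustration of what returning names without accents means, the sketch below replaces a few accented Latin characters by their unaccented counterparts. The helper `strip_accents_sketch` and its short replacement table are assumptions made for this example and are not the conversion used by the package. The accented characters are written as Unicode escapes so that the code does not depend on the encoding of the script file.

```r
# Hypothetical helper for illustration only; NOT the fixName implementation.
strip_accents_sketch <- function(x) {
  # a-tilde, a-acute, e-acute, c-cedilla, o-tilde (a deliberately short table)
  from <- c("\u00e3", "\u00e1", "\u00e9", "\u00e7", "\u00f5")
  to   <- c("a", "a", "e", "c", "o")
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

strip_accents_sketch(c("Leit\u00e3o", "F. da S.N. Thom\u00e9"))
```

A common base R alternative is `iconv(x, to = "ASCII//TRANSLIT")`, but its results depend on the platform, which is why an explicit table is used here.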
The input names (nomes) in the standard notation, to facilitate
further data processing.
Renato A. F. de Lima & Hans ter Steege
names <- c("J.E.Q. Faria Jr.", "Leitão F°, H.F.", "Gert G. Hatschbach, et al.",
           "Karl Emrich & Balduino Rambo", '( Karl) Emrich ;(Balduino ) Rambo',
           "F.daS.N.Thomé", 'F. da S.N. Thomé', 'Pedro L.R.de Moraes (30/4/1998)')
Encoding(names) <- "latin1"
names
fixName(names)
fixName(names, special.char = TRUE)
fixName(names, sep.out = " | ")