prepTDWG: Format People's Name

prepTDWG R Documentation

Description

Convert people's names to different formats (i.e. Last Name, First Name(s) or First Name(s), Last Name), with or without last name prepositions. The default name format is the one suggested by the Biodiversity Information Standards (TDWG).

Usage

prepTDWG(
  x,
  sep = ", ",
  format = "last_init",
  pretty = TRUE,
  get.prep = FALSE,
  get.initials = TRUE,
  max.initials = 4
)

Arguments

x

the character string or vector containing the names.

sep

character. Input and output name separator. Defaults to ", ".

format

character. Output name format. The default is "last_init"; the examples below also use "init_last" and "prep_last_init".

pretty

logical. Should the output name be returned in a pretty presentation (i.e. only the first letter of each name capitalized, initials separated by points without spaces, and family name prepositions in lower case)? Defaults to TRUE. If FALSE, names are returned in the same way as in the input object x.

get.prep

logical. Should last name prepositions be included? Defaults to FALSE.

get.initials

logical. Should the first name(s) be abbreviated? Defaults to TRUE.

max.initials

numeric. Maximum number of letters for a single word to be treated as initials rather than as a first name. Defaults to 4.

Details

The default name format follows the one suggested by the TDWG, which is: Last name, followed by a comma and then the initials, separated by points (e.g. Hatschbach, G.G.).
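As a rough illustration of this target format, the base-R sketch below builds a "Last, I.I." string from a simple "First Middle Last" input. The helper toy_last_init() is hypothetical and is not part of plantR; it ignores prepositions, compound last names and comma-separated input, all of which prepTDWG() is documented to handle.

```r
# Hypothetical helper, NOT plantR code: it only illustrates the "Last, I.I." layout.
toy_last_init <- function(x) {
  # Split the name into words on whitespace
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  n <- length(parts)
  # With a single word there is nothing to abbreviate
  if (n < 2) return(x)
  last <- parts[n]
  # First letter of each remaining word, upper-cased and followed by a point
  inits <- paste0(toupper(substr(parts[-n], 1, 1)), ".", collapse = "")
  paste(last, inits, sep = ", ")
}

toy_last_init("Alwyn Howard Gentry")
```

The sketch assumes the last word is the last name, which is the simplest case described in this section.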

The function internally uses another plantR function, lastName(). It therefore assumes that a person's last name is the one given at the end of the name string or preceding the name separator (i.e. comma), if present.

The function deals with simple last names, as well as with compound last names and last names with common name prefixes or prepositions (e.g. de, dos, van, ter, ...). By default, these prefixes and prepositions are removed, but they can be returned if the argument get.prep is set to TRUE.

The function assumes that all names containing separators (a comma by default) are in the format suggested by TDWG. Even in those cases, the function fixes simple problems (e.g. missing points between name initials).
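The kind of simple fix mentioned here, adding missing points between initials, can be sketched in base R as follows. The helper toy_fix_initials() is hypothetical and does not reproduce the plantR implementation; it assumes its input contains initials only (e.g. the part after the comma).

```r
# Hypothetical helper, NOT plantR code: normalises a string of initials.
toy_fix_initials <- function(inits) {
  # Keep letters only, dropping points and spaces
  letters_only <- gsub("[^[:alpha:]]", "", inits)
  # Split into single characters, upper-case them and add a point after each
  chars <- strsplit(toupper(letters_only), "")[[1]]
  paste0(chars, ".", collapse = "")
}

toy_fix_initials("AH")
toy_fix_initials("a. h.")
```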

If only one name is given, the function returns the same name with the first letter capitalized.

The function output is relatively stable regarding the input format, letter case and spacing. However, if the name provided has unusual formatting, or if names of multiple people are provided within the same string, the function may not work properly. The output may therefore depend on the input format, and some level of double-checking may be necessary. See examples below.
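Part of this double-checking can be done by flagging strings that probably need manual review before they are formatted. The base-R sketch below is a heuristic under stated assumptions, not a plantR feature: the helper flag_for_review() is hypothetical, and it only looks for a semicolon, an ampersand or more than one comma.

```r
# Hypothetical helper, NOT plantR code: flags name strings worth checking by hand.
flag_for_review <- function(x) {
  # Count the commas in each string
  n_commas <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))
  # Flag strings containing ";" or "&", or two or more commas
  grepl("[;&]", x) | n_commas > 1
}

people <- c("Gentry, AH",
            "C. Mendonca Filho; F. da Silva",
            "A. Alvarez, A. Zamora & V. Huaraca",
            "Cesar Sandro, Esteves, F")
flag_for_review(people)
```

Only the first string passes unflagged. The rule is deliberately conservative, so it also flags a single name written with two commas.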

Value

The character string x in the standardized format.

Author(s)

Renato A. F. de Lima

References

Conn, Barry J. (ed.) (1996). HISPID 3 - Herbarium Information Standards and Protocols for Interchange of Data. Herbarium Information Systems Committee (HISCOM). https://www.tdwg.org/standards/hispid3/

Willemse, L.P., van Welzen, P.C. & Mols, J.B. (2008). Standardisation in data-entry across databases: Avoiding Babylonian confusion. Taxon 57(2): 343-345.

See Also

lastName, getPrep and getInit.

Examples

  # Single names
  prepTDWG("gentry")
  prepTDWG("GENTRY")

  # Simple names
  prepTDWG("Alwyn Howard Gentry")
  prepTDWG("Alwyn H. Gentry")
  prepTDWG("A.H. Gentry")
  prepTDWG("A H Gentry")
  prepTDWG("Gentry, Alwyn Howard")
  prepTDWG("Gentry, AH")
  prepTDWG("Gentry AH")
  prepTDWG("GENTRY, A H")
  prepTDWG("gentry, alwyn howard")
  prepTDWG("gentry, a.h.")
  prepTDWG("gentry, a. h.")

  # Names with prepositions
  prepTDWG("Carl F. P. von Martius")
  prepTDWG("Carl F. P. von Martius", get.prep = TRUE)

  # Names with generational suffixes
  prepTDWG("Hermogenes de Freitas Leitao Filho")
  prepTDWG("H.F. Leitao Filho")
  prepTDWG("Leitao Filho, HF")
  prepTDWG("Leitao filho, H. F.")

  # Compound last name
  prepTDWG("Augustin Saint-Hilaire")
  prepTDWG("A. Saint-Hilaire")
  prepTDWG("Saint-Hilaire, Augustin")

  # Other formats
  prepTDWG("John MacDonald")
  prepTDWG("John McDonald")
  prepTDWG("John O'Brien")

  # Multiple names, different settings
  names <- c("Gentry, AH", "Gentry A.H.",
             "Carl F. P. von Martius", "Leitao filho, H. de F.",
             "Auguste de Saint-Hilaire", "John O'Reilly")
  prepTDWG(names)
  prepTDWG(names, format = "init_last")
  prepTDWG(names, format = "init_last", get.prep = TRUE)
  prepTDWG(names, get.prep = TRUE, format = "prep_last_init")
  prepTDWG(names, get.prep = TRUE, format = "prep_last_init",
           get.initials = FALSE)
  prepTDWG(names, get.prep = TRUE, pretty = FALSE,
           get.initials = FALSE)

  ## Unusual formatting (the function will not always work)
  # Two or more people in one string: incorrect output (names of different people are combined)
  prepTDWG("C. Mendonca Filho; F. da Silva")
  # Two or more people separated by commas: incorrect output (names of different people are combined)
  prepTDWG("A. Alvarez, A. Zamora & V. Huaraca")
  # One name, two commas: fails to get all names
  prepTDWG("Cesar Sandro, Esteves, F")
  # One name, abbreviations at the start and end: fails to get all names
  prepTDWG("C.S. Esteves F.")


LimaRAF/plantR documentation built on Jan. 1, 2023, 10:18 a.m.