splitremove: Split Remove

View source: R/splitcomma.R

splitremove R Documentation

Split Remove

Description

This function removes unwanted items, such as prefixes, suffixes, and titles, from each element of a character vector. It first splits the string on spaces, dots, commas, and parentheses, and then drops the resulting pieces that appear in the remove vector.
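The split-then-filter idea described above can be sketched in base R. This is only an illustration, not the iemisc implementation: the name split_remove_sketch is hypothetical, and the case-insensitive comparison and the rejoining of the kept pieces with single spaces are assumptions suggested by the example further down (lower-case remove entries, upper-case input such as "MR.").

```r
# Hypothetical helper for illustration only; not the iemisc source code.
split_remove_sketch <- function(string, remove) {
  # Assumption: matching ignores case, so "MR" is dropped when remove has "mr".
  remove <- tolower(remove)

  # Split each element on runs of spaces, dots, commas, and parentheses.
  tokens <- strsplit(string, "[ .,()]+")

  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]                    # drop empty pieces from leading delimiters
    keep <- tok[!(tolower(tok) %in% remove)]   # keep pieces not listed in remove
    paste(keep, collapse = " ")                # assumption: rejoin with single spaces
  }, character(1))
}

split_remove_sketch(
  c("Richards Z. Jerueg Mr.", "Dr. Jane (PhD) Doe"),
  c("mr", "dr", "phd")
)
```

Because the delimiters themselves are discarded by the split, punctuation such as the dot after an initial does not survive in this sketch; the packaged function may treat such details differently.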

Usage

splitremove(string, remove)

Arguments

string

character vector that contains the text to keep and to remove

remove

character vector that contains the items (for example, titles and suffixes) to remove from the string

Value

the revised character vector with the contents of remove removed from the string

Author(s)

Irucka Embry

Source

Molx, answer to "R regexp - replace title and suffix in any part of string with nothing in large file (> 2 million rows)", Stack Overflow, April 16, 2015. See https://stackoverflow.com/questions/29680131/r-regexp-replace-title-and-suffix-in-any-part-of-string-with-nothing-in-large.

Examples


# Example

install.load::load_package("iemisc", "data.table")

# create the list of items to remove from the text
remove <- c("mister", "sir", "mr", "madam", "mrs", "miss", "ms", "iv",
"iii", "ii", "jr", "sr", "md", "phd", "mba", "pe", "mrcp", "and", "&", "prof",
"professor", "esquire", "esq", "dr", "doctor")

names <- data.table(Named = c("Alfredy 'Chipp' Kahner IV",
"Denis G. Barnekdt III", "JERUEG, RICHARDS Z. MR.", "EDWARDST, HOWARDD K. JR."))

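# first use splitcomma on the Named column
names[, Corrected_Named := splitcomma(Named)]

# view the table after splitcomma
names

# then remove the titles and suffixes listed in remove
names[, Corrected_Named := splitremove(Corrected_Named, remove)]

# view the final result
names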







iemisc documentation built on Sept. 25, 2023, 5:09 p.m.