clean_strings: clean_strings


Description

Default string-cleaning process for "name_match".

Usage

clean_strings(string, sp_char_words = NULL, common_words = NULL,
  remove_char = NULL, remove_words = FALSE, stem = FALSE,
  replace_null = NULL)

Arguments

string

character or character vector of strings

sp_char_words

character vector. Data.frame where the first column is special characters and the second column is full words.

common_words

data.frame. The first column is abbreviations and the second column is full words.

remove_char

character vector. string of specific characters (for example, "letters") to be removed

remove_words

logical. If TRUE, removes all abbreviations and replacement words in common_words

stem

logical. If TRUE, words are stemmed
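
Both sp_char_words and common_words are described as two-column tables (pattern first, full word second). A minimal sketch of building such tables by hand with base R follows. The object names my_common_words and my_sp_char_words, the column names, and the contents are illustrative assumptions, not data shipped with the package.

```r
# Hypothetical replacement tables; the contents are made up for illustration.
# First column: what to look for. Second column: the full word to use instead.
my_common_words <- data.frame(
  abbr = c("co", "inc", "st"),
  full = c("company", "incorporated", "street"),
  stringsAsFactors = FALSE
)

# Special characters are escaped with a double backslash, following the
# pattern used in the Examples section (cbind(c("\\$", "\\%"), ...)).
my_sp_char_words <- cbind(c("\\$", "\\%"), c("dollar", "percent"))

# Both objects have two columns and one row per replacement rule.
dim(my_common_words)
dim(my_sp_char_words)
```

Under these assumptions, the tables would be passed as common_words = my_common_words and sp_char_words = my_sp_char_words.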

Value

cleaned strings

Examples

# basic cleaning example
sample_str = c("Holding Co. A @ B St, 3 ))", "Company B & C 4, Inc.", "make $, inc.")
sample_clean1 = clean_strings(sample_str)

# defining common words with a package database
sample_clean2 = clean_strings(sample_str, common_words=corporate_words)

# dropping common words in a database
sample_clean3 = clean_strings(sample_str, common_words=corporate_words, remove_words=TRUE)

# sunco example: "co" appears both as a standalone word and inside a longer word
sample_clean4 = clean_strings("co cosuncosunco co co", common_words = cbind(c("co"), c("company")))

# changing special characters to words (note that @ and & are dropped with punctuation)
drop_char = cbind(c("\\$", "\\%"), c("dollar", "percent"))
sample_clean5 = clean_strings(sample_str, sp_char_words = drop_char)
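
The "sunco" input above mixes a standalone abbreviation with the same letters embedded in a longer word. As a rough illustration of whole-word replacement in general, the base-R sketch below replaces an abbreviation only where it stands alone. It is not taken from the fedmatch source, the helper replace_whole_word is hypothetical, and whether clean_strings behaves the same way is not confirmed here.

```r
# Hypothetical helper: replace `abbr` with `full` only where `abbr`
# is a complete word in `x`.
replace_whole_word <- function(x, abbr, full) {
  # \\b matches a word boundary, so "co" inside "sunco" is left alone.
  gsub(paste0("\\b", abbr, "\\b"), full, x, perl = TRUE)
}

replace_whole_word("co cosuncosunco co co", "co", "company")
# returns "company cosuncosunco company company"
```

Without the word boundaries, every occurrence of "co" would be rewritten, including the ones inside "cosuncosunco".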

seunglee98/fedmatch documentation built on June 26, 2019, 11:56 a.m.