clean_strings: String cleaning for easier matching


View source: R/clean_strings.R

Description

clean_strings takes a character vector and cleans each string according to user-specified options.

Usage

clean_strings(
  string,
  sp_char_words = fedmatch::sp_char_words,
  common_words = NULL,
  remove_char = NULL,
  remove_words = FALSE,
  stem = FALSE
)

Arguments

string

character. A string or character vector of strings to clean.

sp_char_words

data.frame. A data.frame whose first column holds special characters and whose second column holds the full words that replace them. The default is fedmatch::sp_char_words.

common_words

data.frame. A data.frame whose first column holds abbreviations and whose second column holds the corresponding full words.

remove_char

character vector. Specific characters to be removed from the strings (for example, "letters").

remove_words

logical. If TRUE, removes all abbreviations and replacement words listed in common_words.

stem

logical. If TRUE, words are stemmed (reduced to their root form).

Details

This function takes a variety of options, each of which changes its behavior. With the default settings, clean_strings does the following: converts the string to lowercase; replaces the special characters &, $, %, and @ with their names ("and", "dollar", "percent", "at"); and converts tabs to spaces and removes extra spaces. This default cleaning puts the strings in a standard format, which makes matching easier.
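The default steps can be illustrated with a short base-R sketch. This is not fedmatch's implementation: the function name sketch_default_clean is hypothetical, the special-character table is assumed to contain only the four pairs named above, and padding each replacement word with spaces is an assumption about how adjacent characters are handled.

```r
# Hypothetical sketch of the default cleaning steps; NOT fedmatch's code.
sketch_default_clean <- function(string) {
  # Assumed special-character table: only the four pairs named in Details
  sp_chars <- data.frame(
    char = c("&", "$", "%", "@"),
    word = c("and", "dollar", "percent", "at"),
    stringsAsFactors = FALSE
  )
  out <- tolower(string)                      # 1. lowercase
  for (i in seq_len(nrow(sp_chars))) {
    # 2. replace each special character with its name; fixed = TRUE makes
    #    "$" a literal character rather than a regex anchor. Padding with
    #    spaces (an assumption) turns "a&b" into "a and b".
    out <- gsub(sp_chars$char[i], paste0(" ", sp_chars$word[i], " "),
                out, fixed = TRUE)
  }
  out <- gsub("\t", " ", out, fixed = TRUE)   # 3. tabs -> spaces
  out <- gsub(" +", " ", out)                 # 4. collapse repeated spaces
  trimws(out)                                 #    and trim the ends
}

sketch_default_clean(c("Smith & Sons\tLLC", "  50%  OFF @ Store "))
# expected: c("smith and sons llc", "50 percent off at store")
```

Both inputs end up lowercase, single-spaced, and free of special characters, so "Smith & Sons LLC" and "smith and sons llc" would now compare as equal.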

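The other options remove or replace additional words or characters: common_words supplies abbreviations and the full words that go with them, remove_words removes those words instead of keeping them, remove_char deletes specific characters, and stem reduces words to their stems.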
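The word and character options can also be illustrated with a base-R sketch. This is a guess at the behavior, not fedmatch's implementation. The function name sketch_optional_clean and the example table cw are hypothetical. The sketch assumes that common_words entries are plain alphanumeric words with no regex metacharacters, that remove_char is applied before word handling, and that remove_words = TRUE drops both the abbreviations and the full words. Stemming is not sketched.

```r
# Hypothetical sketch of the optional cleaning; NOT fedmatch's code.
sketch_optional_clean <- function(string, common_words = NULL,
                                  remove_char = NULL, remove_words = FALSE) {
  out <- string
  if (!is.null(remove_char)) {
    for (ch in remove_char) {
      out <- gsub(ch, "", out, fixed = TRUE)  # delete each character literally
    }
  }
  if (!is.null(common_words)) {
    abbrev <- as.character(common_words[[1]])  # column 1: abbreviations
    full   <- as.character(common_words[[2]])  # column 2: full words
    for (i in seq_along(abbrev)) {
      # \b word boundaries keep "co" from matching inside "costco"
      pattern <- paste0("\\b", abbrev[i], "\\b")
      out <- gsub(pattern, if (remove_words) "" else full[i], out, perl = TRUE)
    }
    if (remove_words) {
      # Also drop the full words, then tidy the leftover spaces
      for (w in full) out <- gsub(paste0("\\b", w, "\\b"), "", out, perl = TRUE)
      out <- trimws(gsub(" +", " ", out))
    }
  }
  out
}

cw <- data.frame(abbr = c("co", "intl"), full = c("company", "international"),
                 stringsAsFactors = FALSE)
sketch_optional_clean("acme intl co", common_words = cw)
# expected: "acme international company"
sketch_optional_clean("acme intl co", common_words = cw, remove_words = TRUE)
# expected: "acme"
```

The replacing mode makes abbreviated and spelled-out names agree. The removing mode strips generic terms such as "company" entirely, so matching relies on the more distinctive words that remain.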

Value

cleaned strings


fedmatch documentation built on Nov. 23, 2021, 1:07 a.m.