preprocText | R Documentation |
Preprocess text data such as names and addresses.
preprocText(text, convert_text, tolower, soundex,
usps_address, remove_whitespace, remove_punctuation, convert_text_to)
text | A vector of text data to convert. |
convert_text | Whether to convert text to the desired encoding, where the encoding is specified in the 'convert_text_to' argument. Default is TRUE. |
tolower | Whether to normalize the text to be all lowercase. Default is TRUE. |
soundex | Whether to convert the field to the Census's soundex encoding. Default is FALSE. |
usps_address | Whether to use USPS address standardization rules to clean address fields. Default is FALSE. |
remove_whitespace | Whether to remove leading and trailing whitespace, and to convert multiple spaces to a single space. Default is TRUE. |
remove_punctuation | Whether to remove punctuation from a string. Default is TRUE. |
convert_text_to | Which encoding to use when converting text. Default is 'Latin-ASCII'. Full list of encodings in the |
preprocText() returns the preprocessed vector of text.
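To make the default cleaning steps concrete, here is a minimal base-R sketch of what lowercasing, punctuation removal, and whitespace cleanup do to a character vector. The helper name `sketch_preproc` is hypothetical and is not part of the package. The order of the steps is an assumption, and the encoding conversion, soundex, and USPS options are left out, so this only approximates the documented behavior.

```r
# Hypothetical helper, not the package's implementation: a base-R
# approximation of three of the default cleaning steps described above.
sketch_preproc <- function(text) {
  out <- tolower(text)                  # tolower = TRUE
  out <- gsub("[[:punct:]]", "", out)   # remove_punctuation = TRUE
  out <- gsub("\\s+", " ", out)         # collapse runs of whitespace to one space
  trimws(out)                           # strip leading and trailing whitespace
}

# Made-up sample values for illustration only.
raw <- c("  John   O'Brien, Jr. ", "123  Main St., Apt. #4")
cleaned <- sketch_preproc(raw)
print(cleaned)
```

Output from preprocText() itself may differ from this sketch, for example when convert_text changes accented characters or when soundex or usps_address is switched on.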
Ben Fifield <benfifield@gmail.com>