preprocText: preprocText

View source: R/preprocText.R

preprocText    R Documentation

preprocText

Description

Preprocess text data such as names and addresses.

Usage

preprocText(text, convert_text, tolower, soundex,
            usps_address, remove_whitespace, remove_punctuation, convert_text_to)

Arguments

text

A vector of text data to convert.

convert_text

Whether to convert text to the encoding specified in the 'convert_text_to' argument. Default is TRUE.

tolower

Whether to normalize the text to be all lowercase. Default is TRUE.

soundex

Whether to convert the field to the Census's Soundex encoding. Default is FALSE.

usps_address

Whether to use USPS address standardization rules to clean address fields. Default is FALSE.

remove_whitespace

Whether to remove leading and trailing whitespace, and to convert multiple spaces to a single space. Default is TRUE.

remove_punctuation

Whether to remove punctuation from a string. Default is TRUE.

convert_text_to

Which encoding to use when converting text. Default is 'Latin-ASCII'. The full list of available encodings is returned by the stri_trans_list() function in the stringi package.
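
To make the lowercase, punctuation, and whitespace options more concrete, here is a minimal base-R sketch of the behaviour described above. The helper clean_sketch() is hypothetical and is not part of fastLink; it only approximates what the 'tolower', 'remove_punctuation', and 'remove_whitespace' arguments are documented to do, and the package's own implementation may differ in detail.

```r
# Hypothetical helper (not part of fastLink): approximates three of the
# documented cleaning steps using base R only.
clean_sketch <- function(x) {
  x <- tolower(x)                     # 'tolower': normalize to lowercase
  x <- gsub("[[:punct:]]", "", x)     # 'remove_punctuation': drop punctuation
  x <- gsub("[[:space:]]+", " ", x)   # 'remove_whitespace': collapse runs of spaces
  trimws(x)                           # 'remove_whitespace': trim both ends
}

sketch_out <- clean_sketch(c("  O'Brien,   Mary ", "123  Main St."))
print(sketch_out)
# [1] "obrien mary" "123 main st"
```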

Value

preprocText() returns the preprocessed vector of text.

Author(s)

Ben Fifield <benfifield@gmail.com>
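
Examples

The following is a short usage sketch rather than an official example. It assumes the fastLink package is installed and that preprocText() accepts the arguments listed under Usage by name; the input vector 'raw' is made up for illustration. Every argument is passed explicitly, using the default values stated under Arguments.

```r
# Illustrative input: names with accents, mixed case, punctuation, and
# irregular spacing (made-up data).
raw <- c("  Jos\u00e9  O'Neil ", "SMITH,  Anne", "de la  Cruz-Vega")

# Only run when fastLink is available, so the sketch does not fail otherwise.
if (requireNamespace("fastLink", quietly = TRUE)) {
  cleaned <- fastLink::preprocText(
    text               = raw,
    convert_text       = TRUE,           # transliterate using 'convert_text_to'
    tolower            = TRUE,
    soundex            = FALSE,
    usps_address       = FALSE,
    remove_whitespace  = TRUE,
    remove_punctuation = TRUE,
    convert_text_to    = "Latin-ASCII"
  )
  print(cleaned)
}
```

For address fields, the same call can be made with usps_address = TRUE, and soundex = TRUE can be set when a phonetic encoding of names is wanted.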


fastLink documentation built on Nov. 17, 2023, 9:06 a.m.