string_clean | R Documentation |
string_clean is designed to clean and preprocess strings and factors within a
data.frame or data.table after importing from SQL, text files, CSVs, etc. It
encodes text to UTF-8, trims whitespace and replaces repeated whitespace,
converts blank strings to true NA values, and optionally converts strings to
factors. The function preserves the original column order and leaves numeric
and logical columns unchanged.
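The steps described above can be sketched for a single character vector. This is only an illustration under stated assumptions, not the function's actual implementation: `clean_chr` is a hypothetical helper, and the use of `enc2utf8`, the collapse of whitespace runs to a single space, and the treatment of whitespace-only strings as blank are all assumptions.

```r
# Hypothetical helper: NOT the package's code, just a sketch of the described steps.
clean_chr <- function(x) {
  x <- enc2utf8(x)                          # assumption: convert text to UTF-8
  x <- gsub("\\s+", " ", x)                 # assumption: collapse whitespace runs to one space
  x <- trimws(x)                            # trim leading and trailing whitespace
  x[!is.na(x) & x == ""] <- NA_character_   # blank strings become true NA
  x
}

cleaned <- clean_chr(c(" King  County ", "   ", "Pierce County"))
print(cleaned)
```

A full version would apply such a helper to every character column and to the levels of every factor column, while skipping numeric and logical columns.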
string_clean(dat = NULL, stringsAsFactors = FALSE)
dat |
name of data.frame or data.table |
stringsAsFactors |
logical. Specifies whether to convert strings to factors (TRUE) or not (FALSE). Note that columns that were originally factors will always be returned as factors. |
Depending on the size of the data.frame/data.table, the cleaning process can take a long time.
The string_clean function modifies objects in place because it uses
data.table's by-reference assignment (e.g., :=). In other words, there is no
need to assign the output. Simply call string_clean(myTable).
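By-reference assignment can be demonstrated with a small stand-alone sketch. The function `trim_county` below is hypothetical and exists only to show the mechanism; it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical function that edits its argument in place via :=
trim_county <- function(dt) {
  dt[, county := trimws(county)]  # := updates the column without copying dt
  invisible(dt)
}

dt <- data.table(county = c(" King County ", "Pierce County"))
trim_county(dt)   # result is not assigned, yet dt itself has changed
print(dt$county)
```

Because the caller's table is altered directly, take a copy first with data.table::copy() if the original values need to be kept.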
data.table
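# Example table with an integer column and untrimmed county names
myTable <- data.table::data.table(
  intcol = as.integer(c(1, 2, 3)),
  county = c(' King County ', 'Pierce County', ' Snohomish county '))
# Add a factor copy of the county column
myTable[, county_factor := factor(county)]
# Clean in place; the result does not need to be assigned
string_clean(myTable, stringsAsFactors = TRUE)
print(myTable)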