wash_df: clean up a data frame by parsing/updating column classes.

View source: R/miscellaneous.R

wash_df    R Documentation

Clean up a data frame by parsing/updating column classes.

Description

Clean up a data frame by parsing/updating column classes, converting column names to a common case for easier use, and removing empty rows & columns. Optionally, the row names can also be moved into a column, or a column moved into the row names. A convenience wrapper for some helpful routine cleaning functions from the janitor, readr, & tibble packages.

Usage

wash_df(
  data,
  clean_names = TRUE,
  case = "snake",
  remove_empty = TRUE,
  remove_which = c("rows", "cols"),
  parse = TRUE,
  guess_integer = FALSE,
  na = c("", "NA"),
  rownames_to_column = FALSE,
  col_name = "rowname",
  column_to_rownames = FALSE,
  names_col = "rowname"
)

Arguments

data

A messy data frame that may contain inappropriate column classes, inconsistently formatted column names, and/or empty rows/columns.

clean_names

If TRUE (the default), applies janitor::clean_names() to reformat column names according to the specified case.

case

The case/format you want column names converted to if clean_names = TRUE. Default is "snake" (snake_case); see janitor::clean_names() for the other supported cases.
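For intuition about what the default case = "snake" does to names like "Extra Column", here is a rough base-R sketch. The helper to_snake() is hypothetical and much simpler than janitor::clean_names(), which handles many more edge cases; treat it as an illustration only, not as the package's implementation.

```r
# Hypothetical helper (not part of elucidate or janitor): a simplified
# snake_case converter, for illustration only.
to_snake <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase boundaries
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)                # runs of other characters -> "_"
  x <- gsub("^_+|_+$", "", x)                    # trim leading/trailing "_"
  make.unique(x, sep = "_")                      # de-duplicate repeated names
}

to_snake(c("Extra Column", "mpg", "carName", "Total (USD)"))
# returns c("extra_column", "mpg", "car_name", "total_usd")
```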

remove_empty

If TRUE (the default), applies janitor::remove_empty() to remove empty rows and/or columns (those in which every value is missing), as specified by remove_which.

remove_which

Either "rows" to remove empty rows, "cols" to remove empty columns, or c("rows", "cols") to remove both (the default).
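The effect of remove_which can be illustrated with a short base-R sketch. The helper drop_empty() below is hypothetical and is not janitor's implementation; it simply drops rows and/or columns in which every value is NA.

```r
# Hypothetical helper (not janitor::remove_empty): drop rows and/or columns
# in which every value is NA.
drop_empty <- function(df, which = c("rows", "cols")) {
  if ("rows" %in% which) {
    df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]  # keep rows with at least one value
  }
  if ("cols" %in% which) {
    df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]  # keep columns with at least one value
  }
  df
}

messy <- data.frame(a = c(1, NA, 3), b = NA, c = c("x", NA, "z"))
drop_empty(messy)                  # drops row 2 and column b
drop_empty(messy, which = "cols")  # drops column b only
```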

parse

If TRUE (the default), applies readr::parse_guess() to each column in data to guess the appropriate column class and update it accordingly.

guess_integer

Only used when parse = TRUE. If TRUE, variables containing only whole numbers are classified as integer; if FALSE (the default), they are classified as the more general double/numeric class.

na

A character vector of values that should be read as missing/NA when parse = TRUE. Default is c("", "NA").
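The interplay of parse, guess_integer, and na can be approximated in base R. The helper guess_column() below is hypothetical: it substitutes base R's type.convert() for readr::parse_guess(), so results will differ in some cases (for example, readr can also detect dates and times, which type.convert() leaves as character).

```r
# Hypothetical helper: a base-R stand-in for readr::parse_guess(),
# shown for illustration only.
guess_column <- function(x, guess_integer = FALSE, na = c("", "NA")) {
  # Treat every string in `na` as missing, then guess logical/integer/double/character
  out <- type.convert(as.character(x), na.strings = na, as.is = TRUE)
  # type.convert() returns integers for whole numbers; fall back to double
  # unless guess_integer = TRUE, mirroring the documented default
  if (is.integer(out) && !guess_integer) out <- as.double(out)
  out
}

raw <- data.frame(id = c("1", "2", "3"),
                  score = c("2.5", "", "4"),
                  flag = c("TRUE", "FALSE", "NA"))
parsed <- as.data.frame(lapply(raw, guess_column))
str(parsed)  # id and score become numeric (double), flag becomes logical
```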

rownames_to_column

If TRUE, applies tibble::rownames_to_column() to add the row names of data as a column. This is often helpful when cleaning up a data frame or tibble that used to be a matrix with row names. Default is FALSE.

col_name

If rownames_to_column = TRUE, this specifies the name of the new column in which to store the row names. Default is "rowname".

column_to_rownames

If TRUE, applies tibble::column_to_rownames() to use the values of a column as the row names of the data object. Default is FALSE.

names_col

If column_to_rownames = TRUE, this specifies the column containing the names you want to assign to the rows of the data object. Default is "rowname".
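rownames_to_column and column_to_rownames are inverse operations. The sketch below mimics them in base R with two hypothetical helpers, rownames_to_col() and col_to_rownames() (not the tibble functions), and shows a round trip on the first rows of mtcars.

```r
# Hypothetical base-R stand-ins for tibble::rownames_to_column() and
# tibble::column_to_rownames(); for illustration only.
rownames_to_col <- function(df, var = "rowname") {
  rn <- rownames(df)
  rownames(df) <- NULL                     # reset to default 1..n row names
  out <- data.frame(rn, df, check.names = FALSE)
  names(out)[1] <- var                     # first column now holds the old row names
  out
}

col_to_rownames <- function(df, var = "rowname") {
  rownames(df) <- df[[var]]                # values must be unique and non-missing
  df[[var]] <- NULL                        # drop the column now stored as row names
  df
}

cars <- head(mtcars, 3)
with_col <- rownames_to_col(cars, var = "car")
with_col$car  # the car names, now an ordinary character column
back <- col_to_rownames(with_col, var = "car")
```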

Value

An updated version of the input data, modified according to the chosen options.

Author(s)

Craig P. Hutton, Craig.Hutton@gov.bc.ca

See Also

remove_empty

Examples

data(mtcars)

mtcars$`Extra Column` <- rep(NA, length.out = nrow(mtcars)) #add an empty column

mtcars[33:50,] <- NA #add some missing rows

mtcars #now mtcars is messy & more like a real raw data set

#clean it up and convert the row names to a column
mtcars <- wash_df(mtcars, rownames_to_column = TRUE, col_name = "car")

mtcars #the empty rows and column are gone, huzzah! So is that awkward column name!

#or turn a column of names (here "car") back into row names
mtcars <- wash_df(mtcars, column_to_rownames = TRUE, names_col = "car")
mtcars


bcgov/elucidate documentation built on Sept. 3, 2022, 7:16 p.m.