View source: R/format_data.R

format_data    R Documentation

Format data frames and simple features using common approaches

Description

This function can apply several common data cleaning tasks to a data frame or simple feature object. The individual tasks are listed under Details.

Usage

format_data(
  x,
  var_names = NULL,
  clean_names = TRUE,
  replace_na_with = NULL,
  replace_with_na = NULL,
  replace_empty_char_with_na = TRUE,
  fix_date = TRUE,
  sf_col = NULL
)

rename_with_xwalk(x, xwalk = NULL)

fix_date(x)

relocate_sf_col(x, .after = dplyr::everything())

rename_sf_col(x, sf_col = "geometry")

bind_address_col(x, city = NULL, county = NULL, state = NULL)

bind_block_col(
  x,
  bldg_num = "bldg_num",
  street_dir_prefix = "street_dir_prefix",
  street_name = "street_name",
  street_suffix = "street_type"
)

bind_boundary_col(x, boundary = NULL, join = NULL, ...)

bind_units_col(x, y, units = NULL, drop = FALSE, keep_all = TRUE, .id = NULL)

Arguments

x

A tibble or data frame object

var_names

A named list following the format, list("New var name" = old_var_name), or a two column data frame with the first column being the new variable names and the second column being the old variable names; defaults to NULL.

clean_names

If TRUE, pass data frame to janitor::clean_names; defaults to TRUE.

replace_na_with

A named list to pass to tidyr::replace_na; defaults to NULL.

replace_with_na

A named list to pass to naniar::replace_with_na; defaults to NULL.

replace_empty_char_with_na

If TRUE, replace "" with NA using naniar::replace_with_na_if; defaults to TRUE.

fix_date

If TRUE, fix UNIX dates (a common issue with dates from FeatureServer and MapServer sources); defaults to TRUE.

sf_col

Name to use for the sf column after renaming; defaults to NULL for format_data and to "geometry" for rename_sf_col.

xwalk

A data frame with two columns, using the first column as name and the second column as value, or a named list. The existing names of x must be the values and the new names must be the names.

.after

The column after which to place the sf column; defaults to dplyr::everything().

city, county, state

City, county, and state to bind to data frame or sf object.

boundary

An sf object with a column named "name" or a list of sf objects where all items in the list have a "name" column.

join

Geometry predicate function; defaults to NULL. If NULL, join is set to sf::st_intersects if key_list contains only POLYGON or MULTIPOLYGON objects, or to sf::st_nearest_feature if key_list contains other types.

y

Vector of numeric or units values to bind to x.

units

Units to use for y (if numeric) or convert to (if y is units class); defaults to NULL.

drop

If TRUE, apply the units::drop_units function to the column with units class values and return numeric values instead; defaults to FALSE.

keep_all

If TRUE, keep all columns. If FALSE, return only the named .id column.

.id

Name to use for vector of units provided to "y" parameter, when "y" is bound to the "x" data frame or tibble as a new column.
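
The var_names and xwalk arguments both map new names onto existing ones. The following is a minimal base R sketch of that mapping, assuming a named list whose names are the new column names and whose values are the existing ones. rename_from_list is a hypothetical helper written for illustration; it is not the package's implementation of rename_with_xwalk.

```r
# Hypothetical helper (not part of overedge): rename columns of `x` using a
# named list where names are the new names and values are the existing names.
rename_from_list <- function(x, xwalk) {
  old <- unlist(xwalk, use.names = FALSE)
  new <- names(xwalk)
  idx <- match(old, names(x))   # position of each existing name in x
  found <- !is.na(idx)          # skip entries that do not match a column
  names(x)[idx[found]] <- new[found]
  x
}

# Illustrative data; the column names are assumptions for this example.
df <- data.frame(bldg_num = c(100, 215), street_name = c("Main", "Oak"))

renamed <- rename_from_list(
  df,
  list("Building number" = "bldg_num", "Street" = "street_name")
)
names(renamed)
```

Entries whose value does not match a column in x are skipped silently in this sketch; the package function may handle that case differently.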

Details

  • Applies stringr::str_squish and stringr::str_trim to all character columns (str_trim_squish)

  • Optionally replaces all character values of "" with NA values

  • Optionally corrects UNIX formatted dates with 1970-01-01 origins

  • Optionally renames variables by passing a named list of variables
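
The cleaning steps listed above can be approximated in base R. This is a sketch under stated assumptions rather than the package's code: clean_sketch and its date_cols argument are hypothetical, base R functions stand in for the stringr and naniar calls, and the date values are assumed to be milliseconds since 1970-01-01 UTC.

```r
# Hypothetical helper (not part of overedge) approximating the cleaning steps.
clean_sketch <- function(x, date_cols = character()) {
  is_chr <- vapply(x, is.character, logical(1))
  x[is_chr] <- lapply(x[is_chr], function(col) {
    # Trim both ends, then collapse runs of internal whitespace to one space
    col <- gsub("\\s+", " ", trimws(col))
    # Replace empty strings with NA
    col[!is.na(col) & col == ""] <- NA_character_
    col
  })
  # Assumption: each date column holds milliseconds since 1970-01-01 UTC
  for (nm in date_cols) {
    x[[nm]] <- as.POSIXct(x[[nm]] / 1000, origin = "1970-01-01", tz = "UTC")
  }
  x
}

# Illustrative input; the column names are assumptions for this example.
raw <- data.frame(
  name = c("  Main   St ", "", "Oak Ave"),
  updated = c(0, 86400000, 1660419660000),
  stringsAsFactors = FALSE
)

cleaned <- clean_sketch(raw, date_cols = "updated")
str(cleaned)
```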

Bind columns:

  • bind_address_col binds a provided value for city, county, and state to a data frame (to supplement address data with consistent values for these variables).

  • bind_block_col requires a data frame with columns named "bldg_num", "street_dir_prefix", "street_name", and "street_type". It binds derived values indicating whether a building is on the even or odd side of a block, and creates a block number (street segment) identifier and a block face (street segment side) identifier.

  • bind_boundary_col uses sf::st_join to assign simple feature data to an enclosing polygon.
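
To illustrate the kind of values bind_block_col derives, here is a base R sketch. The derivation rules are assumptions, not taken from the package: the block number is assumed to be the building number rounded down to the nearest hundred, and the side is assumed to follow the parity of the building number. block_sketch and the output column names (bldg_side, block_num, block_segment, block_face) are hypothetical.

```r
# Hypothetical helper (not part of overedge); derivation rules are assumptions.
block_sketch <- function(x) {
  num <- as.numeric(x$bldg_num)
  # Assumed rule: even building numbers are on the "even" side of the block
  x$bldg_side <- ifelse(num %% 2 == 0, "even", "odd")
  # Assumed rule: block number is the building number floored to the hundred
  x$block_num <- floor(num / 100) * 100
  street <- trimws(paste(x$street_dir_prefix, x$street_name, x$street_type))
  # Street segment identifier: block number plus full street name
  x$block_segment <- paste(x$block_num, street)
  # Block face identifier: segment plus the side of the street
  x$block_face <- paste(x$bldg_side, x$block_segment)
  x
}

# Illustrative addresses using the column names the function expects.
addr <- data.frame(
  bldg_num = c(1204, 1317),
  street_dir_prefix = c("N", "N"),
  street_name = c("Charles", "Charles"),
  street_type = c("St", "St")
)

blocks <- block_sketch(addr)
blocks[, c("bldg_side", "block_num", "block_segment", "block_face")]
```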

Simple feature only functions:

If "sf_col" is not NULL, format_data calls rename_sf_col and relocate_sf_col:

  • rename_sf_col: Rename sf column.

  • relocate_sf_col: Relocate sf column after everything (default) or specified column.

bind_boundary_col is also only able to work with simple feature objects.

Value

The input data frame or simple feature object with formatting functions applied.


elipousson/overedge documentation built on Aug. 13, 2022, 7:41 p.m.