clean_vars: Variable Cleaning

View source: R/clean_vars.R

clean_vars R Documentation

Variable Cleaning

Description

This function takes a data frame and performs several cleaning steps. Character variables are stripped of unnecessary whitespace and non-ASCII characters, then lowercased. Numeric variables are stripped of non-numeric characters and converted to numeric. Date variables are parsed. Finally, if a first name variable is provided, gender is inferred from it; the inferred value is overwritten by self-reported prefixes and by the original gender variable, where these exist. The function title_gender_infer is a variation of this function.
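The character and numeric steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names clean_chr and clean_num are hypothetical, and the exact rules clean_vars applies (for example, which characters count as numeric) are assumptions.

```r
# Hypothetical helpers for illustration; not the package's actual code.
clean_chr <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")  # drop non-ASCII bytes
  x <- gsub("\\s+", " ", x)                               # collapse repeated whitespace
  tolower(trimws(x))                                      # trim both ends, then lowercase
}

clean_num <- function(x) {
  # Assumption: digits, the decimal point, and the minus sign are kept
  as.numeric(gsub("[^0-9.-]", "", x))
}

clean_chr("  John   SMITH ")            # "john smith"
clean_num(c("$1,200.50", "42 units"))   # 1200.5 42.0
```

Values that contain no digits at all become NA under this sketch, with a coercion warning from as.numeric.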

Usage

clean_vars(
  df,
  varnames = NULL,
  varnames_date = NULL,
  date_order = "mdy",
  varnames_num = NULL,
  firstname = NULL,
  gender = TRUE,
  prefix = NULL,
  prefix_male = "^mr$",
  prefix_female = "^ms$|^miss$|^mrs$",
  gender_original = NULL,
  gender_male = "^m$|^male$",
  gender_female = "^f$|^female$"
)

Arguments

df

Data frame to be cleaned.

varnames

All variables to be cleaned. Defaults to NULL.

varnames_date

Date variables. Defaults to NULL.

date_order

Order of the date components when the date variable is stored as a string. Defaults to "mdy".
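To illustrate what a component order means, the sketch below maps an order string to a base R date format. It assumes that "mdy" stands for month, day, year, which the page does not state explicitly; the helper order_to_format and the "/" separator are hypothetical and are not part of the package.

```r
# Hypothetical helper: map an order string such as "mdy" to a strptime format.
order_to_format <- function(date_order, sep = "/") {
  parts <- c(m = "%m", d = "%d", y = "%Y")
  paste(parts[strsplit(date_order, "")[[1]]], collapse = sep)
}

fmt <- order_to_format("mdy")   # "%m/%d/%Y"
as.Date("01/28/2023", format = fmt)
as.Date("28/01/2023", format = order_to_format("dmy"))
```

Both calls parse to the same date, January 28, 2023. Supplying the wrong order either fails to parse (NA) or silently swaps month and day, so the order should match how the raw strings were recorded.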

varnames_num

Numeric variables. Defaults to NULL.

firstname

Variable containing first names. Defaults to NULL.

gender

Whether to create an inferred gender variable. Defaults to TRUE.

prefix

Variable containing self-reported personal prefixes. Defaults to NULL.

prefix_male

Regular expression that indicates male in the prefix variable. Defaults to "^mr$". The input is lowercased before comparison.

prefix_female

Regular expression that indicates female in the prefix variable. Defaults to "^ms$|^miss$|^mrs$". The input is lowercased before comparison.
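The default prefix patterns can be tried out directly with grepl. This is only a sketch of how such patterns behave on lowercased input; the sample vector is made up, and the function's internal matching code is not shown on this page.

```r
prefix <- c("Mr", "MRS", "Miss", "Dr", "ms")
prefix_lower <- tolower(prefix)  # lowercase first, as the argument description states

is_male   <- grepl("^mr$", prefix_lower)
is_female <- grepl("^ms$|^miss$|^mrs$", prefix_lower)

# "Dr" matches neither pattern, so it carries no gender information here
data.frame(prefix, is_male, is_female)
```

Because the patterns are anchored with ^ and $, a value such as "mr." (with a trailing period) does not match unless the punctuation is removed first or the pattern is widened.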

gender_original

Variable containing original gender entry. Defaults to NULL.

gender_male

Regular expression that indicates male in the original gender entry. Defaults to "^m$|^male$". The input is lowercased before comparison.

gender_female

Regular expression that indicates female in the original gender entry. Defaults to "^f$|^female$". The input is lowercased before comparison.
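The override logic from the Description can be sketched as follows. The column names and data are hypothetical, and the relative order of the two overrides (prefix first, then original gender entry) is an assumption; the page only says that both overwrite the inferred value.

```r
# Hypothetical columns; the package's real variable names may differ.
df <- data.frame(
  gender_inferred = c("female", "male", NA, "male"),  # e.g. inferred from first names
  prefix          = c("mr", NA, "ms", NA),
  gender_original = c(NA, "F", NA, "Male"),
  stringsAsFactors = FALSE
)

gender <- df$gender_inferred

# Self-reported prefixes overwrite the inferred value
pre <- tolower(df$prefix)
gender[grepl("^mr$", pre)] <- "male"
gender[grepl("^ms$|^miss$|^mrs$", pre)] <- "female"

# Assumption: the original gender entry is applied last
orig <- tolower(df$gender_original)
gender[grepl("^m$|^male$", orig)] <- "male"
gender[grepl("^f$|^female$", orig)] <- "female"

gender   # "male" "female" "female" "male"
```

Missing values are left alone in this sketch because grepl returns FALSE for NA input, so a row keeps its inferred value when neither override matches.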


sysilviakim/Kmisc documentation built on Jan. 28, 2023, 10:58 a.m.