README.md

sternclean seeks to simplify cleaning data frames.

Multiple cleaning steps are accomplished in just one function.

For example, in a few lines of clean code you can change column types, impute NAs in one set of columns with a set value, impute NAs in another set of columns with a group mean, and replace infinite values in a third set of columns with another set value.

Here is the order of operations under the hood:

This ordering allows multiple cleaning processes to happen in a single function call.

Simple Examples

We will start with simple one-step cleaning examples. Later we will take on more complex situations.

Rickle and Mortan Dataset

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | NA |
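To follow along, the dataset can be recreated in base R. This is only a sketch: `people` as a factor and `intelligence` as character are inferred from the `class()` output in the examples, while `original_person` as a factor and `evil_rank` as numeric are assumptions.

```r
# Recreate the example data frame shown above.
# Column types are partly assumed; see the note before this block.
rickle_and_mortan <- data.frame(
  people           = factor(c("Rickle", "Mortan", "Jerry", "Pickle Rickle")),
  original_person  = factor(c("Rickle", "Mortan", "Jerry", "Rickle")),
  intelligence     = c("Inf", "9", "0.1", "Inf"),  # stored as text
  evil_rank        = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# Inspect the column classes before cleaning.
str(rickle_and_mortan)
```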

Class Change Parameters

class(rickle_and_mortan$people)
#> [1] "factor"

sternclean("rickle_and_mortan",
           class_to_strng = "people")

class(rickle_and_mortan$people)
#> [1] "character"
class(rickle_and_mortan$intelligence)
#> [1] "character"

sternclean("rickle_and_mortan",
           class_to_numer = "intelligence")

class(rickle_and_mortan$intelligence)
#> [1] "numeric"
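For comparison, the same class changes can be written by hand in base R with `as.character()` and `as.numeric()`. The data frame `df` below is a hypothetical stand-in, and the code is not meant to show how sternclean works internally.

```r
# Small stand-in data frame (hypothetical, two rows of the example data).
df <- data.frame(
  people           = factor(c("Rickle", "Mortan")),
  intelligence     = c("Inf", "9"),
  stringsAsFactors = FALSE
)

# Factor -> character: as.character() returns the factor labels.
df$people <- as.character(df$people)

# Character -> numeric: the string "Inf" is parsed as the numeric value Inf.
df$intelligence <- as.numeric(df$intelligence)

class(df$people)
#> [1] "character"
class(df$intelligence)
#> [1] "numeric"
```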

Column/Row Removal Parameters

sternclean("rickle_and_mortan",
           remove_columns = "intelligence")

| people | original_person | evil_rank |
|---|---|---|
| Rickle | Rickle | 5 |
| Mortan | Mortan | 2.75 |
| Jerry | Jerry | 2 |
| Pickle Rickle | Rickle | NA |

sternclean("rickle_and_mortan",
           remove_na_rows = "evil_rank")

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |

sternclean("rickle_and_mortan",
           removeby_regex = "pe")

| intelligence | evil_rank |
|---|---|
| Inf | 5 |
| 9 | 2.75 |
| 0.1 | 2 |
| Inf | NA |

sternclean("rickle_and_mortan",
           remove_all_nas = TRUE)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |

sternclean("rickle_and_mortan",
           remove_non_num = TRUE)

| intelligence | evil_rank |
|---|---|
| Inf | 5 |
| 9 | 2.75 |
| 0.1 | 2 |
| Inf | NA |

sternclean("rickle_and_mortan",
           remove_all_exc = c("people", "evil_rank"))

| people | evil_rank |
|---|---|
| Rickle | 5 |
| Mortan | 2.75 |
| Jerry | 2 |
| Pickle Rickle | NA |
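As a point of reference, three of the removals above can be reproduced with base R subsetting. This is a sketch on a stand-in data frame (`df` is a hypothetical name, with `intelligence` already numeric); it mirrors the results shown, not sternclean's own implementation.

```r
# Stand-in copy of the example data, with intelligence already numeric.
df <- data.frame(
  people           = c("Rickle", "Mortan", "Jerry", "Pickle Rickle"),
  original_person  = c("Rickle", "Mortan", "Jerry", "Rickle"),
  intelligence     = c(Inf, 9, 0.1, Inf),
  evil_rank        = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# Drop columns whose name matches a regex (compare removeby_regex = "pe"):
# "people" and "original_person" both contain "pe".
no_pe <- df[, !grepl("pe", names(df)), drop = FALSE]

# Keep only rows without any NA (compare remove_all_nas = TRUE).
# Inf is not NA, so only the last row is dropped.
no_na <- df[complete.cases(df), ]

# Keep only numeric columns (compare remove_non_num = TRUE).
only_num <- df[, sapply(df, is.numeric), drop = FALSE]
```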

Impute Parameters

sternclean("rickle_and_mortan",
           impute_na2mean = "evil_rank")

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 3.25 |

sternclean("rickle_and_mortan",
           impute_na_cols = "evil_rank",
           impute_na_with = 1738)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 1738 |

sternclean("rickle_and_mortan",
           impute_grpmean = "evil_rank",
           impute_grpwith = "original_person")

| original_person | people | intelligence | evil_rank |
|---|---|---|---|
| Jerry | Jerry | 0.1 | 2 |
| Mortan | Mortan | 9 | 2.75 |
| Rickle | Rickle | Inf | 5 |
| Rickle | Pickle Rickle | Inf | 5 |

sternclean("rickle_and_mortan",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | 1738 | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | 1738 | NA |

sternclean("rickle_and_mortan",
           impute_cust_cl = "evil_rank",
           impute_cust_fn = quantile,
           probs = .25,
           na.rm = TRUE
           )

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 2.375 |
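The imputed values above can be checked by hand in base R. The sketch below works on plain vectors copied from the example dataset; the variable names are hypothetical, and the code illustrates the arithmetic rather than sternclean's internals.

```r
# Stand-in vectors copied from the example dataset.
evil_rank       <- c(5, 2.75, 2, NA)
original_person <- c("Rickle", "Mortan", "Jerry", "Rickle")
intelligence    <- c(Inf, 9, 0.1, Inf)

# Column mean (compare impute_na2mean): the mean of 5, 2.75 and 2 is 3.25.
by_mean <- ifelse(is.na(evil_rank), mean(evil_rank, na.rm = TRUE), evil_rank)

# Group mean (compare impute_grpmean / impute_grpwith): the only other
# "Rickle" row has evil_rank 5, so the NA becomes 5.
grp_mean <- ave(evil_rank, original_person,
                FUN = function(v) mean(v, na.rm = TRUE))
by_group <- ifelse(is.na(evil_rank), grp_mean, evil_rank)

# Custom function (compare impute_cust_fn = quantile, probs = .25):
# the first quartile of 2, 2.75 and 5 is 2.375 under R's default method.
q25 <- unname(quantile(evil_rank, probs = 0.25, na.rm = TRUE))
by_quantile <- ifelse(is.na(evil_rank), q25, evil_rank)

# Infinite values (compare impute_inf_col / impute_inf_wit).
no_inf <- replace(intelligence, is.infinite(intelligence), 1738)
```

Note that `mean()` and `quantile()` need `na.rm = TRUE` here; without it, `mean()` returns `NA` and `quantile()` raises an error when the input contains missing values.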

More Complex Example

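Here we convert people to character and intelligence to numeric, remove original_person, impute the NA in evil_rank with the column mean, and replace the infinite values in intelligence with 1738, all in one call: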

sternclean("rickle_and_mortan",
           class_to_strng = "people",
           class_to_numer = "intelligence",
           remove_columns = "original_person",
           impute_na2mean = "evil_rank",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738
           )

| people | intelligence | evil_rank |
|---|---|---|
| Rickle | 1738 | 5 |
| Mortan | 9 | 2.75 |
| Jerry | 0.1 | 2 |
| Pickle Rickle | 1738 | 3.25 |

Compared to Original Data Frame

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | NA |

basketballbeane/sternclean documentation built on Sept. 10, 2021, 7:50 a.m.