preprocess_data: Pre-process cognostic data


View source: R/preprocess_data.R

Description

Generates a tibble with features optimized for machine learning

Usage

preprocess_data(x, target = "Truth", reduce_cols = FALSE,
  factor_y = TRUE, impute = "zero", corr_cutoff = 0.9,
  freq_cut = 95/5, unique_cut = 10, k = 10)

Arguments

x

data frame or tibble.

target

classifier column

reduce_cols

TRUE = columns are reduced based on near-zero variance and correlation; FALSE = no columns are removed

factor_y

FALSE = recodes the target column to 0 and 1; TRUE = recodes the target column to a factor

impute

Impute NA by "knn", "mean", or "zero" (knn takes substantially longer to compute; zero replaces NA with 0)

corr_cutoff

Correlation coefficient level at which to cut off highly correlated columns; defaults to 0.90

freq_cut

the cutoff for the ratio of the most common value to the second most common value

unique_cut

the cutoff for the percentage of distinct values out of the number of total samples

k

the number of nearest neighbours to use for imputation (defaults to 10)
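
The two quantities that freq_cut and unique_cut are compared against can be computed in base R. The sketch below is an illustration, not the package's implementation. The helper names freq_ratio and percent_unique and the example column are hypothetical. The rule for combining the two cutoffs is an assumption borrowed from caret's nearZeroVar convention, which uses the same default values.

```r
# Frequency ratio: count of the most common value divided by the
# count of the second most common value.
freq_ratio <- function(v) {
  counts <- sort(table(v), decreasing = TRUE)
  if (length(counts) < 2) return(Inf)  # a constant column has no second value
  as.numeric(counts[1] / counts[2])
}

# Percentage of distinct values out of the number of samples.
percent_unique <- function(v) {
  100 * length(unique(v)) / length(v)
}

# Hypothetical column: 98 zeros and 2 ones.
v <- c(rep(0, 98), rep(1, 2))

freq_ratio(v)      # 98 / 2 = 49
percent_unique(v)  # 100 * 2 / 100 = 2

# Assumed rule (caret's nearZeroVar convention): flag the column
# when both conditions hold.
freq_cut   <- 95 / 5
unique_cut <- 10
is_nzv <- freq_ratio(v) > freq_cut && percent_unique(v) <= unique_cut
is_nzv  # TRUE
```

With the defaults, this column would be flagged: its frequency ratio of 49 exceeds 95/5 = 19 and only 2 percent of its values are distinct.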

Details

This is the details section
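
As a rough picture of the simpler impute options, the base R sketch below fills NA values in a single hypothetical numeric column. The "zero" behaviour follows the argument description. The "mean" behaviour shown (replacing NA with the mean of the observed values in that column) is an assumption about what the option does, not taken from the package source.

```r
# Hypothetical numeric column with missing values.
v <- c(2, NA, 4, NA, 6)

# impute = "zero": replace each NA with 0.
zero_imputed <- ifelse(is.na(v), 0, v)

# impute = "mean" (assumed meaning): replace each NA with the mean
# of the observed values in the same column.
mean_imputed <- ifelse(is.na(v), mean(v, na.rm = TRUE), v)

zero_imputed  # 2 0 4 0 6
mean_imputed  # 2 4 4 4 6
```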

Value

This function returns a tibble of optimized features

Author(s)

"Dallin Webb <[email protected]>"

See Also

preProcess


BYUIDSS/BYUImachine documentation built on Dec. 11, 2018, 1:29 a.m.