getBasicCleanData: Does some basic cleaning of the dataset (imputation, zero...


Usage

getBasicCleanData(df, cleanFnx = c("impute", "zeroVar", "vif", "transform"),
  transType = c("center", "scale"), imputeType = "medianImpute",
  freqCutoff = 0.95, vifCutoff = 10, colsToKeep = NULL)

Arguments

df

data frame

cleanFnx

list of cleaning/preprocessing to do, defaults: c("impute", "zeroVar", "vif", "transform")

transType

c("center", "scale") (default); other caret::preProcess methods such as "range", "YeoJohnson", or "BoxCox" can also be used

imputeType

"medianImpute" (default), "knnImpute", may be others (see ?caret::preProcess)

freqCutoff

minimum

vifCutoff

variance inflation factor cutoff level (default = 10, which is a low threshold)

colsToKeep

vector of column names that should not be removed via VIF

Description

Does some basic cleaning of the dataset (imputation, zero variance, VIF for multicollinearity, or transformations).

Details

Imputation and transformations are done using caret::preProcess, VIF is done using car::vif, and zero inflation is done using a modified version of caret's zeroinf function.

The VIF function is slow. It is not clear how to speed it up other than by using a multithreaded version of the covariance functions used internally by car::vif(). MS/Revolution R Open might do this, but that has not been tried.

Examples

data(iris)
getBasicCleanData(iris)
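The text above names caret::preProcess and car::vif as the underlying tools. The following is a rough sketch of calling those two functions directly, to show roughly what the "impute", "transform", and "vif" steps rely on. It is not the package's implementation: the injected NA values, the lm() formula, and the variable names pp, cleaned, fit, vifs, and flagged are assumptions made for illustration only.

```r
library(caret)
library(car)

data(iris)
df <- iris
# Assumption for illustration: introduce a few missing values to impute
df[c(3, 50, 120), "Sepal.Length"] <- NA

# Imputation and transformation via caret::preProcess, using the default
# imputeType ("medianImpute") and transType (c("center", "scale")).
# Non-numeric columns such as Species are left untouched.
pp <- preProcess(df, method = c("medianImpute", "center", "scale"))
cleaned <- predict(pp, df)

# Variance inflation factors via car::vif, which needs a fitted model.
# The choice of response and predictors here is hypothetical.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
          data = cleaned)
vifs <- vif(fit)

# Predictors whose VIF exceeds the default vifCutoff of 10
flagged <- names(vifs)[vifs > 10]
```

Columns listed in colsToKeep would presumably be excluded from a vector like flagged before anything is dropped, but how getBasicCleanData does this internally is not shown in this page.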


wtcooper/modUtils documentation built on May 4, 2019, 11:59 a.m.