getBasicCleanData(df, cleanFnx = c("impute", "zeroVar", "vif", "transform"),
transType = c("center", "scale"), imputeType = "medianImpute",
freqCutoff = 0.95, vifCutoff = 10, colsToKeep = NULL)
|
df |
data frame |
cleanFnx |
list of cleaning/preprocessing to do, defaults: c("impute", "zeroVar", "vif", "transform") |
transType |
c("center","scale") (default), also can do 'Range', 'YeoJohnson', 'BoxCox', others |
imputeType |
"medianImpute" (default), "knnImpute", may be others (see ?caret::preProcess) |
freqCutoff |
minimum ... (default = 0.95) |
vifCutoff |
variance inflation factor cutoff level (default = 10, which is a low threshold) |
colsToKeep |
vector of column names that should not be removed via VIF |
Does some basic cleaning of the dataset (imputation, zero variance removal, VIF filtering for multicollinearity, or transformations). Note: imputation and transformations are done using caret::preProcess, VIF is done using car::vif, and zero inflation is done using a modified version of caret's zeroinf function. The VIF step is slow, and it is not clear how to speed it up short of using a multithreaded version of the covariance functions used internally by car::vif(). MS/Revolution R Open might provide this, but that has not been tried.

data(iris)
getBasicCleanData(iris)
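To make the four cleaning steps described above concrete, here is a minimal base-R sketch of the same idea: median imputation, zero-variance removal, iterative VIF filtering, and center/scale. It is not the package's implementation. The name basic_clean_sketch and its vif_cutoff argument are hypothetical, it handles numeric columns only, it ignores colsToKeep, and it computes VIF directly as 1 / (1 - R^2) instead of calling car::vif or caret::preProcess.

```r
# Hypothetical helper for illustration only; not part of the documented package.
basic_clean_sketch <- function(df, vif_cutoff = 10) {
  # Work on numeric columns only (factors such as iris$Species are dropped).
  num <- df[vapply(df, is.numeric, logical(1))]

  # 1. Impute: replace missing values with the column median.
  num[] <- lapply(num, function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })

  # 2. Zero variance: drop columns that hold a single distinct value.
  keep <- vapply(num, function(x) length(unique(x)) > 1, logical(1))
  num <- num[keep]

  # 3. VIF: regress each column on the others; VIF = 1 / (1 - R^2).
  vif_one <- function(d, col) {
    others <- setdiff(names(d), col)
    fit <- lm(reformulate(others, response = col), data = d)
    1 / (1 - summary(fit)$r.squared)
  }
  # Drop the worst offender and recompute until all VIFs pass the cutoff.
  repeat {
    if (ncol(num) < 2) break
    vifs <- vapply(names(num), function(col) vif_one(num, col), numeric(1))
    if (max(vifs) <= vif_cutoff) break
    num <- num[names(num) != names(which.max(vifs))]
  }

  # 4. Transform: center each column to mean 0 and scale to sd 1.
  as.data.frame(scale(num))
}

data(iris)
cleaned <- basic_clean_sketch(iris)
```

Refitting one regression per column on every pass is what makes the VIF step expensive as the number of columns grows, which is consistent with the slowness noted above.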