prepXY | R Documentation |
Optional-but-useful function to: 1) provide a plausible ordering of the 'y' (fusion) variables and 2) identify the subset of 'x' (predictor) variables likely to be consequential during subsequent model training. Output can be passed directly to train(). Most useful for large datasets with many and/or highly correlated predictors. Employs an absolute Spearman rank correlation screen and then LASSO models (via glmnet) to return a plausible ordering of 'y' and the preferred subset of 'x' variables associated with each.
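To make the correlation screen concrete, here is a minimal base-R sketch of an absolute Spearman rank correlation filter. It is an illustration of the general idea only, not the internals of prepXY(): the toy data frame `d`, its column names, and the use of purely numeric variables are all assumptions made for the example, and the LASSO step is not shown.

```r
# Toy data; the column names are illustrative and not taken from the package
set.seed(1)
n <- 200
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
d$y <- 2 * d$x1 + rnorm(n)  # y depends on x1 only

# Absolute Spearman rank correlation between y and each candidate predictor
xvars <- c("x1", "x2", "x3")
rho <- sapply(xvars, function(v) abs(cor(d[[v]], d$y, method = "spearman")))

# Keep predictors whose absolute correlation is not below the threshold
# (0.05 mirrors the cor_thresh default shown in the usage)
cor_thresh <- 0.05
keep <- names(rho)[rho >= cor_thresh]
print(round(rho, 3))
print(keep)
```

With a threshold this low, an unrelated predictor can still pass by chance in a small sample, which is one reason a second, model-based screen is useful afterwards.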
prepXY(
data,
y,
x,
weight = NULL,
cor_thresh = 0.05,
lasso_thresh = 0.95,
xmax = 100,
xforce = NULL,
fraction = 1,
cores = 1
)
data |
Data frame. Training dataset. All categorical variables should be factors and ordered whenever possible. |
y |
Character or list. Variables in |
x |
Character. Predictor variables in |
weight |
Character. Name of the observation weights column in |
cor_thresh |
Numeric. Predictors that exhibit less than |
lasso_thresh |
Numeric. Controls how aggressively the LASSO step screens out predictors. Lower value is more aggressive. |
xmax |
Integer. Maximum number of predictors returned by LASSO step. Does not strictly control the number of final predictors returned (especially for categorical |
xforce |
Character. Subset of |
fraction |
Numeric. Fraction of observations in |
cores |
Integer. Number of cores used. Only applicable on Unix systems. |
List with named slots "y" and "x". Each slot is itself a list, and the two lists have the same length. "y" gives the preferred fusion order; "x" gives the preferred set of predictor variables for the corresponding element of "y".
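The shape of the return value can be sketched with a hand-built stand-in. The object below is hypothetical: the variable names are made up and no real prepXY() output is reproduced. It only shows how the two parallel lists line up, including a fusion "block" of two variables in the first position.

```r
# Hypothetical stand-in for a prepXY() result; all variable names are made up
prep <- list(
  y = list(c("y1", "y2"), "y3"),
  x = list(c("x1", "x4"), c("x2", "x3", "x4"))
)

# The slots are parallel: element i of x holds the predictors for element i of y
stopifnot(length(prep$y) == length(prep$x))

for (i in seq_along(prep$y)) {
  cat("Step", i, ":",
      paste(prep$y[[i]], collapse = ", "), "<-",
      paste(prep$x[[i]], collapse = ", "), "\n")
}
```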
y <- names(recs)[c(14:16, 20:22)]
x <- names(recs)[2:13]
# Fusion variable "blocks" are respected by prepXY()
y <- c(list(y[1:2]), y[-c(1:2)])
# Do the prep work...
prep <- prepXY(data = recs, y = y, x = x)
# The result can be passed to train()
train(data = recs, y = prep$y, x = prep$x)