View source: R/Selbal_Functions.R
selbal.cv | R Documentation |
Cross-validation process for the selection of the optimal number of variables and robustness evaluation
selbal.cv(x, y, n.fold = 5, n.iter = 10, seed = 31415,
covar = NULL, col = c("steelblue1", "tomato1"),
col2 = c("darkgreen", "steelblue4", "tan1"), logit.acc = "AUC",
maxV = 20, zero.rep = "bayes", opt.cri = "1se",
user_numVar = NULL)
x |
a |
y |
the response variable, either continuous or dichotomous. |
n.fold |
number of folds in which to divide the whole data set. |
n.iter |
number of iterations for the cross-validation process. |
seed |
a seed to make the results reproducible. |
covar |
|
col |
|
col2 |
|
logit.acc |
when |
maxV |
|
zero.rep |
a value defining the method to use for zero-replacement.
|
opt.cri |
parameter indicating the method to determine the optimal
number of variables. |
user_numVar |
parameter to override the chosen optimal number of variables. If it is supplied, it is the final number of variables used in the method. |
th.imp |
the minimum increment needed when adding a new variable into the balance in order to consider an improvement. |
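To make the roles of n.fold, n.iter and seed concrete, the following base R sketch builds a repeated k-fold partition by hand. It is only an illustration of the general technique, assuming a hypothetical sample size n; the names n, folds, test.idx and train.idx are invented here, and selbal.cv may assign folds differently internally.

```r
# Illustrative sketch only: not selbal's internal code.
n      <- 20      # hypothetical number of samples (assumption)
n.fold <- 5       # folds per iteration, as in the default call
n.iter <- 10      # number of repetitions, as in the default call
seed   <- 31415

set.seed(seed)    # fixing the seed makes the fold assignment reproducible

# One column per iteration; each entry is the fold (1..n.fold) of a sample
folds <- replicate(n.iter, sample(rep(seq_len(n.fold), length.out = n)))

dim(folds)          # n rows, n.iter columns
table(folds[, 1])   # fold sizes in the first iteration

# Test and training samples for fold 1 of iteration 1
test.idx  <- which(folds[, 1] == 1)
train.idx <- setdiff(seq_len(n), test.idx)
```

Under this scheme there are n.fold * n.iter train/test splits in total, which is why increasing n.iter gives a more stable picture of how often each variable enters a balance.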
A list with the following objects:
a boxplot with the mean squared errors (numeric responses) or AUC values (dichotomous responses) for the test data sets, using the balances obtained in the cross-validation. Branches represent the standard error, and the optimal number of components according to the opt.cri criterion is highlighted with a dashed line.
a barplot with the proportion of times a variable appears in the cross-validation balances.
a graphical representation of the Global Balance (draw it using the grid.draw function).
a table with the information on the Global Balance, the CV Balance and the three most repeated balances in the cross-validation process (draw it using the plot.tab function).
a vector with the accuracy values (MSE for continuous variables and AUC for dichotomous variables) obtained in the cross-validation procedure.
a table with the variables appearing in the Global Balance, in a format suitable for the bal.value function, which can be used to get the balance score for new datasets.
the regression model object where the covariates and the final balance are the explanatory variables and y is the response variable.
the optimal number of variables estimated in the cross-validation.
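The default opt.cri = "1se" suggests the usual one-standard-error rule: pick the smallest number of variables whose mean cross-validated error lies within one standard error of the best one. The sketch below illustrates that rule on made-up MSE values; the matrix cv.err and all numbers in it are hypothetical, and the exact computation inside selbal.cv may differ. For AUC, where higher is better, the comparison would be reversed.

```r
# Illustrative sketch of the one-standard-error rule on hypothetical data.
# Rows = cross-validation repetitions, columns = number of variables.
cv.err <- cbind(
  "2" = c(0.50, 0.54, 0.52, 0.50),
  "3" = c(0.40, 0.42, 0.41, 0.41),
  "4" = c(0.36, 0.44, 0.40, 0.40),
  "5" = c(0.40, 0.42, 0.41, 0.42)
)

mean.err <- colMeans(cv.err)                          # mean MSE per model size
se.err   <- apply(cv.err, 2, sd) / sqrt(nrow(cv.err)) # standard error per size

best      <- which.min(mean.err)                # model size with the lowest mean MSE
threshold <- mean.err[best] + se.err[best]      # one SE above the minimum

# Smallest number of variables whose mean error is within the threshold
n.var   <- as.integer(colnames(cv.err))
opt.1se <- min(n.var[mean.err <= threshold])

n.var[best]   # size with the minimum mean error
opt.1se       # more parsimonious size chosen by the 1se rule
```

In this toy data the minimum mean error is reached with four variables, but three variables are already within one standard error of it, so the rule prefers the smaller balance.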
# Load data set
load("HIV.rda")
# Define x and y
x <- HIV[,1:60]
y <- HIV[,62]
# Run the algorithm
CV.Bal <- selbal.cv(x,y)