selbal.cv: Cross-validation process for the selection of the optimal...

View source: R/Selbal_Functions.R

selbal.cv    R Documentation

Cross-validation process for the selection of the optimal number of variables and robustness evaluation

Description

Runs a cross-validation process to select the optimal number of variables in the balance and to evaluate the robustness of the result.

Usage

selbal.cv(x, y, n.fold = 5, n.iter = 10, seed = 31415,
  covar = NULL, col = c("steelblue1", "tomato1"),
  col2 = c("darkgreen", "steelblue4", "tan1"), logit.acc = "AUC",
  maxV = 20, zero.rep = "bayes", opt.cri = "1se",
  user_numVar = NULL)

Arguments

x

a matrix object with the information of variables (columns) for each sample (rows).

y

the response variable, either continuous or dichotomous.

n.fold

number of folds into which the whole data set is divided.

n.iter

number of iterations of the cross-validation process.

seed

a seed to make the results reproducible.

covar

data.frame with the variables to adjust for (columns).

col

vector of two colours to differentiate the variables appearing in the numerator and in the denominator of the balances.

col2

vector of three colours for the lines of the barplot, used to identify whether each variable appears in the Global Balance, in the CV Balance, or in both.

logit.acc

when y is dichotomous, the measure of association between y and the proposed balance, adjusted for covariates. One of the following values: "AUC" (default), "Dev", "Rsq" or "Tjur".

maxV

numeric value defining the maximum number of variables composing the balance. Default 20.

zero.rep

a value defining the method to use for zero replacement: "bayes" for BM-replacement or "one" to add one read to each cell of the matrix.

opt.cri

parameter indicating the method used to determine the optimal number of variables: "max" to take the number of variables that maximizes the association value, or "1se" to also take the standard error into account.

user_numVar

parameter to override the chosen optimal number of variables. If it is supplied, it is the final number of variables used in the method.

th.imp

the minimum increment needed when adding a new variable to the balance for it to count as an improvement. Note that th.imp does not appear in the Usage signature of selbal.cv.
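As a rough illustration of what n.fold, n.iter and seed control, the following base R sketch builds a repeated random partition of the samples into folds. It is not taken from the package: the toy sample size n and the object folds are assumptions made for this example, and selbal.cv may construct its folds differently (for instance, stratified by y).

```r
# Hypothetical illustration of repeated k-fold partitioning (not selbal's own code)
set.seed(31415)          # plays the same role as the seed argument
n      <- 20             # toy number of samples (assumption)
n.fold <- 5              # folds per iteration
n.iter <- 10             # number of repetitions

# One row per iteration: a random assignment of every sample to a fold
folds <- t(replicate(n.iter, sample(rep(seq_len(n.fold), length.out = n))))

dim(folds)               # n.iter rows, n columns
table(folds[1, ])        # fold sizes in the first iteration
n.fold * n.iter          # total number of train/test splits
```

With the defaults n.fold = 5 and n.iter = 10, a scheme of this kind yields 50 train/test splits, each of which produces one balance and one accuracy value.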

Value

A list with the following objects:

  • a boxplot with the mean squared errors (numeric responses) or AUC values (dichotomous responses) for the test data sets, using the balances resulting from the cross-validation. Branches represent the standard error, and the optimal number of components according to the opt.cri criterion is highlighted with a dashed line.

  • a barplot with the proportion of times each variable appears in the cross-validation balances.

  • a graphical representation of the Global Balance (draw it with the grid.draw function).

  • a table with the information of the Global Balance, the CV Balance and the three most repeated balances in the cross-validation process (draw it with the plot.tab function).

  • a vector with the accuracy values (MSE for continuous variables and AUC for dichotomous variables) obtained in the cross-validation procedure.

  • a table with the variables appearing in the Global Balance, in a format suitable for the bal.value function, to obtain the balance score for new datasets.

  • the regression model object where the covariates and the final balance are the explanatory variables and y the response variable.

  • the optimal number of variables estimated in the cross-validation.
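The difference between the "max" and "1se" values of opt.cri can be sketched with made-up cross-validation summaries. The numbers below are hypothetical, and the "1se" computation follows the common one-standard-error rule (the smallest model whose mean is within one standard error of the best); the package's exact rule is assumed, not confirmed, to match it.

```r
# Hypothetical CV summary: mean AUC and its standard error by number of variables
nvar     <- 2:7
mean.auc <- c(0.70, 0.76, 0.80, 0.81, 0.82, 0.815)
se.auc   <- c(0.020, 0.020, 0.020, 0.020, 0.015, 0.020)

# "max": the number of variables with the best mean value
best    <- which.max(mean.auc)
opt.max <- nvar[best]

# "1se" (common form, assumed here): the smallest number of variables whose
# mean is within one standard error of the best mean
threshold <- mean.auc[best] - se.auc[best]
opt.1se   <- nvar[which(mean.auc >= threshold)[1]]

c(max = opt.max, one.se = opt.1se)
```

For a continuous response, where the accuracy measure is the MSE, the same idea would use which.min and a threshold of the best mean plus its standard error. The "1se" choice tends to favour balances with fewer variables.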

Examples

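# Load the package that provides selbal.cv
library(selbal)
# Load data set (assumes HIV.rda is in the working directory)
load("HIV.rda")
# Define x (variables) and y (response)
x <- HIV[, 1:60]
y <- HIV[, 62]
# Run the algorithm with the default settings
CV.Bal <- selbal.cv(x, y)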
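
The following sketch shows, on a toy count matrix, what the zero.rep = "one" option describes: adding one read to every cell so that no zeros remain. The matrix counts and its row and column names are hypothetical, and the snippet does not call the package; it only illustrates why zeros must be replaced before ratios between variables can be put on a log scale.

```r
# Toy count matrix with zeros (hypothetical data, 3 samples x 4 variables)
counts <- matrix(c(0, 5, 12, 3,
                   7, 0,  0, 9,
                   4, 2,  6, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), paste0("taxon", 1:4)))

# What zero.rep = "one" describes: add one read to every cell
counts.one <- counts + 1

# No zeros remain, so log-ratios between variables are finite
any(counts.one == 0)

# A log-ratio of two variables, computed per sample
log(counts.one[, "taxon1"] / counts.one[, "taxon2"])
```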

malucalle/selbal documentation built on May 31, 2024, 2:36 p.m.