cv: Cross Validation


Description

Multinomial sparse group lasso cross validation, with or without parallel backend.

Usage

cv(x, classes, sampleWeights = NULL, grouping = NULL,
  groupWeights = NULL, parameterWeights = NULL, alpha = 0.5,
  standardize = TRUE, lambda, d = 100, fold = 10L,
  cv.indices = list(), intercept = TRUE, sparse.data = is(x,
  "sparseMatrix"), max.threads = NULL, use_parallel = FALSE,
  algorithm.config = msgl.standard.config)

Arguments

x

design matrix, matrix of size N \times p.

classes

classes, factor of length N.

sampleWeights

sample weights, a vector of length N.

grouping

grouping of features (covariates), a vector of length p. Each element of the vector specifies the group of the corresponding feature.

groupWeights

the group weights, a vector of length m (the number of groups). If groupWeights = NULL default weights will be used. Default weights are 0 for the intercept and \sqrt{K \cdot \textrm{number of features in the group}} for all other groups (see the sketch after this argument list).

parameterWeights

a matrix of size K \times p. If parameterWeights = NULL default weights will be used. Default weights are 0 for the intercept weights and 1 for all other weights.

alpha

the α value: 0 gives group lasso, 1 gives lasso, and a value between 0 and 1 gives a sparse group lasso penalty.

standardize

if TRUE the features are standardized before fitting the model. The model parameters are returned on the original scale.

lambda

lambda.min relative to lambda.max or the lambda sequence for the regularization path.

d

length of lambda sequence (ignored if length(lambda) > 1)

fold

the fold of the cross validation, an integer larger than 1 and less than N+1. Ignored if cv.indices != NULL. If fold ≤ max(table(classes)) then the data will be split into fold disjoint subsets keeping the ratio of classes approximately equal. Otherwise the data will be split into fold disjoint subsets without keeping the ratio fixed.

cv.indices

a list of indices of a cross validation splitting. If cv.indices = NULL then a random splitting will be generated using the fold argument.

intercept

should the model include intercept parameters

sparse.data

if TRUE x will be treated as sparse; if x is a sparse matrix it will be treated as sparse by default.

max.threads

Deprecated (will be removed in 2018); instead use use_parallel = TRUE and register a parallel backend (see the 'doParallel' package). The maximal number of threads to be used.

use_parallel

If TRUE the foreach loop will use %dopar%. The user must register the parallel backend.

algorithm.config

the algorithm configuration to be used.
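
As an illustration of the default groupWeights described above, the weights could be constructed roughly as follows (a minimal sketch, not the package's internal code; it assumes classes and grouping are given as described in this section, and that the intercept weight of 0 is handled separately):

K <- length(levels(classes))            # number of classes
group.sizes <- table(grouping)          # number of features in each group
default.groupWeights <- sqrt(K * as.numeric(group.sizes))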

Value

link

the linear predictors – a list of length length(lambda), one item for each lambda value; each item is a matrix of size K \times N containing the linear predictors.

response

the estimated probabilities – a list of length length(lambda), one item for each lambda value; each item is a matrix of size K \times N containing the probabilities.

classes

the estimated classes - a matrix of size N \times d with d=length(lambda).

cv.indices

the cross validation splitting used.

features

number of features used in the models.

parameters

number of parameters used in the models.

classes.true

the true classes used for estimation; this is equal to the classes argument (see the sketch below).
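
For example, the misclassification rate reported by Err could in principle be recomputed from these elements; a minimal sketch, assuming fit.cv is the object returned by cv and that classes and classes.true have the layout described above:

pred <- fit.cv$classes                  # N x d matrix of estimated classes
truth <- fit.cv$classes.true            # true classes (length N)
err.per.lambda <- apply(pred, 2, function(p) mean(p != truth))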

Author(s)

Martin Vincent

Examples

library(msgl)        # provides cv(), Err() and the SimData example data
library(doParallel)  # provides makeCluster(), registerDoParallel() and the %dopar% backend

data(SimData)

# A quick look at the data
dim(x)
table(classes)

# Setup clusters
cl <- makeCluster(2)
registerDoParallel(cl)

# Run cross validation using 2 clusters
# Using a lambda sequence ranging from the maximal lambda to 0.7 * maximal lambda
fit.cv <- msgl::cv(x, classes, alpha = 0.5, lambda = 0.7, use_parallel = TRUE)

# Stop clusters
stopCluster(cl)

# Print some information
fit.cv

# Cross validation errors (estimated expected generalization error)
# Misclassification rate
Err(fit.cv)

# Negative log likelihood error
Err(fit.cv, type="loglike")
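
# A sequential (non-parallel) variation, sketched with illustrative values only
# (shorter relative lambda path, 5 folds); not part of the original example:
fit.cv2 <- msgl::cv(x, classes, alpha = 0.5, lambda = 0.5, d = 25,
                    fold = 5, use_parallel = FALSE)

# Reuse the cross validation splitting from fit.cv when comparing another alpha
# (assumes the cv.indices element described under Value):
fit.cv3 <- msgl::cv(x, classes, alpha = 1, lambda = 0.5, d = 25,
                    cv.indices = fit.cv$cv.indices, use_parallel = FALSE)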
