CVmFold: m-fold Cross-validation for Generalised Linear Models and...
In SMAC-Group/panning: An implementation of the Panning Algorithm


CVmFold returns the m-fold cross-validation prediction error of a generalised linear model, measured with one of several divergences.

CVmFold(y, X, m = 10L, K = 10L, family, type = NULL, divergence, C0 = 0.5, W = NULL, increasing = FALSE, trace = TRUE, ...)

`y`	is a (n x 1) vector of the response variable.
`X`	is a (n x p) matrix of predictors.
`m`	is the number of folds.
`K`	is the number of repetitions.
`family`	a family object passed to `glm`, or `family = "multinomial"` to fit the model with `multinom`.
`type`	the type of prediction required for `predict` function.
`divergence`	the type of divergence. `divergence = "L1"` is the L1-norm error. `divergence = "sq.error"` is the squared error. `divergence = "classification"` gives the classification error.
`C0`	is a cutoff value in (0,1) used to turn predicted probabilities into classes (see details).
`W`	is a matrix of weights for classification errors (used when `divergence = "classification"`). If `W = NULL` (default), `W` has 0s on the diagonal (correct predictions) and 1s elsewhere.
`increasing`	is a boolean characterising `y` (see details).
`trace`	if `trace = TRUE`, the warnings of the fitting method are hidden.
`...`	additional arguments affecting the fitting method (see `glm` or `multinom`).
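The two continuous divergences named above can be written in a line each. The sketch below is only an illustration: the helper names `l1_error` and `sq_error` are hypothetical (not part of panning), and averaging over observations is an assumption about how the errors are aggregated.

```r
# Hypothetical helpers (not part of the panning package).
# Assumption: the divergence is averaged over the n observations.
l1_error <- function(y, y_hat) mean(abs(y - y_hat))   # "L1"
sq_error <- function(y, y_hat) mean((y - y_hat)^2)    # "sq.error"

y     <- c(0, 1, 1, 0)
y_hat <- c(0.2, 0.7, 0.9, 0.4)   # e.g. predicted probabilities
l1_error(y, y_hat)  # 0.25
sq_error(y, y_hat)  # 0.075
```

The squared error penalises large deviations more heavily than the L1 error, so the two can rank competing models differently.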

This function computes the m-fold cross-validation (CV) error of a generalised linear model to assess its prediction error according to a specific divergence. It is called inside the InitialStep and GeneralStep functions, the two main functions of the Panning Algorithm.
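To make the roles of `m` and `K` concrete, here is a minimal sketch of K-times repeated m-fold cross-validation for a `glm`. It is not the panning implementation: `cv_mfold_sketch` is a hypothetical name, only the `glm` branch is shown (no `multinom`), and averaging the K repetitions into a single number is an assumption consistent with CVmFold returning one value.

```r
# Hypothetical sketch of repeated m-fold CV (NOT the panning implementation).
cv_mfold_sketch <- function(y, X, m = 10L, K = 10L, family = gaussian(),
                            divergence = function(y, y_hat) mean(abs(y - y_hat)),
                            type = "response") {
  n <- length(y)
  dat <- data.frame(y = y, X)   # predictors become columns X1, X2, ...
  errors <- numeric(K)
  for (k in seq_len(K)) {
    # Randomly assign each observation to one of m folds of (nearly) equal size
    folds <- sample(rep_len(seq_len(m), n))
    y_hat <- numeric(n)
    for (j in seq_len(m)) {
      test <- folds == j
      # Fit on the other m - 1 folds ...
      fit <- glm(y ~ ., family = family, data = dat[!test, , drop = FALSE])
      # ... and predict the held-out fold
      y_hat[test] <- predict(fit, newdata = dat[test, , drop = FALSE], type = type)
    }
    errors[k] <- divergence(y, y_hat)
  }
  # Assumption: the K repetitions are averaged into one prediction error
  mean(errors)
}

# Usage on simulated Gaussian data
set.seed(123)
X <- matrix(rnorm(200), ncol = 2)
y <- drop(X %*% c(1, -0.5)) + rnorm(100)
cv_mfold_sketch(y, X, m = 5L, K = 3L)
```

Repeating the random fold assignment K times reduces the variability that comes from a single, arbitrary split of the data into folds.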

When divergence = "classification", asymmetric classification errors can be specified through the W matrix (rows: estimated y; columns: true y); see the examples below. For logistic regression (run with glm), the cutoff value C0 determines whether the prediction takes the value 0 (prediction <= C0) or 1 (prediction > C0). For multinomial regression, increasing = TRUE states that y >= 1 with unit increments (1, 2, 3, ...), which makes CVmFold run faster.
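For binary data, the cutoff C0 and the weight matrix W combine as follows. This is a sketch under assumptions: `weighted_class_error` is a hypothetical helper (not part of panning), and the costs are assumed to be averaged over observations. The indexing follows the convention stated above (rows: estimated y; columns: true y).

```r
# Hypothetical helper: weighted misclassification error for 0/1 responses.
weighted_class_error <- function(y, prob, C0 = 0.5, W = NULL) {
  y_pred <- as.integer(prob > C0)      # 0 if prob <= C0, 1 if prob > C0
  if (is.null(W)) W <- 1 - diag(2)     # 0 on the diagonal, 1 elsewhere
  # Cost of each prediction: W[estimated class, true class]
  mean(W[cbind(y_pred + 1L, y + 1L)])
}

y    <- c(0, 1, 1, 0)
prob <- c(0.2, 0.7, 0.4, 0.1)          # third observation is misclassified
weighted_class_error(y, prob)          # 0.25
W <- matrix(c(0, 1.5, 0.5, 0), 2, 2)   # predicting 0 when y = 1 costs 0.5
weighted_class_error(y, prob, W = W)   # 0.125
```

With the default W every error costs 1, so the result is the plain misclassification rate; an asymmetric W lets one type of error count more than the other.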

Care should be taken with how predict returns the estimated values of y, and type should be chosen accordingly. See the examples below on logistic regression.
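As an illustration of why `type` matters, `predict.glm` for a logistic regression returns log-odds with `type = "link"` and probabilities with `type = "response"`; only the latter lives on the (0,1) scale where a cutoff such as C0 makes sense. The predictors chosen below (`age`, `lwt`, `smoke`) are an arbitrary assumption for the sketch. For `multinom` fits, the examples use `type = "probs"` (probabilities) or `type = "class"` (class labels).

```r
library(MASS)   # provides the birthwt data
data("birthwt")

# Assumption: an arbitrary small model, just to compare prediction scales
fit <- glm(low ~ age + lwt + smoke, family = binomial(link = "logit"),
           data = birthwt)

eta  <- predict(fit, type = "link")      # linear predictor (log-odds)
prob <- predict(fit, type = "response")  # fitted probabilities in (0, 1)

# The two scales are linked by the inverse logit
all.equal(plogis(eta), prob)
```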

CVmFold returns a single numeric value assessing the estimated prediction error.

Samuel Orso Samuel.Orso@unige.ch

glm, family, predict.glm, InitialStep, GeneralStep

## Not run: 
### Binary data
# load the data
library(MASS)
data("birthwt")
y <- birthwt$low
X <- as.matrix(birthwt)[,-1]

## logistic regression with glm()
# L1 error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "L1",
     type = "response", trace = FALSE, control = list(maxit=100) )

# Squared error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "sq.error",
     type = "response", trace = FALSE, control = list(maxit=100) )

# misclassification error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "classification",
     type = "response", trace = FALSE, control = list(maxit=100) )

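# asymmetric misclassification error
# rows: estimated y, columns: true y (matrix is filled column-wise)
# predicting 1 when y = 0 costs 1.5; predicting 0 when y = 1 costs 0.5
Weight <- matrix(c(0,1.5,0.5,0),2,2)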
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "classification",
     W = Weight, type = "response", trace = FALSE, control = list(maxit=100) )

## logistic regression with multinom()
# L1 error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "L1", type = "probs" )

# Squared Error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "sq.error", type = "probs" )

# misclassification error
# recode y from {0, 1} to {1, 2} so that increasing = TRUE applies
y <- y+1L
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "classification",
     type = "class", increasing = TRUE )

# asymmetric misclassification error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "classification",
     type = "class", W = Weight, increasing = TRUE )

### Count data
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
set.seed(123)
CVmFold(y = counts, X = cbind(outcome, treatment), m = 3, K = 30, family = poisson(),
     divergence = "L1" )

## End(Not run)