CVmFold: m-fold Cross-validation for Generalised Linear Models and...


Description

CVmFold returns the m-fold cross-validation prediction error, measured by a chosen divergence, for generalised linear models.

Usage

CVmFold(y, X, m = 10L, K = 10L, family, type = NULL, divergence,
  C0 = 0.5, W = NULL, increasing = FALSE, trace = TRUE, ...)

Arguments

y

is an (n x 1) vector of the response variable.

X

is an (n x p) matrix of predictors.

m

is the number of folds.

K

is the number of repetitions.

family

the family object for glm or family = "multinomial" to use multinom.

type

the type of prediction requested from the predict function (see Details).

divergence

the type of divergence. divergence = "L1" is the L1-norm error. divergence = "sq.error" is the squared error. divergence = "classification" gives the classification error.

C0

is a cutoff value in the interval (0,1).

W

is a matrix of weights for classification errors (used if divergence = "classification"). If W = NULL (default), W has 0s on the diagonal (correct predictions) and 1s elsewhere.

increasing

is a logical value describing y (see Details).

trace

if trace = TRUE, hide the warnings of the fitting method.

...

additional arguments affecting the fitting method (see glm or multinom).

Details

This function computes the m-fold cross-validation (CV) of a model from the generalised linear models family to assess the prediction error according to a specific divergence. It is called inside the InitialStep and GeneralStep functions, the two main functions of the Panning Algorithm.
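As a rough illustration of the procedure described above, the following sketch computes a repeated m-fold CV estimate of the L1 prediction error for a logistic regression fitted with glm. It is not the package's implementation: the helper name cv_sketch, the random fold assignment, and the averaging over folds and repetitions are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the panning package): repeated m-fold CV
# of the mean absolute (L1) prediction error for a glm fit.
cv_sketch <- function(y, X, m = 10L, K = 10L, family = binomial()) {
  n <- length(y)
  dat <- data.frame(y = y, X)
  errs <- numeric(K)
  for (k in seq_len(K)) {
    # Randomly assign each observation to one of m folds of near-equal size
    fold <- sample(rep(seq_len(m), length.out = n))
    fold_err <- numeric(m)
    for (j in seq_len(m)) {
      test <- fold == j
      # Fit on the m - 1 training folds, predict on the held-out fold
      fit <- glm(y ~ ., data = dat[!test, , drop = FALSE], family = family)
      pred <- predict(fit, newdata = dat[test, , drop = FALSE],
                      type = "response")
      fold_err[j] <- mean(abs(dat$y[test] - pred))
    }
    # Assumption: the error of one repetition is the mean over folds
    errs[k] <- mean(fold_err)
  }
  # Assumption: the final estimate is the mean over the K repetitions
  mean(errs)
}

# Simulated data, for illustration only
set.seed(123)
n <- 200
X <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, plogis(X[, 1]))
cv_err <- cv_sketch(y, X, m = 5L, K = 2L)
cv_err
```

Repeating the split K times with fresh random folds reduces the dependence of the estimate on any single partition of the data.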

When divergence = "classification", asymmetric classification errors can be specified through the W matrix (rows: estimated y; columns: true y); see the example below. For logistic regression (run with glm), the cutoff value C0 determines whether the prediction takes value 0 (prediction <= C0) or 1 (prediction > C0). For multinomial regression, increasing = TRUE states that y >= 1 with unit increments (this makes CVmFold run faster).
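To make the roles of C0 and W concrete, here is a minimal sketch of a weighted classification error for a 0/1 response. The helper weighted_class_error is hypothetical and not part of the package, and averaging the weights over observations is an assumption about how the error is aggregated.

```r
# Hypothetical helper (not part of the panning package): weighted
# classification error for 0/1 responses given predicted probabilities.
weighted_class_error <- function(y, prob, C0 = 0.5, W = NULL) {
  if (is.null(W)) {
    W <- 1 - diag(2)  # 0s on the diagonal, 1s elsewhere
  }
  y_hat <- as.integer(prob > C0)  # 0 if prob <= C0, 1 if prob > C0
  # Rows of W index the estimated class, columns the true class
  mean(W[cbind(y_hat + 1L, y + 1L)])
}

y    <- c(0, 0, 1, 1)
prob <- c(0.2, 0.7, 0.6, 0.9)

# Symmetric 0-1 loss: one error out of four observations
err_sym <- weighted_class_error(y, prob)

# Same layout as the Weight matrix used in the Examples section:
# predicting 1 when the truth is 0 costs 1.5, the reverse costs 0.5
Weight <- matrix(c(0, 1.5, 0.5, 0), 2, 2)
err_asym <- weighted_class_error(y, prob, W = Weight)
```

With these toy values the single misclassification is a predicted 1 for a true 0, so the asymmetric weights penalise it more heavily than the default 0-1 loss does.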

Take care over the scale on which the estimated values of y are returned, and choose type accordingly. See the example below on logistic regression.
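The sketch below shows why type matters for a logistic regression fitted with glm: predict returns the linear predictor (log-odds) by default, whereas type = "response" returns probabilities, which is the scale on which a cutoff in (0,1) or an error against a 0/1 response makes sense. The simulated data are for illustration only.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(50)
y <- rbinom(50, 1, plogis(x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

new <- data.frame(x = c(-1, 0, 1))
eta <- predict(fit, newdata = new)                     # default type = "link": log-odds
p   <- predict(fit, newdata = new, type = "response")  # probabilities in (0, 1)

# The two scales are related through the inverse logit
all.equal(unname(plogis(eta)), unname(p))
```

For multinom fits, the examples in this page use type = "probs" for the L1 and squared errors and type = "class" for the classification error.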

Value

CVmFold returns a single numeric value assessing the estimated prediction error.

Author(s)

Samuel Orso Samuel.Orso@unige.ch

See Also

glm, family, predict.glm, InitialStep, GeneralStep

Examples

## Not run: 
### Binary data
# load the data
library(MASS)
data("birthwt")
y <- birthwt$low
X <- as.matrix(birthwt)[,-1]

## logistic regression with glm()
# L1 error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "L1",
     type = "response", trace = FALSE, control = list(maxit=100) )

# Squared error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "sq.error",
     type = "response", trace = FALSE, control = list(maxit=100) )

# misclassification error
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "classification",
     type = "response", trace = FALSE, control = list(maxit=100) )

# asymmetric misclassification error
Weight <- matrix(c(0,1.5,0.5,0),2,2)
set.seed(123)
CVmFold(y = y, X = X, family = binomial(link = 'logit'), divergence = "classification",
     W = Weight, type = "response", trace = FALSE, control = list(maxit=100) )

## logistic regression with multinom()
# L1 error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "L1", type = "probs" )

# Squared Error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "sq.error", type = "probs" )

# misclassification error
y <- y+1L
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "classification",
     type = "class", increasing = TRUE )

# asymmetric misclassification error
set.seed(123)
CVmFold(y = y, X = X, family = "multinomial", divergence = "classification",
     type = "class", W = Weight, increasing = TRUE )

### Count data
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
set.seed(123)
CVmFold(y = counts, X = cbind(outcome, treatment), m = 3, K = 30, family = poisson(),
     divergence = "L1" )

## End(Not run)

SMAC-Group/panning documentation built on May 9, 2019, 11:19 a.m.