
This function calculates the estimated K-fold cross-validation prediction error for generalized linear models.


`data` |
A matrix or data frame containing the data. The rows should be cases and the columns correspond to variables, one of which is the response. |

`glmfit` |
An object of class |

`cost` |
A function of two vector arguments specifying the cost function for the
cross-validation. The first argument to |

`K` |
The number of groups into which the data should be split to estimate the
cross-validation prediction error. The value of |
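As an illustration of what a two-argument `cost` function can look like, here is a minimal sketch. It assumes the first argument holds observed responses and the second holds predictions, which is the order the description suggests. The name `mse_cost` is hypothetical; `misclass_cost` mirrors the binary-response cost used in the examples on this page.

```r
# Hypothetical cost functions; each takes two vectors and returns one number.
# Assumption: first argument = observed responses, second = predictions.

# Average squared error, suited to a continuous response.
mse_cost <- function(y, yhat) mean((y - yhat)^2)

# Misclassification rate for a 0/1 response with predicted probabilities:
# a case counts as an error when the prediction is on the wrong side of 0.5.
misclass_cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

mse_value <- mse_cost(c(1, 2, 3), c(1, 2, 5))                            # 4/3
misclass_value <- misclass_cost(c(1, 0, 1, 0), c(0.9, 0.2, 0.4, 0.6))    # 0.5
```

Either function could then be supplied as the `cost` argument.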

The data is divided randomly into `K` groups. For each group the generalized linear model is fit to `data` omitting that group, then the function `cost` is applied to the observed responses in the group that was omitted from the fit and the prediction made by the fitted models for those observations.
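The procedure described above can be sketched by hand in base R. This is an illustrative sketch, not the package's source: the built-in `mtcars` data, the model `mpg ~ wt`, the choice `K = 5`, and all variable names are assumptions made for the example.

```r
# Hand-rolled K-fold cross-validation for a GLM (illustrative sketch only).
set.seed(1)
K <- 5
n <- nrow(mtcars)

# Randomly assign each case to one of K groups of roughly equal size.
folds <- sample(rep(seq_len(K), length.out = n))

# Cost function: average squared error (observed first, predicted second).
cost <- function(y, yhat) mean((y - yhat)^2)

fold_cost <- numeric(K)
fold_size <- numeric(K)
for (k in seq_len(K)) {
  held_out <- folds == k
  # Fit the model to the data with group k omitted.
  fit <- glm(mpg ~ wt, data = mtcars[!held_out, ])
  # Predict the omitted cases and score them with the cost function.
  pred <- predict(fit, newdata = mtcars[held_out, ], type = "response")
  fold_cost[k] <- cost(mtcars$mpg[held_out], pred)
  fold_size[k] <- sum(held_out)
}

# Combine the per-group costs, weighting each group by its share of the data.
cv_raw <- sum(fold_size / n * fold_cost)
cv_raw
```

Weighting by group size matters only when the groups are unequal; with equal groups it reduces to a plain average of the per-group costs.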

When `K` is the number of observations, leave-one-out cross-validation is used and all the possible splits of the data are used. When `K` is less than the number of observations, the `K` splits to be used are found by randomly partitioning the data into `K` groups of approximately equal size. In this latter case a certain amount of bias is introduced. This can be reduced by using a simple adjustment (see equation 6.48 in Davison and Hinkley, 1997). The second value returned in `delta` is the estimate adjusted by this method.
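One way to read the adjustment is: take the raw estimate, add the cost of the full-data fit on the full data, and subtract the size-weighted average of each fold fit's cost on the full data. The sketch below implements that reading in base R. It is an assumption-laden illustration (built-in `mtcars` data, model `mpg ~ wt`, `K = 5`, hypothetical variable names), so the cited equation should be consulted for the exact statement.

```r
# Raw and bias-adjusted K-fold estimates (illustrative sketch only).
set.seed(1)
K <- 5
n <- nrow(mtcars)
folds <- sample(rep(seq_len(K), length.out = n))
cost <- function(y, yhat) mean((y - yhat)^2)

# Cost of the model fitted to all the data, evaluated on all the data.
full_fit <- glm(mpg ~ wt, data = mtcars)
correction <- cost(mtcars$mpg, fitted(full_fit))

cv_raw <- 0
for (k in seq_len(K)) {
  held_out <- folds == k
  weight <- sum(held_out) / n
  fit <- glm(mpg ~ wt, data = mtcars[!held_out, ])
  # Raw estimate: cost on the omitted group only.
  pred_out <- predict(fit, newdata = mtcars[held_out, ], type = "response")
  cv_raw <- cv_raw + weight * cost(mtcars$mpg[held_out], pred_out)
  # Correction term: subtract this fold fit's cost on the full data.
  pred_all <- predict(fit, newdata = mtcars, type = "response")
  correction <- correction - weight * cost(mtcars$mpg, pred_all)
}

cv_adj <- cv_raw + correction
c(raw = cv_raw, adjusted = cv_adj)
```

With squared-error cost and a least-squares fit, the correction cannot be positive, because no fold fit can beat the full fit on the full data. The adjusted value is therefore no larger than the raw one, which matches the pattern in the example output on this page.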

The returned value is a list with the following components.

`call` |
The original call to |

`K` |
The value of |

`delta` |
A vector of length two. The first component is the raw cross-validation estimate of prediction error. The second component is the adjusted cross-validation estimate. The adjustment is designed to compensate for the bias introduced by not using leave-one-out cross-validation. |

`seed` |
The value of |
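A short sketch of inspecting the returned list, assuming the `boot` package is installed. The `mtcars` data, the model `mpg ~ wt`, and `K = 4` are choices made for this illustration only.

```r
library(boot)

# Illustrative model on a built-in data set (an assumption of this sketch).
fit <- glm(mpg ~ wt, data = mtcars)
res <- cv.glm(mtcars, fit, K = 4)

component_names <- names(res)  # the components listed above
res$K                          # the value of K actually used
raw_estimate <- res$delta[1]   # raw cross-validation estimate
adj_estimate <- res$delta[2]   # bias-adjusted estimate
```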

The value of `.Random.seed` is updated.
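Because the random partition consumes random numbers, results for `K` less than the number of observations vary from run to run unless the seed is fixed. The sketch below, which assumes the `boot` package is installed and uses `mtcars` purely for illustration, shows the seed changing and the result being reproduced with `set.seed`.

```r
library(boot)

fit <- glm(mpg ~ wt, data = mtcars)

set.seed(42)
seed_before <- .Random.seed
delta_first <- cv.glm(mtcars, fit, K = 4)$delta
# The call drew random numbers, so the generator state has moved on.
seed_changed <- !identical(seed_before, .Random.seed)

# Resetting the seed reproduces the same partition and the same estimates.
set.seed(42)
delta_second <- cv.glm(mtcars, fit, K = 4)$delta
reproducible <- identical(delta_first, delta_second)
```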

Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984)
*Classification and Regression Trees*. Wadsworth.

Burman, P. (1989) A comparative study of ordinary cross-validation,
*v*-fold cross-validation and repeated learning-testing methods.
*Biometrika*, **76**, 503–514.

Davison, A.C. and Hinkley, D.V. (1997)
*Bootstrap Methods and Their Application*. Cambridge University Press.

Efron, B. (1986) How biased is the apparent error rate of a prediction rule?
*Journal of the American Statistical Association*, **81**, 461–470.

Stone, M. (1974) Cross-validation choice and assessment of statistical
predictions (with Discussion).
*Journal of the Royal Statistical Society, B*, **36**, 111–147.

```
# cv.glm, glm.diag and the nodal data set come from the boot package.
library(boot)

# leave-one-out and 6-fold cross-validation prediction error for
# the mammals data set.
data(mammals, package="MASS")
mammals.glm <- glm(log(brain) ~ log(body), data = mammals)
(cv.err <- cv.glm(mammals, mammals.glm)$delta)
(cv.err.6 <- cv.glm(mammals, mammals.glm, K = 6)$delta)

# As this is a linear model we could calculate the leave-one-out
# cross-validation estimate without any extra model-fitting.
muhat <- fitted(mammals.glm)
mammals.diag <- glm.diag(mammals.glm)
(cv.err <- mean((mammals.glm$y - muhat)^2/(1 - mammals.diag$h)^2))

# leave-one-out and 11-fold cross-validation prediction error for
# the nodal data set. Since the response is a binary variable an
# appropriate cost function is
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
nodal.glm <- glm(r ~ stage+xray+acid, binomial, data = nodal)
(cv.err <- cv.glm(nodal, nodal.glm, cost, K = nrow(nodal))$delta)
(cv.11.err <- cv.glm(nodal, nodal.glm, cost, K = 11)$delta)
```

```
[1] 0.4918650 0.4916571
[1] 0.5033827 0.5000242
[1] 0.491865
[1] 0.1886792 0.1886792
[1] 0.245283 0.241723
```
