cv.lm: Cross-validation for an object of class 'lm'


Description

k-fold cross-validation for an object of class 'lm'

Usage

cv.lm(object, k = 10, ks_test = FALSE, fun = NULL, log = FALSE,
  seed = NULL, max_cores = NULL, ...)

Arguments

object

Object of class 'lm'

k

Integer, number of folds

ks_test

Boolean, if TRUE, a Kolmogorov-Smirnov test is carried out. See details.

fun

User-specified function for which cross-validation results are to be obtained. See details.

log

Boolean, specifies whether object contains a fit to the response vector Y or to its logarithm log Y

seed

Integer, seed for the random number generator. The seed is not set when seed equals NULL.

max_cores

Integer, maximum number of CPU-cores that can be used. For the default value NULL, the number is set to the number of available cores minus one.

...

Other parameters, not used in the current implementation.

Details

Cross-validations

The function cv.lm carries out a k-fold cross-validation for a linear model (i.e., an 'lm' model). For each fold, an 'lm' model is fit to all observations that are not in the fold (the 'training set') and prediction errors are calculated for the observations in the fold (the 'test set'). The prediction errors are the absolute error |y - μ| and its square (y - μ)^2. The prediction errors are averaged over the observations in the fold, and the square root of the average of the squared errors is taken. Optionally, one can calculate a user-specified function fun for the test set and the 'lm' model resulting from the training set. Optionally, one can also calculate the Kolmogorov-Smirnov (KS) distance for the test set and its p-value.

The results for the k folds are averaged over the folds and standard deviations are calculated from the k results.
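The per-fold computation described above can be sketched as follows. This is an illustration of what cv.lm calculates for a single fold, not the package's internal code; the fold size and the model formula on the cats data are our own choices.

```r
# Sketch of one fold of the cross-validation: fit on the training rows,
# score the test rows with the absolute and squared prediction errors.
library(MASS)  # for the 'cats' data frame

set.seed(1)
fold  <- sample(nrow(cats), size = nrow(cats) %/% 10)  # one test fold
train <- cats[-fold, ]
test  <- cats[fold, ]

fit <- lm(Hwt ~ Sex + Bwt, data = train)   # fit to the training set
mu  <- predict(fit, newdata = test)        # predictions for the test set

mae  <- mean(abs(test$Hwt - mu))           # average absolute error |y - mu|
rmse <- sqrt(mean((test$Hwt - mu)^2))      # root of the average squared error
```

cv.lm repeats this for each of the k folds and then averages the fold results.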

Requirements on the 'lm' object

object must contain the list members x and y, i.e., it must be created by calling lm with the options x = TRUE and y = TRUE.
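A quick way to verify that a fit meets this requirement is to check for the stored members directly; the model formula below is only an example:

```r
# A fit suitable for cv.lm must store its model matrix and response.
# This requires calling lm with x = TRUE and y = TRUE.
library(MASS)
fit <- lm(Hwt ~ Sex + Bwt, data = cats, x = TRUE, y = TRUE)

has_x <- !is.null(fit$x)   # design matrix stored?
has_y <- !is.null(fit$y)   # response vector stored?
```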

User defined function

The argument fun allows a user to specify a function for which cross-validation results must be obtained. This function must meet the following requirements.

During a k-fold cross-validation, the function is called k times, with object_t equal to the fit to the training set, y equal to the response vector of the test set, and X_mu equal to the design matrix of the test set.

If the evaluation of fun gives an error, cv.lm will give a warning and exclude that evaluation from the mean and the standard deviation of fun over the k folds. If the evaluation of fun gives a warning, it will be ignored.

In the cross-validations, object_t contains the design matrix used in the fit to the training set as object_t$x.
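A minimal function with the expected signature might look as follows. The name rmse_fun is ours, not the package's; it assumes, per the argument description above, that X_mu is the test-set design matrix whose columns match the fitted coefficients:

```r
# User function for cv.lm: object_t is the training-set fit, y the
# test-set response, X_mu the test-set design matrix.
rmse_fun <- function(object_t, y, X_mu){
  mu <- as.numeric(X_mu %*% coef(object_t))  # test-set predictions
  sqrt(mean((y - mu)^2))                     # root-mean-squared error
}

# Quick sanity check on a fit that stores x and y (here the training and
# test sets coincide, purely for illustration):
library(MASS)
fit <- lm(Hwt ~ Sex + Bwt, data = cats, x = TRUE, y = TRUE)
val <- rmse_fun(fit, fit$y, fit$x)
```

The function would then be passed to the cross-validation as cv.lm(fit, fun = rmse_fun).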

Kolmogorov-Smirnov test

When ks_test = TRUE, a Kolmogorov-Smirnov (KS) test is carried out for each fold. The test checks whether the standardized residuals (y - μ) / σ in a fold are distributed as a standard normal distribution. The KS-distance and the corresponding p-value are calculated for each fold. The test uses the function ks.test. The expectation values μ and standard deviation σ are calculated from the model matrices for the test set (the fold) and the 'lm' fit to the training set.
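The test can be sketched as below. For illustration, μ and σ here come from a fit to the full sample rather than from a separate training fold, as cv.lm would use:

```r
# Sketch of the per-fold KS test: compare standardized residuals
# against a standard normal distribution.
library(MASS)
fit <- lm(Hwt ~ Sex + Bwt, data = cats)

z  <- residuals(fit) / summary(fit)$sigma  # (y - mu) / sigma
ks <- ks.test(z, "pnorm")                  # test against N(0, 1)

D <- unname(ks$statistic)                  # KS distance
p <- ks$p.value                            # corresponding p-value
```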

Other

The number of available CPU cores is detected with detectCores.
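The default described for max_cores corresponds to one less than the detected count, which can be reproduced as:

```r
# Default core count: detected cores minus one, with a floor of one.
library(parallel)
n_avail <- detectCores()
if (is.na(n_avail)) n_avail <- 1L          # detectCores can return NA
max_cores_default <- max(1L, n_avail - 1L)
```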

Value

An object of class 'cvlmvar', a list containing the cross-validation results.

The items KS_distance and KS_p.value are present only when ks_test = TRUE. The item fun is present only when a function fun has been specified.

See Also

cv.lmvar is the equivalent function for an object of class 'lmvar'. It is supplied in case one wants to compare an 'lmvar' fit with an 'lm' fit.

print.cvlmvar provides a print-method for an object of class 'cvlmvar'.

Examples

# Create an object of class 'lm'. We use a model matrix obtained from the 'cats' dataframe,
# an arbitrary parameter vector beta and a generated response vector y for the purpose of the
# example.
library(MASS)

X = model.matrix(~ Sex + Bwt, cats)
beta_mu = c(-0.1, 0.3, 4)

mu = X %*% beta_mu

y = rnorm(nrow(X), mean = mu, sd = 0.5)

fit = lm(y ~ ., as.data.frame(X[,-1]), x = TRUE, y = TRUE)

# Carry out a cross-validation
cv.lm(fit)   

# Carry out a cross-validation using a single CPU-core
cv.lm(fit, max_cores = 1)

# Carry out a cross-validation including a Kolmogorov-Smirnov test, using at most two CPU-cores
cv.lm(fit, ks_test = TRUE, max_cores = 2)

# Carry out a cross-validation with 5 folds and control the random numbers used
cv.lm(fit, k = 5, seed = 5483, max_cores = 1)


# Calculate cross-validation results for the fourth moment of the residuals, using a
# user-specified function
fourth = function(object, y, X){
  mu = predict(object, as.data.frame(X))
  residuals = y - mu
  return(mean(residuals^4))
}
cv.lm(fit, fun = fourth)
rm(fourth)

# Use option 'log = TRUE' if you fit the log of the response vector and require error estimates for
# the response vector itself
fit = lm(log(y) ~ ., as.data.frame(X[,-1]), x = TRUE, y = TRUE)
cv = cv.lm(fit, log = TRUE)

# Print 'cv' using the print-method print.cvlmvar
cv

# Print 'cv' with a specified number of digits
print(cv, digits = 2)

lmvar documentation built on May 16, 2019, 5:06 p.m.