k-fold cross-validation for an object of class 'lm'
object | Object of class 'lm'
k | Integer, number of folds
ks_test | Boolean, if
fun | User-specified function for which cross-validation results are to be obtained. See details.
log | Boolean, specifies whether
seed | Integer, seed for the random number generator. The seed is not set when
max_cores | Integer, maximum number of CPU-cores that can be used. For the default value
... | Other parameters, not used in the current implementation.
The function cv.lm carries out a k-fold cross-validation for a linear model (i.e. an 'lm' model).
For each fold, an 'lm' model is fit to all observations that are not in the fold (the 'training set')
and prediction errors are calculated for the observations in the fold (the 'test set'). The prediction
errors are the absolute error |y - μ| and its square (y - μ)^2. The average prediction errors over the
observations in the fold are calculated, and the square root of the average of the squared errors is taken.
Optionally, one can calculate a user-specified function fun for the test set and the 'lm' model resulting
from the training set. Optionally, one can also calculate the Kolmogorov-Smirnov (KS) distance for the
test set and its p-value. The results for the k folds are averaged over the folds and standard deviations
are calculated from the k results.
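The procedure described above can be sketched in base R. This is a minimal illustration, not the package's own implementation: the built-in 'cars' data, the model dist ~ speed, the fold assignment with sample and all variable names are assumptions made for the example.

```r
# Sketch of a k-fold cross-validation for an 'lm' model, base R only
set.seed(1)                                   # fix the random fold assignment
k <- 5
dat <- cars                                   # built-in data frame with columns speed and dist
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))  # one fold label per observation

per_fold <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]                   # training set: all observations not in fold i
  test <- dat[fold == i, ]                    # test set: the observations in fold i
  fit <- lm(dist ~ speed, data = train)
  err <- test$dist - predict(fit, newdata = test)   # y - mu on the test set
  c(MAE = mean(abs(err)), MSE = mean(err^2), MSE_sqrt = sqrt(mean(err^2)))
})

# Average the k results and take their sample standard deviation
cv_mean <- rowMeans(per_fold)
cv_sd <- apply(per_fold, 1, sd)
```

Here per_fold is a matrix with one column per fold, so cv_mean and cv_sd play the role of the 'mean' and 'sd' items reported for each error measure.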
object must contain the list-members x and y. I.e., it must be created by running lm with the
options x = TRUE and y = TRUE.
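A quick way to check this requirement before calling the function is sketched below. The helper has_xy is hypothetical and not part of the package; the 'cars' data is used only for illustration. The check uses names() because `$` matches list names partially, so fit$x on a default fit would return the 'xlevels' member rather than NULL.

```r
# Hypothetical helper: does the fit store the model matrix and the response?
has_xy <- function(object) all(c("x", "y") %in% names(object))

fit_default <- lm(dist ~ speed, data = cars)                    # x and y are not stored
fit_xy <- lm(dist ~ speed, data = cars, x = TRUE, y = TRUE)     # x and y are stored

has_xy(fit_default)
has_xy(fit_xy)

# Exact-match access to the stored members
dim(fit_xy[["x"]])
length(fit_xy[["y"]])
```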
The argument fun allows a user to specify a function for which cross-validation results
must be obtained. This function must meet the following requirements.
Its arguments are: object_t, an object of class 'lm'; y, a numerical vector of response values;
and X, the model matrix for the response vector y.
It returns a single numerical value.
When a k-fold cross-validation is carried out, the function is called k times, with object_t
equal to the fit to the training set, y equal to the response vector of the test set, and
X equal to the design matrix of the test set.
If the evaluation of fun gives an error, cv.lm will give a warning and exclude that
evaluation from the mean and the standard deviation of fun over the k folds. If the evaluation
of fun gives a warning, it will be ignored.
In the cross-validations, object_t contains the design matrix used in the fit to the training set as
object_t$x.
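As an illustration of these requirements, the sketch below defines a function with the arguments object_t, y and X and evaluates it by hand for a single train/test split. The function median_abs_error, the 'cars' data and the choice of the first ten rows as test set are all assumptions for the example; the way cv.lm itself forms the folds is not shown here.

```r
# Hypothetical user function: median absolute prediction error on the test set
median_abs_error <- function(object_t, y, X) {
  mu <- as.vector(X %*% coef(object_t))   # predictions from the test design matrix
  median(abs(y - mu))                     # a single numerical value
}

# Mimic one fold by hand: rows 1 to 10 form the test set
full <- lm(dist ~ speed, data = cars, x = TRUE, y = TRUE)
test_rows <- 1:10
train_fit <- lm(dist ~ speed, data = cars[-test_rows, ], x = TRUE, y = TRUE)

value <- median_abs_error(train_fit,
                          full[["y"]][test_rows],
                          full[["x"]][test_rows, , drop = FALSE])
```

Because the design matrix of the test set already contains the intercept column, multiplying it with the coefficient vector gives the predictions directly.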
When ks_test = TRUE, a Kolmogorov-Smirnov (KS) test is carried out for each fold. The test checks
whether the standardized residuals (y - μ) / σ in a fold are distributed as a standard normal distribution.
The KS-distance and the corresponding p-value are calculated for each fold. The test uses the
function ks.test. The expectation values μ and standard deviation σ are
calculated from the model matrices for the test set (the fold) and the 'lm' fit to the training set.
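The test for a single fold can be sketched as follows. This is an illustration under assumptions: the 'cars' data and the split are arbitrary, and σ is taken to be the residual standard error of the training fit, which may differ from how the package estimates it.

```r
# Sketch of the KS test for one fold (rows 1 to 10 as test set)
train <- cars[-(1:10), ]
test <- cars[1:10, ]
fit <- lm(dist ~ speed, data = train)

mu <- predict(fit, newdata = test)      # expectation values for the test set
sigma <- summary(fit)$sigma             # assumption: residual standard error of the training fit
z <- (test$dist - mu) / sigma           # standardized residuals

ks <- ks.test(z, "pnorm")               # compare with the standard normal distribution
ks$statistic                            # KS distance
ks$p.value                              # corresponding p-value
```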
The number of available CPU cores is detected with detectCores.
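A minimal sketch of combining the detected number of cores with a user-supplied maximum is given below. The variable names and the fallback to one core are assumptions for the example; how cv.lm itself applies max_cores is not shown here.

```r
library(parallel)

available <- detectCores()   # can return NA when the number of cores is unknown
max_cores <- 2               # hypothetical user-supplied maximum

# Use at most max_cores cores, and fall back to one core if detection fails
cores <- if (is.na(available)) 1L else min(available, max_cores)
```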
An object of class 'cvlmvar', which is a list with the following items:
MAE: a list with two items
    mean: the sample mean of the absolute prediction error over the k folds
    sd: the sample standard deviation of the absolute prediction error over the k folds
MSE: a list with two items
    mean: the sample mean of the mean squared prediction error over the k folds
    sd: the sample standard deviation of the mean squared prediction error over the k folds
MSE_sqrt: a list with two items
    mean: the sample mean of the root mean squared prediction error over the k folds
    sd: the sample standard deviation of the root mean squared prediction error over the k folds
KS_distance: a list with two items
    mean: the sample mean of the Kolmogorov-Smirnov distance over the k folds
    sd: the sample standard deviation of the Kolmogorov-Smirnov distance over the k folds
KS_p.value: a list with two items
    mean: the sample mean of the p-value of the Kolmogorov-Smirnov distance over the k folds
    sd: the sample standard deviation of the p-value of the Kolmogorov-Smirnov distance over the k folds
fun: a list with two items
    mean: the sample mean of the user-specified function fun over the k folds
    sd: the sample standard deviation of the user-specified function fun over the k folds
The items KS_distance and KS_p.value are added only in case ks_test = TRUE. The item
fun is added only in case a function fun has been specified.
cv.lmvar is the equivalent function for an object of class 'lmvar'. It is supplied in
case one wants to compare an 'lmvar' fit with an 'lm' fit.
print.cvlmvar provides a print-method for an object of class 'cvlmvar'.
# Create an object of class 'lm'. We use a model matrix obtained from the 'cats' dataframe,
# an arbitrary parameter vector beta and a generated response vector y for the purpose of the
# example.
library(MASS)
X = model.matrix(~ Sex + Bwt, cats)
beta_mu = c(-0.1, 0.3, 4)
mu = X %*% beta_mu
y = rnorm( nrow(X), mean = mu, sd = 0.5)
fit = lm(y ~ ., as.data.frame(X[,-1]), x = TRUE, y = TRUE)
# Carry out a cross-validation
cv.lm(fit)
# Carry out a cross-validation using a single CPU-core
cv.lm(fit, max_cores = 1)
# Carry out a cross-validation including a Kolmogorov-Smirnov test, using at most two CPU-cores
cv.lm(fit, ks_test = TRUE, max_cores = 2)
# Carry out a cross-validation with 5 folds and control the random numbers used
cv.lm(fit, k = 5, seed = 5483, max_cores = 1)
# Calculate cross-validation results for the fourth moment of the residuals, using a
# user-specified function
fourth = function(object, y, X){
  mu = predict(object, as.data.frame(X))
  residuals = y - mu
  return(mean(residuals^4))
}
cv.lm(fit, fun = fourth)
rm(fourth)
# Use option 'log = TRUE' if you fit the log of the response vector and require error estimates for
# the response vector itself
fit = lm(log(y) ~ ., as.data.frame(X[,-1]), x = TRUE, y = TRUE)
cv = cv.lm(fit, log = TRUE)
# Print 'cv' using the print-method print.cvlmvar
cv
# Print 'cv' with a specified number of digits
print(cv, digits = 2)