cv.lmvar: Cross-validation for an object of class 'lmvar'


Description

k-fold cross-validation for an object of class 'lmvar'

Usage

cv.lmvar(object, k = 10, ks_test = FALSE, fun = NULL, log = FALSE,
  seed = NULL, sigma_min = NULL, exclude = NULL,
  slvr_options = list(), max_cores = NULL, ...)

Arguments

object

Object of class 'lmvar'

k

Integer, number of folds

ks_test

Boolean, if TRUE, a Kolmogorov-Smirnov test is carried out. See details.

fun

User-specified function for which cross-validation results are to be obtained. See details.

log

Boolean, specifies whether object contains a fit to the response vector Y or to its logarithm log Y

seed

Integer, seed for the random number generator. The seed is not set when seed equals NULL.

sigma_min

Minimum value for the standard deviations. Can be a single number, which applies to all observations, or a vector giving a minimum per observation. For the default value NULL, the minimum is the same as the value used in object.

exclude

Numeric vector with observations that must be excluded from the error statistics. The default NULL means no observations are excluded. See 'Details' for more information.

slvr_options

List of options passed on to the function maxLik which carries out the fits for the k folds. See 'Details' for more information.

max_cores

Integer, maximum number of CPU-cores that can be used. For the default value NULL, the number is set to the number of available cores minus one.

...

Other parameters, not used in the current implementation.

Details

Cross-validations

The function cv.lmvar carries out a k-fold cross-validation for an 'lmvar' model. For each fold, an 'lmvar' model is fit to all observations that are not in the fold (the 'training set') and prediction errors are calculated for the observations in the fold (the 'test set'). The prediction errors are the absolute error |y - μ| and its square (y - μ)^2. The average prediction errors over the observations in the fold are calculated, and the square root of the average of the squared errors is taken. Optionally, one can calculate a user-specified function fun for the test set and the 'lmvar' model resulting from the training set. Optionally, one can also calculate the Kolmogorov-Smirnov (KS) distance for the test set and its p-value.
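The per-fold error statistics described above can be sketched in a few lines of base R. This is an illustration of the definitions only, with made-up numbers, not the package's internal code:

```r
# Illustration of the per-fold error statistics; not the package's internal code.
y  <- c(1.0, 2.5, 3.0, 4.2)    # response values in the test fold (made up)
mu <- c(1.1, 2.3, 3.4, 4.0)    # expectation values predicted from the training fit

mae      <- mean(abs(y - mu))  # average absolute prediction error
mse      <- mean((y - mu)^2)   # average squared prediction error
mse_sqrt <- sqrt(mse)          # square root of the average squared error
```

cv.lmvar then averages these statistics over the k folds and reports a standard deviation computed from the k per-fold values.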

The results for the k folds are averaged over the folds and standard deviations are calculated from the k results.

User defined function

The argument fun allows a user to specify a function for which cross-validation results must be obtained. This function must meet the following requirements.

During a k-fold cross-validation, the function is called k times as fun(object_t, y, X_mu, X_sigma), with object_t the 'lmvar' fit to the training set, y the response vector of the test set, and X_mu and X_sigma the design matrices of the test set. It must return a single number, so that a mean and a standard deviation can be calculated over the k folds.

If an evaluation of fun results in an error, cv.lmvar gives a warning and excludes that evaluation from the mean and the standard deviation of fun over the k folds. Warnings raised by fun itself are ignored.

In the cross-validations, object_t contains the design matrices of the training set as object_t$X_mu and object_t$X_sigma. object_t$X_mu was formed by taking object$X_mu and removing the fold-rows. In addition, columns may have been removed to make the matrix full-rank. Therefore, object_t$X_mu may have fewer columns than object$X_mu. The same is true for object_t$X_sigma compared to object$X_sigma.

Kolmogorov-Smirnov test

When ks_test = TRUE, a Kolmogorov-Smirnov (KS) test is carried out for each fold. The test checks whether the standardized residuals (y - μ) / σ in a fold are distributed as a standard normal distribution. The KS-distance and the corresponding p-value are calculated for each fold. The test uses the function ks.test. The expectation values μ and standard deviations σ are calculated from the model matrices for the test set (the fold) and the 'lmvar' fit to the training set.
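The KS test for a single fold can be mimicked in base R as below. This is a sketch with synthetic data in which μ and σ are simply known; in cv.lmvar they are calculated from the 'lmvar' fit to the training set:

```r
# Sketch of the KS test for one fold; mu and sigma would come from the
# 'lmvar' fit to the training set, here they are simply known.
set.seed(1)
y     <- rnorm(50, mean = 2, sd = 3)  # stand-in response values for the fold
mu    <- rep(2, 50)                   # expectation values
sigma <- rep(3, 50)                   # standard deviations

z  <- (y - mu) / sigma                # standardized residuals
ks <- ks.test(z, "pnorm")             # compare with the standard normal
ks$statistic                          # the KS distance for this fold
ks$p.value                            # the corresponding p-value
```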

Excluding observations

The observations specified in the argument exclude are not used to calculate the error statistics MAE (mean absolute error), MSE (mean squared error) and the square root of MSE. They are also not used to calculate the statistics for the user-defined function fun. This is useful when a few observations have residuals so large that they dominate the error estimates. Note that the excluded observations are not excluded from the training sets; they are excluded only from the calculation of the test-set statistics. They are also not excluded from the KS test: observations with large residuals should have large standard deviations as well, so that their standardized residuals take on ordinary values.

Minimum sigma

The argument sigma_min gives the option to enforce a minimum standard deviation. This is useful when, in a cross-validation, a fit fails because the maximum likelihood occurs when the standard deviation of one or more observations becomes zero. When a minimum standard deviation is specified, all fits are carried out under the boundary condition that the standard deviation is larger than the minimum. If sigma_min = NULL the same value is used as was used to create object.
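Numerically, a standard-deviation floor acts as in the sketch below. Note that cv.lmvar enforces the minimum as a boundary condition during the fit itself; the truncation shown here only illustrates the effect of the floor:

```r
# Illustration of a standard-deviation floor; cv.lmvar imposes the minimum
# as a constraint during the fit, not as an after-the-fact truncation.
sigma_fit <- c(0.8, 0.001, 1.2)          # fitted values, one of them nearly zero
sigma_min <- 0.05
sigma     <- pmax(sigma_fit, sigma_min)  # every value is at least sigma_min
```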

Other

The fits are carried out with the options slvr_options stored in the 'lmvar' object object. However, these options can be overwritten with an explicit argument slvr_options in the call of cv.lmvar. Some of the options are affected by a sigma_min larger than zero, see lmvar for details.

The argument slvr_options is a list, members of which can be a list themselves. If members of a sublist are overwritten, the other members of the sublist remain unchanged. E.g., the argument slvr_options = list(control = list(iterlim = 600)) will set control$iterlim to 600 while leaving other members of the list control unchanged.
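The merging behaviour for nested option lists can be reproduced with base R's modifyList. This is a sketch of the semantics described above, not necessarily the package's implementation:

```r
# Sketch of the nested-list merge; modifyList recurses into sublists, so
# overriding control$iterlim leaves the other members of 'control' unchanged.
stored   <- list(method = "NR", control = list(iterlim = 200, tol = 1e-8))
override <- list(control = list(iterlim = 600))

merged <- modifyList(stored, override)
merged$control$iterlim  # 600, taken from the override
merged$control$tol      # 1e-8, unchanged
merged$method           # "NR", unchanged
```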

The number of available CPU cores is detected with detectCores.
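The default core count can be sketched as below; the lower bound of one core and the na.rm guard (detectCores can return NA on some platforms) are assumptions added for robustness, not taken from the package:

```r
# Sketch of the default for max_cores: all available cores minus one.
# The lower bound of 1 and na.rm = TRUE are defensive additions, since
# detectCores() can return NA on platforms where detection fails.
library(parallel)
n_cores <- max(detectCores() - 1, 1, na.rm = TRUE)
```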

Value

In case none of the fits in the cross-validations returns an error or a warning, a 'cvlmvar' object is returned. This is a list with the following items:

The items KS_distance and KS_p.value are added only in case ks_test = TRUE.

In case a fit returns an error or a warning, the return value of cv.lmvar lists the arguments of the first call to lmvar which failed. In addition, it lists the row numbers of the observations in object that formed the training set for which the fit returned an error or warning. These items are returned as a list:

See Also

See lmvar for the options slvr_options stored in an 'lmvar' object.

cv.lm is the equivalent function for an object of class 'lm'. It is supplied in case one wants to compare an 'lmvar' fit with an 'lm' fit.

print.cvlmvar provides a print-method for an object of class 'cvlmvar'.

Examples

# Create an object of class 'lmvar'. We use a model matrix obtained from the 'cats' dataframe,
# arbitrary parameter vectors beta and a generated response vector y for the purpose of the
# example.
library(MASS)

X = model.matrix(~ Sex + Bwt, cats)
beta_mu = c(-0.1, 0.3, 4)
beta_sigma = c(-0.5, -0.1, 0.3)

mu = X %*% beta_mu
log_sigma = X %*% beta_sigma

y = rnorm( nrow(X), mean = mu, sd = exp(log_sigma))

fit = lmvar(y, X_mu = X[,-1], X_sigma = X[,-1])

# Carry out a cross-validation
cv.lmvar(fit)

# Carry out a cross-validation using a single CPU-core
cv.lmvar(fit, max_cores = 1)

# Carry out a cross-validation including a Kolmogorov-Smirnov test, using at most two CPU-cores
cv.lmvar(fit, ks_test = TRUE, max_cores = 2)

# Carry out a cross-validation with 5 folds and control the random numbers used
cv.lmvar(fit, k = 5, seed = 5483, max_cores = 1)

# Carry out a cross-validation and exclude observations 5, 11 and 20 from the calculation of
# the error statistics
cv.lmvar(fit, exclude = c(5, 11, 20), max_cores = 1)

# Calculate cross-validation results for the fourth moment of the residuals, using a
# user-specified function
fourth = function(object, y, X_mu, X_sigma){
  mu = predict(object, X_mu[,-1], X_sigma[,-1], sigma = FALSE)
  residuals = y - mu
  return(mean(residuals^4))
}
cv.lmvar(fit, fun = fourth)
rm(fourth)

# Carry out a cross-validation and specify the maximization routine and maximum number of iterations
cv.lmvar(fit, slvr_options = list( method = "NR", control = list(iterlim = 500)))

# Use option 'log = TRUE' if you fit the log of the response vector and require error estimates for
# the response vector itself
fit = lmvar(log(y), X_mu = X[,-1], X_sigma = X[,-1])
cv = cv.lmvar(fit, log = TRUE)

# Print 'cv' using the print-method print.cvlmvar
cv

# Print 'cv' with a specified number of digits
print(cv, digits = 2)

lmvar documentation built on May 16, 2019, 5:06 p.m.