cvLM-package: Cross-validation for linear and ridge regression models

cvLM-package    R Documentation

Cross-validation for linear and ridge regression models

Description

This package provides efficient implementations of cross-validation for linear and ridge regression models. The computations are written in C++ using Rcpp, RcppParallel, and Eigen. Leave-one-out, generalized, and K-fold cross-validation are supported.

Usage

cvLM(object, ...)
## S3 method for class 'formula'
cvLM(object, data, subset, na.action, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)
## S3 method for class 'lm'
cvLM(object, data, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)
## S3 method for class 'glm'
cvLM(object, data, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)

Arguments

object

a formula, a linear model (lm), or a generalized linear model (glm) object.

data

a data.frame containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process. See model.frame for more details.

na.action

a function that indicates how to handle NA values in the data. See model.frame for more details.

K.vals

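an integer vector specifying the number of folds for cross-validation. Each element requests a separate cross-validation run; a value equal to the number of observations requests leave-one-out cross-validation.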

lambda

a non-negative numeric scalar specifying the regularization parameter for ridge regression.

generalized

a logical value indicating whether to compute generalized cross-validation (TRUE) or ordinary cross-validation (FALSE, the default).

seed

a single integer value specifying the seed for random number generation.

n.threads

a single positive integer value specifying the number of threads for parallel computation.

...

additional arguments. Currently, these do not affect the function's behavior.

Details

The cvLM function is a generic function that dispatches to specific methods based on the class of the object argument.
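The dispatch mechanism described above is standard S3 behavior in R. A minimal sketch follows; describe_model is a hypothetical generic invented for illustration and is not part of cvLM.

```r
# Hypothetical generic used only to illustrate S3 dispatch; not part of cvLM.
describe_model <- function(object, ...) UseMethod("describe_model")

# One method per class, mirroring the three cvLM methods listed under Usage.
describe_model.formula <- function(object, ...) "formula method"
describe_model.lm      <- function(object, ...) "lm method"
describe_model.glm     <- function(object, ...) "glm method"

fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(mpg ~ wt, data = mtcars)

describe_model(mpg ~ wt)  # dispatches on class "formula"
describe_model(fit_lm)    # dispatches on class "lm"
describe_model(fit_glm)   # class is c("glm", "lm"), so the glm method is used
```

Because a glm object also inherits from lm, the glm method is tried first, which is why a separate glm method can apply its own checks.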

The cvLM.formula method performs cross-validation for linear and ridge regression models specified using a formula interface.

The cvLM.lm method performs cross-validation for linear regression models.

The cvLM.glm method performs cross-validation for generalized linear models. It currently supports only the gaussian family with the identity link function.

The cross-validation process involves splitting the data into K folds, fitting the model on K-1 folds, and evaluating its performance on the remaining fold. This process is repeated K times, each time with a different fold held out for testing.
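The K-fold procedure described above can be sketched in plain R. This is an illustration only: kfold_mse is a hypothetical helper, and the fold assignment and the averaging of squared errors over all observations are assumptions that may differ from what cvLM does internally.

```r
# Hypothetical helper, not part of cvLM: K-fold CV mean squared error for lm().
kfold_mse <- function(formula, data, K, seed = 1L) {
  set.seed(seed)
  n <- nrow(data)
  # Randomly assign each row to one of K folds of near-equal size
  fold <- sample(rep_len(seq_len(K), n))
  response <- all.vars(formula)[1L]
  sq_err <- numeric(n)
  for (k in seq_len(K)) {
    test <- fold == k
    # Fit on the K-1 training folds, predict the held-out fold
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sq_err[test] <- (data[[response]][test] - pred)^2
  }
  # Assumption: average squared error over all held-out observations
  mean(sq_err)
}

kfold_mse(mpg ~ wt + hp, data = mtcars, K = 5L)
```

Setting K to nrow(data) turns this into leave-one-out cross-validation, at the cost of refitting the model once per observation.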

The cvLM methods use closed-form solutions for leave-one-out and generalized cross-validation, so these cases do not require refitting the model for each held-out observation. K-fold cross-validation can optionally use multiple threads, which speeds up computation on larger data sets.
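The closed-form shortcuts can be illustrated with the standard hat-matrix identities (see Hastie et al. under References). The sketch below is not the package's code: closed_form_cv is a hypothetical helper, and it assumes the ridge penalty lambda applies to every column of the design matrix, including the intercept, which may not match how cvLM treats the intercept.

```r
# Hypothetical helper, not part of cvLM.
# Assumption: lambda penalizes all columns of X, including the intercept.
closed_form_cv <- function(X, y, lambda = 0) {
  n <- nrow(X)
  p <- ncol(X)
  # Hat matrix H = X (X'X + lambda I)^{-1} X'; lambda = 0 gives ordinary least squares
  H <- X %*% solve(crossprod(X) + lambda * diag(p), t(X))
  resid <- drop(y - H %*% y)
  h <- diag(H)
  c(
    # Leave-one-out: each residual is inflated by its own leverage
    loocv = mean((resid / (1 - h))^2),
    # Generalized CV: every leverage is replaced by the average, trace(H) / n
    gcv = mean(resid^2) / (1 - sum(h) / n)^2
  )
}

X <- model.matrix(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
closed_form_cv(X, y)               # linear regression
closed_form_cv(X, y, lambda = 10)  # ridge regression
```

Both quantities come from a single fit, which is why leave-one-out and generalized cross-validation are much cheaper than refitting the model n times.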

Value

A data.frame consisting of columns representing the number of folds, the cross-validation result, and the seed used for the computation.

Author(s)

Philip Nye, phipnye@proton.me

References

Bates, D., & Eddelbuettel, D. (2013). Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package. Journal of Statistical Software, 52(5), 1-24. doi:10.18637/jss.v052.i05.

Aggarwal, C. C. (2020). Linear Algebra and Optimization for Machine Learning: A Textbook. Springer Cham. doi:10.1007/978-3-030-40344-7.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer New York, NY. doi:10.1007/978-0-387-84858-7.

See Also

formula, lm, glm

Examples

data(mtcars)
n <- nrow(mtcars)

# Formula method
cvLM(
  mpg ~ .,
  data = mtcars,
  K.vals = n,    # Leave-one-out CV
  lambda = 10    # Shrinkage parameter of 10
)

# lm method
my.lm <- lm(mpg ~ ., data = mtcars)
cvLM(
  my.lm,
  data = mtcars,
  K.vals = c(5L, 8L), # Perform both 5- and 8-fold CV
  n.threads = 8L,     # Allow up to 8 threads for computation
  seed = 1234L
)

# glm method
my.glm <- glm(mpg ~ ., data = mtcars)
cvLM(
  my.glm,
  data = mtcars,
  K.vals = n, generalized = TRUE # Use generalized CV
)

cvLM documentation built on Sept. 11, 2024, 5:28 p.m.