cvLM-package — R Documentation
This package provides efficient implementations of cross-validation techniques for linear and ridge regression models, leveraging C++ code via the Rcpp, RcppParallel, and Eigen libraries. It supports leave-one-out, generalized, and K-fold cross-validation, using Eigen matrices for high performance.
cvLM(object, ...)

## S3 method for class 'formula'
cvLM(object, data, subset, na.action, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)

## S3 method for class 'lm'
cvLM(object, data, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)

## S3 method for class 'glm'
cvLM(object, data, K.vals = 10L, lambda = 0,
     generalized = FALSE, seed = 1L, n.threads = 1L, ...)
object
a formula (for the formula method) or a fitted model object of class lm or glm.

data
a data.frame containing the variables in the model.

subset
an optional vector specifying a subset of observations to be used in the fitting process.

na.action
a function that indicates how to handle NAs.

K.vals
an integer vector specifying the number of folds for cross-validation.

lambda
a non-negative numeric scalar specifying the regularization parameter for ridge regression.

generalized
a logical value indicating whether to compute generalized or ordinary cross-validation. Defaults to FALSE (ordinary cross-validation).

seed
a single integer value specifying the seed for random number generation.

n.threads
a single positive integer value specifying the number of threads for parallel computation.

...
additional arguments. Currently, these do not affect the function's behavior.
The cvLM function is a generic function that dispatches to specific methods based on the class of the object argument.

The cvLM.formula method performs cross-validation for linear and ridge regression models specified using a formula interface.

The cvLM.lm method performs cross-validation for linear regression models.

The cvLM.glm method performs cross-validation for generalized linear models. It currently supports only the gaussian family with the identity link function (i.e., models equivalent to ordinary linear regression).
The cross-validation process involves splitting the data into K folds, fitting the model on K-1 folds, and evaluating its performance on the remaining fold. This process is repeated K times, each time with a different fold held out for testing.
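For illustration, the K-fold scheme described above can be sketched in Python with NumPy. This is a simplified stand-in for the package's C++/Eigen implementation: the single-shuffle fold assignment and squared-error loss here are assumptions for the sketch, not a transcription of the package's exact procedure.

```python
import numpy as np

def kfold_cv_mse(X, y, K, seed=1):
    """Estimate mean squared prediction error by K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)             # shuffle once, then split into K folds
    folds = np.array_split(idx, K)
    sse = 0.0
    for test in folds:
        train = np.setdiff1d(idx, test)  # the other K-1 folds
        # Fit ordinary least squares on the training folds
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        sse += resid @ resid             # accumulate held-out squared error
    return sse / n

# Data that follow an exact linear relationship give (near-)zero CV error
X = np.column_stack([np.ones(20), np.arange(20.0)])
y = 2.0 + 3.0 * np.arange(20.0)
```

Setting K equal to the number of observations reproduces leave-one-out cross-validation by brute force, which is what the closed-form shortcut below avoids.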
The cvLM methods use closed-form solutions for leave-one-out and generalized cross-validation, and handle the K-fold cross-validation process efficiently, optionally using multithreading for faster computation on larger data sets.
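The closed-form solutions referred to above are standard textbook identities: with ridge hat matrix H = X(X'X + lambda*I)^{-1}X', leave-one-out CV is the mean of (r_i / (1 - h_ii))^2 over the in-sample residuals r_i and leverages h_ii, and generalized CV replaces each h_ii with their average tr(H)/n. A NumPy sketch of these formulas (assumed from the literature, not taken from the package's C++ source):

```python
import numpy as np

def loocv_gcv(X, y, lam=0.0):
    """Closed-form leave-one-out CV and generalized CV for ridge regression."""
    n, p = X.shape
    # Hat matrix H = X (X'X + lam*I)^{-1} X'
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y                                # in-sample residuals
    h = np.diag(H)                                   # leverages h_ii
    loocv = np.mean((resid / (1.0 - h)) ** 2)        # leave-one-out CV
    gcv = np.mean((resid / (1.0 - h.mean())) ** 2)   # generalized CV
    return loocv, gcv
```

With lam = 0 this reduces to ordinary least squares, and the leave-one-out value matches the brute-force computation that refits the model n times, which is why no refitting loop is needed.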
A data.frame with columns giving the number of folds, the cross-validation result, and the seed used for the computation.
Philip Nye, phipnye@proton.me
Bates D, Eddelbuettel D (2013). "Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package." Journal of Statistical Software, 52(5), 1-24. doi:10.18637/jss.v052.i05.

Aggarwal, C. C. (2020). Linear Algebra and Optimization for Machine Learning: A Textbook. Springer Cham. doi:10.1007/978-3-030-40344-7.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer New York, NY. doi:10.1007/978-0-387-84858-7.
formula, lm, glm
data(mtcars)
n <- nrow(mtcars)

# Formula method
cvLM(
  mpg ~ .,
  data = mtcars,
  K.vals = n,  # Leave-one-out CV
  lambda = 10  # Shrinkage parameter of 10
)

# lm method
my.lm <- lm(mpg ~ ., data = mtcars)
cvLM(
  my.lm,
  data = mtcars,
  K.vals = c(5L, 8L),  # Perform both 5- and 8-fold CV
  n.threads = 8L,      # Allow up to 8 threads for computation
  seed = 1234L
)

# glm method
my.glm <- glm(mpg ~ ., data = mtcars)
cvLM(
  my.glm,
  data = mtcars,
  K.vals = n, generalized = TRUE  # Use generalized CV
)