cv.sprinter: Running sprinter with cross-validation


View source: R/cv.sprinter.R

Description

The main cross-validation function to select the best sprinter fit for a path of tuning parameters.

Usage

cv.sprinter(
  x,
  y,
  square = FALSE,
  num_keep = NULL,
  lambda1 = NULL,
  lambda3 = NULL,
  cv_step1 = FALSE,
  nlam1 = 10,
  nlam3 = 100,
  lam_min_ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
  nfold = 5,
  foldid = NULL,
  verbose = FALSE,
  ...
)

Arguments

x

An n by p design matrix of main effects. Each row is an observation of p main effects.

y

A response vector of size n.

square

Indicator of whether squared effects should be fitted in Step 1. Defaults to FALSE.

num_keep

A user-specified number of candidate interactions to keep in Step 2. If num_keep is not specified (the default), it is set to round(n / log(n)).
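
As a quick illustration of this default, the sketch below evaluates round(n / log(n)) for one sample size. It assumes the logarithm is the natural log, which the documentation does not state explicitly; the helper name default_num_keep is hypothetical and not part of the package.

```r
# Hypothetical helper mirroring the documented default round(n / log(n)).
# Using the natural logarithm is an assumption.
default_num_keep <- function(n) round(n / log(n))

# With n = 100 observations, 100 / log(100) is about 21.7, which rounds to 22.
default_num_keep(100)
```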

lambda1

Tuning parameter values for Step 1. lambda1 is a vector. Defaults to NULL, in which case the program computes its own lambda1 based on nlam1 and lam_min_ratio.

lambda3

Tuning parameter values for Step 3. lambda3 is a matrix whose k-th column holds the Step 3 tuning parameter values corresponding to lambda1[k] in Step 1. Defaults to NULL, in which case the program computes its own lambda3 based on nlam3 and lam_min_ratio.

cv_step1

Indicator of whether cross-validation of lambda1 should be carried out in Step 1 before subsequent steps. Default is FALSE.

nlam1

The number of values in lambda1. Defaults to 10.

nlam3

The number of values in each column of lambda3. Defaults to 100.

lam_min_ratio

The ratio of the smallest to the largest value in lambda1 and in each column of lambda3. The largest value is usually the smallest value for which all coefficients are set to zero. Defaults to 1e-2 when n < p and to 1e-4 otherwise.
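
To make the role of this ratio concrete, the sketch below builds a decreasing path of tuning parameters from a largest value down to that value times lam_min_ratio. The log-scale spacing, the helper name make_lambda_path, and the value lam_max = 2 are assumptions for illustration; the package may construct its paths differently.

```r
# Sketch: nlam values, equally spaced on the log scale (an assumption),
# running from lam_max down to lam_max * lam_min_ratio.
make_lambda_path <- function(lam_max, nlam, lam_min_ratio) {
  exp(seq(log(lam_max), log(lam_max * lam_min_ratio), length.out = nlam))
}

# lam_max = 2 is an arbitrary illustrative value.
lam <- make_lambda_path(lam_max = 2, nlam = 10, lam_min_ratio = 0.01)

# The smallest-to-largest ratio recovers lam_min_ratio
# (up to floating-point error).
min(lam) / max(lam)
```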

nfold

Number of folds in cross-validation. Defaults to 5. If each fold gets too few observations, a warning is thrown and the minimal nfold = 3 is used.

foldid

A vector of length n indicating which fold each observation belongs to. Defaults to NULL, in which case the program generates a random fold assignment.
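
A user-supplied foldid can be useful for reproducibility or for comparing methods on identical folds. The following is a minimal sketch of one common way to build a balanced random assignment in base R; it is not necessarily how the package generates its own folds.

```r
set.seed(1)
n <- 100
nfold <- 5

# Repeat the fold labels 1..nfold until there are n of them, then shuffle,
# so that fold sizes differ by at most one.
foldid <- sample(rep(seq_len(nfold), length.out = n))

# With n = 100 and nfold = 5, every fold holds 20 observations.
table(foldid)
```

The resulting vector could then be supplied through the foldid argument, for example cv.sprinter(x = x, y = y, foldid = foldid).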

verbose

If TRUE, a progress bar shows the progress of the fitting.

...

Other arguments to be passed to the glmnet calls, such as alpha or penalty.factor.

Value

An object of S3 class "sprinter".

n

The sample size.

p

The number of main effects.

square

The square parameter passed into sprinter.

a0_step3

Estimate of intercept corresponding to the CV-selected model.

compact

A compact representation of the selected variables. compact has three columns, with the first two columns representing the indices of a selected variable (main effects with first index = 0), and the last column representing the estimate of coefficients.
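
The sketch below shows how a matrix in this three-column layout can be split into main effects and interactions. The matrix is hypothetical and made up for illustration (it is not real cv.sprinter output), and the variable names are assumptions.

```r
# Hypothetical 'compact' matrix: index 1, index 2, coefficient estimate.
compact <- rbind(
  c(0, 1,  1.0),  # main effect of variable 1
  c(0, 2, -2.0),  # main effect of variable 2
  c(1, 3,  3.0),  # interaction between variables 1 and 3
  c(4, 5, -4.0)   # interaction between variables 4 and 5
)

# Main effects are the rows whose first index is 0.
is_main <- compact[, 1] == 0
main_effects <- compact[is_main, , drop = FALSE]
interactions <- compact[!is_main, , drop = FALSE]
```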

fit

The whole glmnet fit object.

fitted

Fitted values of the response corresponding to the CV-selected model.

num_keep

The value of num_keep.

cvm

The averaged estimated prediction error on the test sets over K folds.

cvse

The standard error of the estimated prediction error on the test sets over K folds.
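
For intuition, the sketch below computes a mean and a standard error across folds using the usual definitions (mean over folds, and standard deviation over folds divided by sqrt(K)). The per-fold errors are simulated, the name fold_err is hypothetical, and the package's exact computation may differ.

```r
set.seed(1)
K <- 5

# Hypothetical test-set errors: K folds (rows) by 3 tuning values (columns).
fold_err <- matrix(runif(K * 3), nrow = K)

# Average error over the K folds, one value per tuning parameter.
cvm <- colMeans(fold_err)

# Standard error of that average across folds.
cvse <- apply(fold_err, 2, sd) / sqrt(K)
```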

foldid

Fold assignment. A vector of length n.

i_lambda1_best

The index in lambda1 that is chosen by CV by minimizing cvm.

i_lambda3_best

The index in lambda3 that is chosen by CV by minimizing cvm.

lambda1_best

The value of lambda1 that is chosen by CV by minimizing cvm.

lambda3_best

The value of lambda3 that is chosen by CV by minimizing cvm.
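
Selection by minimizing cvm can be sketched as follows. The example assumes cvm is arranged as a matrix with one row per lambda3 value and one column per lambda1 value, mirroring the layout of lambda3; this layout and the numbers are assumptions for illustration, not taken from the package.

```r
# Hypothetical CV error matrix: rows index lambda3, columns index lambda1.
# matrix() fills column by column.
cvm <- matrix(c(3.0, 2.5, 2.8,
                2.9, 2.2, 1.7), nrow = 3)

# Row and column position of the smallest CV error (first one if tied).
best <- which(cvm == min(cvm), arr.ind = TRUE)[1, ]
i_lambda3_best <- best[["row"]]
i_lambda1_best <- best[["col"]]
```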

call

Function call.

See Also

predict.cv.sprinter

Examples

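library(sprintr)
set.seed(123)  # make the simulated data reproducible
n <- 100
p <- 100
x <- matrix(rnorm(n * p), n, p)
# response with two main effects (x1, x2) and two interactions (x1:x3, x4:x5)
y <- x[, 1] - 2 * x[, 2] + 3 * x[, 1] * x[, 3] - 4 * x[, 4] * x[, 5] + rnorm(n)
mod <- cv.sprinter(x = x, y = y)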

hugogogo/sprintr documentation built on Dec. 14, 2021, 6:07 p.m.