varsel: Variable selection for generalized linear models


View source: R/varsel.R

Description

Perform projection predictive variable selection for generalized linear models and for generalized linear and additive multilevel models, using generic reference models.

Usage

varsel(object, ...)

## Default S3 method:
varsel(object, ...)

## S3 method for class 'refmodel'
varsel(
  object,
  d_test = NULL,
  method = NULL,
  ndraws = NULL,
  nclusters = NULL,
  ndraws_pred = NULL,
  nclusters_pred = NULL,
  cv_search = TRUE,
  nterms_max = NULL,
  intercept = TRUE,
  verbose = TRUE,
  lambda_min_ratio = 1e-05,
  nlambda = 150,
  thresh = 1e-06,
  regul = 1e-04,
  penalty = NULL,
  search_terms = NULL,
  ...
)

Arguments

object

Either a refmodel-type object created by get_refmodel or init_refmodel, an object which can be converted to a reference model using get_refmodel, or a vsel object resulting from varsel or cv_varsel.

...

Additional arguments to be passed to the get_refmodel function.

d_test

A test dataset, which is used to evaluate model performance. If not provided, training data is used. Currently this argument is for internal use only.

method

The method used in the variable selection. Possible options are 'L1' for L1-search and 'forward' for forward selection. Default is 'forward' if the number of variables in the full data is at most 20, and 'L1' otherwise.

ndraws

Number of posterior draws used in the variable selection. Cannot be larger than the number of draws in the reference model. Ignored if nclusters is set.

nclusters

Number of clusters to use in the clustered projection. Overrides the ndraws argument. Defaults to 1.

ndraws_pred

Number of projected draws used for prediction (after selection). Ignored if nclusters_pred is given. Note that using fewer draws or clusters than there are posterior draws in the reference model may make the estimated projection performance slightly inaccurate, whereas increasing this argument increases the computation time linearly.

nclusters_pred

Number of clusters used for prediction (after selection). Default is 5.

cv_search

If TRUE, then the projected coefficients after L1-selection are computed without any penalization (or using only the regularization determined by regul). If FALSE, then the coefficients are the solution from the L1-penalized projection. This option is relevant only if method='L1'. Default is TRUE for genuine reference models and FALSE if object is datafit (see init_refmodel).

nterms_max

Maximum number of variables until which the selection is continued. Defaults to min(20, D, floor(0.4*n)), where n is the number of observations and D is the number of variables.

intercept

Whether to include an intercept in the submodels. Defaults to TRUE.

verbose

If TRUE, may print out some information during the selection. Defaults to TRUE, as shown in the usage above.

lambda_min_ratio

Ratio between the smallest and largest lambda in the L1-penalized search. This parameter essentially determines how long the search is carried out, i.e., how large the explored submodels are. There is no need to change the default value unless the program gives a warning about this.

nlambda

Number of values in the lambda grid for the L1-penalized search. There is no need to change this unless the program gives a warning about it.

thresh

Convergence threshold when computing the L1-path. Usually there is no need to change this.

regul

Amount of regularization in the projection. Usually there is no need for regularization, but for some models the projection can be ill-behaved, and a small amount of regularization is then needed to avoid numerical problems.

penalty

Vector determining the relative penalties or costs for the variables. Zero means that those variables have no cost and will therefore be selected first, whereas Inf means those variables will never be selected. Currently works only if method == 'L1'. By default 1 for each variable.

search_terms

A custom list of terms to evaluate for variable selection. By default considers all the terms in the reference model's formula.
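
The default rules for method and nterms_max and the penalty vector described above can be sketched in base R. This is only an illustration under stated assumptions: default_method and default_nterms_max are hypothetical helpers that restate the documented defaults (they are not projpred functions), and the penalty entries are assumed to follow the order of the variables in the reference model.

```r
# Hypothetical helpers (not part of projpred) restating the documented defaults.

# Default search method: 'forward' for at most 20 variables, 'L1' otherwise.
default_method <- function(D) {
  if (D <= 20) "forward" else "L1"
}

# Default nterms_max: min(20, D, floor(0.4 * n)),
# with n observations and D variables.
default_nterms_max <- function(n, D) {
  min(20, D, floor(0.4 * n))
}

n <- 30  # number of observations
D <- 5   # number of variables

default_method(D)
default_nterms_max(n, D)

# Relative penalties: 1 for each variable by default.
penalty <- rep(1, D)
penalty[1] <- 0    # no cost: selected first
penalty[D] <- Inf  # infinite cost: never selected

# With a fitted reference model `fit` (not constructed here), the penalties
# would be passed together with the L1 search, the only method that uses them:
# vs <- varsel(fit, method = "L1", penalty = penalty)
```

With n = 30 and D = 5, the number of variables is the binding limit for nterms_max; with many variables and few observations, floor(0.4*n) takes over instead.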

Value

An object of type vsel that contains information about the feature selection. The fields are not meant to be accessed directly by the user but via the helper functions (see the vignettes, or type ?projpred to see the main functions in the package).

Examples

if (requireNamespace('rstanarm', quietly=TRUE)) {
  ### Usage with stanreg objects
  n <- 30
  d <- 5
  x <- matrix(rnorm(n*d), nrow=n)
  y <- x[,1] + 0.5*rnorm(n)
  data <- data.frame(x,y)
  fit <- rstanarm::stan_glm(y ~ X1 + X2 + X3 + X4 + X5, gaussian(), data=data,
    chains=2, iter=500)
  vs <- varsel(fit)
  plot(vs)
}

projpred documentation built on Oct. 28, 2020, 5:08 p.m.