summary.vsel: Summary of a 'varsel()' or 'cv_varsel()' run

summary.vsel (R Documentation)

Description

This is the summary() method for vsel objects (returned by varsel() or cv_varsel()). Apart from some general information about the varsel() or cv_varsel() run, it shows the full-data predictor ranking, basic information about the (CV) variability in the ranking of the predictors (if available; inferred from cv_proportions()), and estimates for user-specified predictive performance statistics. For a graphical representation, see plot.vsel().

Usage

## S3 method for class 'vsel'
summary(
  object,
  nterms_max = NULL,
  stats = "elpd",
  type = c("mean", "se", "diff", "diff.se"),
  deltas = FALSE,
  alpha = 2 * pnorm(-1),
  baseline = if (!inherits(object$refmodel, "datafit")) "ref" else "best",
  resp_oscale = TRUE,
  cumulate = FALSE,
  ...
)

Arguments

object

An object of class vsel (returned by varsel() or cv_varsel()).

nterms_max

Maximum submodel size (number of predictor terms) for which the performance statistics are calculated. Using NULL is effectively the same as length(ranking(object)[["fulldata"]]). Note that nterms_max does not count the intercept, so use nterms_max = 0 for the intercept-only model. For plot.vsel(), nterms_max must be at least 1.

stats

One or more character strings determining which performance statistics (i.e., utilities or losses) to estimate based on the observations in the evaluation (or "test") set. In case of cross-validation, these are all observations because they are partitioned into multiple test sets. In case of varsel() with d_test = NULL, these are again all observations because the test set is the same as the training set. Available statistics are:

  • "elpd": expected log (pointwise) predictive density (for a new dataset). Estimated by the sum of the observation-specific log predictive density values, where each of these predictive density values is a (possibly weighted) average across the parameter draws.

  • "mlpd": mean log predictive density, that is, "elpd" divided by the number of observations.

  • "mse": mean squared error (only available in the situations mentioned in section "Details" below).

  • "rmse": root mean squared error (only available in the situations mentioned in section "Details" below). For the corresponding standard error and lower and upper confidence interval bounds, bootstrapping is used.

  • "acc" (or its alias, "pctcorr"): classification accuracy (only available in the situations mentioned in section "Details" below).

  • "auc": area under the ROC curve (only available in the situations mentioned in section "Details" below). For the corresponding standard error and lower and upper confidence interval bounds, bootstrapping is used.
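The definitions of "elpd", "mlpd", "mse", and "rmse" above can be illustrated with a minimal base-R sketch. All inputs (`y`, `mu`, `sigma`, `w`) and the helper `log_wmean_exp()` are hypothetical and chosen for illustration only; this is not the code projpred uses internally.

```r
# Hypothetical setup: S parameter draws, N observations, Gaussian response.
set.seed(1)
S <- 4
N <- 3
y <- c(0.2, -1.0, 0.7)                       # observed responses
mu <- matrix(rnorm(S * N, sd = 0.5), S, N)   # draw-wise predicted means (S x N)
sigma <- 1                                   # residual SD (assumed known here)
w <- rep(1 / S, S)                           # draw weights (summing to 1)

# Log predictive density of each observation under each draw (S x N):
lpd_draws <- sapply(seq_len(N), function(i) {
  dnorm(y[i], mean = mu[, i], sd = sigma, log = TRUE)
})

# Log of the weighted average of the densities across draws, computed stably:
log_wmean_exp <- function(l, w) {
  m <- max(l)
  m + log(sum(w * exp(l - m)))
}
lpd_obs <- apply(lpd_draws, 2, log_wmean_exp, w = w)

elpd <- sum(lpd_obs)          # sum over observations
mlpd <- elpd / N              # "elpd" divided by the number of observations

mu_hat <- colSums(w * mu)     # weighted average prediction per observation
mse <- mean((y - mu_hat)^2)
rmse <- sqrt(mse)
```

With equal weights, `lpd_obs` coincides with `log(colMeans(exp(lpd_draws)))`; the log-sum-exp form merely avoids numerical underflow.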

type

One or more items from "mean", "se", "lower", "upper", "diff", and "diff.se" indicating which of these to compute for each item from stats (mean, standard error, lower and upper confidence interval bounds, mean difference to the corresponding statistic of the reference model, and standard error of this difference, respectively). The confidence interval bounds belong to normal-approximation (or bootstrap; see argument stats) confidence intervals with (nominal) coverage 1 - alpha. Items "diff" and "diff.se" are only supported if deltas is FALSE.

deltas

If TRUE, the submodel statistics are estimated as differences from the baseline model (see argument baseline). Here, a "difference from the baseline model" is the submodel statistic minus the baseline model statistic (not the other way round).
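The sign convention can be checked with a small base-R sketch. The pointwise log predictive density values below are made up, and the standard error formula (a normal approximation based on the pointwise differences) is an assumption for illustration, not necessarily what projpred computes internally.

```r
# Hypothetical pointwise log predictive densities (one value per observation):
lpd_sub  <- c(-1.30, -0.95, -1.10, -1.60, -0.80)  # submodel
lpd_base <- c(-1.20, -0.90, -1.15, -1.40, -0.85)  # baseline model
n <- length(lpd_sub)

# Submodel minus baseline (not the other way round):
d <- lpd_sub - lpd_base
elpd_diff <- sum(d)

# Assumed normal-approximation standard error of the summed differences:
elpd_diff_se <- sqrt(n) * sd(d)
```

A negative `elpd_diff` then indicates that the submodel performs worse than the baseline in terms of "elpd".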

alpha

A number determining the (nominal) coverage 1 - alpha of the normal-approximation (or bootstrap; see argument stats) confidence intervals. For example, in case of the normal approximation, alpha = 2 * pnorm(-1) corresponds to a confidence interval extending one standard error on either side of the point estimate.
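The relation between alpha and the width of a normal-approximation interval can be verified in base R. The point estimate `est` and standard error `se` below are hypothetical values.

```r
alpha <- 2 * pnorm(-1)

# Multiplier of the standard error in a two-sided normal-approximation interval:
z <- qnorm(1 - alpha / 2)   # equals 1 for the default alpha

# Nominal coverage implied by the default alpha (roughly 68%):
coverage <- 1 - alpha

# Hypothetical point estimate and standard error:
est <- -120.5
se <- 4.2
ci <- est + c(-1, 1) * z * se   # lower and upper bounds
```

For a conventional 95% interval, alpha = 0.05 would be used instead, giving a multiplier of about 1.96.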

baseline

For summary.vsel(): Only relevant if deltas is TRUE. For plot.vsel(): Always relevant. Either "ref" or "best", indicating whether the baseline is the reference model or the best submodel found (in terms of stats[1]), respectively.

resp_oscale

Only relevant for the latent projection. A single logical value indicating whether to calculate the performance statistics on the original response scale (TRUE) or on the latent scale (FALSE).

cumulate

Passed to argument cumulate of cv_proportions(). Affects column cv_proportions_diag of the summary table.

...

Arguments passed to the internal function which is used for bootstrapping (if applicable; see argument stats). Currently, relevant arguments are B (the number of bootstrap samples, defaulting to 2000) and seed (see set.seed(), but defaulting to NA so that set.seed() is not called within that function at all).
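The roles of B and seed can be illustrated with a generic bootstrap sketch for "rmse" in base R. The function `boot_rmse()` is hypothetical and not projpred's internal function; in particular, the use of percentile intervals is an assumption made here.

```r
# Hypothetical helper: bootstrap standard error and interval for the RMSE.
boot_rmse <- function(y, mu_hat, B = 2000, seed = NA, alpha = 2 * pnorm(-1)) {
  # As described for `seed`: only call set.seed() if a seed is supplied.
  if (!is.na(seed)) set.seed(seed)
  n <- length(y)
  sq_err <- (y - mu_hat)^2
  # Resample observations with replacement, B times:
  reps <- replicate(B, sqrt(mean(sq_err[sample.int(n, n, replace = TRUE)])))
  list(
    mean = sqrt(mean(sq_err)),
    se = sd(reps),
    lower = unname(quantile(reps, alpha / 2)),
    upper = unname(quantile(reps, 1 - alpha / 2))
  )
}

# Made-up responses and predictions:
y <- c(1.2, 0.4, -0.3, 2.1, 0.9, -1.0, 0.5, 1.7)
mu_hat <- c(1.0, 0.6, 0.1, 1.8, 1.1, -0.6, 0.2, 1.5)
res <- boot_rmse(y, mu_hat, B = 500, seed = 123)
```

Supplying the same seed reproduces the same bootstrap results; a larger B reduces the Monte Carlo error at the cost of run time.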

Details

The stats options "mse" and "rmse" are only available for:

  • the traditional projection,

  • the latent projection with resp_oscale = FALSE,

  • the latent projection with resp_oscale = TRUE in combination with <refmodel>$family$cats being NULL.

The stats option "acc" (= "pctcorr") is only available for:

  • the binomial() family in case of the traditional projection,

  • all families in case of the augmented-data projection,

  • the binomial() family (on the original response scale) in case of the latent projection with resp_oscale = TRUE in combination with <refmodel>$family$cats being NULL,

  • all families (on the original response scale) in case of the latent projection with resp_oscale = TRUE in combination with <refmodel>$family$cats not being NULL.

The stats option "auc" is only available for:

  • the binomial() family in case of the traditional projection,

  • the binomial() family (on the original response scale) in case of the latent projection with resp_oscale = TRUE in combination with <refmodel>$family$cats being NULL.
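For intuition about what "acc" and "auc" measure in the binomial case, here is a minimal base-R sketch with made-up outcomes `y` and predicted probabilities `p`. The 0.5 classification threshold and the rank-based AUC formula are standard textbook choices, not a statement about projpred's internals.

```r
# Made-up binary outcomes and predicted success probabilities:
y <- c(0, 0, 1, 1, 0, 1)
p <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.7)

# Classification accuracy: proportion of observations classified correctly
# when predicting a success whenever p exceeds 0.5.
acc <- mean((p > 0.5) == (y == 1))

# Area under the ROC curve via the rank-sum (Mann-Whitney) formula: the
# proportion of (success, failure) pairs in which the success has the larger
# predicted probability.
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```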

Value

An object of class vselsummary.

See Also

print.vselsummary()

Examples


# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

# The "stanreg" fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)

# Run varsel() (here without cross-validation, with L1 search, and with small
# values for `nterms_max` and `nclusters_pred`, but only for the sake of
# speed in this example; this is not recommended in general):
vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
             seed = 5555)
print(summary(vs), digits = 1)
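A further sketch shows some non-default arguments: several statistics, estimates reported as differences from the baseline model, and 95% intervals. It repeats the setup from the example above so that it can be run on its own, and it is skipped if projpred or rstanarm is not installed. Since "diff" and "diff.se" are only supported if deltas is FALSE, type is set explicitly here. The object name `smmry_delta` is arbitrary.

```r
if (requireNamespace("projpred", quietly = TRUE) &&
    requireNamespace("rstanarm", quietly = TRUE)) {
  library(projpred)

  # Same data, reference model, and varsel() run as in the example above:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )
  vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
               seed = 5555)

  # Submodel statistics as differences from the baseline model, with
  # normal-approximation intervals of nominal coverage 0.95:
  smmry_delta <- summary(
    vs,
    stats = c("elpd", "mlpd"),
    type = c("mean", "lower", "upper"),
    deltas = TRUE,
    alpha = 0.05
  )
  print(smmry_delta, digits = 2)
}
```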


projpred documentation built on Oct. 1, 2023, 1:07 a.m.