plot.vsel | R Documentation |
This is the plot() method for vsel objects (returned by varsel() or
cv_varsel()). It visualizes the predictive performance of the reference
model (possibly also that of some other "baseline" model) and that of the
submodels along the full-data predictor ranking. Basic information about the
(CV) variability in the ranking of the predictors is included as well (if
available; inferred from cv_proportions()). For a tabular representation,
see summary.vsel() and performances().
## S3 method for class 'vsel'
plot(
  x,
  nterms_max = NULL,
  stats = "elpd",
  deltas = FALSE,
  alpha = 2 * pnorm(-1),
  baseline = if (!inherits(x$refmodel, "datafit")) "ref" else "best",
  thres_elpd = NA,
  resp_oscale = TRUE,
  point_size = 3,
  bar_thickness = 1,
  ranking_nterms_max = NULL,
  ranking_abbreviate = FALSE,
  ranking_abbreviate_args = list(),
  ranking_repel = NULL,
  ranking_repel_args = list(),
  ranking_colored = FALSE,
  show_cv_proportions = TRUE,
  cumulate = FALSE,
  text_angle = NULL,
  size_position = "primary_x_bottom",
  ...
)
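The default alpha = 2 * pnorm(-1) in the usage above can look cryptic. The following base-R sketch shows what it evaluates to, assuming the uncertainty bars are normal-approximation intervals with nominal coverage 1 - alpha (the variable names coverage and z are illustrative only):

```r
# Default value of `alpha` as given in the usage above:
alpha <- 2 * pnorm(-1)

# Nominal coverage, assuming an interval with coverage 1 - alpha:
coverage <- 1 - alpha

# Half-width of a two-sided normal-approximation interval, in standard errors:
z <- qnorm(1 - alpha / 2)

round(c(alpha = alpha, coverage = coverage, z = z), 4)
#>    alpha coverage        z
#>   0.3173   0.6827   1.0000
```

Under that assumption, the default corresponds to bars stretching one standard error on either side of the point estimate.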
x |
An object of class |
nterms_max |
Maximum submodel size (number of predictor terms) for which
the performance statistics are calculated. Using |
stats |
One or more character strings determining which performance
statistics (i.e., utilities or losses) to estimate based on the
observations in the evaluation (or "test") set (in case of
cross-validation, these are all observations because they are partitioned
into multiple test sets; in case of
|
deltas |
If |
alpha |
A number determining the (nominal) coverage |
baseline |
For |
thres_elpd |
Only relevant if |
resp_oscale |
Only relevant for the latent projection. A single logical
value indicating whether to calculate the performance statistics on the
original response scale ( |
point_size |
Passed to argument |
bar_thickness |
Passed to argument |
ranking_nterms_max |
Maximum submodel size (number of predictor terms)
for which the predictor names and the corresponding ranking proportions are
added on the x-axis. Using |
ranking_abbreviate |
A single logical value indicating whether the
predictor names in the full-data predictor ranking should be abbreviated by
|
ranking_abbreviate_args |
A |
ranking_repel |
Either |
ranking_repel_args |
A |
ranking_colored |
A single logical value indicating whether the points
and the uncertainty bars should be gradient-colored according to the CV
ranking proportions ( |
show_cv_proportions |
A single logical value indicating whether the CV
ranking proportions (see |
cumulate |
Passed to argument |
text_angle |
Passed to argument |
size_position |
A single character string specifying the position of the
submodel sizes. Either |
... |
Arguments passed to the internal function which is used for
bootstrapping (if applicable; see argument |
The stats options "mse" and "rmse" are only available for:
the traditional projection,
the latent projection with resp_oscale = FALSE,
the latent projection with resp_oscale = TRUE in combination with
<refmodel>$family$cats being NULL.
The stats option "acc" (= "pctcorr") is only available for:
the binomial() family in case of the traditional projection,
all families in case of the augmented-data projection,
the binomial() family (on the original response scale) in case of the
latent projection with resp_oscale = TRUE in combination with
<refmodel>$family$cats being NULL,
all families (on the original response scale) in case of the latent
projection with resp_oscale = TRUE in combination with
<refmodel>$family$cats being not NULL.
The stats option "auc" is only available for:
the binomial() family in case of the traditional projection,
the binomial() family (on the original response scale) in case of the
latent projection with resp_oscale = TRUE in combination with
<refmodel>$family$cats being NULL.
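The availability rules above can be condensed into a small lookup. The function below is a hypothetical helper written only for illustration (it is not part of projpred); its argument names, the projection labels "traditional", "augdat", and "latent", and the use of a family name string are assumptions of this sketch. It returns NA for statistics that the rules above do not cover.

```r
# Hypothetical helper (NOT part of projpred): encodes the availability rules
# listed above for "mse", "rmse", "acc" (= "pctcorr"), and "auc".
stat_available <- function(stat,
                           projection = c("traditional", "augdat", "latent"),
                           family = "gaussian",
                           resp_oscale = TRUE,
                           cats_is_null = TRUE) {
  projection <- match.arg(projection)
  is_binom <- identical(family, "binomial")
  # Latent projection evaluated on the original response scale:
  latent_oscale <- projection == "latent" && resp_oscale
  switch(
    stat,
    mse = ,
    rmse = projection == "traditional" ||
      (projection == "latent" && !resp_oscale) ||
      (latent_oscale && cats_is_null),
    acc = ,
    pctcorr = (projection == "traditional" && is_binom) ||
      projection == "augdat" ||
      (latent_oscale && cats_is_null && is_binom) ||
      (latent_oscale && !cats_is_null),
    auc = (projection == "traditional" && is_binom) ||
      (latent_oscale && cats_is_null && is_binom),
    NA # statistics not covered by the rules above
  )
}

stat_available("rmse", projection = "traditional")                    # TRUE
stat_available("rmse", projection = "augdat")                         # FALSE
stat_available("auc", projection = "traditional", family = "gaussian") # FALSE
stat_available("acc", projection = "latent", family = "poisson",
               resp_oscale = TRUE, cats_is_null = FALSE)              # TRUE
```

In practice, cats_is_null would stand for is.null(<refmodel>$family$cats) on the reference model at hand.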
A ggplot2 plotting object (of class gg and ggplot). If ranking_abbreviate is
TRUE, the output of abbreviate() is stored in an attribute called
projpred_ranking_abbreviated (to allow the abbreviations to be easily mapped
back to the original predictor names).
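To illustrate why storing the abbreviate() output allows mapping back, here is a base-R sketch. It calls abbreviate() directly on made-up predictor names; the names and the minlength value are assumptions of this example and are not taken from projpred.

```r
# Made-up predictor names, for illustration only:
preds <- c("temperature", "precipitation", "X1")

# abbreviate() returns a character vector of abbreviations whose names are
# the original strings:
abbr <- abbreviate(preds, minlength = 4)

# Reverse lookup: from an abbreviation back to the original predictor name.
lookup <- setNames(names(abbr), abbr)
lookup[[abbr[["precipitation"]]]]
```

For a plot object p created with ranking_abbreviate = TRUE, the stored vector should be retrievable in the same spirit via attr(p, "projpred_ranking_abbreviated").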
As long as the reference model's performance is computable, it is always
shown in the plot as a dashed red horizontal line. If baseline = "best", the
baseline model's performance is shown as a dotted black horizontal line. If
!is.na(thres_elpd) and any(stats %in% c("elpd", "mlpd", "gmpd")), the value
supplied to thres_elpd (which is automatically adapted internally in case of
the MLPD or the GMPD or deltas = FALSE) is shown as a dot-dashed gray
horizontal line for the reference model and, if baseline = "best", as a
long-dashed green horizontal line for the baseline model.
# Load projpred (provides varsel() and the example dataset `df_gaussian`):
library(projpred)

# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)

# Run varsel() (here without cross-validation, with L1 search, and with small
# values for `nterms_max` and `nclusters_pred`, but only for the sake of
# speed in this example; this is not recommended in general):
vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
             seed = 5555)
print(plot(vs))
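As a possible variation on the example above, the following sketch requests two statistics at once and then customizes the returned ggplot2 object. It repeats the same small, speed-oriented settings. The object names p_multi and p_custom, the plot title, and the reading of deltas = TRUE as "differences relative to the baseline" are assumptions of this sketch. The whole block is skipped if projpred, rstanarm, or ggplot2 is not installed.

```r
# Sketch only: requires the projpred, rstanarm, and ggplot2 packages.
if (requireNamespace("projpred", quietly = TRUE) &&
    requireNamespace("rstanarm", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(projpred)

  # Same data, reference model, and search as in the example above:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )
  vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
               seed = 5555)

  # Two statistics at once; "rmse" is available here because this is the
  # traditional projection. `deltas = TRUE` is assumed to display the
  # statistics as differences relative to the baseline model:
  p_multi <- plot(vs, stats = c("elpd", "rmse"), deltas = TRUE)

  # The return value is a ggplot2 object, so further ggplot2 components can
  # be added (the title is purely illustrative):
  p_custom <- p_multi + ggplot2::labs(title = "Submodel performance")
  print(p_custom)
}
```

Because the result is an ordinary ggplot2 object, it can also be saved with ggplot2::ggsave() or combined with other plots in the usual way.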