| pred-projection | R Documentation |
After the projection of the reference model onto a submodel, the linear
predictors (for the original or a new dataset) based on that submodel can be
calculated by proj_linpred(). These linear predictors can also be
transformed to response scale and averaged across the projected parameter
draws. Furthermore, proj_linpred() returns the corresponding log predictive
density values if the (original or new) dataset contains response values. The
proj_predict() function draws from the predictive distributions (there is
one such distribution for each observation from the original or new dataset)
of the submodel that the reference model has been projected onto. If the
projection has not been performed yet, both functions call project()
internally to perform the projection. Both functions can also handle multiple
submodels at once (for objects of class vsel or objects returned by a
project() call applied to an object of class vsel; see project()).
proj_linpred(
object,
newdata = NULL,
offsetnew = NULL,
weightsnew = NULL,
filter_nterms = NULL,
transform = FALSE,
integrated = FALSE,
allow_nonconst_wdraws_prj = return_draws_matrix,
return_draws_matrix = FALSE,
.seed = NA,
...
)
proj_predict(
object,
newdata = NULL,
offsetnew = NULL,
weightsnew = NULL,
filter_nterms = NULL,
nresample_clusters = 1000,
return_draws_matrix = FALSE,
.seed = NA,
resp_oscale = TRUE,
...
)
object |
An object returned by |
newdata |
Passed to argument |
offsetnew |
Passed to argument |
weightsnew |
Passed to argument |
filter_nterms |
Only applies if |
transform |
For |
integrated |
For |
allow_nonconst_wdraws_prj |
Only relevant for |
return_draws_matrix |
A single logical value indicating whether to
return an object (in case of |
.seed |
Pseudorandom number generation (PRNG) seed by which the same
results can be obtained again if needed. Passed to argument |
... |
Arguments passed to |
nresample_clusters |
For |
resp_oscale |
Only relevant for the latent projection. A single logical
value indicating whether to draw from the posterior-projection predictive
distributions on the original response scale ( |
Currently, proj_predict() ignores observation weights that are not
equal to 1. A corresponding warning is thrown if this is the case.
In case of the latent projection and transform = FALSE:
Output element pred contains the linear predictors without any
modifications that may be due to the original response distribution (e.g.,
for a brms::cumulative() model, the ordered thresholds are not taken into
account).
Output element lpd contains the latent log predictive density values,
i.e., those corresponding to the latent Gaussian distribution. If newdata
is not NULL, this requires the latent response values to be supplied in a
column called .<response_name> of newdata where <response_name> needs
to be replaced by the name of the original response variable (if
<response_name> contained parentheses, these have been stripped off by
init_refmodel(); see the left-hand side of formula(<refmodel>)). For
technical reasons, the existence of column <response_name> in newdata
is another requirement (even though .<response_name> is actually used).
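As a rough illustration of the column requirement described above, the following base-R sketch builds a data frame that carries both the original response column and the latent response column. The response name y, the predictor x1, and all values are made up for this sketch; in practice, the latent response values have to be supplied by the user and are not computed here.

```r
# Hypothetical name of the original response variable (assumption):
resp_name <- "y"
# Hypothetical new dataset with one predictor and the original response:
newdat <- data.frame(x1 = c(-0.5, 0.1, 1.2), y = c(1L, 3L, 2L))
# Placeholder latent response values (made up, not computed by projpred):
latent_vals <- c(-0.8, 0.9, 0.2)
# Add the latent response column, named `.<response_name>`:
newdat[[paste0(".", resp_name)]] <- latent_vals
# Both `y` and `.y` are now present:
names(newdat)
```

A data frame of this shape could then be passed as newdata, assuming the remaining columns match the predictors of the reference model.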
In the following, S_{\mathrm{prj}}, N,
C_{\mathrm{cat}}, and C_{\mathrm{lat}} from help
topic refmodel-init-get are used. (For proj_linpred() with integrated = TRUE, we have S_{\mathrm{prj}} = 1.) Furthermore, let
C denote either C_{\mathrm{cat}} (if transform = TRUE)
or C_{\mathrm{lat}} (if transform = FALSE). Then, if the
prediction is done for one submodel only (i.e., length(nterms) == 1 || !is.null(predictor_terms) in the explicit or implicit call to project(),
see argument object):
proj_linpred() returns a list with the following elements:
Element pred contains the actual predictions, i.e., the linear
predictors, possibly transformed to response scale (depending on
argument transform).
Element lpd is non-NULL only if newdata is NULL or if
newdata contains response values in the corresponding column. In that
case, it contains the log predictive density values (conditional on
each of the projected parameter draws if integrated = FALSE and
averaged across the projected parameter draws if integrated = TRUE).
In case of (i) the traditional projection, (ii) the latent projection
with transform = FALSE, or (iii) the latent projection with
transform = TRUE and <refmodel>$family$cats (where <refmodel> is
an object resulting from init_refmodel(); see also
extend_family()'s argument latent_y_unqs) being NULL, both
elements are S_{\mathrm{prj}} \times N matrices
(converted to a—possibly weighted—draws_matrix if argument
return_draws_matrix is TRUE, see the description of this argument).
In case of (i) the augmented-data projection or (ii) the latent
projection with transform = TRUE and <refmodel>$family$cats being
not NULL, pred is an S_{\mathrm{prj}} \times N \times C array (if argument return_draws_matrix is TRUE, this array
is "compressed" to an S_{\mathrm{prj}} \times (N \cdot C) matrix—with the columns consisting of C blocks of
N rows—and then converted to a—possibly
weighted—draws_matrix) and lpd is an S_{\mathrm{prj}} \times
N matrix (converted to a—possibly
weighted—draws_matrix if argument return_draws_matrix is TRUE).
If return_draws_matrix is FALSE and allow_nonconst_wdraws_prj is
TRUE and integrated is FALSE and the projected draws have
nonconstant weights, then both list elements have the weights of
these draws stored in an attribute wdraws_prj. (If
return_draws_matrix, allow_nonconst_wdraws_prj, and integrated
are all FALSE, then projected draws with nonconstant weights cause an
error.)
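The shapes described above can be mimicked in base R. The sketch below uses mock objects only (the dimensions, values, and weights are made up) and shows two things: one plausible way an S_prj x N x C array can be flattened into an S_prj x (N * C) matrix with C consecutive blocks of N entries, and how weights stored in an attribute wdraws_prj could be used for a weighted average across draws. It is an illustration of the layout, not a copy of what projpred does internally.

```r
# Hypothetical dimensions (assumptions for this sketch):
S_prj <- 2  # number of projected draws
N <- 3      # number of observations
C <- 2      # number of response categories
# Mock S_prj x N x C array standing in for `pred`:
pred_arr <- array(seq_len(S_prj * N * C), dim = c(S_prj, N, C))
# Flatten to an S_prj x (N * C) matrix. R stores arrays in column-major
# order, so resetting `dim` gives C consecutive blocks of N columns each:
pred_mat <- pred_arr
dim(pred_mat) <- c(S_prj, N * C)
# Block 1 corresponds to category 1, block 2 to category 2:
stopifnot(identical(pred_mat[, seq_len(N)], pred_arr[, , 1]))
stopifnot(identical(pred_mat[, N + seq_len(N)], pred_arr[, , 2]))

# Mock S_prj x N matrix with made-up nonconstant draw weights stored in
# attribute `wdraws_prj`:
pred_sn <- matrix(c(1, 3, 2, 4, 0, 8), nrow = S_prj)
attr(pred_sn, "wdraws_prj") <- c(0.25, 0.75)
# Weighted average across draws, giving one value per observation:
w <- attr(pred_sn, "wdraws_prj")
pred_avg <- drop(w %*% pred_sn)
pred_avg
```

Checking for the attribute with attr(x, "wdraws_prj") before averaging manually avoids silently treating weighted draws as equally weighted.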
proj_predict() returns an S_{\mathrm{prj}} \times N
matrix of predictions where S_{\mathrm{prj}} denotes
nresample_clusters in case of clustered projection (or, more generally,
in case of projected draws with nonconstant weights). If argument
return_draws_matrix is TRUE, the returned matrix is converted to a
draws_matrix (see posterior::draws_matrix()). In case of (i) the
augmented-data projection or (ii) the latent projection with resp_oscale = TRUE and <refmodel>$family$cats being not NULL, the returned matrix
(or draws_matrix) has an attribute called cats (the character vector of
response categories) and the values of the matrix (or draws_matrix) are
the predicted indices of the response categories (these indices refer to
the order of the response categories from attribute cats).
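To make the role of attribute cats concrete, the following base-R sketch maps predicted category indices back to category labels. The matrix and the category names are mock values chosen for this sketch; only the attribute name cats is taken from the description above.

```r
# Mock S_prj x N matrix of predicted category indices (made-up values):
prd <- matrix(c(1L, 3L, 2L, 2L, 3L, 1L), nrow = 2)
# Hypothetical response categories, stored in attribute `cats`:
attr(prd, "cats") <- c("low", "mid", "high")
# Map the indices back to category labels, keeping the matrix shape:
cats <- attr(prd, "cats")
prd_lab <- matrix(cats[as.vector(prd)], nrow = nrow(prd), ncol = ncol(prd))
prd_lab
```

Keeping the indices (rather than labels) in the matrix allows numeric summaries per observation, while the labels remain recoverable from the attribute.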
If the prediction is done for more than one submodel, the output from above
is returned for each submodel, giving a named list with one element for
each submodel (the names of this list being the numbers of predictor
terms of the submodels when counting the intercept, too).
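# Attach projpred (provides project(), proj_linpred(), proj_predict(), and the
# example dataset `df_gaussian`):
library(projpred)
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)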
# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Projection onto an arbitrary combination of predictor terms (with a small
# value for `ndraws`, but only for the sake of speed in this example; this
# is not recommended in general):
prj <- project(fit, predictor_terms = c("X1", "X3", "X5"), ndraws = 21,
seed = 9182)
# Predictions (at the training points) from the submodel onto which the
# reference model was projected:
prjl <- proj_linpred(prj)
prjp <- proj_predict(prj, .seed = 7364)