pred-projection | R Documentation |
After the projection of the reference model onto a submodel, the linear
predictors (for the original or a new dataset) based on that submodel can be
calculated by proj_linpred(). These linear predictors can also be
transformed to response scale and averaged across the projected parameter
draws. Furthermore, proj_linpred() returns the corresponding log predictive
density values if the (original or new) dataset contains response values. The
proj_predict() function draws from the predictive distributions (there is
one such distribution for each observation from the original or new dataset)
of the submodel that the reference model has been projected onto. If the
projection has not been performed yet, both functions call project()
internally to perform the projection. Both functions can also handle multiple
submodels at once (for objects of class vsel or objects returned by a
project() call to an object of class vsel; see project()).
proj_linpred(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  transform = FALSE,
  integrated = FALSE,
  allow_nonconst_wdraws_prj = return_draws_matrix,
  return_draws_matrix = FALSE,
  .seed = NA,
  ...
)

proj_predict(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  nresample_clusters = 1000,
  return_draws_matrix = FALSE,
  .seed = NA,
  resp_oscale = TRUE,
  ...
)
object |
An object returned by |
newdata |
Passed to argument |
offsetnew |
Passed to argument |
weightsnew |
Passed to argument |
filter_nterms |
Only applies if |
transform |
For |
integrated |
For |
allow_nonconst_wdraws_prj |
Only relevant for |
return_draws_matrix |
A single logical value indicating whether to
return an object (in case of |
.seed |
Pseudorandom number generation (PRNG) seed by which the same
results can be obtained again if needed. Passed to argument |
... |
Arguments passed to |
nresample_clusters |
For |
resp_oscale |
Only relevant for the latent projection. A single logical
value indicating whether to draw from the posterior-projection predictive
distributions on the original response scale ( |
Currently, proj_predict() ignores observation weights that are not
equal to 1. A corresponding warning is thrown if this is the case.
In case of the latent projection and transform = FALSE:
Output element pred contains the linear predictors without any
modifications that may be due to the original response distribution (e.g.,
for a brms::cumulative() model, the ordered thresholds are not taken into
account).
Output element lpd contains the latent log predictive density values,
i.e., those corresponding to the latent Gaussian distribution. If newdata
is not NULL, this requires the latent response values to be supplied in a
column called .<response_name> of newdata where <response_name> needs
to be replaced by the name of the original response variable (if
<response_name> contained parentheses, these have been stripped off by
init_refmodel(); see the left-hand side of formula(<refmodel>)). For
technical reasons, the existence of column <response_name> in newdata
is another requirement (even though .<response_name> is actually used).
In the following, S_{\mathrm{prj}}, N, C_{\mathrm{cat}}, and
C_{\mathrm{lat}} from help topic refmodel-init-get are used. (For
proj_linpred() with integrated = TRUE, we have S_{\mathrm{prj}} = 1.)
Furthermore, let C denote either C_{\mathrm{cat}} (if transform = TRUE)
or C_{\mathrm{lat}} (if transform = FALSE). Then, if the prediction is done
for one submodel only (i.e.,
length(nterms) == 1 || !is.null(predictor_terms) in the explicit or
implicit call to project(), see argument object):
proj_linpred() returns a list with the following elements:
Element pred contains the actual predictions, i.e., the linear
predictors, possibly transformed to response scale (depending on
argument transform).
Element lpd is non-NULL only if newdata is NULL or if newdata
contains response values in the corresponding column. In that
case, it contains the log predictive density values (conditional on
each of the projected parameter draws if integrated = FALSE and
averaged across the projected parameter draws if integrated = TRUE).
In case of (i) the traditional projection, (ii) the latent projection
with transform = FALSE, or (iii) the latent projection with
transform = TRUE and <refmodel>$family$cats (where <refmodel> is
an object resulting from init_refmodel(); see also
extend_family()'s argument latent_y_unqs) being NULL, both
elements are S_{\mathrm{prj}} \times N matrices
(converted to a possibly weighted draws_matrix if argument
return_draws_matrix is TRUE, see the description of this argument).
In case of (i) the augmented-data projection or (ii) the latent
projection with transform = TRUE and <refmodel>$family$cats being
not NULL, pred is an S_{\mathrm{prj}} \times N \times C array and lpd
is an S_{\mathrm{prj}} \times N matrix. If argument return_draws_matrix
is TRUE, the pred array is "compressed" to an
S_{\mathrm{prj}} \times (N \cdot C) matrix (with the columns consisting
of C blocks of N rows) and then converted to a possibly weighted
draws_matrix; the lpd matrix is likewise converted to a possibly
weighted draws_matrix.
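The "compression" of the three-dimensional array can be pictured with a small base R sketch. It assumes column-major flattening of the last two dimensions, so that column n + (c - 1) * N of the matrix holds observation n and category c. The toy dimensions and all object names (S_prj, N, C, pred_arr, pred_mat) are made up for illustration, and the internals of proj_linpred() may differ.

```r
# Toy dimensions (hypothetical): 4 projected draws, 3 observations,
# 2 response categories.
S_prj <- 4
N <- 3
C <- 2
pred_arr <- array(seq_len(S_prj * N * C), dim = c(S_prj, N, C))

# Flatten the last two dimensions. The result has C blocks of N columns,
# one block per category.
pred_mat <- matrix(as.vector(pred_arr), nrow = S_prj, ncol = N * C)

# Column n + (c_idx - 1) * N corresponds to observation n and category c_idx.
n <- 2
c_idx <- 2
identical(pred_mat[, n + (c_idx - 1) * N], pred_arr[, n, c_idx])
```

Under this assumed layout, the values for a single category can be recovered by selecting the corresponding block of N adjacent columns.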
If return_draws_matrix is FALSE, allow_nonconst_wdraws_prj is TRUE,
integrated is FALSE, and the projected draws have nonconstant weights,
then both list elements have the weights of these draws stored in an
attribute wdraws_prj. (If return_draws_matrix,
allow_nonconst_wdraws_prj, and integrated are all FALSE, then projected
draws with nonconstant weights cause an error.)
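When the weights are returned in attribute wdraws_prj, any summary across the projected draws needs to take them into account. The following base R sketch uses a mock list with made-up numbers in place of real proj_linpred() output and shows one possible way to average manually: a weighted mean for pred and a weighted mean on the density scale (computed via log-sum-exp) for lpd. The helper log_weighted_mean_exp() is defined here for illustration and is not claimed to be part of the package's exported interface.

```r
# Mock output (hypothetical values) shaped like proj_linpred() output for one
# submodel: 3 projected draws (rows) and 2 observations (columns).
wdraws <- c(0.5, 0.3, 0.2)
prjl_mock <- list(
  pred = structure(matrix(c(1, 2, 3, 4, 5, 6), nrow = 3),
                   wdraws_prj = wdraws),
  lpd = structure(log(matrix(c(0.2, 0.4, 0.1, 0.3, 0.3, 0.6), nrow = 3)),
                  wdraws_prj = wdraws)
)

# Extract the weights and normalize them defensively.
w <- attr(prjl_mock$pred, "wdraws_prj")
w <- w / sum(w)

# Weighted average of the predictions across the projected draws.
pred_avg <- drop(w %*% prjl_mock$pred)

# Weighted average of the predictive densities, computed stably on log scale.
log_weighted_mean_exp <- function(x, w) {
  z <- x + log(w)
  max(z) + log(sum(exp(z - max(z))))
}
lpd_avg <- apply(prjl_mock$lpd, 2, log_weighted_mean_exp, w = w)
```

Averaging lpd on the density scale rather than averaging the log values directly is a deliberate choice in this sketch, since the two generally give different results.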
proj_predict() returns an S_{\mathrm{prj}} \times N matrix of predictions
where S_{\mathrm{prj}} denotes nresample_clusters in case of clustered
projection (or, more generally, in case of projected draws with nonconstant
weights). If argument return_draws_matrix is TRUE, the returned matrix is
converted to a draws_matrix (see posterior::draws_matrix()). In case of
(i) the augmented-data projection or (ii) the latent projection with
resp_oscale = TRUE and <refmodel>$family$cats being not NULL, the returned
matrix (or draws_matrix) has an attribute called cats (the character vector
of response categories) and the values of the matrix (or draws_matrix) are
the predicted indices of the response categories (these indices refer to
the order of the response categories from attribute cats).
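Because the returned values are category indices in that case, they usually need to be mapped back to labels via attribute cats before interpretation. The following base R sketch does this for a mock matrix with invented values and category labels; it does not call proj_predict() itself.

```r
# Mock output (hypothetical values) shaped like proj_predict() output for a
# categorical response: 2 draws (rows) and 3 observations (columns).
pp_mock <- structure(
  matrix(c(1L, 3L, 2L, 2L, 1L, 3L), nrow = 2),
  cats = c("low", "mid", "high")
)

# Map the predicted indices back to the category labels.
cats <- attr(pp_mock, "cats")
pp_labels <- matrix(cats[pp_mock], nrow = nrow(pp_mock), ncol = ncol(pp_mock))

# Relative frequency of each category per observation (across the draws).
freq <- apply(pp_mock, 2, function(idx) {
  tabulate(idx, nbins = length(cats)) / length(idx)
})
rownames(freq) <- cats
```

Here, freq has one row per category and one column per observation, which can serve as a rough estimate of the predictive category probabilities.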
If the prediction is done for more than one submodel, the output from above
is returned for each submodel, giving a named list with one element for
each submodel (the names of this list being the numbers of predictor
terms of the submodels when counting the intercept, too).
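Output for several submodels can be processed by iterating over the named list. The sketch below uses a mock list in place of real proj_linpred() output; the names "2" and "4" and all values are invented for illustration.

```r
# Mock output (hypothetical values) shaped like proj_linpred() output for two
# submodels, each with 4 projected draws (rows) and 3 observations (columns).
out_mock <- list(
  "2" = list(pred = matrix(0, nrow = 4, ncol = 3),
             lpd = matrix(-1, nrow = 4, ncol = 3)),
  "4" = list(pred = matrix(1, nrow = 4, ncol = 3),
             lpd = matrix(-2, nrow = 4, ncol = 3))
)

# The list names identify the submodels.
names(out_mock)

# Dimensions of the predictions, one column per submodel.
pred_dims <- vapply(out_mock, function(sub) dim(sub$pred), integer(2))

# A simple per-submodel summary of the log predictive density values.
mean_lpd <- vapply(out_mock, function(sub) mean(sub$lpd), numeric(1))
```

Using vapply() keeps the submodel names attached to the results, which makes it easier to compare submodels of different sizes.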
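# Load the package providing project(), proj_linpred(), proj_predict(), and
# the example dataset `df_gaussian`:
library(projpred)
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)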
# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Projection onto an arbitrary combination of predictor terms (with a small
# value for `ndraws`, but only for the sake of speed in this example; this
# is not recommended in general):
prj <- project(fit, predictor_terms = c("X1", "X3", "X5"), ndraws = 21,
               seed = 9182)
# Predictions (at the training points) from the submodel onto which the
# reference model was projected:
prjl <- proj_linpred(prj)
prjp <- proj_predict(prj, .seed = 7364)