pred-projection: Predictions from a submodel (after projection)

pred-projection    R Documentation

Predictions from a submodel (after projection)

Description

After the projection of the reference model onto a submodel, proj_linpred() gives the linear predictor (possibly transformed to the response scale) for all projected draws of such a submodel. proj_predict() draws from the predictive distribution of such a submodel. If the projection has not been performed yet, both functions perform it first. Both functions can also handle multiple submodels at once (if the input object is of class vsel).
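For a Gaussian submodel with identity link, the difference between the two functions can be pictured in base R. This is a conceptual sketch, not projpred's implementation: the design matrix `X`, the projected coefficient draws `beta_prj` and the residual standard deviations `sigma_prj` are simulated, hypothetical stand-ins.

```r
set.seed(1)
S_prj <- 4   # number of projected draws
N <- 3       # number of observations

# Hypothetical design matrix (intercept + 2 predictors), N x 3:
X <- cbind(1, matrix(rnorm(N * 2), nrow = N))

# Hypothetical projected draws: coefficients (S_prj x 3) and residual SDs:
beta_prj <- matrix(rnorm(S_prj * 3), nrow = S_prj)
sigma_prj <- abs(rnorm(S_prj)) + 0.5

# Linear predictor for every projected draw (conceptually what proj_linpred()
# returns without transformation): an S_prj x N matrix.
linpred <- beta_prj %*% t(X)

# One draw from the predictive distribution per projected draw and observation
# (conceptually what proj_predict() returns): also S_prj x N.
# Row s uses mean linpred[s, ] and SD sigma_prj[s] (recycled column-wise).
ppd <- matrix(rnorm(S_prj * N, mean = linpred, sd = sigma_prj), nrow = S_prj)
```

The first quantity is deterministic given the projected draws, whereas the second adds the observation-level noise of the submodel's family.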

Usage

proj_linpred(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  transform = FALSE,
  integrated = FALSE,
  .seed = sample.int(.Machine$integer.max, 1),
  ...
)

proj_predict(
  object,
  newdata = NULL,
  offsetnew = NULL,
  weightsnew = NULL,
  filter_nterms = NULL,
  nresample_clusters = 1000,
  .seed = sample.int(.Machine$integer.max, 1),
  ...
)

Arguments

object

An object returned by project() or an object that can be passed to argument object of project().

newdata

Passed to argument newdata of the reference model's extract_model_data function (see init_refmodel()). Provides the predictor (and possibly also the response) data for the new (or old) observations. May also be NULL (see argument extract_model_data of init_refmodel()). If not NULL, any NAs will trigger an error.

offsetnew

Passed to argument orhs of the reference model's extract_model_data function (see init_refmodel()). Used to get the offsets for the new (or old) observations.

weightsnew

Passed to argument wrhs of the reference model's extract_model_data function (see init_refmodel()). Used to get the weights for the new (or old) observations.

filter_nterms

Only applies if object is an object returned by project(). In that case, filter_nterms can be used to restrict object to those elements (submodels) whose number of solution terms is contained in filter_nterms. Therefore, filter_nterms needs to be a numeric vector or NULL. If NULL, all submodels are used.
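The filtering rule can be mimicked on a plain list. In this sketch, each list element is a mock stand-in for one projected submodel; the element name `solution_terms` inside each mock and the helper `filter_by_nterms()` are hypothetical and not part of projpred.

```r
# Mock list of three "submodels" with 1, 2, and 3 solution terms:
submodels <- list(
  list(solution_terms = "X1"),
  list(solution_terms = c("X1", "X3")),
  list(solution_terms = c("X1", "X3", "X5"))
)

# Hypothetical helper: keep only submodels whose number of solution terms is
# contained in `filter_nterms`; NULL keeps all submodels.
filter_by_nterms <- function(submodels, filter_nterms = NULL) {
  if (is.null(filter_nterms)) {
    return(submodels)
  }
  stopifnot(is.numeric(filter_nterms))
  nterms <- vapply(submodels, function(x) length(x$solution_terms), integer(1))
  submodels[nterms %in% filter_nterms]
}

kept <- filter_by_nterms(submodels, filter_nterms = c(1, 3))
```

Here `kept` holds the submodels with one and three solution terms, while `filter_by_nterms(submodels)` returns all three.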

transform

For proj_linpred() only. A single logical value indicating whether the linear predictor should be transformed to the response scale using the inverse-link function (TRUE) or not (FALSE).
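As an illustration of what the inverse-link transformation does, the sketch below applies base R's `binomial()$linkinv` (the logistic function for the logit link) to a made-up linear-predictor matrix. The numbers are hypothetical; only the mapping itself is the point.

```r
# Hypothetical linear predictor: 2 projected draws x 3 observations.
linpred <- matrix(c(-2, 0, 2, -1, 0.5, 1), nrow = 2)

# Inverse-link transformation to the response scale (logit link, so this is
# the logistic function and yields probabilities in (0, 1)).
fam <- binomial(link = "logit")
resp <- fam$linkinv(linpred)
```

For the Gaussian family with identity link, the inverse link is the identity, so transform = TRUE leaves the values unchanged.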

integrated

For proj_linpred() only. A single logical value indicating whether the output should be averaged across the projected posterior draws (TRUE) or not (FALSE).
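Averaging across the projected posterior draws collapses the S_prj rows of an S_prj x N matrix into one value per observation. A minimal sketch with made-up numbers; the weighted variant assumes hypothetical draw weights summing to 1 (as could arise from clustering) and is shown only for comparison.

```r
# Hypothetical S_prj x N matrix of predictions (S_prj = 3 draws, N = 2 obs).
pred <- matrix(c(1, 2, 3,
                 10, 20, 30), nrow = 3)

# Equally weighted draws: the average per observation is the column mean.
pred_avg <- colMeans(pred)

# With (assumed) draw weights summing to 1, the average becomes a weighted
# column mean: t(w) %*% pred, dropped to a plain vector.
w <- c(0.5, 0.25, 0.25)
pred_wavg <- drop(crossprod(w, pred))
```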

.seed

Pseudorandom number generation (PRNG) seed by which the same results can be obtained again if needed. If NULL, no seed is set and therefore the results are not reproducible. See set.seed() for details. Here, this seed is used for drawing new group-level effects in the case of a multilevel submodel (however, not yet in the case of a GAMM) and, in proj_predict(), for drawing from the predictive distribution of the submodel(s). If a clustered projection was performed, proj_predict() also uses .seed for drawing from the set of projected clusters of posterior draws (see argument nresample_clusters).
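The reproducibility convention can be illustrated with a hypothetical stand-in function `draw_predictive()` (not part of projpred) that follows the same pattern: set the seed if `.seed` is not NULL, then draw. The default mirrors the one in the usage section above.

```r
# Hypothetical stand-in for a function with a `.seed` argument: if `.seed` is
# not NULL, the PRNG state is fixed before drawing; otherwise it is left as is.
draw_predictive <- function(mu, sigma,
                            .seed = sample.int(.Machine$integer.max, 1)) {
  if (!is.null(.seed)) {
    set.seed(.seed)
  }
  rnorm(length(mu), mean = mu, sd = sigma)
}

mu <- c(0, 1, 2)
a <- draw_predictive(mu, sigma = 1, .seed = 7364)
b <- draw_predictive(mu, sigma = 1, .seed = 7364)
```

Both calls return identical draws because they start from the same seed. Note that set.seed() changes the global PRNG state as a side effect.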

...

Arguments passed to project() if object is not already an object returned by project().

nresample_clusters

For proj_predict() with clustered projection only. Number of draws to return from the predictive distribution of the submodel. Not to be confused with argument nclusters of project(): nresample_clusters gives the number of draws (with replacement) from the set of clustered posterior draws after projection, whereas argument nclusters of project() determines the size of that set.
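The distinction between nclusters and nresample_clusters can be sketched in base R. The cluster-level linear predictors are simulated, uniform resampling probabilities and a Gaussian family with SD 1 are assumptions of this sketch, and none of it reproduces projpred's internals.

```r
set.seed(9182)
nclusters <- 10             # size of the clustered set (cf. nclusters of project())
nresample_clusters <- 1000  # number of draws to return
N <- 5                      # number of observations

# Hypothetical projected clusters: nclusters x N linear predictors.
linpred_clusters <- matrix(rnorm(nclusters * N), nrow = nclusters)

# Resample cluster indices with replacement (uniform probabilities assumed).
idx <- sample.int(nclusters, size = nresample_clusters, replace = TRUE)

# One predictive draw per resampled cluster and observation (Gaussian, SD 1
# assumed): the result is nresample_clusters x N, not nclusters x N.
mu <- linpred_clusters[idx, , drop = FALSE]
ppd <- matrix(rnorm(length(mu), mean = mu, sd = 1), nrow = nresample_clusters)
```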

Value

Let S_prj denote the number of (possibly clustered) projected posterior draws (for short, the number of projected draws) and N the number of observations. Then, if the prediction is done for one submodel only (i.e., length(nterms) == 1 || !is.null(solution_terms) in the call to project()):

  • proj_linpred() returns a list with elements pred (predictions) and lpd (log predictive densities). Both elements are S_prj x N matrices.

  • proj_predict() returns an S_prj x N matrix of predictions, where S_prj is replaced by nresample_clusters in the case of a clustered projection.

If the prediction is done for more than one submodel, the output from above is returned for each submodel, giving a named list with one element per submodel (the names of this list being the numbers of solution terms of the submodels, counting the intercept as well).
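The naming rule for multiple submodels can be sketched with mock matrices standing in for the real per-submodel output (all values here are hypothetical):

```r
# Hypothetical per-submodel results for submodels with 1, 2 and 3 solution
# terms (intercept not counted); each is an S_prj x N matrix.
S_prj <- 4
N <- 3
nterms <- c(1, 2, 3)
out <- lapply(nterms, function(k) matrix(k, nrow = S_prj, ncol = N))

# Names count the intercept too, so they are nterms + 1.
names(out) <- as.character(nterms + 1)
```

With this rule, `out[["3"]]` refers to the submodel with two solution terms plus the intercept.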

Examples

if (requireNamespace("rstanarm", quietly = TRUE)) {
  # Data:
  dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

  # The "stanreg" fit which will be used as the reference model (with small
  # values for `chains` and `iter`, but only for technical reasons in this
  # example; this is not recommended in general):
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
  )

  # Projection onto an arbitrary combination of predictor terms (with a small
  # value for `nclusters`, but only for the sake of speed in this example;
  # this is not recommended in general):
  prj <- project(fit, solution_terms = c("X1", "X3", "X5"), nclusters = 10,
                 seed = 9182)

  # Predictions (at the training points) from the submodel onto which the
  # reference model was projected:
  prjl <- proj_linpred(prj)
  prjp <- proj_predict(prj, .seed = 7364)
}


projpred documentation built on May 13, 2022, 9:08 a.m.