glm.npp: Posterior of normalized power prior (NPP)
In hdbayes: Bayesian Analysis of Generalized Linear Models with Historical Data

glm.npp

R Documentation

Posterior of normalized power prior (NPP)

Description

Sample from the posterior distribution of a GLM using the normalized power prior (NPP) by Duan et al. (2006) doi:10.1002/env.752.

Usage

glm.npp(
  formula,
  family,
  data.list,
  a0.lognc,
  lognc,
  offset.list = NULL,
  beta.mean = NULL,
  beta.sd = NULL,
  disp.mean = NULL,
  disp.sd = NULL,
  a0.shape1 = 1,
  a0.shape2 = 1,
  a0.lower = NULL,
  a0.upper = NULL,
  get.loglik = FALSE,
  iter_warmup = 1000,
  iter_sampling = 1000,
  chains = 4,
  ...
)

Arguments

`formula`	a two-sided formula giving the relationship between the response variable and covariates.
`family`	an object of class `family`. See `?stats::family`.
`data.list`	a list of `data.frame`s. The first element in the list is the current data, and the rest are the historical data sets.
`a0.lognc`	a vector giving values of the power prior parameter for which the logarithm of the normalizing constant has been evaluated.
`lognc`	an S by T matrix where S is the length of `a0.lognc`, T is the number of historical data sets, and the j-th column, j = 1, ..., T, is a vector giving the logarithm of the normalizing constant (as estimated by `glm.npp.lognc()` for `a0.lognc` using the j-th historical data set.
`offset.list`	a list of vectors giving the offsets for each data. The length of `offset.list` is equal to the length of `data.list`. The length of each element of `offset.list` is equal to the number of rows in the corresponding element of `data.list`. Defaults to a list of vectors of 0s.
`beta.mean`	a scalar or a vector whose dimension is equal to the number of regression coefficients giving the mean parameters for the initial prior on regression coefficients. If a scalar is provided, `beta.mean` will be a vector of repeated elements of the given scalar. Defaults to a vector of 0s.
`beta.sd`	a scalar or a vector whose dimension is equal to the number of regression coefficients giving the sd parameters for the initial prior on regression coefficients. If a scalar is provided, same as for `beta.mean`. Defaults to a vector of 10s.
`disp.mean`	location parameter for the half-normal prior on dispersion parameter. Defaults to 0.
`disp.sd`	scale parameter for the half-normal prior on dispersion parameter. Defaults to 10.
`a0.shape1`	first shape parameter for the i.i.d. beta prior on `a_0` vector. When `a0.shape1 == 1` and `a0.shape2 == 1`, a uniform prior is used.
`a0.shape2`	second shape parameter for the i.i.d. beta prior on `a_0` vector. When `a0.shape1 == 1` and `a0.shape2 == 1`, a uniform prior is used.
`a0.lower`	a scalar or a vector whose dimension is equal to the number of historical data sets giving the lower bounds for each element of the `a_0` vector. If a scalar is provided, `a0.lower` will be a vector of repeated elements of the given scalar. Defaults to a vector of 0s.
`a0.upper`	a scalar or a vector whose dimension is equal to the number of historical data sets giving the upper bounds for each element of the `a_0` vector. If a scalar is provided, same as for `a0.lower`. Defaults to a vector of 1s.
`get.loglik`	whether to generate log-likelihood matrix. Defaults to FALSE.
`iter_warmup`	number of warmup iterations to run per chain. Defaults to 1000. See the argument `iter_warmup` in `sample()` method in cmdstanr package.
`iter_sampling`	number of post-warmup iterations to run per chain. Defaults to 1000. See the argument `iter_sampling` in `sample()` method in cmdstanr package.
`chains`	number of Markov chains to run. Defaults to 4. See the argument `chains` in `sample()` method in cmdstanr package.
`...`	arguments passed to `sample()` method in cmdstanr package (e.g., `seed`, `refresh`, `init`).

Details

Before using this function, users must estimate the logarithm of the normalizing constant across a range of different values for the power prior parameter (a_0), possibly smoothing techniques over a find grid. The power prior parameters (a_0's) are treated as random with independent beta priors. The initial priors on the regression coefficients are independent normal priors. The current and historical data sets are assumed to have a common dispersion parameter with a half-normal prior (if applicable). For normal linear models, the exact normalizing constants for NPP can be computed. See the implementation in lm.npp().

Value

The function returns an object of class draws_df containing posterior samples. The object has two attributes:

data: a list of variables specified in the data block of the Stan program
model: a character string indicating the model name

References

Duan, Y., Ye, K., and Smith, E. P. (2005). Evaluating water quality using power priors to incorporate historical information. Environmetrics, 17(1), 95–106.

Examples


  if(requireNamespace("parallel")){
    data(actg019)
    data(actg036)
    ## take subset for speed purposes
    actg019 = actg019[1:100, ]
    actg036 = actg036[1:50, ]

    library(parallel)
    ncores    = 2
    data.list = list(data = actg019, histdata = actg036)
    formula   = cd4 ~ treatment + age + race
    family    = poisson()
    a0        = seq(0, 1, length.out = 11)
    if (instantiate::stan_cmdstan_exists()) {
      ## call created function
      ## wrapper to obtain log normalizing constant in parallel package
      logncfun = function(a0, ...){
        hdbayes::glm.npp.lognc(
          formula = formula, family = family, a0 = a0, histdata = data.list[[2]],
          ...
        )
      }

      cl = makeCluster(ncores)
      clusterSetRNGStream(cl, 123)
      clusterExport(cl, varlist = c('formula', 'family', 'data.list'))
      a0.lognc = parLapply(
        cl = cl, X = a0, fun = logncfun, iter_warmup = 500,
        iter_sampling = 1000, chains = 1, refresh = 0
      )
      stopCluster(cl)
      a0.lognc = data.frame( do.call(rbind, a0.lognc) )

      ## sample from normalized power prior
      glm.npp(
        formula = formula,
        family = family,
        data.list = data.list,
        a0.lognc = a0.lognc$a0,
        lognc = matrix(a0.lognc$lognc, ncol = 1),
        chains = 1, iter_warmup = 500, iter_sampling = 1000,
        refresh = 0
      )
    }
  }

hdbayes documentation built on Nov. 21, 2025, 1:07 a.m.