glm_standard: GLM equipped with the 'standard' prior evaluated

View source: R/glm_standard.R

glm_standardR Documentation

GLM equipped with the 'standard' prior evaluated

Description

Program for fitting a GLM equipped with the 'standard' prior evaluated in Boonstra and Barbaro, which is the regularized horseshoe.

Usage

glm_standard(
  y,
  x_standardized,
  family = "binomial",
  p,
  q,
  beta_orig_scale,
  beta_aug_scale,
  local_dof = 1,
  global_dof = 1,
  slab_dof = Inf,
  slab_scale = 15,
  mu_sd = 5,
  intercept_offset = NULL,
  only_prior = F,
  mc_warmup = 1000,
  mc_iter_after_warmup = 1000,
  mc_chains = 1,
  mc_thin = 1,
  mc_stepsize = 0.1,
  mc_adapt_delta = 0.9,
  mc_max_treedepth = 15,
  return_as_CmdStanMCMC = FALSE,
  seed = sample.int(.Machine$integer.max, 1),
  slab_precision = NULL
)

Arguments

y

(vector) outcomes corresponding to the type of glm desired. This should match whatever datatype is expected by the stan program.

x_standardized

(matrix) matrix of numeric values with number of rows equal to the length of y and number of columns equal to p+q. It is assumed without verification that each column is standardized to whatever scale the prior expects - in Boonstra and Barbaro, all predictors are marginally generated to have mean zero and unit variance, so no standardization is conducted. In practice, all data should be standardized to have a common scale before model fitting. If regression coefficients on the natural scale are desired, they be easily obtained through unstandardizing.

family

(character) Similar to argument in glm with the same name, but here this must be a character, and currently only 'binomial' (if y is binary) or 'gaussian' (if y is continuous) are valid choices.

p

see q

q

(nonneg. integers) numbers, the sum of which add up to the number of columns in x_standardized. For the standard prior, this distinction is only needed if a different constant scale parameter (beta_orig_scale, beta_aug_scale), which is the constant 'c' in the notation of Boonstra and Barbaro, is used.

beta_orig_scale

see beta_aug_scale

beta_aug_scale

(pos. real) constants indicating the prior scale of the horseshoe. Both values correspond to 'c / sigma' in the notation of Boonstra and Barbaro, because that paper never considers beta_orig_scale!=beta_aug_scale. Use the function solve_for_hiershrink_scale to calculate this quantity. If 'y' is binary, then sigma doesn't actually exist as a parameter, and it will be set equal to 2 inside the function. If 'y' is continuous, then sigma is equipped with its own weak prior. In either case, it is not intended that the user scale by sigma "manually".

local_dof

(pos. integer) number indicating the degrees of freedom for lambda_j. Boonstra and Barbaro always used local_dof = 1. Choose a negative value to tell the function that there are no local hyperparameters.

global_dof

(pos. integer) number indicating the degrees of freedom for tau. Boonstra and Barbaro always used global_dof = 1. Choose a negative value to tell the function that there is no global hyperparameter.

slab_dof

see slab_scale

slab_scale

(pos. real) these control the slab-part of the regularized horseshoe. Specifically, in the notation of Boonstra and Barbaro, d^2~InverseGamma(slab_dof/2, slab_scale^2*slab_dof/2). In Boonstra and Barbaro, d was fixed at 15, and you can achieve this by leaving these at their default values of slab_dof = Inf and slab_scale = 15.

mu_sd

(pos. real) the prior standard deviation for the intercept parameter mu

intercept_offset

(vector) vector of 0's and 1's equal having the same length as y. Those observations with a value of 1 have an additional constant offset in their linear predictor, effectively a different intercept. This is useful to jointly regress two datasets in which it is believed that the regression coefficients are the same but not the intercepts and could be useful (but was not used) in the simulation study to compare to a benchmark, namely if both the historical and current datasets were available but there is a desire to adjust for potentially different baseline prevalences.

only_prior

(logical) should all data be ignored, sampling only from the prior?

mc_warmup

number of MCMC warm-up iterations

mc_iter_after_warmup

number of MCMC iterations after warm-up

mc_chains

number of MCMC chains

mc_thin

every nth draw to keep

mc_stepsize

positive stepsize

mc_adapt_delta

between 0 and 1

mc_max_treedepth

max tree depth

return_as_CmdStanMCMC

(logical) should the function return the CmdStanMCMC object asis or should a summary of CmdStanMCMC be returned as a regular list

seed

seed for the underlying STAN model to allow for reproducibility

slab_precision

(pos. real) the slab-part of the regularized horseshoe, this is equivalent to (1/d)^2 in the notation of Boonstra and Barbaro. If specified, it is assumed that you want a fixed slab component and will take precedence over any provided values of slab_dof and slab_scale; however, slab_precision will be going away in a future release, and the proper way to specify a fixed slab component with with precision 1/d^2 for some number d is through slab_dof = Inf and slab_scale = d.

Value

list object containing the draws and other information.

Examples


data(historical)

foo = glm_standard(y = historical$y_hist,
                   x_standardized = historical[,2:5],
                   family = "binomial",
                   p = 4,
                   q = 0,
                   beta_orig_scale = 0.0231,
                   beta_aug_scale = 0.0231,
                   local_dof = 1,
                   global_dof = 1,
                   mu_sd = 5,
                   intercept_offset = NULL,
                   only_prior = 0,
                   mc_warmup = 200,
                   mc_iter_after_warmup = 200,
                   mc_chains = 2,
                   mc_thin = 1,
                   mc_stepsize = 0.1,
                   mc_adapt_delta = 0.99,
                   mc_max_treedepth = 15);

 data(current)

 foo = glm_standard(y = current$y_curr,
                    x_standardized = current[,2:11],
                    family = "binomial",
                    p = 4,
                    q = 6,
                    beta_orig_scale = 0.0223,
                    beta_aug_scale = 0.0223,
                    local_dof = 1,
                    global_dof = 1,
                    mu_sd = 5,
                    intercept_offset = NULL,
                    only_prior = 0,
                    mc_warmup = 200,
                    mc_iter_after_warmup = 200,
                    mc_chains = 2,
                    mc_thin = 1,
                    mc_stepsize = 0.1,
                    mc_adapt_delta = 0.99,
                    mc_max_treedepth = 15);


umich-biostatistics/AdaptiveBayesianUpdates documentation built on May 21, 2024, 5:29 a.m.