stan_clogit: Conditional logistic (clogit) regression models via Stan

Description Usage Arguments Details Value See Also Examples

View source: R/stan_clogit.R

Description

http://mc-stan.org/about/logo/ A model for case-control studies with optional prior distributions for the coefficients, intercept, and auxiliary parameters.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
stan_clogit(
  formula,
  data,
  subset,
  na.action = NULL,
  ...,
  strata,
  prior = normal(),
  prior_covariance = decov(),
  prior_PD = FALSE,
  algorithm = c("sampling", "optimizing", "meanfield", "fullrank"),
  adapt_delta = NULL,
  QR = FALSE,
  sparse = FALSE
)

Arguments

formula, data, subset, na.action

Same as for glmer, except that any global intercept included in the formula will be dropped. We strongly advise against omitting the data argument. Unless data is specified (and is a data frame) many post-estimation functions (including update, loo, kfold) are not guaranteed to work properly.

...

Further arguments passed to the function in the rstan package (sampling, vb, or optimizing), corresponding to the estimation method named by algorithm. For example, if algorithm is "sampling" it is possibly to specify iter, chains, cores, refresh, etc.

strata

A factor indicating the groups in the data where the number of successes (possibly one) is fixed by the research design. It may be useful to use interaction or strata to create this factor. However, the strata argument must not rely on any object besides the data data.frame.

prior

The prior distribution for the regression coefficients. prior should be a call to one of the various functions provided by rstanarm for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":

Family Functions
Student t family normal, student_t, cauchy
Hierarchical shrinkage family hs, hs_plus
Laplace family laplace, lasso
Product normal family product_normal

See the priors help page for details on the families and how to specify the arguments for all of the functions in the table above. To omit a prior —i.e., to use a flat (improper) uniform prior— prior can be set to NULL, although this is rarely a good idea.

Note: Unless QR=TRUE, if prior is from the Student t family or Laplace family, and if the autoscale argument to the function used to specify the prior (e.g. normal) is left at its default and recommended value of TRUE, then the default or user-specified prior scale(s) may be adjusted internally based on the scales of the predictors. See the priors help page and the Prior Distributions vignette for details on the rescaling and the prior_summary function for a summary of the priors used for a particular model.

prior_covariance

Cannot be NULL when lme4-style group-specific terms are included in the formula. See decov for more information about the default arguments. Ignored when there are no group-specific terms.

prior_PD

A logical scalar (defaulting to FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.

algorithm

A string (possibly abbreviated) indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "optimizing" for optimization, "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inference with a multivariate normal distribution. See rstanarm-package for more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.

adapt_delta

Only relevant if algorithm="sampling". See the adapt_delta help page for details.

QR

A logical scalar defaulting to FALSE, but if TRUE applies a scaled qr decomposition to the design matrix. The transformation does not change the likelihood of the data but is recommended for computational reasons when there are multiple predictors. See the QR-argument documentation page for details on how rstanarm does the transformation and important information about how to interpret the prior distributions of the model parameters when using QR=TRUE.

sparse

A logical scalar (defaulting to FALSE) indicating whether to use a sparse representation of the design (X) matrix. If TRUE, the the design matrix is not centered (since that would destroy the sparsity) and likewise it is not possible to specify both QR = TRUE and sparse = TRUE. Depending on how many zeros there are in the design matrix, setting sparse = TRUE may make the code run faster and can consume much less RAM.

Details

The stan_clogit function is mostly similar in syntax to clogit but rather than performing maximum likelihood estimation of generalized linear models, full Bayesian estimation is performed (if algorithm is "sampling") via MCMC. The Bayesian model adds priors (independent by default) on the coefficients of the GLM.

The data.frame passed to the data argument must be sorted by the variable passed to the strata argument.

The formula may have group-specific terms like in stan_glmer but should not allow the intercept to vary by the stratifying variable, since there is no information in the data with which to estimate such deviations in the intercept.

Value

A stanreg object is returned for stan_clogit.

See Also

stanreg-methods and clogit.

The vignette for Bernoulli and binomial models, which has more details on using stan_clogit. http://mc-stan.org/rstanarm/articles/

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
dat <- infert[order(infert$stratum), ] # order by strata
post <- stan_clogit(case ~ spontaneous + induced + (1 | education), 
                    strata = stratum,
                    data = dat,
                    subset = parity <= 2,
                    QR = TRUE,
                    chains = 2, iter = 500) # for speed only

nd <- dat[dat$parity > 2, c("case", "spontaneous", "induced", "education", "stratum")]
# next line would fail without case and stratum variables                                 
pr <- posterior_linpred(post, newdata = nd, transform = TRUE) # transform=TRUE gives probabilities

# not a random variable b/c probabilities add to 1 within strata
all.equal(rep(sum(nd$case), nrow(pr)), rowSums(pr)) 
            

Example output

Loading required package: Rcpp
rstanarm (Version 2.17.4, packaged: 2018-04-13 01:51:52 UTC)
- Do not expect the default priors to remain the same in future rstanarm versions.
Thus, R scripts should specify priors explicitly, even if they are just the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores())
- Plotting theme set to bayesplot::theme_default().

SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 1).

Gradient evaluation took 0.000102 seconds
1000 transitions using 10 leapfrog steps per transition would take 1.02 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.371801 seconds (Warm-up)
               0.105303 seconds (Sampling)
               0.477104 seconds (Total)


SAMPLING FOR MODEL 'bernoulli' NOW (CHAIN 2).

Gradient evaluation took 4e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.4 seconds.
Adjust your expectations accordingly!


Iteration:   1 / 500 [  0%]  (Warmup)
Iteration:  50 / 500 [ 10%]  (Warmup)
Iteration: 100 / 500 [ 20%]  (Warmup)
Iteration: 150 / 500 [ 30%]  (Warmup)
Iteration: 200 / 500 [ 40%]  (Warmup)
Iteration: 250 / 500 [ 50%]  (Warmup)
Iteration: 251 / 500 [ 50%]  (Sampling)
Iteration: 300 / 500 [ 60%]  (Sampling)
Iteration: 350 / 500 [ 70%]  (Sampling)
Iteration: 400 / 500 [ 80%]  (Sampling)
Iteration: 450 / 500 [ 90%]  (Sampling)
Iteration: 500 / 500 [100%]  (Sampling)

 Elapsed Time: 0.281446 seconds (Warm-up)
               0.10516 seconds (Sampling)
               0.386606 seconds (Total)

[1] TRUE

rstanarm documentation built on Feb. 11, 2020, 5:06 p.m.