circGLM: Fitting Bayesian circular General Linear Models

View source: R/circGLM.R

circGLMR Documentation

Fitting Bayesian circular General Linear Models

Description

The main function for running Bayesian circular GLMs. The model predicts some circular outcome θ and has the form

θ_i = β_0 + δ^t d_i + g(β^t x_i) + ε_i,

where β_0 is an circular intercept, δ are group difference parameters, d_i is a vector of dummy variables indicating group membership, g(.) is a link function given by g(x) = r atan(x) where r can be chosen, β is a vector of regression coefficients, x_i is a vector of covariates, and ε_i is a von Mises distributed error with residual concentration κ. This function returns a circGLM object which can be further investigated with standard functions plot, print, coef, residuals, and special functions mcmc_summary.circGLM for results for all MCMC chains, IC_compare.circGLM for a comparison of information criteria of one or more circGLM models, BF.circGLM to obtain Bayes Factors, and predict_function.circGLM to create a prediction function.

Usage

circGLM(
  formula,
  data,
  th,
  X = if (missing(th)) {
     model.matrix(formula, data)[, -1, drop = FALSE]
 } else {
 
       matrix(nrow = length(th), ncol = 0)
 },
  conj_prior = rep(0, 3),
  bt_prior_musd = c(mu = 0, sd = 1),
  starting_values = c(0, 1, rep(0, ncol(X))),
  bwb = rep(0.05, ncol(X)),
  Q = 1000,
  burnin = 0,
  thin = 1,
  kappaModeEstBandwith = 0.1,
  CIsize = 0.95,
  r = 2,
  returnPostSample = TRUE,
  returnLLEachPar = FALSE,
  output = "list",
  SDDBFDensEstMethod = "density",
  reparametrize = TRUE,
  groupMeanComparisons = TRUE,
  skipDichSplit = FALSE,
  centerOnly = FALSE
)

Arguments

formula

an optional object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

an optional data frame or object coercible by as.data.frame to a data frame, containing the variables in the model.

th

An optional vector of angles in radians or degrees, representing the circular outcome we want to predict. If any value is larger than 2 * pi, the input is transformed to radians. Otherwise, th is treated as radians.

X

An optional matrix of predictors, both continuous (linear) and categorical (as dummies). If categorical predictors are included, the dummies must already be made and they must be in (0, 1), because this is checked to be able to separate them from the continuous predictors, so that they are treated differently. If not, or if skipDichSplit = TRUE, they will be treated as linear predictors.

conj_prior

A numeric vector of length 3, containing, in that order, prior mean direction, prior resultant length, and prior sample size. Used for the von Mises part of the model, beta_0 and kappa.

bt_prior_musd

A numeric vector of length 2, or NA. If bt_prior_musd = NA, a constant prior is used. If it is a numeric vector of length 2, a Normal prior is used so that the first value is the mean, and the second value is the standard deviation.

starting_values

A numeric vector with starting values for the MCMC sampler. The length of the numeric vector should be 2 plus the number of columns in X.

bwb

A numeric vector, where the length is at least the number of continuous predictors. This is a tuning parameters used in sampling of beta. New values are sampled uniformly around the current value of beta with bounds at bt_cur - bwb and bt_cur + bwb. If reparametrize = TRUE, bwb corresponds to the bounds around the reparametrized values.

Q

Integer; The number of iterations to perform.

burnin

Integer; The number of burn-in (warmup) iterations.

thin

Integer; The number of parameters sets to sample for each parameter set that is saved. Can be used to save memory if Q is large.

kappaModeEstBandwith

Numeric between 0 and 1. The mode of kappa is estimated by taking the midpoint of a highest density interval. Specifically, it is the midpoint of the interval that contains kappaModeEstBandwith of the density of the posterior. Reasonable values are roughly between .005 and .2, although lower values may be reasonable if Q is large.

CIsize

The size of the credible intervals. This is used for all parameters, whether they use highest density intervals, circular quantiles or regular quantiles.

r

A numeric. r is the parameter used in the link function g(x, r) = r atan(x). If r = 2, the link function maps the real line to the full circle. If r < 2 the link functions maps to a proportion r / 2 of the circle. If r > 2, the link functions can reach the same are of the circle multiple times, which is unlikely to be useful, and should be used with caution.

returnPostSample

Logical indicating whether the MCMC sample itself should be returned. Should only be set to FALSE if there are memory constraints, as many subsequent analyses rely on the posterior sample directly.

returnLLEachPar

Logical indicating whether to return the likelihood for each observation and each sampled parameter set.

output

A character string, either "list" or "vector". In most situations, "list" should be used, which returns a circGLM object. The "vector" options is only useful for simulation studies etc.

SDDBFDensEstMethod

A character string, either "density" or "histogram". Gives the method to If SDDBFDensEstMethod = "density", the default, the Bayes Factors are computed based on the density estimate given by a spline interpolation of the density() function, so they are calculated in R rather than C++. This method should be much more stable than the histogram method, especially if there is low probability at 0 in the posterior. If SDDBFDensEstMethod = "histogram", Bayes factors are computed by estimating the density from the posterior sample as the midpoint of a histogram bar at 0 containing 10% of the data.

reparametrize

Logical; If TRUE, proposals for beta are drawn uniformly around a reparametrization zt = pi * atan(bt) / 2, so from zt_can = runif(1, zt - bwb, zt + bwb), which is then transformed back. Then, the proposals amount to the truncated cauchy pdf. If FALSE, proposals for beta are drawn on uniformly around beta, so from bt_can = runif(1, bt_cur - bwb, bt_cur + bwb).

groupMeanComparisons

Logical indicating whether mean comparisons in the form of Bayes Factors and posterior model probabilities should be computed.

skipDichSplit

Logical indicating whether to treat categorical predictor specially. Usually, skipDichSplit = TRUE should be used. This removes the arbitrary dependence on the labeling of categorical predictors and ensures that each group has a regression line of the same shape. If skipDichSplit = FALSE, the model will be the same as lm.circular from the package circular in that no separate treatment for categorical variables is performed.

centerOnly

Logical; If TRUE, the continuous predictors are centered only, not standardized. If FALSE, the continuous predictors are standardized.

Details

The model can be passed either as a combination of a formula and a data frame or matrix data, as in lm(), or as an outcome vector th and a matrix of predictors X. If categorical variables are to be included that are not yet given as dummies, formula syntax is recommended as this will automatically take care of dummy creation.

circGLM performs an MCMC sampler that generates a sample from the posterior of the intercept β_0, regression coefficients β, group mean direction differences δ and residual κ.

An attempt is made to split the predictor matrix X into continuous and categorical predictors. This is done so that the categorical predictors can be treated differently, which removes the arbitrary dependence on the labeling of categorical predictors and ensures that each group has a regression line of the same shape.

If categorical predictors are passed as factors, formula syntax is recommended, as it will automatically generate dummy variables. If the predictors are passed as a matrix X, categorical variables must be entered as dummy (dichotomous) variables.

The main results obtained are estimates and credible intervals for the parameters, posterior samples, and Bayes factors for various standard hypothesis comparisons.

As with all MCMC samplers, convergence must be checked, and tuning parameters bwb and reparametrize can be tweaked if the sampler converges poorly. The circGLM object that is returned contains proportions accepted which can be used to monitor performance.

Value

A circGLM object, which can be further analyzed with its associated plot.circGLM, coef.circGLM and print.circGLM functions.

An object of class circGLM contains the following elements (although some elements are not returned if not applicable):

b0_meandir

The posterior mean direction of β_0, the circular intercept.

b0_CCI

The circular credible interval of of β_0, the circular intercept.

kp_mean

The posterior mean of κ, the concentration parameter.

kp_mode

The posterior mode of κ, the concentration parameter.

kp_HDI

The CIsize highest posterior density interval of κ.

kp_propacc

The acceptance proportion of the rejection sampler for κ.

bt_mean

The posterior means of the regression coefficients β.

bt_CCI

The credible intervals of the regression coefficients β.

bt_propacc

The acceptance proportions of the Metropolis-Hastings sampler for β.

dt_meandir

The posterior mean directions of the group difference parameters, δ.

dt_CCI

The circular credible intervals of the group difference parameters, δ.

dt_propacc

The acceptance proportions of the Metropolis-Hastings sampler for δ.

zt_mean

The posterior means of the reparametrized coefficients ζ.

zt_mdir

The posterior mean directions of the reparametrized coefficients ζ.

zt_CCI

The credible intervals of the reparametrized coefficients ζ.

lppd

Ingredient for information criteria; Log posterior predictive density.

n_par

Ingredient for information criteria; Number of parameters.

ll_th_estpars

Ingredient for information criteria; Log-likelihood of the dataset at estimated parameter set.

ll_each_th_curpars

Ingredient for information criteria; Log-likelihood of each data point at each sampled parameter set.

ll_th_curpars

Ingredient for information criteria; Log-likelihood of the dataset at each sampled parameter set.

th_hat

An n-vector of predicted angles.

b0_chain

A Q-vector of sampled circular intercepts.

kp_chain

A Q-vector of sampled concentration parameters.

bt_chain

A matrix of sampled circular regression coefficients.

dt_chain

A matrix of sampled group difference parameters.

zt_chain

A matrix of sampled reparametrized circular regression coefficients.

mu_chain

A matrix of sampled group means.

AIC_Bayes

A version of the AIC where posterior estimates are used to compute the log-likelihood.

p_DIC

Ingredient for DIC.

p_DIC_alt

Ingredient for DIC.

DIC

The DIC.

DIC_alt

The alternative formulation of the DIC as given in Bayesian Data Analysis, Gelman et al. (2003).

p_WAIC1

Ingredient for WAIC1.

p_WAIC2

Ingredient for WAIC2.

WAIC1

The first formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).

WAIC2

The second formulation of the WAIC as given in Bayesian Data Analysis, Gelman et al. (2003).

DeltaIneqBayesFactors

A matrix of inequality Bayes factors for group difference parameters.

BetaIneqBayesFactors

A matrix of inequality Bayes factors for regression parameters.

BetaSDDBayesFactors

A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group difference parameters.

MuIneqBayesFactors

A matrix of inequality Bayes factors for group mean parameters.

MuSDDBayesFactors

A matrix of equality Bayes factors (Savage-Dickey Density ratio) for group mean parameters.

SavedIts

Number of iterations returned, without thinned iterations and burn-in.

TotalIts

Number of iterations performed, including thinning and burn-in.

TimeTaken

Seconds taken for analysis.

BetaBayesFactors

Matrix of Bayes factors for regression parameters.

MuBayesFactors

Matrix of Bayes factors for mean parameters.

all_chains

A matrix with all sampled values of all parameters.

Call

The matched call.

thin

Thinning factor used.

burnin

Burn-in used.

data_th

The original dataset.

data_X

Matrix of used continuous predictors.

data_d

Matrix of used categorical predictors.

data_stX

Matrix of used standardized categorical predictors.

r

Used parameter of the link function.

See Also

print.circGLM, plot.circGLM, coef.circGLM, BF.circGLM, residuals.circGLM, predict.circGLM, predict_function.circGLM, mcmc_summary.circGLM, IC_compare.circGLM.

Examples

dat <- generateCircGLMData()
m   <- circGLM(th ~ ., dat)
print(m)
print(m, type = "all")
plot(m, type = "tracestack")

keesmulder/CircGLMBayes documentation built on July 21, 2022, 3:43 a.m.