Nothing
#' linreg.stan
#' @description Fits one of three models of linear regression to a dataset in a Bayesian
#' framework. Bayesian sampling is conducted in the Stan software via the 'rstan'
#' package. Users supply vectors of observations and, unless an "ols" model is chosen,
#' a covariance matrix. Users can alter parameters for model-parameter prior distributions
#' and Bayesian sampling settings. See details.
#' @details The models that can be chosen are:
#' 1. "ols": basic ordinary least-squares regression.
#'
#' \code{Y = XB + epsilon}; where \code{epsilon~N(0,randsig2*I)} and randsig2 is the scaling parameter for identity matrix I.
#'
#' 2. "pgls": phylogenetic generalized least-squares regression in which residuals
#' covary according to covariance matrix C. This is a lineage-pair covariance
#' matrix (if analyzing lineage-pair data) or a phylogenetic vcv matrix
#' (if analyzing species data).
#'
#' \code{Y = XB + epsilon}; where \code{epsilon~N(0,physig2*C)} and physig2 is the scaling parameter for covariance matrix C.
#'
#' 3. "pgls.mm" a mixed-effects version of pgls in which the residuals have both
#' uncorrelated and correlated components, where the latter are structured according
#' to covariance matrix C.
#'
#' \code{Y = XB + epsilon + u}; where \code{epsilon~N(0,randsig2*I)} and \code{u~N(0,physig2*C)}.
#'
#'
#' **NOTE**: For models 2 and 3, the user must supply a covariance matrix.
#'
#' **Prior Distributions for Regression Model Parameters**:
#' The underlying stan models assume the following prior distributions for model
#' parameters
#' 1. Regression coefficients: Gaussian prior (users can set prior mean and sd).
#' 2. `physig2`: lognormal prior (users can set prior mean and sd).
#' 3. `randsig2`: cauchy prior (users can set location and scale parameters of prior).
#' @usage linreg.stan(des, y, model="ols", covmat=NULL, iter=2000,
#' chains=4, cores=4, coef.u=0, coef.sd=10, physig2.u=-1, physig2.sd=1,
#' randsig2.loc=0, randsig2.sc=2.5, ...)
#'
#' @param des A vector of predictor variable observations OR, in the case of multiple predictors, a matrix in which each column is a vector of observations of a given predictor. Function will add a column of 1s to make this a design matrix whose first column corresponds to the model intercept (unless such a column already exists).
#' @param y A vector of response variable observations.
#' @param model Choice of "ols", "pgls", or "pgls.mm"; defaults to ols.
#' @param covmat Covariance matrix for model residuals (a lineage-pair covariance matrix if analyzing lineage-pair data or a phylogenetic vcv matrix if analyzing species data).
#' @param iter Number of iterations to run on each chain; defaults to 2000 (more are often necessary).
#' @param chains Number of Markov chains to run; defaults to 4.
#' @param cores Number of cores to use.
#' @param ... additional arguments passed to \code{rstan::sampling()}, including control parmeters (see \code{rstan::sampling()} documentation)
#' @param coef.u Mean of the Gaussian prior for each preditor variable coefficient; defaults to 0.
#' @param coef.sd SD of the Gaussian prior for each preditor variable coefficient; defaults to 10.
#' @param physig2.u Mean of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to -1.
#' @param physig2.sd SD of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to 1.
#' @param randsig2.loc Location parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 0.
#' @param randsig2.sc Scale parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 2.5.
#' @return A list containing two elements: (1) the posterior distribution of model parameters, and (2) the log-likelihood of the posteriors for use in downstream analyses (e.g. the calculation of model fitting metrics like loo or waic)
#' @examples
#' \donttest{
#' ## Fit regression models with and without covariance matrix
#' # Note: data were simulated with Coef[1]=1 (intercept), Coef[2]=0.8 (slope)
#' # Load a data simulated with a non-independent response observations
#' data(data3)
#' # Also load the lineage-pair covariance matrix that arose from those simulations
#' data(sim.cov.pairs)
#' # Fit an OLS model
#' result1 = linreg.stan(des=data3[,3], y=data3[,4], cores=2)
#' # Fit an pgls model
#' result2 = linreg.stan(des=data3[,3], y=data3[,4], model="pgls", covmat=sim.cov.pairs, cores=2)
#'
#' # Compare posterior parameter estimates
#' result1[[1]]
#' result2[[1]]
#'
#' # Compare the fit of the two models via loo and waic
#' loo1 = loo::loo(result1[[2]])
#' loo2 = loo::loo(result2[[2]])
#' waic1 = loo::waic(result1[[2]])
#' waic2 = loo::waic(result2[[2]])
#' loo1
#' loo2
#' waic1
#' waic2
#' loo::loo_compare(loo1, loo2)
#' loo::loo_compare(waic1, waic2)
#'
#' # Extend the comparison by fitting a pgls.mm model
#' result3 = linreg.stan(des=data3[,3], y=data3[,4], model="pgls.mm",
#' covmat=sim.cov.pairs, cores=2)
#'
#' # Compare posterior parameter estimates
#' result1[[1]]
#' result2[[1]]
#' result3[[1]]
#'
#' # Compare the fit of the three models via loo and waic
#' loo3 = loo::loo(result3[[2]])
#' waic3 = loo::waic(result3[[2]])
#' loo::loo_compare(loo1, loo2, loo3)
#' loo::loo_compare(waic1, waic2, waic3)
#'}
#' @references
#' Anderson, S. A. S., et al. *In review*. The comparative analysis of lineage-pair data.
#' @export
linreg.stan = function(des, y, model="ols", covmat=NULL, iter=2000, chains=4, cores=4, coef.u=0,
coef.sd=10, physig2.u=-1, physig2.sd=1, randsig2.loc=0, randsig2.sc=2.5, ...) {
# Check if covariance matrix supplied appropriately
if(is.null(covmat) & model %in% c("pgls", "pgls.mm")) stop("User must supply a covariance matrix for this model")
if(!is.null(covmat) & model == "ols") stop("covariance matrix supplied for 'ols' model. Choose alternative model or don't supply covmat")
# Add an intercept column to design matrix if it's not already present
if(!all(as.matrix(des)[,1]==1)) des=cbind(rep(1, length(y)), des)
# Code the model
model.type = switch(model, ols = 1, pgls=2, pgls.mm=3, stop("'model' given to 'linreg.stan' must be one of 'ols', 'pgls', or 'pgls.mm'"))
# Prep stan data
stan.dat = list(N=length(y), K=ncol(des), Y=y, X=des, coef_mean=coef.u, coef_sd=coef.sd, sig2_mean=physig2.u, sig2_sd=physig2.sd, sig2_loc=randsig2.loc, sig2_sc=randsig2.sc, model_type=model.type)
# Add covariance matrix if necessary
stan.dat$Cp = if(model=="ols") diag(length(y)) else covmat
# Fit the model
fit = sampling(object = stanmodels$linreg, data = stan.dat, iter = iter, chains = chains, cores=cores, ...)
# Get the parameters from the summary
pars = if(!is.null(covmat)) rstan::summary(fit)$summary[c(1:4,dim(fit)[3]),] else pars = rstan::summary(fit)$summary[c(1:3,dim(fit)[3]),]
ll = loo::extract_log_lik(fit, parameter_name = "loglik")
# Package and return the results list
res = list(pars=round(pars,2), ll=ll)
return(res)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.