GridLMM_posterior: Evaluate the posterior of a linear mixed model using the...
In deruncie/GridLMM: Efficient Mixed Models for GWAS with multiple Random Effects

View source: R/GridLMM.R

GridLMM_posterior

R Documentation

Evaluate the posterior of a linear mixed model using the Grid-sampling algorithm

Description

Performs approximate posterior inference of a linear mixed model by evaluating the posterior over a grid of values of the variance component proportions. The grid evaluation uses a heuristic to avoid traversing the entire grid. This should work well as long as the posterior is unimodal.

Usage

GridLMM_posterior(
  formula,
  data,
  weights = NULL,
  relmat = NULL,
  normalize_relmat = TRUE,
  h2_divisions = 10,
  h2_prior = function(h2s, n) 1/n,
  a = 0,
  b = 0,
  inv_prior_X = 0,
  target_prob = 0.99,
  thresh_nonzero = 10,
  thresh_nonzero_marginal = 0,
  V_setup = NULL,
  save_V_folder = NULL,
  diagonalize = T,
  mc.cores = my_detectCores(),
  verbose = T
)

Arguments

`formula`	A two-sided linear formula as used in `lmer`, with the response on the LHS and the fixed and random effects of the model on the RHS. Note: correlated random effects are not implemented, so using one (`|`) or two (`||`) vertical bars is identical. At least one random effect is required.
`data`	A data frame containing the variables named in `formula`.
`weights`	An optional vector of observation-specific weights.
`relmat`	A list of matrices that are proportional to the (within) covariance structures of the group level effects. The names of the matrices should correspond to the columns in `data` that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix.
`normalize_relmat`	Should each matrix in `relmat` be normalized so that `mean(diag) == 1`?
`h2_divisions`	Starting number of divisions of the grid for each variance component.
`h2_prior`	Function that takes two arguments: 1) a vector of `h2s` (i.e., variance component proportions), and 2) an integer giving the number of vertices in the full grid. The function should return a non-negative value giving the prior weight of the grid cell corresponding to `h2s`.
`a`, `b`	Shape and Rate parameters of the Gamma prior for the residual variance `\sigma^2`. Setting both to zero gives a limiting "default" prior.
`inv_prior_X`	Vector of values for the prior precision of each of the fixed effects (including an intercept). Will be recycled if necessary.
`target_prob`	See Details.
`thresh_nonzero`	See Details.
`thresh_nonzero_marginal`	See Details.
`V_setup`	Optional. A list produced by a GridLMM function containing the pre-processed V decompositions for each grid vertex, or the information necessary to create this. Generally saved from a previous run of GridLMM on the same data.
`save_V_folder`	Optional. A character vector giving a folder in which to save pre-processed V decomposition files for future / repeated use. If `NULL`, V decompositions are stored in memory.
`diagonalize`	If TRUE and the model includes only a single random effect, the "GEMMA" trick will be used to diagonalize V. This is done by calculating the SVD of K, which can be slow for large samples.
`mc.cores`	Number of cores to use for parallel evaluations.
`verbose`	Should progress be printed to the screen?
`svd_K`	If TRUE and `nrow(K) < nrow(Z)`, then the SVD is done on `K` instead of `ZKZ^T`
`drop0_tol`	Values closer to zero than this will be set to zero when forming sparse matrices.
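
As a rough illustration of the `relmat` and `normalize_relmat` arguments, the base-R sketch below builds a relationship matrix whose rownames match the levels of a grouping column and rescales it so that the mean of its diagonal is 1. The data frame `dat`, the column `id`, and the matrix `K` are made up for illustration, and the rescaling shown is an assumption about what the normalization amounts to, not the package's internal code.

```r
# Hypothetical data: 4 individuals, 2 observations each.
set.seed(1)
dat <- data.frame(
  id = factor(rep(paste0("ind", 1:4), each = 2)),
  y  = rnorm(8)
)

# Hypothetical relationship matrix built from random "markers";
# tcrossprod(M) is symmetric and positive semi-definite.
M <- matrix(rnorm(4 * 20), nrow = 4)
K <- tcrossprod(M) / ncol(M)
rownames(K) <- colnames(K) <- levels(dat$id)

# Every level of the grouping factor should appear among the rownames.
stopifnot(all(levels(dat$id) %in% rownames(K)))

# Rescale so that mean(diag(K)) == 1 (assumed meaning of normalize_relmat).
K_norm <- K / mean(diag(K))

# The list name matches the grouping column used in the formula, e.g. (1 | id).
relmat <- list(id = K_norm)
```

A list shaped like `relmat` here is what the `relmat` argument describes; if the matrices are already on the desired scale, `normalize_relmat = FALSE` would presumably leave them untouched.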

Details

Posterior inference involves an adaptive grid search. Generally, we start with a very coarse grid (with as few as 2-3 vertices per variance component) and then progressively increase the grid resolution, focusing only on regions of high posterior probability. This is controlled by h2_divisions, target_prob, thresh_nonzero, and thresh_nonzero_marginal. The sampling algorithm is as follows:

1. Start by evaluating the posterior at each vertex of a trial grid with resolution m.
2. Find the minimum number of vertices needed to sum to target_prob of the current (discrete) posterior. Repeat for the marginal posteriors of each variance component.
3. If these numbers are smaller than thresh_nonzero or thresh_nonzero_marginal, respectively, form a new grid by increasing the grid resolution to m/2. Otherwise, STOP.
4. Begin evaluating the posterior on the new grid, only at those grid vertices that are adjacent (in any dimension) to any of the top grid vertices in the old grid.
5. Re-evaluate the distribution of the posterior over the new grid. If any new vertices contribute to the top target_prob fraction of the overall posterior, include these in the "top" set and return to step 4. Note: the prior weights for the grid vertices must be updated each time the grid increases in resolution.
6. Repeat steps 4-5 until no new grid vertices contribute to the "top" set.
7. Repeat steps 2-6 until a STOP is reached at step 3.
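
The stopping check in the second and third steps can be sketched in a few lines of base R. This is only an illustration under simple assumptions: `n_top_vertices` is a hypothetical helper, not a GridLMM function, and the toy posterior is invented.

```r
# Smallest number of grid vertices whose posterior mass sums to at least
# target_prob. `post` holds (possibly unnormalized) posterior weights.
n_top_vertices <- function(post, target_prob) {
  post <- post / sum(post)                       # discrete posterior
  cum_mass <- cumsum(sort(post, decreasing = TRUE))
  idx <- which(cum_mass >= target_prob)
  if (length(idx) > 0) idx[1] else length(post)  # guard against round-off
}

post <- c(0.02, 0.10, 0.50, 0.30, 0.08)          # toy posterior, 5 vertices
n_top <- n_top_vertices(post, target_prob = 0.85)

# Refine the grid only if fewer than thresh_nonzero vertices carry the bulk
# of the posterior; otherwise the current resolution is considered adequate.
thresh_nonzero <- 10
refine <- n_top < thresh_nonzero
```

In this toy case three vertices already hold at least 85% of the mass, so with `thresh_nonzero = 10` the grid would be refined. The same count would be repeated on each marginal posterior and compared against `thresh_nonzero_marginal`.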

Note: Default parameters for priors give flat (improper) priors. These should be used with care, especially for calculations of Bayes Factors.
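
To make the `h2_prior` interface concrete, here is a minimal sketch of the default flat prior next to an informative alternative. `flat_prior` simply names the default from the Usage section; `beta_prior` and its shape parameters are hypothetical and chosen only for illustration. The only things taken from the documentation are the two-argument signature and the requirement of a non-negative return value.

```r
# Default from the Usage section: equal weight on each of the n grid vertices.
flat_prior <- function(h2s, n) 1 / n

# Hypothetical alternative: a product of Beta(2, 2) densities over the
# variance component proportions, scaled by the number of vertices.
beta_prior <- function(h2s, n) {
  prod(dbeta(h2s, shape1 = 2, shape2 = 2)) / n
}

w_flat     <- flat_prior(c(0.2, 0.5), n = 100)
w_beta     <- beta_prior(c(0.2, 0.5), n = 100)
w_boundary <- beta_prior(c(0.0, 0.5), n = 100)   # vertex on the boundary
```

Note that a Beta(2, 2) weight is exactly zero whenever any proportion is 0 or 1, so a prior like this excludes boundary vertices entirely; whether that is desirable depends on the analysis.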

Value

A list with three elements:

`h2s_results`	A data frame in which each row is an evaluated grid vertex, with the first `l` columns giving the `h^2`'s and the final column giving the corresponding posterior mass.
`h2s_solutions`	A list with the parameters of the NIG distribution for each grid vertex.
`V_setup`	The `V_setup` object for this model. Can be re-passed to this function (or other GridLMM functions) to re-fit the model to the same data.
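
One way to summarize a result shaped like `h2s_results` is sketched below on a mock data frame. The column names and values are invented; only the layout (variance component proportions first, posterior mass last) follows the description above, so treat this as an assumption-laden illustration and not as output from the function.

```r
# Mock h2s_results-like data frame for two variance components.
h2s_results <- data.frame(
  h2_A      = c(0.0, 0.1, 0.2, 0.1, 0.2),
  h2_B      = c(0.1, 0.1, 0.1, 0.2, 0.2),
  posterior = c(0.05, 0.40, 0.25, 0.20, 0.10)
)

l    <- ncol(h2s_results) - 1              # number of variance components
H    <- as.matrix(h2s_results[, seq_len(l), drop = FALSE])
mass <- h2s_results[[l + 1]]
mass <- mass / sum(mass)                   # renormalize in case mass != 1

# Posterior mean of each proportion: mass-weighted average over vertices.
post_mean <- colSums(H * mass)

# Marginal posterior of the first component: sum the mass over all
# vertices that share the same value of that component.
marginal_A <- tapply(mass, h2s_results[[1]], sum)
```

Because the grid is evaluated only in regions of high posterior probability, summaries like these are approximations whose accuracy depends on `target_prob` and the final grid resolution.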