View source: R/GridLMM_GWAS_set.R
GridLMM_GWAS_set | R Documentation |
Performs a GWAS of set-tests using the GridLMM algorithm. Can perform likelihood ratio tests (LRTs), Wald tests, or calculate Bayes Factors. By default, it uses the targeted grid-search heuristic (the fast algorithm), though it can perform a full grid search as well.
GridLMM_GWAS_set(
formula,
data,
weights = NULL,
X,
X_ID = "ID",
set_matrix,
centerX = FALSE,
scaleX = FALSE,
fillNAX = FALSE,
X_map = NULL,
relmat = NULL,
normalize_relmat = TRUE,
h2_step = 0.01,
h2_start = NULL,
h2_start_tolerance = 0.001,
max_steps = 100,
method = c("REML"),
algorithm = c("Fast", "Full"),
inv_prior_X = NULL,
target_prob = 0.99,
proximal_markers = NULL,
proximal_Xs = NULL,
V_setup = NULL,
save_V_folder = NULL,
diagonalize = T,
mc.cores = my_detectCores(),
verbose = T
)
formula |
A two-sided linear formula as used in lmer. |
data |
A data frame containing the variables named in formula. |
weights |
An optional vector of observation-specific weights. |
X |
Matrix of markers (one column per marker) with rownames corresponding to the values of data[[X_ID]]. |
X_ID |
Column of data whose values identify the corresponding rows of X. |
centerX, scaleX, fillNAX |
TRUE/FALSE for each. Applied to the X matrix before testing. |
X_map |
Optional. Data frame with information on each marker such as chromosome, position, etc. Will be appended to the results. |
relmat |
Either:
1) A list of matrices that are proportional to the (within) covariance structures of the group level effects.
2) A list of lists with elements describing each covariance structure. |
h2_step |
Step size of the h2 grid. |
h2_start |
Optional. Matrix with each row a vector of variance component proportions (h2 values) at which to start the grid search. |
h2_start_tolerance |
Optional. Grid size for GridLMM_ML in finding ML/REML solutions for the null model. |
max_steps |
Maximum iterations of the heuristic algorithm per marker. |
method |
One of 'REML', 'ML', or 'BF'. 'REML' implies a Wald test. 'ML' implies maximum-likelihood evaluation, with the LRT. 'BF' does posterior evaluation and calculates Bayes Factors. |
algorithm |
Either 'Fast' or 'Full'. See details. |
inv_prior_X |
Vector of values for the prior precision of each of the fixed effects (including an intercept). Will be recycled if necessary. |
target_prob |
see Details |
proximal_markers |
A list of integer vectors with length equal to the number of columns of X, each giving the indices of markers considered proximal to the corresponding test (and so excluded, by downdating, from the random-effect covariance). |
proximal_Xs |
Optional. A list of matrices to be used for downdating GRMs. If multiple GRMs are calculated from markers,
this list can have multiple elements. Each matrix should have rownames matching those of X. |
V_setup |
Optional. A list produced by a GridLMM function containing the pre-processed V decompositions for each grid vertex, or the information necessary to create this. Generally saved from a previous run of GridLMM on the same data. |
save_V_folder |
Optional. A character vector giving a folder to save pre-processed V decomposition files for future / repeated use. If NULL, V decompositions are stored in memory. |
diagonalize |
If TRUE and the model includes only a single random effect, the "GEMMA" trick will be used to diagonalize V. This is done by calculating the SVD of K, which can be slow for large samples. |
mc.cores |
Number of processor cores used for parallel evaluations. Note that this uses 'mclapply', so memory requirements grow rapidly with mc.cores. |
verbose |
Should progress be printed to the screen? |
test_formula |
One-sided formula for the alternative model (ML or BF), or full model (REML), to be applied to each test (i.e., each marker, or column of X). |
reduced_formula |
One-sided formula for the reduced (null) model. Same format as test_formula. |
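To illustrate the input shapes the arguments above describe, here is a minimal base-R sketch that simulates a marker matrix with rownames matching data[[X_ID]], a GRM-style relmat entry, and a set_matrix. The sizes, the marker-by-set indicator layout of set_matrix, and the formula are all illustrative assumptions, not requirements taken from the package; the call itself only runs if GridLMM is installed.

```r
# Simulate inputs with shapes plausible for GridLMM_GWAS_set (sizes are arbitrary).
set.seed(1)
n <- 50; p <- 20                          # individuals, markers
ids <- sprintf("ind_%02d", seq_len(n))

# Marker matrix: rownames must correspond to data[[X_ID]].
X <- matrix(rbinom(n * p, 2, 0.3), n, p,
            dimnames = list(ids, sprintf("m%02d", seq_len(p))))
data <- data.frame(ID = ids, y = rnorm(n))

# Hypothetical set definition: a markers-by-sets indicator matrix
# (here, 4 sets of 5 consecutive markers each).
set_matrix <- kronecker(diag(4), matrix(1, 5, 1))
rownames(set_matrix) <- colnames(X)

# A matrix proportional to the covariance of the ID-level random effect.
K <- tcrossprod(scale(X)) / p
dimnames(K) <- list(ids, ids)

if (requireNamespace("GridLMM", quietly = TRUE)) {
  res <- GridLMM::GridLMM_GWAS_set(
    formula = y ~ 1 + (1 | ID), data = data,
    X = X, X_ID = "ID", set_matrix = set_matrix,
    relmat = list(ID = K), method = "REML")
}
```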
GridLMM performs approximate likelihood- or posterior-based inference for linear mixed models efficiently by
finding solutions to many models in parallel. Rather than optimizing to high precision for each separate model, GridLMM
finds "good enough" solutions that satisfy many tests at once, so the expensive calculations can be re-used. It does
this by trying variance components on a grid and selecting the best grid cell for each model. The 'Full' algorithm
performs a full grid search over all variance component parameters. The 'Fast' algorithm uses heuristics to reduce
the number of grid cells that need to be evaluated: it starts from the maximum likelihood solution under a null model
with no markers, and then works out to neighboring grid cells from there.
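The contrast between the two algorithms can be sketched on a one-dimensional h2 grid with a toy objective. The objective function and the fast_search helper below are invented for illustration (they are not GridLMM internals): the Full strategy scores every vertex, while the Fast heuristic starts at a null-model optimum and steps to better neighboring vertices until none improves.

```r
# Toy objective standing in for a profiled log-likelihood over h2 (invented).
loglik <- function(h2) -(h2 - 0.37)^2

h2_grid <- seq(0, 0.99, by = 0.01)        # analogous to h2_step = 0.01

# 'Full' strategy: evaluate every grid vertex.
full_best <- h2_grid[which.max(sapply(h2_grid, loglik))]

# 'Fast' heuristic: start near the null-model optimum, move to better neighbors.
fast_search <- function(start, grid, f, max_steps = 100) {
  i <- which.min(abs(grid - start))       # nearest grid vertex to the start
  for (step in seq_len(max_steps)) {
    nbrs <- c(i - 1, i + 1)
    nbrs <- nbrs[nbrs >= 1 & nbrs <= length(grid)]
    best <- nbrs[which.max(f(grid[nbrs]))]
    if (f(grid[best]) <= f(grid[i])) return(grid[i])  # no neighbor improves: stop
    i <- best
  }
  grid[i]
}
fast_best <- fast_search(0.30, h2_grid, loglik)  # 0.30 plays the null-model h2
```

Both strategies land on the same vertex here; the fast search simply evaluates far fewer grid cells along the way.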
Posterior inference involves an adaptive grid search. Generally, we start with a very coarse grid (with as few as 2-3 vertices per variance component)
and then progressively increase the grid resolution, focusing only on regions of high posterior probability. This is controlled
by h2_divisions, target_prob, thresh_nonzero, and thresh_nonzero_marginal. The sampling algorithm is as follows:
1) Start by evaluating the posterior at each vertex of a trial grid with resolution m.
2) Find the minimum number of vertices needed to sum to target_prob of the current (discrete) posterior. Repeat for the marginal posteriors of each variance component.
3) If these numbers are smaller than thresh_nonzero or thresh_nonzero_marginal, respectively, form a new grid by increasing the grid resolution to m/2. Otherwise, STOP.
4) Begin evaluating the posterior at the new grid only at those grid vertices that are adjacent (in any dimension) to any of the top grid vertices in the old grid.
5) Re-evaluate the distribution of the posterior over the new grid. If any new vertices contribute to the top target_prob fraction of the overall posterior, include these in the "top" set and return to step 4. Note: the prior weights for the grid vertices must be updated each time the grid increases in resolution.
6) Repeat steps 4-5 until no new grid vertices contribute to the "top" set.
7) Repeat steps 2-6 until a STOP is reached at step 3.
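The refinement loop above can be sketched in one dimension with a toy unnormalized posterior. This is a simplified illustration, not the package's implementation: the posterior function is invented, the marginal check is omitted (one dimension), and "adjacent" is approximated as lying within one old grid step of a top vertex.

```r
# Toy unnormalized posterior over a single h2 (invented for illustration).
post <- function(h2) exp(-((h2 - 0.4) / 0.05)^2)

target_prob    <- 0.99
thresh_nonzero <- 10

m    <- 0.25                              # start with a very coarse grid
grid <- seq(0, 1, by = m)
repeat {
  w   <- post(grid); w <- w / sum(w)      # discrete posterior over current grid
  ord <- order(w, decreasing = TRUE)
  top <- ord[seq_len(which(cumsum(w[ord]) >= target_prob)[1])]
  if (length(top) >= thresh_nonzero) break  # enough vertices carry the mass: STOP
  m    <- m / 2                             # increase the grid resolution
  # Keep only new vertices near the old top set (simplified adjacency rule).
  cand <- seq(0, 1, by = m)
  grid <- cand[sapply(cand, function(v) any(abs(v - grid[top]) <= 2 * m))]
}
```

Each pass concentrates evaluation around the high-probability region, so most fine-grid vertices far from the mode are never scored.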
A list with two elements:
results |
A data frame with each row the results of the association test for a column of X (or set of markers). |
setup |
A list with several objects needed for re-running the model, including V_setup. |