knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
# library(GAPIT3documentation)
knitr::opts_chunk$set(fig.align = "center")
https://doi.org/10.3389/fgene.2020.566314
$$ y = X\alpha + K\beta + \epsilon $$
Fixed effects: $X, \alpha$
Random effects: $K, \beta$, and $\epsilon$
$$ y_{i} = s_{i} + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
The t-test is not implemented in GAPIT3. It is presented here as a conceptual starting point, because the models that are implemented in GAPIT3 can be seen as models that build on the t-test by adding parameters.
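As a minimal sketch of this starting point, the following uses simulated data (the names `snp` and `y` are illustrative and not part of GAPIT3). It assumes a marker with two genotype classes and shows that an equal-variance t-test gives the same p-value as a one-marker linear model, which is why later models can be read as extensions of it.

```r
set.seed(1)
n <- 200

# Hypothetical marker with two genotype classes, coded 0/1
snp <- rbinom(n, size = 1, prob = 0.5)

# Simulated phenotype: marker effect plus normally distributed error
y <- 0.5 * snp + rnorm(n, mean = 0, sd = 1)

# Classical two-sample t-test (equal variances assumed)
tt <- t.test(y ~ snp, var.equal = TRUE)

# The same test written as a linear model, y = s + e
fit <- lm(y ~ snp)
p_lm <- summary(fit)$coefficients["snp", "Pr(>|t|)"]

# The two p-values agree up to floating-point error
all.equal(tt$p.value, p_lm)
```

Writing the test as `lm()` is what makes it easy to add further terms, such as population structure, to the right-hand side.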
The General Linear Model [@lipka2012gapit] (not to be confused with the generalized linear model)
$$ y = s_{i} + Q + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
The $Q$ matrix may be obtained from clustering algorithms such as 'admixture', 'DAPC', 'kmeans', 'STRUCTURE', or 'fastSTRUCTURE'.
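The effect of adding $Q$ can be sketched with base R on simulated data. Everything here is an assumption made for illustration: two subpopulations, a phenotype that depends only on subpopulation, and a $Q$ obtained from `kmeans()` on principal components. This is not GAPIT3's internal code.

```r
set.seed(2)
n <- 200
m <- 50

# Two hypothetical subpopulations with different allele frequencies
pop <- rep(1:2, each = n / 2)
freq <- rbind(runif(m, 0.1, 0.4), runif(m, 0.6, 0.9))
freq[, 1] <- c(0.2, 0.8)  # make the tested marker strongly differentiated
geno <- sapply(seq_len(m), function(j) rbinom(n, size = 2, prob = freq[pop, j]))

# Phenotype depends on subpopulation only, not on any marker
y <- 2 * (pop == 2) + rnorm(n)

# Q: cluster membership from k-means on the leading principal components
pcs <- prcomp(geno)$x[, 1:3]
Q <- factor(kmeans(pcs, centers = 2, nstart = 10)$cluster)

# Test the first marker without and with Q as a covariate
snp <- geno[, 1]
p_naive <- summary(lm(y ~ snp))$coefficients["snp", "Pr(>|t|)"]
p_glm <- summary(lm(y ~ snp + Q))$coefficients["snp", "Pr(>|t|)"]
c(naive = p_naive, with_Q = p_glm)
```

Without $Q$, the marker appears associated only because its allele frequency differs between subpopulations. Including $Q$ is intended to absorb that structure.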
The Mixed Linear Model [@flint2005maize; @lipka2012gapit]
$$ y = s_{i} + Q + K + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
The MLM is 'mixed' in that it incorporates both 'fixed effects' (population structure, $Q$) and 'random effects' (kinship, $K$) into the model.
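To make $K$ concrete, here is a minimal sketch of one common way to compute a kinship (genomic relationship) matrix from marker data, in the style of VanRaden. The text above does not say which kinship estimator is used, so treat this choice, the simulated genotypes, and all variable names as assumptions.

```r
set.seed(3)
n <- 20   # individuals
m <- 500  # markers

# Simulated genotypes coded as 0/1/2 copies of one allele (individuals in rows)
p <- runif(m, 0.1, 0.9)
geno <- sapply(p, function(pj) rbinom(n, size = 2, prob = pj))

# Center each marker by twice its observed allele frequency
p_hat <- colMeans(geno) / 2
Z <- sweep(geno, 2, 2 * p_hat)

# VanRaden-style relationship matrix: Z Z' scaled by 2 * sum(p * (1 - p))
K <- tcrossprod(Z) / (2 * sum(p_hat * (1 - p_hat)))

dim(K)
isSymmetric(K)
```

The result is an individual-by-individual matrix. In the MLM it defines the covariance among the random effects of the individuals.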
Multiple Loci Mixed Model [@segura2012efficient]
$$ y = s_{i} + S + Q + K + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
Uses forward and backward stepwise regression
Iterative!
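The idea of forward and backward stepwise selection can be sketched with `step()` from base R. This is a simplification and an assumption on my part: it uses an ordinary linear model and AIC, whereas MLMM performs the stepwise search inside a mixed model. The marker names `m1` to `m10` are hypothetical.

```r
set.seed(4)
n <- 300
m <- 10

# Simulated genotypes for ten hypothetical markers, m1 ... m10
geno <- matrix(rbinom(n * m, size = 2, prob = 0.5), nrow = n,
               dimnames = list(NULL, paste0("m", seq_len(m))))

# Only m1 and m2 affect the phenotype
dat <- data.frame(y = geno[, "m1"] - geno[, "m2"] + rnorm(n), geno)

# Start from the intercept-only model and allow markers to enter or leave
null_fit <- lm(y ~ 1, data = dat)
full_scope <- reformulate(colnames(geno))
sel <- step(null_fit, scope = full_scope, direction = "both", trace = 0)

# Markers retained as covariates by the stepwise search
selected <- attr(terms(sel), "term.labels")
selected
```

Each iteration adds the marker that most improves the model or drops one that no longer helps. The selected markers play the role of the covariate term $S$ in the equation above.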
Settlement of MLM Under Progressively Exclusive Relationship [@tang2016gapit]. SUPER is an advanced version of FaST-Select [@wang2014super].
$$ y = s_{i} + Q + K + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
$K$ is derived from associated markers. The 'association' is determined by selecting markers that exhibit low linkage disequilibrium (a form of correlation) with markers of interest. The threshold for 'low' is specified by the user.
Iterative!
Linkage disequilibrium
The major difference between SUPER and FaST-Select is that SUPER uses a bin approach to select associated markers.
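Linkage disequilibrium between two markers is often summarized as the squared correlation ($r^2$) of their genotype codes. The sketch below computes it for simulated markers and compares it to a user-chosen threshold. The value 0.1 and the names `snp_a`, `snp_b`, `snp_c`, and `ld_threshold` are hypothetical and are not GAPIT3 parameters.

```r
set.seed(5)
n <- 500

# Marker of interest
snp_a <- rbinom(n, size = 2, prob = 0.3)

# snp_b copies snp_a for most individuals, mimicking tight linkage
redraw <- rbinom(n, size = 1, prob = 0.1) == 1
snp_b <- ifelse(redraw, rbinom(n, size = 2, prob = 0.3), snp_a)

# snp_c is simulated independently of snp_a
snp_c <- rbinom(n, size = 2, prob = 0.3)

# Genotype-based r^2 as a simple measure of linkage disequilibrium
r2 <- c(linked = cor(snp_a, snp_b)^2, unlinked = cor(snp_a, snp_c)^2)

# Hypothetical user-chosen threshold for 'low' linkage disequilibrium
ld_threshold <- 0.1
r2 < ld_threshold
```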
Fixed and random model Circulating Probability Unification [@liu2016iterative]
$$ y = s_{i} + S + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
$$ y = K + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
FarmCPU completely removes the confounding between kinship and markers by using a fixed effect model without a kinship derived either from all markers or from associated markers. Instead, a kinship derived from associated markers is used to select further associated markers with a maximum likelihood method.
This process overcomes the model overfitting problems of stepwise regression. FarmCPU uses the fixed effect model and the random effect model iteratively.
Iterative!
Linkage disequilibrium
Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway [@huang2019blink]
$$ y = s_{i} + S + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
$$ y = S + \epsilon_{ \mathcal{N}(0,\,\sigma^{2}) } $$
The random effect model that FarmCPU uses to select associated markers by maximum likelihood remains computationally expensive for large numbers of individuals. BLINK approximates the maximum likelihood with the Bayesian Information Criterion (BIC) in a fixed effect model to reduce the computational burden.
Iterative!
Linkage disequilibrium
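The role of BIC in choosing which markers to keep can be sketched with fixed effect models in base R. This illustrates the criterion only and is not the BLINK algorithm itself. The data are simulated and the marker names `m1` and `m2` are hypothetical.

```r
set.seed(6)
n <- 300

# Two simulated markers; only m1 affects the phenotype
m1 <- rbinom(n, size = 2, prob = 0.5)
m2 <- rbinom(n, size = 2, prob = 0.5)
y <- 0.8 * m1 + rnorm(n)

# Candidate fixed effect models with different sets of marker covariates
fits <- list(
  none  = lm(y ~ 1),
  m1    = lm(y ~ m1),
  m1_m2 = lm(y ~ m1 + m2)
)

# BIC penalizes each extra parameter by log(n); lower is better
bic <- sapply(fits, BIC)
bic
names(which.min(bic))
```

Because each candidate is an ordinary least-squares fit, comparing BIC values avoids the repeated maximum likelihood fits of a random effect model.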
Other major options
In the compressed MLM (CMLM), individuals are replaced by groups determined by statistical clustering [@zhang2010mixed].
The optimal compression level (group number) is determined by fitting a series of mixed models. The user specifies the search range and interval (group.from, group.to and group.by). If no range is specified, the model is run as an MLM.
The mixed linear model (MLM) is an extreme case of CMLM where each individual is considered its own group. The general linear model (GLM) is another extreme case of CMLM where all individuals are considered to be part of one group.
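The compression step and its two extremes can be sketched with hierarchical clustering in base R. The distance, the average-linkage method, the mean-based group kinship, and the helper `compress()` are all assumptions made for illustration. They are not taken from GAPIT3's implementation.

```r
set.seed(7)
n <- 12
m <- 200

# Simulated genotypes and a simple stand-in for a kinship matrix
geno <- matrix(rbinom(n * m, size = 2, prob = 0.5), nrow = n)
K <- tcrossprod(scale(geno, scale = FALSE)) / m

# Cluster individuals on a kinship-derived distance
hc <- hclust(as.dist(max(K) - K), method = "average")

# Hypothetical helper: assign individuals to k groups and average K by group
compress <- function(k) {
  grp <- cutree(hc, k = k)
  G <- outer(grp, seq_len(k), "==") * 1  # n x k group membership matrix
  Kg <- t(G) %*% K %*% G / tcrossprod(colSums(G))
  list(groups = grp, kinship = Kg)
}

dim(compress(4)$kinship)            # kinship among 4 groups
length(unique(compress(n)$groups))  # n groups: every individual alone (MLM)
length(unique(compress(1)$groups))  # 1 group: all individuals together (GLM)
```

Searching over the number of groups between these extremes corresponds to the search range and interval described above.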
Enriched CMLM (ECMLM)
P3D/EMMAx: In addition to implementing compression, GAPIT uses EMMAx/P3D [1, 17] to reduce computing time for MLM, CMLM, ECMLM, and SUPER. If specified, the additive genetic ($\sigma^{2}_{a}$) and residual ($\sigma^{2}_{e}$) variance components are estimated prior to conducting GWAS. These estimates are then reused for each SNP when the mixed model is fitted.
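In symbols, and assuming the usual mixed-model covariance structure, the two stages can be sketched as follows. The notation $\hat{V}$, $X_{j}$, and $\hat{b}_{j}$ is introduced here for illustration only. First, the variance components are estimated once from the model without any SNP:

$$ \hat{V} = \hat{\sigma}^{2}_{a} K + \hat{\sigma}^{2}_{e} I $$

Then, for each SNP $j$, with $X_{j}$ holding the fixed effects plus that SNP, only a generalized least squares step is repeated:

$$ \hat{b}_{j} = (X_{j}^{\top} \hat{V}^{-1} X_{j})^{-1} X_{j}^{\top} \hat{V}^{-1} y $$

Because $\hat{V}$ is not re-estimated for every marker, the cost per SNP is much lower than refitting the full mixed model each time.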
EMMA [@kang2008efficient].
Genomic best linear unbiased prediction (gBLUP)
Compressed best linear unbiased prediction (cBLUP)
SUPER genomic best linear unbiased prediction (sBLUP)