7. BLUP estimation from Linear Mixed Model | R Documentation |
Solves the Linear Mixed Model and calculates the Best Linear Unbiased Predictor (BLUP)
fitBLUP(y, X = NULL, Z = NULL, K = NULL,
U = NULL, d = NULL, varU = NULL,
varE = NULL, intercept = TRUE, BLUP = TRUE,
method = c("REML","ML"),
interval = c(1E-9,1E9), tol = 1E-8,
maxiter = 1000, n.regions = 10,
verbose = TRUE)
y |
(numeric matrix) Response variable. It can contain >1 columns, each of them will be fitted separately |
X |
(numeric matrix) Design matrix for fixed effects. When |
Z |
(numeric matrix) Design matrix for random effects. When |
K |
(numeric matrix) Kinship relationship matrix |
U, d |
(numeric matrix/vector) Eigenvectors and eigenvalues from eigen value decomposition of G = U D U' |
varU, varE |
(numeric) Genetic and residual variances. When both |
intercept |
|
BLUP |
|
method |
(character) Either 'REML' (Restricted Maximum Likelihood) or 'ML' (Maximum Likelihood) |
tol |
(numeric) Maximum error between two consecutive solutions (convergence tolerance) when finding the root of the log-likelihood's first derivative |
maxiter |
(integer) Maximum number of iterations to run before convergence is reached |
interval |
(numeric vector) Range of values in which the root is searched |
n.regions |
(numeric) Number of regions in which the searched 'interval' is divided for local optimization |
verbose |
|
The basic linear mixed model that relates phenotypes with genetic values is of the form
y = X b + Z u + e
where y is a vector with the response, b is the vector of fixed effects, u is the vector of the (random) genetic values of the genotypes, e is the vector of environmental residuals (random error), and X and Z are design matrices conecting the fixed and genetic effects with replicates. Genetic values are assumed to follow a Normal distribution as u ~ N(0,σ2uK), and the error terms are assumed e ~ N(0,σ2eI).
The vector of genetic values g = Z u will therefore follow g ~ N(0,σ2uG) where G = Z K Z'. In the un-replicated case, Z = I is an identity matrix, and hence g = u and G = K.
The predicted values utrn = (ui), i = 1,2,...,ntrn, corresponding to observed data (training set) are derived as
utrn = B (ytrn - Xtrnb)
where B is a matrix of weights given by
B = σ2uGtrn (σ2uGtrn + σ2eI)-1
where Gtrn is the sub-matrix corresponding to the training set. This matrix can be rewritten as
B = Gtrn (Gtrn + θI)-1
where θ = σ2e/σ2u is the residual/genetic variances ratio representing a ridge-like shrinkage parameter.
The matrix H = Gtrn + θI in the above equation can be used to obtain predictions corresponding to un-observed data (testing set), utst, by
B = Gtst,trn (Gtrn + θI)-1
where Gtst,trn is the sub-matrix of G corresponding to the testing set (in rows) and training set (in columns).
Solutions are found using the GEMMA (Genome-wide Efficient Mixed Model Analysis) approach (Zhou & Stephens, 2012). First, the Brent's method is implemented to solve for the genetic/residual variances ratio (i.e., 1/θ) from the first derivative of the log-likelihood (either REML or ML). Then, variances σ2u and σ2e are calculated. Finally, b is obtained using Generalized Least Squares.
Returns a list object that contains the elements:
b
: (vector) fixed effects solutions (including the intercept).
u
: (vector) random effects solutions.
varU
: random effect variance.
varE
: residual variance.
h2
: heritability.
convergence
: (logical) whether Brent's method converged.
method
: either 'REML' or 'ML' method used.
VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.
Zhou X, Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821-824
require(SFSI)
data(wheatHTP)
index = which(Y$trial %in% 1:10) # Use only a subset of data
Y = Y[index,]
M = scale(M[index,])/sqrt(ncol(M)) # Subset and scale markers
G = tcrossprod(M) # Genomic relationship matrix
y = as.vector(scale(Y[,"E1"])) # Scale response variable
# Training and testing sets
tst = which(Y$trial %in% 1:2)
trn = which(!Y$trial %in% 1:2)
yNA <- y
yNA[tst] <- NA
fm = fitBLUP(yNA, K=G)
plot(y[tst],fm$u[tst]) # Predicted vs observed values in testing set
cor(y[tst],fm$u[tst]) # Prediction accuracy in testing set
cor(y[trn],fm$u[trn]) # Prediction accuracy in training set
fm$varU # Genetic variance
fm$varE # Residual variance
fm$h2 # Heritability
fm$b # Intercept
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.