Linear Mixed Models using Gibbs Sampling
Description
This function runs linear mixed models of the following form:
\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_{1}\mathbf{u}_1 + \mathbf{Z}_{2}\mathbf{u}_2 + \mathbf{Z}_{3}\mathbf{u}_3 + ... + \mathbf{Z}_{k}\mathbf{u}_k + \mathbf{e}
The function allows to include an artificial number of independent random effects with each of them being assumed to follow: MVN(\mathbf{0},\mathbf{I}σ^2_{u_k}). If the covariance structure of one random effect is assumed to follow some \mathbf{G} then it it necessary to construct the design matrix for that random effect as described in Waldmann et al. (2008): \mathbf{F} = \mathbf{ZG}^{1/2}.
Usage
1 2 
Arguments
y 
vector of phenotypes 
X 
Fixed effects design matrix of type: 
random 
list of design matrices for random effects  every element of the list represents one random effect and may be of type: 
par_random 
list of options for random effects. If passed, the list must have as many elements as

niter 
number of iterations 
burnin 
number of iterations to be discarded as burnin 
verbose 
prints progress to the screen 
scale_e 
scale parameter for the inverse chisquare prior for the residuals 
df_e 
degrees of freedom for the inverse chisquare prior for the residuals 
seed 
seed for the random number generator. If omitted, a seed will be generated based on machine and time 
Details
At this point the function allows to specify the method for any random term as: 'random' or 'BayesA'. 'random' assumes a common variance for all levels of the random effect, 'BayesA' assumes every level of the random effect to have its own distribution and variance as described in Meuwissen et al. (2001). A wider range of methods is available in the excellent BGLRpackage, which also allows phenotypes to be discrete (de los Campos et al. 2013).
The focus of this function is to allow solving highdimensional problems that are mixtures of sparse and dense features in the design matrices.
The computational expensive parts of the Gibbs Sampler are parallelized as described in Fernando et al. (2014).
Note that the parallel performance highly depends on the number of observations and features present in the design matrices.
It is highly recommended to set the number of threads for less than 10000 observations (length of phenotype vector) to 1 using: set_num_threads(1)
before
running a model. Even for larger sample sizes the parallel performance still depends on the dimension of the feature matrices.
Good results in terms of parallel scaling were observed starting from 50000 observations and more than 10000 features (i.e. number of markers).
Single threaded performance is very good thanks to smart computations during gibbs sampling (Fernando, 2013 (personal communication), de los Campos et al., 2009)
and the use of efficient Eigenmethods for dense and sparse algebra.
The function is capable of running Single Step Bayesian Regression (Fernando et al., 2014).
Value
List of 3 + number of random effects:
Residual_Variance 
List of 4:

Predicted 
numeric vector of predicted values 
Effect_1 
List of 4:

Susequently as many additional items as random effects of the following form
Effect_k 
List of 4:

Author(s)
Claas Heuer
Credits: Xiaochen Sun (Iowa State University, Ames) gave strong assistance in the theoretical parts and contributed in the very first implementation of the Gibbs Sampler. Essential parts were adopted from the BayesCimplementation of Rohan Fernando and the BLRpackage of Gustavo de los Campos. The idea of how to parallelize the single site Gibbs Sampler came from Rohan Fernando (2013).
References
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel, and J. M. Cotes. "Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree." Genetics 182, no. 1 (May 1, 2009): 37585. doi:10.1534/genetics.109.101501.
Waldmann, Patrik, Jon Hallander, Fabian Hoti, and Mikko J. Sillanpaa. "Efficient Markov Chain Monte Carlo Implementation of Bayesian Analysis of Additive and Dominance Genetic Variances in Noninbred Pedigrees." Genetics 179, no. 2 (June 1, 2008): 110112. doi:10.1534/genetics.107.084160.
Meuwissen, T., B. J. Hayes, and M. E. Goddard. "Prediction of Total Genetic Value Using GenomeWide Dense Marker Maps." Genetics 157, no. 4 (2001): 181929.
de los Campos, Gustavo, Paulino Perez Rodriguez, and Maintainer Paulino Perez Rodriguez. "Package 'BGLR,'" 2013. ftp://128.31.0.28/pub/CRAN/web/packages/BGLR/BGLR.pdf.
See Also
clmm.CV, cGBLUP, cGWAS.emmax
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  ### Running a model with an additive and dominance effect
## Not run:
# generate random data
rand_data(500,5000)
### compute the relationship matrices
G.A < cgrm.A(M,lambda=0.01)
G.D < cgrm.D(M,lambda=0.01)
### generate the list of design matrices for clmm
random = list(t(chol(G.A)),t(chol(G.D)))
### specify options
par_random = list(list(method="random",scale=var(y)/2,df=5),
list(method="random",scale=var(y)/10,df=5))
### run
fit < clmm(y,random=random,par_random=par_random,niter=5000,burnin=2500)
### inspect results
str(fit)
## End(Not run)
