clmm: Linear Mixed Models using Gibbs Sampling
In cgenpp: Parallel genomic evaluations

Description Usage Arguments Details Value Author(s) References See Also Examples

This function runs linear mixed models of the following form:

\mathbf{y} = \mathbf{Xb} + \mathbf{Z}_{1}\mathbf{u}_1 + \mathbf{Z}_{2}\mathbf{u}_2 + \mathbf{Z}_{3}\mathbf{u}_3 + ... + \mathbf{Z}_{k}\mathbf{u}_k + \mathbf{e}

The function allows to include an artificial number of independent random effects with each of them being assumed to follow: MVN(\mathbf{0},\mathbf{I}σ^2_{u_k}). If the covariance structure of one random effect is assumed to follow some \mathbf{G} then it it necessary to construct the design matrix for that random effect as described in Waldmann et al. (2008): \mathbf{F} = \mathbf{ZG}^{1/2}.

1 2	clmm(y, X = NULL , random = NULL, par_random = NULL, niter=10000, burnin=5000,scale_e=0,df_e=-2, verbose = FALSE, seed = NULL)

`y`	vector of phenotypes
`X`	Fixed effects design matrix of type: `matrix` or `dgCMatrix`. If omitted a column-vector of ones will be assigned
`random`	list of design matrices for random effects - every element of the list represents one random effect and may be of type: `matrix` or `dgCMatrix`
`par_random`	list of options for random effects. If passed, the list must have as many elements as `random`. Every element must be a list of 3: scale - scale parameter for the inverse chi-square prior df - degrees of freedom for the inverse chi-square prior method - method to be used for the random effects, may be: `random` or `BayesA`
`niter`	number of iterations
`burnin`	number of iterations to be discarded as burnin
`verbose`	prints progress to the screen
`scale_e`	scale parameter for the inverse chi-square prior for the residuals
`df_e`	degrees of freedom for the inverse chi-square prior for the residuals
`seed`	seed for the random number generator. If omitted, a seed will be generated based on machine and time

At this point the function allows to specify the method for any random term as: 'random' or 'BayesA'. 'random' assumes a common variance for all levels of the random effect, 'BayesA' assumes every level of the random effect to have its own distribution and variance as described in Meuwissen et al. (2001). A wider range of methods is available in the excellent BGLR-package, which also allows phenotypes to be discrete (de los Campos et al. 2013).

The focus of this function is to allow solving high-dimensional problems that are mixtures of sparse and dense features in the design matrices. The computational expensive parts of the Gibbs Sampler are parallelized as described in Fernando et al. (2014). Note that the parallel performance highly depends on the number of observations and features present in the design matrices. It is highly recommended to set the number of threads for less than 10000 observations (length of phenotype vector) to 1 using: set_num_threads(1) before running a model. Even for larger sample sizes the parallel performance still depends on the dimension of the feature matrices. Good results in terms of parallel scaling were observed starting from 50000 observations and more than 10000 features (i.e. number of markers). Single threaded performance is very good thanks to smart computations during gibbs sampling (Fernando, 2013 (personal communication), de los Campos et al., 2009) and the use of efficient Eigen-methods for dense and sparse algebra. The function is capable of running Single Step Bayesian Regression (Fernando et al., 2014).

List of 3 + number of random effects:

Residual_Variance

List of 4:

Posterior_Mean - Mean estimate of the residual variance
Posterior - Distribution of residual variance
scale_prior - scale parameter that has been assigned
df_prior - degrees of freedom that have been assigned

Predicted

numeric vector of predicted values

Effect_1

List of 4:

type - dense or sparse design matrix
method - method that has been used = "fixed"
scale_prior - scale parameter that has been assigned
df_prior - degrees of freedom that have been assigned
posterior - list of 1 = mean of the solution for fixed effects

Susequently as many additional items as random effects of the following form

Effect_k

List of 4:

type - dense or sparse design matrix
method - method that has been used
scale_prior - scale parameter that has been assigned
df_prior - degrees of freedom that have been assigned
posterior - list of 3
- estimates_mean - mean solutions for random effects
- variance_mean - mean variance
- variance - distribution of variance

Claas Heuer

Credits: Xiaochen Sun (Iowa State University, Ames) gave strong assistance in the theoretical parts and contributed in the very first implementation of the Gibbs Sampler. Essential parts were adopted from the BayesC-implementation of Rohan Fernando and the BLR-package of Gustavo de los Campos. The idea of how to parallelize the single site Gibbs Sampler came from Rohan Fernando (2013).

de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel, and J. M. Cotes. "Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree." Genetics 182, no. 1 (May 1, 2009): 375-85. doi:10.1534/genetics.109.101501.

Waldmann, Patrik, Jon Hallander, Fabian Hoti, and Mikko J. Sillanpaa. "Efficient Markov Chain Monte Carlo Implementation of Bayesian Analysis of Additive and Dominance Genetic Variances in Noninbred Pedigrees." Genetics 179, no. 2 (June 1, 2008): 1101-12. doi:10.1534/genetics.107.084160.

Meuwissen, T., B. J. Hayes, and M. E. Goddard. "Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps." Genetics 157, no. 4 (2001): 1819-29.

de los Campos, Gustavo, Paulino Perez Rodriguez, and Maintainer Paulino Perez Rodriguez. "Package 'BGLR,'" 2013. ftp://128.31.0.28/pub/CRAN/web/packages/BGLR/BGLR.pdf.

clmm.CV, cGBLUP, cGWAS.emmax

### Running a model with an additive and dominance effect
## Not run: 
# generate random data
rand_data(500,5000)

### compute the relationship matrices
G.A <- cgrm.A(M,lambda=0.01)
G.D <- cgrm.D(M,lambda=0.01)

### generate the list of design matrices for clmm
random = list(t(chol(G.A)),t(chol(G.D)))

### specify options
par_random = list(list(method="random",scale=var(y)/2,df=5),
		  list(method="random",scale=var(y)/10,df=5))

### run 
fit <- clmm(y,random=random,par_random=par_random,niter=5000,burnin=2500)

### inspect results
str(fit)

## End(Not run)