R/lmmlite is an R package for fitting linear mixed models in the context of genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping. At present, it is not intended for "production" use. I wrote this to try to get a better understanding of how to fit these models, and I've not yet put any effort into checking arguments and omitting cases with missing values.
The code closely follows Nick Furlotte's pylmm.
The sort of model I want to fit is $y = X\beta + \epsilon$ where $\epsilon$ is multivariate normal with mean 0 and variance $\sigma^2_g K + \sigma^2_e I$, where $K$ is a known kinship matrix and $I$ is the identity. In pylmm, the focus is on $\sigma^2 = \sigma^2_g + \sigma^2_e$ and the heritability $h^2 = \sigma^2_g / \sigma^2$.
library(lmmlite) data(recla)
We'll focus here on the recla
dataset that I include with the
package. This is data on a set of diversity outcross mice, from
Recla et al. (2014) and
Logan et al. (2013). The data are
originally from the
QTL Archive/Mouse Phenotype Database
and I've also
placed them online
in the format used by R/qtl2. I used
qtl2geno to calculate genotype
probabilities from the MUGA SNP array data. The data include
r nrow(recla$kinship)
mice, and form a list with three components:
the estimated kinship matrix, a matrix with r ncol(recla$pheno)
phenotypes, and a covariate matrix that contains an intercept (all
1's) and sex (0 for female and 1 for male).
library(lmmlite) data(recla)
The package includes two implementations of the fit of linear mixed models (LMMs): plain R, and C++ (using the Rcpp and RcppEigen packages, the latter being an interface to the Eigen C++ linear algebra library).
R/lmmlite includes just four functions: eigen_rotation()
,
getMLsoln()
, calcLL()
, and fitLMM()
. In each case, use
use_cpp=FALSE
to use the plain R implementation and use_cpp=TRUE
(the default) to use the C++ implementation.
The first function, eigen_rotation()
, calculates the eigen
decomposition of the kinship matrix and "rotates" the phenotypes and
covariates by pre-multiplying them with the tranpose of the matrix of
eigenvectors. This rotation greatly simplifies later calculations by
turning a general least squares problem into a weighted least squares
problem.
Care must be taken with missing values in the phenotypes and covariates. In practice, one must work with batches of phenotypes that have a common missing data pattern, and first remove individuals with missing values. Here, we'll focus on the first phenotype.
e <- eigen_rotation(recla$kinship, recla$pheno[,1], recla$covar)
The output is a list containing four components: Kva
is the vector
of eigenvalues of the kinship matrix, Kva_t
is the transposed matrix
of eigenvectors, y
is the rotated phenotype matrix, and X
is the rotated
covariate matrix.
The next function getMLsoln()
will estimates the coefficients
($\beta$) and residual variance ($\sigma^2 = \sigma^2_g + \sigma^2_e$)
for a given value of the heritability ($h^2 = \sigma^2_g / \sigma^2$).
It takes, as input, a single value for the heritability, and then
Kva
, y
, and X
, as returned from eigen_rotation()
. A further
argument reml
, indicates whether REML will later be used, in which
case the log determinant of the matrix $X' W X$, where $W$ is a
diagonal matrix of weights, is calculated.
ml_soln <- getMLsoln(0.5, e$Kva, e$y, e$X)
The result is a list containing beta
and sigmasq
, with the
residual sum of squares (RSS) and log det($X' W X$) included as
attributes.
The function calcLL()
calculates the log likelihood for a given
value of the heritability. (It can also take a vector of
heritabilities.) It takes the same arguments as getMLsoln()
, and in
practice one would probably not call getMLsoln()
directly.
hsq <- seq(0, 1, by=0.01) ll <- calcLL(hsq, e$Kva, e$y, e$X)
We can plot the results as follows.
plot(hsq, ll, type="l", lwd=2, las=1, xlab="heritability", ylab="log likelihood")
If you call calcLL()
with a single heritability value, the results
include the estimates of beta
and sigmasq
as attributes.
one_ll <- calcLL(0.5, e$Kva, e$y, e$X) attr(one_ll, "beta") attr(one_ll, "sigmasq")
The final function, fitLMM()
, optimizes the log likelihood to
estimate the heritability. The result is a list containing beta
,
sigmasq
, hsq
, sigmasq_g
, sigmasq_e
, and loglik
.
(out <- fitLMM(e$Kva, e$y, e$X))
The fitLMM()
function includes an argument check_boundary
; if
TRUE
(the default), the 0 and 1 boundaries are checked explicitly, in seeking to
estimate the heritability.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.