Manipulation of genetic data (SNPs), computation of Genetic Relationship Matrix, Linkage Disequilibrium, etc. Efficient algorithms for Linear Mixed Model (AIREML, diagonalisation trick).
Gaston offers functions for efficient manipulation of large genotype (SNP) matrices, and state-of-the-art implementation of algorithms to fit Linear Mixed Models, that can be used to compute heritability estimates or to perform association tests.
Thanks to the packages
functions are mainly written in C++.
Many functions are multithreaded;
the number of threads can be setted through
It is advised to try several values for the number of threads, as
using too many threads might be conterproductive due to an important
Some functions have a
verbose argument, which controls the
function verbosity. To mute all functions at once you can use
options(gaston.verbose = FALSE).
An S4 class for genotype matrices is defined, named
Each row corresponds to an individual, and each column to a SNP. They can
be read from files using
and saved using
write.bed.matrix. The function
In first approach, a bed.matrix behaves as a "read-only" matrix containing only
0, 1, 2 and NAs, unless the genotypes are standardized (use
They are stored in a compact form, each genotype being coded on 2 bits (hence
4 genotypes per byte).
Bed.matrices can be converted to numerical matrices with
and multiplied with numeric vectors or matrices with
feature can be used e.g. to simulate quantitative phenotypes, see a basic example in the example
It is possible to subset bed.matrices just as base matrices, writing e.g.
x[1:100,] to extract the first 100 individuals, or
for extract the SNPs 1000 to 1999 for these 100 individuals. The use of logical
vectors for subsetting is allowed too. The functions
select.snps can also be used for
subsetting with a nice syntax.
Some basic descriptive statistics can be added to a bed.matrix with
gaston 1.4, this function is called by default by all functions that create a bed.matrix, unless
options(gaston.auto.set.stats = FALSE) was set.
Hardy-Weinberg Equilibrium can be tested for all SNPs with
If X is a standardized n x q genotype matrix, a Genetic Relationship Matrix (GRM) of the individuals can be computed as
GRM = XX<e2><80><99>/(q-1)
where q is the number of SNPs.
This computation is done by the function
GRM. The eigen decomposition of the GRM produces
the Principal Components (PC) of the data set. If needed, the loadings
corresponding to the PCs can be retrieved using
Doing the above crossproduct in the reverse order produces a moment estimate of the Linkage Disequilibrium:
LD = X<e2><80><99>X/(n-1)
where n is the number of individuals. This computation is done by the function
LD (usually, only parts of the whole LD matrix is computed). This method is
also used by
LD.thin to extract a set of SNPs in low linkage disequilibrium
(it is often recommended to perform this operation before computing the GRM).
lmm.aireml is a function for linear mixed models parameter estimation
and BLUP computations.
The model considered is of the form
Y = X beta + omega_1 + ... + omega_k + epsilon
with omega_i ~ N(0, tau_i K_i) for i = 1, ..., k and epsilon ~ N(0, sigma^2 I_n).
Note that very often in genetics a mixed model is written as
Y = X beta + Zu + epsilon
with Z a standardized genotype matrix, and u ~ N(0, tau I_q). In that case, denoting omega = Zu, omega ~ N(0,tau ZZ') and letting K=ZZ' we get a mixed model of the previous form.
When k=1 in the above general model (only one random term omega), the likelihood
can be computed very efficiently using the eigen decomposition of
K = var(omega). This "diagonalization trick"
is used in
lmm.diago, to compute
the likelihood and for parameter estimation, respectively.
Two small functions complete this set of functions:
simulate data under a linear mixed model, and
random.pm, to generate
random positive matrices. Both are used in examples and can be useful for data simulation.
Herv<c3><a9> Perdry and Claire Dandine-Roulland
Maintainer: Herv<c3><a9> Perdry
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.