| lmebreed | R Documentation |
Fits linear or generalized linear mixed models incorporating known relationships (e.g., genetic relationship matrices) and customized random effects (e.g., overlay models). Big-data genomic models can be fitted using the eigen decomposition proposed by Lee and Van der Werf (2006).
lmebreed(formula, data, REML = TRUE, control = list(), start = NULL,
verbose = 1L, subset, weights, na.action, offset, contrasts = NULL,
calc.derivs = FALSE, nIters=100,
# lmebreed special params
family = NULL, relmat = list(), addmat = list(), trace=1L,
dateWarning=TRUE, rotation=FALSE, rotationK=NULL, coefOutRotation=8,
returnFormula=FALSE, suppressOpt=FALSE, ...)
formula |
as in |
data |
as in |
REML |
as in |
control |
as in |
start |
as in |
verbose |
as in |
subset |
as in |
weights |
as in |
na.action |
as in |
offset |
as in |
contrasts |
as in |
calc.derivs |
as in |
nIters |
the number of iterations used by the |
family |
as in |
relmat |
an optional named list of relationship matrices between levels of a
given random effect (not the inverse).
Internally the Cholesky decomposition of those matrices is computed to adjust the
incidence matrices. The names of the elements in the list must correspond to
the names of slope factors for random-effects terms in the |
addmat |
an optional named list of customized incidence matrices.
The names of the elements must correspond to the names of grouping factors for
random-effects terms in the |
trace |
integer scalar. If > 0 verbose output is generated during the progress of the model fit. |
dateWarning |
a logical value indicating if you want to be warned when a new version of lme4breeding is available on CRAN. Default is TRUE. |
rotation |
a logical value indicating if you want to compute the eigen decomposition of the relationship matrix to rotate the response vector y and the fixed effect matrix X in order to accelerate the computation. This argument requires the dataset to be balanced and without missing data for the slope variable, intercept variables, and the response involved in the model. See details to understand more about this argument and vignettes for examples. |
rotationK |
an integer value indicating the number of eigen vectors to compute when the rotation argument is set to TRUE. By default all eigen vectors are computed. |
coefOutRotation |
a numeric value denoting the inter-quantile outlier coefficient to be used in the rotation of the response when using the eigen decomposition to avoid overshooting. |
returnFormula |
a logical value indicating if you want to only get the results from
the |
suppressOpt |
a logical value indicating if you want to force the model to use
your customized variance components without estimating them. It skips the
optimize(g)Lmer step to force the customized variance components. The variance components
should be provided in the
which returns a vector with theta values. |
... |
as in |
All arguments to this function are the same as those to the function
lmer except relmat and addmat which must be
named lists. Each name must correspond to the name of a grouping factor in a
random-effects term in the formula. The observed levels
of that factor must be contained in the rownames and columnames of the relmat.
Each relmat is the relationship matrix restricted
to the observed levels and applied to the model matrix for that term. The incidence
matrices in the addmat argument must match the dimensions of the final fit (pay
attention to missing data in responses).
When you use the relmat argument the square root of the relationship matrix
will be computed internally. Therefore, to recover the correct BLUPs for those effects
you need to use the ranef function which internally multiple the
obtained BLUPs by the square root of the relationship matrix one more time to recover the correct BLUPs.
The argument rotation applies the eigen decomposition proposed by Lee and Van der Werf in 2016
and makes the genetic evaluation totally sparse leading to incredible gains in speed compared
to the classical approach. Internally, the eigen decomposition UDU' is carried in the relationship
matrix. The U matrix is then taken to the n x n level (n being the number of records), and post-multiplied
by a matrix of records presence (n x n) using the element-wise multiplication of two matrices (Schur product).
By default is not activated since this may not provide the exact same variance components than other software due to
numerical reasons. If you would like to obtain the exact same variance components than other software you will
have to keep rotation=FALSE. This will slow down considerably the speed. Normally when the rotation is
activated and variance components differ slightly with other software they will still provide extremely similar
estimates at the speed of hundreds or thousands of times faster. Please consider this.
Additional useful functions are; tps for spatial kernels, rrm
for reduced rank matrices, atcg1234 for conversion of genetic markers,
overlay for overlay matrices, reshape for moving wide
format multi-trait datasets into long format.
Normally, using the optimizer argument inside the lmerControl keep
in mind that the number of iterations is called differently depending on the optimizer.
For Nelder_Mead, bobyqa and nlminbwrap is
called "maxfun" whereas for nloptwrap is called "maxeval".
This should be passed inside a list in the optCtrl argument. For example:
lmebreed(... ,
control = lmerControl(
optimizer="Nelder_Mead",
optCtrl=list(maxfun=100)
), ...
)
But here, we have created the nIters argument to control the number of iterations
in the optimizer. To predict values for unobserved levels you will need to impute the data and update
your model with the new dataset and the initial starting values:
newModel <- update(oldModel, suppressOpt = TRUE,
start=getME(oldModel, 'sigma'),
data=imputedData)
It is also worth mentioning that when the user does not provide the (g)lmerControl we take the freedom to specify it to avoid some common warning or error messages such as calculating the second derivatives for big models or not allowing to have a single record per treatment level:
control <- (g)lmerControl(
calc.derivs = FALSE,
restart_edge = FALSE,
check.nobs.vs.nlev = "ignore",
check.nobs.vs.rankZ = "ignore",
check.nobs.vs.nRE="ignore"
)
This is important to notice because if the user decides to specify its own controls then the user should also consider some of these arguments that are the defaults in lmebreed.
Methods
Some important methods to keep in mind are:
fixef: allows you to extract fixed coefficients (BLUEs)
vcov: allows you to extract the standard errors of fixed effects
ranef: allows you to extract random coefficients (BLUPs) and their standard errors
getCi: allows you to extract the predicted error variance (PEV) matrix from a model
mkMmeIndex: alllows you to extract the indices of the different fixed and random coefficients
predict: allows you to extract fixed coefficients (BLUEs)
Example Datasets
The package has been equiped with several datasets to learn how to use the lme4breeding package:
* DT_big simulated dataset containing 1K individuals in 50 environments to show how to fit big models.
* DT_btdata dataset contains an animal (birds) model.
* DT_cornhybrids dataset to perform genomic prediction in hybrid single crosses
* DT_cpdata dataset to fit genomic prediction models within a biparental population coming from 2 highly heterozygous parents including additive, dominance and epistatic effects.
* DT_fulldiallel dataset with examples to fit full diallel designs.
* DT_gryphon data contains an example of an animal model including pedigree information.
* DT_halfdiallel dataset with examples to fit half diallel designs.
* DT_h2 to calculate heritability
* DT_ige dataset to show how to fit indirect genetic effect models.
* DT_legendre simulated dataset for random regression model.
* DT_mohring datasets with examples to fit half diallel designs.
* DT_polyploid to fit genomic prediction and GWAS analysis in polyploids.
* DT_sleepstudy dataset to know how to translate lme4 models to lme4breeding models.
* DT_technow dataset to perform genomic prediction in hybrid single crosses
* DT_wheat dataset to do genomic prediction in single crosses in species displaying only additive effects.
a lmebreed object.
Giovanny Covarrubias-Pazaran
Giovanny Covarrubias-Pazaran (2024). lme4breeding: enabling genetic evaluation in the age of genomic data. To be submitted.
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48.
Lee & Van der Werf (2016). MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics, 32(9), 1420-1422.
data(DT_example)
DT <- DT_example
A <- A_example
ansMain <- lmebreed(Yield ~ Env + (1|Name),
relmat = list(Name = A ),
data=DT)
vc <- VarCorr(ansMain); print(vc,comp=c("Variance"))
sigma(ansMain)^2 # error variance
BLUP <- ranef(ansMain, condVar=TRUE)
PEV <- lapply(BLUP, function(x){attr(x, which="postVar")}) # take sqrt() for SEs
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.