Fit a dyadic mixed model to pedigree data

Share:

Description

Dyadic mixed model analysis with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
dmm(mdf, fixform = Ymat ~ 1, components = c("VarE(I)", "VarG(Ia)"),
    cohortform = NULL, posdef = T, gls = F,
    glsopt = list(maxiter = 200, bdamp = 0.8, stoptol = 0.01),
    dmeopt = "qr", ncomp.pcr = "rank", relmat = "inline", dmekeep = F,
    dmekeepfit = F, traitspairwise=F, traitsblockwise=F,...)

## Default S3 method:
dmm(mdf, fixform = Ymat ~ 1, components = c("VarE(I)", "VarG(Ia)"),
    cohortform = NULL, posdef = T, gls = F,
    glsopt = list(maxiter = 200, bdamp = 0.8, stoptol = 0.01),
    dmeopt = "qr", ncomp.pcr = "rank", relmat = "inline", dmekeep = F,
    dmekeepfit = F, traitspairwise=F, traitsblockwise=F,...)

Arguments

mdf

Either a dataframe or an object of class mdf which is a list containing a dataframe and one or more relationship matrices, as made by function mdf() If missing the variables are searched for in the standard way.

fixform

A formula specifying the fixed-effect part of the model and the response variate(s). Response should be a matrix for multi-trait models. Default is Ymat ~ 1 that is a response matrix called Ymat and a fitted mean effect.

components

A simple vector specifying each of the components to be partitioned from the residual variation after fitting fixed effects. Residual is assumed to be the level of variation attributed to individuals. The components given here are assumed to sum to phenotypic variance so that if there are cross-effect covariances (eg "CovG(Ia,Ma)" and "CovG(Ma,Ia)") they need to be both present. The default is a simple partitioning into individual environmental ("VarE(I)") and individual additive genetic ("VarG(Ia)") variation. It is necessary to actually specify the individual environmental variance, as it is actually fitted as a parameter in the dyadic model, not obtained from residual variation. For a full list of available components see make.ctable() function. The default is components = c("VarE(I)", "VarG(Ia)"). For a brief introduction to the notation for components see Note section below. For a comprehensive definition of notation see 'dmmOverview.pdf' Section 6.

cohortform

A formula specifying the effects which define cohort grouping of individuals. For example cohortform = ~ Year. A cohort is a grouping of individuals experiencing the same external environmental conditions, eg a group of sheep born and reared together, commonly referred to as a drop. Cohort should not contain DId - the dam's Id. If one needs to consider littermates, function dmm() provides a means of combining maternal environmental and cohort variance components to achieve this. The default is NULL - ie no cohort defined.

posdef

A logical flag: should the matrices of variance components be constrained to be positive definite? If TRUE each matrix of cross-trait (co)variances for each "Varxxx" component defined in components will be individually positive definite, and each cross-effect covariance (if "Covxxx" components are defined) will be constrained such that the corresponding correlation is in the bounds -1 to 1. If FALSE all components will be as estimated. The default is TRUE.

gls

A logical flag: should dmm() go on after fitting fixed effects by OLS and estimating components, to re-fit the fixed effects by GLS and re-estimate the components. If TRUE the option posdef=T is enforced, as the GLS iteration will fail if matrixes do not remain positive definite. Default is FALSE - ie do the OLS analysis only.

glsopt

A list object containing variables used to control the GLS iteration :

maxiter

Maximum number of iterations. Default 200.

bdamp

Factor used to damp the setting of new GLS-b coefficients at each round of iteration. A value of 1.0 means no damping.

stoptol

Value below which the sum of absolute deviations of new from old coefficients must fall to achieve convergence.

The GLS iteration normally converges very rapidly. If it does not, consider changing the model, before fiddling with these parameters.

dmeopt

One of four regression techniques used to solve the dyadic model equations (DME's) to estimate components:

"qr"

The default option is to use the QR algorithm directly on the dyadic model equations. This is most efficient, but does not produce a fit object for looking at further statistics such as with anova or plot or resid.

"lm"

This option calls the lm() function to solve the DME's. This is equivalent to QR, but lm() produces a fit object which can optionally be part of the returned dmm object.

"lmrob"

This option calls the lmrob() function from package robustbase to solve the DME's. Robust regression only works for single-trait models. A fit object can be returned.

"pcr"

This option calls the mvr() function from package pls with argument method="svdpc". Principal component regression is intended to be used where there are multicollinearities among the components to be estimated. The number of principal components is set to the rank of the DME matrix, but can be overwritten (see ncomp.pcr argument). A fit object can be returned.

If gls=TRUE the same dmeopt option is also used during the GLS iteration.

ncomp.pcr

Number of principal components to use during a principal components regression (see dmeopt argument). Default is the rank of the DME matrix

relmat

One of two ways of setting up the relationship matrices required to estimate the variance components:

"inline"

The additive genetic relationship matrix will be calculated by inline code each time dmm() is run. OK for small datasets. Do not use if non-additive relationship matrices are required. This is the default.

"withdf"

The required relationship matrices are assumed to be pre-stored in the object of class mdf defined in the first argument. See function mdf() for setting up relationship matrices in the mdf object. Function mdf() makes extensive use of the package nadiv.

dmekeep

Logical flag: should the dyadic model equations be returned as part of the dmm() return object? Default is FALSE. The DME's may be a large object.

dmekeepfit

Logical flag: should the fit object from solving the DME's be returned as part of the dmm() return object. Default is FALSE. The fit object may be large.

traitspairwise

Logical flag: should the traits be analysed two at a time in all permutations? Default is FALSE, in which case traits are all analysed simultaneously. This option is useful if traits have different replication. If this option is TRUE dmm() will return an object of class dmmarray, being an array of class dmm objects with the rows and columns named by the trait names. See the Value section for the structure of an object of class dmmarray. For this option the dataframe specified as argument mdf must be made with function mdf() and must contain the matrix of traits 'Ymat' plus the individual traits as columns (keep=T option for mdf()).

traitsblockwise

Logical flag: should the traits be analysed in defined blocks of traits in all permutations of pairs of blocks? Default is FALSE. This option is useful if blocks of traits have different replication. If this option is TRUE, the ellipsis option defining blocks must be present. For this option the dataframe specified as argument mdf must be made with function mdf() and must contain the matrix of traits 'Ymat' plus the individual traits as columns (keep=T option for mdf()).

...

Ellipsis argument: if traitsblockwise is TRUE this argument should contain a number of block definitions of the form

Block1=c("Trait1","Trait2"),Block2=c("Trait3","Trait4"),...

, specifying the traits to be present in each block. In this case dmm() will return an object of class dmmblockarray, being an array of class dmm objects with the rows and columns named "Block1" and "Block2", together with lists of the trait names present in each block. See the Value section for the structure of an object of class dmmblockarray.

Details

The minimum requirement to use dmm() directly on a simple dataframe is that it contain columns named "Id", "SId", "DId", and "Sex" plus any fixed effects and traits. The "Id" column must contain identifiers which are unique, numeric, and sequential ( ie they must be numbered 1 to n with unit increments, no duplicates and no gaps). Any fixed effects must be factors, and traits must be numeric. Every "SId" and "DId" code must appear also in the "Id" column even if this results in NA's in every other column. If these requirements are not met, process the dataframe with mdf() before using dmm() Also if any relationship matrix other than additive is required, pre-processing with mdf() is necessary.

Missing values for either traits or fixed effects are simply omitted by dmm() before any processing. There is an heirarchy of models fitted by dmm(). There is one fixed model and one dyadic model, for all traits, and only individuals for which all traits are present are included in the model fit steps. In contrast, all individuals are included in the pedigree and in setting up relationship matrices. Hence the number of individuals with data, may be less than the number of individuals in the pedigree. If options traitspairwise or traitsblockwise are used both the fixed model and the dyadic model will be the same for all traitpairs (or traitblocks) but the replication may differ, so missing values for some traits or sets of traits can be handled in this way.

The (co)variance which is partitioned into components is always the residual (co)variance from the fixed effects model. This is assumed to represent the observed variation among individuals. There is, at this stage, no provision for models with more than one error level, so split plot and repeated measures designs are not provided for. There is nothing to stop one formulating the appropriate fixed effects model and doing the aov() step, but partitioning of any (co)variance other than residual is not at present provided.

The naming conventions for components may seem a little strange. They are designed to be all ASCII and therefore usable by R as rownames or colnames. The function make.ctable() returns a list of all available components ( as returnobject$all), as well as a spectrum of sublists which are used internally. The available components are fully documented in the pdf file dmmOverview.pdf Section 6. Most of the names are obvious (eg "VarG(Ia)" means variance-genetic-individual-additive). The term individual distinguishes individual or direct genetic or environmental effects from maternal genetic or environmental effects.

It is important for the proper estimation of phenotypic (co)variance that any cross-effect covariance components are fitted in symmetric pairs ( for example "CovE(I,M)" and "CovE(M,I)"). For one trait these will be identical, so the covariance will simply enter twice in the sum, as required. However cross-trait-cross-effect covariances will not be identical and the sum, which is a phenotypic covariance, requires that the symmetric pair be present.

In addition to the value returned, dmm() makes a number of lines of screen output which show each processing step and some minimal model check numbers.

Value

An object of class dmm is returned whenever options traitspairwise and traitsblockwise are both FALSE (ie a normal multi-trait analysis). This object is basically a list of some or all of the following items:

aov

An object of class aov containing the results of fitting the fixed effects (specified in argument fixform) by OLS using a call to the aov() function

mdf

Not currently used - ignore.

fixform

Formula specifying fixed effects fitted.

b

Coefficients fitted for fixed effects.

seb

Standard errors for fixed effect coefficients.

vara

Matrix of (co)variances of individuals after adjusting for fixed effects fitted by OLS.

totn

Total number of individuals in the analysis (ie with data)

degf

Degrees of freedom remaining after fitting fixed effects.

dme.wmat

The dyadic model equations matrix. Present only if dmekeep=TRUE. The name 'wmat' refers to the matrix W in equations 12 and 13 of the document dmmOverview.pdf.

dme.psi

The dyadic model equations right hand sides (or traits) matrix. Present only if dmekeep=TRUE. The name 'psi' refers to the matrix Ψ in equations 12 and 13 of the document dmmOverview.pdf.

dme.fit

The fit object from solving dyadic model equations. Its form depends on the dmeopt argument. For dmeopt="qr" (the default) it just contains the QR transform of the DME's. For dmeopt="lm" it contains the object returned by function lm() For dmeopt="lmrob" it contains the object returned by function lmrob() For dmeopt="pcr" it contains the object returned by function mvr() Present only if dmekeepfit=TRUE.

dme.mean

Means of columns of dyadic model equation matrix

dme.var

Variances of columns of dyadic model equation matrix

dme.correl

Correlations between columns of dyadic model equation matrix.

pcr.loadings

Loadings from principal component regression. Only present when dmeopt="pcr".

dmeopt

Value of dmeopt argument in call to dmm() function. Specifies method used to solve DME's, and hence the type of fit object (if one is present).

siga

Variance component estimates obtained by solving DME's.

sesiga

Standard errors of variance component estimates.

vard

Residual (co)variance matrix from solving DME's

degfd

Degrees of freedom for residual covariance matrix.

component

Vector of component names from the component argument in call to dmm() function, with the element "VarP(I)" appended.

correlation

Estimated genetic or environmental correlation coefficient corresponding to each component

correlation.variance

Estimated sampling variance of each correlation coefficient.

correlation.se

Estimated standard error of each correlation coefficient.

fraction

Estimated proportion of variance (relative to the total phenotypic (co)variance)) corresponding to each component. For example the proportion corresponding to component "VarG(Ia)" is the usual additive genetic heritability.

fraction.variance

Estimated sampling variance of each proportion.

fraction.se

Estimated standard error of each proportion.

variance.components

Variance component estimates ( as in siga) but with their total which is phenotypic (co)variance appended.

variance.components.se

Standard errors of variance component estimates, including phenotypic (co)variance.

phenotypic.variance

Phenotypic (co)variances as a trait x trait matrix.

phenotypic.variance.se

Standard errors of phenotypic (co)variances as a matrix.

observed.variance

Observed variance, adjusted for fixed effects, in current population. Will differ from phenotypic variance because related animals are correlated. All the estimated components, and their sum ( which is phenotypic variance) are estimates of what the components would be in a population of unrelated individuals.

call

The call made to dmm() function to generate this object

gls

Another list containing all the above items, but for fixed effects fitted by GLS instead of OLS. Will only be present if gls=TRUE and if the gls iteration converged successfully.

If option traitspairwise is TRUE, the value returned by dmm() is an object of class dmmarray, which is an array of which each element is an object of class dmm representing an analysis for one pair of traits. The rows and columns of the array are named using trait names.

If option traitsblockwise is TRUE, the value returned by dmm() is an object of class dmmblockarray, which is a list of two items named array and blocks. List item array is an array of which each element is an object of class dmm representing an analysis for one pair of blocks of traits. The rows and columns of the array are named using block names. List item blocks is a list with one element per block, each containig the set of trait names present in the block.

The functions condense.dmmarray and condense.dmmblockarray are available to facilitate recombining an array of dmm objects into a single object of class dmm with the variance component and genetic parameter estimates appropriately assembled into multi-trait matrices. For example, one would use these functions to prepare an input file for function gresponse.

Note

Two methods of estimating fixed effects are offered by dmm() - termed OLS-b and GLS-b. OLS-b is computationally simple and non-iterative and is the default. Use OLS-b for preliminary runs until the set of components to be estimated from the dyadic model equations is settled. Use GLS-b for the final run. OLS-b leads to MINQUE estimates of the variance components and OLS estimates of the fixed effects. GLS-b leads to bias-corrected-ML estimates of the variance components , and GLS estimates of the fixed effects.

The notation for (co)variance components was designed to use ASCII characters only so that it could be usable as dimnames in R. Some examples should make it clear

"VarE(Ia)"

variance environmental individual additive

"VarG(Ia)"

variance genetic individual additive

"VarG(Ma)"

variance genetic maternal additive

"CovG(Ia,Ma)"

covariance genetic individual additive x maternal additive

"VarGs(Ia)"

variance genetic sexlinked individual additive

For a full coverage of notation see the document dmmOverview.pdf Section 6.

Author(s)

Neville Jackson

References

The document dmmOverview.pdf has a bibliography of literature references.

See Also

Functions mdf(), make.ctable(), condense.dmmarray(), condense.dmmblockarray(). Packages nadiv, robustbase, pls

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
library(dmm)
# Prepare the dataset sheep.df
data(sheep.df)
sheep.mdf <- mdf(sheep.df,pedcols=c(1:3),factorcols=c(4:6),ycols=c(7:9),
             sexcode=c("M","F"),relmat=c("E","A","D"))
# The above code renumbers the pedigree Id's, makes columns "Year","Tb","Sex"
#    into factors,
#    assembles columns "CWW",Diam","Bwt" into a matrix (called 'Ymat')
#    for multivariate processing,
#    and sets up the environmental, additive genetic, and dominance genetic
#    relationship matrices.

# a simple model with individual environmenmtal and 
#    additive genetic components (default)
sheep.fit <- dmm(sheep.mdf, Ymat ~ 1 + Year + Sex,gls=TRUE)
# view the components and fixed effect coefficients ( 2 traits only)
summary(sheep.fit,traitset=c("Cww","Diam"),gls=TRUE)
# view the genetic parameters
gsummary(sheep.fit,traitset=c("Cww","Diam"))

rm(sheep.df)
rm(sheep.mdf)
rm(sheep.fit)

# Note: sheep.df is a small demo dataset. The results are illustrative, 
# but not meaningful.
# for a tutorial and fully documented examples see {\em dmmOverview.pdf}

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.