mdf: Prepare a dataframe for use with dmm function
In dmm: Dyadic Mixed Model for Pedigree Data

mdf	R Documentation

Description

The function mdf() converts an R dataframe into one that meets the requirements of function dmm(), and can optionally append to that dataframe one or more relationship matrices computed using package nadiv. Conversion involves renumbering pedigree Ids, removing duplicates, adding base animals, setting up columns to be fixed factors, putting multivariate traits into a matrix, and defining the heterogametic sex.

Usage

mdf(df, pedcols = c(1:3), factorcols = NULL, ycols = NULL, sexcode = NULL,
    keep = F, relmat = NULL)

Arguments

`df`	A dataframe object with columns labelled:
    Id: An identifier for each individual
    SId: An identifier for each sire
    DId: An identifier for each dam
    Sex: A coding for sex of each individual
    Fixed effect names: Codings for each fixed effect
    Observation names: Numerical values for each trait
`pedcols`	A vector specifying which columns of `df` contain the pedigree information (ie Id, SId, and DId). The vector can contain either column numbers or column names. The default is c(1:3).
`factorcols`	A vector specifying which columns of `df` contain codes for factors which are to be used as either fixed effects or in defining cohort. The default is NULL.
`ycols`	A vector specifying which columns of `df` contain observations which are to become traits in a matrix. The default is NULL. The matrix is always called 'Ymat'.
`sexcode`	A vector of length 2 specifying the codings used for Sex, with the code for the heterogametic sex in first position. This should always be specified, although the default is NULL. The type of `sexcode` should match the type of the `Sex` column in the dataframe `df`: a character vector if `Sex` is a character vector or a character vector coerced to a factor, and an integer vector if `Sex` is an integer vector or an integer vector coerced to a factor.
`keep`	A logical variable. Are columns not specified by `pedcols`, `factorcols`, or `ycols` to be retained in the output object? Default is FALSE - ie unused columns are discarded.
`relmat`	A vector listing the relationship matrices to be generated and appended to the dataframe, thus creating a return object of class `mdf`. Each relationship matrix has a code letter or name as follows:
    "E": An environmental correlation matrix. At present this produces an identity matrix - ie no environmental correlation effects. Must always be included.
    "A": Additive genetic relationship matrix.
    "D": Dominance relationship matrix.
    "Dsim": Dominance relationship matrix by the simulation method (see `nadiv`).
    "AA": Additive x additive epistatic relationship matrix.
    "AD": Additive x dominance epistatic relationship matrix.
    "DD": Dominance x dominance relationship matrix.
    "S": Sex linked additive genetic relationship matrix with no global dosage compensation ('ngdc' option, see `nadiv`).
    "S.hori": Sex linked additive genetic relationship matrix with 'hori' dosage compensation model (see `nadiv`).
    "S.hedo": Sex linked additive genetic relationship matrix with 'hedo' dosage compensation model (see `nadiv`).
    "S.hoha": Sex linked additive genetic relationship matrix with 'hoha' dosage compensation model (see `nadiv`).
    "S.hopi": Sex linked additive genetic relationship matrix with 'hopi' dosage compensation model (see `nadiv`).
Default is NULL - ie no relationship matrices constructed.
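
Two of the argument rules above (Sex codes that must be covered by `sexcode`, and "E" always being included in `relmat`) can be checked before calling mdf(). The following is a minimal base R sketch of such a check, not part of dmm; the dataframe `toy` and the names `unmatched` and `has_e` are hypothetical and used only for illustration.

```r
# Hypothetical toy data: Sex is a character vector coerced to a factor
toy <- data.frame(Id  = c("a1", "a2", "a3", "a4"),
                  Sex = factor(c("M", "F", "M", "U")))

# Arguments one intends to pass to mdf(); heterogametic sex code first
sexcode <- c("M", "F")
relmat  <- c("E", "A")

# Compare the Sex codings with sexcode as character strings
sex_values <- as.character(toy$Sex)
unmatched  <- setdiff(unique(sex_values[!is.na(sex_values)]),
                      as.character(sexcode))
if (length(unmatched) > 0) {
  message("Sex codes not covered by sexcode: ",
          paste(unmatched, collapse = ", "))
}

# "E" must be present whenever relationship matrices are requested
has_e <- is.null(relmat) || "E" %in% relmat
```

Here `unmatched` lists the Sex codings that are not in `sexcode`, and `has_e` is TRUE when `relmat` is either NULL or contains "E".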

Details

If planning to use numerical observations as covariates in the fixed effects model under dmm(), use argument keep=TRUE so that the covariate columns are retained in the returned dataframe object.

The following actions are performed by mdf():

remove any Id's which are NA or duplicate (including first duplicate)
add SId's which do not match any Id as base Id's
add DId's which do not match any Id as base Id's
renumber all Id's
retain original Id's as row names
if keep=TRUE retain unused columns of dataframe
if keep=FALSE do not retain unused columns of dataframe
always retain Id, SId, DId, and factors
Sex should be one of the factors
transform Sex codes to NA if not in argument sexcode[]
take first entry in sexcode[] as the heterogametic sex
make columns in factorcols into factors
make columns in ycols into a matrix of traits called 'Ymat'
if relmat argument is present, compute the relationship matrices specified and make a returned list object mdf containing the modified dataframe as mdf$df and the relationship matrices as mdf$rel
if relmat argument is not present simply return the modified dataframe
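
Several of the actions listed above can be sketched in base R on a small hand-made pedigree. This is an illustration only and is not the code used inside mdf(): the dataframe `toy`, its column names Year, Wt and Len, the ordering of base individuals before the other rows, and the rule of inferring Sex for base individuals from their role as sire or dam are all assumptions made for this sketch.

```r
# Hypothetical toy data: one duplicated Id ("L3") and one NA Id
toy <- data.frame(
  Id   = c("L1", "L2", "L3", "L3", NA),
  SId  = c("S1", "S1", "S2", "S2", "S1"),
  DId  = c("D1", "D2", "D1", "D1", "D2"),
  Sex  = c("M", "X", "F", "F", "F"),
  Year = c(1990, 1990, 1991, 1991, 1991),
  Wt   = c(4.1, 3.8, 4.4, 4.5, 3.9),
  Len  = c(50, 48, 52, 53, 49),
  stringsAsFactors = FALSE
)
sexcode <- c("M", "F")  # heterogametic sex first

# Remove Ids which are NA or duplicate (including the first duplicate)
dup_ids <- toy$Id[duplicated(toy$Id)]
clean   <- toy[!is.na(toy$Id) & !(toy$Id %in% dup_ids), ]

# SIds and DIds which do not match any Id become base Ids
base_ids <- setdiff(c(clean$SId, clean$DId), clean$Id)
base_ids <- base_ids[!is.na(base_ids)]
base <- data.frame(
  Id  = base_ids, SId = NA_character_, DId = NA_character_,
  # Assumption of this sketch: sires carry the first sexcode entry
  Sex = ifelse(base_ids %in% clean$SId, sexcode[1], sexcode[2]),
  Year = NA_real_, Wt = NA_real_, Len = NA_real_,
  stringsAsFactors = FALSE
)
full <- rbind(base, clean)

# Renumber all Ids and retain the original Ids as row names
orig     <- full$Id
full$SId <- match(full$SId, orig)
full$DId <- match(full$DId, orig)
full$Id  <- seq_along(orig)
rownames(full) <- orig

# Sex codes not in sexcode become NA; factor columns become factors
full$Sex[!(full$Sex %in% sexcode)] <- NA
full$Sex  <- factor(full$Sex, levels = sexcode)
full$Year <- factor(full$Year)

# Trait columns become a single matrix column called 'Ymat'
ymat <- as.matrix(full[, c("Wt", "Len")])
out  <- full[, c("Id", "SId", "DId", "Sex", "Year")]
out$Ymat <- ymat
str(out)
```

In this sketch the trait columns Wt and Len are not kept as separate columns, which mirrors keep=FALSE.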

Value

The return object is of class mdf if relationship matrices are requested, and is of class dataframe if relationship matrices are not requested.

An object of class mdf is a list containing the following items:

df: A dataframe conforming to the requirements of function dmm()
rel: A list of relationship matrices

An object of class dataframe as returned by function mdf() is a dataframe conforming to the requirements of function dmm().
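
Because the class of the return object depends on whether relationship matrices were requested, code that handles both cases may want to test the class before extracting the dataframe. The following is a minimal sketch under stated assumptions: `get_mdf_df` is a hypothetical helper, not a dmm function, and the two objects are built by hand to stand in for real mdf() results.

```r
# Hypothetical helper: return the dataframe from either kind of mdf() result
get_mdf_df <- function(x) {
  if (inherits(x, "mdf")) x$df else x
}

# Hand-made stand-ins for the two kinds of return value
plain    <- data.frame(Id = 1:3)
with_rel <- structure(list(df = plain, rel = list(diag(3))),
                      class = "mdf")

# Both calls yield the same dataframe
df1 <- get_mdf_df(plain)
df2 <- get_mdf_df(with_rel)
```

With real data, `plain` and `with_rel` would be replaced by objects returned from mdf() without and with the relmat argument respectively.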

Note

Individuals which appear in the SId or DId columns but not in the Id column are assumed to be 'base individuals', ie they have unknown sire and dam. They will be given an Id and added to the dataframe, but their SId and DId and all data except for Sex coding will be set to NA, so they will be assumed unrelated and will not contribute data. It is important that 'base individuals' be present for relationship matrices to be calculated correctly.

Author(s)

Neville Jackson

Examples

library(dmm)

# prepare a multi-trait dataset from sheep.df
data(sheep.df)
# look at its structure
str(sheep.df)
# needs some work - Id, SId, DId are alphanumeric
#                 - Year is numeric and we want it as a factor
#                 - there are 3 traits (Cww,Diam,Bwt) to put into a trait matrix
sheep.mdf1 <- mdf(sheep.df,pedcols=c(1:3), factorcols=c(4:6), ycols=c(7:9),
             sexcode=c("M","F"))
# note the screen messages - it also had to add 2 base Id's for 2 of the dams
str(sheep.mdf1)
# so it returned a dataframe object with 44 observations
# and one of the columns is a matrix called 'Ymat'

# prepare a dataset requiring relationship matrices
sheep.mdf2 <- mdf(sheep.df,pedcols=c(1:3), factorcols=c(4:6), ycols=c(7:9),
             sexcode=c("M","F"),relmat=c("E","A"))
# note the screen messages - it now makes an object of class mdf
str(sheep.mdf2)
# so it returned a list object with 2 items
#    df - the dataframe
#   rel - a list of relationship matrices (note those not requested are NULL)

dmm documentation built on June 22, 2024, 10:38 a.m.