mdf: Prepare a dataframe for use with dmm function

mdf    R Documentation

Description

The function mdf() converts an R dataframe into one that meets the requirements of function dmm(), and may optionally append one or more relationship matrices obtained using package nadiv. Conversion involves renumbering pedigree Ids, removing duplicates, adding base animals, setting up columns as fixed factors, putting multivariate traits into a matrix, defining the heterogametic sex, and optionally calling nadiv functions to append relationship matrices.

Usage

mdf(df, pedcols = c(1:3), factorcols = NULL, ycols = NULL, sexcode = NULL,
    keep = F, relmat = NULL)

Arguments

df

A dataframe object with columns labelled:

Id

An identifier for each individual

SId

An identifier for each sire

DId

An identifier for each dam

Sex

A coding for sex of each individual

Fixed effect names

Codings for each fixed effect

Observation names

Numerical values for each trait

pedcols

A vector specifying which columns of df contain the pedigree information (ie Id, SId, and DId). The vector can contain either column numbers or column names. The default is c(1:3).

factorcols

A vector specifying which columns of df contain codes for factors which are to be used as either fixed effects or in defining cohort. The default is NULL.

ycols

A vector specifying which columns of df contain observations which are to become traits in a matrix. The default is NULL. The matrix is always called 'Ymat'.

sexcode

A vector of length 2 specifying the codings used for Sex, with the heterogametic sex code given in first position. This should always be specified. The default is NULL. The type of sexcode must match the underlying type of the Sex column in the dataframe df: if Sex is a character vector, or a character vector coerced to a factor, then sexcode should be a character vector; if Sex is an integer vector, or an integer vector coerced to a factor, then sexcode should be an integer vector.

keep

A logical variable. Are columns not specified by pedcols, factorcols, or ycols to be retained in the output object? Default is FALSE - ie unused columns are discarded.

relmat

A vector listing the relationship matrices to be generated and appended to the dataframe thus creating a return object of class mdf. Each relationship matrix has a code letter or name as follows:

"E"

An environmental correlation matrix. At present this produces an identity matrix - ie no environmental correlation effects. Must always be included.

"A"

Additive genetic relationship matrix.

"D"

Dominance relationship matrix.

"Dsim"

Dominance relationship matrix by the simulation method (see nadiv).

"AA"

Additive x additive epistatic relationship matrix.

"AD"

Additive x dominance epistatic relationship matrix.

"DD"

Dominance x dominance epistatic relationship matrix.

"S"

Sex-linked additive genetic relationship matrix with no global dosage compensation ('ngdc' option; see nadiv)

"S.hori"

Sex-linked additive genetic relationship matrix with 'hori' dosage compensation model (see nadiv)

"S.hedo"

Sex-linked additive genetic relationship matrix with 'hedo' dosage compensation model (see nadiv)

"S.hoha"

Sex-linked additive genetic relationship matrix with 'hoha' dosage compensation model (see nadiv)

"S.hopi"

Sex-linked additive genetic relationship matrix with 'hopi' dosage compensation model (see nadiv)

Default is NULL - ie no relationship matrices constructed.
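As an illustration of the sexcode rules above, the hypothetical helper recode_sex() below (not part of dmm, and not necessarily how mdf() does it internally) checks that sexcode has the same underlying type as the Sex column, sets codes not listed in sexcode to NA, and keeps sexcode[1], the heterogametic sex, as the first factor level. Deciding whether a factor was built from integers by checking whether all of its levels parse as integers is an assumption of this sketch.

```r
# Hypothetical helper, NOT part of dmm: a sketch of the sexcode rules.
recode_sex <- function(sex, sexcode) {
  stopifnot(length(sexcode) == 2)
  # Is the Sex column integer-like? For a factor, assume it was built from
  # integers if all of its levels parse as integers (a heuristic).
  sex_is_int <- if (is.factor(sex)) {
    !anyNA(suppressWarnings(as.integer(levels(sex))))
  } else {
    is.numeric(sex)
  }
  if (sex_is_int != is.numeric(sexcode)) {
    stop("type of sexcode does not match the Sex column")
  }
  # Codes not listed in sexcode become NA
  out <- as.character(sex)
  out[!(out %in% as.character(sexcode))] <- NA
  # sexcode[1] (the heterogametic sex) becomes the first factor level
  factor(out, levels = as.character(sexcode))
}
```

For example, recode_sex(c("M", "F", "U"), c("M", "F")) turns the unrecognised code "U" into NA, while recode_sex(c("M", "F"), c(1L, 2L)) stops with an error because the types differ.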

Details

If planning to use numerical observations as covariates in the fixed effects model under dmm() use argument keep=TRUE, so that the covariate columns are retained in the returned dataframe object.

The following actions are performed by mdf():

  • remove any Ids which are NA or duplicated (every copy of a duplicated Id is removed, including the first)

  • add SIds which do not match any Id as base Ids

  • add DIds which do not match any Id as base Ids

  • renumber all Ids

  • retain original Ids as row names

  • if keep=TRUE retain unused columns of dataframe

  • if keep=FALSE do not retain unused columns of dataframe

  • always retain Id, SId, DId, and factors

  • Sex should be one of the factors

  • transform Sex codes to NA if not in argument sexcode[]

  • take first entry in sexcode[] as the heterogametic sex

  • make columns in factorcols into factors

  • make columns in ycols into a matrix of traits called 'Ymat'

  • if the relmat argument is present, compute the specified relationship matrices and return a list object of class mdf containing the modified dataframe as mdf$df and the relationship matrices as mdf$rel

  • if the relmat argument is not present, simply return the modified dataframe
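The pedigree steps in the list above can be sketched in base R as follows. The function prep_pedigree() is hypothetical and is not dmm's implementation; it assumes columns named Id, SId and DId, places the added base individuals first, and, unlike mdf(), leaves every other column of a base individual NA, including Sex.

```r
# Hypothetical sketch, NOT dmm's implementation: the pedigree steps listed
# above for a dataframe with columns Id, SId and DId.
prep_pedigree <- function(df) {
  id <- as.character(df$Id)
  # Remove Ids that are NA or duplicated (every copy, including the first)
  dup <- id %in% id[duplicated(id)]
  df <- df[!is.na(id) & !dup, , drop = FALSE]
  df$Id <- as.character(df$Id)
  df$SId <- as.character(df$SId)
  df$DId <- as.character(df$DId)
  # Sires and dams that do not match any Id are added as base individuals;
  # indexing with NA rows gives them NA in every other column
  parents <- unique(c(df$SId, df$DId))
  base <- setdiff(parents[!is.na(parents)], df$Id)
  if (length(base) > 0) {
    extra <- df[rep(NA_integer_, length(base)), , drop = FALSE]
    extra$Id <- base
    df <- rbind(extra, df)
  }
  # Renumber: Ids become 1..n and parents are mapped to the new numbers
  orig <- df$Id
  df$Id <- seq_len(nrow(df))
  df$SId <- match(df$SId, orig)
  df$DId <- match(df$DId, orig)
  # Keep the original Ids as row names
  rownames(df) <- orig
  df
}
```

Note that the order matters: duplicates are removed before base individuals are added, so a parent referenced only by a removed record is not added.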

Value

The return object is of class mdf if relationship matrices are requested, and is of class dataframe if relationship matrices are not requested.

An object of class mdf is a list containing the following items:

df

A dataframe conforming to the requirements of function dmm()

rel

A list of relationship matrices

An object of class dataframe as returned by function mdf() is a dataframe conforming to the requirements of function dmm()
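Because the class of the return value depends on whether relmat was supplied, downstream code may need to handle both cases. The helpers below are hypothetical (not part of dmm) and assume only what is stated above: an mdf object is a list of class mdf with items df and rel.

```r
# Hypothetical helpers, NOT part of dmm.
# Return the prepared dataframe from either kind of mdf() result.
mdf_dataframe <- function(x) {
  if (inherits(x, "mdf")) x$df else x
}
# Return one relationship matrix by its code (e.g. "A"), or NULL if the
# object holds no such matrix or is a plain dataframe.
mdf_relmat <- function(x, code) {
  if (inherits(x, "mdf")) x$rel[[code]] else NULL
}
```

With the objects from the Examples, mdf_dataframe(sheep.mdf1) and mdf_dataframe(sheep.mdf2) would both yield a dataframe.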

Note

Individuals which appear in the SId or DId columns but not in the Id column are assumed to be 'base individuals', ie they have unknown sire and dam. They will be given an Id and added to the dataframe, but their SId and DId and all data except for Sex coding will be set to NA, so they will be assumed unrelated and will not contribute data. It is important that 'base individuals' be present for relationship matrices to be calculated correctly.
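One way to see why base individuals matter is the classical tabular method for an additive relationship matrix (the "A" option of relmat). The sketch below is an illustrative stand-in, not the code mdf() runs through nadiv; the name tabular_A() is hypothetical, and it assumes Ids numbered 1..n with every parent numbered before its offspring and unknown parents coded NA. A sire or dam that is missing from the Id list would have no row to look up.

```r
# Hypothetical helper, NOT the nadiv code: tabular method for the additive
# relationship matrix A. Assumes parents are numbered before offspring and
# unknown parents are NA.
tabular_A <- function(sire, dam) {
  n <- length(sire)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    s <- sire[i]
    d <- dam[i]
    # Diagonal: 1 plus the inbreeding coefficient (half the parents' A)
    f <- if (!is.na(s) && !is.na(d)) A[s, d] / 2 else 0
    A[i, i] <- 1 + f
    # Off-diagonal: average of the relationships of j with i's parents
    for (j in seq_len(i - 1)) {
      a <- 0
      if (!is.na(s)) a <- a + A[j, s] / 2
      if (!is.na(d)) a <- a + A[j, d] / 2
      A[i, j] <- A[j, i] <- a
    }
  }
  A
}
```

Base individuals (rows with both parents NA) get 1 on the diagonal and 0 with each other, which is the "assumed unrelated" behaviour described above.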

Author(s)

Neville Jackson

See Also

Functions dmm(), pedrenum(). Package nadiv

Examples

library(dmm)

# prepare a multi-trait dataset from sheep.df
data(sheep.df)
# look at its structure
str(sheep.df)
# needs some work - Id, SId, DId are alphanumeric
#                 - Year is numeric and we want it as a factor
#                 - there are 3 traits (Cww,Diam,Bwt) to put into a trait matrix
sheep.mdf1 <- mdf(sheep.df,pedcols=c(1:3), factorcols=c(4:6), ycols=c(7:9),
             sexcode=c("M","F"))
# note the screen messages - it also had to add 2 base Ids for 2 of the dams
str(sheep.mdf1)
# so it returned a dataframe object with 44 observations
# and one of the columns is a matrix called 'Ymat'

# prepare a dataset requiring relationship matrices
sheep.mdf2 <- mdf(sheep.df,pedcols=c(1:3), factorcols=c(4:6), ycols=c(7:9),
             sexcode=c("M","F"),relmat=c("E","A"))
# note the screen messages - it now makes an object of class mdf
str(sheep.mdf2)
# so it returned a list object with 2 items
#    df - the dataframe
#   rel - a list of relationship matrices (note: those not requested are NULL)
#
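The 'Ymat' column produced by ycols is a matrix stored as a single dataframe column. The base R snippet below needs no dmm; it borrows the trait names Cww, Diam and Bwt from the example above, but the values are made up and are not taken from sheep.df. It shows how such a column behaves and how one trait can be extracted from it.

```r
# Base R illustration with made-up values: a dataframe whose column
# 'Ymat' is itself a numeric matrix of traits.
traits <- data.frame(Cww = c(3.1, 2.9), Diam = c(20.5, 21.2), Bwt = c(4.2, 3.9))
d <- data.frame(Id = 1:2)
d$Ymat <- as.matrix(traits)
dim(d$Ymat)        # one row per individual, one column per trait
d$Ymat[, "Diam"]   # a single trait, selected by column name
ncol(d)            # Ymat counts as one column of the dataframe
```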
 

dmm documentation built on July 26, 2023, 5:23 p.m.