make.main: Making Design Matrix of Main Effects from Genotypic Data
In nyiuab/BhGLM: Bayesian hierarchical GLMs and survival models, with applications to Genomics and Epidemiology

Description

This function constructs a design matrix of main effects from the genotypic data of genetic markers. The genotypic data may include missing values. Several genetic models are available; each transforms the three-level (or, for a backcross, two-level) genotypic data into main-effect predictors.

Usage

make.main(geno, model = c("Cockerham", "codominant", "additive", "dominant", "recessive", "overdominant"),
          fill.missing = TRUE, ind.group = NULL, geno.order = TRUE, 
          loci.names = c("marker", "position"), imprint = TRUE, verbose = FALSE, ...)

Arguments

`geno`	For human association data, a matrix or data frame of genotypes with dimension `n*J`, where `n` is the number of individuals and `J` is the number of markers. Genotypes can be coded with any three numbers or characters, provided the heterozygote comes second when the codes are sorted. For experimental crosses, an object of class `cross`; see `read.cross` in the package `qtl` for details.
`model`	a genetic model to construct main-effect predictors.
`fill.missing`	logical. If `TRUE`, fill in missing genotypic data. The default is `TRUE`. If `FALSE`, individuals with missing data will be removed from the analysis.
`ind.group`	a vector of length `n`, indicating groups of individuals (e.g., case-control status, race). If provided, missing data are filled in from the observed genotype data within each group separately. The default is `NULL`, in which case group information is not used.
`geno.order`	logical. If `TRUE`, re-code genotypes as 1: common homozygote, 2: heterozygote, 3: rare homozygote.
`loci.names`	the way to name main-effect predictors; use marker names or chromosome positions.
`imprint`	logical. Indicates whether to consider imprinting effects.
`verbose`	logical. If `TRUE`, print the markers that were removed and the markers that have only two genotypes.

Details

This function provides different genetic models to code the main-effect predictors.

For each SNP, denote the common homozygote (i.e., the homozygote with the higher frequency), the heterozygote, and the rare homozygote by c, h, and r, respectively.
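The c/h/r labelling can be illustrated with a minimal base R sketch. The helper `recode_chr` is hypothetical and is not part of BhGLM; it only assumes, as stated for `geno`, that the heterozygote sorts second among the three codes.

```r
# Hypothetical helper (not a BhGLM function): relabel one marker's genotypes
# as 1 = common homozygote, 2 = heterozygote, 3 = rare homozygote.
recode_chr <- function(g) {
  lev <- sort(unique(g[!is.na(g)]))    # heterozygote assumed to sort second
  stopifnot(length(lev) == 3)
  n_first <- sum(g == lev[1], na.rm = TRUE)
  n_last  <- sum(g == lev[3], na.rm = TRUE)
  # If the last level is the more frequent homozygote, reverse the order.
  if (n_last > n_first) lev <- rev(lev)
  match(g, lev)                        # missing genotypes stay NA
}

g <- c("BB", "AB", "BB", "AA", "BB", NA, "AB")
coded <- recode_chr(g)
coded
```

Here "BB" is observed more often than "AA", so "BB" plays the role of c and "AA" the role of r.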

The Cockerham model defines two main effects for each SNP (with suffixes 'a' and 'd'): an additive predictor coded as -1, 0, and 1 for c, h, and r, and a dominance predictor coded as -0.5 for c and r and 0.5 for h.

The codominant model also defines two main effects for each SNP (with suffixes 'r' and 'h'). Both predictors are indicator variables, with the common homozygote c as the reference group: 'r' indicates the rare homozygote and 'h' indicates the heterozygote.
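As a rough illustration of the two-predictor codings, the following base R sketch builds both sets of columns for a single marker. It does not call `make.main`; the marker name `m1` and the resulting column names are assumptions made for this example only.

```r
# Genotypes of one hypothetical marker, already labelled c / h / r.
g <- c("c", "h", "r", "h", "c")

# Cockerham coding: additive (-1, 0, 1) and dominance (-0.5, 0.5, -0.5).
cockerham <- data.frame(
  m1a = unname(c(c = -1,   h = 0,   r = 1)[g]),
  m1d = unname(c(c = -0.5, h = 0.5, r = -0.5)[g])
)

# Codominant coding: indicators for r and h, with c as the reference group.
codominant <- data.frame(
  m1r = as.numeric(g == "r"),
  m1h = as.numeric(g == "h")
)

cockerham
codominant
```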

The additive model defines a main-effect predictor for each SNP, equal to 0, 1, 2 for c, h, r, respectively.

The dominant model defines a main-effect predictor for each SNP, equal to 1 for r and h, and 0 for c.

The recessive model defines a main-effect predictor for each SNP, equal to 1 for r, and 0 for h and c.

The overdominant model defines a main-effect predictor for each SNP, equal to 1 for h, and 0 for c and r.
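The four single-predictor codings can be compared side by side. This is a base R sketch for one hypothetical marker and does not use `make.main`; the column names are chosen for readability here and are not the names the function produces.

```r
# Genotypes of one hypothetical marker, already labelled c / h / r.
g <- c("c", "h", "r", "h", "c")

single <- data.frame(
  genotype     = g,
  additive     = unname(c(c = 0, h = 1, r = 2)[g]),  # 0, 1, 2 for c, h, r
  dominant     = as.numeric(g %in% c("h", "r")),     # 1 for h and r
  recessive    = as.numeric(g == "r"),               # 1 for r only
  overdominant = as.numeric(g == "h")                # 1 for h only
)

single
```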

For missing genotypes, we first calculate the genotype probabilities conditional on the observed marker data, and then use these conditional probabilities to construct the main-effect predictors as above. For QTL mapping in experimental crosses, we use the multipoint method as implemented in R/qtl (see calc.genoprob) and R/qtlbim (see qb.genoprob). For human association data, we simply replace each missing genotype by its expected value (i.e., dosage), based only on the observed genotypes for that marker.
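For association data, the expected-value fill can be sketched in base R as follows. This is an illustration under the additive coding, not the package's own code; the variable names and the grouping vector `grp` (standing in for `ind.group`) are assumptions.

```r
g    <- c("c", "h", NA, "c", "r", "h", "c", NA)
code <- c(c = 0, h = 1, r = 2)           # additive coding
x_raw <- unname(code[g])                 # NA where the genotype is missing

# Genotype probabilities estimated from the observed genotypes of this marker.
obs <- g[!is.na(g)]
p   <- table(factor(obs, levels = names(code))) / length(obs)

# Expected predictor value (dosage) substituted for each missing genotype.
expected <- sum(as.numeric(p) * code)
x <- x_raw
x[is.na(x)] <- expected

# Group-wise variant: for the additive coding the expected value equals the
# mean of the observed codes, computed here within each group.
grp <- rep(c("case", "control"), each = 4)
fill_mean <- function(v) { v[is.na(v)] <- mean(v, na.rm = TRUE); v }
x_grp <- ave(x_raw, grp, FUN = fill_mean)
```

With these data the pooled fill value is 2/3, while the group-wise fill values are 1/3 for the first group and 1 for the second.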

This function removes markers with only one genotype or more than three genotypes, and for markers with only two genotypes, always uses genotype indicator variables.
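The marker screening described above can be sketched in base R by counting the distinct observed genotypes per marker. The data frame and variable names below are hypothetical, and the sketch only mirrors the stated rule rather than reproducing the function's internals.

```r
geno <- data.frame(
  m1 = c(0, 1, 2, 1, 0, 2),    # three genotypes: kept
  m2 = c(0, 0, 0, 0, 0, 0),    # one genotype: removed
  m3 = c(0, 1, 0, 1, NA, 0),   # two genotypes: kept, coded with indicators
  m4 = c(0, 1, 2, 3, 1, 0)     # four genotypes: removed
)

# Number of distinct observed (non-missing) genotypes per marker.
n_geno <- sapply(geno, function(g) length(unique(g[!is.na(g)])))

kept      <- geno[, n_geno %in% c(2, 3), drop = FALSE]
two_level <- names(n_geno)[n_geno == 2]

names(kept)
two_level
```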

Value

This function returns a data frame consisting of values of all main-effect predictors.

Author(s)

Nengjun Yi, nyi@uab.edu

Examples

library(BhGLM)

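x <- sim.x(n = 100, m = 10, genotype = 6:10)
geno <- x[, 6:10]  # get genotype data
# additive model: one main-effect predictor per marker
x.g <- make.main(geno = geno, model = "additive", fill.missing = TRUE)
# Cockerham model: additive and dominance predictors per marker
x.g <- make.main(geno = geno, model = "Cockerham", fill.missing = TRUE)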

nyiuab/BhGLM documentation built on June 12, 2024, 9:28 p.m.