snp.rhs.estimates: Fit GLMs with SNP genotypes as independent variable(s)


Description

This function fits a generalized linear model with phenotype as dependent variable and with a series of SNPs (or small sets of SNPs) as predictor variables. Optionally, one or more potential confounders of a phenotype-genotype association may be included in the model. In order to protect against misspecification of the variance function, "robust" estimates of the variance-covariance matrix of estimates may be calculated in place of the usual model-based estimates.

Usage

snp.rhs.estimates(formula, family = "binomial", link, weights, subset,
                  data = parent.frame(), snp.data, rules = NULL, sets = NULL,
                  robust = FALSE, uncertain = FALSE,
                  control = glm.test.control())

Arguments

formula

The model formula, with phenotype as dependent variable and any potential confounders as independent variables. Note that parameter estimates are not returned for these model terms.

family

A string defining the generalized linear model family. This currently should (partially) match one of "binomial", "Poisson", "Gaussian" or "gamma" (case-insensitive).

link

A string defining the link function for the GLM. This currently should (partially) match one of "logit", "log", "identity" or "inverse". The default action is to use the "canonical" link for the family selected.

data

The dataframe in which the model formula is to be interpreted

snp.data

An object of class "SnpMatrix" or "XSnpMatrix" containing the SNP data

rules

Optionally, an object of class "ImputationRules"

sets

Either a vector of SNP names (or numbers) for the SNPs to be added to the model formula, a logical vector of length equal to the number of columns in snp.data, or a list of short vectors defining sets of SNPs to be included (see Details)

weights

"Prior" weights in the generalized linear model

subset

Array defining the subset of rows of data to use

robust

If TRUE, robust (Huber-White) estimates of the variance-covariance matrix are calculated in place of the model-based estimates

uncertain

If TRUE, uncertain genotypes are used and scored by their posterior expectations. Otherwise they are treated as missing.

control

An object giving parameters for the IRLS algorithm fitting of the base model and for the acceptable aliasing amongst new terms to be tested. See glm.test.control

Details

Homozygous SNP genotypes are coded 0 or 2 and heterozygous genotypes are coded 1. For SNPs on the X chromosome, males are coded as homozygous females. For X SNPs, it will often be appropriate to include sex of subject in the base model (this is not done automatically). The "robust" option causes Huber-White estimates of the variance-covariance matrix of the parameter estimates to be returned. These protect against misspecification of the variance function in the GLM, for example if binary or count data are overdispersed.
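To make the Huber-White idea concrete, the following is a minimal base R sketch of a "sandwich" variance estimate for a single SNP coded 0/1/2. It does not use snpStats and is not the package's internal code; the simulated data and all variable names are assumptions chosen for illustration.

```r
set.seed(1)
n <- 500
g <- rbinom(n, 2, 0.3)   # genotype coded 0, 1 or 2 (simulated)

# Overdispersed counts (negative binomial) deliberately fitted as Poisson
y <- rnbinom(n, size = 1, mu = exp(0.5 + 0.2 * g))
fit <- glm(y ~ g, family = poisson)

X <- model.matrix(fit)
w <- fit$weights                        # IRLS working weights
r <- residuals(fit, type = "working")   # working residuals
scores <- X * (w * r)                   # row i is the score contribution of subject i

bread <- summary(fit)$cov.unscaled      # (X'WX)^{-1}
meat <- crossprod(scores)               # sum of outer products of the scores
robust_vcov <- bread %*% meat %*% bread # Huber-White (HC0) sandwich

model_se <- sqrt(diag(vcov(fit)))       # usual model-based standard errors
robust_se <- sqrt(diag(robust_vcov))    # robust standard errors
print(rbind(model_se, robust_se))
```

Because the counts are more variable than the Poisson variance function allows, the model-based standard errors are too small here, and the sandwich estimate should be noticeably larger.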

If a data argument is supplied, the snp.data and data objects are aligned by rowname. Otherwise all variables in the model formula are assumed to be stored in the same order as the rows of the snp.data object.
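Alignment by rowname can be pictured with a small base R sketch. The toy objects geno and pheno are hypothetical stand-ins for snp.data and data; this illustrates the idea only and is not how the package implements it.

```r
# Toy genotype matrix: rows are subjects, columns are SNPs (hypothetical names)
geno <- matrix(c(0, 1, 2, 1, 0, 2), nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("rsA", "rsB")))

# Phenotype data frame listing the same subjects in a different order
pheno <- data.frame(cc = c(1, 0, 1), row.names = c("s3", "s1", "s2"))

# Reorder the genotype rows to follow the phenotype rows
idx <- match(rownames(pheno), rownames(geno))
aligned <- geno[idx, , drop = FALSE]

print(cbind(pheno, aligned))
```

Without a data argument there is no rowname matching, so the phenotype vectors must already be in the same subject order as the genotype rows.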

Usually SNPs to be fitted in models will be referenced by name. However, they can also be referenced by number, indicating the appropriate column in the input snp.data. They can also be referenced by a logical selection vector of length equal to the number of columns in snp.data.
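The three ways of referencing SNPs correspond to ordinary column selection in R. The sketch below uses a plain matrix with made-up SNP names (rs1 to rs4) rather than a SnpMatrix, purely to show that the three forms pick out the same columns.

```r
# Toy matrix standing in for snp.data: 3 subjects by 4 SNPs (hypothetical names)
snps <- matrix(0:11 %% 3, nrow = 3,
               dimnames = list(paste0("s", 1:3), c("rs1", "rs2", "rs3", "rs4")))

by_name    <- snps[, c("rs2", "rs4"), drop = FALSE]               # by SNP name
by_number  <- snps[, c(2, 4), drop = FALSE]                       # by column number
by_logical <- snps[, c(FALSE, TRUE, FALSE, TRUE), drop = FALSE]   # by logical vector

# All three selections are the same sub-matrix
print(identical(by_name, by_number) && identical(by_number, by_logical))
```

In a call to snp.rhs.estimates, the analogous values would be passed as the sets argument.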

If the rules argument is supplied, SNPs may be imputed using these rules and included in the model.

Value

An object of class GlmEstimates

Note

A factor (or several factors) may be included as arguments to the function strata(...) in the formula. This fits all interactions of the factors so included, but leads to faster computation than fitting these in the normal way. Additionally, a cluster(...) call may be included in the base model formula. This identifies clusters of potentially correlated observations (e.g. for members of the same family); in this case, an appropriate robust estimate of the variance of the parameter estimates is used.

If uncertain genotypes (e.g. as a result of imputation) are used, the interpretation of the regression coefficients is questionable; the regression coefficient for an imperfect measurement of a variable is a biased (attenuated) estimate of the coefficient of the variable measured.
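The attenuation effect can be seen in a small base R simulation. This sketch assumes classical additive measurement error on the genotype score, which is a simplification and may not match how imputation uncertainty behaves in practice; all names and parameter values are illustrative.

```r
set.seed(42)
n <- 5000
g_true <- rbinom(n, 2, 0.3)               # true genotype, coded 0/1/2
y <- 1 + 0.5 * g_true + rnorm(n)          # phenotype with true slope 0.5

# Imperfect measurement: true genotype plus independent noise (assumed error model)
g_noisy <- g_true + rnorm(n, sd = 0.7)

fit_true  <- glm(y ~ g_true,  family = gaussian)
fit_noisy <- glm(y ~ g_noisy, family = gaussian)

slope_true  <- coef(fit_true)[["g_true"]]
slope_noisy <- coef(fit_noisy)[["g_noisy"]]
print(c(true = slope_true, noisy = slope_noisy))
```

Under this error model the slope for the noisy score is pulled towards zero, which is why coefficients fitted to uncertain genotypes should be interpreted with care.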

Author(s)

David Clayton dc208@cam.ac.uk

See Also

GlmEstimates-class, snp.lhs.estimates, snp.rhs.tests, SnpMatrix-class, XSnpMatrix-class

Examples

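library(snpStats)
data(testdata)

# Stratify by region; no estimates are returned for base-model terms
test <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
   data = subject.data, snp.data = Autosomes, sets = 1:10)
print(test)

# Adjust for region and sex as ordinary covariates
test2 <- snp.rhs.estimates(cc ~ region + sex, family = "binomial",
   data = subject.data, snp.data = Autosomes, sets = 1:10)
print(test2)

# Same stratified model, with robust (Huber-White) variance estimates
test.robust <- snp.rhs.estimates(cc ~ strata(region), family = "binomial",
   data = subject.data, snp.data = Autosomes, sets = 1:10, robust = TRUE)
print(test.robust)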

NikNakk/snpStats documentation built on May 7, 2019, 6:18 p.m.