snp.rhs.tests: Score tests with SNP genotypes as independent variable



Description

This function fits a generalized linear model with phenotype as the dependent variable and, optionally, one or more potential confounders of a phenotype-genotype association as independent variables. A series of SNPs (or small groups of SNPs) is then tested for additional association with phenotype. To protect against misspecification of the variance function, "robust" tests may be selected.

Usage

snp.rhs.tests(formula, family = "binomial", link, weights, subset, data = parent.frame(),
   snp.data, rules=NULL, tests=NULL, robust = FALSE, uncertain=FALSE, 
   control=glm.test.control(), allow.missing=0.01, score=FALSE)

Arguments

formula

The base model formula, with phenotype as dependent variable

family

A string defining the generalized linear model family. This currently should (partially) match one of "binomial", "Poisson", "Gaussian" or "gamma" (case-insensitive)

link

A string defining the link function for the GLM. This currently should (partially) match one of "logit", "log", "identity" or "inverse". The default action is to use the "canonical" link for the family selected

data

The dataframe in which the base model is to be fitted

snp.data

An object of class "SnpMatrix" or "XSnpMatrix" containing the SNP data

rules

An object of class "ImputationRules". If supplied, the rules coded in this object are used, together with snp.data, to calculate tests for imputed SNPs

tests

Either a vector of SNP names (or numbers) for the SNPs to be tested, or a logical vector of length equal to the number of columns in snp.data, or a list of short numeric or character vectors defining groups of SNPs to be tested (see Details)

weights

"Prior" weights in the generalized linear model

subset

Array defining the subset of rows of data to use

robust

If TRUE, robust tests will be carried out

uncertain

If TRUE, uncertain genotypes are used and scored by their posterior expectations. Otherwise they are treated as missing

control

An object giving parameters for the IRLS algorithm fitting of the base model and for the acceptable aliasing amongst new terms to be tested. See glm.test.control

allow.missing

The maximum proportion of SNP genotypes that can be missing before it becomes necessary to refit the base model

score

Is extended score information to be returned?

Details

The tests used are asymptotic chi-squared tests based on the vector of first and second derivatives of the log-likelihood with respect to the parameters of the additional model. The "robust" form is a generalized score test in the sense discussed by Boos (1992). The "base" model is first fitted, and a score test is performed for addition of one or more SNP genotypes to the model. Homozygous SNP genotypes are coded 0 or 2 and heterozygous genotypes are coded 1. For SNPs on the X chromosome, males are coded as homozygous females. For X SNPs, it will often be appropriate to include sex of subject in the base model (this is not done automatically).
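The score test described above can be illustrated in base R. The following is a minimal sketch of the classical (non-robust) 1-df score test for adding a single 0/1/2-coded genotype to a logistic base model; it is not the package's internal implementation. The simulated variables x, g and y are hypothetical and exist only for this illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                 # hypothetical confounder
g <- rbinom(n, 2, 0.3)        # hypothetical SNP genotype, coded 0/1/2
y <- rbinom(n, 1, plogis(-0.5 + 0.4 * x))

# Fit the base model only; the SNP is never fitted
base <- glm(y ~ x, family = binomial)
mu <- fitted(base)
w  <- mu * (1 - mu)           # binomial variance function (IRLS weights)
X  <- model.matrix(base)

# Score (first derivative) for the SNP term, evaluated at the base fit
U <- sum(g * (y - mu))

# Variance of the score, adjusted for the estimated base-model parameters
XtWg <- crossprod(X, w * g)
V <- sum(w * g^2) - drop(t(XtWg) %*% solve(crossprod(X, w * X), XtWg))

chi2 <- U^2 / V
p <- pchisq(chi2, df = 1, lower.tail = FALSE)

# Cross-check against the Rao score test available in base R
full <- glm(y ~ x + g, family = binomial)
rao <- anova(base, full, test = "Rao")$Rao[2]
print(c(chi2 = chi2, rao = rao, p = p))
```

Because only the base model is fitted, the same fit can be reused for every SNP, which is what makes score tests attractive when many SNPs are tested in turn.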

If a data argument is supplied, the snp.data and data objects are aligned by rowname. Otherwise all variables in the model formulae are assumed to be stored in the same order as the rows of the snp.data object.

Usually SNPs to be used in tests will be referenced by name. However, they can also be referenced by number: a positive number indicates the appropriate column in the input snp.data, and a negative number indicates (minus) a position in the rules list. They can also be referenced by a logical selection vector of length equal to the number of columns in snp.data. Sets of tests involving more than one SNP are referenced by a list and can use a mixture of observed and imputed SNPs. If the tests argument is missing, single SNP tests are carried out: if rules is given, all imputed SNP tests are calculated; otherwise all SNPs in the input snp.data matrix are tested. Note that, for single SNP tests, the function single.snp.tests will often achieve the same result much faster.
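The different ways of specifying tests can be sketched as follows, reusing the testdata objects (Autosomes, subject.data) from the Examples section. The particular SNP positions and groupings are arbitrary choices for illustration, and the code is guarded so that it does nothing when snpStats is not installed.

```r
grp <- NULL
if (requireNamespace("snpStats", quietly = TRUE)) {
  library(snpStats)
  data(testdata)
  snp.names <- colnames(Autosomes)

  # Single-SNP tests, SNPs referenced by name
  by.name <- snp.rhs.tests(cc ~ strata(region), family = "binomial",
                           data = subject.data, snp.data = Autosomes,
                           tests = snp.names[1:3])

  # The same three SNPs selected with a logical vector, one entry per column
  sel <- seq_len(ncol(Autosomes)) <= 3
  by.logical <- snp.rhs.tests(cc ~ strata(region), family = "binomial",
                              data = subject.data, snp.data = Autosomes,
                              tests = sel)

  # Multi-SNP tests: each list element defines one joint test
  groups <- list(snp.names[1:2], snp.names[3:5])
  grp <- snp.rhs.tests(cc ~ strata(region), family = "binomial",
                       data = subject.data, snp.data = Autosomes,
                       tests = groups)
  print(grp)
}
```

Each element of the list yields one test, so the grouped call returns two results rather than five.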

Value

An object of class GlmTests or GlmTestsScore depending on whether score is set to FALSE or TRUE in the call.

Note

A factor (or several factors) may be included as arguments to the function strata(...) in the formula. This fits all interactions of the factors so included, but leads to faster computation than fitting these in the normal way. Additionally, a cluster(...) call may be included in the base model formula. This identifies clusters of potentially correlated observations (e.g. for members of the same family); in this case, an appropriate robust estimate of the variance of the score test is used.

Author(s)

David Clayton dc208@cam.ac.uk

References

Boos, Dennis D. (1992) On generalized score tests. The American Statistician, 46:327-333.

See Also

GlmTests-class, GlmTestsScore-class, single.snp.tests, snp.lhs.tests, impute.snps, ImputationRules-class, SnpMatrix-class, XSnpMatrix-class

Examples

library(snpStats)
data(testdata)
slt3 <- snp.rhs.tests(cc~strata(region), family="binomial",
   data=subject.data, snp.data= Autosomes, tests=1:10)
print(slt3)

Example output

Loading required package: survival
Loading required package: Matrix
       Chi.squared Df    p.value
173760  1.01538462  1 0.31361630
173761  1.46259571  1 0.22651757
173762  1.92028786  1 0.16582493
173767  0.77609738  1 0.37833736
173769  2.92614948  1 0.08715513
173770          NA  0         NA
173772  1.11008326  1 0.29206385
173774  0.66697270  1 0.41410906
173775  0.96730037  1 0.32535438
173776  0.09831885  1 0.75385649

snpStats documentation built on Nov. 8, 2020, 10:59 p.m.