association.test: Association Test

View source: R/gwas.r

association.testR Documentation

Association Test

Description

Association tests between phenotype and SNPs.

Usage

 
association.test(x, Y = x@ped$pheno, X = matrix(1, nrow(x)),
                 method = c("lm", "lmm"), response = c("quantitative", "binary"), 
                 test = c("score", "wald", "lrt"), K, eigenK, beg = 1, 
                 end = ncol(x), p = 0, tol = .Machine$double.eps^0.25, ...)

Arguments

x

A bed.matrix

Y

The phenotype vector. Default is the column (pheno) of x@ped

X

A covariable matrix. The default is a column vector of ones, to include an intercept in the model

method

Method to use: "lm" for (generalized) linear model, and "lmm" for (generalized) linear mixed model

response

Is "Y" a quantitative or a binary phenotype?

test

Which test to use. For binary phenotypes, test = "score" is mandatory

K

A Genetic Relationship Matrix (as produced by GRM), or a list of such matrices. For test = "score".

eigenK

Eigen decomposition of the Genetic Relationship Matrix (as produced by the function eigen). For test = "wald" or "lrt".

beg

Index of the first SNP tested for association

end

Index of the last SNP tested for association

p

Number of Principal Components to include in the model with fixed effect (for test = "wald" or "lrt")

tol

Parameter for the likelihood maximization (as in optimize)

...

Additional parameters for lmm.aireml or logistic.mm.aireml (if test = "score").

Details

Tests the association between the phenotype and requested SNPs in x.

If method = "lm" and response = "quantitative" are used, a simple linear regression is performed to test each SNP in the considered interval. Precisely, the following model is considered for each SNP,

Y = (X|PC)\alpha + G\beta + \varepsilon

with \varepsilon \sim N(0,\sigma^2 I_n) , G the genotype vector of the SNP, X the covariates matrix, and PC the matrix of the first p principal components. A Wald test is performed, independently of the value of test.

Ifmethod = "lm" and response = "binary", a similar model is used for a logistic regression (Wald test).

If method = "lmm" and response = "quantitative", the following model in considered for each SNP

Y = (X|PC)\alpha + G\beta + \omega + \varepsilon

with \omega \sim N(0,\tau K) and \varepsilon \sim N(0,\sigma^2 I_n) . G is the genotype vector of the SNP, K is a Genetic Relationship Matrix (GRM) X the covariates matrix, and PC the matrix of the first p principal components.

If test = "score", all parameters are estimated with the same procedure as in lmm.aireml and the argument K is used to specify the GRM matrix (or a list of GRM matrices for inclusion of several random effects in the model). If p is positive, the paramater eigenK needs to be given as well. For Wald and LRT tests the procedure used is the same as in lmm.diago and eigenK is used to specify the GRM matrix.

If method = "lmm" and response = "binary", the following model in considered for each SNP

\mbox{logit}(P[Y=1| X, G, \omega]) = X\alpha + G\beta + \omega

with \omega \sim N(0,\tau K) . G is the genotype vector of the SNP, K is a Genetic Relationship Matrix (GRM), X the covariable matrix. A score test is performed, independently of the value of test. All parameters under null model are estimated with the same procedure as in logistic.mm.aireml. In case of convergence problems of the null problem, the user can try several starting values (in particular with parameter tau, trying e.g. tau = 0.1 or another value). It is possible to give a list of matrices in parameter K for inclusion of several random effects in the model. If p is positive, the paramater eigenK needs to be given as well.

Note: this function is not multithreaded. Wald test with Linear Mixed Models are computationally intensive, to run a GWAS with such tests consider using association.test.parallel in package gaston.utils (on github). Association tests with dosages can be done with association.test.dosage and association.test.dosage.parallel in the same package.

Value

A data frame, giving for each considered SNP, its position, id, alleles, and some of the following columns depending on the values of method and test:

score

Score statistic for each SNP

h2

Estimated value of \tau \over {\tau + \sigma^2}

beta

Estimated value of \beta

sd

Estimated standard deviation of the \beta estimation

LRT

Value of the Likelihood Ratio Test

p

The corresponding p-value

See Also

qqplot.pvalues, manhattan, lmm.diago, lmm.aireml, logistic.mm.aireml, optimize

Examples


# Load data
data(TTN)
x <- as.bed.matrix(TTN.gen, TTN.fam, TTN.bim)
standardize(x) <- "p"

# simulate quantitative phenotype with effect of SNP #631
set.seed(1)
y <- x %*% c(rep(0,630),0.5,rep(0,ncol(x)-631)) + rnorm(nrow(x))

# association test with linear model 
test <- association.test(x, y, method="lm", response = "quanti")

# a p-values qq plot
qqplot.pvalues(test)

# a small Manhattan plot 
# hihlighting the link between p-values and LD with SNP #631
lds <- LD(x, 631, c(1,ncol(x)))
manhattan(test, col = rgb(lds,0,0), pch = 20)

# use y to simulate a binary phenotype
y1 <- as.numeric(y > mean(y))

# logistic regression
t_binary <- association.test(x, y1, method = "lm", response = "binary")
# another small Manhattan plot
manhattan(t_binary, col = rgb(lds,0,0), pch = 20)


gaston documentation built on May 29, 2024, 7:33 a.m.