assocRegression: Association testing with regression

View source: R/assocRegression.R

assocRegressionR Documentation

Association testing with regression

Description

Run association testing with regression

Usage

assocRegression(genoData,
                outcome,
                model.type = c("linear", "logistic", "poisson", "firth"),
                gene.action = c("additive", "dominant", "recessive"),
                covar = NULL,
                ivar = NULL,
                scan.exclude = NULL,
	       	CI = 0.95,
                robust = FALSE,
                LRtest = FALSE,
                PPLtest = TRUE,
                effectAllele = c("minor", "alleleA"),
                snpStart = NULL,
                snpEnd = NULL,
                block.size = 5000,
                verbose = TRUE)

Arguments

genoData

a GenotypeData object

outcome

the name of the phenotype of interest (a column in the scan annotation of genoData)

model.type

the type of model to be run. "linear" uses lm, "logistic" uses glm with family=binomial(), "poisson" uses glm with family=poisson(), and "firth" uses logistf.

gene.action

"additive" coding sets the marker variable for homozygous minor allele samples = 2, heterozygous samples = 1, and homozygous major allele samples = 0. "dominant" coding sets the marker variable for homozygous minor allele samples = 2, heterozygous samples = 2, and homozygous major allele samples = 0. "recessive" coding sets the marker variable for homozygous minor allele samples = 2, heterozygous samples = 0, and homozygous major allele samples = 0. (If effectAllele="alleleA", the coding reflects alleleA instead of the minor allele.)

covar

a vector of the names of the covariates to adjust for (columns in the scan annotation of genoData)

ivar

the name of the variable in covar to include as an interaction with genotype

scan.exclude

a vector of scanIDs for scans to exclude

CI

a value between 0 and 1 defining the confidence level for the confidence interval calculations

robust

logical for whether to use sandwich-based robust standard errors for the "linear" or "logistic" method. The default value is FALSE, and uses model based standard errors. The standard error estimates are returned and also used for Wald Tests of significance.

LRtest

logical for whether to perform Likelihood Ratio Tests in addition to Wald tests (which are always performed). Applies to linear, logistic, or poisson main effects only. NOTE: Performing the LR tests adds a noticeable amount of computation time.

PPLtest

logical for whether to use the profile penalized likelihood to compute p values for the "firth" method (in addition to Wald tests, which are always performed).

effectAllele

whether the effects should be returned in terms of the minor allele for the tested sample (effectAllele="minor") or the allele returned by getAlleleA(genoData) (effectAllele="alleleA"). If the minor allele is alleleB for a given SNP, the difference between these two options will be a sign change for the beta estimate.

snpStart

index of the first SNP to analyze, defaults to first SNP

snpEnd

index of the last SNP to analyze, defaults to last SNP

block.size

number of SNPs to read in at once

verbose

logical for whether to print status updates

Details

When using models without interaction terms, the association tests compare the model including the covariates and genotype value to the model including only the covariates (a test of genotype effect). When using a model with an interaction term, tests are performed for the interaction term separately as well as a joint test of all the genotype terms (main effects and interactions) to detect any genotype effect. All tests and p-values are always computed using Wald tests with p-values computed from Chi-Squared distribtuions. The option of using either sandwich based robust standard errors (which make no model assumptions) or using model based standard errors for the confidence intervals and Wald tests is specified by the robust parameter. The option of also performing equivalent Likelihood Ratio tests is available and is specified by the LRtest parameter.

For logistic regression models, if the SNP is monomorphic in either cases or controls, then the slope parameter is not well-defined, and the result will be NA.

Note: Y chromosome SNPs must be analyzed separately because they only use males.

Value

a data.frame with some or all of the following columns:

snpID

the snpIDs

chr

chromosome SNPs are on

effect.allele

which allele ("A" or "B") is the effect allele

EAF

effect allele frequency

MAF

minor allele frequency

n

number of samples used to analyze each SNP

n0

number of controls (outcome=0) used to analyze each SNP

n1

number of cases (outcome=1) used to analyze each SNP

Est

beta estimate for genotype

SE

standard error of beta estimate for the genotype

LL

Lower limit of confidence interval for Est

UL

Upper limit of confidence interval for Est

Wald.Stat

chi-squared test statistic for association

Wald.pval

p-value for association

LR.Stat

likelihood ratio test statistic for association

LR.pval

p-value for association

PPL.Stat

profile penalized likelihood test statistic for association

PPL.pval

p-value for association

GxE.Est

beta estimate for the genotype*ivar interaction parameter (NA if this parameter is a factor with >2 levels)

GxE.SE

standard error of beta estimate for the genotype*ivar interaction parameter

GxE.Stat

Wald test statistic for the genotype*ivar interaction parameter

GxE.pval

Wald test p-value for the genotype*ivar interaction parameter

Joint.Stat

Wald test statistic for jointly testing all genotype parameters

Joint.pval

Wald test p-value for jointly testing all genotype parameters

Author(s)

Tushar Bhangale, Matthew Conomos, Stephanie Gogarten

See Also

GenotypeData, lm, glm, logistf, vcovHC, lrtest

Examples

library(GWASdata)
data(illuminaScanADF)
scanAnnot <- illuminaScanADF

# exclude duplicated subjects
scan.exclude <- scanAnnot$scanID[scanAnnot$duplicated]

# create some variables for the scans
scanAnnot$sex <- as.factor(scanAnnot$sex)
scanAnnot$age <- rnorm(nrow(scanAnnot), mean=40, sd=10)
scanAnnot$case.cntl.status <- rbinom(nrow(scanAnnot), 1, 0.4)
scanAnnot$blood.pressure[scanAnnot$case.cntl.status==1] <- rnorm(sum(scanAnnot$case.cntl.status==1), mean=100, sd=10)
scanAnnot$blood.pressure[scanAnnot$case.cntl.status==0] <- rnorm(sum(scanAnnot$case.cntl.status==0), mean=90, sd=5)

# create data object
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(gdsfile)
genoData <-  GenotypeData(gds, scanAnnot=scanAnnot)

## linear regression
res <- assocRegression(genoData,
		       outcome="blood.pressure",
                       model.type="linear",
                       covar=c("sex", "age"),
                       scan.exclude=scan.exclude,
 		       snpStart=1, snpEnd=100)

## logistic regression
res <- assocRegression(genoData,
		       outcome="case.cntl.status",
                       model.type="logistic",
                       covar=c("sex", "age"),
                       scan.exclude=scan.exclude,
 		       snpStart=1, snpEnd=100)

close(genoData)

smgogarten/GWASTools documentation built on Nov. 10, 2024, 9:54 p.m.