assocRegression: Association testing with regression
In GWASTools: Tools for Genome Wide Association Studies

Description

Run association tests using regression models.

Usage

assocRegression(genoData,
                outcome,
                model.type = c("linear", "logistic", "poisson", "firth"),
                gene.action = c("additive", "dominant", "recessive"),
                covar = NULL,
                ivar = NULL,
                scan.exclude = NULL,
                CI = 0.95,
                robust = FALSE,
                LRtest = FALSE,
                PPLtest = TRUE,
                effectAllele = c("minor", "alleleA"),
                snpStart = NULL,
                snpEnd = NULL,
                block.size = 5000,
                verbose = TRUE)

Arguments

`genoData`	a `GenotypeData` object
`outcome`	the name of the phenotype of interest (a column in the scan annotation of `genoData`)
`model.type`	the type of model to be run. "linear" uses `lm`, "logistic" uses `glm` with `family=binomial()`, "poisson" uses `glm` with `family=poisson()`, and "firth" uses `logistf`.
`gene.action`	coding of the genotype variable. "additive": homozygous minor allele samples = 2, heterozygous samples = 1, homozygous major allele samples = 0. "dominant": homozygous minor allele samples = 2, heterozygous samples = 2, homozygous major allele samples = 0. "recessive": homozygous minor allele samples = 2, heterozygous samples = 0, homozygous major allele samples = 0. (If `effectAllele="alleleA"`, the coding counts alleleA instead of the minor allele.)
`covar`	a vector of the names of the covariates to adjust for (columns in the scan annotation of `genoData`)
`ivar`	the name of the variable in `covar` to include as an interaction with genotype
`scan.exclude`	a vector of scanIDs for scans to exclude
`CI`	a value between 0 and 1 defining the confidence level for the confidence interval calculations
`robust`	logical for whether to use sandwich-based robust standard errors for the "linear" or "logistic" method. The default, `FALSE`, uses model-based standard errors. The chosen standard error estimates are returned and are also used for the Wald tests of significance.
`LRtest`	logical for whether to perform likelihood ratio (LR) tests in addition to the Wald tests, which are always performed. Applies only to main-effect tests in "linear", "logistic", or "poisson" models. NOTE: performing the LR tests adds noticeable computation time.
`PPLtest`	logical for whether to use the profile penalized likelihood to compute p-values for the "firth" method (in addition to the Wald tests, which are always performed).
`effectAllele`	whether the effects should be returned in terms of the minor allele for the tested sample (`effectAllele="minor"`) or the allele returned by `getAlleleA(genoData)` (`effectAllele="alleleA"`). If the minor allele is alleleB for a given SNP, the difference between these two options will be a sign change for the beta estimate.
`snpStart`	index of the first SNP to analyze; defaults to the first SNP
`snpEnd`	index of the last SNP to analyze; defaults to the last SNP
`block.size`	number of SNPs to read in at once
`verbose`	logical for whether to print status updates
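
To make the `gene.action` and `effectAllele` codings concrete, here is a minimal base-R sketch. The helper `code_genotype` is hypothetical (it is not a GWASTools function) and assumes genotypes are already available as counts of the effect allele (0, 1, 2); it only mirrors the codings listed above. The second half simulates a quantitative outcome to show the sign change described for `effectAllele`.

```r
# Hypothetical helper (not a GWASTools function): recode effect-allele
# counts (0, 1, 2) following the gene.action codings listed above
code_genotype <- function(n.effect,
                          gene.action = c("additive", "dominant", "recessive")) {
  gene.action <- match.arg(gene.action)
  switch(gene.action,
         additive  = n.effect,
         dominant  = ifelse(n.effect >= 1, 2, 0),  # het coded like hom effect
         recessive = ifelse(n.effect == 2, 2, 0))  # only hom effect coded 2
}

geno <- c(0, 1, 2, 1, 0)            # toy effect-allele counts
code_genotype(geno, "additive")     # 0 1 2 1 0
code_genotype(geno, "dominant")     # 0 2 2 2 0
code_genotype(geno, "recessive")    # 0 0 2 0 0

# effectAllele: counting the other allele instead (2 - count) flips the
# sign of an additive slope but leaves its magnitude unchanged
set.seed(1)
minor   <- rbinom(200, 2, 0.3)      # simulated minor-allele counts
y       <- 0.5 * minor + rnorm(200) # simulated quantitative outcome
b.minor <- coef(lm(y ~ minor))[["minor"]]
other   <- 2 - minor                # count of the other allele
b.other <- coef(lm(y ~ other))[["other"]]
all.equal(b.minor, -b.other)        # TRUE
```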

Details

When the model has no interaction term, the association test compares the model containing the covariates and the genotype to the model containing only the covariates (a test of the genotype effect). When the model includes an interaction term (`ivar`), the interaction term is tested on its own, and a joint test of all genotype terms (main effect and interaction) is performed to detect any genotype effect.

Wald tests are always performed, with p-values computed from chi-squared distributions. The `robust` parameter selects whether the confidence intervals and Wald tests use sandwich-based robust standard errors (which do not rely on the model's variance assumptions) or model-based standard errors. Equivalent likelihood ratio tests can also be requested with the `LRtest` parameter.
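
The Wald, confidence interval, robust standard error, and likelihood ratio quantities described above can be sketched for a single SNP in base R. This is an illustration on simulated data, not GWASTools' internal code: it assumes a normal-approximation confidence interval and uses the HC0 form of the sandwich estimator, and the exact variants GWASTools uses (for example through `vcovHC`) may differ.

```r
# Simulated data for one SNP (illustrative only)
set.seed(42)
n <- 300
dat <- data.frame(
  age  = rnorm(n, mean = 40, sd = 10),
  sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
  geno = rbinom(n, 2, 0.3)                    # additive minor-allele count
)
dat$y <- 90 + 0.1 * dat$age + 2 * dat$geno + rnorm(n, sd = 5)

fit  <- lm(y ~ sex + age + geno, data = dat)  # covariates + genotype
fit0 <- lm(y ~ sex + age, data = dat)         # covariates only

# Model-based Wald test: Est, SE, chi-squared statistic on 1 df
est       <- coef(fit)[["geno"]]
se        <- sqrt(vcov(fit)["geno", "geno"])
wald.stat <- (est / se)^2
wald.pval <- pchisq(wald.stat, df = 1, lower.tail = FALSE)

# Normal-approximation confidence interval at level CI (an assumption)
CI <- 0.95
z  <- qnorm(1 - (1 - CI) / 2)
ci <- c(LL = est - z * se, UL = est + z * se)

# Robust (HC0 sandwich) standard error: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X      <- model.matrix(fit)
e      <- residuals(fit)
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)                    # each row of X scaled by e_i
V.rob  <- bread %*% meat %*% bread
j      <- match("geno", colnames(X))
se.rob <- sqrt(V.rob[j, j])
wald.stat.rob <- (est / se.rob)^2

# Likelihood ratio test of the same nested comparison
lr.stat <- as.numeric(2 * (logLik(fit) - logLik(fit0)))
lr.pval <- pchisq(lr.stat, df = 1, lower.tail = FALSE)
```

With `robust = TRUE`, the analogous output would use `se.rob` in place of `se` for both the interval and the Wald statistic.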

For logistic regression models, if the SNP is monomorphic in either the cases or the controls, the slope parameter is not well-defined and the results for that SNP are NA.
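
A check of the kind described above can be written as a small helper. `monomorphic_in_group` is hypothetical (not part of GWASTools, whose internal logic may differ) and assumes a 0/1 case-control status; it flags a SNP whose genotype takes a single observed value among the cases or among the controls.

```r
# Hypothetical helper (not part of GWASTools): TRUE if the genotype takes
# a single observed value among cases or among controls
monomorphic_in_group <- function(geno, status) {
  ok <- !is.na(geno) & !is.na(status)
  n.values <- tapply(geno[ok], status[ok], function(g) length(unique(g)))
  any(n.values < 2)
}

status <- c(0, 0, 0, 0, 1, 1, 1, 1)
monomorphic_in_group(c(0, 1, 2, 1, 0, 0, 0, 0), status)  # TRUE: all cases are 0
monomorphic_in_group(c(0, 1, 2, 1, 0, 1, 0, 2), status)  # FALSE
```

In the first case every non-zero genotype is a control, so a logistic fit would push the genotype coefficient toward minus infinity (quasi-complete separation), and no finite slope can be reported.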

Note: Y chromosome SNPs must be analyzed separately, because only males should be included in their analysis.
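
One way to set up the males-only analysis is to build a `scan.exclude` vector that drops every non-male scan in addition to any scans already being excluded. The sketch below uses a toy data.frame standing in for the scan annotation and assumes sex is coded "M"/"F"; the resulting vector would then be passed as `scan.exclude` to a separate `assocRegression` call whose `snpStart`/`snpEnd` cover the Y chromosome SNPs (assuming they occupy a contiguous index range).

```r
# Toy stand-in for the scan annotation (sex assumed coded "M"/"F")
scans <- data.frame(
  scanID     = 1:6,
  sex        = c("M", "F", "M", "F", "M", "M"),
  duplicated = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# scans already excluded (here: duplicates), plus every non-male scan
dup.exclude <- scans$scanID[scans$duplicated]
y.exclude   <- union(dup.exclude, scans$scanID[scans$sex != "M"])
sort(y.exclude)   # 2 4 5
```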

Value

A data.frame with some or all of the following columns:

`snpID`	the snpIDs
`chr`	chromosome of each SNP
`effect.allele`	which allele ("A" or "B") is the effect allele
`EAF`	effect allele frequency
`MAF`	minor allele frequency
`n`	number of samples used to analyze each SNP
`n0`	number of controls (outcome=0) used to analyze each SNP
`n1`	number of cases (outcome=1) used to analyze each SNP
`Est`	beta estimate for the genotype
`SE`	standard error of the beta estimate for the genotype
`LL`	lower limit of the confidence interval for `Est`
`UL`	upper limit of the confidence interval for `Est`
`Wald.Stat`	Wald chi-squared test statistic for association
`Wald.pval`	Wald test p-value for association
`LR.Stat`	likelihood ratio test statistic for association
`LR.pval`	likelihood ratio test p-value for association
`PPL.Stat`	profile penalized likelihood test statistic for association
`PPL.pval`	profile penalized likelihood test p-value for association
`GxE.Est`	beta estimate for the genotype*ivar interaction parameter (`NA` if `ivar` is a factor with more than 2 levels)
`GxE.SE`	standard error of beta estimate for the genotype*ivar interaction parameter
`GxE.Stat`	Wald test statistic for the genotype*ivar interaction parameter
`GxE.pval`	Wald test p-value for the genotype*ivar interaction parameter
`Joint.Stat`	Wald test statistic for jointly testing all genotype parameters
`Joint.pval`	Wald test p-value for jointly testing all genotype parameters
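
For a model with an interaction, the `GxE.*` and `Joint.*` columns correspond to a 1-df Wald test of the interaction coefficient and a Wald test of all genotype coefficients together. The base-R sketch below reproduces these two statistics for one simulated SNP with a hypothetical binary interaction variable `smoker`; it illustrates the statistics only and is not GWASTools' internal code.

```r
# Simulated data for one SNP and a hypothetical binary interaction variable
set.seed(7)
n   <- 400
dat <- data.frame(geno = rbinom(n, 2, 0.3), smoker = rbinom(n, 1, 0.5))
dat$y <- 1 + 0.3 * dat$geno + 0.5 * dat$smoker +
  0.4 * dat$geno * dat$smoker + rnorm(n)

fit <- lm(y ~ geno * smoker, data = dat)   # geno, smoker, geno:smoker
b   <- coef(fit)
V   <- vcov(fit)

# GxE analogue: 1-df Wald test of the interaction coefficient
gxe.est  <- b[["geno:smoker"]]
gxe.se   <- sqrt(V["geno:smoker", "geno:smoker"])
gxe.stat <- (gxe.est / gxe.se)^2
gxe.pval <- pchisq(gxe.stat, df = 1, lower.tail = FALSE)

# Joint analogue: Wald test of all genotype terms together, b' V^-1 b
g          <- c("geno", "geno:smoker")
joint.stat <- as.numeric(t(b[g]) %*% solve(V[g, g]) %*% b[g])
joint.pval <- pchisq(joint.stat, df = length(g), lower.tail = FALSE)
```

With a factor `ivar` of more than two levels there are several interaction coefficients, so there is no single interaction estimate to report (consistent with `GxE.Est` being `NA`), while the joint test simply uses more degrees of freedom.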

Author(s)

Tushar Bhangale, Matthew Conomos, Stephanie Gogarten

See Also

`GenotypeData`, `lm`, `glm`, `logistf`, `vcovHC`, `lrtest`

Examples

library(GWASTools)
library(GWASdata)
data(illuminaScanADF)
scanAnnot <- illuminaScanADF

# exclude duplicated subjects
scan.exclude <- scanAnnot$scanID[scanAnnot$duplicated]

# create some variables for the scans
scanAnnot$sex <- as.factor(scanAnnot$sex)
scanAnnot$age <- rnorm(nrow(scanAnnot), mean=40, sd=10)
scanAnnot$case.cntl.status <- rbinom(nrow(scanAnnot), 1, 0.4)
scanAnnot$blood.pressure[scanAnnot$case.cntl.status==1] <- rnorm(sum(scanAnnot$case.cntl.status==1), mean=100, sd=10)
scanAnnot$blood.pressure[scanAnnot$case.cntl.status==0] <- rnorm(sum(scanAnnot$case.cntl.status==0), mean=90, sd=5)

# create data object
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(gdsfile)
genoData <- GenotypeData(gds, scanAnnot=scanAnnot)

## linear regression
res <- assocRegression(genoData,
                       outcome="blood.pressure",
                       model.type="linear",
                       covar=c("sex", "age"),
                       scan.exclude=scan.exclude,
                       snpStart=1, snpEnd=100)

## logistic regression
res <- assocRegression(genoData,
                       outcome="case.cntl.status",
                       model.type="logistic",
                       covar=c("sex", "age"),
                       scan.exclude=scan.exclude,
                       snpStart=1, snpEnd=100)

close(genoData)

Reading in Phenotype and Covariate Data...
Running analysis with 43 Samples
Beginning Calculations...
Block 1 of 1 Completed - 0.3984 secs
Reading in Phenotype and Covariate Data...
Running analysis with 43 Samples
Beginning Calculations...
Block 1 of 1 Completed - 0.444 secs