Robust G-G and G-E Interaction with Finely-Matched Case-Control Data.
Performs a conditional likelihood-based analysis of matched case-control data, typically modeling a particular SNP and a set of covariates that may include environmental covariates and/or other genetic variables. Three alternative analysis options are included. (i) Conditional Logistic Regression (CLR): classical CLR, which does not try to exploit G-G or G-E independence and leaves the joint distribution of the covariates in the model completely unrestricted (non-parametric). (ii) Constrained Conditional Logistic (CCL): a CLR analysis of case-control data under the assumption of gene-environment (and/or gene-gene) independence, not in the entire population but within finely matched case-control sets. (iii) Hybrid Conditional Logistic (HCL): suitable when nearest-neighbor matching (see Bhattacharjee et al. 2010) is performed without regard to case-control status. Like CCL, its likelihood assumes G-G/G-E independence within matched sets, but it also borrows some information across matched sets by using a parametric model to account for heterogeneity in disease risk across strata.
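The CLR option in (i) can be illustrated with a minimal base-R sketch of the conditional log-likelihood for matched sets that each contain exactly one case: every set contributes the case's linear predictor minus the log of the summed exponentiated predictors of all its members. The function clr_loglik and the toy data are hypothetical and are not CGEN internals. Because any term that is constant within a set cancels, adding an intercept column leaves the value unchanged whatever its coefficient; this is why CLR cannot estimate an intercept, whereas HCL can by comparing across matched sets.

```r
# Hypothetical sketch: CLR log-likelihood for matched sets with exactly one case each
clr_loglik <- function(beta, X, case, strata) {
  stopifnot(all(tapply(case, strata, sum) == 1))  # this sketch handles 1:M sets only
  eta <- drop(X %*% beta)                          # linear predictor for every subject
  ll <- 0
  for (s in unique(strata)) {
    in_set <- strata == s
    # log P(the observed case is the case | composition of the matched set)
    ll <- ll + eta[in_set & case == 1] - log(sum(exp(eta[in_set])))
  }
  ll
}

# Toy data: two 1:1 matched sets, one covariate x
x <- c(1, 0, 2, 1)
case <- c(1, 0, 1, 0)
strata <- c(1, 1, 2, 2)

clr_loglik(log(2), cbind(x), case, strata)          # equals 2 * log(2/3)
# An intercept column (coefficient 5 here) cancels within each set:
clr_loglik(c(5, log(2)), cbind(1, x), case, strata) # same value
```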
Data frame containing all the data. No default.
Name of the binary response variable coded as 0 (controls) and 1 (cases). No default.
A vector of variable names or a formula, generally coding a single SNP variable (see details). No default.
Vector of variable names or a formula for all covariates of interest that need to be included in the model as main effects. The default is NULL, so that only the snp.vars will be included as main effects in the model.
Character vector of variable names or a formula for all covariates of interest that will interact with the SNP variable. The default is NULL, so that no interactions will be in the model.
Integer matching variable with at most 10 subjects per stratum (e.g. CC matching using
Integer matching variable with at most 8 subjects per stratum (e.g. NN matching using
Control options for the Newton-Raphson optimizer: a list with members "maxiter" (default 100) and "reltol" (default 1e-5).
To compute HCL, the model is first fit to the data using standard logistic regression. The estimated parameters from the standard logistic regression are then used as the initial estimates for Newton-Raphson iterations with the exact gradient and Hessian. Similarly, for CCL, the model is first fit by CLR using the matched sets in cc.var, and this CLR estimate serves as the initial estimate; Newton-Raphson is then used to maximize the CCL likelihood.
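The fitting strategy above (a starting value followed by Newton-Raphson with the exact gradient and Hessian, controlled by maxiter and reltol) can be sketched generically. For illustration the loop is applied to the ordinary logistic log-likelihood, whose gradient X'(y - p) and Hessian -X'WX are standard; the HCL and CCL likelihoods themselves are not reproduced. The stopping rule is an assumption: it borrows the meaning of reltol from R's optim() (stop when the objective changes by less than reltol * (|value| + reltol)), and CGEN's actual rule may differ. nr_logistic is a hypothetical name.

```r
# Hypothetical sketch: Newton-Raphson with exact gradient and Hessian,
# applied to the ordinary logistic log-likelihood as a stand-in.
nr_logistic <- function(X, y, beta0, maxiter = 100, reltol = 1e-5) {
  loglik <- function(b) {
    eta <- drop(X %*% b)
    sum(y * eta - log1p(exp(eta)))
  }
  beta <- beta0
  ll_old <- loglik(beta)
  for (iter in seq_len(maxiter)) {
    p <- plogis(drop(X %*% beta))
    grad <- crossprod(X, y - p)               # X'(y - p)
    hess <- -crossprod(X * (p * (1 - p)), X)  # -X'WX with W = diag(p(1 - p))
    beta <- beta - drop(solve(hess, grad))    # Newton step
    ll_new <- loglik(beta)
    # Assumed stopping rule, modeled on the reltol of optim()
    if (abs(ll_new - ll_old) < reltol * (abs(ll_old) + reltol)) break
    ll_old <- ll_new
  }
  list(parms = beta, loglike = ll_new, iter = iter)
}

# Simulated data; start from zeros and compare with glm()
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
X <- cbind(1, x)
fit <- nr_logistic(X, y, beta0 = c(0, 0))
fit$parms
coef(glm(y ~ x, family = binomial))  # standard fit for comparison
```

Starting from zeros here is only for demonstration; according to the text above, HCL starts from the standard logistic regression estimates, so in practice only a few Newton steps are needed.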
Whereas snp.logistic models the SNP variable parametrically, this function is non-parametric and hence offers somewhat more flexibility. The only constraint on snp.vars is that it is independent of int.vars within homogeneous matched sets. It can be any genetic or non-genetic variable or a collection of those. For example, 3 SNPs coded as general (SNP1), dominant (SNP2) and additive (SNP3) can be specified through a single formula, e.g., snp.vars = ~ (SNP1 == 1) + (SNP1 == 2) + (SNP2 >= 1) + SNP3. However, when multiple variables are used in snp.vars, results should be interpreted carefully.
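To see which columns a multi-SNP formula like the one above produces, base R's model.matrix() can expand it on a toy data frame. This reflects base R's formula semantics only; that CGEN parses snp.vars into the same codings is an assumption. Each logical term becomes a 0/1 indicator, so SNP1 contributes two columns (general coding), SNP2 one dominant indicator, and SNP3 enters additively.

```r
# Toy genotypes coded 0/1/2 (hypothetical data)
d <- data.frame(SNP1 = c(0, 1, 2, 0, 1),
                SNP2 = c(0, 1, 2, 2, 0),
                SNP3 = c(0, 1, 2, 1, 0))

# Expand the formula; the intercept column is dropped for display
mm <- model.matrix(~ (SNP1 == 1) + (SNP1 == 2) + (SNP2 >= 1) + SNP3, data = d)[, -1]
mm
```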
snp.effects can only be applied if a single SNP variable is coded.
int.vars consists of variables that interact with the SNP variable and can be assumed to be independent of snp.vars within matched sets. Interactions for which independence is not assumed can be included in main.vars (as products of the appropriate variables).
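As a small sketch of that last point, an interaction for which independence is not assumed can be precomputed as a product column and listed in main.vars. The data frame, the covariate Z and the column name SNP.Z are hypothetical; the commented call only reuses the argument names from the example below and is not run, because the toy data has no case-control column or matched-set strata.

```r
# Toy data: SNP genotype, covariate E (assumed independent of SNP within sets),
# covariate Z (no independence assumed). All names are hypothetical.
d <- data.frame(SNP = c(0, 1, 2, 1), E = c(3, 0, 1, 2), Z = c(1, 1, 0, 0))

# SNP x Z interaction without the independence assumption: precompute the product
d$SNP.Z <- d$SNP * d$Z

# Not run: E goes in int.vars, the product goes in main.vars
# snp.matched(d, "case.control", snp.vars = "SNP",
#             main.vars = c("E", "Z", "SNP.Z"), int.vars = "E", cc.var = "CCStrat")
```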
Both CCL and HCL provide considerable gains in power compared with standard CLR. CCL derives more power by generating pseudo-controls under the assumption of G-G/G-E independence within matched case-control sets. HCL makes the same assumption but, unlike classical case-control matching, allows each matched set to have any number of cases and controls. By comparing across matched sets, it can estimate the intercept parameter and improve the efficiency of main-effect estimates compared with CLR and CCL. At the same time, it behaves similarly to CCL for interactions, because it assumes G-G/G-E independence only within matched sets. For both methods, the power gain for interactions depends on the sizes of the matched sets; the matched-set size in nn.var is currently limited to 8 to avoid both memory and speed issues.
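The pseudo-control idea can be made concrete in the simplest case. Assume a matched set of size m with exactly one case, a single SNP G and a single covariate E that are independent within the set, and a linear predictor b_G*G + b_E*E + b_GE*G*E. Conditioning on the set's observed G values and E values, but not on how they are paired, every one of the m^2 pairings (G_j, E_k) is an admissible configuration for the case, so the m^2 - m unobserved pairings act as pseudo-controls, while CLR uses only the m observed pairs. This derivation and the helper names eta, clr_set and ccl_set are our own sketch; the package's CCL likelihood handles general covariates and may be organized differently.

```r
# Linear predictor with a G x E interaction: beta = c(b_G, b_E, b_GE)
eta <- function(beta, G, E) beta[1] * G + beta[2] * E + beta[3] * G * E

# CLR contribution of one matched set: only the m observed (G, E) pairs compete
clr_set <- function(beta, G, E, case) {
  stopifnot(sum(case) == 1)
  e <- eta(beta, G, E)
  e[case == 1] - log(sum(exp(e)))
}

# Sketch of a CCL-style contribution under within-set G-E independence:
# all m^2 pairings (G_j, E_k) compete; the unobserved ones act as pseudo-controls
ccl_set <- function(beta, G, E, case) {
  stopifnot(sum(case) == 1)
  pairs <- expand.grid(G = G, E = E)
  eta(beta, G[case == 1], E[case == 1]) - log(sum(exp(eta(beta, pairs$G, pairs$E))))
}

# A 1:1 pair: the case has (G = 1, E = 0), the control has (G = 0, E = 1)
G <- c(1, 0); E <- c(0, 1); case <- c(1, 0)
beta <- c(0, 0, log(2))    # interaction effect only
clr_set(beta, G, E, case)  # -log(2): neither observed pair has G * E = 1
ccl_set(beta, G, E, case)  # -log(5): pseudo-control (G = 1, E = 1) carries b_GE
```

In this pair, CLR carries no information about b_GE at all, while the pseudo-control pairing does, which is the mechanism behind the power gain for interactions described above.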
The authors would like to acknowledge Bijit Kumar Roy for his help in designing the internal data structure and algorithm for HCL/CCL likelihood computations.
A list containing sublists with names CLR, CCL, and HCL. Each sublist contains the parameter estimates (parms), covariance matrix (cov), and log-likelihood (loglike).
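Given the parms and cov components, a Wald test for a subset of parameters is the quadratic form b' V^{-1} b, referred to a chi-square distribution with as many degrees of freedom as tested parameters. The sketch below is the textbook statistic in base R; it is not the source of getWaldTest, and it assumes that parms is a named vector and that cov carries matching dimnames. The numbers in the usage lines are made up.

```r
# Hypothetical helper: Wald chi-square test for the parameters named in `which`
wald_test <- function(parms, cov, which) {
  b <- parms[which]
  V <- cov[which, which, drop = FALSE]
  stat <- drop(crossprod(b, solve(V, b)))  # b' V^{-1} b
  df <- length(which)
  c(test = stat, df = df, pvalue = pchisq(stat, df, lower.tail = FALSE))
}

# Made-up estimates and a diagonal covariance matrix
parms <- c(G = 0.5, `G:E` = -0.2, E = 0.1)
cov <- diag(c(0.04, 0.01, 0.09))
dimnames(cov) <- list(names(parms), names(parms))
wald_test(parms, cov, c("G", "G:E"))  # test = 10.25 on 2 df

# With a fitted object, something like (assuming named parms):
# wald_test(ret1$HCL$parms, ret1$HCL$cov, c("BRCA.status", "BRCA.status:oral.years"))
```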
Chatterjee N, Kalaylioglu Z, Carroll RJ. Exploiting gene-environment independence in family-based case-control studies: increased power for detecting associations, interactions and joint effects. Genetic Epidemiology 2005; 28:138-156.
Bhattacharjee S, Wang Z, Ciampa J, Kraft P, Chanock S, Yu K, Chatterjee N. Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. American Journal of Human Genetics 2010; 86(3):331-342.
Breslow NE, Day NE. Conditional logistic regression for matched sets. In: Statistical Methods in Cancer Research, Volume I: The Analysis of Case-Control Studies. Lyon: IARC Scientific Publications No. 32; 1980. p. 247-279.
# Load the package and the ovarian cancer data
library(CGEN)
data(Xdata, package="CGEN")

# Fake principal component columns
set.seed(123)
Ydata <- cbind(Xdata, PC1=rnorm(nrow(Xdata)), PC2=rnorm(nrow(Xdata)))

# Match using PC1 and PC2
mx <- getMatchedSets(Ydata, CC=TRUE, NN=TRUE, ccs.var="case.control",
                     dist.vars=c("PC1","PC2"), size = 4)

# Append columns for CC and NN matching to the data
Zdata <- cbind(Ydata, CCStrat=mx$CC, NNStrat=mx$NN)

# Fit using variable names
ret1 <- snp.matched(Zdata, "case.control", snp.vars = "BRCA.status",
                    main.vars=c("oral.years", "n.children"),
                    int.vars=c("oral.years", "n.children"),
                    cc.var="CCStrat", nn.var="NNStrat")

# Compute a Wald test for the main effect of BRCA.status and its interactions
getWaldTest(ret1, c("BRCA.status", "BRCA.status:oral.years", "BRCA.status:n.children"))

# Fit the same model as above using formulas
ret2 <- snp.matched(Zdata, "case.control", snp.vars = ~ BRCA.status,
                    main.vars=~oral.years + n.children,
                    int.vars=~oral.years + n.children,
                    cc.var="CCStrat", nn.var="NNStrat")

# Compute a summary table for the models
getSummary(ret2)
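Because cc.var strata may contain at most 10 subjects and nn.var strata at most 8, it can be worth checking the strata before fitting. check_stratum_sizes is a hypothetical helper, not part of CGEN; it counts subjects per stratum with table(), which silently skips NA strata.

```r
# Hypothetical helper: stop if any stratum exceeds the allowed size
check_stratum_sizes <- function(strata, max.size) {
  sizes <- table(strata)  # subjects per stratum (NA strata are not counted)
  too_big <- names(sizes)[sizes > max.size]
  if (length(too_big) > 0) {
    stop("strata with more than ", max.size, " subjects: ",
         paste(too_big, collapse = ", "))
  }
  invisible(as.vector(sizes))
}

# With the example data above (requires CGEN and the Zdata object):
# check_stratum_sizes(Zdata$CCStrat, 10)
# check_stratum_sizes(Zdata$NNStrat, 8)
```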