colGxE: Genotypic TDT for Gene-Environment Interactions




Performs a genotypic TDT for gene-environment interactions for each SNP represented by a column of a matrix in genotype format and a binary environmental factor. If alpha1 is set to a value smaller than 1, then the two-step procedure of Gauderman et al. (2010) will be used to first select all SNPs showing a p-value smaller than alpha1 in a logistic regression of the environmental factor against the sums of the codings for the parents' genotypes at the respective SNP. In the second step, the genotypic TDT is then applied to the selected SNPs.

If unstructured = TRUE, a fully parameterized model is considered and a likelihood ratio test is performed.

While colGxE computes the p-values based on asymptotic chi-square distributions, colGxEPerms can be used to determine permutation-based p-values for the basic genotypic TDT (i.e., for colGxE using alpha1 = 1 and unstructured = FALSE).
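The first step of the two-step procedure described above can be sketched in base R for a single SNP. This is only an illustration on simulated data, not the package's internal code: the variable names (father, mother, parent_sum) are hypothetical, and the use of the Wald p-value from glm is an assumption.

```r
set.seed(1)
n_trios <- 200

# Simulated parental genotypes (coded 0/1/2) and a binary exposure.
father <- rbinom(n_trios, 2, 0.3)
mother <- rbinom(n_trios, 2, 0.3)
env <- rbinom(n_trios, 1, 0.5)

# Predictor: sum of the parental genotype codings in each trio.
parent_sum <- father + mother

# Step 1: logistic regression with the environmental factor as response.
fit <- glm(env ~ parent_sum, family = binomial())
p_step1 <- summary(fit)$coefficients["parent_sum", "Pr(>|z|)"]

# Keep the SNP for step 2 if its step-1 p-value does not exceed alpha1.
alpha1 <- 0.05
selected <- p_step1 <= alpha1
```

Only SNPs with selected equal to TRUE would then be passed on to the genotypic TDT.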


colGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), 
   alpha1 = 1, size = 50, addGandE = TRUE, whichLRT = c("both", "2df", "1df", "none"),
   add2df = TRUE, addCov = FALSE, famid = NULL, unstructured = FALSE)
colGxEPerms(mat.snp, env, model = c("additive", "dominant", "recessive"),
   B = 10000, size = 20, addPerms = TRUE, famid = NULL, rand = NA)



mat.snp: a numeric matrix in which each column represents a SNP. Each column must be a numeric vector of length 3 * t representing a SNP genotyped at t trios. Each of the t blocks must consist of the genotypes of father, mother, and offspring (in this order). The genotypes must be coded by 0, 1, and 2. Missing values are allowed and need to be coded by NA. This matrix might be generated from a ped-file by, e.g., employing ped2geno.


env: a vector of length t (see mat.snp) containing for each offspring the value of a binary environmental variable, which must take the values 0 and 1.


model: type of model that should be fitted. Abbreviations are allowed. Thus, e.g., model = "dom" will fit a dominant model, and model = "r" a recessive model.


alpha1: a numeric value between 0 (exclusive) and 1 (inclusive). If alpha1 = 1, all SNPs will be tested with a genotypic TDT. Otherwise, the two-step procedure of Gauderman et al. (2010) will be used to select all SNPs showing a p-value smaller than or equal to alpha1 in a logistic regression in which the environmental factor is used as response and the sums over the codings for the genotypes of the parents are employed as predictor. The genotypic TDT will then be applied to the selected SNPs. Because the logistic regression in the first step requires a numerical determination of the parameter estimates, the two-step procedure increases rather than reduces the computing time.


size: the number of SNPs considered simultaneously when computing the parameter estimates.


addGandE: should the relative risks and their confidence intervals for the exposed cases be added to the output?


whichLRT: character string specifying which likelihood ratio tests should be added to the output. If "2df", 2 degree of freedom likelihood ratio tests comparing the fitted models (containing one parameter for the SNP and one for the gene-environment interaction) with models containing no factor will be performed. If "1df", 1 degree of freedom likelihood ratio tests comparing the fitted models (containing two parameters, one for the SNP and the other for the interaction) with models containing only the respective SNP will be added to the output. If "both" (default), both tests will be performed, whereas no test will be done if whichLRT = "none".


add2df: should the results of a 2 df Wald test for testing both the SNP and the interaction effect simultaneously be added to the output?


addCov: should the covariance between the parameter estimates for the SNP and the gene-environment interaction be added to the output? Default is addCov = FALSE, as this covariance is given by the negative variance of the parameter estimate for the SNP.


famid: a vector of the same length as env specifying the family IDs for the corresponding values of the environmental variable in env. Can be used to reorder the vector env when the order of the trios differs between env and mat.snp.


unstructured: should a fully parameterized model be fitted? If TRUE, a 2 df likelihood ratio test is performed comparing a gTDT model containing one indicator variable for the heterozygous genotype and one for the homozygous variant genotype with a gTDT model additionally containing two terms for the interactions between these variables and the environmental factor. In this case, only the arguments mat.snp, env, and famid are considered.


B: number of permutations.


addPerms: should the matrices containing the permuted values of the test statistics for the SNP and the gene-environment interaction be added to the output?


rand: integer for setting the random number generator into a reproducible state.
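The layout expected for mat.snp and env can be illustrated with a small hand-made example. The genotypes, the object names (toy_snp, toy_env), and the column names are made up for illustration; real data would typically come from a ped-file.

```r
# Toy genotype matrix for t = 4 trios and 2 SNPs (values are made up).
n_trios <- 4
snp1 <- c(1, 0, 1,    # trio 1: father, mother, offspring
          2, 1, 1,    # trio 2
          1, 1, 2,    # trio 3
          0, 1, NA)   # trio 4: offspring genotype missing
snp2 <- c(0, 1, 0,
          1, 1, 1,
          1, 2, 2,
          0, 0, 0)
toy_snp <- cbind(SNP1 = snp1, SNP2 = snp2)

# One binary environmental value (0 or 1) per offspring, i.e. per trio.
toy_env <- c(0, 1, 1, 0)

# Basic format checks implied by the argument descriptions.
stopifnot(nrow(toy_snp) == 3 * n_trios)
stopifnot(all(toy_snp %in% c(0, 1, 2, NA)))
stopifnot(length(toy_env) == n_trios, all(toy_env %in% c(0, 1)))

# Rows 3, 6, 9, ... hold the offspring genotypes.
offspring <- toy_snp[seq(3, nrow(toy_snp), by = 3), , drop = FALSE]
```

Each block of three consecutive rows is one trio, so the i-th element of toy_env belongs to rows 3 * i - 2 to 3 * i of toy_snp.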


A conditional logistic regression model including two parameters, one for G, and the other for GxE, is fitted, where G is specified according to model.
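As a sketch of the model described above, the conditional likelihood of a genotypic TDT with an interaction term is commonly written as follows. The notation is introduced here for illustration and is not taken from the package: $g_{i0}$ denotes the genotype of the affected offspring in trio $i$, coded according to model, $g_{i1}, g_{i2}, g_{i3}$ denote the codings of the three pseudo-controls derived from the parental genotypes, and $e_i$ is the value of the environmental factor.

```latex
L(\beta_G, \beta_{GxE}) =
  \prod_{i=1}^{t}
  \frac{\exp\left(\beta_G \, g_{i0} + \beta_{GxE} \, g_{i0} e_i\right)}
       {\sum_{j=0}^{3} \exp\left(\beta_G \, g_{ij} + \beta_{GxE} \, g_{ij} e_i\right)}
```

Because $e_i$ is identical for the case and its pseudo-controls, a main effect of the environmental factor would cancel from each ratio, which is consistent with the model containing only parameters for G and GxE.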


For colGxE with unstructured=FALSE, an object of class colGxE consisting of the following numeric matrices with two columns (one for each parameter):


the estimated parameter,


the estimated standard deviation of the parameter estimate,


Wald statistic,


the relative risk, i.e., in the case of trio data, exp(coef) (see Schaid, 1996),


the lower bound of the 95% confidence interval for RR,


the upper bound of the 95% confidence interval for RR,


the number of trios affecting the parameter estimation,


vector containing the values of the environmental factor,




the value of addGandE,


a logical vector specifying which of the likelihood ratio tests were performed and whether the 2 df Wald test was performed,

and depending on the specifications in colGxE


numeric vector containing the covariances,


a numeric matrix with two columns, in which the first column contains the values of the 1 df likelihood ratio test statistic and the second the corresponding p-values,


a numeric matrix with two columns, in which the first column contains the values of the 2 df Wald test statistics and the second the corresponding p-values,


a numeric matrix with two columns, in which the first column contains the values of the 2 df likelihood ratio test statistic and the second the corresponding p-values.
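How the quantities listed above relate to each other can be sketched for a single SNP. The estimates are hypothetical, the object names are chosen for illustration, and the squared-ratio Wald statistic and the Wald-type interval are assumptions about the usual construction rather than a statement of the package's code.

```r
# Hypothetical estimates for one SNP: G main effect and GxE interaction.
beta_hat <- c(G = 0.25, GxE = -0.10)
se_hat   <- c(G = 0.12, GxE = 0.18)

# Wald statistic and asymptotic chi-square p-value (1 df for each parameter).
wald_stat <- (beta_hat / se_hat)^2
wald_pval <- pchisq(wald_stat, df = 1, lower.tail = FALSE)

# Relative risk and 95% confidence interval, computed on the log scale.
z <- qnorm(0.975)
RR      <- exp(beta_hat)
lowerRR <- exp(beta_hat - z * se_hat)
upperRR <- exp(beta_hat + z * se_hat)

round(cbind(beta_hat, se_hat, wald_stat, wald_pval, RR, lowerRR, upperRR), 4)
```

Each row of the printed matrix corresponds to one parameter, mirroring the two-column layout (one column per parameter) of the matrices in the output.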

For colGxE with unstructured=TRUE, an object of class colGxEunstruct consisting of the following vectors:


the loglikelihoods of the models containing only the two main effects,


the loglikelihoods of the models containing the two main effects and, additionally, the two interaction effects,


the values of the test statistic of the likelihood ratio test,


the corresponding p-values.
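The relation between the two loglikelihoods, the test statistic, and the p-value of a 2 df likelihood ratio test can be sketched as follows. The loglikelihood values and object names are hypothetical.

```r
# Hypothetical loglikelihoods for three SNPs.
ll_main <- c(-68.4, -70.1, -65.9)   # models with the two main effects only
ll_full <- c(-66.0, -69.8, -61.2)   # models with main and interaction effects

# Likelihood ratio statistic; the full model has two additional parameters.
lrt_stat <- 2 * (ll_full - ll_main)
lrt_pval <- pchisq(lrt_stat, df = 2, lower.tail = FALSE)
```

A large statistic (and hence a small p-value) indicates that the two interaction terms improve the fit over the main-effects model.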

For colGxEPerms,


a matrix with two columns containing the values of gTDT statistics for the main effects of the SNPs and the gene-environment interactions when considering the original, unpermuted case-pseudo-control status,


a matrix with two columns comprising the permutation-based p-values corresponding to the test statistics in stat,

and if addPerms = TRUE


a matrix with B columns containing the values of the gTDT statistic for the SNPs when considering the B permutations of the case-pseudo-control status,


a matrix with B columns containing the values of the gTDT statistic for the gene-environment interactions when considering the B permutations of the case-pseudo-control status.
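One common way to turn permuted statistics into a permutation-based p-value is sketched below for a single SNP. The observed statistic is hypothetical, the permuted statistics are simulated stand-ins, and the exact formula used by colGxEPerms (for example, whether a correction term is added) is not stated here, so this is an assumption.

```r
set.seed(42)
B <- 1000

obs_stat  <- 6.3                 # hypothetical observed gTDT statistic
perm_stat <- rchisq(B, df = 1)   # stand-in for B permuted statistics

# Fraction of permuted statistics at least as large as the observed one.
perm_pval <- mean(perm_stat >= obs_stat)
```

With B permutations, the smallest non-zero p-value obtainable this way is 1 / B, which is one reason to choose B large.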


Holger Schwender


Gauderman, W.J., Thomas, D.C., Murcray, C.E., Conti, D., Li, D., and Lewinger, J.P. (2010). Efficient Genome-Wide Association Testing of Gene-Environment Interaction in Case-Parent Trios. American Journal of Epidemiology, 172, 116-122.

Schaid, D.J. (1996). General Score Tests for Associations of Genetic Markers with Disease Using Cases and Their Parents. Genetic Epidemiology, 13, 423-449.

Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.

See Also

colTDT, ped2geno


# Load the simulated data for the analysis.
# (trio.data is the example data set of the trio package; it provides mat.test.)
data(trio.data)

# Set up a vector with the binary environmental variable.
# Here, we consider the gene-gender interactions and
# assume that the children in the first 50 trios are
# girls, and the remaining 50 are boys.
sex <- rep(0:1, each = 50)

# Test the interaction of sex with each of the SNPs in mat.test
gxe.out <- colGxE(mat.test, sex)

# By default, an additive mode of inheritance is considered.
# If, e.g., a dominant mode should be considered, then this can
# be done by calling
gxeDom.out <- colGxE(mat.test, sex, model="dominant")


trio documentation built on Nov. 8, 2020, 7:41 p.m.