Test of gene-environment interaction between a SNP and a continuous non-genetic covariate from case-parent trio data.

Share:

Description

Performs permutation test of gene-environment interaction based on the associated penalized maximum likelihood estimates obtained by fitting a generalized additive model to case-parent trio data.

Usage

1
2
3
test.trioGxE(object, data = NULL, nreps, level = 0.05, early.stop = FALSE, 
             fix.sp = FALSE, output = NULL, return.data = FALSE, 
             return.object = FALSE, ...)

Arguments

object

A returned object from trioGxE function. When NULL, a data set of case-parent trios must be provided (through 'data' argument).

data

Trio data to be passed into trioGxE when 'object' is not provided.

nreps

Desired number of permutation replicates.

fix.sp

When TRUE, the approximated null distribution of the test statistic is obtained by computing the test statistic by fitting each simulated data set under fixed values of the smoothing parameters. When FALSE (Default), the null distribution is obtained by fitting each simulated data set while simultaneously estimating the smoothing parameter values.

level

Desired significance level for the test.

early.stop

When TRUE, sampling is terminated early when the number of test statistics that are more extreme than or as extreme as the observed test statistic equals nreps*level. \code{nreps} \times \code{level} values larger than the observed test statistic are obtained.

output

A character string specifying the name of the output file that writes the values of the test statistics calculated for the actual and simulated data set. When NULL (Default), no written output file is produced.

return.data

When TRUE, the original data set is returned.

return.object

When TRUE, the fitting object for the original data set is returned.

...

Arguments passed to trioGxE: when data is provided, instead of trioGxE class object, parameters of trioGxE must be provided through ....

Details

Suppose k_1 and k_2 are the numbers of knots used to represent the interaction functions f_1 and f_2, respectively, via cubic regression spline functions. Let \bm{c}_1 = (c_{11},...,c_{1K_1-1})^\prime and \bm{c}_2 = (c_{21},...,c_{2K_2-1})^\prime are the spline coefficient vectors for f_1 and f_2 that satisfy model identifiability constraints.

The function test.trioGxE calculates test statistic T,

T = t(\hat{\bm{c}}){\rm V}^{-1}(\bm{c})\hat{\bm{c}},

where \bm{c}=(\bm{c}_1^{\prime},\bm{c}_2^{\prime})^{\prime} and V_c is a square matrix of size (k_1+k_2-2), formed by extracting the rows and columns, corresponding to the spline coefficients from the Bayesian posterior variance-covariance matrix calculated in trioGxE.

If the actual data were fitted under the co-dominant penetrance mode (i.e., object$penmod="codominant"), the test statistic T represents an overall test of GxE, where

{\rm H}_0: \bm{c}=0.

Depending on the context, an investigator may also want to perform individual tests: {\rm H}_{01}: \bm{c}_1 = \bm{0} and {\rm H}_{02}: \bm{c}_2 = \bm{0}. For example, when the null hypothesis is rejected, the user may want to know which of the two interaction function is not zero (i.e., which curve is not flat). For the individual tests, test.trioGxE calculates the permutation p-values based on the Monte-Carlo distributions of the individual test statistics T_1 and T_2, where

T_h = t(\hat{\bm{c}}_h){\rm V}^{-1}(\bm{c}_h)\hat{\bm{c}}_h, h=1,2.

Under the dominant, log-additive (multiplicative) or recessive penetrance model, T can be viewed as an individual test since \bm{c}_2=\bm{0}, \bm{c}_1=\bm{c}_2 and \bm{c}_1=\bm{0}, respectively, under the dominant, log-additive and recessive models. For example, under the dominant penetrance model, T\equiv{T_1} because \bm{c}_2=\bm{0}, and T_2=0.

As the analysis is conditional on parental genotypes, the distribution of the test statistic under {\rm H}_0 is calculated by shuffling the column that holds the values of the non-genetic covariate within mating types. This can be justifiable based on the fact that under no interaction, the SNP and the non-genetic covariate are independent of each other within a random affected trio when they are independent within a trio from the general population (Umbach and Weinberg, 2000).

The distribution of the test statistics can be obtained in two ways: either under fixed smoothing parameters (fixed.sp=TRUE) or under varying smoothing parameters (fixed.sp=FALSE). Under the fixed smoothing parameters, the penalized iteratively re-weighted least squares procedure is performed for each simulated data set under the same smoothing parameter values. Under varying smoothing parameters, smoothing parameters are estimated for each simulated data set. Therefore, the test under fixed.sp=FALSE accounts for the extra uncertainty introduced by the smoothing parameter estimation.

To save computation time, the user can use ‘early-termination’ option (Besag and Clifford, 1991). Under this option, sampling is terminated when the number of the simulated data sets reaches nreps*{level} < nreps when the evidence is not strong enough to reject the null hypothesis at the given significance level (level). For example, if the user specifies nreps=1000 and level=0.05, the test terminates when the number of data sets that have test statistic values that are more extreme than or as extreme as the observed test statistic value reaches 50.

Value

GxE.test

Either a 3- or 1-column matrix. When the actual data was fitted under a co-dominant penetrance mode (i.e., object$penmod = "codominant"), a 3-column matrix is returned, where the first column holds the values for T for the original and the generated data sets, and the second and third columns hold the values of T_1 and T_2, respectively for the same data sets. When the actual data was fitted under a non-co-dominant penetrance mode (e.g., dominant), GxE.test is retuned as a matrix with a sigle column holding T.

p.value

If object$penmod = "codominant", it is returned as a vector holding three values, where the first value indicates the overall p-value obtained from the distribution of T, and the other two values indicate the individual p-values obtained from the distributions of T_1 and T_2. Under object$penmod is dominant, additive or recessive, it is returned as a single p-value calculated based on T.

Author(s)

Ji-Hyung Shin <shin@sfu.ca>, Brad McNeney <mcneney@sfu.ca>, Jinko Graham <jgraham@sfu.ca>

References

Umbach, D. and Weinberg, C. (2000). The use of case-parent triads to study joint effects of genotype and exposure. Am J Hum Gen, 66:251-61.

Besag, J. and P. Clifford (1991). Sequential Monte Carlo p-values. Biometrika, 78:301-304.

See Also

trioGxE, plot.trioGxE, trioSim

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
data(hypoTrioDat)
example.fit <- trioGxE(hypoTrioDat, pgenos = c("parent1","parent2"), cgeno = "child",
                       cenv = "attr",penmod="codominant", k=c(5,5))
# A toy example with 'few' permutation replicates
example.test <- test.trioGxE(example.fit, nreps=10, early.stop = FALSE, 
                              output=NULL)
 

## Not run: 

## More proper examples of permutation tests with 5000 replicates

## Example1: does not generate an output file containing test statistic values
example.test1 <- test.trioGxE(example.fit, nreps=5000, early.stop = TRUE, 
                              output=NULL)
## Example 2: generates an output file 'myoutput.out' containing test statistic values 
example.test2 <- test.trioGxE(example.fit, nreps=5000, early.stop = TRUE, 
                              output="myoutput.out")

## End(Not run)