Generalized additive model estimation of geneenvironment interaction using data from caseparent trios
Description
trioGxE
estimates statistical interaction (GxE) between a single nucleotide polymorphism (SNP)
and a continuous environmental or nongenetic attributes
in caseparent trio data by fitting a generalized additive
model (GAM) using a penalized iteratively reweighted least squares algorithm.
Usage
1 2 3 4 5 
Arguments
data 
a data frame with columns for parental genotypes, child genotypes and child environmental/nongenetic attribute. See ‘Details’ section for the required format.  
pgenos 
a length2 vector of character strings specifying the names of the two columns
in  
cgeno 
a character string specifying the name of the column in  
cenv 
a character string specifying the name of the column in  
penmod 
the penetrance mode of the genetic and interaction effects:  
k 
an optional vector or single value specifying the
desired number(s) of knots to be used for the cubic spline basis construction of smooth function(s) representing GxE.
When  
knots 
knot positions for the cubic spline basis construction.
When  
sp 
smoothing parameters for the interaction functions.
When  
lsp0 
an optional length2 numeric vector or a single numeric value used for
choosing trial values of log smoothing parameters in the grid search for
the optimal smoothing parameters.
When  
lsp.grid 
trial values of log smoothing parameters used in the grid search for smoothing parameters.
When  
control 
a list of convergence parameters for the penalized iteratively reweighted least squares (PIRLS) procedure:
 
testGxE 
a logical specifying whether the fitting is for testing interaction.
Default is  
return.data 
a logical specifying whether the original data should be returned.
If  
... 
sets the arguments for 
Details
trioGxE
fits data from caseparent trios to a GAM with smooth functions
representing geneenvironment interaction (G \times E).
The data
input must be a data frame containing the following 4 columns (of any order):
[,1]  number (0, 1 or 2) of copies of a putative risk allele carried by the mother 
[,2]  number of copies of a putative risk allele carried by the father 
[,3]  number of copies of a putative risk allele carried by the affected child (G) 
[,4]  value of a continuous environmental/nongenetic attribute measured on the child (E) 
The function trioGxE
does basic error checking to ensure that
only the trios that are consistent with Mendelian segregation law
with complete genotype, environment and parental genotype information.
The function determines which trios are from informative parental mating types.
An informative parental pair has at least one heterozygote; such parental pair can
have offspring that are genetically different.
Under the assumption that the parents transmit the alleles to their child under
Mendel's law, with no mutation, there are three types of informative mating types
G_p=1,2,3:
G_p=1:  if one parent is heterozygous, and the other parent is homozygous for the nonrisk allele 
G_p=2:  if one parent is heterozygous, and the other parent is homozygous for the risk allele 
G_p=3:  if both parents are heterozygous 
Since GxE occurs when genotype relative risks vary with nongenetic attribute values E=e, GxE inference is based on the attributespecific genotype relative risks, GRR_h(e), expressed as
{\rm GRR}_h(e) = \frac{P(D=1G=h,E=e)}{P(D=1G=h1,E=e)} = \exp{(γ_h + f_h(e))}, ~~h=1,2,
where D=1 indicates the affected status, γ_1 and γ_2 represent genetic main effect, and f_1(e) and f_2(e) are unknown smooth functions. The functions f_h(e) represent GxE since GRRs vary with E=e only when f_1(e)\neq 0 or f_2(e)\neq 0 vary with E. The expressions are followed by assuming a loglinear model of disease risk in Shin et al. (2010).
Under the codominant penetrance mode (i.e., penetrance="codominant"
),
GRR_1(e) and GRR_2(e) are estimated using the information
in the trios from the informative mating types G_p=1, 3 and those from G_p=2, 3, respectively.
Under a noncodominant penetrance mode, only one GRR function, GRR(e)=γ+f(e),
is estimated from an appropriate set of informative trios.
Under the dominant penetrance mode (i.e., penetrance="dominant"
),
because GRR_2(e) is 1 (i.e., γ_2=0 and f_2(e)=0),
GRR(e)\equiv GRR_1(e) is estimated based on the trios from G_p=1 and 3.
Under the recessive penetrance mode (i.e., penetrance="recessive"
),
because GRR_1(e) is 1 (i.e., γ_1=0 and f_1(e)=0),
GRR(e)\equiv GRR_2(e) is estimated based on the trios from G_p=2 and 3.
Under the multiplicative or logadditive penetrance model (penetrance="additive"
),
since GRR_1(e) = GRR_2(e) (i.e., γ_1=γ_2 and f_1=f_2),
GRR(e)\equiv GRR_1(e) \equiv GRR_2(e) is estimated based on all informative trios.
The interaction functions are approximated by cubic regression spline functions defined by
the knots specified through the arguments k
and knots
.
For each interaction function, at least three knots are chosen within the range of
the observed nongenetic attribute values.
Under the codominant mode, k[1]
and k[2]
knots, respectively, located at
knots[[1]]
and knots[[2]]
are used to construct the basis for
f_1(e) and f_2(e), respectively.
By default, a total of 5 knots are placed for each interaction function:
three interior knots at the 25th, 50th and 75th quantiles and two boundary knots
at the endpoints of the data in trios from G_p=1 or 3, for f_1(e),
and in trios from G_p=2 or 3, for f_2(e).
Similarly, under a noncodominant penetrance mode, when knots=NULL
,
k
knots are chosen based on the data in trios from
G_p=1 and 3 (dominant mode);
in trios from all informative mating types (logadditive mode);
and in trios from G_p=2 or 3 (recessive mode).
A standard model identifiability constraint is imposed on each interaction function,
which involves the sum of the interaction function over all observed attribute values of cases
in the appropriate set of informative trios.
For smoothing parameter estimation, trioGxE
finds the optimal values using either a double (if codominant) or
a single 1dimensional grid search constructed based on the arguments lsp0
and lsp.grid
.
When lsp0 = NULL
, trioGxE
takes the log smoothing parameter estimates obtained
from a likelihood approach that makes inference of GxE conditional
on parental mating types, nongenetic attribute and partial information on child genotypes (Duke, 2007).
When lsp.grid = NULL
, trioGxE
takes the following 6 numbers to be the trial values of
the logsmoothing parameter for each interaction function:
20 and 20, lsp0
and the quartiles of the truncated normal distributions
constructed based on lsp0
.
At each trial value in lsp.grid
, the prediction error criterion function,
UBRE (unbiased risk estimator, is minimized to find the optimal smoothing parameter.
For more details on how to estimate the smoothing parameters, see Appendix B.3 in Shin (2012).
For variance estimation, trioGxE
uses a Bayesian approach (Wood, 2006);
the resulting Bayesian credible intervals have been shown to have good frequentist coverage probabilities
as long as the bias is a relatively small fraction of the mean squared error (Marra and Wood, 2012)
Value
coefficients 
a vector holding the spline coefficient estimates for \hat{f}_{1} and/or \hat{f}_{2}. The length of the vector is equal to the total number of knots used for constructing the bases of the interaction curves. For example, under the default basis dimension with codominant penetrance mode, the vector has size 10 (i.e., 5 for f_1 and the other 5 for f_2).  
control 
a list of convergence parameters used for the penalized iteratively reweighted least squares (PIRLS) procedure  
data 
original data passed to  
edf 
a vector of effective degrees of freedom for the model parameters (see page 171 in Wood, 2006 for details).  
Gp 
a vector containing the values of parental mating types G_p (see ‘Details’ for the definition)  
lsp0 
log smoothing parameter(s) used in the grid search.
Not returned (i.e.,  
lsp.grid 
trial values of the log smoothing parameter(s) used in the grid search. Not returned when smoothing parameters were not estimated but provided by the user.  
penmod 
the penetrance mode under which the data were fitted.  
qrc 
a list containing the QRdecomposition results used for imposing the identifiability constraints.
See  
smooth 
a list with components:
 
sp 
optimal smoothing parameter values calculated from UBRE optimization or smoothing parameter values provided by the user.  
sp.user 
logical whether or not the smoothing parmeter values were provided by the user.
If  
terms 
list of character strings of column names in  
triodata 
Formatted data passed into the internal fitting functions. To be used in  
ubre 
the minimum value of the unbiased risk estimator (UBRE), measure of predictability for the interaction function estimators \hat{f}_{1} or \hat{f}_{2}. Not returned when smoothing parameters were not estimated but provided by the user.  
ubre.val 
a list or a vector of ubre values corresponding to the trial values
of smoothing parameter(s) in  
Vp 
Bayesian posterior variancecovariance matrix for the coefficients.
The size the matrix is the same as that of 
Author(s)
JiHyung Shin <shin@sfu.ca>, Brad McNeney <mcneney@sfu.ca>, Jinko Graham <jgraham@sfu.ca>
References
Duke, L. (2007):
A graphical tool for exploring SNPbyenvironment interaction in
caseparent trios,
M.Sc. Thesis, Statistics and Actuarial Science, Simon Fraser University:
URL http://www.stat.sfu.ca/content/dam/sfu/stat/alumnitheses/Duke2007.pdf.
Marra, G., Wood, S.N. (2012).
Coverage properties of confidence intervals for generalized additive model components.
Scand J Stat, 39: 5374.
Shin, J.H., McNeney, B. and Graham, J. (2010).
On the use of allelic transmission rates for assessing genebyenvironment interaction
in caseparent trios.
Ann Hum Gen, 74: 43951.
Shin, J.H. (2012):
Inferring geneenvironment interaction from caseparent trio data: evaluation of and
adjustment for spurious G\times E and development of a datasmoothing method to uncover
true G\times E,
Ph.D. Thesis, Statistics and Actuarial Science, Simon Fraser University:
URL https://theses.lib.sfu.ca/sites/all/files/public_copies/etd7214jshinetd7214jshin.pdf.
Wood, S. (2006): Generalized Additive Models: An Introduction with R, Boca Raton, FL: Chapman & Hall/CRC.
See Also
trioSim
, plot.trioGxE
, test.trioGxE
Examples
1 2 3 4 5 6 7 8  ## fitting a codominant model
data(hypoTrioDat)
simfit < trioGxE(data=hypoTrioDat,pgenos=c("parent1","parent2"),cgeno="child",cenv="attr",
k=c(5,5),knots=NULL,sp=NULL)
## fitting a dominant model to the hypothetical data
simfit.dom < trioGxE(data=hypoTrioDat,pgenos=c("parent1","parent2"),cgeno="child",cenv="attr",
penmod="dom",k=5,knots=NULL,sp=NULL)
