GRSxE | R Documentation |
Fitting and evaluating GRS (genetic risk scores) for testing the presence of GxE (gene-environment) interactions.
GRSxE(
X,
y,
E,
C = NULL,
test.type = "bagging",
B = 500,
replace = TRUE,
subsample = ifelse(replace, 1, 0.632),
test.ind = sample(nrow(X), floor(nrow(X)/2)),
grs.type = "rf",
grs.args = list()
)
X |
Matrix or data frame of genetic variables such as SNPs usually coded as 0-1-2. |
y |
Numeric vector of the outcome/phenotype. Binary outcomes such as a disease status should be coded as 0-1 (control-case). |
E |
Numeric vector of the environmental exposure. |
C |
Optional data frame containing potentially confounding variables to be adjusted for. |
test.type |
Testing type. The standard setting is "bagging". |
B |
The number of bagging iterations if |
replace |
Should sampling with or without replacement be performed?
Only used if |
subsample |
Subsample fraction if |
test.ind |
Vector of indices in the supplied data for testing the GxE
interaction. Only used if |
grs.type |
Type of GRS to be constructed. Either |
grs.args |
Optional list of arguments passed to the GRS fitting procedure. |
The GRS is usually constructed through random forests, which take gene-gene interactions into account and provide an OOB (out-of-bag) prediction mechanism. Alternatively, a classical GRS construction approach can be employed by fitting an elastic net. Bagging can also be applied to fit multiple elastic net models so that OOB predictions are available for this approach as well.
The advantage of OOB predictions is that they allow the GRS model to be constructed on the full available data, while performing unbiased predictions also on the full available data. Thus, both the GRS construction and the GxE interaction testing can utilize all observations.
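The OOB idea described above can be illustrated with a minimal base-R sketch. It does not reproduce the internals of GRSxE: lm() stands in for the GRS learner, and all object names (dat, pred.sum, pred.cnt, oob.pred) are hypothetical.

```r
# Minimal base-R sketch of out-of-bag (OOB) predictions via bagging.
# Assumption: lm() stands in for the GRS learner; all names are illustrative.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

B <- 100
pred.sum <- numeric(n)   # sum of OOB predictions per observation
pred.cnt <- integer(n)   # number of models for which the observation was OOB

for (b in seq_len(B)) {
  in.bag <- sample(n, n, replace = TRUE)   # bootstrap sample
  oob <- setdiff(seq_len(n), in.bag)       # observations not drawn
  fit <- lm(y ~ x, data = dat[in.bag, ])
  pred.sum[oob] <- pred.sum[oob] + predict(fit, newdata = dat[oob, ])
  pred.cnt[oob] <- pred.cnt[oob] + 1L
}

# Each observation is predicted only by models that never saw it
oob.pred <- pred.sum / pred.cnt
```

Every observation receives a prediction, yet no prediction comes from a model that was trained on that observation, which is why all observations can be used for both fitting and testing.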
If desired, sampling can be performed without replacement in contrast to the classical bagging approach that utilizes bootstrap sampling.
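The default subsample fraction of 0.632 for sampling without replacement matches the expected fraction of distinct observations in a bootstrap sample, 1 - (1 - 1/n)^n, which approaches 1 - exp(-1) for large n. That this is the motivation for the default is an assumption; the following sketch only checks the arithmetic, and the rounding of the subsample size is likewise an illustrative choice.

```r
# Expected fraction of distinct observations in a bootstrap sample of size n
n <- 500
expected.frac <- 1 - (1 - 1/n)^n   # approaches 1 - exp(-1), about 0.632

# Empirical fraction of distinct observations in a single bootstrap sample
set.seed(1)
boot.frac <- length(unique(sample(n, n, replace = TRUE))) / n

# Sampling without replacement using the matching subsample fraction
# (rounding is an assumption made for this sketch)
sub.ind <- sample(n, round(0.632 * n), replace = FALSE)
```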
Potentially confounding variables can also be supplied; they are then adjusted for in the GxE interaction testing.
This function uses a GLM (generalized linear model) for modelling the marginal genetic effect, the marginal environmental effect, the GRSxE interaction effect, and potential confounding effects.
The fitted GLM is returned, which can be, e.g., inspected via summary(...) to retrieve the Wald test p-values for the individual terms. The p-value corresponding to the G:E term is the p-value for testing the presence of a GRSxE interaction.
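As a sketch of how the interaction p-value can be read off programmatically, the following fits an ordinary glm with a G * E term on simulated data. The variable G here is a simulated stand-in for a GRS, and the model formula is an assumption for illustration, not necessarily the exact formula used internally.

```r
# Sketch: Wald p-value of an interaction term from a fitted glm.
# Assumption: G is a simulated stand-in for a GRS; the formula is illustrative.
set.seed(1)
n <- 500
G <- rnorm(n)
E <- rnorm(n)
y <- 0.5 * G + 0.5 * E + 0.5 * G * E + rnorm(n)

fit <- glm(y ~ G * E, family = gaussian())   # expands to G + E + G:E
coefs <- coef(summary(fit))                  # matrix with one row per term
p.GxE <- coefs["G:E", 4]                     # 4th column holds the p-value
```

Since the returned object is of class glm and its interaction term is named G:E, the same row lookup should work on a model returned by GRSxE, e.g., coef(summary(model))["G:E", 4].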
An object of class glm is returned, in which G:E describes the GRSxE term.
Lau, M., Kress, S., Schikowski, T. & Schwender, H. (2023). Efficient gene–environment interaction testing through bootstrap aggregating. Scientific Reports 13:937. https://doi.org/10.1038/s41598-023-28172-4
Lau, M., Wigmann, C., Kress, S., Schikowski, T. & Schwender, H. (2022). Evaluation of tree-based statistical learning methods for constructing genetic risk scores. BMC Bioinformatics 23:97. https://doi.org/10.1186/s12859-022-04634-w
Breiman, L. (1996). Bagging predictors. Machine Learning 24:123–140. https://doi.org/10.1007/BF00058655
Breiman, L. (2001). Random Forests. Machine Learning 45:5–32. https://doi.org/10.1023/A:1010933404324
Friedman, J., Hastie, T. & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 33(1):1–22. https://doi.org/10.18637/jss.v033.i01
# Generate toy data
set.seed(101299)
maf <- 0.25
n.snps <- 10
N <- 500
X <- matrix(sample(0:2, n.snps * N, replace = TRUE,
                   prob = c((1-maf)^2, 1-(1-maf)^2-maf^2, maf^2)),
            ncol = n.snps)
colnames(X) <- paste("SNP", 1:n.snps, sep="")
E <- rnorm(N, 20, 10)
E[E < 0] <- 0
# Generate outcome with a GxE interaction
y.GxE <- -0.75 + log(2) * (X[,"SNP1"] != 0) +
  log(4) * E/20 * (X[,"SNP2"] != 0 & X[,"SNP3"] == 0) +
  rnorm(N, 0, 2)
# Test for GxE interaction (Wald test for G:E)
summary(GRSxE(X, y.GxE, E))
# Generate outcome without a GxE interaction
y.no.GxE <- -0.75 + log(2) * (X[,"SNP1"] != 0) +
  log(4) * E/20 + log(4) * (X[,"SNP2"] != 0 & X[,"SNP3"] == 0) +
  rnorm(N, 0, 2)
# Test for GxE interaction (Wald test for G:E)
summary(GRSxE(X, y.no.GxE, E))