plmDE-package: Generalized Additive Partially Linear Models for Gene...


Description

This package is intended for the analysis of gene expression data in which each sample is accompanied by quantitative measurements (such as weight or tumor size). It provides a flexible framework for testing a wide range of hypotheses about differential expression in such data. Formulating these hypotheses properly requires a solid grasp of the models on which they are founded, so I provide an introduction to this methodology, which should help users apply the package successfully.

For a disease whose severity (or any other trait of interest) can be expressed numerically by some measure S, it is reasonable to suppose that the measured expression level Y of a gene in a profiling experiment can be described by the following generalized additive partially linear model, where I_D indicates the presence of the disease:

g\left(E[Y \mid I_D, S]\right) = \beta_0 + \beta_1 I_D + I_D f(S)

where g is a specified link function and f is a function with zero intercept that describes the effect of the interaction between the disease and its severity level on the expression of the gene. Because the interactions between genes and their environment are complex, few assumptions are placed on f. Given an expression profiling dataset of this sort, suppose we call a gene differentially expressed when we reject the joint null hypothesis that β_1 = 0 and f(S) = 0 for all S. We then presumably obtain a set of genes whose differential expression is more likely to be attributable to their effect on S over the course of the disease than the set obtained by testing only β_1 = 0 in a simpler model in which f is set to 0.
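For a single gene, the model above can be approximated by an ordinary GLM once f is expanded in a B-spline basis. The sketch below uses only base R (`glm`, `anova`) and the `splines` package that ships with R; the simulated data, the Gaussian family with identity link standing in for g, and `df = 4` are illustrative assumptions, not plmDE's internal code. It contrasts the joint test of β_1 = 0 and f = 0 with the simpler test of β_1 = 0 alone.

```r
library(splines)  # bs(): B-spline basis (ships with R)

set.seed(1)
n  <- 100
# Hypothetical data: 50 controls (I_D = 0) followed by 50 diseased samples (I_D = 1).
# Severity S is 0 for controls, as in the package example.
ID <- rep(c(0, 1), each = n / 2)
S  <- c(rep(0, n / 2), runif(n / 2, 50, 150))
# Simulated expression: baseline 1, disease shift 0.5, nonlinear severity effect.
Y  <- 1 + 0.5 * ID + ID * sin(S / 40) + rnorm(n, sd = 0.3)

# Expand f in a B-spline basis built on the diseased samples only; rows for
# controls stay 0, which gives the product I_D * f(S).
# With bs()'s default intercept = FALSE, every basis column is 0 at the
# smallest observed severity, which keeps f separate from beta_1.
B <- matrix(0, nrow = n, ncol = 4)
B[ID == 1, ] <- bs(S[ID == 1], df = 4)

# Identity link g, i.e. a Gaussian GLM.
fit_full  <- glm(Y ~ ID + B, family = gaussian())  # beta_0 + beta_1 I_D + I_D f(S)
fit_shift <- glm(Y ~ ID,     family = gaussian())  # simpler model with f set to 0
fit_null  <- glm(Y ~ 1,      family = gaussian())  # beta_1 = 0 and f = 0

# Joint test of beta_1 = 0 and f = 0, versus the test of beta_1 = 0 alone.
joint <- anova(fit_null, fit_full,  test = "F")
shift <- anova(fit_null, fit_shift, test = "F")
p_joint <- joint[["Pr(>F)"]][2]
p_shift <- shift[["Pr(>F)"]][2]
print(c(joint = p_joint, shift_only = p_shift))
```

The joint test spends 5 degrees of freedom (β_1 plus four spline coefficients), whereas the shift-only test spends 1; which one is more powerful for a given gene depends on how much of its signal lies in f.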

Generalizing this scenario, suppose that each sample can be classified into a baseline group N or one of the groups D_1, …, D_G, and that quantitative covariates S_1, …, S_C are measured on each sample. Then Y_j, the expression level of gene j in a sample X, can be modeled as:

g\left(E[Y_j \mid \text{data on } X]\right) = \beta_{N,j} + \sum_{k=1}^{G} \beta_{k,j} I_{D_k}(X) + \sum_{k=1}^{G} \sum_{i=1}^{C} I_{D_k}(X) f_{k,i,j}(S_{i,X})

where f_{k,i,j} (again with zero intercept) describes the effect of covariate S_i on the expression of gene j in group D_k.

From such a model, a number of hypotheses can be tested. One that might be of interest: for each gene j, simultaneously test whether f_{r,2,j} = f_{s,2,j} and β_{r,j} = β_{s,j}. Genes for which this null hypothesis is rejected would be candidate members of the set whose expression is involved in changes in S_2 between groups D_r and D_s.
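As a sketch of how such a hypothesis could be tested for one gene, the R code below simulates a baseline group N and two disease groups (standing in for D_r and D_s) with two covariates, builds a group-specific spline expansion for each covariate, and compares the full model with a reduced model in which the two disease groups share both their intercept shift and their S_2 function. The data, the hypothetical helper `group_basis()`, the Gaussian family and `df = 4` are all assumptions for illustration; this does not reproduce plmDE's internal code.

```r
library(splines)  # bs(): B-spline basis (ships with R)

set.seed(2)
# Hypothetical design: baseline group N and two disease groups, 30 samples each.
# "Dr" and "Ds" play the roles of D_r and D_s.
grp  <- rep(c("N", "Dr", "Ds"), each = 30)
n    <- length(grp)
S1   <- runif(n, 0, 10)  # covariate S_1
S2   <- runif(n, 0, 10)  # covariate S_2
in_r <- grp == "Dr"
in_s <- grp == "Ds"

# Hypothetical helper: spline columns for the term "group indicator times f(S)".
# The basis is built on all samples (same knots for every group) and zeroed
# outside the group, so a shared basis is exactly the sum of per-group bases.
group_basis <- function(s, in_group, df = 4) {
  B <- matrix(bs(s, df = df), nrow = length(s))
  B[!in_group, ] <- 0
  B
}

# Simulated gene: D_r and D_s share the shift and the S_1 effect but respond
# differently to S_2, so the null hypothesis is false for this gene.
Y <- 2 + in_r * (0.5 + 0.05 * S1 + sin(S2)) +
         in_s * (0.5 + 0.05 * S1 + cos(S2)) + rnorm(n, sd = 0.3)

Ir    <- as.numeric(in_r)
Is    <- as.numeric(in_s)
Irs   <- Ir + Is                          # shared indicator for D_r or D_s
B1_r  <- group_basis(S1, in_r)
B1_s  <- group_basis(S1, in_s)
B2_r  <- group_basis(S2, in_r)
B2_s  <- group_basis(S2, in_s)
B2_rs <- group_basis(S2, in_r | in_s)     # equals B2_r + B2_s

# Full model: separate shifts and separate S_1 and S_2 functions per group.
full    <- glm(Y ~ Ir + Is + B1_r + B1_s + B2_r + B2_s, family = gaussian())
# Reduced model under H0: equal shifts and a common S_2 function for D_r and D_s.
reduced <- glm(Y ~ Irs + B1_r + B1_s + B2_rs, family = gaussian())

res     <- anova(reduced, full, test = "F")
p_value <- res[["Pr(>F)"]][2]
print(res)
```

Building every group's basis on the same knots is what makes the reduced model nested in the full one; the test then has 5 degrees of freedom (one for the shifts, four for the S_2 spline coefficients).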

To test such a hypothesis, we first express each unknown function as a linear combination of predefined basis functions. We then fit a reduced model, in which a single set of basis coefficients is chosen to best fit the expression data of both groups at gene j, and a full model, which adds to the reduced model a second set of basis coefficients to better fit the expression levels of the second group. Because the basis approximation turns both the full and the reduced model into generalized linear models, the significance of the additional coefficients in the full model can be assessed with standard nested-model tests, such as chi-square (likelihood-ratio) or F tests.
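The full-versus-reduced comparison can be sketched gene by gene with base R. In this sketch the counts are simulated, a Poisson family with log link plays the role of g, `df = 4` is an arbitrary basis size, and `lr_test()` is a hypothetical helper rather than part of plmDE; within the package, the `fitGAPLM()` and `mostDE()` calls shown in the Examples are the documented route.

```r
library(splines)  # bs(): B-spline basis (ships with R)

set.seed(3)
# Hypothetical experiment: 40 control and 40 diseased samples, one covariate S,
# and simulated counts for 200 genes (rows) across the 80 samples (columns).
n_genes <- 200
second  <- rep(c(0, 1), each = 40)  # indicator of the second group
n       <- length(second)
S       <- runif(n, 0, 10)

B  <- matrix(bs(S, df = 4), nrow = n)  # basis coefficients shared by both groups
Bs <- B * second                       # extra basis columns used only by the second group

lambda <- matrix(20, nrow = n_genes, ncol = n)
# Genes 1 to 10 respond to S (on the log scale) only in the second group.
lambda[1:10, ] <- matrix(20 * exp(0.15 * S * second), nrow = 10, ncol = n, byrow = TRUE)
counts <- matrix(rpois(length(lambda), lambda), nrow = n_genes)

# Hypothetical helper: chi-square (likelihood-ratio) test of the extra coefficients.
lr_test <- function(y) {
  reduced <- glm(y ~ second + B,      family = poisson())  # one set of basis coefficients
  full    <- glm(y ~ second + B + Bs, family = poisson())  # plus a set for the second group
  anova(reduced, full, test = "Chisq")[["Pr(>Chi)"]][2]
}

p_values <- apply(counts, 1, lr_test)
head(order(p_values), 10)  # genes with the most evidence against the reduced model
```

Because both models are plain GLMs, swapping `poisson()` for another family (and `"Chisq"` for `"F"` when the dispersion is estimated) changes only the link and the reference distribution of the test, not the structure of the comparison.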

This package contains methods to perform such tests, using B-splines as the basis functions, and methods for viewing the fit of the estimated functions on the expression data. All it requires from the user is the specification of the full and reduced models representing the test to be conducted on a dataset of gene expression measurements.

Details

Package: plmDE
Type: Package
Version: 1.0
Date: 2012-05-01
License: GPL Version 2 or newer

Author(s)

Jonas Mueller

Maintainer: <jonasmueller303@hotmail.com>

References

Wang, L., Liu, X., Liang, H., and Carroll, R. J. Estimation and variable selection for generalized additive partial linear models. Annals of Statistics 39, 1827-1851 (2011).

Examples

## load the package and fix the random seed so the example is reproducible:
library(plmDE)
set.seed(1)

## create a plmDE object for 100 genes measured in 100 samples, split into
## "Control" and "Diseased" groups, with measurements of weight and severity:
ExpressionData = as.data.frame(matrix(abs(rnorm(10000, 1, 1.5)), ncol = 100))
names(ExpressionData) = paste("Sample", 1:100)
Genes = paste("Gene", 1:100)
DataInfo = data.frame(sample = names(ExpressionData),
                      group = c(rep("Control", 50), rep("Diseased", 50)),
                      weight = abs(rnorm(100, 50, 20)),
                      severity = c(rep(0, 50), abs(rnorm(50, 100, 20))))
plmDEobject = plmDEmodel(Genes, ExpressionData, DataInfo)

## test whether severity and the indicator variable
## for disease are simultaneously significant:
test = fitGAPLM(plmDEobject,
                continuousCovariates.fullModel = c("weight", "severity"),
                compareToReducedModel = TRUE,
                indicators.reducedModel = NULL,
                continuousCovariates.reducedModel = "weight")

## find genes with most evidence for differential expression under the model:
mostDE(test)

## plot the model's fit on the expression data of the 5th gene:
plot(test, "weight", 5, plmDEobject)

plmDE documentation built on May 29, 2017, 6:37 p.m.