fitGAPLM: Fit a Generalized Additive Partially Linear Model on Gene...

View source: R/fitGAPLM.R

Description

Given a plmDE object containing preprocessed/normalized expression measures for a set of genes under different conditions, together with the values of quantitatively measured covariates of interest, fitGAPLM tests each gene for differential expression under a model specified by the user. The test is based on the significance of a full model fit to the expression data compared with the fit of a reduced model (F statistic). The variables of interest should be present in the full model and absent from the reduced model. The method is flexible and can fit count data (e.g., expression measures from high-throughput sequencing) as well as microarray data. With fitGAPLM, the user can model the gene expression measures by any mixture of additive functions of the numerical variables and linear terms for the available factor information. Each of these functions is approximated through a B-spline fit, with the intercept of the spline constrained to zero for identifiability. Although fitGAPLM takes a daunting number of inputs, many of them are already set to sensible defaults. Even so, models of the complexity represented in this class must be well thought out, and each parameter requires careful consideration.
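The full-versus-reduced comparison described above can be sketched for a single gene with base R tools. This is only an illustration under stated assumptions, not the internal code of fitGAPLM: it uses lm, splines::bs (whose default intercept = FALSE drops the spline intercept), and anova, and the simulated variables are hypothetical.

```r
library(splines)  # distributed with base R

set.seed(1)
n <- 100
group <- factor(rep(c("Control", "Diseased"), each = n / 2))
weight <- abs(rnorm(n, 50, 20))
severity <- c(rep(0, n / 2), abs(rnorm(n / 2, 100, 20)))
expression <- abs(rnorm(n, 1, 1.5))  # one hypothetical gene

# Full model: linear group indicator plus spline functions of both covariates.
full <- lm(expression ~ group + bs(weight, degree = 3) + bs(severity, degree = 3))

# Reduced model: drops the terms of interest (group and severity).
reduced <- lm(expression ~ bs(weight, degree = 3))

# F test of the full fit against the reduced fit.
comparison <- anova(reduced, full)
pValue <- comparison[["Pr(>F)"]][2]
```

Repeating such a comparison gene by gene yields one p-value per gene, which is the quantity that is later adjusted for multiple testing.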

Usage

fitGAPLM(dataObject, generalizedLM = FALSE, family = poisson(link = log),
  NegativeBinomialUnknownDispersion = FALSE, test = "LRT", weights = NULL,
  offset = NULL, pValueAdjustment = "fdr", significanceLevel = 0.05,
  indicators.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  continuousCovariates.fullModel = NULL,
  groups.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  groupFunction.fullModel = rep("AdditiveSpline", length(groups.fullModel)),
  fitSplineFromData.fullModel = TRUE,
  splineDegrees.fullModel = rep(3, length(groups.fullModel)),
  splineKnots.fullModel = rep(0, length(groups.reducedModel)),
  compareToReducedModel = FALSE,
  indicators.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  continuousCovariates.reducedModel = NULL,
  groups.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  groupFunction.reducedModel = rep("AdditiveSpline", length(groups.reducedModel)),
  fitSplineFromData.reducedModel = TRUE,
  splineDegrees.reducedModel = rep(3, length(groups.reducedModel)),
  splineKnots.reducedModel = rep(0, length(groups.reducedModel)),
  splineKnotSpread = "quantile")

Arguments

dataObject

Object of type plmDE containing the gene expression and sample information.

generalizedLM

If TRUE, a link function is introduced to generalize the linear model. Use for gene-level count data.

family

One of the distribution families that may be used in the function glm. For gene-level count data, the negative binomial (see negative.binomial) is recommended to account for overdispersion.

NegativeBinomialUnknownDispersion

In the case of a negative binomial fit, has the dispersion of the data been estimated or does it remain unknown? If TRUE, then glm.nb from the MASS package is called, which includes routines for fitting the GLM and estimating the dispersion parameter.
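The two negative binomial situations described for this argument can be sketched directly with the MASS package. This is a minimal illustration on hypothetical simulated counts and does not show how fitGAPLM dispatches internally: a known dispersion is supplied through negative.binomial(theta), while glm.nb estimates theta from the data.

```r
library(MASS)  # recommended package distributed with R

set.seed(2)
group <- factor(rep(c("Control", "Diseased"), each = 50))
# Hypothetical overdispersed counts for one gene (simulated with theta = 5).
counts <- rnbinom(100, mu = ifelse(group == "Diseased", 40, 20), size = 5)

# Dispersion treated as known: pass a fixed theta through the family.
fitKnown <- glm(counts ~ group, family = negative.binomial(theta = 5))

# Dispersion unknown: glm.nb estimates theta along with the coefficients.
fitUnknown <- glm.nb(counts ~ group)
estimatedTheta <- fitUnknown$theta
```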

test

The test that should be used in the case that a GLM is requested to estimate the significance of the model. See stat.anova for details.

weights

An optional vector of prior weights to be used in the fitting of the (generalized) linear model. Should be NULL or a numeric vector.

offset

An optional a priori known component to be included in the fitting of the (generalized) linear model. One or more offset terms may be included in the model.
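As an illustration of what an offset does in a count model, the following sketch uses plain glm with a log link, where the log of a hypothetical per-sample sequencing depth enters the linear predictor with a fixed coefficient of one. The variable names are assumptions made for this example, and the way fitGAPLM forwards its offset argument is not shown here.

```r
set.seed(3)
group <- factor(rep(c("Control", "Diseased"), each = 10))
librarySize <- round(runif(20, 1e6, 3e6))           # hypothetical sequencing depth
counts <- rpois(20, lambda = librarySize * 2e-5)    # counts scale with depth

# The offset is added to the linear predictor without an estimated coefficient.
fit <- glm(counts ~ group, family = poisson(link = log),
           offset = log(librarySize))
```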

pValueAdjustment

Choice of multiple testing correction method to be passed to p.adjust.

significanceLevel

The significance level at which genes should be identified as differentially expressed.
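How a correction method and a significance level combine can be sketched with p.adjust on a handful of made-up p-values. Whether fitGAPLM compares with "<" or "<=" is an assumption of this example.

```r
# Hypothetical raw p-values for six genes.
pValues <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# "fdr" is the Benjamini-Hochberg adjustment in p.adjust.
adjusted <- p.adjust(pValues, method = "fdr")

# Flag genes whose adjusted p-value falls at or below the significance level.
significanceLevel <- 0.05
isDE <- adjusted <= significanceLevel
```

In this example the third and fourth genes have raw p-values below 0.05 but are no longer flagged after adjustment.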

indicators.fullModel

The indicator terms which should go into the full model. These must match the groups in the second column of the sample information in dataObject. Under the default setting, the indicators will consist of all groups except for the first one (used as the baseline for comparison).

continuousCovariates.fullModel

The quantitative covariates that should go into the full model. These must match the column names of the sample information in dataObject.

groups.fullModel

The subgroups of our sample for which we wish to estimate a function relating their measurement of continuousCovariates to their expression levels in dataObject.

groupFunction.fullModel

A vector of the same length as groups.fullModel consisting of strings matching "AdditiveSpline", "AdditiveLinear", "CommonSpline", or "CommonLinear". If "AdditiveSpline" is chosen, then a B-spline basis is fitted to the continuousCovariate values of the corresponding group in groups.fullModel to estimate a function that represents the effect of this group's continuousCovariate values on their measured expression levels. This function implicitly includes an indicator term, so it evaluates to 0 for the measurements of continuousCovariate from other groups, and its overall effects are assumed to be additive with respect to the other parameters being estimated. If "AdditiveLinear" is selected, then this function is taken to be the identity function (no spline basis fit) times a parameter to be fit by the model. To estimate one function that accounts for the same effect across multiple groups, all of those groups must be listed in groups.fullModel and their corresponding entries in groupFunction.fullModel must be set to "CommonSpline". Likewise, to assume a common linear effect across multiple groups, they must all be listed in groups.fullModel and the corresponding entries of groupFunction.fullModel must read "CommonLinear".
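The idea of a group-specific term that evaluates to 0 for other groups can be sketched by building the design columns by hand. This is a hypothetical construction using splines::bs, intended only to illustrate the description above; the columns that fitGAPLM actually builds may differ.

```r
library(splines)  # distributed with base R

set.seed(4)
group <- rep(c("Control", "Diseased"), each = 50)
severity <- c(rep(0, 50), abs(rnorm(50, 100, 20)))
inGroup <- group == "Diseased"

# B-spline basis built from the Diseased group's covariate values only.
basis <- bs(severity[inGroup], degree = 3)

# "AdditiveSpline"-style term: basis columns are zero for all other groups.
additiveSpline <- matrix(0, nrow = length(severity), ncol = ncol(basis))
additiveSpline[inGroup, ] <- basis

# "AdditiveLinear"-style term: the covariate itself times the group indicator.
additiveLinear <- severity * inGroup
```

A "Common" variant would, under the same assumptions, build a single basis from the pooled covariate values of all listed groups instead of one basis per group.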

fitSplineFromData.fullModel

Should the B-spline functions in the full model be automatically fitted based on the heuristic in fitBspline?

splineDegrees.fullModel

If fitSplineFromData.fullModel has not been selected, then the user may specify, in a vector format, the degree of each B-spline basis that is fitted to the groups.

splineKnots.fullModel

If fitSplineFromData.fullModel has not been selected, then the user may also specify, in a vector, the number of knots to include in each corresponding basis.

compareToReducedModel

If TRUE, then the user must specify a model that the full model should be tested against. Otherwise, all terms (besides the intercept) of the full model are simultaneously tested for significance.

indicators.reducedModel

See corresponding parameter for full model.

continuousCovariates.reducedModel

See corresponding parameter for full model.

groups.reducedModel

See corresponding parameter for full model.

groupFunction.reducedModel

See corresponding parameter for full model.

fitSplineFromData.reducedModel

See corresponding parameter for full model.

splineDegrees.reducedModel

See corresponding parameter for full model.

splineKnots.reducedModel

See corresponding parameter for full model.

splineKnotSpread

Determines whether B-spline knots are spread uniformly over the range of the data or placed at quantiles of the data (takes values "uniform" or "quantile"). This setting does not affect the fitBspline method.
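The difference between the two knot spreads can be sketched on a skewed covariate. The exact positions that fitGAPLM chooses are not documented here, so the evenly spaced probabilities below are an assumption made for illustration.

```r
set.seed(5)
x <- rexp(200)   # hypothetical skewed covariate
nKnots <- 3

# Evenly spaced interior positions in (0, 1): 0.25, 0.50, 0.75 for three knots.
probs <- seq(0, 1, length.out = nKnots + 2)[-c(1, nKnots + 2)]

# "uniform": knots equally spaced over the range of the data.
uniformKnots <- min(x) + probs * diff(range(x))

# "quantile": knots at the matching sample quantiles, so each interval
# between knots holds roughly the same number of observations.
quantileKnots <- unname(quantile(x, probs))
```

Either vector could then be supplied as the knots argument of splines::bs.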

Value

Returns an object of type DEresults containing various information about the analysis.

allgenes

Data frame consisting of information on all the genes, their p-values and adjusted p-values, and whether or not this test identifies them as differentially expressed.

DEgenes

Data frame consisting of the genes which were expressed at significantly differing levels according to this model.

PredictorFormula.fullModel

Contains the formula followed by the predictors in the B-spline-approximated linear model, leaving out the dependent variable term.

PredictorFormula.reducedModel

Contains the formula followed by the predictors in the reduced model, leaving out the dependent variable term.

modelForm.fullModel

Contains the indicators and covariates incorporated into the full model.

modelForm.reducedModel

Contains the indicators and covariates incorporated into the reduced model.

GLMinfo

Tracks the glm parameters used in the fitting of this model (for plotting purposes).

Note

Because fitGAPLM is implemented in R rather than a compiled language, it tends to run slowly for larger expression assays (an analysis of 65 samples of ~50,000 probes from the HG-U133 array takes ~20 min). If the GAPLM is to be fit to microarray data, the limmaPLM function should be used instead. However, fitGAPLM must be used if unmoderated F-test statistics or plots of the fitted functions are desired.

Author(s)

Jonas Mueller

References

Wang, L., Liu, X., Liang, H., and Carroll, R. J. Generalized Additive Partial Linear Models: Polynomial Spline Smoothing Estimation and Variable Selection Procedures. The Annals of Statistics 39:4, 1827-1851 (2011).

See Also

limmaPLM for analysis of microarray data. fitBspline for default spline fitting heuristic.

Examples

library(plmDE)

## create an object of type plmDE containing samples from the groups
## "Control" and "Diseased", with measures of weight and severity:
ExpressionData <- as.data.frame(matrix(abs(rnorm(10000, 1, 1.5)), ncol = 100))
names(ExpressionData) <- paste("Sample", 1:100)
Genes <- paste("Gene", 1:100)
DataInfo <- data.frame(sample = names(ExpressionData),
  group = c(rep("Control", 50), rep("Diseased", 50)),
  weight = abs(rnorm(100, 50, 20)),
  severity = c(rep(0, 50), abs(rnorm(50, 100, 20))))
plmDEobject <- plmDEmodel(Genes, ExpressionData, DataInfo)

## test whether severity and the indicator variable
## for disease are simultaneously significant:
test <- fitGAPLM(plmDEobject,
  continuousCovariates.fullModel = c("weight", "severity"),
  compareToReducedModel = TRUE, indicators.reducedModel = NULL,
  continuousCovariates.reducedModel = "weight")

plmDE documentation built on May 29, 2017, 6:37 p.m.