fitGAPLM: Fit a Generalized Additive Partially Linear Model on Gene...
In plmDE: Additive partially linear models for differential gene expression analysis

Given a plmDE object containing preprocessed/normalized measures of the expression of a set of genes under different conditions, together with values of quantitatively measured covariates of interest, fitGAPLM tests each gene for differential expression under a model specified by the user. The test is based on the significance of a full model fit to the expression data when compared with the fit of a reduced model (F statistic). The variables of interest should be present in the full model and absent from the reduced model. The method is flexible: it can fit count data (e.g. expression measures from high-throughput sequencing) as well as microarray data. Using fitGAPLM, the user can model the gene expression measures by any mixture of additive functions of the numerical variables and linear terms for the available categorical (factor) information. Each of these functions is approximated through a B-spline fit with the intercept of the spline constrained at zero for identifiability. Although fitGAPLM takes a daunting number of inputs, many of them are already set to sensible defaults. Even so, models of this complexity must be well thought out, and each parameter requires careful consideration.
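
The full-versus-reduced comparison described above can be pictured for a single gene with base R alone. The following is a minimal sketch, not the package's internal code: the simulated data, the variable names, and the use of `splines::bs`, `lm`, and `anova` are all assumptions made for illustration. It fits a group-specific cubic B-spline of severity (the basis multiplied by a group indicator) in the full model and tests it, together with the group indicator, against a reduced model containing only weight.

```r
library(splines)  # ships with base R

set.seed(1)
n        <- 100
group    <- factor(rep(c("Control", "Diseased"), each = n / 2))
weight   <- abs(rnorm(n, 50, 20))
severity <- c(rep(0, n / 2), abs(rnorm(n / 2, 100, 20)))
diseased <- as.numeric(group == "Diseased")

# Simulated expression of one gene (illustrative only): linear in weight
# and, for Diseased samples, a smooth nonlinear function of severity.
geneExpr <- 1 + 0.02 * weight + diseased * sin(severity / 30) +
  rnorm(n, sd = 0.3)

# Cubic B-spline basis of severity. bs() omits the intercept column by
# default, so the spline carries no free constant. Multiplying by the
# indicator makes the function zero for Control samples.
sevBasis <- unclass(bs(severity, degree = 3)) * diseased

full    <- lm(geneExpr ~ group + weight + sevBasis)
reduced <- lm(geneExpr ~ weight)

# F test of the reduced model against the full model
comparison <- anova(reduced, full)
pValue     <- comparison[["Pr(>F)"]][2]
print(pValue)
```

In an actual analysis this comparison is repeated for every gene, and the resulting p-values are then corrected for multiple testing.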

fitGAPLM(dataObject, generalizedLM = FALSE, family = poisson(link = log),
  NegativeBinomialUnknownDispersion = FALSE, test = "LRT", weights = NULL,
  offset = NULL, pValueAdjustment = "fdr", significanceLevel = 0.05,
  indicators.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  continuousCovariates.fullModel = NULL,
  groups.fullModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  groupFunction.fullModel = rep("AdditiveSpline", length(groups.fullModel)),
  fitSplineFromData.fullModel = TRUE,
  splineDegrees.fullModel = rep(3, length(groups.fullModel)),
  splineKnots.fullModel = rep(0, length(groups.reducedModel)),
  compareToReducedModel = FALSE,
  indicators.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  continuousCovariates.reducedModel = NULL,
  groups.reducedModel = as.character(unique(dataObject$sampleInfo[,2])[-1]),
  groupFunction.reducedModel = rep("AdditiveSpline", length(groups.reducedModel)),
  fitSplineFromData.reducedModel = TRUE,
  splineDegrees.reducedModel = rep(3, length(groups.reducedModel)),
  splineKnots.reducedModel = rep(0, length(groups.reducedModel)),
  splineKnotSpread = "quantile")

`dataObject`	Object of type `plmDE` containing the gene expression and sample information.
`generalizedLM`	If `TRUE`, a link function is introduced to generalize the linear model. Use for gene-level count data.
`family`	One of the distribution families that may be used in the function `glm`. For gene-level count data, the negative binomial (see `negative.binomial`) is recommended to account for overdispersion.
`NegativeBinomialUnknownDispersion`	In the case of a negative binomial fit, has the dispersion of the data been estimated or does it remain unknown? If `TRUE`, then `glm.nb` from the MASS package is called, which includes routines for fitting the GLM and estimating the dispersion parameter.
`test`	The test used to assess the significance of the model when a GLM is requested. See `stat.anova` for details.
`weights`	An optional vector of prior weights to be used in fitting the (generalized) linear model. Should be `NULL` or a numeric vector.
`offset`	An optional a priori known component to be included in fitting the (generalized) linear model. One or more `offset` terms may be included in the model.
`pValueAdjustment`	Choice of multiple testing correction method to be passed to `p.adjust`.
`significanceLevel`	The significance level at which genes should be identified as differentially expressed.
`indicators.fullModel`	The indicator terms which should go into the full model. These must match the groups in the second column of the sample information in `dataObject`. Under the default setting, the indicators will consist of all groups except for the first one (used as the baseline for comparison).
`continuousCovariates.fullModel`	The quantitative covariates that should go into the full model. These must match the column names of the sample information in `dataObject`.
`groups.fullModel`	The subgroups of our sample for which we wish to estimate a function relating their measurement of `continuousCovariates` to their expression levels in `dataObject`.
`groupFunction.fullModel`	A vector of the same length as `groups.fullModel` consisting of strings matching "AdditiveSpline", "AdditiveLinear", "CommonSpline", or "CommonLinear". If "AdditiveSpline" is chosen, a B-spline basis is fitted to the `continuousCovariates` values of the corresponding group in `groups.fullModel` to estimate a function representing the effect of this group's covariate values on its measured expression levels. This function implicitly includes an indicator term, so it evaluates to 0 for covariate measurements from other groups, and its effect is assumed to be additive with respect to the other parameters being estimated. If "AdditiveLinear" is selected, the function is taken to be the identity function (no spline basis is fitted) times a parameter estimated by the model. To estimate a single function that accounts for the same effect across multiple groups, all of those groups must be listed in `groups.fullModel` and their corresponding entries in `groupFunction.fullModel` must be set to "CommonSpline". Likewise, to assume a common linear effect across multiple groups, they must all be listed in `groups.fullModel` and the corresponding entries of `groupFunction.fullModel` must read "CommonLinear".
`fitSplineFromData.fullModel`	Should the B-spline functions in the full model be automatically fitted based on the heuristic in `fitBspline`?
`splineDegrees.fullModel`	If `fitSplineFromData.fullModel` has not been selected, then the user may specify, in a vector format, the degree of each B-spline basis that is fitted to the groups.
`splineKnots.fullModel`	If `fitSplineFromData.fullModel` has not been selected, then the user may also specify, in a vector, the number of knots to include in each corresponding basis.
`compareToReducedModel`	If `TRUE`, the user must specify a reduced model against which the full model is tested. Otherwise, all terms of the full model (besides the intercept) are simultaneously tested for significance.
`indicators.reducedModel`	See corresponding parameter for full model.
`continuousCovariates.reducedModel`	See corresponding parameter for full model.
`groups.reducedModel`	See corresponding parameter for full model.
`groupFunction.reducedModel`	See corresponding parameter for full model.
`fitSplineFromData.reducedModel`	See corresponding parameter for full model.
`splineDegrees.reducedModel`	See corresponding parameter for full model.
`splineKnots.reducedModel`	See corresponding parameter for full model.
`splineKnotSpread`	Determines whether B-spline knots are spread uniformly over the range of the data or placed at quantiles of the data (takes values "uniform" or "quantile"). This setting does not affect the `fitBspline` method.
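
The difference between the two `splineKnotSpread` settings can be seen on a skewed covariate. This is a rough sketch under assumptions: the simulated covariate, the knot count `nKnots`, and the exact placement rules below are illustrative and may differ from how plmDE computes its knots internally.

```r
library(splines)  # ships with base R

set.seed(2)
x      <- rexp(200, rate = 1 / 20)  # skewed covariate (illustrative)
nKnots <- 3                         # hypothetical number of interior knots

# Interior positions on a 0-1 scale, endpoints excluded
positions <- seq(0, 1, length.out = nKnots + 2)[-c(1, nKnots + 2)]

# "uniform": equally spaced over the range of the data
uniformKnots <- min(x) + positions * (max(x) - min(x))

# "quantile": placed at equally spaced quantiles of the data
quantileKnots <- quantile(x, probs = positions, names = FALSE)

# Either set of knots can be passed to a cubic B-spline basis
basisUniform  <- bs(x, knots = uniformKnots, degree = 3)
basisQuantile <- bs(x, knots = quantileKnots, degree = 3)

print(rbind(uniform = uniformKnots, quantile = quantileKnots))
dim(basisQuantile)  # columns = number of knots + degree
```

With skewed data, quantile placement concentrates knots where observations are dense, whereas uniform placement can leave some knot intervals nearly empty.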

Returns an object of type DEresults with the following components:

`allgenes`	Data frame consisting of information on all the genes, their p-values and adjusted p-values, and whether or not this test identifies them as differentially expressed.
`DEgenes`	Data frame consisting of the genes which were expressed at significantly differing levels according to this model.
`PredictorFormula.fullModel`	Contains the formula followed by the predictors in the B-spline-approximated linear model, but leaves the dependent variable term out.
`PredictorFormula.reducedModel`	Contains the formula followed by the predictors in the reduced model (leaving out the dependent variable term).
`modelForm.fullModel`	Contains the indicators and covariates incorporated into the full model.
`modelForm.reducedModel`	Contains the indicators and covariates incorporated into the reduced model.
`GLMinfo`	Tracks the glm parameters used in the fitting of this model (for plotting purposes).
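
How the per-gene p-values, the `pValueAdjustment` method, and `significanceLevel` combine into the two data frames can be sketched as follows. The p-values are simulated and the column names are hypothetical; the actual layout of `allgenes` and `DEgenes` in a DEresults object may differ.

```r
set.seed(3)
# Hypothetical raw p-values for 100 genes: 10 with signal, 90 null
pValues <- c(rbeta(10, 1, 200), runif(90))
names(pValues) <- paste("Gene", 1:100)

significanceLevel <- 0.05
# "fdr" in p.adjust is the Benjamini-Hochberg correction
adjusted <- p.adjust(pValues, method = "fdr")

# Table of all genes with a flag for differential expression
allgenes <- data.frame(gene     = names(pValues),
                       pValue   = pValues,
                       adjusted = adjusted,
                       DE       = adjusted < significanceLevel,
                       row.names = NULL)

# Subset of genes called differentially expressed
DEgenes <- allgenes[allgenes$DE, ]

head(allgenes[order(allgenes$pValue), ])
nrow(DEgenes)
```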

Because fitGAPLM is implemented in R rather than a compiled language, it tends to run slowly for larger expression assays (an analysis of 65 samples with ~50,000 probes from the HG-U133 array takes about 20 minutes). If the GAPLM is to be fit to microarray data, the limmaPLM function should be used instead. However, fitGAPLM must be used if unmoderated F-test statistics or plots of the fitted functions are desired.

Jonas Mueller

Wang, L., Liu, X., Liang, H., and Carroll, R. J. (2011). Generalized Additive Partial Linear Models: Polynomial Spline Smoothing Estimation and Variable Selection Procedures. The Annals of Statistics 39(4), 1827-1851.

limmaPLM for analysis of microarray data; fitBspline for the default spline fitting heuristic.

library(plmDE)

## create an object of type plmDE for samples from two groups,
## "Control" and "Diseased", with measures of weight and severity:
ExpressionData = as.data.frame(matrix(abs(rnorm(10000, 1, 1.5)), ncol = 100))
names(ExpressionData) = paste("Sample", 1:100)
Genes = paste("Gene", 1:100)
DataInfo = data.frame(sample = names(ExpressionData),
  group = c(rep("Control", 50), rep("Diseased", 50)),
  weight = abs(rnorm(100, 50, 20)),
  severity = c(rep(0, 50), abs(rnorm(50, 100, 20))))
plmDEobject = plmDEmodel(Genes, ExpressionData, DataInfo)

## test whether severity and the indicator variable
## for disease are simultaneously significant:
test = fitGAPLM(plmDEobject,
  continuousCovariates.fullModel = c("weight", "severity"),
  compareToReducedModel = TRUE,
  indicators.reducedModel = NULL,
  continuousCovariates.reducedModel = "weight")
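
For count data, the kind of comparison that `generalizedLM = TRUE` with `test = "LRT"` describes can be illustrated without plmDE. The sketch below is an assumption-laden stand-in, not the package's code: it simulates counts for one gene, uses the Poisson family (the default in the signature) for simplicity, and supplies a hypothetical library size as an `offset`. For overdispersed data, a negative binomial family would be substituted, as the `family` argument recommends.

```r
set.seed(4)
n       <- 60
group   <- factor(rep(c("Control", "Diseased"), each = n / 2))
libSize <- round(runif(n, 5e5, 2e6))  # hypothetical sequencing depth

# Simulated counts for one gene: expression doubles in the Diseased group
mu     <- libSize * 1e-5 * ifelse(group == "Diseased", 2, 1)
counts <- rpois(n, mu)

# Log library size enters as an offset, i.e. a known component of the
# linear predictor whose coefficient is fixed at 1.
full    <- glm(counts ~ group, family = poisson(link = log),
               offset = log(libSize))
reduced <- glm(counts ~ 1, family = poisson(link = log),
               offset = log(libSize))

# Likelihood ratio test of the reduced model against the full model
lrt    <- anova(reduced, full, test = "LRT")
pValue <- lrt[["Pr(>Chi)"]][2]
print(pValue)
```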