stattest: statistical tests for differential expression in ballgown
In alyssafrazee/ballgown: Flexible, isoform-level differential expression analysis

Description Usage Arguments Details Value Author(s) References Examples

Test each transcript, gene, exon, or intron in a ballgown object for differential expression, using comparisons of linear models.

stattest(
  gown = NULL,
  gowntable = NULL,
  pData = NULL,
  mod = NULL,
  mod0 = NULL,
  feature = c("gene", "exon", "intron", "transcript"),
  meas = c("cov", "FPKM", "rcount", "ucount", "mrcount", "mcov"),
  timecourse = FALSE,
  covariate = NULL,
  adjustvars = NULL,
  gexpr = NULL,
  df = 4,
  getFC = FALSE,
  libadjust = NULL,
  log = TRUE
)

`gown`	name of an object of class `ballgown`
`gowntable`	matrix or matrix-like object with `rownames` representing feature IDs and columns representing samples, with expression estimates in the cells. Provide the feature name with `feature`. You must provide exactly one of `gown` or `gowntable`. NB: gowntable is log-transformed within `stattest` if `log` is `TRUE`, so provide un-logged expression values in `gowntable`.
`pData`	Required if `gowntable` is provided: data frame giving phenotype data for the samples in the columns of `gowntable`. (Rows of `pData` correspond to columns of `gowntable`). If `gown` is used instead, it must have a non-null, valid `pData` slot (and the `pData` argument to `stattest` should be left `NULL`).
`mod`	object of class `model.matrix` representing the design matrix for the linear regression model including covariates of interest
`mod0`	object of class `model.matrix` representing the design matrix for the linear regression model without the covariates of interest.
`feature`	the type of genomic feature to be tested for differential expression. If `gown` is used, must be one of `"gene"`, `"transcript"`, `"exon"`, or `"intron"`. If `gowntable` is used, this is just used for labeling and can be whatever the rows of `gowntable` represent.
`meas`	the expression measurement to use for statistical tests. Must be one of `"cov"`, `"FPKM"`, `"rcount"`, `"ucount"`, `"mrcount"`, or `"mcov"`. Not all expression measurements are available for all features. Leave as default if `gowntable` is provided.
`timecourse`	if `TRUE`, tests whether or not the expression profiles of genomic features vary over time (or another continuous covariate) in the study. Default `FALSE`. Natural splines are used to fit time profiles, so you must have more timepoints than degrees of freedom used to fit the splines. The default df is 4.
`covariate`	string representing the name of the covariate of interest for the differential expression tests. Must correspond to the name of a column of `pData(gown)`. If `timecourse=TRUE`, this should be the study's time variable.
`adjustvars`	optional vector of strings representing the names of potential confounders. Must correspond to names of columns of `pData(gown)`.
`gexpr`	optional data frame that is the result of calling `gexpr(gown))`. (You can speed this function up by pre-creating `gexpr(gown)`.)
`df`	degrees of freedom used for modeling expression over time with natural cubic splines. Default 4. Only used if `timecourse=TRUE`.
`getFC`	if `TRUE`, also return estimated fold changes (adjusted for library size and confounders) between populations. Only available for 2-group comparisons at the moment. Default `FALSE`.
`libadjust`	library-size adjustment to use in linear models. By default, the adjustment is defined as the sum of the sample's log expression measurements below the 75th percentile of those measurements. To use a different library-size adjustment, provide a numeric vector of each sample's adjustment value. Entries of this vector correspond to samples in in rows of `pData`. If no library size adjustment is desired, set to FALSE.
`log`	if `TRUE`, outcome variable in linear models is log(expression+1), otherwise it's expression. Default TRUE.

At minimum, you need to provide a ballgown object or count table, the type of feature you want to test (gene, transcript, exon, or intron), the expression measurement you want to use (FPKM, cov, rcount, etc.), and the covariate of interest, which must be the name of one of the columns of the 'pData' component of your ballgown object (or provided pData). This covariate is automatically converted to a factor during model fitting in non-timecourse experiments.

By default, models are fit using log2(meas + 1) as the outcome for each feature. To disable the log transformation, provide 'log = FALSE' as an argument to 'stattest'. You can use the gowntable option if you'd like to to use a different transformation.

Library size adjustment is performed by default by using the sum of the log nonzero expression measurements for each sample, up to the 75th percentile of those measurements. This adjustment can be disabled by setting libadjust=FALSE. You can use mod and mod0 to specify alternative library size adjustments.

mod and mod0 are optional arguments. If mod is specified, you must also specify mod0. If neither is specified, mod0 defaults to the design matrix for a model including only a library-size adjustment, and mod defaults to the design matrix for a model including a library-size adjustment and covariate. Note that if you supply mod and mod0, covariate, timecourse, adjustvars, and df are ignored, so make sure your covariate of interest and all appropriate confounder adjustments, including library size, are specified in mod and mod0. By default, the library-size adjustment is the sum of all counts below the 75th percentile of nonzero counts, on the log scale (log2 + 1).

Full model details are described in the supplement of http://biorxiv.org/content/early/2014/03/30/003665.

data frame containing the columns feature, id representing feature id, pval representing the p-value for testing whether this feature was differentially expressed according to covariate, and qval, the estimated false discovery rate using this feature's signal strength as a significance cutoff. An additional column, fc, is included if getFC is TRUE.

Jeff Leek, Alyssa Frazee

http://biorxiv.org/content/early/2014/03/30/003665

data(bg)

# two-group comparison:
stat_results = stattest(bg, feature='transcript', meas='FPKM', 
  covariate='group')

# timecourse test:
pData(bg) = data.frame(pData(bg), time=rep(1:10, 2)) #dummy time covariate
timecourse_results = stattest(bg, feature='transcript', meas='FPKM', 
  covariate='time', timecourse=TRUE)

# timecourse test, adjusting for group:
group_adj_timecourse_results = stattest(bg, feature='transcript', 
  meas='FPKM', covariate='time', timecourse=TRUE, adjustvars='group')

# custom model matrices:
### create example data:
set.seed(43)
sex = sample(c('M','F'), size=nrow(pData(bg)), replace=TRUE)
age = sample(21:52, size=nrow(pData(bg)), replace=TRUE)

### create design matrices:
mod = model.matrix(~ sex + age + pData(bg)$group + pData(bg)$time)
mod0 = model.matrix(~ pData(bg)$group + pData(bg)$time)

### build model: 
adjusted_results = stattest(bg, feature='transcript', meas='FPKM', 
  mod0=mod0, mod=mod)