regionGoodnessOfFit-methods: Calculate goodness-of-fit statistics
In Genominator: Analyze, manage and store genomic data


A generic method for calculating chi-squared goodness-of-fit statistics (see Details). Dispatches on either a data.frame or an ExpData object.

## S4 method for signature 'data.frame'
regionGoodnessOfFit(obj,
  denominator = colSums(obj),
  groups = rep("A", ncol(obj)))

## S4 method for signature 'ExpData'
regionGoodnessOfFit(obj, annoData,
  groups = rep("A", length(what)),
  what = getColnames(obj, all = FALSE),
  denominator = c("regions", "lanes"), 
  verbose = getOption("verbose"))

`obj`	`data.frame` or `ExpData`
`annoData`	A data.frame of annotation.
`groups`	A factor or character vector indicating which columns are replicates of one another.
`denominator`	How to scale the columns to take into account sequencing depth.
`what`	Which columns to choose from the database. Default is all data columns.
`verbose`	Whether or not debugging / timing info should be printed.

This function implements the homogeneous Poisson model across lanes as described in the article cited below. This model corresponds to a common expression parameter across lanes, scaled by a lane-specific offset. A good fit to this model across replicates indicates that the variation across lanes is consistent with Poisson variation. Deviation from the model indicates overdispersion between replicate lanes.
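The model described above can be written out as a short worked formula. This is a sketch under assumed notation, not taken from the package source: X_{gi} is the count for region g in lane i, d_i is the lane-specific offset (the `denominator`), \lambda_g is the common expression parameter, and I is the number of replicate lanes in a group. The statistic is assumed to be the usual Pearson form; the cited paper is the authority on the exact definition.

```latex
% Homogeneous Poisson model: one rate per region, scaled by a lane offset
X_{gi} \sim \mathrm{Poisson}(d_i \lambda_g), \qquad i = 1, \dots, I

% Maximum-likelihood estimate of the common rate
\hat{\lambda}_g = \frac{\sum_{i=1}^{I} X_{gi}}{\sum_{i=1}^{I} d_i}

% Pearson goodness-of-fit statistic for region g
\chi^2_g = \sum_{i=1}^{I} \frac{\left(X_{gi} - d_i \hat{\lambda}_g\right)^2}{d_i \hat{\lambda}_g}
```

If the model holds, \chi^2_g is approximately chi-squared distributed with I - 1 degrees of freedom, so statistics that are systematically larger than that reference distribution suggest overdispersion.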

A list containing the statistics and degrees of freedom (see Details). Technically, an S3 object of class genominator.goodness.of.fit.

signature(obj = "data.frame"): Here obj represents the results of a call to summarizeByAnnotation, or any data.frame with columns representing samples and rows representing regions, e.g. genes. denominator is how each column is scaled, so it must hold that length(denominator) == ncol(obj). Finally, groups determines how columns are aggregated, i.e. which columns are replicates of one another.
signature(obj = "ExpData"): Here annoData is an annotation data frame. groups is as above. what represents the columns to select. denominator is either the total lane counts, or the lane counts restricted to annoData, or a vector of length length(groups).
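As an illustration of how obj, denominator and groups fit together in the data.frame case, here is a minimal base-R sketch that computes a per-region Pearson statistic separately within each replicate group. It does not call the package and is not its implementation: the toy counts, the helper `gof_one_group`, and the choice of the Pearson statistic are all assumptions made for illustration.

```r
# Toy counts (made up): rows are regions, columns are lanes.
counts <- data.frame(a1 = c(10, 50, 200), a2 = c(12, 45, 210),
                     b1 = c(30, 20, 100), b2 = c(28, 25, 95))

# Which columns are replicates of one another.
groups <- c("A", "A", "B", "B")

# One scaling factor per column; here the column totals.
denominator <- colSums(counts)
stopifnot(length(denominator) == ncol(counts))

# Hypothetical helper: Pearson statistic per region for one group of lanes,
# assuming a common rate per region scaled by the lane offsets d.
gof_one_group <- function(x, d) {
  x <- as.matrix(x)
  lambda <- rowSums(x) / sum(d)      # estimated common rate per region
  expected <- outer(lambda, d)       # expected count per region and lane
  list(stat = rowSums((x - expected)^2 / expected),
       df = ncol(x) - 1)
}

# Apply the helper to the columns of each group in turn.
fits <- lapply(split(seq_along(groups), groups), function(idx) {
  gof_one_group(counts[, idx, drop = FALSE], denominator[idx])
})

# Upper-tail chi-squared p-values for group "A".
pvals_A <- pchisq(fits$A$stat, df = fits$A$df, lower.tail = FALSE)
```

Each element of `fits` holds one statistic per region and the degrees of freedom for that group, which loosely mirrors the "statistics and degrees of freedom" described for the return value.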

James H. Bullard, Elizabeth A. Purdom, Kasper D. Hansen, Steffen Durinck, and Sandrine Dudoit, "Statistical Inference in mRNA-Seq: Exploratory Data Analysis and Differential Expression" (April 2009). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 247. http://www.bepress.com/ucbbiostat/paper247

ed <- ExpData(system.file(package = "Genominator", "sample.db"),
              tablename = "raw")
data("yeastAnno")
names(regionGoodnessOfFit(ed, yeastAnno))