summarizeByAnnotation: Summarize data based on genome annotation.
In Genominator: Analyze, manage and store genomic data


This function summarizes columns of the data using specified SQLite functions, applying these summarization functions to regions defined in an annotation data frame.

summarizeByAnnotation(expData, annoData,
  what = getColnames(expData, all = FALSE), fxs = c("TOTAL"),
  groupBy = NULL, splitBy = NULL, ignoreStrand = FALSE, bindAnno = FALSE,
  preserveColnames = TRUE, verbose = getOption("verbose"))

`expData`	An object of class `ExpData`.
`annoData`	A data frame which must contain the columns `chr`, `start`, `end` and `strand`, which specify the annotation regions of interest.
`what`	Vector of names of data columns to be summarized.
`fxs`	Vector of strings giving the names of SQLite functions to call on the data column(s).
`groupBy`	Character vector referring to a column in `annoData`. Regions will be aggregated over distinct values of this column. Setting this argument will set `bindAnno` to `TRUE`. If `splitBy` is set, `meta.id` will override.
`splitBy`	String indicating column of `annoData` object on which to split results.
`ignoreStrand`	Logical indicating whether strand should be ignored in aggregation. If `TRUE`, strand will be ignored; if `FALSE` (the default), strand is taken into account.
`bindAnno`	Logical indicating whether annotation information should be included in the output.
`preserveColnames`	Logical indicating whether column names should be preserved. Only possible when a single function is being applied.
`verbose`	Logical indicating whether details should be printed.

Most of the computation is done using SQLite. Depending on the use case, this approach may be significantly faster and use much less memory than the alternative: use splitByAnnotation to retrieve a list with all the data and then use R to summarize over each element of the list. It is (naturally) constrained to the use of operations expressible in (SQLite) SQL.
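To make the idea of per-region summarization concrete, here is a minimal base R sketch of what a TOTAL over each annotated region amounts to. It uses neither Genominator nor SQLite; the toy `counts` and `regions` data frames and the helper `sumRegion` are hypothetical names introduced only for illustration, not part of the package.

```r
# Toy data (hypothetical, not shipped with Genominator)
counts <- data.frame(
  chr      = c(1, 1, 1, 1, 2, 2),
  location = c(5, 12, 18, 40, 7, 9),
  strand   = c(1, 1, -1, 1, 1, 1),
  counts   = c(3, 4, 10, 1, 2, 5)
)
regions <- data.frame(
  chr    = c(1, 1, 2),
  start  = c(1, 15, 1),
  end    = c(14, 50, 10),
  strand = c(1, 1, 1),
  row.names = c("r1", "r2", "r3")
)

# Sum the counts that fall inside region i, optionally ignoring strand
sumRegion <- function(i, ignoreStrand = FALSE) {
  r <- regions[i, ]
  keep <- counts$chr == r$chr &
    counts$location >= r$start & counts$location <= r$end
  if (!ignoreStrand) keep <- keep & counts$strand == r$strand
  sum(counts$counts[keep])
}

totals <- sapply(seq_len(nrow(regions)), sumRegion)
names(totals) <- rownames(regions)
totals
```

Looping over regions in R like this is the kind of work that the SQLite-based approach is meant to avoid for large data sets.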

If meta.id is set to a column in annoData, all regions with the same value of the meta.id will be joined together; a standard use case is labelling exons of a gene.
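The effect of joining regions that share a label can be sketched in base R as follows. This is an illustration of the concept only, assuming hypothetical per-exon totals and a hypothetical `gene` column; it does not call Genominator.

```r
# Hypothetical per-exon totals, each exon labelled with its gene
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneB"),
  total = c(7, 1, 7)
)

# Aggregate over distinct gene labels: one value per gene
byGene <- tapply(exons$total, exons$gene, sum)
byGene
```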

If splitBy is not specified, returns a data frame containing results of aggregation functions performed on each region defined in annoData. If splitBy is specified, returns a list of data frames with one entry for each unique value of the column which was split on.
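As a rough picture of the two return shapes, the base R sketch below splits a hypothetical result data frame on one of its columns. The data frame `res` and its columns are assumptions made for illustration; the exact columns returned by the function are not shown here.

```r
# Hypothetical summarization result: one row per region
res <- data.frame(
  chr    = c(1, 1, 2),
  counts = c(7, 1, 7),
  row.names = c("r1", "r2", "r3")
)

# Splitting on a column gives a list with one data frame per unique value
resByChr <- split(res, res$chr)
names(resByChr)
```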

James Bullard bullard@berkeley.edu, Kasper Daniel Hansen khansen@jhsph.edu

The SQLite website http://www.sqlite.org/lang_aggfunc.html has details on which aggregate functions are implemented.

See the Genominator vignette for more information, as well as ExpData-class.

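library(Genominator)
# Open the sample database shipped with the package
ed <- ExpData(system.file(package = "Genominator", "sample.db"),
              tablename = "raw")
data("yeastAnno")
# Summarize with the default function (TOTAL) over the first 50 annotated regions
summarizeByAnnotation(ed, yeastAnno[1:50,])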