blocksStats: Calculate statistics for regions in the genome
In markrobinsonuzh/Repitools: Epigenomic tools

Description

For each region of interest or TSS, this routine interrogates probes or sequence data for either a high level of absolute signal or a change in signal for some specified contrast of interest. Regions can be the surroundings of TSSs or user-specified regions. If up and down are NULL, the start and end coordinates of anno are used as the region boundaries; if they are numbers, the coordinates in anno are treated as TSSs and the regions are the windows around them.
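The windowing rule can be sketched in base R. This is only an illustration: tss_windows is a hypothetical helper, not a Repitools function, and it assumes that the TSS is the start coordinate on the "+" strand and the end coordinate on the "-" strand, with up and down measured relative to the direction of transcription.

```r
# Hypothetical helper (not part of Repitools): build a window around each TSS.
# Assumption: TSS = start on "+" strand, TSS = end on "-" strand.
tss_windows <- function(anno, up, down) {
  plus <- anno$strand == "+"
  tss <- ifelse(plus, anno$start, anno$end)
  data.frame(
    chr = anno$chr,
    name = anno$name,
    # Upstream is towards lower coordinates on "+", higher coordinates on "-".
    start = ifelse(plus, tss - up, tss - down),
    end = ifelse(plus, tss + down, tss + up),
    stringsAsFactors = FALSE
  )
}

# Made-up annotation with one gene on each strand.
anno <- data.frame(chr = c("chr1", "chr1"),
                   name = c("Gene 1", "Gene 2"),
                   start = c(7500, 20000),
                   end = c(10000, 26000),
                   strand = c("+", "-"),
                   stringsAsFactors = FALSE)

windows <- tss_windows(anno, up = 2500, down = 1000)
print(windows)
```

With up and down left as NULL, no such windows would be needed, because the start and end columns of anno would be used directly.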

Usage

The ANY,data.frame method:
blocksStats(x, anno, ...)
The ANY,GRanges method:
blocksStats(x, anno, up = NULL, down = NULL, ...)

Arguments

x:: A GRangesList, AffymetrixCelSet, or a data.frame of data, or a character vector of paths to BAM files.
anno:: Either a data.frame or a GRanges giving the gene coordinates or regions of interest. If it is a data.frame, then the column names are (at least) chr, name, start, end. Column strand is also mandatory, if up and down are NULL.
seq.len:: If sequencing reads need to be extended, the fragment size to be used.
p.anno:: A data.frame with (at least) columns chr, position, and index. This is an optional parameter of the AffymetrixCelSet method, because it can be automatically retrieved for such array data. The parameter is also optional, if mapping is not NULL.
mapping:: If a mapping with annotationLookup or annotationBlocksLookup has already been done, it can be passed in, which avoids unnecessary recomputing of the mapping list within blocksStats.
chrs:: If p.anno is NULL, and is retrieved from an ACP file, this vector gives the textual names of the chromosomes.
log2.adj:: Whether to take $\log_2$ of array intensities.
design:: A design matrix specifying the contrast to compute (i.e. the samples to use and what differences to take).
up:: The number of bases upstream to consider in calculation of statistics. If not provided, the starts and ends in anno are used as region boundaries.
down:: The number of bases downstream to consider in calculation of statistics. If not provided, the starts and ends in anno are used as region boundaries.
lib.size:: A string that indicates whether to use the total lane count, total count within regions specified by anno, or normalisation to a reference lane by the negative binomial quantile-to-quantile method, as the library size for each lane. For total lane count use "lane", for region sums use "blocks", and for the normalisation use "ref".
robust:: Numeric. If it is 0, then a robust linear model is not fitted. If it is greater than 0, a robust linear model is used, and the number specifies the minimum number of probes a region has to have, for statistics to be reported for that region.
p.adj:: The method used to adjust p-values for multiple testing. Possible values are listed in p.adjust.
Acutoff:: If lib.size is "ref", this argument must be provided. Otherwise, it must not be provided. This parameter is a cutoff on the "A" values to take before calculating the trimmed mean.
verbose:: Logical; whether to output comments on the processing.
...:: Parameters described above that are not used by the called method itself, but are passed on to a private function that uses them in its processing.
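As a rough illustration of the design and p.adj arguments, the base-R sketch below applies a one-column design matrix to a samples-by-column intensity matrix and adjusts a set of p-values. It assumes the contrast is formed as a matrix product of intensities and design, which is not confirmed by this page; the p-values are made up purely for demonstration.

```r
# Intensities: one row per probe, one column per sample.
intensities <- matrix(c(6.8, 6.5, 6.7, 6.7, 6.9,
                        8.8, 9.0, 9.1, 8.0, 8.9), ncol = 2)
colnames(intensities) <- c("Normal", "Cancer")

# Design: one row per sample, one column per contrast.
# -1 for Normal and +1 for Cancer gives Cancer minus Normal.
design <- matrix(c(-1, 1))
colnames(design) <- "Cancer-Normal"

# Assumption: the per-probe contrast is the matrix product.
diffs <- intensities %*% design
print(diffs)

# Made-up raw p-values for four regions, adjusted with one of the
# methods listed in ?p.adjust ("fdr" is an alias for Benjamini-Hochberg).
raw_p <- c(0.01, 0.04, 0.03, 0.20)
adjusted <- p.adjust(raw_p, method = "fdr")
print(adjusted)
```

Any other method name accepted by p.adjust, such as "bonferroni" or "holm", could be substituted in the last step.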

Details

For array data, the statistics are determined by either a t-test or a linear model. For sequencing data, the two groups are assumed to follow a negative binomial distribution, and an exact test is used.
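For the array case, a per-region t-test might look like the base-R sketch below. It is an assumption that the test is a one-sample t-test of the probe-level contrast values against zero; the code only illustrates the idea and is not the package's implementation. The data are those of the Examples section, with the region taken as 2500 bases either side of position 7500.

```r
# Probe intensities and positions, as in the Examples section.
normal <- c(6.8, 6.5, 6.7, 6.7, 6.9)
cancer <- c(8.8, 9.0, 9.1, 8.0, 8.9)
position <- c(4000, 5100, 6000, 7000, 8000)

# Region of interest: 2500 bases either side of a TSS at 7500.
region_start <- 7500 - 2500
region_end <- 7500 + 2500
in_region <- position >= region_start & position <= region_end

# Assumption: test whether the mean Cancer - Normal difference of the
# probes inside the region differs from zero.
diffs <- (cancer - normal)[in_region]
fit <- t.test(diffs)

result <- data.frame(n.probes = sum(in_region),
                     mean.diff = unname(fit$estimate),
                     t = unname(fit$statistic),
                     p.value = fit$p.value)
print(result)
```

The sequencing case is not sketched here, since the negative binomial exact test depends on count data and a dispersion estimate that this page does not describe.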

Value

A data.frame, with the same number of rows as there are features described by anno, but with additional columns for the statistics calculated at each feature.

Author(s)

Mark Robinson

Examples

  library(Repitools)      # provides blocksStats
  library(GenomicRanges)  # provides GRanges; also attaches IRanges
  intensities <- matrix(c(6.8, 6.5, 6.7, 6.7, 6.9,
                          8.8, 9.0, 9.1, 8.0, 8.9), ncol = 2)
  colnames(intensities) <- c("Normal", "Cancer")
  d.matrix <- matrix(c(-1, 1))
  colnames(d.matrix) <- "Cancer-Normal"
  probe.anno <- data.frame(chr = rep("chr1", 5),
                           position = c(4000, 5100, 6000, 7000, 8000), 
                           index = 1:5)
  anno <- GRanges("chr1", IRanges(7500, 10000), '+', name = "Gene 1")
  blocksStats(intensities, anno, up = 2500, down = 2500, p.anno = probe.anno,
              log2.adj = FALSE, design = d.matrix)

markrobinsonuzh/Repitools documentation built on March 20, 2024, 6:04 a.m.