avgByBin: Aggregates data by genomic bins


View source: R/avgByBin.R

Description

Aggregates data by genomic bins

Usage

avgByBin(xpr, featureData, target_GR, justReturnBins = FALSE, 
   getBinCountOnly = FALSE, FUN = mean, doSampleCor = FALSE, 
   verbose = FALSE)

Arguments

xpr

(data.frame or matrix) Locus-wise values. Rows correspond to genomic intervals (probes, genes, etc.), while columns correspond to individual samples.

featureData

(data.frame or GRanges) Locus coordinates. Row order must match xpr. Column order should be: 1. chrom, 2. locus start, 3. locus end. All elements are assumed to be of identical width. Coordinates must be zero-based or one-based, but not half-open. Coordinate system must match that of target_GR.

target_GR

(GRanges) Target intervals, with coordinate system matching that of featureData.

justReturnBins

(logical) when TRUE, returns the coordinates of the bin to which each row belongs. Does not aggregate data in any way. This output can be used as input for more complex functions with data from each bin.

getBinCountOnly

(logical) when TRUE, does not aggregate or expect xpr. Only returns number of overlapping subject ranges per bin. Speeds up computation.

FUN

(function) function to aggregate data in bin

doSampleCor

(logical) set to TRUE to compute mean pairwise sample correlation (Pearson correlation) for each bin; when TRUE, this function overrides FUN.

verbose

(logical) print status messages

Details

Computes the mean (or another summary given by FUN) of binned data. This function lets the user bin data that has not already been binned, and is a step in preparing data for plotOnIdeo(). For each sample, it aggregates the values of all probes overlapping the same bin. Where a range overlaps multiple bins, it is counted in all of them. The function assumes that all elements in featureData have identical width; if the widths differ, they are not used to weight the average. This behaviour may change in future versions of IdeoViz.
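The binning rule described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy data frames `features`, `xpr`, and `bins` are hypothetical, a single chromosome is assumed, and intervals are treated as closed. It shows only that each feature contributes, unweighted, to every bin it overlaps.

```r
# Toy features on one chromosome (closed intervals, identical width), two samples.
features <- data.frame(start = c(1, 11, 21, 28), end = c(10, 20, 30, 37))
xpr <- data.frame(s1 = c(1, 3, 5, 7), s2 = c(2, 4, 6, 8))

# Two target bins; the last feature (28-37) straddles both of them.
bins <- data.frame(start = c(1, 31), end = c(30, 60))

# Enumerate all (bin, feature) pairs and keep those whose intervals overlap.
pairs <- expand.grid(bin = seq_len(nrow(bins)),
                     feature = seq_len(nrow(features)))
hit <- features$start[pairs$feature] <= bins$end[pairs$bin] &
  features$end[pairs$feature] >= bins$start[pairs$bin]
pairs <- pairs[hit, ]

# Unweighted per-sample mean of the features assigned to each bin.
binned <- aggregate(xpr[pairs$feature, , drop = FALSE],
                    by = list(bin = pairs$bin), FUN = mean)

# Number of features counted in each bin.
bin_count <- as.vector(table(pairs$bin))
```

Here the straddling feature is counted in both bins, so the first bin averages four features and the second bin holds one.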

Value

(GRanges) Binned data or binning statistics; information returned for non-empty bins only. The default for this function is to return binned data; alternately, if justReturnBins=TRUE or getBinCountOnly=TRUE the function will return statistics on bin counts. The latter may be useful to plot spatial density of the input metric.
The flags and output types are presented in order of evaluation precedence:

  1. If getBinCountOnly=TRUE, returns a list with a single entry: bin_ID: (data.frame) bin information: chrom, start, end, width, strand, index, and count. "index" is the row number of target_GR to which this bin corresponds

  2. If justReturnBins=TRUE and getBinCountOnly=FALSE, returns a list with three entries:

    1. bin_ID: same as bin_ID in output 1 above

    2. xpr: (data.frame) B-by-n table, where B is the total number of [target_GR, featureData] overlaps (see next entry, binmap_idx) and n is the number of columns in xpr; column order matches xpr. Contains sample-wise data "flattened" so that each [target, subject] pair is presented. More formally, entry [i,j] contains the expression for the overlap in row i of binmap_idx for sample j (where 1 <= i <= B, 1 <= j <= n)

    3. binmap_idx: (matrix) two-column matrix: 1) target_GR row, 2) row of featureData which overlaps with the index in column 1. (Matrix output of GenomicRanges::findOverlaps().)

  3. Default: If justReturnBins=FALSE and getBinCountOnly=FALSE, returns a GRanges object. Results are contained in the elementMetadata slot. For a dataset with n samples, the table would have (n+1) columns; the first column is bin_count, and indicates number of units contained in that bin. Columns (2:(n+1)) contain binned values for each sample in column order corresponding to that of xpr.
    For doSampleCor=TRUE, the result is in a metadata column named "mean_pairwise_cor". Bins with a single datapoint per sample get a value of NA.
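As a hedged illustration of how the justReturnBins=TRUE output might feed a custom per-bin statistic, the sketch below builds a hypothetical stand-in list `res` with the `xpr` and `binmap_idx` entries described above (the values are made up, and `bin_ID` is omitted), then computes a mean pairwise Pearson correlation per bin. The helper `mean_pairwise_cor()` is an assumption for illustration; the package's own doSampleCor computation may differ in detail.

```r
# Hypothetical stand-in for the list returned with justReturnBins = TRUE.
res <- list(
  binmap_idx = cbind(target = c(1, 1, 1, 2, 2, 2), feature = 1:6),
  xpr = data.frame(s1 = c(1, 2, 3, 1, 2, 3),
                   s2 = c(2, 4, 6, 3, 2, 1),
                   s3 = c(1, 3, 2, 1, 2, 3))
)

# Mean of the pairwise (upper-triangle) Pearson correlations between samples.
mean_pairwise_cor <- function(m) {
  if (nrow(m) < 2) return(NA_real_)  # a single datapoint per sample: undefined
  cm <- cor(m, method = "pearson")
  mean(cm[upper.tri(cm)])
}

# Group the flattened rows by target bin, then apply the statistic to each bin.
by_bin <- split(res$xpr, res$binmap_idx[, 1])
per_bin <- vapply(by_bin,
                  function(d) mean_pairwise_cor(as.matrix(d)),
                  numeric(1))
```

Because `xpr` has one row per [target, subject] overlap, splitting on the first column of `binmap_idx` is enough to recover the per-bin groups; any function of a samples-by-column matrix can be substituted for the helper.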

See Also

getIdeo(), getBins()

Examples

library(IdeoViz)

ideo_hg19 <- getIdeo("hg19")
data(GSM733664_broadPeaks)
chrom_bins <- getBins(c("chr1", "chr2", "chrX"),
    ideo_hg19, stepSize = 5 * 100 * 1000)

# default binning (FUN = mean)
mean_peak <- avgByBin(data.frame(value = GSM733664_broadPeaks[, 7]),
    GSM733664_broadPeaks[, 1:3], chrom_bins)

# custom aggregation function
median_peak <- avgByBin(data.frame(value = GSM733664_broadPeaks[, 7]),
    GSM733664_broadPeaks[, 1:3], chrom_bins, FUN = median)

# mean pairwise sample correlation
data(binned_multiSeries)
bins2 <- getBins(c("chr1"), ideo_hg19, stepSize = 5e6)
samplecor <- avgByBin(mcols(binned_multiSeries)[, 1:3], binned_multiSeries,
    bins2, doSampleCor = TRUE)

# just get bin count
binstats <- avgByBin(data.frame(value = GSM733664_broadPeaks[, 7]),
    GSM733664_broadPeaks[, 1:3], chrom_bins, getBinCountOnly = TRUE)

IdeoViz documentation built on Nov. 8, 2020, 8:01 p.m.