DataBinning: Data binning

dataBinning    R Documentation

Data binning

Description

This function bins a mapped P-site data matrix for a given gene into a binned matrix, for statistical testing downstream. Data can be adaptively binned, where each gene has a different number of bins and bin widths, but the bin positions for a given gene are the same across different conditions and replicates. Alternatively, data can also be binned into bins of fixed width, down to the single-codon level.

Usage

dataBinning(data, bin.width = 0, zero.omit = FALSE, 
    bin.from.5UTR = TRUE, cores = NULL)

Arguments

data

A list of mapped P-site position matrices from the coverage object of the psiteMapping function. In each element of the list, rows correspond to replicates, while columns correspond to nucleotides across the total transcript.

bin.width

Width of each bin. If set to a positive value, it is the number of codons merged per bin; if left at the default of 0, an adaptive bin width method is used.

zero.omit

If the zero.omit argument is set to TRUE, bins with zero mapped P-site counts across all replicates are removed from the differential pattern analysis.

bin.from.5UTR

When the coding region length is not an integer multiple of the bin width and bin.from.5UTR is set to TRUE, binning proceeds from the 5' end, so the uneven-width bins are arranged at the 3' end of the total transcript. If set to FALSE, binning proceeds from the 3' end.

cores

The number of cores to use for parallel execution. If not specified, the number of cores is set to the value of detectCores(logical = FALSE).
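The interaction between bin.width and bin.from.5UTR can be sketched with a small base R helper. This is only an illustration: bin_widths is a hypothetical function, not part of RiboDiPA, and it assumes that the leftover codons form a single narrower bin at one end of the coding region.

```r
# Hypothetical helper (not part of RiboDiPA): returns the width, in codons,
# of each bin when n.codons codons are split into bins of bin.width codons.
bin_widths <- function(n.codons, bin.width, from.5p = TRUE) {
  n.full <- n.codons %/% bin.width   # number of full-width bins
  rest   <- n.codons %% bin.width    # leftover codons, if any
  widths <- rep(bin.width, n.full)
  if (rest > 0) {
    # Binning from the 5' end leaves the uneven bin at the 3' end,
    # and vice versa (assumed behaviour, for illustration only).
    widths <- if (from.5p) c(widths, rest) else c(rest, widths)
  }
  widths
}

bin_widths(10, 3, from.5p = TRUE)
bin_widths(10, 3, from.5p = FALSE)
```

With 10 codons and a width of 3, the first call places the single-codon bin last (3' end) and the second places it first (5' end).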

Details

We recommend using an adaptive bin width h following the Freedman-Diaconis rule,

h = 2 * IQR / m^(1/3).

To see certain regions of transcripts in greater detail (e.g. near the start and stop codons), a specified bin.width per bin can be used to check the local differential pattern, though it may lead to low power at small fold change positions and potentially high computational time.
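As a rough illustration of the Freedman-Diaconis rule, the sketch below computes h in base R. It assumes that m is the number of observations and that IQR is the interquartile range of those observations, as in the standard form of the rule; how dataBinning derives these quantities internally is not stated here. The function fd_width and the simulated positions are hypothetical.

```r
# Hypothetical helper (not part of RiboDiPA): Freedman-Diaconis bin width,
# taking m as the number of observations in x (an assumption).
fd_width <- function(x) {
  m <- length(x)
  2 * IQR(x) / m^(1/3)
}

set.seed(1)
# Simulated P-site positions (codon index) along one gene, for illustration.
psite.pos <- sample(1:300, size = 200, replace = TRUE)

h <- fd_width(psite.pos)                        # adaptive bin width
n.bins <- ceiling(diff(range(psite.pos)) / h)   # implied number of bins
```

Because h shrinks only with the cube root of m, genes with more observations get narrower bins, but the number of bins grows slowly.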

Value

A list of binned P-site footprint matrices: in each matrix, rows correspond to replicates, columns correspond to bins. Bin names are set to "start-end" genomic coordinates.
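The shape of one returned matrix, and the effect of zero.omit = TRUE, can be sketched on toy data in base R. Everything here is made up for illustration: the counts, the fixed bin of three positions, and the "start-end" names built from toy position indices rather than real genomic coordinates. It is not the package's implementation.

```r
# Toy count matrix: 2 replicates (rows) x 9 positions (columns).
counts <- matrix(c(1, 0, 2, 0, 0, 0, 3, 1, 0,
                   0, 1, 1, 0, 0, 0, 2, 0, 4),
                 nrow = 2, byrow = TRUE)

# Assign each position to a bin of 3 consecutive positions.
pos <- seq_len(ncol(counts))
bin.id <- (pos - 1) %/% 3 + 1

# Sum counts within each bin, separately for each replicate.
binned <- t(rowsum(t(counts), group = bin.id))

# Name each bin "start-end" using the toy position indices.
starts <- tapply(pos, bin.id, min)
ends   <- tapply(pos, bin.id, max)
colnames(binned) <- paste(starts, ends, sep = "-")

# Mimic zero.omit = TRUE: drop bins with zero counts in all replicates.
kept <- binned[, colSums(binned) > 0, drop = FALSE]
```

Here the middle bin has no counts in either replicate, so it is the only one dropped from kept.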

See Also

psiteMapping

Examples

data(data.psite)
# Adaptive binning (bin.width = 0)
data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, 
    zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
# Fixed-width binning at the single-codon level (bin.width = 1)
data.codon <- dataBinning(data = data.psite$coverage, bin.width = 1, 
    zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)

jipingw/RiboDiPA documentation built on Dec. 10, 2024, 8:46 p.m.