dataBinning: Data binning
In jipingw/RiboDiPA: Differential pattern analysis for Ribo-seq data

dataBinning

R Documentation

Data binning

Description

This function bins a mapped P-site data matrix for a given gene into a binned matrix, for statistical testing downstream. Data can be adaptively binned, where each gene has a different number of bins and bin widths, but the bin positions for a given gene are the same across different conditions and replicates. Alternatively, data can also be binned into bins of fixed width, down to the single-codon level.

Usage

dataBinning(data, bin.width = 0, zero.omit = FALSE, 
    bin.from.5UTR = TRUE, cores = NULL)

Arguments

`data`	A list of mapped P-site position matrices from the `coverage` object of the `psiteMapping` function. In each element of the list, rows correspond to replicates, while columns correspond to nucleotides across the total transcript.
`bin.width`	Bin width. If set to a positive integer, it is the number of codons merged per bin; if left at the default of 0, an adaptive bin width method is used.
`zero.omit`	If `TRUE`, bins with zero mapped P-site counts across all replicates are removed from the differential pattern analysis.
`bin.from.5UTR`	Applies when the coding region length is not an integer multiple of the bin width. If `TRUE`, binning proceeds from the 5' end, so the uneven-width bin is placed at the 3' end of the total transcript. If `FALSE`, binning proceeds from the 3' end.
`cores`	The number of cores to use for parallel execution. If not specified, the number of cores is set to the value of `detectCores(logical = FALSE)`.
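To illustrate how `bin.from.5UTR` affects the placement of the uneven-width bin, here is a minimal sketch in base R. The helper `toy_bin_edges()` is hypothetical and is not part of RiboDiPA; it only assumes that fixed-width bins are laid out in codon units from one end of the coding region, with any leftover codons forming a shorter bin at the opposite end. The package's internal handling may differ.

```r
# Hypothetical helper (not a RiboDiPA function): compute start/end codon
# indices of fixed-width bins for a coding region of n_codons codons.
toy_bin_edges <- function(n_codons, bin_width, from_5utr = TRUE) {
  stopifnot(n_codons >= 1, bin_width >= 1)
  if (from_5utr) {
    # Lay bins out from the 5' end; a shorter bin, if any, ends up at the 3' end.
    starts <- seq(1, n_codons, by = bin_width)
    ends <- pmin(starts + bin_width - 1, n_codons)
  } else {
    # Lay bins out from the 3' end; a shorter bin, if any, ends up at the 5' end.
    ends <- rev(seq(n_codons, 1, by = -bin_width))
    starts <- pmax(ends - bin_width + 1, 1)
  }
  data.frame(start = starts, end = ends)
}

# A 10-codon region with 3 codons per bin leaves one codon over.
edges_5 <- toy_bin_edges(10, 3, from_5utr = TRUE)
edges_3 <- toy_bin_edges(10, 3, from_5utr = FALSE)
print(edges_5)
print(edges_3)
```

In this toy case the single leftover codon forms the last bin when binning from the 5' end and the first bin when binning from the 3' end.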

Details

We recommend using an adaptive bin width h that follows the Freedman-Diaconis rule,

$$h = \frac{2 \cdot \mathrm{IQR}}{m^{1/3}}.$$

To see certain regions of transcripts in greater detail (e.g. near the start and stop codons), a fixed `bin.width` can be specified to examine the local differential pattern, though this may lead to low power at positions with small fold changes and potentially long computation times.
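As a worked illustration of the Freedman-Diaconis rule, the sketch below computes a bin width in base R for a made-up vector of P-site positions. It assumes the standard reading of the rule, in which IQR is the interquartile range of the observations and m is their number; exactly which quantities `dataBinning` feeds into the rule is not stated here, so the data and the helper `fd_width()` are hypothetical.

```r
# Toy P-site positions along a transcript (made-up values for illustration).
positions <- c(12, 15, 15, 18, 40, 41, 44, 90, 95, 120, 121, 125)

# Freedman-Diaconis bin width: h = 2 * IQR / m^(1/3),
# taking m as the number of observations (an assumption of this sketch).
fd_width <- function(x) {
  2 * IQR(x) / length(x)^(1 / 3)
}

h <- fd_width(positions)

# Number of bins of width h needed to cover the observed range.
n_bins <- ceiling(diff(range(positions)) / h)

print(h)
print(n_bins)
```

Base R also provides `grDevices::nclass.FD()`, which applies the same rule to choose a number of histogram classes.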

Value

A list of binned P-site footprint matrices: in each matrix, rows correspond to replicates, columns correspond to bins. Bin names are set to "start-end" genomic coordinates.
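The shape of one returned matrix can be sketched with a toy example in base R. The code below collapses a made-up nucleotide-level count matrix (rows are replicates) into bins, names the columns "start-end", and then drops all-zero bins in the spirit of `zero.omit = TRUE`. The counts, bin boundaries, and variable names are hypothetical, and plain positions stand in for the genomic coordinates used by the package.

```r
# Toy count matrix: 2 replicates (rows) x 9 positions (columns).
counts <- matrix(
  c(0, 2, 1, 0, 0, 0, 3, 0, 1,
    1, 0, 0, 0, 0, 0, 2, 2, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("rep1", "rep2"), NULL)
)

# Hypothetical bin boundaries shared by all replicates.
starts <- c(1, 4, 7)
ends <- c(3, 6, 9)

# Sum counts within each bin, separately for each replicate.
binned <- sapply(seq_along(starts), function(i) {
  rowSums(counts[, starts[i]:ends[i], drop = FALSE])
})
colnames(binned) <- paste(starts, ends, sep = "-")

# Drop bins whose counts are zero in every replicate.
keep <- colSums(binned) > 0
binned_nonzero <- binned[, keep, drop = FALSE]

print(binned)
print(binned_nonzero)
```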

Examples

library(RiboDiPA)
data(data.psite)
# Adaptive binning (bin.width = 0)
data.binned <- dataBinning(data = data.psite$coverage, bin.width = 0, 
    zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)
# Fixed-width binning at single-codon resolution (bin.width = 1)
data.codon <- dataBinning(data = data.psite$coverage, bin.width = 1, 
    zero.omit = FALSE, bin.from.5UTR = TRUE, cores = 2)

jipingw/RiboDiPA documentation built on Dec. 10, 2024, 8:46 p.m.