BSBinAggregate: Bins regions and aggregates methylation data across the...


View source: R/binning_aggregation.R

Description

First bins each region and averages the proportion of methylation over all methylation sites within each bin (i.e. all sites within region 1, bin 1 are averaged, then all sites within region 1, bin 2 are averaged, etc.). Then aggregates methylation across all regions, bin by bin, by averaging the proportion of methylation in each set of corresponding bins (i.e. all bin 1's together, all bin 2's together, etc.).
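The two averaging steps described above can be sketched in base R. This is only an illustration on made-up data in which every site has already been assigned to a region and a bin; the `region` and `bin` columns and all values are assumptions for the sketch, and it is not the package's implementation (it ignores coverage, strand, and sample splitting).

```r
# Toy data: methylation sites already assigned to a region and a bin.
# All values are made up for illustration.
sites <- data.frame(
  region     = c(1, 1, 1, 1, 2, 2, 2),
  bin        = c(1, 1, 2, 2, 1, 2, 2),
  methylProp = c(0.2, 0.4, 0.8, 1.0, 0.6, 0.5, 0.7)
)

# Step 1: average methylProp over the sites within each region/bin pair.
perRegionBin <- aggregate(methylProp ~ region + bin, data = sites, FUN = mean)

# Step 2: average the per-region values across regions, bin by bin.
perBin <- aggregate(methylProp ~ bin, data = perRegionBin, FUN = mean)
print(perBin)
```

Note that step 2 averages the per-region bin means, so each region contributes equally to a bin regardless of how many sites it has there.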

Usage

BSBinAggregate(BSDT, rangeDT, binNum, minBaseCovPerBin = 500,
  byRegionGroup = TRUE, splitFactor = NULL, hasCoverage = TRUE)

Arguments

BSDT

A single data table that has DNA methylation data on individual sites, including a "chr" column with the chromosome, a "start" column with the coordinate of the cytosine, a "methylProp" column with the proportion of methylation (0 to 1), optionally a "methylCount" column with the number of methylated reads for each site, and optionally a "coverage" column with the total number of reads for each site (see the hasCoverage parameter).

rangeDT

A data table with the sets of regions to be binned, with columns named "start", "end". Strand may also be given and will affect the output. See "Value" section.

binNum

Number of bins across the region.

minBaseCovPerBin

Minimum total coverage per bin. Bins where the sum of coverage values is less than this threshold are filtered out before returning.

byRegionGroup

Default TRUE will aggregate methylation over corresponding bins for each region (all bin1's aggregated, all bin2's, etc). byRegionGroup = FALSE is deprecated.

splitFactor

With default NULL, aggregation will be done separately/individually for each sample.

hasCoverage

Default TRUE. Whether BSDT has a coverage column.
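As a rough illustration of the minBaseCovPerBin filter described above, the sketch below drops bins whose summed coverage falls below the threshold. The `bins` table and its values are made up, and the comparison is written to match the wording "less than", so a bin exactly at the threshold is kept; the package's internal handling may differ.

```r
# Made-up aggregated bins, with coverage already summed per bin.
bins <- data.frame(
  bin        = 1:4,
  methylProp = c(0.8, 0.5, 0.3, 0.6),
  coverage   = c(1200, 450, 500, 90)
)

minBaseCovPerBin <- 500

# Keep only bins whose summed coverage is not less than the threshold.
kept <- bins[bins$coverage >= minBaseCovPerBin, ]
print(kept)
```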

Value

With splitFactor = NULL, it will return a data.table with binNum rows, containing methylation data aggregated over the regions in region set "rangeDT". Each region is split into bins and methylation is put in these bins; the output contains the sum of all corresponding bins for the regions of each region set, i.e. for all regions in each region set: first bins summed, second bins summed, etc. Columns of the output should be "bin", "methylProp", and, if coverage was included as an input column, "coverage".

How the strand of rangeDT affects the output: the MIRA profile will be symmetrical if no strand information is given for the regions (produced by averaging the profile with the reverse of the profile), because the orientation of the regions is arbitrary with respect to biological features (a promoter, for instance) that could be oriented directionally (e.g. 5' to 3'). If strand information is given, regions on the minus strand will be flipped before being aggregated with plus strand regions, so the MIRA profile will be in 5' to 3' orientation.
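Both strand behaviors can be illustrated with a small base R sketch. The profile values, bin indices, and the `binNum + 1 - bin` flipping rule are assumptions made for the example; the function's internal code may handle this differently.

```r
# No strand information: average the profile with its reverse,
# which yields a symmetrical profile. Values are made up.
profile   <- c(0.9, 0.7, 0.3, 0.6, 0.8)   # per-bin values, bins 1 to 5
symmetric <- (profile + rev(profile)) / 2
print(symmetric)

# With strand information: one assumed way to flip minus-strand regions
# is to reverse their bin index before aggregating with plus-strand regions.
binNum      <- 5
bin         <- c(1, 2, 5)
strand      <- c("+", "-", "-")
orientedBin <- ifelse(strand == "-", binNum + 1 - bin, bin)
print(orientedBin)
```

In the second part, bin 2 of a minus-strand region is treated as bin 4 and bin 5 as bin 1, so all regions are read in the same 5' to 3' direction.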

Examples

data("exampleBSDT") # exampleBSDT
data("exampleRegionSet") # exampleRegionSet
exampleBSDT <- addMethPropCol(exampleBSDT)
aggregateBins <- BSBinAggregate(BSDT = exampleBSDT,
                                rangeDT = exampleRegionSet,
                                binNum = 11, splitFactor = NULL)

databio/MIRA documentation built on April 16, 2020, 9:53 p.m.