aggregateMethyl: Aggregate methylation data to get a summary methylation...

View source: R/MIRA.R

Description

The main function for aggregating methylation data in MIRA analysis. Aggregates methylation across all regions in a given region set to give a summary methylation profile for each region set.

Usage

aggregateMethyl(BSDT, GRList, binNum = 11, minBaseCovPerBin = 500)

Arguments

BSDT

A single data.table containing DNA methylation data for individual sites. Alternatively, a BSseq object may be supplied; it is converted internally to data.tables. The data.table input should have the columns "chr" for chromosome, "start" for cytosine coordinate, and "methylProp" for proportion of methylation (0 to 1), and optionally "methylCount" for number of methylated reads and "coverage" for total number of reads.
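As a minimal sketch of the expected input layout, the following builds a small data.table with the columns listed above. The coordinates and read counts are made-up illustration values, and deriving "methylProp" as methylCount / coverage is an assumption about how the proportion would typically be obtained, not something this page specifies.

```r
library(data.table)

# Hypothetical per-cytosine methylation calls (values invented for illustration)
BSDT <- data.table(
  chr         = c("chr1", "chr1", "chr2"),
  start       = c(10468L, 10470L, 20531L),  # cytosine coordinate
  methylCount = c(8L, 3L, 0L),              # methylated reads (optional column)
  coverage    = c(10L, 12L, 5L)             # total reads (optional column)
)

# Assumed derivation: proportion of methylated reads at each site, in [0, 1]
BSDT[, methylProp := methylCount / coverage]
```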

GRList

A GRangesList object containing region sets, each set corresponding to a type of regulatory element. Each region set in the list should be named. A named list of data.tables also works.
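A named GRangesList could be assembled roughly as follows. The region set names ("enhancers", "promoters") and the coordinates are hypothetical; only the GenomicRanges constructors are real.

```r
library(GenomicRanges)

# Two hypothetical region sets; names and coordinates are illustrative only
enhancers <- GRanges(seqnames = c("chr1", "chr1"),
                     ranges = IRanges(start = c(1000, 5000), end = c(1499, 5499)))
promoters <- GRanges(seqnames = "chr2",
                     ranges = IRanges(start = 200, end = 699))

# Naming the arguments names the region sets in the resulting list
GRList <- GRangesList(enhancers = enhancers, promoters = promoters)
names(GRList)
```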

binNum

How many bins each region should be split into for aggregation of the DNA methylation data.

minBaseCovPerBin

Screen out region sets that have any bins in the final methylation profile with 'sumCoverage' below the 'minBaseCovPerBin' threshold. 'sumCoverage' is an output column: during aggregation, the 'coverage' values for each base in a bin are added, then these sums are added for corresponding bins from all regions, producing a 'sumCoverage' value for each bin. 'minBaseCovPerBin' is only used if there is a "coverage" column in the input methylation data.table. 'sumCoverage' is greater than or equal to the number of separate reads that contributed to a given bin.
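The 'sumCoverage' calculation and the screening step can be illustrated with a base R sketch on toy data. This is not the package's internal code; the table layout, set names, and coverage values are assumptions chosen to show the arithmetic.

```r
# Toy per-base coverage, already assigned to bins (all values hypothetical)
cov <- data.frame(
  featureID = c(rep("setA", 6), rep("setB", 6)),
  region    = rep(c(1, 1, 1, 2, 2, 2), 2),
  bin       = rep(c(1, 1, 2, 1, 2, 2), 2),
  coverage  = c(300, 250, 400, 200, 150, 100,
                 50,  40, 300,  60, 200, 100)
)

# Summing per base within a bin and then across regions gives the same total
# as summing everything for a (region set, bin) pair in one step.
sumCov <- aggregate(coverage ~ featureID + bin, data = cov, FUN = sum)
names(sumCov)[names(sumCov) == "coverage"] <- "sumCoverage"

# Keep a region set only if every one of its bins meets the threshold
minBaseCovPerBin <- 500
minPerSet <- tapply(sumCov$sumCoverage, sumCov$featureID, min)
keptSets  <- names(minPerSet)[minPerSet >= minBaseCovPerBin]
screened  <- sumCov[sumCov$featureID %in% keptSets, ]
```

Here "setB" is dropped because its first bin sums to 150, below the threshold of 500, while both bins of "setA" (750 and 650) pass.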

Details

Each region is split into bins. For a given set of regions, methylation is first aggregated (averaged) within each bin in each region. Then methylation from corresponding bins is aggregated (averaged) across all regions (all first bins together, all second bins together, etc.), giving an aggregate methylation profile. This process is repeated for each region set.
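The two averaging steps can be sketched in base R on toy data. This is an illustration of the idea only, not the package's implementation: the equal-width binning rule, the fixed region width, and all values are assumptions, and strand handling is ignored.

```r
# Toy data: cytosines in two regions of one region set (values hypothetical)
sites <- data.frame(
  region     = c(1, 1, 1, 1, 2, 2, 2, 2),
  pos        = c(0, 10, 50, 90, 0, 40, 60, 99),  # offset from region start
  methylProp = c(0.8, 0.6, 0.2, 0.9, 1.0, 0.4, 0.0, 0.7)
)
regionWidth <- 100
binNum <- 4  # smaller than the default of 11 to keep the example short

# Assumed binning rule: equal-width bins numbered 1..binNum
sites$bin <- pmin(floor(sites$pos / (regionWidth / binNum)) + 1, binNum)

# Step 1: average methylation within each bin of each region
perRegion <- aggregate(methylProp ~ region + bin, data = sites, FUN = mean)

# Step 2: average corresponding bins across regions
# (e.g. bin 1: mean of 0.7 from region 1 and 1.0 from region 2)
profile <- aggregate(methylProp ~ bin, data = perRegion, FUN = mean)
profile
```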

Value

A data.table with binNum rows for each region set, containing aggregated methylation data. If the input was a BSseq object with multiple samples, a list of data.tables is returned, with one data.table for each sample.

Each region is split into bins and methylation is assigned to these bins. The output contains the sum of all corresponding bins across the regions of each region set, i.e., for all regions in each region set: first bins summed, second bins summed, etc.

Columns of the output should be "bin", "methylProp", "sumCoverage" (only if coverage was an input column), and "featureID" (ID for the region set). For information on symmetry of bins and output when a region set has strand info, see ?BSBinAggregate.

'sumCoverage' is calculated as follows: during aggregation, the 'coverage' values for each base in a bin are added, then these sums are added for corresponding bins from all regions, producing a 'sumCoverage' value for each bin.

Examples

data("exampleBSDT", package = "MIRA")
data("exampleRegionSet", package = "MIRA")
exBinDT <- aggregateMethyl(exampleBSDT, exampleRegionSet)

databio/MIRA documentation built on April 16, 2020, 9:53 p.m.