signal2bins: Genomic Signal to Summarized Bins

signal2bins    R Documentation

Description

This function summarizes a genomic signal (variable) over regions split into bins (intervals). The signal must be provided in the metacolumns of a GRanges-class object.

Usage

signal2bins(
  signal,
  regions,
  stat = "mean",
  nbins = 20L,
  nbinsUP = 20L,
  nbinsDown = 20L,
  streamUp = NULL,
  streamDown = NULL,
  absolute = FALSE,
  na.rm = TRUE,
  missings = 0,
  region.size = 200,
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE,
  ...
)

Arguments

signal

Preferably a single GRanges object with genomic signals in the metacolumns (each column carrying one signal), or a list of GRanges objects, each carrying a signal in its metacolumn. The signal can be any variable measured regularly along the genome, for example, methylation levels. Such a GRanges object can be created with the function uniqueGRanges from the MethylIT R package.

regions

A GRanges object carrying the genomic regions over which the summary statistic is computed, for example, annotated gene coordinates.

stat

Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are: 'mean' (default), geometric mean ('gmean'), 'median', 'density', 'count' and 'sum'. Here, 'density' is defined as the sum of the values of the variable of interest in a given region divided by the length/width of the region. The option 'count' computes the number of positions in the specified regions with values greater than zero in the selected column.
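
The statistics described above can be illustrated on a plain numeric vector. The following is a minimal base R sketch, not the package's internal code: the helper summarize_region and its arguments are hypothetical, and the geometric mean is assumed to be taken over strictly positive values only.

```r
# Hypothetical helper: summarize the signal values observed inside one region.
# `x` holds the signal at the covered positions; `width` is the region width (bp).
summarize_region <- function(x, width,
                             stat = c("mean", "gmean", "median",
                                      "density", "count", "sum"),
                             na.rm = TRUE) {
  stat <- match.arg(stat)
  if (na.rm) x <- x[!is.na(x)]
  switch(stat,
    mean    = mean(x),
    # Assumption: geometric mean computed over strictly positive values only
    gmean   = exp(mean(log(x[x > 0]))),
    median  = median(x),
    # 'density': sum of the values divided by the region width
    density = sum(x) / width,
    # 'count': number of positions with a value greater than zero
    count   = sum(x > 0),
    sum     = sum(x)
  )
}

# Signal at five positions of a 100-bp region (one value missing)
x <- c(0.2, 0, 0.8, NA, 0.5)
region_density <- summarize_region(x, width = 100, stat = "density")
region_count   <- summarize_region(x, width = 100, stat = "count")
```

With these toy values the density is the sum 1.5 divided by 100 bp, and the count is the three positions whose value is greater than zero.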

nbins, nbinsUP, nbinsDown

Integers denoting the number of bins used to split the main regions, the flanks upstream of the main regions, and the flanks downstream of the main regions, respectively.

streamUp, streamDown

An integer denoting how many base pairs upstream and downstream of the provided regions must be included in the computation. Default is NULL.
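
How the bin and flank arguments interact can be sketched with a little arithmetic. The base R code below is only an illustration under stated assumptions: the helper bin_edges is hypothetical, the region is taken to lie on the forward strand, a NULL flank size is assumed to mean that no flank is added, and bins are treated as half-open intervals between consecutive edges. The function's actual handling of strand and rounding may differ.

```r
# Hypothetical helper: bin edges for one region with 1-based inclusive
# coordinates [start, end] on the forward strand.
bin_edges <- function(start, end, nbins = 20L, nbinsUP = 20L, nbinsDown = 20L,
                      streamUp = NULL, streamDown = NULL) {
  # Main region: nbins bins of equal width between start and end + 1
  body <- seq(start, end + 1, length.out = nbins + 1)
  # Upstream flank: nbinsUP bins covering streamUp bp before the region
  up <- NULL
  if (!is.null(streamUp)) {
    up <- seq(start - streamUp, start, length.out = nbinsUP + 1)
  }
  # Downstream flank: nbinsDown bins covering streamDown bp after the region
  down <- NULL
  if (!is.null(streamDown)) {
    down <- seq(end + 1, end + 1 + streamDown, length.out = nbinsDown + 1)
  }
  list(up = up, body = body, down = down)
}

# A 3000-bp region with 2000-bp flanks on both sides
edges <- bin_edges(5001, 8000, streamUp = 2000, streamDown = 2000)
flank_bin_width <- unique(round(diff(edges$up), 6))
body_bin_width  <- unique(round(diff(edges$body), 6))
```

In this toy case each flank bin is 100 bp wide and each bin of the main region is 150 bp wide.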

absolute

Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) can take negative values, and we may be interested in the sum of abs(TV), which is the sum of the total variation distance.
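
The effect of taking absolute values can be seen on a toy vector. This base R snippet uses hypothetical values and only illustrates the arithmetic, not the function's internals.

```r
# Hypothetical methylation-level differences (TV) at four positions
TV <- c(0.4, -0.3, 0.1, -0.2)

signed_sum   <- sum(TV)       # positive and negative differences cancel out
absolute_sum <- sum(abs(TV))  # total magnitude of the differences
```

Here the signed sum is essentially zero even though the positions differ, while the sum of absolute values keeps the total amount of change.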

na.rm

Logical value. If TRUE, the NA values will be removed.

missings

Whether to write '0' or 'NA' for regions where there are no data to compute the statistic. Default is 0.

region.size

An integer. The minimum size of a region to be included in the computation. Default is 200 (bp), as given in the Usage section.

num.cores, tasks

Parameters for parallel computation using the BiocParallel package: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see bplapply), and the number of tasks per job (only for Linux OS).

verbose

Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given.

...

Arguments to pass to the uniqueGRanges function if signal is a list of GRanges objects.

Details

This function is useful, for example, to get the profile of the methylation signal around gene regions: the gene body plus 2 kb upstream of the TSS and 2 kb downstream of the TES. The intensity of the signal profile would vary depending on the sample conditions. If a given treatment has an effect on methylation, then the intensity of the signal profile for the treatment would lie above or below that of the control samples.

Value

A data.frame object carrying the bin coordinates, binCoord, and, for each sample, the signal summarized with the requested statistic, statSumary. Notice that the bin coordinates are relative to the original coordinates given in the GRanges object passed in signal. For example, if that object carries genome-wide methylation signals (from several samples) and we are interested in the methylation signal profile around gene regions, then we must provide the annotated gene coordinates in the argument regions and set the number of bp upstream of the TSS and downstream of the TES, say, streamUp = 2000 and streamDown = 2000, respectively. Next, if we set nbins = 20L, nbinsUP = 20L, nbinsDown = 20L, then the first 20 and the last 20 bins of the returned signal profile each cover 2000 bp in total (100 bp per bin). Since gene-body sizes vary genome-wide, the 20 bins covering the gene-body regions do not represent a fixed number of bp.
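
The point about bin sizes can be checked with simple arithmetic. The gene names and widths below are hypothetical, and the snippet is base R only; it does not call signal2bins.

```r
nbins    <- 20L   # bins covering each gene body
nbinsUP  <- 20L   # bins covering the upstream flank
streamUp <- 2000  # bp upstream of the TSS

# Hypothetical gene-body widths in bp
gene_widths <- c(geneA = 1200, geneB = 3000, geneC = 15000)

bp_flank <- streamUp / nbinsUP   # bp per flanking bin, the same for every gene
bp_body  <- gene_widths / nbins  # bp per gene-body bin, different for each gene
```

Every flanking bin covers 100 bp, whereas a gene-body bin covers 60, 150 or 750 bp for these three genes, which is why bins at the same relative position are not comparable in absolute bp across genes.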

Author(s)

Robersy Sanchez. https://genomaths.com

See Also

A faster version: signals2bins.


genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.