getGRangesStat: Statistic of Genomic Regions
In genomaths/MethylIT: Methylation Analysis Based on Signal Detection

getGRangesStat

R Documentation

Statistic of Genomic Regions

Description

A function to estimate summarized measures of a specified variable given in a GRanges object (a column from the metacolumns of the GRanges object) after splitting the GRanges object into intervals.
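The core idea can be sketched in base R without GenomicRanges. This is not MethylIT's implementation, which works on GRanges objects and supports the overlap options described under Arguments. The sketch assumes that sites are stored in a plain data frame with columns `chr`, `pos` and `value`, and that target intervals are stored in another data frame with columns `chr`, `start` and `end`. The helper name `summarize_regions` is hypothetical.

```r
# Hypothetical helper (not part of MethylIT): summarize a variable over the
# sites that fall inside each region. Plain data frames stand in for GRanges.
summarize_regions <- function(sites, regions, FUN = sum, missings = 0) {
  vapply(seq_len(nrow(regions)), function(i) {
    inside <- sites$chr == regions$chr[i] &
      sites$pos >= regions$start[i] &
      sites$pos <= regions$end[i]
    if (!any(inside)) return(as.numeric(missings))  # region without data
    FUN(sites$value[inside])
  }, numeric(1))
}

sites <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                    pos = c(2, 5, 9, 3),
                    value = c(0.5, 0.25, 1, 2))
regions <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                      start = c(1, 5, 1, 1),
                      end = c(4, 8, 4, 4))
summarize_regions(sites, regions)                  # 0.50 0.25 2.00 0.00
summarize_regions(sites, regions, missings = NA)   # last region becomes NA
```

The `missings` argument in this sketch mirrors the documented choice between writing '0' or 'NA' for regions with no data.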

Usage

getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE,
  ...
)

## S4 method for signature 'pDMP'
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

## S4 method for signature 'InfDiv'
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

## S4 method for signature 'list'
getGRangesStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  stat2 = NULL,
  stat3 = NULL,
  column = NULL,
  absolute = FALSE,
  absolute2 = FALSE,
  absolute3 = FALSE,
  select.strand = NULL,
  maxgap = -1L,
  minoverlap = 0L,
  select = "all",
  ignore.strand = TRUE,
  type = c("within", "start", "end", "equal", "any"),
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  naming = FALSE,
  na.rm = TRUE,
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

Arguments

`GR`	A `GRanges-class` object or a `GRangesList-class` object carrying the variables of interest in the GRanges metacolumn(s).
`win.size`	An integer giving the size (width, in bp) of the windows into which the genomic regions are split.
`step.size`	An integer giving the interval (in bp) at which consecutive windows are defined, i.e., the distance between the starts of consecutive windows.
`grfeatures`	A GRanges object corresponding to an annotated genomic feature, for example, gene regions, transposable elements, exons, intergenic regions, etc. If provided, then the parameters 'win.size' and 'step.size' are ignored and the statistics are estimated for the regions in 'grfeatures'.
`stat`	Statistic used to estimate the summarized value of the variable of interest in each interval/window. Possible options are: 'mean': The mean of values inside each region. 'gmean': The geometric mean of values inside each region. 'median': The median of values inside each region. 'density': The density of values inside each region, that is, the sum of values found in each region divided by the width of the region. 'count': The number of positions with values greater than zero inside each region. 'denCount': The number of sites with value > 0 inside each region divided by the width of the region. 'sum': The sum of values inside each region. If GR has no metacolumns, then stat is set to "count" and all the sites are included in the computation.
`stat2, stat3`	The same as for the 'stat' argument. If provided, the statistics selected in 'stat2' and 'stat3' will also be reported.
`column`	Integer number denoting the column where the variable of interest is located in the metacolumn of the GRanges object. Default is 1L if the number of columns is greater than 1, otherwise NULL.
`absolute`	Optional. Logical (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels (TV) can take negative values, and we may be interested in the sum of abs(TV), i.e., the sum of the total variation distances.
`absolute2, absolute3`	The same as for 'absolute' argument, but applied when 'stat2' and 'stat3' are not null, respectively.
`select.strand`	Optional. If provided ('+' or '-'), the summarized statistic is computed only for the specified DNA strand.
`maxgap, minoverlap, type`	See `findOverlaps-methods` in the IRanges package for a description of these arguments.
`ignore.strand`	When set to TRUE, the strand information is ignored in the overlap calculations.
`scaling`	Integer (default: 1000L). Scaling factor to be used when stat = 'density'. For example, if scaling = 1000, then density * scaling denotes the sum of values in 1000 bp.
`logbase`	A positive number: the base with respect to which logarithms are computed when parameter 'entropy = TRUE' (default: logbase = 2).
`missings`	Whether to write '0' or 'NA' in regions where there are no data to compute the statistic.
`naming`	Logical value. If TRUE, the rows of the returned GRanges object will be named with names(grfeatures). Default is FALSE.
`na.rm`	Logical value. If TRUE, the NA values will be removed.
`num.cores, tasks`	Parameters for parallel computation using package `BiocParallel-package`: the number of cores to use, i.e., at most how many child processes will be run simultaneously (see `bplapply`), and the number of tasks per job (only for Linux OS). Only used when GR is a list of `GRanges-class` objects.
`verbose`	Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given.
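The options of 'stat', together with 'absolute' and 'scaling', can be written out for the values found inside a single region. The function below is a hypothetical base-R sketch that follows the descriptions given above; it is not MethylIT's code. It assumes that `width` is the region width in bp, that 'scaling' is applied inside the 'density' statistic, and that 'gmean' receives strictly positive values (how MethylIT treats zeros or negative values in 'gmean' is not stated here).

```r
# Hypothetical sketch of the per-region statistics described for 'stat'.
# x: values of the variable inside one region; width: region width in bp.
region_stat <- function(x, width,
                        stat = c("sum", "mean", "gmean", "median",
                                 "density", "count", "denCount"),
                        absolute = FALSE, scaling = 1000L, na.rm = TRUE) {
  stat <- match.arg(stat)
  if (na.rm) x <- x[!is.na(x)]
  if (absolute) x <- abs(x)  # e.g. sum of |TV| instead of sum of TV
  switch(stat,
    sum      = sum(x),
    mean     = mean(x),
    gmean    = exp(mean(log(x))),         # assumes strictly positive values
    median   = stats::median(x),
    density  = sum(x) / width * scaling,  # sum of values per 'scaling' bp
    count    = sum(x > 0),                # number of sites with value > 0
    denCount = sum(x > 0) / width)        # sites with value > 0 per bp
}

tv <- c(-0.5, 0.25, 0, 1)  # e.g. differences of methylation levels
region_stat(tv, width = 100, stat = "sum")                        # 0.75
region_stat(tv, width = 100, stat = "sum", absolute = TRUE)       # 1.75
region_stat(tv, width = 100, stat = "density", absolute = TRUE)   # 17.5
region_stat(tv, width = 100, stat = "count")                      # 2
```

Note how 'absolute' changes the result when the variable takes negative values, as in the total variation example given for that argument.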

Details

This function splits a GRanges object into genomic regions (GRs) of fixed size. A summarized statistic (e.g., mean, median, geometric mean or sum) is then calculated for the values of the specified variable in each region. Notice that if win.size == step.size, then non-overlapping windows are obtained.
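How 'win.size' and 'step.size' define the windows can be illustrated with a base-R sketch for a single chromosome of length `chr_len`. Both `chr_len` and the helper `make_windows` are hypothetical, and truncating the last window(s) at the chromosome end is an assumption, since the text does not say how MethylIT handles it. For real GRanges objects, GenomicRanges provides `slidingWindows()` for the same purpose.

```r
# Hypothetical sketch: window coordinates (1-based, inclusive) on one chromosome.
make_windows <- function(chr_len, win.size, step.size) {
  starts <- seq(1, chr_len, by = step.size)
  ends <- pmin(starts + win.size - 1, chr_len)  # truncate at chromosome end
  data.frame(start = starts, end = ends)
}

# win.size == step.size: adjacent, non-overlapping windows
make_windows(10, win.size = 4, step.size = 4)
#   start end
# 1     1   4
# 2     5   8
# 3     9  10

# step.size < win.size: overlapping (sliding) windows
make_windows(10, win.size = 4, step.size = 2)
```

With step.size smaller than win.size each site contributes to several windows, which smooths the summarized signal at the cost of more regions to compute.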

Value

A GRanges-class object or a GRangesList-class object with the new genomic regions and their corresponding summarized statistic.

Author(s)

Robersy Sanchez (https://github.com/genomaths).

Examples

library(MethylIT)
library(GenomicRanges)
set.seed(1)
gr <- GRanges(seqnames = Rle( c('chr1', 'chr2', 'chr3', 'chr4'),
            c(5, 5, 5, 5)),
            ranges = IRanges(start = 1:20, end = 1:20),
            strand = rep(c('+', '-'), 10),
            A = seq(1, 0, length = 20))
gr$B <- runif(20)
grs <- getGRangesStat(gr, win.size = 4, step.size = 4)
grs

## Selecting the positive strand
grs <- getGRangesStat(gr, win.size = 4,
             step.size = 4, select.strand = '+')
grs

## Selecting the negative strand
grs <- getGRangesStat(gr, win.size = 4,
             step.size = 4, select.strand = '-')
grs

genomaths/MethylIT documentation built on Feb. 3, 2024, 1:24 a.m.