getGRegionsStat: Statistic of Genomic Regions
In genomaths/MethylIT.utils: MethylIT utility

getGRegionsStat

R Documentation

Statistic of Genomic Regions

Description

A function to estimate summarized measures of a specified variable given in a GRanges object (a column from the metacolums of the GRanges object) after split the GRanges object into intervals. A faster alternative would be getGRegionsStat2.

Usage

getGRegionsStat(
  GR,
  win.size = 1,
  step.size = 1,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'GRanges'
getGRegionsStat(
  GR,
  win.size = 350,
  step.size = 350,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all")
)

## S4 method for signature 'list'
getGRegionsStat(
  GR,
  win.size = 350,
  step.size = 350,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all"),
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

## S4 method for signature 'InfDiv'
getGRegionsStat(
  GR,
  win.size = 350,
  step.size = 350,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all"),
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

## S4 method for signature 'pDMP'
getGRegionsStat(
  GR,
  win.size = 350,
  step.size = 350,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all"),
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

## S4 method for signature 'GRangesList'
getGRegionsStat(
  GR,
  win.size = 350,
  step.size = 350,
  grfeatures = NULL,
  stat = c("sum", "mean", "gmean", "median", "density", "count", "denCount"),
  absolute = FALSE,
  select.strand = NULL,
  column = 1L,
  prob = FALSE,
  entropy = FALSE,
  maxgap = -1L,
  minoverlap = 0L,
  scaling = 1000L,
  logbase = 2,
  missings = 0,
  type = c("within", "any", "start", "end", "equal"),
  ignore.strand = FALSE,
  naming = FALSE,
  col.names = NULL,
  output = c("hits", "all"),
  num.cores = 1L,
  tasks = 0,
  verbose = TRUE,
  ...
)

Arguments

`GR`	A GRange object or a list of GRanges object with the variable of interest in the GRanges metacolumn.
`win.size`	An integer for the size of the windows/regions size of the intervals of genomics regions.
`step.size`	Interval at which the regions/windows must be defined
`grfeatures`	A GRanges object corresponding to an annotated genomic feature. For example, gene region, transposable elements, exons, intergenic region, etc. If provided, then parameters 'win.size' and step.size are ignored and the statistics are estimated for 'grfeatures'.
`stat`	Statistic used to estimate the summarized value of the variable of interest in each interval/window. Posible options are: 'mean': The mean of values inside each region. 'gmean': The geometric mean of values inside each region. 'median': The median of values inside each region. 'density': The density of values inside each region. That is, the sum of values found in each region divided by the width of the region. 'count': Compute the number/count of positions with values greater than zero inside each regions. 'denCount': The number of sites with value > 0 inside each region divided by the width of the region. 'sum': The sum of values inside each region.
`absolute`	Optional. Logic (default: FALSE). Whether to use the absolute values of the variable provided. For example, the difference of methylation levels could take negative values (TV) and we would be interested on the sum of abs(TV), which is sum of the total variation distance.
`select.strand`	Optional. If provided,'+' or '-', then the summarized statistic is computed only for the specified DNA chain.
`column`	Integer number denoting the column where the variable of interest is located in the metacolumn of the GRanges object.
`prob`	Logic. If TRUE and the variable of interest has values between zero and 1, then the summarized statistic is comuputed using Fisher's transformation.
`entropy`	Logic. Whether to compute the entropy at each site from the specified regions. All the values from the selected column must belong to the interval [0, 1]. Next, the requested statistics for the entropy values at each site inside the regions is computed.
`maxgap, minoverlap, type`	See ?findOverlaps in the IRanges package for a description of these arguments.
`scaling`	integer (default 1). Scaling factor to be used when stat = 'density'. For example, if scaling = 1000, then density * scaling denotes the sum of values in 1000 bp.
`logbase`	A positive number: the base with respect to which logarithms are computed when parameter 'entropy = TRUE' (default: logbase = 2).
`missings`	Whether to write '0' or 'NA' on regions where there is not data to compute the statistic.
`ignore.strand`	When set to TRUE, the strand information is ignored in the overlap calculations.
`naming`	Logical value. If TRUE, the rows GRanges object will be given the names(GR). Default is FALSE.
`col.names`	Optional. An integer denoting the column where the names/gene-id of each region is provided. If naming = TRUE and the names/gene-id of the regions are given in specific column from the object grfeatures, then col.names can be specified. Region's names can be also given as names in object grfeatures, i.e., they can be taken calling names(grfeatures). In this last case col.names is not needed.
`output`	A string. Setting output = 'all' will return all the regions given in 'grfeatures'. Default is output = 'hits'.
`verbose`	Logical. Default is TRUE. If TRUE, then the progress of the computational tasks is given.
`num.cores, tasks`	Paramaters for parallele computation using package `BiocParallel-package`: the number of cores to use, i.e. at most how many child processes will be run simultaneously (see `bplapply` and the number of tasks per job (only for Linux OS).
`maxgap, minoverlap, type, select, ignore.strand`	Used to find overlapped regions. See `findOverlaps-methods` in the IRanges package for a description of these arguments.

Details

This function split a Grange object into intervals genomic regions (GR) of fixed size (as given in function 'tileMethylCounts2' R package methylKit, with small changes). A summarized statistic (mean, median, geometric mean or sum) is calculated for the specified variable values from each region. Notice that if win.size == step.size, then non-overlapping windows are obtained.

Value

An object of the same class of GR with the new genomic regions and their corresponding summarized statistic.

Author(s)

Robersy Sanchez (https://github.com/genomaths).

Examples

library(GenomicRanges)
gr <- GRanges(seqnames = Rle( c('chr1', 'chr2', 'chr3', 'chr4'),
            c(5, 5, 5, 5)),
            ranges = IRanges(start = 1:20, end = 1:20),
            strand = rep(c('+', '-'), 10),
            GC = seq(1, 0, length = 20))
grs <- getGRegionsStat(gr, win.size = 4, step.size = 4)
grs

## Selecting the positive strand
grs <- getGRegionsStat(gr, win.size = 4, step.size = 4, select.strand = '+')
grs

## Selecting the negative strand
grs <- getGRegionsStat(gr, win.size = 4, step.size = 4, select.strand = '-')
grs

## Operating over a list of GRanges objects
gr2 <- GRanges(seqnames = Rle( c('chr1', 'chr2', 'chr3', 'chr4'),
                            c(5, 5, 5, 5)),
                ranges = IRanges(start = 1:20, end = 1:20),
                strand = rep(c('+', '-'), 10),
                GC = runif(20))

grs <- getGRegionsStat(list(gr1 = gr, gr2 = gr2), win.size = 4, step.size=4)

## Compute the density of entropy inside each region
gr$GC <- runif(20) 
getGRegionsStat(gr, win.size = 4, step.size = 4, entropy = TRUE, 
                stat = "density")

genomaths/MethylIT.utils documentation built on July 4, 2023, 12:05 a.m.