getTopRegions: Get regions that are most associated with target variable

getTopRegions    R Documentation

Get regions that are most associated with target variable

Description

Get a GRanges object containing the top regions from a region set. Regions are selected either by their average feature contribution score or, if returnQuantile=TRUE, by the quantile of that average within the distribution of all feature contribution scores for the target variable. The average feature contribution score or quantile is returned as GRanges metadata.

Usage

getTopRegions(
  signal,
  signalCoord,
  regionSet,
  signalCol = c("PC1", "PC2"),
  cutoff = 0.8,
  returnQuantile = TRUE
)

Arguments

signal

Matrix of feature contribution scores (the contribution of each epigenetic feature to each target variable). One named column for each target variable. One row for each original epigenetic feature (rows should be in the same order as the original data/signalCoord). For example, in an unsupervised analysis where PCA was done on the epigenetic data and the goal is to find region sets associated with the principal components, you could use the rotation matrix returned by prcomp (x$rotation, where x is the prcomp result for the epigenetic data) as the feature contribution scores/'signal' parameter.
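As a minimal sketch of the PCA case described above, using simulated data and base R only (the matrix toyData and its dimensions are made up for illustration and are not part of COCOA):

```r
# Toy epigenetic data: rows are features (e.g. CpGs or peaks), columns are samples.
# These values are simulated and purely illustrative.
set.seed(1)
toyData <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8,
                  dimnames = list(paste0("feature", 1:50),
                                  paste0("sample", 1:8)))

# prcomp() expects samples in rows, so transpose before running PCA.
pca <- prcomp(t(toyData))

# The rotation matrix has one row per feature and one named column per PC,
# which is the shape described for 'signal'.
signal <- pca$rotation
dim(signal)
colnames(signal)[1:2]
```

The row order of the rotation matrix follows the feature order of the input data, which is why signalCoord has to be kept in that same order.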

signalCoord

A GRanges object or data frame with coordinates for the genomic signal/original epigenetic data. Coordinates should be in the same order as the original data and the feature contribution scores (each item/row in signalCoord corresponds to a row in signal). If a data.frame, must have chr and start columns (optionally can have end column, depending on the epigenetic data type).
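A small sketch of the data.frame form of signalCoord, in base R. The coordinates and scores below are hypothetical and only illustrate the required columns and the row-by-row correspondence with signal:

```r
# Hypothetical coordinates for four features, in the same order as the rows of 'signal'.
signalCoord <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                          start = c(1000, 1500, 9000, 300))

# A matching toy 'signal' matrix: one row per feature,
# one named column per target variable.
signal <- matrix(c(0.1, -0.4, 0.3, 0.2,
                   0.5, 0.0, -0.2, 0.1),
                 ncol = 2, dimnames = list(NULL, c("PC1", "PC2")))

# Row i of signalCoord must describe row i of signal,
# and the chr and start columns must be present.
stopifnot(nrow(signalCoord) == nrow(signal),
          all(c("chr", "start") %in% colnames(signalCoord)))
```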

regionSet

A genomic ranges (GRanges) object with regions corresponding to the same biological annotation.

signalCol

A character vector with the names of the sample variables of interest/target variables (e.g. PCs or sample phenotypes).

cutoff

Numeric. Only regions with at least this value will be returned. The cutoff is applied to the region's average 'signal' value if returnQuantile=FALSE, or to the region's quantile if returnQuantile=TRUE.

returnQuantile

Logical. If FALSE, return region averages. If TRUE, for each region, return the quantile of that region's average value based on the distribution of individual feature values in 'signal' for that 'signalCol'.
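One way to picture the quantile option is with an empirical cumulative distribution function in base R. This is a sketch of the idea on simulated numbers, not COCOA's internal code: the use of ecdf(), the toy scores, and the region names are all assumptions made for illustration.

```r
set.seed(1)
# Toy feature contribution scores for one target variable (100 features).
featureScores <- rnorm(100)

# Hypothetical average scores for three regions.
regionAvg <- c(regionA = -1.0, regionB = 0.2, regionC = 1.5)

# Empirical CDF of the individual feature scores.
scoreECDF <- ecdf(featureScores)

# Quantile of each region average within the feature-score distribution.
regionQuantile <- scoreECDF(regionAvg)

# Keep regions whose quantile is at least the cutoff.
cutoff <- 0.8
keep <- regionQuantile >= cutoff
names(regionAvg)[keep]
```

With returnQuantile=FALSE, the same kind of comparison would be made on regionAvg directly rather than on regionQuantile.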

Value

A GRanges object with region coordinates for regions with scores/quantiles above "cutoff" for any target variable in signalCol. The scores/quantiles for signalCol are given as metadata in the GRanges.

Examples

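library(COCOA)
# Load the example coordinates, signal data, and region set
data("brcaATACCoord1")
data("brcaATACData1")
data("esr1_chr1")
# PCA with samples as rows; the rotation matrix gives the feature contribution scores
featureContributionScores <- prcomp(t(brcaATACData1))$rotation
# Uses the default signalCol (PC1, PC2) and cutoff (0.8)
topRegions <- getTopRegions(signal=featureContributionScores,
                            signalCoord=brcaATACCoord1,
                            regionSet=esr1_chr1,
                            returnQuantile = TRUE)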

databio/COCOA documentation built on Sept. 1, 2023, 5:50 p.m.