populationRanges: Summarizing CNV ranges across a population


populationRanges    R Documentation

Summarizing CNV ranges across a population

Description

In CNV analysis, it is often of interest to summarize individual calls across the population (i.e., to define CNV regions) for subsequent association analysis with, e.g., phenotype data.

Usage

populationRanges(
  grl,
  mode = c("density", "RO"),
  density = 0.1,
  ro.thresh = 0.5,
  multi.assign = FALSE,
  verbose = FALSE,
  min.size = 2,
  classify.ranges = TRUE,
  type.thresh = 0.1,
  est.recur = FALSE
)

Arguments

grl

A GRangesList.

mode

Character. Should population ranges be computed based on regional density ("density") or reciprocal overlap ("RO")? Defaults to "density". See Details.

density

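Numeric. Density threshold used when mode = "density": low-density areas of a summarized region, i.e. areas supported by less than this fraction of the contributing individual calls, are trimmed. Defaults to 0.1. See Details.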

ro.thresh

Numeric. Threshold for reciprocal overlap required for merging two overlapping regions. Defaults to 0.5.

multi.assign

Logical. Allow regions to be assigned to several region clusters? Defaults to FALSE.

verbose

Logical. Report progress messages? Defaults to FALSE.

min.size

Numeric. Minimum size of a summarized region to be included. Defaults to 2 bp.

classify.ranges

Logical. Should CNV frequency (number of samples overlapping the region) and CNV type (gain, loss, or both) be annotated? Defaults to TRUE.

type.thresh

Numeric. Required minimum relative frequency of each CNV type (gain / loss) to be taken into account when assigning a CNV type to a region. Defaults to 0.1. This means that, for a region overlapped by individual gain and loss calls, each type must account for more than 10% of the calls for the region to be typed as 'both'. Gain or loss calls present below the threshold are ignored.
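The thresholding rule described for type.thresh can be illustrated with a minimal sketch in base R. The helper classify_region is hypothetical and not part of CNVRanger; it assumes that the relative frequency is the share of gain (or loss) calls among all calls overlapping the region, and that the comparison is strict (>), as the wording above suggests.

```r
# Hypothetical helper (not part of CNVRanger): assign a CNV type to a region
# from the number of overlapping gain and loss calls.
classify_region <- function(n_gain, n_loss, type.thresh = 0.1) {
  total <- n_gain + n_loss
  stopifnot(total > 0)
  # relative frequency of each type among all overlapping calls (assumption)
  freq <- c(gain = n_gain, loss = n_loss) / total
  # types below the threshold are ignored
  present <- names(freq)[freq > type.thresh]
  if (length(present) == 2) "both" else present
}

classify_region(19, 1)   # loss calls make up 5% and are ignored
classify_region(15, 5)   # both types exceed 10%
```

Under these assumptions, a region with 19 gain calls and 1 loss call is typed as a gain, whereas 15 gain calls and 5 loss calls yield 'both'.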

est.recur

Logical. Should recurrence of regions be assessed via a permutation test? Defaults to FALSE. See Details.

Details

  • CNVRuler procedure that trims region margins based on regional density

    Trims low-density areas (usually <10% of the total contributing individual calls within a summarized region).

    An illustration of the concept can be found here: https://www.ncbi.nlm.nih.gov/pubmed/22539667 (Figure 1)

  • Reciprocal overlap (RO) approach (e.g. Conrad et al., Nature, 2010)

    A reciprocal overlap of 0.51 between two genomic regions A and B requires that B overlaps at least 51% of A, *and* that A also overlaps at least 51% of B.

    Approach:

    At the top level of the hierarchy, all contiguous bases overlapping at least 1 bp of individual calls are merged into one region. Within each region, reciprocally overlapping regions are then defined with the following algorithm:

    • Calculate the reciprocal overlap (RO) between all remaining calls.

    • Identify the pair of calls with the greatest RO. If RO > threshold, merge them to create a new CNV. If not, exit.

    • Continue adding unclustered calls to the region, in order of best overlap. To be added, a call must have RO > threshold with every call already in the region. When no additional calls can be added, move to the next step.

    • If calls remain, return to the first step. Otherwise exit.

  • GISTIC procedure (Beroukhim et al., PNAS, 2007) to identify recurrent CNV regions

    GISTIC scores each CNV region with a G-score that is proportional to the total magnitude of CNV calls in each CNV region. In addition, by permuting the locations in each sample, GISTIC determines the frequency with which a given score would be attained if the events were due to chance and therefore randomly distributed. A significance threshold can then be used to determine scores / regions that are unlikely to occur by chance alone.
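The reciprocal overlap (RO) criterion and the greedy clustering steps listed above can be sketched in base R as follows. This is an illustration of the idea under stated assumptions, not the implementation used by populationRanges: the function names reciprocal_overlap and cluster_by_ro are hypothetical, calls are represented as closed integer intervals within a single merged region, and "order of best overlap" is interpreted as picking the unclustered call with the largest minimum RO to the current cluster members.

```r
# RO of two closed intervals [s1, e1] and [s2, e2]: the smaller of the two
# fractions (overlap width / interval width).
reciprocal_overlap <- function(s1, e1, s2, e2) {
  ov <- max(0, min(e1, e2) - max(s1, s2) + 1)
  min(ov / (e1 - s1 + 1), ov / (e2 - s2 + 1))
}

# Greedy RO clustering of the calls in one merged region (hypothetical sketch).
# Returns one cluster id per call; calls that join no cluster remain NA.
cluster_by_ro <- function(starts, ends, thresh = 0.5) {
  n <- length(starts)
  # pairwise RO matrix between all calls
  ro <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j)
    reciprocal_overlap(starts[i], ends[i], starts[j], ends[j])))
  diag(ro) <- 0
  cluster <- rep(NA_integer_, n)
  id <- 0L
  repeat {
    free <- which(is.na(cluster))
    if (length(free) < 2) break
    sub <- ro[free, free, drop = FALSE]
    if (max(sub) <= thresh) break            # best remaining pair fails: exit
    best <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    id <- id + 1L
    members <- free[best]                    # seed cluster with the best pair
    cluster[members] <- id
    repeat {
      free <- which(is.na(cluster))
      if (length(free) == 0) break
      # a candidate must exceed the threshold against every current member
      min_ro <- apply(ro[free, members, drop = FALSE], 1, min)
      if (max(min_ro) <= thresh) break
      pick <- free[which.max(min_ro)]
      cluster[pick] <- id
      members <- c(members, pick)
    }
  }
  cluster
}

# Three heavily overlapping calls and one distant call
cluster_by_ro(starts = c(1, 1, 2, 20), ends = c(10, 10, 11, 30))
```

In this toy input the first three calls end up in one cluster, while the fourth call overlaps none of them and stays unclustered (NA).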

Value

A GRanges object containing the summarized CNV ranges.

Author(s)

Ludwig Geistlinger, Martin Morgan

References

Kim et al. (2012) CNVRuler: a copy number variation-based case-control association analysis tool. Bioinformatics, 28(13):1790-2.

Conrad et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature, 464(7289):704-12.

Beroukhim et al. (2007) Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. PNAS, 104(50):20007-12.

See Also

findOverlaps

Examples


library(GenomicRanges)
library(CNVRanger)

# six samples with individual CNV calls on chr1 and chr2
grl <- GRangesList(
    sample1 = GRanges(c("chr1:1-10", "chr2:15-18", "chr2:25-34")),
    sample2 = GRanges(c("chr1:1-10", "chr2:11-18", "chr2:25-36")),
    sample3 = GRanges(c("chr1:2-11", "chr2:14-18", "chr2:26-36")),
    sample4 = GRanges(c("chr1:1-12", "chr2:18-35")),
    sample5 = GRanges(c("chr1:1-12", "chr2:11-17", "chr2:26-34")),
    sample6 = GRanges(c("chr1:1-12", "chr2:12-18", "chr2:25-35"))
)

# default as chosen in the original CNVRuler procedure
populationRanges(grl, density=0.1, classify.ranges=FALSE)

# density = 0 merges all overlapping regions, 
# equivalent to: reduce(unlist(grl))
populationRanges(grl, density=0, classify.ranges=FALSE) 

# density = 1 disjoins all overlapping regions, 
# equivalent to: disjoin(unlist(grl))
populationRanges(grl, density=1, classify.ranges=FALSE)

# RO procedure
populationRanges(grl, mode="RO", ro.thresh=0.5, classify.ranges=FALSE)


waldronlab/CNVRanger documentation built on May 3, 2024, 3:21 p.m.