enrichedChrRegions: Find chromosomal regions with a high concentration of hits.
In htSeqTools: Quality Control, Visualization and Processing for High-Throughput Sequencing data

Description Usage Arguments Details Value Methods Examples

This function looks for chromosomal regions where there is a large accumulation of hits, e.g. significant peaks in a ChIP-seq experiment or differentially expressed genes in an RNA-seq or microarray experiment. Regions are found by computing the number of hits in a moving window and selecting regions based on an FDR cutoff.

enrichedChrRegions(hits1, hits2, chrLength, windowSize=10^4-1, fdr=0.05, nSims=10, mc.cores=1)

`hits1`	Object containing hits (chromosome, start, and end). Can be a `GRanges` or `RangedData` object.
`hits2`	Optionally, another object containing hits. If specified, regions will be defined by comparing hits1 vs hits2.
`chrLength`	Named vector indicating the length of each chromosome in base pairs
`windowSize`	Size of the window used to smooth the hit count (see details)
`fdr`	Desired FDR level (see details)
`nSims`	Number of simulations to be used to estimate the FDR
`mc.cores`	Number of processors to be used in parallel computations (passed on to mclapply)

A smoothed number of hits is computed by counting the number of hits in a moving window of size windowSize. Notice that only the mid-point of each hit in hits1 (and hits2 if specified) is used. That is, hits are not treated as intervals but as being located at a single base pair.
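The smoothing step can be sketched in base R as follows. This is only an illustration, not the package's code: `smoothHitCount` is a hypothetical helper, and centring the window on each base (and truncating it at the chromosome ends) is an assumption, since the text does not say how the window is aligned.

```r
# Hypothetical helper (not part of htSeqTools): smoothed hit count per base.
# Each hit is reduced to its mid-point, then mid-points are counted in a
# window centred on every base. Window alignment is an assumption.
smoothHitCount <- function(start, end, chrLen, windowSize) {
  mid <- floor((start + end) / 2)            # one base pair per hit
  perBase <- tabulate(mid, nbins = chrLen)   # mid-points falling on each base
  cs <- c(0, cumsum(perBase))                # prefix sums for fast window sums
  half <- windowSize %/% 2
  pos <- seq_len(chrLen)
  lo <- pmax(pos - half, 1)                  # truncate window at chromosome ends
  hi <- pmin(pos + half, chrLen)
  cs[hi + 1] - cs[lo]                        # hits with mid-point in [lo, hi]
}

# Three hits on a 30 bp chromosome, window of 3 bp
sm <- smoothHitCount(start = c(5, 5, 20), end = c(5, 7, 20),
                     chrLen = 30, windowSize = 3)
```

With an odd `windowSize` the window covers exactly `windowSize` bases; with an even value this sketch covers one base more.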

If hits2 is missing, regions with large smoothed number of hits are selected. To assess statistical significance, we generate hits (also 1 base pair long) randomly distributed along the genome and compute the smoothed number of hits. The number of simulated hits is set equal to nrow(hits1). The process is repeated nSims times, resulting in several independent simulations. To estimate the FDR, several thresholds to define enriched chromosomal regions are considered. For each threshold, we count the number of regions above the threshold in the observed data and in the simulations. For each threshold t, the FDR is estimated as the average number of regions with score >=t in the simulations over the number of regions with score >=t in the observed data.
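As a rough illustration of the one-sample procedure just described, the base R sketch below estimates the FDR for a set of candidate thresholds on a single chromosome. The function names (`smoothCount`, `countRegions`, `estimateFdr`), the centred window, and the definition of a region as a maximal run of consecutive bases with score >= t are all assumptions made for the example; the package's internals may differ.

```r
# Smoothed count of 1 bp hits in a centred window (alignment is an assumption).
smoothCount <- function(pos, chrLen, windowSize) {
  cs <- c(0, cumsum(tabulate(pos, nbins = chrLen)))
  half <- windowSize %/% 2
  i <- seq_len(chrLen)
  cs[pmin(i + half, chrLen) + 1] - cs[pmax(i - half, 1)]
}

# Number of regions, taken here as maximal runs of bases with score >= t.
countRegions <- function(score, t) {
  r <- rle(score >= t)
  sum(r$values)
}

# FDR(t) = mean number of simulated regions / number of observed regions.
estimateFdr <- function(pos, chrLen, windowSize, thresholds, nSims = 10) {
  obs <- smoothCount(pos, chrLen, windowSize)
  nObs <- vapply(thresholds, function(t) countRegions(obs, t), integer(1))
  nSim <- replicate(nSims, {
    # same number of hits, placed uniformly at random along the chromosome
    simPos <- sample.int(chrLen, length(pos), replace = TRUE)
    sim <- smoothCount(simPos, chrLen, windowSize)
    vapply(thresholds, function(t) countRegions(sim, t), integer(1))
  })
  nSim <- matrix(nSim, nrow = length(thresholds))  # rows: thresholds, cols: sims
  meanSim <- rowMeans(nSim)
  fdr <- ifelse(nObs > 0, meanSim / nObs, NA_real_)  # undefined if nothing observed
  data.frame(threshold = thresholds, observed = nObs,
             simulated = meanSim, fdr = fdr)
}

set.seed(1)
pos <- round(rnorm(100, 500, 20))   # 100 hits clustered around base 500
res <- estimateFdr(pos, chrLen = 10000, windowSize = 99,
                   thresholds = c(2, 5, 20), nSims = 3)
```

Thresholds at which no observed region remains get `NA` rather than a ratio with a zero denominator.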

If hits2 is not missing, the difference in smoothed proportion of hits (i.e. the number of hits in the window divided by the overall number of hits) between the two groups is used as a test statistic. To assess statistical significance, we randomly scramble hits between sample 1 and sample 2 (maintaining the original number of hits in each sample), and we re-compute the test statistic. The FDR for a given threshold t is estimated as the number of bases in the simulated data with test statistic > t divided by the number of bases in the observed data with test statistic > t.
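The two-sample comparison can be sketched in the same spirit. The code below is a simplified base R illustration for one chromosome; `smoothProp` and `twoSampleFdr` are hypothetical names, the centred window is an assumption, and averaging the simulated base counts over `nSims` scrambles is also an assumption, since the text describes only a single simulated data set.

```r
# Smoothed proportion of hits: window count divided by total number of hits.
smoothProp <- function(pos, chrLen, windowSize) {
  cs <- c(0, cumsum(tabulate(pos, nbins = chrLen)))
  half <- windowSize %/% 2
  i <- seq_len(chrLen)
  (cs[pmin(i + half, chrLen) + 1] - cs[pmax(i - half, 1)]) / length(pos)
}

twoSampleFdr <- function(pos1, pos2, chrLen, windowSize, thresholds, nSims = 10) {
  # test statistic: difference in smoothed proportions, sample 1 minus sample 2
  stat <- function(a, b) {
    smoothProp(a, chrLen, windowSize) - smoothProp(b, chrLen, windowSize)
  }
  obs <- stat(pos1, pos2)
  nObs <- vapply(thresholds, function(t) sum(obs > t), integer(1))
  pooled <- c(pos1, pos2)
  nSim <- replicate(nSims, {
    # scramble sample labels, keeping the original number of hits per sample
    idx <- sample.int(length(pooled), length(pos1))
    sim <- stat(pooled[idx], pooled[-idx])
    vapply(thresholds, function(t) sum(sim > t), integer(1))
  })
  nSim <- matrix(nSim, nrow = length(thresholds))
  meanSim <- rowMeans(nSim)   # averaged over scrambles (an assumption)
  fdr <- ifelse(nObs > 0, meanSim / nObs, NA_real_)
  data.frame(threshold = thresholds, observedBases = nObs,
             simulatedBases = meanSim, fdr = fdr)
}

set.seed(2)
pos1 <- round(rnorm(100, 500, 20))   # sample 1: hits clustered near base 500
pos2 <- sample.int(10000, 100)       # sample 2: hits spread along the chromosome
res2 <- twoSampleFdr(pos1, pos2, chrLen = 10000, windowSize = 99,
                     thresholds = c(0.1, 0.3, 0.5), nSims = 5)
```

Note that this sketch only detects regions where sample 1 is enriched relative to sample 2; swapping the arguments looks in the other direction.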

The lowest t with estimated FDR below fdr is used to define enriched chromosomal regions.
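A minimal sketch of this last step, assuming a vector of candidate thresholds with their estimated FDRs is already available. `pickThreshold` and `enrichedRuns` are hypothetical helpers, and reporting regions as runs of consecutive bases with score >= t is an assumption about how regions are delimited.

```r
# Lowest threshold whose estimated FDR is below the requested level.
pickThreshold <- function(thresholds, fdrEst, fdr = 0.05) {
  ok <- !is.na(fdrEst) & fdrEst < fdr
  if (!any(ok)) return(NA_real_)   # no threshold reaches the requested FDR
  min(thresholds[ok])
}

# Start and end of each maximal run of bases with score >= t.
enrichedRuns <- function(score, t) {
  r <- rle(score >= t)
  end <- cumsum(r$lengths)
  start <- end - r$lengths + 1
  data.frame(start = start[r$values], end = end[r$values])
}

tSel <- pickThreshold(1:5, c(0.5, 0.2, 0.04, 0.01, NA), fdr = 0.05)
runs <- enrichedRuns(c(0, 2, 2, 0, 3, 0), t = 2)
```

If no candidate threshold reaches the requested FDR, the sketch returns `NA` and no regions should be reported.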

Object of class GRanges (if the input is GRanges) or RangedData (if the input is RangedData) containing the regions whose smoothed hit count exceeds the threshold selected for the specified FDR level.

signature(hits1 = "GRanges", hits2 = "missing"), signature(hits1 = "RangedData", hits2 = "missing"): Look for chromosomal zones with a large number of hits reported in hits1.
signature(hits1 = "GRanges", hits2 = "GRanges"), signature(hits1 = "RangedData", hits2 = "RangedData"): Look for chromosomal zones with a different density of hits in hits1 vs hits2.

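library(htSeqTools)  # assumed to also attach GenomicRanges/IRanges
set.seed(1)
# Simulate 100 hit start positions clustered around base 500
st <- round(rnorm(100,500,100))
st[st>10000] <- 10000  # cap starts at the chromosome length
strand <- rep(c('+','-'),each=50)
hits1 <- GRanges('chr1', IRanges(st,st+38),strand=strand)  # 39 bp hits
chrLength <- c(chr1=10000)
# One-sample analysis: look for regions with many hits
enrichedChrRegions(hits1,chrLength=chrLength, windowSize=99, nSims=1)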

htSeqTools documentation built on May 6, 2019, 3:39 a.m.
