giniCoverage: Compute Gini coefficient.


Description

Calculate the Gini coefficient of aligned reads from high-throughput sequencing. The index provides a measure of "inequality" in read coverage that can be used for quality control purposes (see Details).

Usage

giniCoverage(sample, mc.cores = 1, mk.plot = FALSE, seqName = "missing", species="missing", chrLengths="missing", numSim="missing")

Arguments

sample

A RangedData object, or a list of RangedData objects.

seqName

If sample is a RangedData object, the name of the sequence to use in plots.

mk.plot

Logical. If TRUE, a histogram of the logarithm of the coverage values and the Lorenz curve are plotted.

mc.cores

If mc.cores is greater than 1, computations are performed in parallel for each element in the IRangesList object.

chrLengths

An integer vector with the lengths of the chromosomes in sample, used to simulate uniformly distributed reads.

species

A BSgenome species from which to obtain chromosome lengths, used to simulate uniformly distributed reads.

numSim

Number of simulations to perform in order to find the expected Gini coefficient.
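When mc.cores is greater than 1, per-element work can be distributed with mclapply() from R's parallel package. The hypothetical wrapper apply_per_element() below sketches that pattern with a serial lapply() fallback; whether giniCoverage uses exactly this mechanism is an assumption. Note that mclapply() relies on forking, which is not available on Windows.

```r
library(parallel)

# Hypothetical helper (not part of htSeqTools): run FUN on every element
# (e.g. every chromosome or every sample), forking workers only when
# mc.cores > 1 and the platform supports it.
apply_per_element <- function(x, FUN, mc.cores = 1) {
  if (mc.cores > 1 && .Platform$OS.type != "windows") {
    mclapply(x, FUN, mc.cores = mc.cores)   # forked parallel workers
  } else {
    lapply(x, FUN)                          # serial fallback
  }
}
```

For example, apply_per_element(list(chr1 = cov1, chr2 = cov2), max, mc.cores = 2) would compute one value per chromosome on two worker processes, where cov1 and cov2 are hypothetical coverage vectors.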

Details

The Gini coefficient provides a measure of "inequality" in read coverage. It can be used in any sequencing experiment whose goal is to find peaks, i.e. unusual accumulations of reads in some genomic regions, for instance ChIP-Seq. Typically these experiments consist of samples of interest (e.g. immunoprecipitated) and controls. The samples of interest should exhibit higher peaks, whereas reads in the controls should show a more uniform distribution. Because the Gini coefficient measures departure from uniformity, it should take smaller values in the control samples. The Gini coefficient also depends on the number of reads per sample, so a correction is performed: the adjusted Gini index is obtained by subtracting, from the observed Gini index, the Gini index of a simulated sample with uniformly distributed reads.
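As a rough illustration of the correction described above, the following base R sketch computes the Gini coefficient of per-base coverage and an adjusted value obtained by subtracting the mean Gini coefficient of uniformly placed reads over several simulations (the role the numSim argument describes). The helpers coverage_vec(), gini(), expected_uniform_gini() and adjusted_gini() are hypothetical and not part of htSeqTools; the closed-form Gini formula and the use of the median read width for the simulated reads are assumptions, not necessarily what giniCoverage does internally.

```r
# Hypothetical helper: per-base coverage of reads [start, end] on positions
# 1..chr_len, using a difference array (+1 at start, -1 just after end).
coverage_vec <- function(start, end, chr_len) {
  end <- pmin(end, chr_len)
  d <- tabulate(start, nbins = chr_len + 1) - tabulate(end + 1, nbins = chr_len + 1)
  cumsum(d)[seq_len(chr_len)]
}

# Hypothetical helper: Gini coefficient of a non-negative vector,
# discrete closed form (0 = perfectly uniform, (n - 1) / n = maximal).
gini <- function(x) {
  x <- sort(as.numeric(x))
  n <- length(x)
  if (n == 0 || sum(x) == 0) return(NA_real_)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Mean Gini of n_reads reads of width read_len placed uniformly at random,
# averaged over num_sim simulations.
expected_uniform_gini <- function(n_reads, read_len, chr_len, num_sim = 10) {
  sims <- replicate(num_sim, {
    s <- sample.int(chr_len - read_len + 1, n_reads, replace = TRUE)
    gini(coverage_vec(s, s + read_len - 1, chr_len))
  })
  mean(sims)
}

# Observed Gini and adjusted Gini (observed minus expected under uniformity).
adjusted_gini <- function(start, end, chr_len, num_sim = 10) {
  g <- gini(coverage_vec(start, end, chr_len))
  read_len <- round(median(end - start + 1))
  c(gini = g,
    gini.adjust = g - expected_uniform_gini(length(start), read_len, chr_len, num_sim))
}

# Toy data similar to the Examples section: two peaks versus uniform background
set.seed(1)
ip_start <- round(c(rnorm(500, 100, 10), rnorm(500, 200, 10)))
bg_start <- round(runif(1000, 1, 300))
adjusted_gini(ip_start, ip_start + 38, chr_len = 340)
adjusted_gini(bg_start, bg_start + 38, chr_len = 340)
```

Subtracting a simulated baseline, rather than using the raw coefficient, keeps samples with different read counts comparable, which is the motivation given above.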

Value

If mk.plot==FALSE, the Gini index and adjusted Gini index for each element in the list or RangedData object.

If mk.plot==TRUE, a plot is produced showing a histogram of the logarithm of the coverage values and the Lorenz curve.
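The Lorenz curve drawn when mk.plot = TRUE can be sketched in base R: positions are sorted by increasing coverage, and the cumulative share of coverage is plotted against the cumulative share of positions. The Gini coefficient equals one minus twice the area under this curve. lorenz_curve() and gini_from_lorenz() are hypothetical helpers, and dropping zero-coverage positions before taking the logarithm for the histogram is an assumption about how zeros are handled.

```r
# Hypothetical helper: Lorenz curve points for a vector of coverage values.
lorenz_curve <- function(x) {
  x <- sort(as.numeric(x))                   # positions ordered by increasing coverage
  n <- length(x)
  data.frame(p = c(0, seq_len(n) / n),       # cumulative share of positions
             L = c(0, cumsum(x) / sum(x)))   # cumulative share of coverage
}

# Gini coefficient = 1 - 2 * (area under the Lorenz curve), trapezoidal rule
gini_from_lorenz <- function(lc) {
  area <- sum(diff(lc$p) * (head(lc$L, -1) + tail(lc$L, -1)) / 2)
  1 - 2 * area
}

cov <- c(rep(1, 50), rep(10, 5))             # illustrative coverage values
lc <- lorenz_curve(cov)
gini_from_lorenz(lc)

if (interactive()) {
  op <- par(mfrow = c(1, 2))
  hist(log(cov[cov > 0]), main = "log(coverage)", xlab = "log(coverage)")
  plot(lc$p, lc$L, type = "l", xlab = "Cumulative share of positions",
       ylab = "Cumulative share of coverage", main = "Lorenz curve")
  abline(0, 1, lty = 2)                      # line of perfect equality
  par(op)
}
```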

Methods

signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing")

Analyze a single RangedData object, using 'chrLengths' as chromosome lengths in simulations ('species' is ignored).

signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing")

Analyze a single RangedData object, taking chromosome lengths for simulations from the BSgenome 'species' (the package must be installed).

signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing")

Analyze a single RangedData object, using 'chrLengths' as chromosome lengths in simulations.

signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing")

Analyze a single RangedData object, taking chromosome lengths for simulations as the largest end position of reads in each chromosome.

signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing")

Analyze all RangedData objects in sample (a list), using 'chrLengths' as chromosome lengths in simulations ('species' is ignored).

signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing")

Analyze all RangedData objects in sample (a list), taking chromosome lengths for simulations from the BSgenome 'species' (the package must be installed).

signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing")

Analyze all RangedData objects in sample (a list), using 'chrLengths' as chromosome lengths in simulations.

signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing")

Analyze all RangedData objects in sample (a list), taking chromosome lengths for simulations as the largest end position of reads in each chromosome across all elements of sample.
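The precedence among the methods above (an explicit chrLengths wins, then the BSgenome species, otherwise the largest read end per chromosome) can be summarized with the hypothetical base R helper resolve_chr_lengths() below. For illustration, reads are assumed to be a data.frame with columns chr and end rather than a RangedData object, and species_lengths stands in for the chromosome lengths one would obtain from an installed BSgenome package.

```r
# Hypothetical helper illustrating how chromosome lengths for simulations are chosen.
# reads: data.frame with columns 'chr' and 'end' (assumed layout, not RangedData).
resolve_chr_lengths <- function(reads, chrLengths = NULL, species_lengths = NULL) {
  if (!is.null(chrLengths)) {
    return(chrLengths)                                  # explicit lengths take precedence
  }
  if (!is.null(species_lengths)) {
    chrs <- as.character(unique(reads$chr))
    return(species_lengths[chrs])                       # stand-in for BSgenome lengths
  }
  tapply(reads$end, reads$chr, max)                     # largest read end per chromosome
}

s1 <- data.frame(chr = c("chr1", "chr1", "chr2"), end = c(120, 480, 300))
s2 <- data.frame(chr = c("chr1", "chr2"), end = c(350, 910))

resolve_chr_lengths(s1)
# For a list of samples, pool the reads so the maximum is taken across all of them
resolve_chr_lengths(do.call(rbind, list(s1, s2)))
```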

Author(s)

Camille Stephan-Otto

References

See the definitions of the Gini coefficient and the Lorenz curve at http://en.wikipedia.org/wiki/Gini_coefficient

See Also

ssdCoverage for another measure of inequality in coverage.

Examples

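library(IRanges)      # IRanges() and RangedData()
library(htSeqTools)   # giniCoverage() and ssdCoverage()

set.seed(1)
# Simulated IP sample: two peaks of 500 reads each (39 bp wide) around positions 100 and 200
peak1 <- round(rnorm(500, 100, 10))
peak1 <- RangedData(IRanges(peak1, peak1 + 38), space = 'chr1')
peak2 <- round(rnorm(500, 200, 10))
peak2 <- RangedData(IRanges(peak2, peak2 + 38), space = 'chr1')
ip <- rbind(peak1, peak2)
# Simulated background: 1000 reads with uniformly distributed start positions
bg <- runif(1000, 1, 300)
bg <- RangedData(IRanges(bg, bg + 38), space = 'chr1')

rdl <- list(ip, bg)
ssdCoverage(rdl)
giniCoverage(rdl)   # the IP sample should show the larger (adjusted) Gini index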

htSeqTools documentation built on May 6, 2019, 3:39 a.m.