giniCoverage: Compute Gini coefficient.
In htSeqTools: Quality Control, Visualization and Processing for High-Throughput Sequencing data

Description Usage Arguments Details Value Methods Author(s) References See Also Examples

Calculate Gini coefficient of High-throughput Sequencing aligned reads. The index provides a measure of "inequality" in read coverage which can be used for quality control purposes (see details).

1	giniCoverage(sample, mc.cores = 1, mk.plot = FALSE, seqName = "missing", species="missing", chrLengths="missing", numSim="missing")

`sample`	A RangedData or list object
`seqName`	If sample is a RangedData, name of sequence to use in plots
`mk.plot`	Logical. If TRUE, logarithm of coverage values' histogram and Lorenz Curve plot are plotted.
`mc.cores`	If `mc.cores` is greater than 1, computations are performed in parallel for each element in the `IRangesList` object.
`chrLengths`	An integer array with lengths of chromosomes in `sample` for simluations of uniformily distributed reads.
`species`	A `BSgenome` species to obtain chromosome lengths for simluations of uniformily distributed reads.
`numSim`	Number of simulations to perform in order to find the expected Gini coefficient.

The Gini coefficient provides a measure of "inequality" in read coverage. This can be used in any sequencing experiment where the goal is to find peaks, i.e. unusual accumulation of reads in some genomic regions. For instance, Chip-Seq etc. Typically these experiments will consist of samples of interest (e.g. immuno-precipitated) and controls. The samples of interest should exhibit higher peaks, whereas reads in the controls should show a more uniform distribution. Since the Gini coefficient can be seen as a measure of departure from uniformity, the coefficient should present smaller values in the control samples. Since the Gini coefficient depends on the number of reads per sample, a correction is performed by substracting the Gini index from a sample with uniformily distributed reads.

If mk.plot==FALSE, the Gini index and adjusted Gini index for each element in the list or RangedData object.

If mk.plot==TRUE, a plot is produced showing the logarithm of coverage values' histogram and Lorenz Curve plot.

signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing"): Analize a single RangeData object with 'chrLengths' used for simulations ('Species' is ignored).
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing"): Analize a single RangeData object with chromosome lengths for simulations taken from BSgenome 'species' (package must be installed).
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing"): Analize a single RangeData object with 'chrLengths' used as chromosome lengths in simulations.
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing"): Analize all RangeData objects from sample (list) with hromosome lengths for simulations taken as the largest end position of reads in each chromosome of all samples.
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing"): Analize all RangeData objects from sample (list) with 'chrLengths' used as chromosome lengths in simulations ('Species' is ignored).
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing"): Analize all RangeData objects from sample (list) with chromosome lengths for simulations taken from BSgenome 'species' (package must be installed).
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing"): Analize all RangeData objects from sample (list) with 'chrLengths' used as chromosome lengths in simulations.
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing"): Analize all RangeData objects from sample (list) with chromosome lengths for simulations taken as the largest end position of reads in each chromosome of sample.

Camille Stephan-Otto

See the definition of the Gini coefficient and Lorenz curve at http://en.wikipedia.org/wiki/Gini_coefficient

ssdCoverage for another measure of inequality in coverage.

set.seed(1)
peak1 <- round(rnorm(500,100,10))
peak1 <- RangedData(IRanges(peak1,peak1+38),space='chr1')
peak2 <- round(rnorm(500,200,10))
peak2 <- RangedData(IRanges(peak2,peak2+38),space='chr1')
ip <- rbind(peak1,peak2)
bg <- runif(1000,1,300)
bg <- RangedData(IRanges(bg,bg+38),space='chr1')

rdl <- list(ip,bg)
ssdCoverage(rdl)
giniCoverage(rdl)