Calculate the Gini coefficient of aligned high-throughput sequencing reads. The index provides a measure of "inequality" in read coverage, which can be used for quality control purposes (see Details).
giniCoverage(sample, mc.cores = 1, mk.plot = FALSE, seqName = "missing", species = "missing", chrLengths = "missing", numSim = "missing")
sample
A RangedData or list object.
seqName
If sample is a RangedData, name of the sequence to use in plots.
mk.plot
Logical. If TRUE, a histogram of the logarithm of the coverage values and the Lorenz curve are plotted.
mc.cores
If
chrLengths
An integer array with lengths of chromosomes in
species
A
numSim
Number of simulations to perform in order to find the expected Gini coefficient.
The Gini coefficient provides a measure of "inequality" in read coverage. It can be used in any sequencing experiment where the goal is to find peaks, i.e. unusual accumulations of reads in some genomic regions, such as ChIP-Seq. Typically these experiments consist of samples of interest (e.g. immunoprecipitated) and controls. The samples of interest should exhibit higher peaks, whereas reads in the controls should be more uniformly distributed. Since the Gini coefficient can be seen as a measure of departure from uniformity, it should take smaller values in the control samples. Because the Gini coefficient also depends on the number of reads per sample, a correction is applied by subtracting the Gini index of a sample with uniformly distributed reads.
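The idea behind the coefficient and its correction can be sketched in base R. This is only an illustration under simplifying assumptions (a single chromosome, fixed read length, integer start positions), not the package's actual implementation; the helper names gini, coverageFromStarts and adjustedGini, and the output names, are hypothetical.

```r
# Gini coefficient of a non-negative vector (e.g. per-base coverage),
# using the standard formula on the sorted values.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Per-base coverage from read start positions (hypothetical helper):
# +1 at each read start, -1 just past each read end, then a running sum.
coverageFromStarts <- function(starts, readLen, chrLen) {
  ends <- pmin(starts + readLen - 1, chrLen)
  delta <- tabulate(starts, nbins = chrLen + 1) -
    tabulate(ends + 1, nbins = chrLen + 1)
  cumsum(delta)[seq_len(chrLen)]
}

# Observed Gini minus the average Gini of numSim samples with the same
# number of reads placed uniformly at random (assumed form of the correction).
adjustedGini <- function(starts, readLen, chrLen, numSim = 20) {
  observed <- gini(coverageFromStarts(starts, readLen, chrLen))
  simulated <- replicate(numSim, {
    simStarts <- sample.int(chrLen, length(starts), replace = TRUE)
    gini(coverageFromStarts(simStarts, readLen, chrLen))
  })
  c(gini = observed, gini.adjust = observed - mean(simulated))
}

set.seed(1)
chrLen <- 400
readLen <- 39
ipStarts <- round(c(rnorm(500, 100, 10), rnorm(500, 200, 10)))  # two peaks
bgStarts <- round(runif(1000, 1, 300))                          # uniform
ipGini <- adjustedGini(ipStarts, readLen, chrLen)
bgGini <- adjustedGini(bgStarts, readLen, chrLen)
```

On toy data like this, the peaked sample would be expected to score higher than the uniform one, which is the behaviour the quality check relies on.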
If mk.plot==FALSE, the Gini index and adjusted Gini index for each element in the list or RangedData object.
If mk.plot==TRUE, a plot is produced showing a histogram of the logarithm of the coverage values and the Lorenz curve.
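For readers unfamiliar with the Lorenz curve, the following base R sketch shows how it relates to the Gini coefficient. It is an illustration on a made-up coverage vector, not the plotting code used by giniCoverage; lorenzCurve and the toy values are hypothetical.

```r
# Lorenz curve: cumulative share of total coverage (L) held by the
# lowest-covered fraction of positions (p).
lorenzCurve <- function(x) {
  x <- sort(x)
  list(p = c(0, seq_along(x) / length(x)),
       L = c(0, cumsum(x) / sum(x)))
}

cov <- c(0, 0, 1, 1, 2, 8)   # toy per-base coverage values
lc <- lorenzCurve(cov)

# Gini coefficient = 1 - 2 * area under the Lorenz curve (trapezoid rule)
area <- sum(diff(lc$p) * (head(lc$L, -1) + tail(lc$L, -1)) / 2)
giniFromLorenz <- 1 - 2 * area

# The two kinds of panel described above, drawn with base graphics
hist(log(cov[cov > 0]), main = "log coverage", xlab = "log(coverage)")
plot(lc$p, lc$L, type = "l", xlab = "Fraction of positions",
     ylab = "Fraction of coverage", main = "Lorenz curve")
abline(0, 1, lty = 2)        # diagonal = perfectly uniform coverage
```

The further the curve bends below the diagonal, the larger the Gini coefficient.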
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing")
Analyze a single RangedData object with 'chrLengths' used as chromosome lengths in simulations ('species' is ignored).
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing")
Analyze a single RangedData object with chromosome lengths for simulations taken from BSgenome 'species' (package must be installed).
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing")
Analyze a single RangedData object with 'chrLengths' used as chromosome lengths in simulations.
signature(sample = "RangedData", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing")
Analyze a single RangedData object with chromosome lengths for simulations taken as the largest end position of reads in each chromosome of sample.
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "integer", numSim="missing")
Analyze all RangedData objects from sample (list) with 'chrLengths' used as chromosome lengths in simulations ('species' is ignored).
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "character", chrLengths = "missing", numSim="missing")
Analyze all RangedData objects from sample (list) with chromosome lengths for simulations taken from BSgenome 'species' (package must be installed).
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "integer", numSim="missing")
Analyze all RangedData objects from sample (list) with 'chrLengths' used as chromosome lengths in simulations.
signature(sample = "list", mc.cores = "ANY", mk.plot = "ANY", seqName = "ANY", species = "missing", chrLengths = "missing", numSim="missing")
Analyze all RangedData objects from sample (list) with chromosome lengths for simulations taken as the largest end position of reads in each chromosome of sample.
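When neither 'species' nor 'chrLengths' is supplied, chromosome lengths are taken from the reads themselves. A rough base R sketch of that fallback is shown here, using plain data frames as stand-ins for RangedData objects; the column names and values are made up for illustration and the package's internal code may differ.

```r
# Two toy samples; each row is one read (hypothetical columns).
samples <- list(
  ip = data.frame(space = c("chr1", "chr1", "chr2"),
                  start = c(10L, 250L, 5L),
                  end   = c(48L, 288L, 43L)),
  bg = data.frame(space = c("chr2", "chr2"),
                  start = c(90L, 40L),
                  end   = c(128L, 78L))
)

# Pool all samples, then take the largest read end per chromosome.
pooled <- do.call(rbind, samples)
maxEnd <- tapply(pooled$end, pooled$space, max)
chrLengths <- setNames(as.integer(maxEnd), names(maxEnd))
```

Lengths estimated this way are lower bounds on the true chromosome lengths, so supplying 'chrLengths' or 'species' is presumably preferable when the information is available.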
Camille Stephan-Otto
See the definition of the Gini coefficient and Lorenz curve at http://en.wikipedia.org/wiki/Gini_coefficient
ssdCoverage for another measure of inequality in coverage.
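library(htSeqTools)  # package providing giniCoverage and ssdCoverage

set.seed(1)
# IP-like sample: reads accumulate around positions 100 and 200
peak1 <- round(rnorm(500,100,10))
peak1 <- RangedData(IRanges(peak1,peak1+38),space='chr1')
peak2 <- round(rnorm(500,200,10))
peak2 <- RangedData(IRanges(peak2,peak2+38),space='chr1')
ip <- rbind(peak1,peak2)
# Control-like sample: read starts drawn uniformly along the chromosome
bg <- runif(1000,1,300)
bg <- RangedData(IRanges(bg,bg+38),space='chr1')
# Compare both inequality measures on the two samples
rdl <- list(ip,bg)
ssdCoverage(rdl)
giniCoverage(rdl)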