"GreyList" Objects

Description

Regions of high signal in the input samples of a ChIP experiment can lead to artefacts in peak calling. This class generates "grey lists" of such regions, for use in filtering reads before peak calling (or filtering peaks after peak calling, though it is generally safer to filter first).

Objects from the Class

Objects can be created by calls of the form new("GreyList", genome, ...), where genome is a "BSgenome" object describing a genome, such as BSgenome.Hsapiens.UCSC.hg19. Alternatively, a karyotype file can be provided explicitly: new("GreyList", karyoFile=fn, ...). Either genome or karyoFile must be provided; if both are present, the BSgenome object takes precedence.
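
A karyotype file is plain text with one chromosome per line: the chromosome name, white space, then an integer length. The following is a minimal base-R sketch of writing such a file and reading it back to check it before passing its path as karyoFile. The file contents and the variable names karyo_path and karyo are illustrative assumptions, not part of the package.

```r
# Hypothetical two-chromosome karyotype; names and lengths are illustrative only.
karyo_path <- tempfile(fileext = ".txt")
writeLines(c("chr21 48129895", "chrM 16571"), karyo_path)

# Read "chromName chromLength" pairs, splitting on any white space.
karyo <- read.table(karyo_path, header = FALSE,
                    col.names = c("chrom", "length"),
                    colClasses = c("character", "integer"),
                    stringsAsFactors = FALSE)

# Basic sanity checks: unique chromosome names and positive lengths.
stopifnot(!anyDuplicated(karyo$chrom), all(karyo$length > 0))
print(karyo)
```

If the checks pass, karyo_path could then be supplied as new("GreyList", karyoFile=karyo_path).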

Slots

genome:

The BSgenome object corresponding to the genome the reads are aligned to.

karyotype:

The Seqinfo object taken from the BSgenome object, or built from the karyo_file.

karyo_file:

The name of a file containing chromosome sizes for the reference genome of interest, one per line, as "chromName chromLength" pairs.

tiles:

A GRanges object with an overlapping tiling of the genome (by default 1 kb tiles every 512 bp).

counts:

A numeric vector holding the counts corresponding to the tiling and the BAM file provided.

files:

A vector of BAM filenames that were used to generate the counts (currently only accepts one).

size_param:

The computed estimates of the "size" parameter of the negative binomial distribution, estimated by MASS::fitdistr from repeated sampling from the counts.

size_stderr:

The standard errors of the "size" parameters, as estimated by MASS::fitdistr.

size_mean:

The mean of the "size" estimates.

mu_param:

Computed estimates of the "mu" parameter of the negative binomial distribution, estimated by MASS::fitdistr from repeated sampling from the counts.

mu_stderr:

The standard errors of the "mu" parameter.

mu_mean:

The mean of the "mu" estimates.

reps:

How many samples from the counts were taken.

sample_size:

How many values were sampled from the counts, for each estimate of "size" and "mu".

pvalue:

The requested p-value threshold.

threshold:

The calculated threshold, based on the p-value.

max_gap:

The largest gap to consider when merging nearby regions (i.e. if there are "grey" regions up to this many nucleotides apart, merge them into one long region).

regions:

A GRanges object defining the final grey list regions.

coverage:

The percentage of the genome covered by the grey list regions.
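
The relationship between the counts, reps, sample_size, size_*, mu_*, pvalue and threshold slots can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: the counts are simulated, the p-value is treated as a quantile of the fitted distribution, and the threshold is taken with qnbinom from the mean "size" and "mu" estimates.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Simulated per-tile read counts standing in for the "counts" slot (assumption).
counts <- rnbinom(20000, size = 2, mu = 20)

reps <- 10           # number of repeated samples ("reps")
sample_size <- 1000  # values drawn per sample ("sample_size")
pvalue <- 0.99       # treated here as a quantile (assumption)

# Fit a negative binomial to each random sample and keep the estimates.
# Each fit returns a named vector with elements "size" and "mu".
fits <- t(replicate(reps, {
  s <- sample(counts, sample_size)
  suppressWarnings(fitdistr(s, "negative binomial"))$estimate
}))

size_mean <- mean(fits[, "size"])
mu_mean <- mean(fits[, "mu"])

# Cutoff: the count at the requested quantile of the averaged fitted distribution.
threshold <- qnbinom(pvalue, size = size_mean, mu = mu_mean)
print(c(size_mean = size_mean, mu_mean = mu_mean, threshold = threshold))
```

Tiles whose counts exceed such a cutoff would be the candidates for grey regions. Averaging over several samples smooths out the variation between individual fits.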

Methods

calcThreshold

signature(obj = "GreyList"): Calculate the cutoff for reads in bins, based on fitting the counts to a negative binomial distribution.

countReads

signature(obj = "GreyList"): Count reads in bins across the genome.

export

signature(object = "GreyList", con = "character", format = "missing"): Write the grey list to a file.

initialize

signature(.Object = "GreyList"): Create an initial object (invoked automatically by new("GreyList",...)).

loadKaryotype

signature(obj = "GreyList"): Load a genome description from a file. The file format is one line per chromosome, with the name of the chromosome followed by white space followed by an integer indicating the length of the chromosome.

getKaryotype

signature(obj = "GreyList"): Get the karyotype of a genome from a BSgenome object.

makeGreyList

signature(obj = "GreyList"): Compute the actual grey list, after calculating the threshold.

show

signature(object = "GreyList"): Display the grey list.
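
The merging controlled by max_gap can be illustrated with a small base-R sketch. It is not the package's implementation: merge_regions is a hypothetical helper, the regions are made-up coordinates on a single chromosome, and the gap is assumed to be the number of bases strictly between two regions.

```r
# Hypothetical flagged regions on one chromosome (1-based, inclusive coordinates).
regions <- data.frame(start = c(100, 1200, 40000),
                      end   = c(1000, 2000, 41000))
max_gap <- 500

# Merge regions whose separating gap is at most max_gap bases.
merge_regions <- function(regions, max_gap) {
  stopifnot(nrow(regions) > 0)
  regions <- regions[order(regions$start), ]
  merged <- regions[1, ]
  for (i in seq_len(nrow(regions))[-1]) {
    last <- nrow(merged)
    # Bases strictly between the current merged region and the next one.
    gap <- regions$start[i] - merged$end[last] - 1
    if (gap <= max_gap) {
      # Close enough: extend the current merged region.
      merged$end[last] <- max(merged$end[last], regions$end[i])
    } else {
      # Too far apart: start a new merged region.
      merged <- rbind(merged, regions[i, ])
    }
  }
  rownames(merged) <- NULL
  merged
}

merged <- merge_regions(regions, max_gap)
print(merged)
```

Here the first two regions are 199 bases apart and are joined into one, while the third is left separate.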

Author(s)

Gord Brown (gdbzork@gmail.com)

See Also

BSgenome, Seqinfo

Examples

showClass("GreyList")

# Load a karyotype file:
path <- system.file("extra", package="GreyListChIP")
fn <- file.path(path,"karyotype_chr21.txt")

# Create a GreyList object:
gl <- new("GreyList",karyoFile=fn)