dba.blacklist: Apply blacklists and/or greylists to peaks (and generate...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/DBA.R

Description

Filters peak intervals that overlap a blacklist (from ENCODE or user supplied.) Filter peak intervals that overlap a greylist, either user supplied or generated based on aligned reads for control samples (e.g. Inputs).

Usage

1
2
3
dba.blacklist(DBA, blacklist=DBA$config$doBlacklist, 
              greylist=DBA$config$doGreylist, 
              Retrieve, cores=DBA$config$cores)

Arguments

DBA

DBA object

blacklist

If not equal to FALSE, specifies that a blacklist should be applied to the peak intervals in the DBA object.

If equal to TRUE, the read bam files will be examined to determine an appropriate reference genome. If successful, and a blacklist is available for that genome, it will be applied.

A user specified blacklist can be specified by setting this parameter to a GRanges object containing the blacklisted regions.

Otherwise, this parameter may be set to one of the following constants, indicating which of the ENCODE blacklists should be applied:

  • DBA_BLACKLIST_HG19: Homo sapiens 19 (chromosomes have "chr")

  • DBA_BLACKLIST_HG38: Homo sapiens 38 (chromosomes have "chr")

  • DBA_BLACKLIST_GRCH37: Homo sapiens 37 (chromosomes are numbers)

  • DBA_BLACKLIST_GRCH38: Homo sapiens 38 (chromosomes are numbers)

  • DBA_BLACKLIST_MM9: Mus musculus 9

  • DBA_BLACKLIST_MM10: Mus musculus 10

  • DBA_BLACKLIST_CE10: C. elegans 10

  • DBA_BLACKLIST_CE11: C. elegans 11

  • DBA_BLACKLIST_DM3: Drosophila melanogaster 3

  • DBA_BLACKLIST_DM6: Drosophila melanogaster 6

greylist

If not equal to FALSE, specifies that a greylist should be applied to the peak intervals in the DBA object.

If equal to TRUE, the control bam files (if present), will be examined to determine an appropriate reference genome. Genomes associated with a valid BSgenome can be detected. If successful, this genome will be used to generate greylists for each available control (eg specified as bamControls in the sample sheet.)).

The greylist parameter can also be set explicitly to either a valid BSgenome object, or a character string with the name of a valid BSgenome object.

The following constants map to a subset of possible BSgenome objects:

  • DBA_BLACKLIST_HG19 : seqinfo from BSgenome.Hsapiens.UCSC.hg19

  • DBA_BLACKLIST_HG38 : seqinfo from BSgenome.Hsapiens.UCSC.hg38

  • DBA_BLACKLIST_GRCH38: seqinfo from BSgenome.Hsapiens.NCBI.GRCh38

  • DBA_BLACKLIST_MM9 : seqinfo from BSgenome.Mmusculus.UCSC.mm9

  • DBA_BLACKLIST_MM10 : seqinfo from BSgenome.Mmusculus.UCSC.mm10

  • DBA_BLACKLIST_CE10 : seqinfo from BSgenome.Celegans.UCSC.ce10

  • DBA_BLACKLIST_CE11 : seqinfo from BSgenome.Celegans.UCSC.ce11

  • DBA_BLACKLIST_DM3 : seqinfo from BSgenome.Dmelanogaster.UCSC.dm3

  • DBA_BLACKLIST_DM6 : seqinfo from BSgenome.Dmelanogaster.UCSC.dm6

A user specified greylist can also be specified by setting this parameter to a GRanges object containing the greylisted regions. It can also be a list with an element named greylist$master, which is a GRanges object containing the greylist to be applied.

Retrieve

If present, some aspects of a previous run of the function is retrieved instead of returning a DBA object.

If Retrieve=DBA_BLACKLIST, the blacklist, if present, is returned as a GRanges object.

If Retrieve=DBA_GREYLIST, the greylist, if present, is returned. If it was generated from more than one control, it will be returned as a list object with the first element (named $master) a GRanges object containing the merged greylist, and the second element (named $controls) being a GRangesList with each element containing the greylist for one control

If Retrieve=DBA_BLACKLISTED_PEAKS, the excluded peaks for each sample will be returned in a GRangesList object (with each element containing the filtered peak intervals for each sample). If counts are available for the peaks, this will include the following metadata columns:

  • cReads: Number of control reads overlapping this interval

  • Reads: Number of primary (ChIP) reads overlapping this interval

  • Score: Read score calculated by dba.count

Note that the if Retrieve is set, dba.blacklist must have been previously run, and all other parameters will be ignored.

cores

Parallel cores to use when running greylist generation.

Details

This function is intended to filter peak intervals that fall in regions of the genome that are known to be problematic for ChIP analysis. Blacklists, which are derived for a reference genome and should be applied for any experiments that use that reference, are distinguished from greylists, which are derived on a per-experiment basis using anomalous pileups in the control tracks (such as Inputs).

A core set of blacklists have been defined as part of the ENCODE project (see references).

Greylists can be generated using this function, which serves as a front-end to the GreyListChIP package. See the details of that package for more information on how it works. Note that the GreyListChIP package can be utilized separately to generate greylists with more fine-grained control, with the results passed back to DiffBind to filter peaks.

Value

DBA object, with peaks filtered (unless Retrieve is specified.)

Note

The p threshold can be altered by setting DBA$config$greylist.pval. The default is 0.999. See GreyListChIP::calcThreshold for details.

Ideally, Blacklists and Greylists will be applied to the aligned reads prior to calling peaks, as removing reads in anomalous regions will yield better background noise models. Once greylists have been generated, peaks can be re-called and read into DiffBind.

Author(s)

Rory Stark with thanks to Gord Brown

References

See Also

GreyListChIP (GreyList), BSgenome

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
data(tamoxifen_peaks)
## Not run: tamoxifen <- dba.blacklist(tamoxifen, blacklist=TRUE,
                                    greylist="BSgenome.Hsapiens.UCSC.hg19")
## End(Not run)
data(tamoxifen_greylist)
tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19,
                           greylist=tamoxifen.greylist$master)
dba.blacklist(tamoxifen,Retrieve=DBA_GREYLIST)

data(tamoxifen_counts)
tamoxifen <- dba.count(tamoxifen, peaks=NULL, score=DBA_SCORE_CONTROL_READS)
tamoxifen <- dba.blacklist(tamoxifen, blacklist=DBA_BLACKLIST_HG19,
                           greylist=tamoxifen.greylist$master)
blacklisted <- dba.blacklist(tamoxifen, Retrieve=DBA_BLACKLISTED_PEAKS)
mean(blacklisted[[1]]$cReads)
mean(dba.peakset(tamoxifen,peaks=1,bRetrieve=TRUE)$Score)

DiffBind documentation built on March 24, 2021, 6 p.m.