callBindingSites: Predict protein binding sites from high-throughput sequencing...
In ChIPseqR: Identifying Protein Binding Sites in High-Throughput Sequencing Data


Methods for function callBindingSites in Package ‘ChIPseqR’. These methods are used to identify protein binding sites from ChIP-seq data.

## S4 method for signature 'ANY'
callBindingSites(data, chrLen, plot=TRUE, verbose=TRUE, ..., plotTo)
## S4 method for signature 'character'
callBindingSites(data, type, minQual=70, ...)
## S4 method for signature 'matrix'
callBindingSites(data, chrName="chr", ...)
## S4 method for signature 'ReadCounts'
callBindingSites(data, bind, support, background, bgCutoff=0.9, supCutoff=0.9, 
fdr = 0.05, extend=1, tailCut=0.95, piLambda=0.5, adapt=FALSE, corSummary=median, compress = TRUE,
digits = 16, plot=TRUE, verbose=TRUE, ask=FALSE, plotTo, ...)

`data`	Either an object containing information about mapped reads or a list. See below for details.
`bind`	Length of binding region to use (see Details).
`support`	Length of support region to use (see Details).
`background`	Length of background window. If this is missing, it is set to `10*(bind + 2*support)`; for example, `bind=147` and `support=20` give a window of 1870.
`chrLen`	Numeric vector indicating the length of all chromosomes. Only needed when `data` is an `AlignedRead` object. `readBfaToc` may be used to supply this information.
`bgCutoff`	Numeric value between 0.5 and 1. This determines how much estimates of the background read density are allowed to vary between adjacent windows. Set to 1 to disable the cutoff.
`supCutoff`	Numeric value between 0.5 and 1. This determines how much estimates of the support region read density are allowed to vary between the forward and reverse strands. Set to 1 to disable the cutoff.
`fdr`	Target false discovery rate.
`extend`	Numeric value indicating how far mapped reads should be extended when calculating read counts.
`type`	Format of the alignment file (see `readAligned` for details).
`minQual`	Minimum alignment quality to use. All reads with lower alignment quality are discarded.
`tailCut`	Truncation point used to exclude outliers when estimating null distribution.
`chrName`	Name to use for the single chromosome when `data` is a matrix.
`piLambda`	If `adapt=TRUE` this parameter is used to estimate the proportion of scores not related to binding sites.
`adapt`	Logical indicating whether an adaptive false discovery rate should be used. If this is `FALSE` (the default), the usual Benjamini-Hochberg procedure is used to control the FDR.
`corSummary`	Function used to summarise cross-correlation across chromosomes. See the Details section on binding and support regions.
`compress`	Logical indicating whether the return value should be compressed.
`digits`	Number of decimal places of the binding site scores to retain when compressing the result.
`plot`	Logical. If `plot=TRUE` (the default) some diagnostic plots are produced during the analysis.
`verbose`	Logical. If `verbose=TRUE` (the default) status messages are printed to indicate progress.
`ask`	Logical. Setting this to `TRUE` causes the system to wait for user input before displaying a new plot. See `devAskNewPage`.
`plotTo`	Character string giving the name of a file in which to store plots generated during the analysis. If supplied, a PDF file with this name is created.
`...`	Additional arguments. Most methods pass them on to the `ReadCounts` method.
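The difference between the two settings of `adapt` can be sketched in base R. This is a minimal illustration, assuming the binding site scores have already been converted to p-values; the helpers `bhReject` and `adaptiveBhReject` are hypothetical, and the adaptive variant uses a Storey-type estimate of the null proportion with `lambda` playing the role of `piLambda`. The procedure inside ChIPseqR may differ in detail.

```r
## Hypothetical helpers (not part of ChIPseqR) illustrating FDR control
## on a vector of p-values.

## Benjamini-Hochberg: flag every p-value whose BH-adjusted value is <= fdr.
bhReject <- function(p, fdr = 0.05) {
  p.adjust(p, method = "BH") <= fdr
}

## Adaptive BH: estimate the proportion of null scores (pi0) from the
## p-values above lambda (with a +1 correction so pi0 is never zero),
## then run BH at the less conservative level fdr / pi0.
adaptiveBhReject <- function(p, fdr = 0.05, lambda = 0.5) {
  pi0 <- min(1, (sum(p > lambda) + 1) / (length(p) * (1 - lambda)))
  p.adjust(p, method = "BH") <= fdr / pi0
}

set.seed(1)
## 900 uniform (null) p-values plus 100 p-values concentrated near zero
p <- c(runif(900), rbeta(100, 0.5, 50))
sum(bhReject(p))
sum(adaptiveBhReject(p, lambda = 0.5))
```

Because the estimated null proportion is at most 1, the adaptive procedure never calls fewer sites than plain Benjamini-Hochberg at the same target FDR.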

The lengths of the binding and support regions can be given either as a single value or as a range of possible values (by providing the minimum and maximum). In the latter case the cross-correlation between read counts on the forward and reverse strands is used to determine a value within that range. Note that this may lead to suboptimal choices of binding and support region length.
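The general idea of picking a length from a range via strand cross-correlation can be sketched in base R. This is a simplified illustration, not the algorithm used by getBindLen: it assumes `fwdCounts` and `revCounts` are per-position read-start counts for one chromosome and returns the shift within a user-supplied range at which the two strands are most strongly correlated. The function `bestShift` is hypothetical, and how such a shift maps onto `bind` and `support` is documented on the getBindLen help page.

```r
## Hypothetical sketch: choose the shift (in bp) within [minShift, maxShift]
## that maximises the correlation between forward and reverse strand counts.
bestShift <- function(fwdCounts, revCounts, minShift, maxShift) {
  stopifnot(length(fwdCounts) == length(revCounts),
            minShift >= 0, maxShift > minShift)
  n <- length(fwdCounts)
  shifts <- minShift:maxShift
  ## correlation of fwdCounts[i] with revCounts[i + s] for each candidate s
  corr <- vapply(shifts, function(s) {
    cor(fwdCounts[1:(n - s)], revCounts[(1 + s):n])
  }, numeric(1))
  shifts[which.max(corr)]
}

## Simulated chromosome: forward reads start 75 bp upstream and reverse
## reads 75 bp downstream of each site, so the strands are 150 bp apart.
set.seed(1)
n <- 1e5
sites <- sample(200:(n - 200), 300)
fwdCounts <- tabulate(sites - 75, nbins = n)
revCounts <- tabulate(sites + 75, nbins = n)
bestShift(fwdCounts, revCounts, minShift = 100, maxShift = 200)
```

Restricting the search to a range keeps the estimate away from spurious correlation peaks, but, as noted above, the chosen value is only as good as the range supplied.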

An object of class `BindScore` if `compress=FALSE`, otherwise an object of class `RLEBindScore`.
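The interplay between `compress` and `digits` can be illustrated with run-length encoding in base R. This sketch assumes, based only on the class name, that `RLEBindScore` stores scores as runs; it shows the general principle that rounding merges nearly identical neighbouring values into longer runs, and does not describe how ChIPseqR actually builds these objects.

```r
## Illustration only: how rounding affects run-length encoding of a score track.
set.seed(1)
## a piecewise-constant score track with tiny numerical noise added
scores <- rep(c(0, 2.5, 0), times = c(400, 200, 400)) + rnorm(1000, sd = 1e-8)

full    <- rle(scores)                    # every value differs from its neighbour
rounded <- rle(round(scores, digits = 4)) # noise rounded away, long runs remain

length(full$lengths)
length(rounded$lengths)

## the rounded track is recovered exactly from its runs
stopifnot(all(inverse.rle(rounded) == round(scores, digits = 4)))
```

Fewer retained digits means fewer, longer runs and a smaller object, at the cost of precision in the stored scores.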

data = "ANY": Default method for all forms of input not handled by a more specific method. In particular, it is used for objects of class AlignedRead and data.frame, but it will handle any class for which a strandPileup method is available.
data = "character": Allows the name of a file of mapped sequence reads to be used as input.
data = "matrix": Uses a matrix of read counts (for a single chromosome) as input.
data = "ReadCounts": This method implements the peak calling algorithm. Other methods typically reformat their input and pass it on to this method.
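This division of labour, with thin wrappers that reshape their input and hand it on to one core method, is a standard S4 pattern. The sketch below reproduces it with the `methods` package; the generic `callSitesSketch` and the class `CountsSketch` are hypothetical stand-ins for `callBindingSites` and `ReadCounts`, the matrix layout is an assumption, and no peak calling is performed.

```r
library(methods)

## Hypothetical stand-in for the ReadCounts class: a named list of
## per-chromosome count matrices.
setClass("CountsSketch", slots = c(counts = "list"))

setGeneric("callSitesSketch",
           function(data, ...) standardGeneric("callSitesSketch"))

## Core method: the only one that would do real work.
setMethod("callSitesSketch", "CountsSketch", function(data, ...) {
  paste("core method called for", paste(names(data@counts), collapse = ", "))
})

## Wrapper for a single-chromosome count matrix (assumed: one column per
## strand). It wraps the input in the core class and passes extra
## arguments on unchanged.
setMethod("callSitesSketch", "matrix", function(data, chrName = "chr", ...) {
  counts <- setNames(list(data), chrName)
  callSitesSketch(new("CountsSketch", counts = counts), ...)
})

m <- matrix(rpois(20, 2), ncol = 2)
callSitesSketch(m, chrName = "chr1")
```

Keeping the algorithm in a single method means new input formats only need a small conversion method rather than a copy of the analysis.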

See simpleNucCall for an interface with nucleosome-specific defaults. This function uses strandPileup, startScore, getCutoff and pickPeak; see the help pages of these functions for additional detail on the individual steps involved. See getBindLen for details on the estimation of binding and support region length.

library(ChIPseqR)
set.seed(1)

## determine binding site locations
b <- sample(1:1e6, 5000)

## sample read locations
fwd <- unlist(lapply(b, function(x) sample((x-83):(x-73), 20, replace=TRUE)))
rev <- unlist(lapply(b, function(x) sample((x+73):(x+83), 20, replace=TRUE)))

## add some background noise
fwd <- c(fwd, sample(1:(1e6-25), 50000))
rev <- c(rev, sample(25:1e6, 50000))

## create data.frame with read positions as input to strandPileup
reads <- data.frame(chromosome="chr1", position=c(fwd, rev),
	length=25, strand=factor(rep(c("+", "-"), times=c(length(fwd), length(rev)))))

## create object of class ReadCounts
readPile <- strandPileup(reads, chrLen=1e6, extend=1, plot=FALSE)

## predict binding site locations
## the artificial dataset is very small so predictions may not be very reliable
bindScore <- callBindingSites(readPile, bind=147, support=20, background=2000, plot=FALSE)