dba.peakset: Add a peakset to, or retrieve a peakset from, a DBA object
In DiffBind: Differential Binding Analysis of ChIP-Seq Peak Data


Adds a peakset to, or retrieves a peakset from, a DBA object

dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, treatment, replicate,
            control, peak.caller, peak.format, reads=0, consensus=FALSE, 
            bamReads, bamControl, spikein,
            scoreCol, bLowerScoreBetter, filter, counts,
            bRemoveM=TRUE, bRemoveRandom=TRUE,
            minOverlap=2, bMerge=TRUE,
            bRetrieve=FALSE, writeFile, numCols=4,
            DataType=DBA$config$DataType)

DBA

DBA object. Required unless creating a new DBA object by adding an initial peakset.

peaks

When adding a specified peakset: set of peaks, either a GRanges object, or a peak dataframe or matrix (chr,start,end,score), or a filename where the peaks are stored.

When adding a consensus peakset: a sample mask or vector of peakset numbers to include in the consensus. If missing or NULL, a consensus is derived from all peaksets present in the model. See dba.mask, or dba.show to get peakset numbers.

When adding an empty peakset (zero peaks), set peaks=NA.

When adding a set of consensus peaksets: a sample mask or vector of peakset numbers. Sample sets will be derived only from subsets of these peaksets.

When adding all the peaks from one DBA object to another: a DBA object. In this case, the only other parameter to have an effect is minOverlap.

When retrieving and/or writing a peakset: either a GRanges, or a peak dataframe or matrix (chr,start,end,score), or a peakset number; if NULL, retrieves/writes the full binding matrix.
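As a rough illustration of the data frame form described above, the following sketch builds a small peak table with the (chr, start, end, score) column layout and runs a basic sanity check on it. The coordinates and scores are invented, and `is_peak_table` is a hypothetical helper, not a DiffBind function.

```r
# Invented peaks laid out as (chr, start, end, score).
peaks_df <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000L, 5000L, 2000L),
  end   = c(1500L, 5600L, 2400L),
  score = c(12.5, 40.0, 7.3),
  stringsAsFactors = FALSE
)

# Hypothetical sanity check (not part of DiffBind): at least four columns,
# numeric start/end/score, and no interval that ends before it starts.
is_peak_table <- function(x) {
  is.data.frame(x) && ncol(x) >= 4 &&
    is.numeric(x[[2]]) && is.numeric(x[[3]]) && is.numeric(x[[4]]) &&
    all(x[[2]] <= x[[3]])
}

table_ok <- is_peak_table(peaks_df)
```

A table like this could then be supplied as `peaks=peaks_df`, together with the sample metadata arguments, when adding a specified peakset.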

`sampID`	ID string for the peakset being added; if missing, one is assigned (a serial number for a new peakset, or a concatenation of IDs for a consensus peakset). Must be unique for each sample.
`tissue`	tissue name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of tissues).
`factor`	factor name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of factors).
`condition`	condition name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of conditions).
`treatment`	treatment name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of treatments).
`replicate`	replicate number for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of replicate numbers).
`control`	control name for the peakset being added; if missing, one is assigned for a consensus peakset (a concatenation of control names).
`peak.caller`	peak caller name string. If peaks is specified as a file and peak.format is missing, a default file format for the caller will be used (see peak.format). Supported values and their default peak.format: “raw”: raw text file; “bed”: bed file; “narrow”: narrowPeaks file; “macs”: MACS .xls file; “bayes”: bayesPeak file; “tpic”: TPIC file; “sicer”: SICER file; “fp4”: FindPeaks v4 file; “swembl”: SWEMBL file; “csv”: comma separated value file; “report”: csv file saved via `dba.report`. When adding a consensus peakset, a default value (a concatenation of peak caller names) is assigned if this is missing.
`peak.format`	peak format string. If specified, overrides the default file format for the specified peak caller. Supported formats (with default score column): “raw”: raw text file (scoreCol=4); “bed”: bed file (scoreCol=5); “narrow”: narrowPeaks file (scoreCol=8); “macs”: MACS .xls file (scoreCol=7); “bayes”: bayesPeak file (scoreCol=4, filter=0.5); “tpic”: TPIC file (scoreCol=0, all scores=1); “sicer”: SICER file (scoreCol=7); “fp4”: FindPeaks v4 file (scoreCol=5); “swembl”: SWEMBL file (scoreCol=4); “csv”: csv file (scoreCol=4); “report”: report file (scoreCol=9, bLowerScoreBetter=TRUE).
`reads`	total number of ChIPed library reads for the peakset being added.
`consensus`	either the logical value of the consensus attribute when adding a specific peakset (set to `TRUE` for consensus peaksets generated by `dba.peakset`), or a metadata attribute or vector of attributes when generating a set of consensus peaksets. In the latter case, a consensus peakset will be added for each set of samples that have the same values for the specified attributes. Alternatively, attributes may be specified preceded by a negative sign, in which case a consensus peakset will be added for each set of samples that differ only in their values for those attributes. See examples. Allowable attributes: `DBA_TISSUE`, `-DBA_TISSUE`; `DBA_FACTOR`, `-DBA_FACTOR`; `DBA_CONDITION`, `-DBA_CONDITION`; `DBA_TREATMENT`, `-DBA_TREATMENT`; `DBA_REPLICATE`, `-DBA_REPLICATE`; `DBA_CALLER`, `-DBA_CALLER`.
`bamReads`	file path of the BAM/BED file containing the aligned reads for the peakset being added.
`bamControl`	file path of the BAM/BED file containing the aligned reads for the control used for the peakset being added.
`spikein`	file path of the BAM/BED file containing the aligned reads for the spike-ins used for the peakset being added.
`scoreCol`	peak column to normalize to 0...1 scale when adding a peakset; 0 indicates no normalization
`bLowerScoreBetter`	Logical indicating that lower scores indicate higher confidence peaks; default is that higher scores indicate better peaks.
`filter`	Numeric indicating a filter value for peaks. If present, any peaks with a score less than this value (or higher if `bLowerScoreBetter==TRUE`) will be removed from the peakset.
`counts`	Used for adding externally computed peak counts. Can be a filename or a dataframe. Can consist of a single column (or vector) with the counts, or two columns, with an ID for each interval in the first column and the counts in the second column, or four columns (chr, start, end, counts). When `counts` is specified, `peaks` and related parameters are ignored, and all peaksets in the DBA object must be specified in this way, all with exactly the same number of intervals.
`bRemoveM`	logical indicating whether to remove peaks on chrM when adding a peakset
`bRemoveRandom`	logical indicating whether to remove peaks on chrN_random when adding a peakset
`minOverlap`	the minimum number of peaksets a peak must be in to be included when adding a consensus peakset. When retrieving, if the peaks parameter is a vector (logical mask or vector of peakset numbers), a binding matrix will be retrieved including all peaks in at least this many peaksets. If `minOverlap` is between zero and one, peaks will be included if they occur in at least this proportion of peaksets.
`bMerge`	logical indicating whether global binding matrix should be compiled after adding the peakset. When adding several peaksets via successive calls to `dba.peakset`, it may be more efficient to set this parameter to `FALSE` and call `dba(DBA)` after all of the peaksets have been added.
`bRetrieve`	logical indicating that a peakset is being retrieved and/or written, not added.
`writeFile`	file to write retrieved peakset.
`numCols`	number of columns to include when writing out peakset. First four columns are chr, start, end, score; the remainder are maintained from the original peakset. Ignored when writing out complete binding matrix.
`DataType`	The class of object for returned peaksets: `DBA_DATA_GRANGES` or `DBA_DATA_FRAME`. Can be set as default behavior by setting `DBA$config$DataType`.
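The interaction between `filter` and `bLowerScoreBetter` described above can be sketched in base R. This is only an illustration of the documented rule, not DiffBind's implementation: `filter_peaks` is a hypothetical helper, the peak table is invented, and the sketch assumes the filter is compared against a column named `score`.

```r
# Hypothetical helper (not part of DiffBind): apply the documented filter rule
# to a peak table whose scores live in a column named "score" (an assumption).
filter_peaks <- function(peaks, filter, bLowerScoreBetter = FALSE) {
  keep <- if (bLowerScoreBetter) {
    peaks$score <= filter   # remove peaks scoring higher than the filter
  } else {
    peaks$score >= filter   # remove peaks scoring lower than the filter
  }
  peaks[keep, , drop = FALSE]
}

# Invented peaks for demonstration.
demo_peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 900L, 300L),
  end   = c(200L, 950L, 420L),
  score = c(0.2, 0.5, 0.9)
)

kept_high <- filter_peaks(demo_peaks, filter = 0.5)
kept_low  <- filter_peaks(demo_peaks, filter = 0.5, bLowerScoreBetter = TRUE)
```

With higher scores treated as better, the peak scoring 0.2 is dropped; with lower scores treated as better, the peak scoring 0.9 is dropped instead.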

MODE: Add a specified peakset:

dba.peakset(DBA=NULL, peaks, sampID, tissue, factor, condition, replicate, control, peak.caller, reads, consensus, bamReads, bamControl, scoreCol, bRemoveM, bRemoveRandom)

MODE: Add a consensus peakset (derived from overlapping peaks in peaksets already present):

dba.peakset(DBA, peaks, minOverlap)
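The effect of `minOverlap` in this mode can be sketched with a small membership matrix recording which candidate intervals occur in which peaksets. This is an illustrative simplification: it skips the merging of overlapping intervals, `consensus_rows` is a hypothetical helper, and treating only values strictly between zero and one as proportions is an assumption.

```r
# Hypothetical helper (not part of DiffBind): decide which rows of a logical
# membership matrix (intervals x peaksets) meet the minOverlap threshold.
consensus_rows <- function(membership, minOverlap = 2) {
  n_in <- rowSums(membership)            # number of peaksets containing each interval
  if (minOverlap > 0 && minOverlap < 1) {
    n_in / ncol(membership) >= minOverlap  # interpreted as a proportion
  } else {
    n_in >= minOverlap                     # interpreted as a count
  }
}

# Invented example: three intervals across three replicate peaksets.
membership <- matrix(
  c(TRUE, TRUE,  TRUE,
    TRUE, TRUE,  FALSE,
    TRUE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peakA", "peakB", "peakC"), c("rep1", "rep2", "rep3"))
)

in_two   <- consensus_rows(membership, minOverlap = 2)
in_three <- consensus_rows(membership, minOverlap = 3)
in_half  <- consensus_rows(membership, minOverlap = 0.5)
```

Here `peakA` and `peakB` pass with `minOverlap=2` or `minOverlap=0.5`, while only `peakA` passes with `minOverlap=3`.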

MODE: Add a set of consensus peaksets based on sample sets that share or differ in specified attributes:

dba.peakset(DBA, peaks, consensus, minOverlap)
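How sample sets might be formed from shared or negated attributes can be sketched in base R. The sample table and the `group_samples` helper are hypothetical and only mimic the grouping idea: `share` plays the role of attributes such as `DBA_CONDITION`, and `vary` plays the role of a negated attribute such as `-DBA_REPLICATE`. DiffBind's own grouping logic may differ in detail.

```r
# Invented sample metadata.
samples <- data.frame(
  sampID    = c("A1", "A2", "B1", "B2"),
  tissue    = c("MCF7", "MCF7", "MCF7", "BT474"),
  condition = c("Responsive", "Responsive", "Resistant", "Resistant"),
  replicate = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of DiffBind).
# share: group samples that have the same values for these attributes.
# vary:  group samples that differ only in these attributes, that is,
#        samples that agree on every other attribute.
group_samples <- function(samples, share = NULL, vary = NULL) {
  attrs <- setdiff(names(samples), "sampID")
  by <- if (!is.null(vary)) setdiff(attrs, vary) else share
  split(samples$sampID, samples[by], drop = TRUE)
}

by_condition <- group_samples(samples, share = "condition")
by_replicate <- group_samples(samples, vary = "replicate")
```

Grouping on shared condition gives two sets (Responsive and Resistant), while grouping on samples that differ only in replicate gives one set per tissue and condition combination present in the table.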

MODE: Retrieve a peakset:

dba.peakset(DBA, peaks, bRetrieve=TRUE)

MODE: Write a peakset out to a file:

dba.peakset(DBA, peaks, bRetrieve=TRUE, writeFile, numCols)

DBA object when adding a peakset. Peakset matrix or GRanges object when retrieving and/or writing a peakset.

Rory Stark

To add peaksets using a sample sheet, see dba.

library(DiffBind)

# create a new DBA object by adding three peaksets
mcf7 <- dba.peakset(NULL,
                   peaks=system.file("extra/peaks/MCF7_ER_1.bed.gz", package="DiffBind"),
                   peak.caller="bed", sampID="MCF7.1",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=1)
mcf7 <- dba.peakset(mcf7,
                   peaks=system.file("extra/peaks/MCF7_ER_2.bed.gz", package="DiffBind"),    
                   peak.caller="bed", sampID="MCF7.2",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=2)
mcf7 <- dba.peakset(mcf7,
                   peaks=system.file("extra/peaks/MCF7_ER_3.bed.gz", package="DiffBind"),      
                   peak.caller="bed", sampID="MCF7.3",tissue="MCF7",
                   factor="ER",condition="Responsive",replicate=3)
mcf7

#retrieve peaks that are in all three peaksets
mcf7.consensus <- dba.peakset(mcf7, 1:3, minOverlap=3, bRetrieve=TRUE)
mcf7.consensus

#add a consensus peakset -- peaks in all three replicates
mcf7 <- dba.peakset(mcf7, 1:3, minOverlap=3,sampID="MCF7_3of3")
mcf7

#add consensus peaksets for all sample types by combining replicates
data(tamoxifen_peaks)
tamoxifen <- dba.peakset(tamoxifen,consensus = -DBA_REPLICATE)
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)

#add consensus peaksets for each sample type (samples with the same tissue and condition)
data(tamoxifen_peaks)
tamoxifen <- dba.peakset(tamoxifen,consensus = c(DBA_TISSUE,DBA_CONDITION))
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
dba.plotVenn(tamoxifen,tamoxifen$masks$Responsive & tamoxifen$masks$Consensus)

#create consensus peaksets from sample type consensuses for Responsive and Resistant sample groups
tamoxifen <- dba.peakset(tamoxifen,peaks=tamoxifen$masks$Consensus,consensus=DBA_CONDITION)
dba.show(tamoxifen,mask=tamoxifen$masks$Consensus)
dba.plotVenn(tamoxifen,17:18)
 
#retrieve the consensus peakset as GRanges object
mcf7.consensus <- dba.peakset(mcf7,mcf7$masks$Consensus,bRetrieve=TRUE)
mcf7.consensus