dba.normalize: Specify parameters for normalizing a dataset; calculate...
In DiffBind: Differential Binding Analysis of ChIP-Seq Peak Data


Enables normalization of datasets using a variety of methods, including background, spike-in, and parallel factor normalization. Alternatively, allows a user to specify library sizes and normalization factors directly, or retrieve computed ones.

dba.normalize(DBA, method = DBA$config$AnalysisMethod,
              normalize = DBA_NORM_DEFAULT, library = DBA_LIBSIZE_DEFAULT, 
              background = FALSE, spikein = FALSE, offsets = FALSE,
              libFun=mean, bRetrieve=FALSE, ...)

`DBA`	DBA object that includes count data for a consensus peakset.
`method`	Underlying method, or vector of methods, for which to normalize. Supported methods:
    `DBA_EDGER`: use the `edgeR` package for analysis.
    `DBA_DESEQ2`: use the `DESeq2` package for analysis.
    `DBA_ALL_METHODS`: normalize for both `edgeR` and `DESeq2`.
`normalize`	Either user-supplied normalization factors in a numeric vector, or a specification of a method to use to calculate normalization factors. Methods can be specified using one of the following:
    `DBA_NORM_RLE` ("RLE"): RLE normalization (native to `DBA_DESEQ2`, and available for `DBA_EDGER`).
    `DBA_NORM_TMM` ("TMM"): TMM normalization (native to `DBA_EDGER`, and available for `DBA_DESEQ2`).
    `DBA_NORM_NATIVE` ("native"): Use the native method based on `method`: `DBA_NORM_RLE` for `DBA_DESEQ2` or `DBA_NORM_TMM` for `DBA_EDGER`.
    `DBA_NORM_LIB` ("lib"): Normalize by library size only. Library sizes can be specified using the `library` parameter. Normalization factors will be calculated to give each sample equal weight in a manner appropriate for the analysis `method`. See also the `libFun` parameter, which can be used to scale the normalization factors for `DESeq2`.
    `DBA_NORM_DEFAULT` ("default"): Default method: the "preferred" normalization approach depending on `method` and whether an explicit design is present. See `Details` below.
    `DBA_NORM_OFFSETS` ("offsets"): Indicates that offsets have been specified using the `offsets` parameter, and that they should be used without alteration.
    `DBA_NORM_OFFSETS_ADJUST` ("adjust offsets"): Indicates that offsets have been specified using the `offsets` parameter, and that they should be adjusted for library size and mean centering before being used in a `DBA_DESEQ2` analysis.
`library`	Either user-supplied library sizes in a numeric vector, or a specification of a method to use to calculate library sizes. Library sizes can be based on one of the following:
    `DBA_LIBSIZE_FULL` ("full"): Use the full library size (total number of reads in the BAM/SAM/BED file).
    `DBA_LIBSIZE_PEAKREADS` ("RiP"): Use the number of reads that overlap consensus peaks.
    `DBA_LIBSIZE_BACKGROUND` ("background"): Use the total number of reads aligned to the chromosomes for which there is at least one peak. This requires a background bin calculation (see parameter `background`). These values are usually the same as, or similar to, `DBA_LIBSIZE_FULL`.
    `DBA_LIBSIZE_DEFAULT` ("default"): Default method: the "preferred" library size depending on `method`, `background`, and whether an explicit design is present. See `Details` below.
`background`	This parameter controls the option to use "background" bins, which should not have differential enrichment between samples, as the basis for normalizing (instead of using read counts overlapping consensus peaks). When enabled, the chromosomes for which there are peaks in the consensus peakset are tiled into large bins, and reads overlapping these bins are counted. If present, `background` can be a logical value, a numeric value, or a previously computed `$background` object:
    Logical: if set to `TRUE`, background bins will be computed using the default bin size of 15000bp. Setting this value to `FALSE` will prevent background mode from being used in any default settings.
    Numeric: the value will be used as the bin size.
    `$background` object: these previously computed counts will be used as the background. A `$background` object can be obtained by calling `dba.normalize` with `bRetrieve=TRUE` and `method=DBA_ALL_METHODS`.
    After counting (or setting) background bins, both the `normalize` and `library` parameters will be used to determine how the final normalization factors are calculated. If `background` is missing, it will be set to `TRUE` if `library=DBA_LIBSIZE_BACKGROUND`, or if `library=DBA_LIBSIZE_DEFAULT` and certain conditions are met (see `Details` below). If `background` is not `FALSE`, then the library size will be set to `library=DBA_LIBSIZE_BACKGROUND`.
`spikein`	Either a logical value, a character vector of chromosome names, a `GRanges` object containing peaks for a parallel factor, or a `$background` object containing previously computed spike-in read counts:
    Logical `FALSE`: no spike-in normalization is performed.
    Logical `TRUE`: background normalization is performed using spike-in tracks. There must be a spike-in track for each sample. See `dba` and/or `dba.peakset` for details on how to include a spike-in track with a sample (e.g., by including a `Spikein` column in the sample sheet). All chromosomes in the spike-in BAM files will be used.
    Character vector of one or more chromosome names: only reads on the named chromosome(s) will be used for background normalization. If spike-in tracks are available, reads on chromosomes with these names in the spike-in track will be counted. If no spike-in tracks are available, reads on chromosomes with these names in the main `bamReads` BAM files will be counted.
    `GRanges` object containing peaks for a parallel factor: background normalization is performed by counting reads in the spike-in tracks that overlap peaks in this object.
    `$background` object: these previously computed counts will be used as the spike-in background. A `$background` object can be obtained by calling `dba.normalize` with `bRetrieve=TRUE` and `method=DBA_ALL_METHODS`.
    Note that if `spikein` is not `FALSE`, then the library size will be set to `library=DBA_LIBSIZE_BACKGROUND`.
`offsets`	This parameter controls the use of offsets (matrix of normalization factors) instead of a single normalization factor for each sample. It can either be a logical value, a `matrix`, or a `SummarizedExperiment`. If it is a logical value and set to `FALSE`, no offsets will be computed or used. A value of `TRUE` indicates that an offset matrix should be computed using a `loess` fit. Alternatively, user-calculated normalization offsets can be supplied as a `matrix` or as a `SummarizedExperiment` (containing an `assay` named "offsets"). In this case, the user may also set the `normalize` parameter to indicate whether the offsets should be applied as-is to a `DESeq2` analysis (`DBA_NORM_OFFSETS`, default), or if they should be adjusted for library size and mean centering (`DBA_NORM_OFFSETS_ADJUST`).
`libFun`	When `normalize=DBA_NORM_LIB`, normalization factors are calculated by dividing the library sizes for each sample by a common denominator, obtained by applying `libFun` to the vector of library sizes. For `method=DBA_EDGER`, the normalization factors are further adjusted so as to make all the effective library sizes (library sizes multiplied by normalization factors) the same, and adjusted to multiply to 1.
`bRetrieve`	If set to `TRUE`, information about the current normalization will be returned. The only other relevant parameter in this case is `method`. If `method=DBA_DESEQ2` or `method=DBA_EDGER`, a record will be returned including normalization values for the appropriate analysis method. This record is a `list` consisting of the following elements:
    `$norm.method`: A character string corresponding to the normalization method, generally one of the values that can be supplied as a value to `normalize`.
    `$norm.factors`: A vector containing the computed normalization factors.
    `$lib.method`: A character string corresponding to the method used to calculate the library size, generally one of the values that can be supplied as a value to `library`.
    `$lib.sizes`: A vector containing the computed library sizes.
    `$background`: If the normalization is based on binned background reads, this field will be `TRUE`.
    `$control.subtract`: If control reads were subtracted from the read counts, this field will be `TRUE`.
    If `method=DBA_ALL_METHODS`, the record will be a list with one of the above records for each `method` for which normalization factors have been computed (`$DESeq2` and `$edgeR`). If `background` bins have been calculated, this will include an element called `$background`. This element can be passed in as the value to `background` or `spikein` to re-use a previously computed set of reads. It contains three subfields:
    `$background$binned`: a `SummarizedExperiment` object containing the binned counts.
    `$background$bin.size`: a numeric value with the bin size used.
    `$background$back.calc`: a character string indicating how the background was calculated (bins, spike-ins, or parallel factor).
    If `offsets` are available, this will include an element called `$offsets` with two subfields:
    `$offsets$offsets`: a `matrix` or a `SummarizedExperiment` object containing the offsets.
    `$offsets$offset.method`: a character string indicating the source of the offsets, either `"loess"` or `"user"`.
`...`	Extra parameters to be passed to `limma::loessFit` when computing offsets.
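
The `libFun` arithmetic described above can be illustrated in base R. This is only a sketch of the calculation as the text states it, not DiffBind's internal code: the library sizes are made-up numbers, and both helper functions (`lib_norm_factors`, `equalizing_factors`) are hypothetical names introduced here. The second helper takes the `DBA_EDGER` sentence literally: if the effective library sizes are all equal and the factors multiply to 1, the factors must be the geometric mean of the library sizes divided by each library size.

```r
# Illustrative library sizes (made-up values, not taken from any dataset)
lib_sizes <- c(s1 = 6e6, s2 = 1e7, s3 = 2e7)

# Hypothetical helper: library sizes divided by a common denominator,
# obtained by applying libFun to the vector of library sizes
lib_norm_factors <- function(lib_sizes, libFun = mean) {
  lib_sizes / libFun(lib_sizes)
}

# Hypothetical helper: factors f such that lib_sizes * f is constant
# and prod(f) == 1 (assumed reading of the edgeR adjustment)
equalizing_factors <- function(lib_sizes) {
  geo_mean <- exp(mean(log(lib_sizes)))
  geo_mean / lib_sizes
}

f_mean   <- lib_norm_factors(lib_sizes)                   # 0.5, 0.8333, 1.6667
f_median <- lib_norm_factors(lib_sizes, libFun = median)  # 0.6, 1.0, 2.0
f_eq     <- equalizing_factors(lib_sizes)

print(f_mean)
print(f_median)
print(lib_sizes * f_eq)  # effective library sizes, identical across samples
print(prod(f_eq))        # product of the factors
```

Changing `libFun` (for example from `mean` to `median`) rescales every factor by the same constant, so the relative weighting of the samples is unchanged.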

The default normalization parameters are as follows:

normalize=DBA_NORM_LIB
library=DBA_LIBSIZE_FULL
background=FALSE

If background=TRUE, then the default becomes library=DBA_LIBSIZE_BACKGROUND.

If dba.contrast has been used to set up contrasts with design=FALSE (pre-3.0 mode), then the defaults are:

normalize=DBA_NORM_DEFAULT
library=DBA_LIBSIZE_FULL
background=FALSE

In this case, normalize=DBA_NORM_LIB will be set for method=DBA_DESEQ2 for backwards compatibility.

Either a DBA object with normalization terms added, or (if bRetrieve=TRUE), a record of normalization details.
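
As a rough illustration of working with a retrieved record, the sketch below tabulates library sizes and normalization factors side by side. It runs on a mock `list` that only mimics the documented field names (`$norm.method`, `$norm.factors`, `$lib.method`, `$lib.sizes`, `$background`, `$control.subtract`); all values are invented, and the helper `norm_table` is a hypothetical name, not part of DiffBind.

```r
# Mock record using the documented field names; all values are made up
rec <- list(
  norm.method      = "lib",
  norm.factors     = c(0.5, 1.0, 2.0),
  lib.method       = "full",
  lib.sizes        = c(6e6, 1e7, 2e7),
  background       = FALSE,
  control.subtract = TRUE
)

# Hypothetical helper: one row per sample, with the methods that produced the values
norm_table <- function(rec) {
  stopifnot(length(rec$lib.sizes) == length(rec$norm.factors))
  samples <- names(rec$lib.sizes)
  # Fall back to index labels when the vectors carry no sample names
  if (is.null(samples)) samples <- as.character(seq_along(rec$lib.sizes))
  data.frame(
    sample      = samples,
    lib.size    = unname(rec$lib.sizes),
    lib.method  = rec$lib.method,
    norm.factor = unname(rec$norm.factors),
    norm.method = rec$norm.method,
    stringsAsFactors = FALSE
  )
}

tab <- norm_table(rec)
print(tab)
```

With DiffBind loaded, the same helper could presumably be applied to the result of `dba.normalize(DBA, bRetrieve=TRUE)` for a single `method`; whether the returned vectors carry sample names is not stated here, hence the fallback to index labels.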

The csaw package is used to compute background bins and offsets based on limma::loessFit.

See the DiffBind vignette for technical details of how this is done, and the csaw vignette for details on how background bins and loess offsets can be used to address different biases in ChIP-seq data.

Rory Stark

dba.count, dba.analyze, dba.save

library(DiffBind)

# load DBA object with counts
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition")

# default normalization: Full library sizes
tamoxifen <- dba.normalize(tamoxifen)
dba.normalize(tamoxifen, bRetrieve=TRUE)
dba.analyze(tamoxifen)

# RLE/TMM using Reads in Peaks
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS,
                           normalize=DBA_NORM_NATIVE, 
                           library=DBA_LIBSIZE_PEAKREADS)
dba.normalize(tamoxifen, method=DBA_DESEQ2, bRetrieve=TRUE)
dba.normalize(tamoxifen, method=DBA_EDGER, bRetrieve=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS,bDB=TRUE)

# TMM in Background using precomputed background
norm <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS,bRetrieve=TRUE)
tamoxifen <- dba.normalize(tamoxifen, background=norm$background,
                           normalize="TMM", method=DBA_ALL_METHODS)
tamoxifen <- dba.analyze(tamoxifen)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotMA(tamoxifen)

# LOESS offsets
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, offsets=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)

# compare MA plots: non-normalized counts, then each analysis method
par(mfrow=c(3,1))
dba.plotMA(tamoxifen,th=0,bNormalized=FALSE)
dba.plotMA(tamoxifen,method=DBA_DESEQ2)
dba.plotMA(tamoxifen,method=DBA_EDGER)