dba.normalize: Specify parameters for normalizing a dataset; calculate...

View source: R/DBA.R

Description

Enables normalization of datasets using a variety of methods, including background, spike-in, and parallel factor normalization. Alternatively, allows a user to specify library sizes and normalization factors directly, or retrieve computed ones.

Usage

dba.normalize(DBA, method = DBA$config$AnalysisMethod,
              normalize = DBA_NORM_DEFAULT, library = DBA_LIBSIZE_DEFAULT, 
              background = FALSE, spikein = FALSE, offsets = FALSE,
              libFun=mean, bRetrieve=FALSE, ...)

Arguments

DBA

DBA object that includes count data for a consensus peakset.

method

Underlying method, or vector of methods, for which to normalize.

Supported methods:

  • DBA_EDGER use edgeR package for analysis

  • DBA_DESEQ2 use DESeq2 package for analysis

  • DBA_ALL_METHODS normalize for both edgeR and DESeq2

normalize

Either user-supplied normalization factors in a numeric vector, or a specification of a method to use to calculate normalization factors.

Methods can be specified using one of the following:

  • DBA_NORM_RLE ("RLE") RLE normalization (native to DBA_DESEQ2, and available for DBA_EDGER).

  • DBA_NORM_TMM ("TMM") TMM normalization (native to DBA_EDGER, and available for DBA_DESEQ2).

  • DBA_NORM_NATIVE ("native") Use native method based on method: DBA_NORM_RLE for DBA_DESEQ2 or DBA_NORM_TMM for DBA_EDGER.

  • DBA_NORM_LIB ("lib") Normalize by library size only. Library sizes can be specified using the library parameter. Normalization factors will be calculated to give each equal weight in a manner appropriate for the analysis method. See also the libFun parameter, which can be used to scale the normalization factors for DESeq2.

  • DBA_NORM_DEFAULT ("default") Default method: The "preferred" normalization approach depending on method and whether an explicit design is present. See Details below.

  • DBA_NORM_OFFSETS ("offsets") Indicates that offsets have been specified using the offsets parameter, and they should be used without alteration.

  • DBA_NORM_OFFSETS_ADJUST ("adjust offsets") Indicates that offsets have been specified using the offsets parameter, and they should be adjusted for library size and mean centering before being used in a DBA_DESEQ2 analysis.

library

Either user-supplied library sizes in a numeric vector, or a specification of a method to use to calculate library sizes.

Library sizes can be based on one of the following:

  • DBA_LIBSIZE_FULL ("full") Use the full library size (total number of reads in BAM/SAM/BED file)

  • DBA_LIBSIZE_PEAKREADS ("RiP") Use the number of reads that overlap consensus peaks.

  • DBA_LIBSIZE_BACKGROUND ("background") Use the total number of reads aligned to the chromosomes for which there is at least one peak. This requires a background bin calculation (see the background parameter). These values are usually the same as, or similar to, DBA_LIBSIZE_FULL.

  • DBA_LIBSIZE_DEFAULT ("default") Default method: The "preferred" library size depending on method, background, and whether an explicit design is present. See Details below.

background

This parameter controls the option to use "background" bins, which should not have differential enrichment between samples, as the basis for normalizing (instead of using read counts overlapping consensus peaks). When enabled, the chromosomes for which there are peaks in the consensus peakset are tiled into large bins and reads overlapping these bins are counted.

If present, background can either be a logical value, a numeric value, or a previously computed $background object.

If background is a logical value and set to TRUE, background bins will be computed using the default bin size of 15000bp. Setting this value to FALSE will prevent background mode from being used in any default settings.

If background is a numeric value, it will be used as the bin size.

If background is a previously computed $background object, these counts will be used as the background. A $background object can be obtained by calling dba.normalize with bRetrieve=TRUE and method=DBA_ALL_METHODS.

After counting (or setting) background bins, both the normalize and library parameters will be used to determine how the final normalization factors are calculated.

If background is missing, it will be set to TRUE if library=DBA_LIBSIZE_BACKGROUND, or if library=DBA_LIBSIZE_DEFAULT and certain conditions are met (see Details below).

If background is not FALSE, then the library size will be set to library=DBA_LIBSIZE_BACKGROUND.
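The tiling and counting step described above can be illustrated with a minimal base-R sketch. It is not DiffBind's or csaw's implementation: the chromosome length and read positions are hypothetical values, reads are reduced to single 1-based start positions, and only the 15000bp default bin size comes from this page.

```r
# Illustrative only: tile one chromosome into fixed-width bins and count reads.
bin_size     <- 15000L   # default background bin size documented above
chrom_length <- 100000L  # hypothetical chromosome length

# Hypothetical 1-based read start positions on that chromosome.
read_starts <- c(10L, 15000L, 15001L, 15002L, 44000L, 100000L)

# Number of bins needed to cover the chromosome (the last bin may be partial).
n_bins <- as.integer(ceiling(chrom_length / bin_size))

# Map each read to the bin containing its start position.
bin_index <- (read_starts - 1L) %/% bin_size + 1L

# Count reads per bin, keeping bins that contain no reads.
bin_counts <- tabulate(bin_index, nbins = n_bins)
print(bin_counts)
```

In a real analysis the per-bin counts for every sample form the matrix that the normalize and library settings are then applied to.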

spikein

Either a logical value, a character vector of chromosome names, a GRanges object containing peaks for a parallel factor, or a $background object containing previously computed spike-in read counts.

If spikein is a logical value set to FALSE, no spike-in normalization is performed.

If spikein is a logical value set to TRUE, background normalization is performed using spike-in tracks. There must be a spike-in track for each sample. See dba and/or dba.peakset for details on how to include a spike-in track with a sample (e.g., by including a Spikein column in the sample sheet). All chromosomes in the spike-in BAM files will be used.

If spikein is a character vector of one or more chromosome names, only reads on the named chromosome(s) will be used for background normalization. If spike-in tracks are available, reads on chromosomes with these names in the spike-in track will be counted. If no spike-in tracks are available, reads on chromosomes with these names in the main bamReads bam files will be counted.

If spikein is a GRanges object containing peaks for a parallel factor, then background normalization is performed counting reads in the spike-in tracks overlapping peaks in this object.

If spikein is a previously computed $background object, these counts will be used as the spikein background. A $background object can be obtained by calling dba.normalize with bRetrieve=TRUE and method=DBA_ALL_METHODS.

Note that if spikein is not FALSE, then the library size will be set to library=DBA_LIBSIZE_BACKGROUND.

offsets

This parameter controls the use of offsets (matrix of normalization factors) instead of a single normalization factor for each sample. It can either be a logical value, a matrix, or a SummarizedExperiment.

If it is a logical value and set to FALSE, no offsets will be computed or used. A value of TRUE indicates that an offset matrix should be computed using a loess fit.

Alternatively, user-calculated normalization offsets can be supplied as a matrix or as a SummarizedExperiment (containing an assay named "offsets"). In this case, the user may also set the normalize parameter to indicate whether the offsets should be applied as-is to a DESeq2 analysis (DBA_NORM_OFFSETS, default), or if they should be adjusted for library size and mean centering (DBA_NORM_OFFSETS_ADJUST).

libFun

When normalize=DBA_NORM_LIB, normalization factors are calculated by dividing the library sizes for each sample by a common denominator, obtained by applying libFun to the vector of library sizes.

For method=DBA_EDGER, the normalization factors are further adjusted so as to make all the effective library sizes (library sizes multiplied by normalization factors) the same, and adjusted to multiply to 1.
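The arithmetic described for libFun can be worked through in base R. This is a sketch under stated assumptions, not DiffBind's internal code: the library sizes are hypothetical, and the edgeR-style factors are simply one set of values that satisfies both constraints stated above (equal effective library sizes, and factors that multiply to 1).

```r
# Hypothetical library sizes (total reads) for three samples.
lib_sizes <- c(s1 = 1e6, s2 = 2e6, s3 = 4e6)

# Step 1: divide each library size by a common denominator,
# obtained by applying libFun (here the default, mean).
lib_fun      <- mean
size_factors <- lib_sizes / lib_fun(lib_sizes)

# Step 2 (edgeR-style constraints): find factors f such that
# lib_sizes * f is the same for every sample and prod(f) == 1.
# Both hold when f = G / lib_sizes, with G the geometric mean.
geo_mean        <- exp(mean(log(lib_sizes)))
edger_factors   <- geo_mean / lib_sizes
effective_sizes <- lib_sizes * edger_factors

print(size_factors)
print(edger_factors)
print(effective_sizes)
```

Swapping lib_fun for another summary function such as median rescales the step 1 factors by a constant without changing their ratios.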

bRetrieve

If set to TRUE, information about the current normalization will be returned. The only other relevant parameter in this case is the method.

If method=DBA_DESEQ2 or method=DBA_EDGER, a record will be returned including normalization values for the appropriate analysis method. This record is a list consisting of the following elements:

  • $norm.method A character string corresponding to the normalization method, generally one of the values that can be supplied as a value to normalize.

  • $norm.factors A vector containing the computed normalization factors.

  • $lib.method A character string corresponding to the value of the method used to calculate the library size, generally one of the values that can be supplied as a value to library.

  • $lib.sizes A vector containing the computed library sizes.

  • $background If the normalization is based on binned background reads, this field will be TRUE.

  • $control.subtract If control reads were subtracted from the read counts, this field will be TRUE.

If method=DBA_ALL_METHODS, the record will be a list with one of the above records for each method for which normalization factors have been computed ($DESeq2 and $edgeR).

If background bins have been calculated, this will include an element called $background. This element can be passed in as the value to background or spikein to re-use a previously computed set of reads. It contains three subfields:

  • $background$binned a SummarizedExperiment object containing the binned counts.

  • $background$bin.size a numeric value with the bin size used.

  • $background$back.calc character string indicating how the background was calculated (bins, spike-ins, or parallel factor).

If offsets are available, this will include an element called $offsets with two subfields:

  • $offsets$offsets a matrix or a SummarizedExperiment object containing the offsets.

  • $offsets$offset.method a character string indicating the source of the offsets, either "loess" or "user".
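As a short sketch of reading the retrieved record, the following uses only calls and field names that appear on this page. It assumes the DiffBind package and its tamoxifen_counts example data are installed, and it does nothing otherwise.

```r
# Runs only if DiffBind is available; otherwise the block is skipped.
if (requireNamespace("DiffBind", quietly = TRUE)) {
  library(DiffBind)

  # Example DBA object with counts, as used in the Examples section.
  data(tamoxifen_counts)

  # Apply the default normalization, then retrieve the DESeq2 record.
  tamoxifen <- dba.normalize(tamoxifen)
  rec <- dba.normalize(tamoxifen, method = DBA_DESEQ2, bRetrieve = TRUE)

  # Inspect the documented fields of the record.
  print(rec$norm.method)   # normalization method
  print(rec$norm.factors)  # one normalization factor per sample
  print(rec$lib.method)    # how library sizes were calculated
  print(rec$lib.sizes)     # one library size per sample
}
```

Retrieving with method=DBA_ALL_METHODS instead returns the per-method records together, along with $background and $offsets when those have been computed.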

...

Extra parameters to be passed to limma::loessFit when computing offsets.

Details

The default normalization parameters are as follows:

If background=TRUE, then the default becomes library=DBA_LIBSIZE_BACKGROUND.

If dba.contrast has been used to set up contrasts with design=FALSE (pre-3.0 mode), then the defaults are:

In this case, normalize=DBA_NORM_LIB will be set for method=DBA_DESEQ2 for backwards compatibility.

Value

Either a DBA object with normalization terms added, or (if bRetrieve=TRUE) a record of normalization details.

Note

The csaw package is used to compute background bins and offsets based on limma::loessFit.

See the DiffBind vignette for technical details of how this is done, and the csaw vignette for details on how background bins and loess offsets can be used to address different biases in ChIP-seq data.

Author(s)

Rory Stark

See Also

dba.count, dba.analyze, dba.save

Examples

library(DiffBind)

# load DBA object with counts
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen,design="~Tissue + Condition")

# default normalization: Full library sizes
tamoxifen <- dba.normalize(tamoxifen)
dba.normalize(tamoxifen, bRetrieve=TRUE)
dba.analyze(tamoxifen)

# RLE/TMM using Reads in Peaks
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS,
                           normalize=DBA_NORM_NATIVE, 
                           library=DBA_LIBSIZE_PEAKREADS)
dba.normalize(tamoxifen, method=DBA_DESEQ2, bRetrieve=TRUE)
dba.normalize(tamoxifen, method=DBA_EDGER, bRetrieve=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotVenn(tamoxifen,contrast=1,method=DBA_ALL_METHODS,bDB=TRUE)

# TMM in Background using precomputed background
norm <- dba.normalize(tamoxifen,method=DBA_ALL_METHODS,bRetrieve=TRUE)
tamoxifen <- dba.normalize(tamoxifen, background=norm$background,
                           normalize="TMM", method=DBA_ALL_METHODS)
tamoxifen <- dba.analyze(tamoxifen)
dba.show(tamoxifen,bContrasts=TRUE)
dba.plotMA(tamoxifen)

# LOESS offsets
tamoxifen <- dba.normalize(tamoxifen, method=DBA_ALL_METHODS, offsets=TRUE)
tamoxifen <- dba.analyze(tamoxifen, method=DBA_ALL_METHODS)
dba.show(tamoxifen,bContrasts=TRUE)

par(mfrow=c(3,1))
dba.plotMA(tamoxifen,th=0,bNormalized=FALSE)
dba.plotMA(tamoxifen,method=DBA_DESEQ2)
dba.plotMA(tamoxifen,method=DBA_EDGER)

DiffBind documentation built on March 24, 2021, 6 p.m.