bioCond: Create a 'bioCond' Object to Group ChIP-seq Samples

bioCond    R Documentation

Description

bioCond creates an object which represents a biological condition, given a set of ChIP-seq samples belonging to the condition. Such objects, once created, can be supplied to fitMeanVarCurve to fit the mean-variance trend, and subsequently to diffTest for calling differential ChIP-seq signals between two conditions.

Usage

bioCond(
  norm.signal,
  occupancy = NULL,
  occupy.num = 1,
  name = "NA",
  weight = NULL,
  strMatrix = NULL,
  meta.info = NULL
)

Arguments

norm.signal

A matrix or data frame of normalized signal intensities, where each row should represent a genomic interval and each column a sample.

occupancy

A matrix or data frame of logical values with the same dimensions as norm.signal, marking the occupancy status of each interval in each sample. This argument is used only to derive the occupancy status of each interval in the biological condition. By default, each interval is considered to be occupied by each sample.

occupy.num

For each interval, the minimum number of samples occupying it required for the interval to be considered as occupied by the biological condition (see also "Details").

name

A character scalar specifying the name of the biological condition. Used only for display purposes.

weight

A matrix or data frame specifying the relative precisions of signal intensities in norm.signal. Must have the same number of columns as norm.signal. A vector is interpreted as a matrix having a single row. Note that rows of weight are recycled if necessary. By default, the same weight is assigned to each measurement in norm.signal.

strMatrix

An optional list of symmetric matrices directly specifying the structure matrix of each genomic interval. Its elements are recycled if necessary. This argument, if set, overrides the weight argument. See "Details" and setWeight for information about structure matrices.

meta.info

Optional extra information (e.g., genomic coordinates of intervals). If set, the supplied argument is stored in the meta.info field of the returned bioCond and is never used by other tools in MAnorm2.

Details

To call this function, one typically needs to first perform an MA normalization on raw read counts of ChIP-seq samples by using normalize.

The function assigns an indicator to each genomic interval (stored in the occupancy field of the returned object; see also "Value"), marking whether the interval is occupied by this biological condition. The argument occupy.num sets the minimum number of samples that must occupy an interval for it to be considered occupied by the condition. Notably, the occupancy states of genomic intervals may matter when fitting a mean-variance curve, as one may choose to use only the occupied intervals to fit the curve (see also fitMeanVarCurve).
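The occupancy rule described above can be sketched in base R. This is an illustration only, not the MAnorm2 source: the helper condition_occupancy and the toy occupancy matrix are hypothetical, and the rule is assumed to be a simple per-row count compared against occupy.num.

```r
# Hypothetical occupancy matrix: 4 intervals (rows) x 3 samples (columns).
occupancy <- matrix(c(TRUE,  TRUE,  TRUE,
                      TRUE,  FALSE, FALSE,
                      FALSE, FALSE, FALSE,
                      TRUE,  TRUE,  FALSE),
                    nrow = 4, byrow = TRUE)

# Assumed rule: an interval is occupied by the condition when at least
# occupy.num samples occupy it.
condition_occupancy <- function(occupancy, occupy.num = 1) {
  rowSums(as.matrix(occupancy)) >= occupy.num
}

occ1 <- condition_occupancy(occupancy, occupy.num = 1)
occ2 <- condition_occupancy(occupancy, occupy.num = 2)
```

Raising occupy.num makes the condition-level call stricter: the second interval, occupied by a single sample, counts as occupied under occupy.num = 1 but not under occupy.num = 2.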

For the signal intensities of each genomic interval, weight specifies their relative precisions across the ChIP-seq samples in norm.signal. Internally, the weights are used to construct the structure matrices of the created bioCond. Alternatively, one can specify strMatrix directly when calling the function. Note that MAnorm2 uses a structure matrix to model the relative variances of the signal intensities of a genomic interval, as well as the correlations among them, by treating them as having a covariance matrix proportional to the structure matrix. See setWeight for a detailed description of structure matrices.
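As a rough sketch of how weights relate to a structure matrix, the snippet below assumes that uncorrelated samples with relative precisions w correspond to a diagonal structure matrix whose diagonal holds the reciprocals 1/w. This agrees with the equivalence shown in "Examples" between weight = c(1, 0.5) and strMatrix = list(diag(c(1, 2))), but the helper weight_to_str_matrix is hypothetical and not part of MAnorm2.

```r
# Hypothetical helper: turn a vector of relative precisions (weights) into a
# diagonal structure matrix of relative variances (reciprocals of weights).
weight_to_str_matrix <- function(w) {
  stopifnot(is.numeric(w), all(w > 0))
  diag(1 / w, nrow = length(w))
}

# Second replicate has half the weight of the first, so twice the variance.
S <- weight_to_str_matrix(c(1, 0.5))
```

Off-diagonal entries are zero here; a non-diagonal structure matrix, which additionally models correlations among samples, can only be supplied through strMatrix.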

Value

bioCond returns an object of class "bioCond", representing the biological condition to which the supplied ChIP-seq samples belong.

In detail, an object of class "bioCond" is a list containing at least the following fields:

name

Name of the biological condition.

norm.signal

A matrix of normalized signal intensities of ChIP-seq samples belonging to the condition.

occupancy

A logical vector marking the occupancy status of each genomic interval.

meta.info

The meta.info argument (only present when it is supplied).

strMatrix

Structure matrices associated with the genomic intervals.

sample.mean

A vector of observed mean signal intensities of genomic intervals.

sample.var

A vector recording the observed variance of signal intensities of each genomic interval.

Note that the sample.mean and sample.var fields are calculated by applying GLS (generalized least squares) estimation to the signal intensities of each genomic interval, treating them as having a common mean and a covariance matrix proportional to the corresponding structure matrix. Specifically, multiplying the sample.var value of an interval by its structure matrix gives an unbiased estimate of the covariance matrix associated with that interval (see setWeight for details).
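For a single interval, the GLS estimates described above can be sketched with the textbook formulas for a common mean under a covariance matrix proportional to S. This is a minimal illustration in base R, not the package's implementation; the function gls_mean_var and the toy inputs are hypothetical.

```r
# Hypothetical helper: GLS estimates for one interval.
# x: signal intensities across samples; S: structure matrix (n x n).
gls_mean_var <- function(x, S) {
  n <- length(x)
  S_inv <- solve(S)
  # Common mean: (1' S^-1 x) / (1' S^-1 1)
  m <- sum(S_inv %*% x) / sum(S_inv)
  # Variance scale: (x - m)' S^-1 (x - m) / (n - 1)
  r <- x - m
  v <- as.numeric(t(r) %*% S_inv %*% r) / (n - 1)
  list(sample.mean = m, sample.var = v)
}

# Two replicates, the second with twice the relative variance of the first.
est <- gls_mean_var(c(3, 6), diag(c(1, 2)))

# With an identity structure matrix, the estimates reduce to the ordinary
# sample mean and sample variance.
x <- c(1, 2, 3, 6)
est_id <- gls_mean_var(x, diag(4))
```

In the first call the more precise replicate pulls the mean toward its own value, so the estimate is 4 rather than the unweighted average of 4.5.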

Besides, a fit.info field will be added to bioCond objects once you have fitted a mean-variance curve for them (see fitMeanVarCurve for details).

There are also other fields used internally for fitting the mean-variance trend and calling differential intervals between conditions. These fields should never be modified directly.

Warning

Among all the fields contained in a bioCond object, only name and meta.info may be modified freely; the strMatrix field must be modified through setWeight.

References

Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.

See Also

normalize for performing an MA normalization on ChIP-seq samples; normalizeBySizeFactors for normalizing ChIP-seq samples based on their size factors; setWeight for modifying the structure matrices of a bioCond object.

normBioCond for performing an MA normalization on bioCond objects; normBioCondBySizeFactors for normalizing bioCond objects based on their size factors; cmbBioCond for combining a set of bioCond objects into a single one; MAplot.bioCond for creating an MA plot on two bioCond objects; summary.bioCond for summarizing a bioCond.

fitMeanVarCurve for modeling the mean-variance dependence across intervals in bioCond objects; diffTest for comparing two bioCond objects; aovBioCond for comparing multiple bioCond objects; varTestBioCond for calling hypervariable and invariant intervals across ChIP-seq samples contained in a bioCond.

Examples

data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")

## Construct a bioCond object for the GM12891 cell line.

# Apply MA normalization to the ChIP-seq samples of GM12891.
norm <- normalize(H3K27Ac, 5:6, 10:11)

# Call the constructor and optionally attach some meta information to the
# resulting bioCond, such as the coordinates of genomic intervals.
GM12891 <- bioCond(norm[5:6], norm[10:11], name = "GM12891",
                   meta.info = norm[1:3])

# Alternatively, you may assign different weights to the replicate samples
# for estimating the mean signal intensities of genomic intervals in this
# cell line. Here the weight of the 2nd replicate is reduced to half the
# weight of the 1st one.
GM12891_2 <- bioCond(norm[5:6], norm[10:11], name = "GM12891",
                     weight = c(1, 0.5))

# Equivalently, you can achieve the same effect by setting the strMatrix
# parameter.
GM12891_3 <- bioCond(norm[5:6], norm[10:11], name = "GM12891",
                     strMatrix = list(diag(c(1, 2))))


MAnorm2 documentation built on Oct. 29, 2022, 1:12 a.m.