QL.fit: Fit quasi-likelihood models to replicated ChIP-seq data...

Description Usage Arguments Details Value Author(s) References Examples

Description

Fit constrained quasi-likelihood models to ChIP-seq data partitioned into a count matrix.

Usage

1
2
3
QL.fit(counts, design.matrix, chip.col.indicator, log.offset = NULL,
  Model = "NegBin", print.progress = TRUE, NBdisp = "trend",
  bias.fold.tolerance = 1.1, ...)

Arguments

counts

A matrix of integers obtained by counting reads across a genomic partition. Each row contains observations from a single window of the genomic partition. Each column contains observations from a single sample (either ChIP or control/input).

design.matrix

A design matrix for the full model, including a column that indicates whether the observation is a ChIP sample. The number of rows must be ncol(counts). The number of columns must be at least two, usually an intercept and an indicator whether the sample is ChIP (1) or input/control (0). Means are modeled with a log link function.

chip.col.indicator

A binary vector of length ncol(design.matrix) that indicates which column of design.matrix corresponds to the ChIP indicator.

log.offset

A vector of log-scale, additive factors used to adjust estimated log-scale means for differences in library sizes across samples. Commonly used offsets include log.offset=log(colSums(counts)) and log.offset=log(apply(counts[rowSums(counts)!=0,],2,quantile,.75)). If NULL, the later offset is used.

Model

Must be one of "Poisson" or "NegBin", specifying use of a quasi-Poisson or quasi-negative binomial model, respectively.

print.progress

logical. If TRUE, updates are provided regarding which window (row number) is being analyzed. Updates occur frequently to start then eventually occur every 5000 windows.

NBdisp

Used only when Model="NegBin". Must be one of "trend", "common" or a vector of non-negative real numbers with length equal to nrow(counts). Specifying NBdisp="trend" or NBdisp="common" will use estimateGLMTrendedDisp or estimateGLMCommonDisp, respectively, from the package edgeR to estimate negative binomial dispersion parameters for each window. Estimates obtained from other sources can be used by entering NBdisp as a vector containing the negative binomial dispersion value to use for each window when fitting the quasi-likelihood model.

bias.fold.tolerance

A numerical value no smaller than 1. If the bias reduction of maximum likelihood estimates of (log) fold change is likely to result in a ratio of fold changes greater than this value, then bias reduction will be performed on such windows. Setting bias.fold.tolerance=Inf will completely disable bias reduction; setting bias.fold.tolerance=1 will always perform bias reduction. Estimates that are projected into the constrained space are not bias-reduced.

...

Arguments to be passed to the function estimateGLMTrendedDisp or estimateGLMCommonDisp from the package edgeR.

Details

A wrapper for PoisDev or NBDev, depending on whether quasi-Poisson or quasi-negative binomial models are requested. See the respective functions for details. Used within the main BQ peak calling function.

Value

A list containing:

LRT

Matrix providing unadjusted two-sided likelihood ratio test statistics. Each column contains statistics from a single hypothesis test that the ChIP coefficient is equal to zero versus not equal to zero, applied separately to each window.

phi.hat.dev

Vector providing unshrunken, deviance-based estimates of the quasi-dispersion parameter for each window.

phi.hat.pearson

Vector providing unshrunken, Pearson estimates of the quasi-dispersion parameter for each window.

mn.cnt

Vector of the average count for each window.

den.df

Denominator degrees of freedom. Equal to the number of samples minus the number of fitted parameters in the full model.

num.df

Vector of numerator degrees of freedom for each test, computed as the difference in the number of fitted parameters between the full and reduced models.

Model

Either "Poisson" or "NegBin", specifying which model (quasi-Poisson or quasi-negative binomial, respectively) was used.

nb.disp

Only appears when Model="NegBin". Vector providing negative binomial dispersion parameter estimates used during fitting of quasi-negative binomial model for each window.

fitted.values

Matrix of fitted mean values without constraints.

coefficients

Matrix of estimated coefficients for each window. Note that these are given on the log scale. (i.e., intercept coefficients report log(average count) and non-intercept coefficients report estimated (and bias reduced, when appropriate) log fold-changes.)

LRT.constrained

Same as LRT, but uses the constrained MLE in the full model. Each column contains statistics from a single hypothesis test that the ChIP coefficient is equal to zero versus greater than zero, applied separately to each window.

phi.hat.dev.constrained

Same as phi.hat.dev, but subject to the constraint that the ChIP coefficient is non-negative.

phi.hat.pearson.constrained

Same as phi.hat.pearson, but subject to the constraint that the ChIP coefficient is non-negative.

fitted.values.constrained

Same as fitted.values, but subject to the constraint that the ChIP coefficient is non-negative.

coefficients.constrained

Same as coefficients, but subject to the constraint that the ChIP coefficient is non-negative.

Author(s)

Emily Goren (emily.goren@gmail.com) based on modifications of code by Steve Lund.

References

Goren, Liu, Wang and Wang (2018) "BinQuasi: a peak detection method for ChIP-sequencing data with biological replicates" Bioinformatics.

Kosmidis and Firth (2009) "Bias reduction in exponential family nonlinear models" Biometrika, 96, 793–804.

Lund, Nettleton, McCarthy and Smyth (2012) "Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates" SAGMB, 11(5).

McCarthy, Chen and Smyth (2012) "Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation" Nucleic Acids Res. 40(10), 4288–97.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
 set.seed(5)
 ####################################################
 # Simulate data three replicates with one chromosome
 ####################################################
 reps <- 3
 chr.length <- 1e5
 window.width <- 200
 K <- chr.length / window.width
 start <- seq(1, chr.length, by = window.width)
 end <- start + window.width - 1
 n.peaks <- 100 # No. of true peaks
 peak.idx <- sample.int(K, n.peaks)
 samples <- c(paste0('C', 1:reps), paste0('I', 1:reps))
 # Set parameters
 beta0 <- runif(K, log(10), log(100))
 beta1 <- rep(0, K); beta1[peak.idx] <- log(5) / runif(n.peaks)^(1/5)
 # Set means
 mu.ChIP <- exp(beta0 + beta1)
 mu.input <- exp(beta0)
 # Negative binomial dispersion parameter
 phi <- 1/rchisq(K, df = 5)
 # Draw read counts using a negative binomial distribution
 C <- lapply(1:reps, function(r) rpois(K, (mu.ChIP * rgamma(K, 1/phi))/(1/phi)))
 I <- lapply(1:reps, function(r) rpois(K, (mu.input * rgamma(K, 1/phi))/(1/phi)))
 counts <- do.call('cbind', append(C, I))
 colnames(counts) <- samples
 rownames(counts) <- start
 head(counts)
 
 ####################################################
 # Fit quasi-negative binomial model to each window.
 ####################################################
 design.matrix  <- cbind(rep(1, reps*2), # Intercept
                         rep(c(1,0), each = reps)) # Indicates ChIP sample
 chip.col.indicator <- c(0,1) # Second column of design matrix indicates ChIP sample
 fit <- QL.fit(counts, design.matrix, chip.col.indicator, 
               log.offset = rep(1, ncol(counts)), Model = 'NegBin') 
 # Look at fitted values
 counts.fitted <- fit$fitted.values.constrained
 head(round(counts.fitted, 2))
 

emilygoren/BinQuasi documentation built on May 17, 2019, 12:08 p.m.