mix: Fitting mixture of two densities, either Poisson or Negative...


View source: R/mix.R

Description

mix uses an EM algorithm to fit ChIP-seq count data by a latent mixture model with two components. One component is the signal density and the other is the background density. mix can deal with more than one experiment at the same time. In this case, it fits individual models to each experiment. The output of this function can be used for further analysis by mix.joint or enrich.mix.
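The idea of fitting a two-component mixture by EM can be sketched in base R. This is a minimal illustration, not the enRich implementation: the helper `em_poisson_mix` is hypothetical, the offset of the signal component is ignored, and the convergence rule (largest absolute parameter change below `stopdiff`) is an assumption about how such a threshold might be used.

```r
# Minimal EM sketch for a two-component Poisson mixture (no offset).
# em_poisson_mix is a hypothetical helper, not part of enRich.
em_poisson_mix <- function(x, p = 0.1, lambda_S = 10, lambda_B = 1,
                           stopdiff = 1e-04, maxit = 1000) {
  for (it in seq_len(maxit)) {
    # E-step: posterior probability that each bin belongs to the signal
    fS <- p * dpois(x, lambda_S)
    fB <- (1 - p) * dpois(x, lambda_B)
    z <- fS / (fS + fB)
    # M-step: update the mixing proportion and the two component means
    new <- c(p = mean(z),
             lambda_S = sum(z * x) / sum(z),
             lambda_B = sum((1 - z) * x) / sum(1 - z))
    # ASSUMPTION: stop when no parameter moves by more than stopdiff
    converged <- max(abs(new - c(p, lambda_S, lambda_B))) < stopdiff
    p <- new[["p"]]
    lambda_S <- new[["lambda_S"]]
    lambda_B <- new[["lambda_B"]]
    if (converged) break
  }
  list(parameters = new, iterations = it, posterior = z)
}

set.seed(1)
# Simulated counts: about 10% signal bins (mean 12), the rest background (mean 1)
is_signal <- rbinom(5000, 1, 0.1) == 1
x <- ifelse(is_signal, rpois(5000, 12), rpois(5000, 1))
fit <- em_poisson_mix(x)
round(fit$parameters, 2)
```

The starting values mirror the Poisson defaults documented for initialpara; the simulated data are purely illustrative.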

Usage

mix(data, method = NULL, initialpara = NULL, fixoffset = FALSE, fixk = 3,
    krange = c(0:10), exp.label = NULL, stopdiff = 1e-04, parallel = FALSE)

Arguments

data

A list with two elements. The first element is an n x 3 matrix with information on the bins; its three columns should contain "Chromosome", "Start" and "Stop". The second element is an n x p matrix of ChIP-seq counts, where n is the number of bins and p is the number of experiments. Count data for at least one experiment must be given.
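A toy input of this shape might be built as follows. The element names region and count follow the Examples section of this page; storing the bin information as a data frame (so the columns keep their types) is an assumption, as are the experiment names expA and expB.

```r
set.seed(42)
n <- 6  # number of bins
p <- 2  # number of experiments

# Bin annotation: one row per bin, three columns
# ASSUMPTION: a data frame is used here so chromosome names stay character
region <- data.frame(Chromosome = rep("chr1", n),
                     Start = seq(1, by = 1000, length.out = n),
                     Stop = seq(1000, by = 1000, length.out = n))

# Counts: one row per bin, one column per experiment (simulated values)
count <- matrix(rpois(n * p, lambda = 2), nrow = n, ncol = p,
                dimnames = list(NULL, c("expA", "expB")))

toy <- list(region = region, count = count)
str(toy)
```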

method

A character variable, either "Poisson" or "NB", specifying the densities of the mixture distribution.

initialpara

A numeric matrix or vector giving the initial parameters for the EM algorithm, in the form c("p", "lambda_S", "lambda_B") if method="Poisson" or c("p", "mu_S", "phi_S", "mu_B", "phi_B") if method="NB". Use a matrix if the initial values differ between experiments, or a vector if they are the same for all experiments. If not given, the default is (0.1, 10, 1) for method="Poisson" or (0.1, 10, 1, 1, 1) for method="NB".
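As a rough illustration, starting values could be written like this. The element names are only labels for readability, and the matrix layout (one row per experiment) is an assumption that should be checked against the package source before use.

```r
# Starting values equal to the documented defaults, as named vectors
init_poisson <- c(p = 0.1, lambda_S = 10, lambda_B = 1)
init_nb <- c(p = 0.1, mu_S = 10, phi_S = 1, mu_B = 1, phi_B = 1)

# Different starting values for two hypothetical experiments
# ASSUMPTION: one row per experiment; the layout mix expects is not stated here
init_matrix <- rbind(expA = c(p = 0.1, lambda_S = 10, lambda_B = 1),
                     expB = c(p = 0.2, lambda_S = 20, lambda_B = 2))
init_matrix
```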

fixoffset

A logical variable. If TRUE, the offset of the signal distribution is fixed by the user and is the same for all experiments. If FALSE, the offset is estimated empirically for each experiment. Default value is FALSE.

fixk

A numeric variable. The value of the offset when fixoffset = TRUE. Default value is 3.

krange

A numeric vector. The range of the offset, when fixoffset = FALSE. Default range is from 0 to 10.

exp.label

A character vector giving a label for each experiment.

stopdiff

A numeric variable. A prescribed small quantity for determining the convergence of the EM algorithm. Default value is 1e-04.

parallel

A logical variable. If TRUE, the individual experiments are processed in parallel, using the clusterApplyLB function in package parallel. Default value is FALSE.
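The pattern of distributing experiments over workers with clusterApplyLB can be sketched as below. This is not how mix is implemented internally: summarise_experiment is a hypothetical placeholder standing in for a per-experiment model fit, and the two-worker local cluster is an arbitrary choice.

```r
library(parallel)

set.seed(7)
# Toy count matrix: 100 bins x 3 experiments (simulated values)
count <- matrix(rpois(300, lambda = 2), nrow = 100, ncol = 3,
                dimnames = list(NULL, c("expA", "expB", "expC")))

# Hypothetical per-experiment task standing in for a model fit
summarise_experiment <- function(j, count) {
  c(mean = mean(count[, j]), var = var(count[, j]))
}

cl <- makeCluster(2)  # two local worker processes
# Load-balanced apply: each column index is sent to the next free worker
res <- clusterApplyLB(cl, seq_len(ncol(count)), summarise_experiment,
                      count = count)
stopCluster(cl)

names(res) <- colnames(count)
res
```

Load balancing helps when some experiments take longer to converge than others, since a free worker picks up the next experiment instead of waiting.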

Value

data

The data provided as input.

parameters

The parameters estimated by the mixture model. The parameters are (p, lambda_S, lambda_B, k) when method="Poisson" or (p, mu_S, phi_S, mu_B, phi_B, k) when method="NB". p is the proportion of signal in the mixture model. For a Poisson mixture model, lambda_S and lambda_B represent the mean of the signal and mean of the background, respectively. For a NB mixture model, mu_S and phi_S are the mean and overdispersion of the signal density, respectively, whereas mu_B and phi_B are the mean and overdispersion of the background density, respectively.
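To show how the Poisson parameters combine, here is a sketch of the mixture density and of the posterior probability that a bin is signal. Both helpers are hypothetical, and the way the offset k enters (the signal count is modelled as k plus a Poisson variable) is an assumption, not something stated on this page.

```r
# Hypothetical helper: density of a two-component Poisson mixture at counts x
# ASSUMPTION: the offset k shifts the signal, i.e. signal = k + Poisson(lambda_S)
dmix_poisson <- function(x, p, lambda_S, lambda_B, k = 0) {
  p * dpois(x - k, lambda_S) + (1 - p) * dpois(x, lambda_B)
}

# Hypothetical helper: posterior probability that a bin with count x is signal
post_signal <- function(x, p, lambda_S, lambda_B, k = 0) {
  p * dpois(x - k, lambda_S) / dmix_poisson(x, p, lambda_S, lambda_B, k)
}

# Density over a grid of counts; the total should be close to 1
dens <- dmix_poisson(0:200, p = 0.1, lambda_S = 10, lambda_B = 1, k = 3)
sum(dens)

# Higher counts are more likely to come from the signal component
post_signal(c(1, 5, 15), p = 0.1, lambda_S = 10, lambda_B = 1, k = 3)
```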

method

The method used for the analysis.

Author(s)

Yanchun Bao and Veronica Vinciotti

References

Bao et al. Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data. BMC Bioinformatics 2013, 14:169 DOI:10.1186/1471-2105-14-169.

See Also

mix.joint, enrich.mix

Examples

library(enRich)
tempdir()
data(p300cbp.1000bp)
exp.label <- c("CBPT0", "CBPT301", "CBPT302", "p300T0",
               "p300T301", "p300T302", "WangCBP", "Wangp300")
## Simple example: only two experiments and the first 5000 observations
CBPT30 <- list()
CBPT30$region <- p300cbp.1000bp$region[1:5000, ]
CBPT30$count <- p300cbp.1000bp$count[1:5000, 2:3]
## Fit a Poisson mixture separately to each of the two experiments
Poissonfit.simple <- mix(CBPT30, method = "Poisson", exp.label = exp.label[c(2, 3)])

enRich documentation built on March 13, 2020, 2:46 a.m.