cssp.fit-methods: Fit the CSSP Model.


Description

Fit the CSSP Model.

Usage

cssp.fit(dat, method = "mde", p1 = 0.5, p2 = 0.99, beta.init = NULL,
  e0.init = 0.9, e0.lb = 0.5, ngc = 9, nite = 50, tol = 0.01,
  useGrid = FALSE, nsize = NULL, ncomp = 2, nonpa = FALSE,
  zeroinfl = FALSE, seed = NULL)

## S4 method for signature 'data.frame'
cssp.fit(dat, method = "mde", p1 = 0.5, p2 = 0.99,
  beta.init = NULL, e0.init = 0.9, e0.lb = 0.5, ngc = 9, nite = 50,
  tol = 0.01, useGrid = FALSE, nsize = NULL, ncomp = 2, nonpa = FALSE,
  zeroinfl = FALSE, seed = NULL)

## S4 method for signature 'BinData'
cssp.fit(dat, method = "mde", p1 = 0.5, p2 = 0.99,
  beta.init = NULL, e0.init = 0.9, e0.lb = 0.5, ngc = 9, nite = 50,
  tol = 0.01, useGrid = FALSE, nsize = NULL, ncomp = 2, nonpa = FALSE,
  zeroinfl = FALSE, seed = NULL)

Arguments

dat

A data.frame or BinData-class object containing bin-level chip, input, M and GC information. For a data.frame object, the columns must include "chip", "input" and "M". For a BinData object, the slots must include "tagCount", "input" and "M". If "GC" is not provided, the model is fitted without GC-content scores.
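As a rough illustration of the data.frame input, the sketch below builds a small table with the required columns and checks for them before any fitting is attempted. The simulated values and the helper name check_bin_columns are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of the package): report which required
# columns are missing from a bin-level data.frame.
check_bin_columns <- function(dat, required = c("chip", "input", "M")) {
  setdiff(required, colnames(dat))
}

set.seed(1)
n <- 100
# Simulated bin-level data; the values are arbitrary placeholders.
toy <- data.frame(
  chip  = rpois(n, lambda = 5),  # ChIP tag count per bin
  input = rpois(n, lambda = 4),  # input tag count per bin
  M     = runif(n),              # M score per bin
  GC    = runif(n)               # optional GC-content score per bin
)

missing_cols <- check_bin_columns(toy)  # character(0) when all are present
has_gc <- "GC" %in% colnames(toy)       # without GC, GC scores are not used
```

An empty missing_cols means the table satisfies the column requirement stated above; has_gc only indicates whether GC-content scores are available.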

method

A character string specifying the fitting algorithm. "mde" (default): minimum distance estimation; "gem": the generalized EM method.

p1

A numeric value for the lower bound of the p-value region in which the p-values are assumed to be uniformly distributed. Default: 0.5.

p2

A numeric value for the upper bound of the p-value region in which the p-values are assumed to be uniformly distributed. Default: 0.99.
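One way to inspect the uniformity assumption on the region between p1 and p2 is sketched below with simulated p-values. This is an illustrative check only, not the procedure used inside cssp.fit; the mixture used to simulate the p-values and the use of a Kolmogorov-Smirnov test are assumptions.

```r
set.seed(42)
p1 <- 0.5
p2 <- 0.99

# Simulated p-values: 90% uniform (background-like) and 10% concentrated
# near zero (signal-like). The mixture is an arbitrary placeholder.
pvals <- c(runif(9000), rbeta(1000, 0.2, 5))

# Keep only the p-values inside the region [p1, p2].
in_region <- pvals >= p1 & pvals <= p2

# Rescale the retained p-values to [0, 1] and compare with Uniform(0, 1).
rescaled <- (pvals[in_region] - p1) / (p2 - p1)
ks <- ks.test(rescaled, "punif")
```

A small ks$p.value would suggest that the chosen region is not well described by a uniform distribution, in which case different bounds could be considered.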

beta.init

A numeric value for initializing the size parameter of the background model for the ChIP sample. If NULL (default), the size parameter of the fitted input sample model is used.

e0.init

The numeric value for initializing parameter e0. Default: 0.9.

e0.lb

The numeric value for the lower bound of parameter e0. Default: 0.5. It is recommended to set this parameter according to the p-value plot.

ngc

An integer value for the number of knots used in the spline model for the GC covariate. Default: 9.

nite

An integer value for the maximum number of iterations taken. Default: 50.

tol

A numeric value for the convergence tolerance. Default: 0.01.

useGrid

A logical value indicating whether the gridding method is used. If TRUE, the covariate space is gridded adaptively. This reduces the sample size for fitting the regression model when the data contain too many observations, and is suggested for genome-wide analysis. Default: FALSE.

nsize

A numeric value for the number of bins to be randomly chosen when estimating the normalizing parameters. If NULL (default), all bins are used in normalization. For genome-wide analysis, nsize = 5000 is suggested.

ncomp

A numeric value for the number of signal components. Default: 2.

nonpa

A logical value indicating whether a nonparametric model is fitted for the background ChIP sample and the input sample. Default: FALSE.

zeroinfl

A logical value indicating whether a zero-inflated negative binomial model is fitted for the ChIP background. Default: FALSE.

seed

A numeric value for the seed used in generating random variables. Default: NULL. Users should specify this value to obtain exactly reproducible results.
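The effect of fixing a seed on a random selection of bins can be sketched in base R as follows. How cssp.fit uses the seed internally is not stated here, so the use of set.seed and sample, and the bin count n_bins, are assumptions for illustration.

```r
n_bins <- 20000  # hypothetical total number of bins
nsize  <- 5000   # number of bins to draw, as suggested for genome-wide data

# Same seed, same draw.
set.seed(2019)
first_draw <- sample(n_bins, nsize)
set.seed(2019)
second_draw <- sample(n_bins, nsize)

# A different seed gives a different subset of bins.
set.seed(2020)
other_draw <- sample(n_bins, nsize)

same_seed_matches <- identical(first_draw, second_draw)
```

Because the subset of bins changes with the seed, estimates based on a random subset can vary slightly from run to run unless the seed is fixed.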

Details

The current version of cssp.fit implements the following options.
The "method" argument specifies how the normalization models for the ChIP background are estimated from the input data: "mde" uses minimum distance estimation, and "gem" uses generalized EM estimation.
The "nonpa" argument specifies whether a GLM is used. If "nonpa" is FALSE, a GLM is fitted to the input data. If "nonpa" is TRUE, the mean response within each grid is taken as the prediction. These two arguments enable the analysis of genome-wide data. In this case, "nsize" grids are used.
If "nonpa" is FALSE, "useGrid" specifies whether the covariate space is gridded adaptively, with the mean values within each grid used for regression.
If "nonpa" is TRUE, "zeroinfl" specifies whether a zero-inflated model is used for the background. This is useful for low-depth ChIP data, where too many bins have a zero count.
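The interplay of these arguments can be sketched as a small helper that assembles an argument list for a genome-wide fit. The helper name genomewide_args, the placeholder data, and the decision to switch "zeroinfl" on whenever "nonpa" is TRUE are illustrative assumptions; only the argument names come from the usage above.

```r
# Hypothetical helper (not part of the package): collect cssp.fit arguments
# for a genome-wide analysis, following the pairing described in Details.
genomewide_args <- function(dat, nonpa = FALSE, seed = 1) {
  args <- list(dat = dat, method = "mde", nsize = 5000,
               nonpa = nonpa, seed = seed)
  if (nonpa) {
    # "zeroinfl" is described as applying when nonpa is TRUE; enabling it
    # here is an illustrative choice for low-depth data.
    args$zeroinfl <- TRUE
  } else {
    # "useGrid" is described as applying when nonpa is FALSE.
    args$useGrid <- TRUE
  }
  args
}

# Placeholder input; real bin-level data would be used in practice.
placeholder <- data.frame(chip = 0L, input = 0L, M = 0)

args_glm <- genomewide_args(placeholder, nonpa = FALSE)
args_np  <- genomewide_args(placeholder, nonpa = TRUE)

# With the package loaded and real data in place of the placeholder:
# fit <- do.call(cssp.fit, args_glm)
```

Keeping the arguments in a list makes it easy to compare the GLM and nonparametric settings side by side before running either fit.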

Value

A CSSPFit-class object.

Author(s)

Chandler Zuo zuo@stat.wisc.edu

Examples

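# Load the example data set bin.data and fit with the default method ("mde")
data( bin.data )
cssp.fit( bin.data )
# Fit the same data with the generalized EM method
cssp.fit( bin.data, method = "gem" )
# Load the example data set bindata.chr1 and fit with the default method
data( bindata.chr1 )
cssp.fit( bindata.chr1 )
# Generalized EM method with ngc = 1 for the GC spline
cssp.fit( bindata.chr1, method = "gem", ngc = 1 )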

chandlerzuo/cssp documentation built on May 13, 2019, 3:23 p.m.