Post process Estimation of binding site positions obtained from PING

Description

Post-processes the estimates of binding site positions obtained from PING. Mixture models are refitted with stronger priors in candidate regions that contain potential problems, and the final result is then converted into a data frame.

Usage

postPING(ping, seg, rho2=NULL, sigmaB2=NULL, alpha2=NULL, beta2=NULL, min.dist=100, paraEM=NULL, paraPrior=NULL, score=0.05, dataType="MNase", nCores=1, makePlot=FALSE, FragmentLength=100, mart=NULL, seg.boundary=NULL, DupBound=NULL, IP=NULL, datname="")

Arguments

ping

A 'pingList' object containing the estimated nucleosome positions, as returned by the 'PING' function.

seg

An object of class 'segmentReadsList' containing the results for all pre-processed regions, as returned by the 'segmentReads' function.

paraEM

A list of parameters for the EM algorithm. The default parameters should be good enough for most usages.

minK:

An integer, default=0. The minimum number of binding events per region. If the value is 0, the minimum number is automatically calculated.

maxK:

An integer, default=0. The maximum number of binding events per region. If the value is 0, the maximum number is automatically calculated.

tol:

A numeric, default=1e-4. The tolerance for the EM algorithm.

B:

An integer, default=100. The maximum number of iterations to be used.

mSelect:

A character string specifying the information criteria to be used when selecting the number of binding events. Default="AIC3"

mergePeaks:

A logical stating whether overlapping binding events should be merged. Default=TRUE

mapCorrect:

A logical stating whether mappability profiles should be incorporated in the estimation, i.e., whether missing reads should be estimated. Default=TRUE
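
As a minimal sketch, the defaults documented above can be collected in a plain named list and supplied as 'paraEM'. The element names and values are taken from this page. Whether a partial list is accepted is not stated here, so the sketch sets all seven elements.

```r
# paraEM as a named list, using the default values documented above.
paraEM <- list(
  minK = 0L,          # 0: minimum number of binding events is chosen automatically
  maxK = 0L,          # 0: maximum number of binding events is chosen automatically
  tol = 1e-4,         # tolerance for the EM algorithm
  B = 100L,           # maximum number of EM iterations
  mSelect = "AIC3",   # information criterion used to select the number of events
  mergePeaks = TRUE,  # documented default for overlapping binding events
  mapCorrect = TRUE   # incorporate mappability profiles in the estimation
)

str(paraEM)
```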

paraPrior

A list of parameters for the prior distribution. The default parameters should be good enough for most usages.

xi:

An integer. The average DNA fragment size.

rho:

An integer. A variance parameter for the average DNA fragment size distribution.

alpha:

An integer. First hyperparameter of the inverse Gamma distribution for sigma^2 in the PING model

beta:

An integer. Second hyperparameter of the inverse Gamma distribution for sigma^2 in the PING model

lambda:

An integer. Controls the Gaussian Markov Random Field prior on the distance between adjacent nucleosomes. We do not recommend changing the default value.

dMu:

An integer. Our best guess for the distance between two neighboring nucleosomes.

rho2, sigmaB2, alpha2, beta2

Integer values. The parameters of the prior used when the mixture models are refitted.

min.dist

The minimum distance between two adjacent nucleosomes predicted from different candidate regions. Predictions closer than this are treated as duplicated predictions of the same nucleosome.
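
The effect of 'min.dist' can be illustrated with a small base R sketch. The helper 'drop_near_duplicates' is hypothetical and is not part of the package. It assumes that, of two predictions closer than 'min.dist', the one further along the chromosome is dropped. The package may choose which duplicate to keep by a different rule.

```r
# Illustrative only: keep a position if it lies at least min.dist away from
# the last position that was kept. This is an assumed rule, not the package's code.
drop_near_duplicates <- function(pos, min.dist = 100) {
  pos <- sort(pos)
  if (length(pos) == 0L) return(pos)
  keep <- logical(length(pos))
  keep[1] <- TRUE
  last <- pos[1]
  for (i in seq_along(pos)[-1]) {
    if (pos[i] - last >= min.dist) {
      keep[i] <- TRUE
      last <- pos[i]
    }
  }
  pos[keep]
}

# 1040 is within 100 bp of 1000, and 1335 is within 100 bp of 1330.
drop_near_duplicates(c(1000, 1040, 1170, 1330, 1335))
```

With these inputs the sketch keeps 1000, 1170 and 1330.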

score

A numeric. The score threshold used when calling FilterPING.

dataType

A character string that can be set to use selected default parameters for the algorithm.

nCores

An integer. The number of cores that should be used in parallel by the function.

makePlot

A logical. Plot a summary of the output.

FragmentLength, mart, seg.boundary, DupBound, IP, datname

Plotting parameters and options.

IP:

A GRanges object. The reads used in segmentation process.

FragmentLength:

An integer. The length of XSET profile extension

Value

A data.frame containing the estimation of binding site positions.

Note

Based on our experiments on a few real data sets, we suggest the following parameter values. For sonication data we use rho1=1.2; sigmaB2=6400; rho=15; alpha1=10; alpha2=98; beta2=200000. For MNase data we use rho1=3; sigmaB2=4900; rho=8; alpha1=20; alpha2=100; beta2=100000. The value of xi depends on the species of the sample, since the species affects the length of the linker DNA. For example, we use xi=160 for yeast and xi=200 for mouse.
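
The suggested values can be kept in a small lookup, sketched below under some assumptions. The helpers 'suggested_values', 'refit_args' and 'run_postPING' are hypothetical and are not part of the package. The labels "MNase" and "sonication" are used only as keys of the lookup. The names rho1, alpha1 and rho are stored exactly as the Note writes them, and only the entries whose names match arguments in the Usage line (sigmaB2, alpha2, beta2) are passed on.

```r
# Values copied from the Note; names are kept exactly as written there.
suggested_values <- function(dataType = c("MNase", "sonication")) {
  dataType <- match.arg(dataType)
  switch(dataType,
    sonication = list(rho1 = 1.2, sigmaB2 = 6400, rho = 15,
                      alpha1 = 10, alpha2 = 98, beta2 = 200000),
    MNase = list(rho1 = 3, sigmaB2 = 4900, rho = 8,
                 alpha1 = 20, alpha2 = 100, beta2 = 100000)
  )
}

# Keep only the entries whose names match postPING arguments in the Usage line.
refit_args <- function(dataType = "MNase") {
  vals <- suggested_values(dataType)
  vals[intersect(names(vals), c("rho2", "sigmaB2", "alpha2", "beta2"))]
}

# Hypothetical wrapper: needs the PING package and existing 'ping' and 'seg'
# objects. It is defined here but not called. Further arguments such as
# dataType or paraPrior can be forwarded through '...'.
run_postPING <- function(ping, seg, kind = "MNase", ...) {
  do.call(PING::postPING,
          c(list(ping = ping, seg = seg), refit_args(kind), list(...)))
}

str(refit_args("MNase"))
```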

Author(s)

Xuekui Zhang <xzhang@stat.ubc.ca>, Sangsoon Woo <swoo@fhcrc.org> and Raphael Gottardo <raphael.gottardo@ircm.qc.ca>

References

Xuekui Zhang, Gordon Robertson, Sangsoon Woo, Brad G. Hoffman, and Raphael Gottardo, "Probabilistic Inference for Nucleosome Positioning with MNase-based or Sonicated Short-read Data", PLoS ONE, under review.

See Also

PING, plotSummary