peakreg: Call and merge enriched genomic windows/bins.

Description

Calls and merges enriched bins using the posterior probabilities calculated by the iSeq1 or iSeq2 functions, at a user-specified posterior probability or false discovery rate (FDR) cutoff.

Usage

peakreg(chrpos,count,pp,cutoff,method=c("ppcut","fdrcut"),maxgap=300)

Arguments

chrpos

An n by 3 matrix or data frame. The rows correspond to genomic bins. The first column contains chromosome IDs; the second and third columns contain the start and end positions of the bins, respectively.

count

An n by 2 matrix containing the number of sequence tags in the bins specified by chrpos. The first column contains the tag counts for chain 1 (usually the forward chain), and the second column contains the tag counts for chain 2 (usually the reverse chain). See the documentation of the function 'mergetag' for the definition of chains 1 and 2. The function uses the information in 'count' to find the center of each enriched region, where the true binding site is usually located.

pp

A vector containing the posterior probability that each bin is in the enriched state, as returned by the functions iSeq1 or iSeq2.

cutoff

The cutoff value (a scalar) used to call enriched bins. With method="ppcut", a bin is called enriched if its posterior probability is greater than the cutoff. With method="fdrcut", bins are called enriched if the bin-based FDR is less than the cutoff. The FDR is calculated using a direct posterior probability approach (Newton et al., 2004).

method

The calling criterion: 'ppcut' (posterior probability cutoff) or 'fdrcut' (FDR cutoff).

maxgap

The criterion used to merge enriched bins. If the genomic distance between adjacent enriched bins is less than maxgap, the bins are merged into the same enriched region.
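The interaction of cutoff, method and maxgap described above can be sketched in base R. This is an illustration only, not the iSeq source: the helper names call_enriched and merge_bins are hypothetical, the FDR step follows the direct posterior probability idea (the estimated FDR of the top k bins is the mean of 1 - pp over those bins), and the gap between two bins is assumed to be the start of the later bin minus the end of the earlier one, with bins sorted by chromosome and position.

```r
# Hypothetical helper: flag enriched bins by posterior probability or FDR.
call_enriched <- function(pp, cutoff, method = c("ppcut", "fdrcut")) {
  method <- match.arg(method)
  if (method == "ppcut") return(pp > cutoff)
  # Rank bins by decreasing pp; the running mean of (1 - pp) estimates
  # the FDR of the list made of the top k bins.
  ord <- order(pp, decreasing = TRUE)
  fdr <- cumsum(1 - pp[ord]) / seq_along(ord)
  enriched <- logical(length(pp))
  enriched[ord[fdr < cutoff]] <- TRUE
  enriched
}

# Hypothetical helper: merge enriched bins separated by less than maxgap.
# ASSUMPTION: gap = start of the later bin - end of the earlier bin.
merge_bins <- function(chr, gstart, gend, enriched, maxgap = 300) {
  idx <- which(enriched)
  n <- length(idx)
  if (n == 0) return(data.frame())
  # A region starts at the first enriched bin, on a chromosome change,
  # or when the gap to the previous enriched bin is maxgap or more.
  new_region <- rep(TRUE, n)
  if (n > 1) {
    prev <- idx[-n]
    cur <- idx[-1]
    gap <- gstart[cur] - gend[prev]
    new_region[-1] <- !(chr[cur] == chr[prev] & gap < maxgap)
  }
  region <- cumsum(new_region)
  rstart <- as.integer(tapply(idx, region, min))
  rend <- as.integer(tapply(idx, region, max))
  data.frame(chr = chr[rstart], gstart = gstart[rstart], gend = gend[rend],
             rstart = rstart, rend = rend)
}

# Made-up toy data: six 50-bp bins on one chromosome.
chr <- rep("chr22", 6)
gstart <- c(1, 51, 101, 501, 551, 1001)
gend <- c(50, 100, 150, 550, 600, 1050)
pp <- c(0.99, 0.97, 0.20, 0.95, 0.96, 0.60)

by_pp <- call_enriched(pp, 0.5, "ppcut")    # bins 1, 2, 4, 5, 6
by_fdr <- call_enriched(pp, 0.05, "fdrcut") # bins 1, 2, 4, 5
regs_pp <- merge_bins(chr, gstart, gend, by_pp, maxgap = 300)
regs_fdr <- merge_bins(chr, gstart, gend, by_fdr, maxgap = 300)
```

In this toy input the posterior probability rule yields three regions (rows 1-2, 4-5 and 6), while the FDR rule drops bin 6 and yields two.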

Value

A data frame with rows corresponding to enriched regions and columns corresponding to the following:

chr

Chromosome IDs.

gstart

The start genomic position of the enriched region.

gend

The end genomic position of the enriched region.

rstart

The row number for gstart in chrpos.

rend

The row number for gend in chrpos.

peakpos

The inferred center (peak) of the enriched region.

meanpp

The mean posterior probability of the bins merged into the region.

ct1

Total tag count for the region from gstart to gend for the chain corresponding to count[,1]; ct1 = sum(count[rstart:rend,1]).

ct2

Total tag count for the region from gstart to gend for the chain corresponding to count[,2]; ct2 = sum(count[rstart:rend,2]).

ct12

Total tag count for the region over both chains; ct12 = ct1 + ct2.

sym

A parameter used to measure if the forward and reverse tag counts are symmetrical (or balanced) in enriched regions. The values range from 0.5 (perfect symmetry) to 0 (complete asymmetry).
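The per-region summaries above can be reproduced by hand, which is useful for checking a result row. The sketch below uses a hypothetical helper, summarise_region, on made-up counts. The ct1, ct2 and ct12 formulas are the ones given above. The sym formula is NOT documented on this page: min(ct1, ct2) / ct12 is only a guess that happens to match the stated range of 0.5 (balanced) to 0 (one chain empty), and the package may compute it differently.

```r
# Hypothetical helper: summarise one region spanning rows rstart..rend.
summarise_region <- function(count, pp, rstart, rend) {
  rows <- rstart:rend
  ct1 <- sum(count[rows, 1])   # documented: sum(count[rstart:rend, 1])
  ct2 <- sum(count[rows, 2])   # documented: sum(count[rstart:rend, 2])
  ct12 <- ct1 + ct2            # documented: ct1 + ct2
  # ASSUMPTION: sym as the smaller chain's share of the total count.
  sym <- if (ct12 > 0) min(ct1, ct2) / ct12 else NA_real_
  data.frame(meanpp = mean(pp[rows]), ct1 = ct1, ct2 = ct2,
             ct12 = ct12, sym = sym)
}

# Made-up counts for three bins (column 1 = chain 1, column 2 = chain 2).
count <- cbind(c(4, 6, 0), c(5, 5, 1))
pp <- c(0.9, 0.8, 0.1)
s <- summarise_region(count, pp, rstart = 1, rend = 2)
```

Here the region covers rows 1 and 2, so ct1 = 10, ct2 = 10, ct12 = 20, and the assumed sym is 0.5.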

Author(s)

Qianxing Mo qianxing.mo@moffitt.org

References

Mo, Q. (2012). A fully Bayesian hidden Ising model for ChIP-seq data analysis. Biostatistics 13(1), 113-128.

Newton, M., Noueiry, A., Sarkar, D., Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5, 155-176.

See Also

iSeq1, iSeq2, mergetag, plotreg

Examples

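library(iSeq)

# Load the example NRSF data and pool the ChIP and mock (control) samples.
data(nrsf)
chip = rbind(nrsf$chipFC1592,nrsf$chipFC1862,nrsf$chipFC2002)
mock = rbind(nrsf$mockFC1592,nrsf$mockFC1862,nrsf$mockFC2002)
# Merge the sequence tags into bins, then keep the chr22 bins only.
tagct = mergetag(chip=chip,control=mock,maxlen=80,minlen=10,ntagcut=20)
tagct22 = tagct[tagct[,1]=="chr22",]
# Fit the model with iSeq1; res1$pp holds the posterior probabilities.
res1 = iSeq1(Y=tagct22[,1:4],gap=200,burnin=200,sampling=500,ctcut=0.95,a0=1,b0=1,
             a1=5,b1=1,k0=3,mink=0,maxk=10,normsd=0.1,verbose=FALSE)

# Call regions with a posterior probability cutoff of 0.5.
# chrpos = columns 1:3; count = columns 5:6 minus columns 7:8 (two columns, one per chain).
reg1 = peakreg(tagct22[,1:3],tagct22[,5:6]-tagct22[,7:8],res1$pp,0.5,
               method="ppcut",maxgap=200)

# Call regions with a bin-based FDR cutoff of 0.05.
reg2 = peakreg(tagct22[,1:3],tagct22[,5:6]-tagct22[,7:8],res1$pp,0.05,
               method="fdrcut",maxgap=200)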

iSeq documentation built on Nov. 8, 2020, 8:03 p.m.
