mergetag: Aggregate sequence tags into dynamic genomic windows/bins and...

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/iSeq.R

Description

A function to aggregate sequence tags into genomic windows/bins with dynamic length specified by the user and count the number of tags falling in the dynamic windows/bins.

Usage

1
mergetag(chip,control,maxlen=80,minlen=10,ntagcut=10)

Arguments

chip

A n by 3 matrix or data frame. The Rows correspond to sequence tags. chip[,1] contains chromosome IDs; chip[,2] contains the genomic positions of sequence tags matched to the reference genome. For each tag, in order to accurately infer the true binding sites, we suggest using the middle positions of the tags as the tags' positions on the chromosomes. Note a genomic position must be an integer. chip[,3] contains the direction indicators of the sequence tags. The user can basically use any symbols to represent the forward or reverse chains. Function 'mergetag' use integer 1 and 2 to represent the directions of the chains by doing as.numeric(as.factor(chip[,3])). Therefore, the user should know the directions referred by integer 1 and 2. For example, if the forward and reverse chains are represented by 'F' and 'R', respectively, then chains 1 and 2 will refer to the forward and reverse chain, respectively. In the output, the tag counts are summarized for chains 1 and 2, respectively (see the below for details).

control

A n by 3 matrix or data frame. The column names of control must be the same as the column names of chip.

maxlen

The maximum length of the genomic window/bin into which sequence tags are aggregated.

minlen

The minimum length of the genomic window/bin into which sequence tags are aggregated.

ntagcut

The tag count cutoff value for triggering bin size change. For example, suppose L_i and C_i are the length and tag count for bin i, respectively. If C_i >= ntagcut, the length for bin i+1 will be min(L_i/2,minlen); if C_i < ntagcut, the length for bin i+1 will be max(2*L_i, maxlen). Note, by default, the bin sizes decrease/increase by a factor of 2. Thus, the user should let maxlen = (2^n)*minlen.

Value

A data frame with rows corresponding to the bins and columns corresponding to the following:

chr

Chromosome IDs.

gstart

The start position of the bin.

gend

The position of the last read/tag falling in the bin.

gend2

The end position of the bin (gend2 only shows in the output when argument control is missing).

ct12

For one-sample analysis, where only the ChIP data are available, ct12 = ipct1 + ipct2. For two-sample analysis, where both the ChIP and control data are available. ct12 = maximum(ipct1+ipct2-conct1-conct2,0).

ipct1

The number of sequence tags for the chain 1 of the ChIP data.

ipct2

The number of sequence tags for the chain 2 of the ChIP data.

conct1

The number of sequence tags for the chain 1 of the control data.

conct2

The number of sequence tags for the chain 2 of the control data.

Author(s)

Qianxing Mo qianxing.mo@moffitt.org

References

Qianxing Mo. (2012). A fully Bayesian hidden Ising model for ChIP-seq data analysis. Biostatistics 13(1), 113-28.

See Also

iSeq1, iSeq2, peakreg,plotreg

Examples

1
2
3
4
5
data(nrsf)
chip = rbind(nrsf$chipFC1592,nrsf$chipFC1862,nrsf$chipFC2002)
mock = rbind(nrsf$mockFC1592,nrsf$mockFC1862,nrsf$mockFC2002)

tagct = mergetag(chip=chip,control=mock,maxlen=80,minlen=10,ntagcut=10)

iSeq documentation built on Nov. 8, 2020, 8:03 p.m.

Related to mergetag in iSeq...