In jianhong/ATACseqQC: ATAC-seq Quality Control

splitGAlignmentsByCut

R Documentation

Split BAM alignments into nucleosome-free, mononucleosome, dinucleosome and trinucleosome fractions

Description

Uses a random forest to split the reads into nucleosome-free, mononucleosome, dinucleosome and trinucleosome fractions. The features used by the random forest include fragment length, GC content, and UCSC phastCons conservation scores.
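To illustrate what a per-fragment feature table of this kind might look like, here is a minimal base-R sketch that computes two of the listed features, fragment length and GC content, from plain character sequences. The helper `gcFraction()` and the toy sequences are hypothetical; the package itself works from a `BSgenome` object and adds conservation scores from a `GScores` object, which this sketch deliberately leaves out.

```r
# Hypothetical helper (not part of ATACseqQC): fraction of G/C bases
# in each input sequence.
gcFraction <- function(seqs) {
  seqs <- toupper(seqs)
  # Count G and C characters by deleting every other character
  gc <- nchar(gsub("[^GC]", "", seqs))
  gc / nchar(seqs)
}

# Hypothetical fragment sequences standing in for real genomic fragments
frags <- c("ATGCGC", "AATTAA", "GGGCCC")

# One row per fragment: the kind of feature matrix a classifier consumes
features <- data.frame(
  fragmentLength = nchar(frags),
  gcContent      = gcFraction(frags)
)
print(features)
```

In real use, sequences would be extracted from the reference genome for each fragment, and a conservation column would be appended before training.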

Usage

splitGAlignmentsByCut(
  obj,
  txs,
  genome,
  conservation,
  outPath,
  breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf),
  labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome",
    "inter3", "trinucleosome", "others"),
  labelsOfNucleosomeFree = "NucleosomeFree",
  labelsOfMononucleosome = "mononucleosome",
  trainningSetPercentage = 0.15,
  cutoff = 0.8,
  halfSizeOfNucleosome = 80L,
  summaryFun = mean
)

Arguments

`obj`	A GAlignments object.
`txs`	A GRanges object of transcripts.
`genome`	A BSgenome object.
`conservation`	A GScores object.
`outPath`	Folder in which to save the split alignments. If outPath is set, the returned alignments will not contain the seq and qual fields.
`breaks`	A numeric vector of fragment-size breakpoints separating nucleosome-free, mononucleosome, dinucleosome and trinucleosome fragments. The default breaks follow the description in the paper from the Greenleaf lab (Buenrostro et al., 2013; see References).
`labels`	A character vector of labels for the levels of the resulting categories, one label per interval defined by breaks.
`labelsOfNucleosomeFree`, `labelsOfMononucleosome`	character(1). The labels used for the nucleosome-free and mononucleosome categories.
`trainningSetPercentage`	numeric(1) between 0 and 1. Proportion of the data, taken from the highest-coverage regions, used as the training set.
`cutoff`	numeric(1) between 0 and 1. Cutoff value applied to the predictions.
`halfSizeOfNucleosome`	numeric(1) or integer(1). The read length will be adjusted to half of the nucleosome size to enhance the signal-to-noise ratio.
`summaryFun`	Function used to summarize genomic scores when more than one position is retrieved. The choice of function can greatly affect CPU time.
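The breaks and labels arguments read like the inputs to base R's `cut()`. As a rough illustration of how the default values partition fragment sizes, the sketch below bins some hypothetical fragment lengths with `cut()`. This is an assumption about the intent of the arguments, not a copy of the package's internal code; in particular, `cut()` uses right-closed intervals such as (0,100] by default, and the package may treat boundaries differently.

```r
# Default breaks and labels from the Usage section
breaks <- c(0, 100, 180, 247, 315, 473, 558, 615, Inf)
labels <- c("NucleosomeFree", "inter1", "mononucleosome", "inter2",
            "dinucleosome", "inter3", "trinucleosome", "others")

# Hypothetical fragment lengths in bp
fragLen <- c(50, 150, 200, 400, 600, 1000)

# cut() builds intervals (0,100], (100,180], ..., (615,Inf];
# there are length(breaks) - 1 intervals, so 8 labels are needed
category <- cut(fragLen, breaks = breaks, labels = labels)
print(data.frame(fragLen, category))
```

With these defaults, a 200 bp fragment falls in the mononucleosome band (180 to 247 bp) and a 400 bp fragment in the dinucleosome band (315 to 473 bp), while the "inter" levels capture the ambiguous sizes between bands.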

Value

A list of GAlignments objects.

Author(s)

Jianhong Ou

References

Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. and Greenleaf, W.J., 2013. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods, 10(12), pp.1213-1218.

Chen, K., Xi, Y., Pan, X., Li, Z., Kaestner, K., Tyler, J., Dent, S., He, X. and Li, W., 2013. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome Research, 23(2), pp.341-351.

Examples

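library(ATACseqQC)
library(GenomicRanges)
# Example BAM file shipped with ATACseqQC
bamfile <- system.file("extdata", "GL1.bam",
                       package="ATACseqQC", mustWork=TRUE)
# BAM tags to import along with the alignments
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
# Read alignments in chr1:1-1000000 as individual reads (not mate pairs)
gal1 <- readBamFile(bamFile=bamfile, tag=tags,
                    which=GRanges("chr1", IRanges(1, 1e6)),
                    asMates=FALSE)
names(gal1) <- mcols(gal1)$qname
# Genome sequence, transcript annotation and conservation scores (hg19)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(phastCons100way.UCSC.hg19)
# Split the reads; the result is a list of GAlignments objects
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=Hsapiens,
                              conservation=phastCons100way.UCSC.hg19)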

jianhong/ATACseqQC documentation built on June 14, 2025, 1:23 a.m.