splitGAlignmentsByCut: split bams into nucleosome free, mononucleosome, dinucleosome...

View source: R/splitGAlignmentsByCut.R

splitGAlignmentsByCutR Documentation

split bams into nucleosome free, mononucleosome, dinucleosome and trinucleosome

Description

use random forest to split the reads into nucleosome free, mononucleosome, dinucleosome and trinucleosome. The features used in random forest including fragment length, GC content, and UCSC phastCons conservation scores.

Usage

splitGAlignmentsByCut(
  obj,
  txs,
  genome,
  conservation,
  outPath,
  breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf),
  labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome",
    "inter3", "trinucleosome", "others"),
  labelsOfNucleosomeFree = "NucleosomeFree",
  labelsOfMononucleosome = "mononucleosome",
  trainningSetPercentage = 0.15,
  cutoff = 0.8,
  halfSizeOfNucleosome = 80L,
  summaryFun = mean
)

Arguments

obj

an object of GAlignments

txs

GRanges of transcripts

genome

an object of BSgenome

conservation

an object of GScores.

outPath

folder to save the splitted alignments. If outPath is setting, the return of the function will not contain seq and qual fields.

breaks

a numeric vector for fragment size of nucleosome free, mononucleosome, dinucleosome and trinucleosome. The breaks pre-defined here is following the description of Greenleaf's paper (see reference).

labels

a character vector for labels of the levels of the resulting category.

labelsOfNucleosomeFree, labelsOfMononucleosome

character(1). The label for nucleosome free and mononucleosome.

trainningSetPercentage

numeric(1) between 0 and 1. Percentage of trainning set from top coverage.

cutoff

numeric(1) between 0 and 1. cutoff value for prediction.

halfSizeOfNucleosome

numeric(1) or integer(1). Thre read length will be adjusted to half of the nucleosome size to enhance the signal-to-noise ratio.

summaryFun

Function to summarize genomic scores when more than one position is retrieved. This will greatly affect the CPU time.

Value

a list of GAlignments

Author(s)

Jianhong Ou

References

Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. and Greenleaf, W.J., 2013. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods, 10(12), pp.1213-1218.

Chen, K., Xi, Y., Pan, X., Li, Z., Kaestner, K., Tyler, J., Dent, S., He, X. and Li, W., 2013. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome research, 23(2), pp.341-351.

Examples

library(GenomicRanges)
bamfile <- system.file("extdata", "GL1.bam", 
                       package="ATACseqQC", mustWork=TRUE)
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
gal1 <- readBamFile(bamFile=bamfile, tag=tags, 
                    which=GRanges("chr1", IRanges(1, 1e6)), 
                    asMates=FALSE)
names(gal1) <- mcols(gal1)$qname
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(phastCons100way.UCSC.hg19)
splitGAlignmentsByCut(gal1, txs=txs, genome=Hsapiens, 
                      conservation=phastCons100way.UCSC.hg19)


jianhong/ATACseqQC documentation built on March 19, 2024, 8:30 a.m.