View source: R/splitGAlignmentsByCut.R
splitGAlignmentsByCut | R Documentation |
use random forest to split the reads into nucleosome free, mononucleosome, dinucleosome and trinucleosome. The features used in random forest including fragment length, GC content, and UCSC phastCons conservation scores.
splitGAlignmentsByCut(
obj,
txs,
genome,
conservation,
outPath,
breaks = c(0, 100, 180, 247, 315, 473, 558, 615, Inf),
labels = c("NucleosomeFree", "inter1", "mononucleosome", "inter2", "dinucleosome",
"inter3", "trinucleosome", "others"),
labelsOfNucleosomeFree = "NucleosomeFree",
labelsOfMononucleosome = "mononucleosome",
trainningSetPercentage = 0.15,
cutoff = 0.8,
halfSizeOfNucleosome = 80L,
summaryFun = mean
)
obj |
an object of GAlignments |
txs |
GRanges of transcripts |
genome |
an object of BSgenome |
conservation |
an object of GScores. |
outPath |
folder to save the splitted alignments. If outPath is setting, the return of the function will not contain seq and qual fields. |
breaks |
a numeric vector for fragment size of nucleosome free, mononucleosome, dinucleosome and trinucleosome. The breaks pre-defined here is following the description of Greenleaf's paper (see reference). |
labels |
a character vector for labels of the levels of the resulting category. |
labelsOfNucleosomeFree , labelsOfMononucleosome |
character(1). The label for nucleosome free and mononucleosome. |
trainningSetPercentage |
numeric(1) between 0 and 1. Percentage of trainning set from top coverage. |
cutoff |
numeric(1) between 0 and 1. cutoff value for prediction. |
halfSizeOfNucleosome |
numeric(1) or integer(1). Thre read length will be adjusted to half of the nucleosome size to enhance the signal-to-noise ratio. |
summaryFun |
Function to summarize genomic scores when more than one position is retrieved. This will greatly affect the CPU time. |
a list of GAlignments
Jianhong Ou
Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. and Greenleaf, W.J., 2013. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods, 10(12), pp.1213-1218.
Chen, K., Xi, Y., Pan, X., Li, Z., Kaestner, K., Tyler, J., Dent, S., He, X. and Li, W., 2013. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome research, 23(2), pp.341-351.
library(GenomicRanges)
bamfile <- system.file("extdata", "GL1.bam",
package="ATACseqQC", mustWork=TRUE)
tags <- c("AS", "XN", "XM", "XO", "XG", "NM", "MD", "YS", "YT")
gal1 <- readBamFile(bamFile=bamfile, tag=tags,
which=GRanges("chr1", IRanges(1, 1e6)),
asMates=FALSE)
names(gal1) <- mcols(gal1)$qname
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txs <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(phastCons100way.UCSC.hg19)
splitGAlignmentsByCut(gal1, txs=txs, genome=Hsapiens,
conservation=phastCons100way.UCSC.hg19)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.