DynaMO: Predicts spatiotemporal binding activities of transcription...
In spo111/DynaMO: DynaMO predicts spatiotemporal binding of transcription factors

Description Usage Arguments Details Value Author(s) Examples

DynaMO predicts transcription factor (TF) binding sites by learning the spatial distribution of epigenomic data with a random forest model. In a time-course experiment, DynaMO also predicts the temporal binding patterns of TFs at predicted binding sites. DynaMO identifies important TFs in a dynamic process.

1 2	DynaMO(readlist, peak, motif, mode="l", core=1, readsmem=TRUE, readlen=150, readformat="bam", motiflen=300, motifbin=20, cluster = 0, fdrcut=0.01,batch=TRUE)

`readlist`	A list of data.frame objects where each row is a path to a file storing aligned reads. Each data.frame stores one chromatin mark.
`peak`	A list object with length equal to the number of histone markers. Each element is a list object with length equal to the number of samples. Each sub-element is a GRanges object storing locations of peaks of histone markers.
`motif`	A vector object with length equal to the number of motifs. Each element is a path to a file storing locations of motif sites.
`mode`	"s" or "l". "l" means all motif sites are examined by random forest models and evaluated as real binding sites. "s" means only motif sites overlapping histone marker peaks are examined and evaluated. "s" is useful when the number of motifs and the number of motif sites are large.
`core`	The number of cores to use.
`readsmem`	A logcial. If TRUE, aligned reads are imported once and kept in memory. It requires large memory but saves time.
`readlen`	The length to be extended from the 5' ends of reads.
`readformat`	"bam" or "txt". The format of aligned reads can be a .txt file or a .bam file.
`motiflen`	An integer. The length to be extended from the centers of motif sites.
`motifbin`	An integer. The length of each bin.
`cluster`	An integer. The number of clusters for K-means clustering. 0 means automatic determination of number of clusters.
`fdrcut`	A numeric value. Motif sites with adjusted p values lower than fdrcut are used for clustering.
`batch`	A logcial. If TRUE, motifs are divided into batches for processing. It is recommended when the genome is big, such as human and mouse and there are many motifs to examine.

This function integrates the complete pipeline for analyzing spatiotemporal binding activities of transcription factors. It handles both simple cases, such as one sample or one motif, and complex cases, such as multiple samples and motifs. It imports location information of peak and motif sites and aligned reads and outputs probabilities of each motif in each sample as false discovery rates and inner products. When multiple samples are provided, it also outputs clustering results of motif sites and enrichment of motifs in each cluster.

Read counts at motif sites are written in files named "motif_[motif index]_ readcount.txt". Each row is a motif site and named by the index of motif sites. Each column is a sample. False discovery rates are written in files named "motif_[motif index]_fdr.txt". Rows are motif sites with the same order as readcount files and columns represent samples. Inner products are written in files named "motif_[motif index]_innerprod.txt". Rows are motif sites with the same order as readcount files and columns represent samples. Clustering information is written in a file named "motif_filter_reads_cluster .txt". Each row is a motif site and each column is a sample. The last three columns are cluster id, motif id and motif site id. Motif enrichment information is written in a file named "motif_cluster_ enrichment.txt". Each row represents a motif site and each column represents either counts of motif sites, fold changes, p values and adjusted p values from one sample.

Zheng Kuang

#Import reads, motif sites and call peaks first
library(BayesPeak)
histonelist=vector("list",length=2)
histonelist[[1]]=data.frame(c(system.file("extdata","read1.txt"
,package="DynaMO",mustWork=TRUE),system.file("extdata","read2.txt"
,package="DynaMO",mustWork=TRUE)))
histonelist[[2]]=data.frame(c(system.file("extdata","read3.txt"
,package="DynaMO",mustWork=TRUE),system.file("extdata","read4.txt"
,package="DynaMO",mustWork=TRUE)))
motif=vector(length=5)
for(i in 1:5){
	motif[i]=system.file("extdata",paste("motif_",i,".txt",sep=""
	),package="DynaMO",mustWork=TRUE)
}
peak=vector("list",length=2)
for(i in 1:2){
    peak[[i]]=vector("list",length=nrow(histonelist[[i]]))
    for(j in 1:nrow(histonelist[[i]])){
        tempreads=get.reads(histonelist[[i]][j,1],150,"txt")
        tempreads1=as.data.frame(tempreads)
        tempreads2=cbind(tempreads1[,1:3],tempreads1[,5])
        colnames(tempreads2)=c("chr","start","end","strand")
        temppeak=summarize.peaks(bayespeak(tempreads2))
        peak[[i]][[j]]=GRanges(seqnames=Rle(as.character(space
        (temppeak))),ranges=IRanges(start=start(temppeak),
        end=end(temppeak)))
    }
}
#DynaMO function
DynaMO(histonelist,peak,motif,"l",2,readsmem=TRUE,150,"txt",
250,20,0,0.01)