R/extractfeature.R

Defines functions extractfeature

Documented in extractfeature

#' Extract features for certain genomic regions
#'
#' Extract features for certain genomic regions
#'
#' This function takes as input the recontructed matrix generated by SCATE and a list of genomic regions of interest by the user. It outputs a subset of matrix that overlaps or nearest to the given genomic regions of interest.
#' @param res Matrix outputted by SCATE.
#' @param region Dataframe of genomic region. First column: chromosome name. Second column: start position. Third column: end position.
#' @param mode Either 'overlap' or 'nearest'. If overlap, only bins that overlap with the regions of interest will be shown. If nearest, the nearest bin to each region will be shown. Order will be the same as the input genomic region.
#' @param folder A character value specifying the location where the BED files will be stored. If not NULL, the signal and location of each bin will be saved as BED files in 'folder'. Each cluster has its own BED file, and the name of the BED file is the same as the cluster name. These BED files can be uploaded to UCSC genome browser for visualization.
#' @return If foler is NULL, a subset of the input matrix. Otherwise nothing will be returned and the results will be saved to local folder.
#' @export
#' @import GenomicAlignments GenomicRanges parallel splines2 xgboost
#' @author Zhicheng Ji, Weiqiang Zhou, Wenpin Hou, Hongkai Ji* <whou10@@jhu.edu>
#' @examples
#' scateres <- data.frame(combine=seq_len(6))
#' rownames(scateres) <- paste0('chr1_',c(0:5)*200,'_',199+c(0:5)*200)
#' extractfeature(scateres,data.frame(seqnames='chr1',start=0,end=201))

extractfeature <- function(res,region,mode='overlap',folder=NULL) {
   
      gr <- strsplit(row.names(res),'_')
      gr <- do.call(rbind,gr)
      gr <- GRanges(seqnames=gr[,1],IRanges(start=as.numeric(gr[,2]),end=as.numeric(gr[,3])))
      region <- GRanges(seqnames=region[,1],IRanges(start=region[,2],end=region[,3]))
      if (mode=='overlap') {
            o <- as.matrix(findOverlaps(gr,region))
      } else {
            o <- nearest(gr,region)
      }
      id <- unique(o[,1])
      res <- res[id,]
      if (is.null(folder)) {
            res
      } else {
            dir.create(folder)
            for (cl in colnames(res)) {
                  d <- data.frame(chr=as.character(seqnames(gr[id])),start=start(gr)[id],end=end(gr)[id],n='bin',score=res[,cl])
                  write.table(d,file=paste0(folder,'/',cl,'.bed'),sep='\t',quote=FALSE,col.names = FALSE,row.names = FALSE)
            }
            NULL
      }
}

Try the SCATE package in your browser

Any scripts or data that you put into this service are public.

SCATE documentation built on Nov. 8, 2020, 5:56 p.m.