
Introduction

TADBD is a fast and sensitive tool for detecting TAD boundaries on a Hi-C contact matrix. It uses a Haar-based algorithm: in view of the geometry of a TAD, a diagonal template is used to extract the Haar feature at each point on the diagonal of the contact matrix, aggregating over multiple template sizes. The peaks of the averaged Haar feature curve are then located, and statistical filtering retains only the significant ones as the final TAD boundaries. Feature extraction is accelerated with a compact integrogram (an integral image, also known as a summed-area table).
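
The way an integrogram speeds up block sums can be illustrated with a small base-R sketch. It is not the package's implementation: the helper names (integral_image, block_sum, haar_feature), the exact shape of the diagonal template (two s-by-s intra-domain blocks minus the two flanking inter-domain blocks) and the toy matrix are all assumptions made for illustration.

```r
# Illustrative sketch only; not the TADBD source code.

# Summed-area table with one row and one column of zero padding:
# ii[a + 1, b + 1] holds sum(m[1:a, 1:b]).
integral_image <- function(m) {
  cs <- apply(m, 2, cumsum)      # cumulative sums down each column
  cs <- t(apply(cs, 1, cumsum))  # then along each row
  ii <- matrix(0, nrow(m) + 1, ncol(m) + 1)
  ii[-1, -1] <- cs
  ii
}

# Sum of m[r1:r2, c1:c2] obtained from four table lookups.
block_sum <- function(ii, r1, r2, c1, c2) {
  ii[r2 + 1, c2 + 1] - ii[r1, c2 + 1] - ii[r2 + 1, c1] + ii[r1, c1]
}

# Assumed template for a candidate boundary between bins i and i + 1:
# contacts inside the upstream and downstream s-by-s blocks, minus
# contacts between the two blocks.
haar_feature <- function(ii, i, s) {
  u1 <- i - s + 1; u2 <- i      # upstream bins
  d1 <- i + 1;     d2 <- i + s  # downstream bins
  intra <- block_sum(ii, u1, u2, u1, u2) + block_sum(ii, d1, d2, d1, d2)
  inter <- block_sum(ii, u1, u2, d1, d2) + block_sum(ii, d1, d2, u1, u2)
  intra - inter
}

# Toy contact matrix: two 4-bin domains with no contacts between them.
m <- matrix(0, 8, 8)
m[1:4, 1:4] <- 1
m[5:8, 5:8] <- 1

ii <- integral_image(m)
sizes <- c(1, 2)                          # hypothetical template sizes
pos <- max(sizes):(nrow(m) - max(sizes))  # positions valid for every size

# One column of feature values per template size, then the average curve.
features <- sapply(sizes, function(s) {
  sapply(pos, function(i) haar_feature(ii, i, s))
})
curve <- rowMeans(features)
peak <- pos[which.max(curve)]  # bin 4, the last bin of the first domain
```

Because every block sum costs four table lookups regardless of the template size, evaluating several sizes at each diagonal point adds little work. The statistical filtering step is not covered by this sketch.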

The package contains three functions, DataLoad(), TADBD() and Output(), which load the data, detect TAD boundaries and output the result, respectively. The input contact matrix can be dense or sparse, so the files or in-memory objects produced by many contact matrix preparation tools and most Hi-C normalization approaches are acceptable. The detected TAD boundaries can be output either as text files alone or as text files together with a graphical heatmap.

Getting Started

Installation

# if (!requireNamespace("BiocManager", quietly=TRUE))
#     install.packages("BiocManager")
# BiocManager::install("TADBD")
#install.packages("devtools") # if you have not installed "devtools" package
#devtools::install_github("bioinfo-lab/TADBD")
#library(TADBD)

Input data

Working with a Hi-C matrix file

The input contact matrix can be dense or sparse.
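
To make the distinction concrete, the base-R sketch below converts a sparse matrix into a dense one. It assumes a three-column layout (bin1, bin2, count) with 1-based bin indices and upper-triangle entries only; this layout, the column names and the helper sparse_to_dense() are assumptions for illustration and may differ from what DataLoad() expects, so consult the package help for the exact format.

```r
# Hypothetical sparse triplets: one row per non-zero upper-triangle entry.
sparse <- data.frame(
  bin1  = c(1, 1, 2, 3),
  bin2  = c(1, 2, 2, 3),
  count = c(10, 4, 8, 6)
)

# Fill a symmetric n-by-n matrix from the triplets.
sparse_to_dense <- function(sp, n = max(sp$bin1, sp$bin2)) {
  dense <- matrix(0, n, n)
  dense[cbind(sp$bin1, sp$bin2)] <- sp$count  # upper triangle and diagonal
  dense[cbind(sp$bin2, sp$bin1)] <- sp$count  # mirror into the lower triangle
  dense
}

dense <- sparse_to_dense(sparse)
```

Some tools write genomic coordinates rather than bin indices in the first two columns; in that case the coordinates would first have to be divided by the resolution to obtain bin indices.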

Sparse format

Below is an example of loading data when the input contact matrix is in sparse format.

#Load R package TADBD
#library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
#species <- "hg19"
#chr <- "chr18"
#resolution <- 50000
#Disable scientific notation
#options(scipen = 999)
#Specify Hi-C data to be loaded
#data(hicdata)
#Load a Hi-C contact matrix file in a sparse format
#hicmat <- DataLoad(hicdata, bsparse = TRUE, species, chr, resolution)

Dense format

Below is an example of loading data when the input contact matrix is in dense format.

#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)

Running TADBD

Once the matrix returned by DataLoad() is in an acceptable format, TADBD() can be run with a single argument. The example below runs the algorithm; TADBD() outputs the bin numbers of the TAD boundaries on the contact matrix.

#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix with the TADBD method, using the default parameter configuration: template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
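
Since boundaries are reported as bin numbers, they usually need to be mapped back to genomic coordinates. The sketch below shows one way to do that, assuming 1-based bins where bin b covers the half-open interval from (b - 1) * resolution to b * resolution. The bin values, the helper bin_to_interval() and the indexing convention are assumptions for illustration; the structure of the object returned by TADBD() is not relied on here.

```r
resolution <- 50000
boundary_bins <- c(12, 40, 77)  # hypothetical bin numbers, not real output

# Map 1-based bin numbers to the genomic interval each bin covers.
bin_to_interval <- function(bins, resolution) {
  data.frame(
    bin   = bins,
    start = (bins - 1) * resolution,
    end   = bins * resolution
  )
}

coords <- bin_to_interval(boundary_bins, resolution)

# Format coordinates without scientific notation, which is what
# options(scipen = 999) achieves globally in the examples.
format(coords$start, scientific = FALSE)
```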

Detecting TAD boundaries

Our method is specifically designed to detect TAD boundaries. The function Output() takes the bin numbers of the detected TAD boundaries as input and writes the results in one of two forms: two text files, one for the detected TAD boundaries and one for the intermediate peaks, or the same two text files plus a graphical heatmap. Below are examples of running Output().

Output two text files: one for detected TAD boundaries, the other for intermediate peaks

#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix with the TADBD method, using the default parameter configuration: template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
#Output two text files: one for detected TAD boundaries, the other for intermediate peaks
Output(df_result, species, chr, resolution, outxtfile="./result")

Output two text files and a heatmap

#Load R package TADBD
library(TADBD)
#Configuration of the parameters, including species, chromosome and resolution
species <- "hg19"
chr <- "chr18"
resolution <- 50000
#Disable scientific notation
options(scipen = 999)
#Specify Hi-C data to be loaded
data(hicdata)
#Load a Hi-C contact matrix file in a dense format
hicmat <- DataLoad(hicdata, bsparse = FALSE, species, chr, resolution)
#Detect TAD boundaries on the loaded contact matrix with the TADBD method, using the default parameter configuration: template.sizes = c(4,5,6), bstatfilter = TRUE
df_result <- TADBD(hicmat)
#Output two text files and a heatmap with TAD boundary tracks
#The heatmap parameters are the start and end coordinates of the plotted region, and the color and width of the tracks
Output(df_result, species, chr, resolution, outxtfile="./result",
       bheatmap = TRUE, heatmapfile="./heatmap", hicmat,
       map_start=0, map_end=10000000, l_color="blue", l_width=2.5)

SessionInfo

sessionInfo()



bioinfo-lab/TADBD documentation built on March 15, 2020, 8:53 a.m.