dozmorovlab/preciseTAD: preciseTAD: A machine learning framework for precise TAD boundary prediction

preciseTAD provides functions to predict the locations of topologically associated domain (TAD) boundaries and chromatin loop boundaries at base-level resolution. As input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, together with coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies, addresses class imbalance with resampling techniques, and trains an optimized random forest model to predict low-resolution domain boundaries. When this model is applied at base-level resolution, preciseTAD predicts the probability that each base is a boundary. Density-based clustering and scalable partitioning techniques are then used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and are more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data from that cell line.
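The base-level post-processing step (per-base probabilities, then density-based clustering, then summit points) can be illustrated with a small base-R sketch. It is not preciseTAD's implementation or API: the helpers `dbscan_1d()` and `call_summits()`, the `threshold`, `eps`, and `min_pts` values, and the toy probabilities are all hypothetical. The sketch keeps bases whose predicted boundary probability passes a threshold, groups them with a one-dimensional DBSCAN, and reports each cluster's span plus its medoid as the summit. The medoid is used here as a simple k = 1 stand-in for the partitioning step.

```r
# Hypothetical sketch, NOT the preciseTAD API: dbscan_1d() and call_summits()
# are illustrative helpers written in base R only.

# DBSCAN on 1-D genomic coordinates; returns a cluster label per position (0 = noise).
dbscan_1d <- function(pos, eps, min_pts) {
  n <- length(pos)
  labels <- integer(n)   # 0 means "not in any cluster" (noise or unassigned)
  visited <- logical(n)
  neighbors <- function(i) which(abs(pos - pos[i]) <= eps)  # includes i itself
  cluster_id <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    nb <- neighbors(i)
    if (length(nb) < min_pts) next  # not a core point; may still become a border point later
    cluster_id <- cluster_id + 1L
    queue <- nb
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (labels[j] == 0L) labels[j] <- cluster_id  # core or border point joins the cluster
      if (!visited[j]) {
        visited[j] <- TRUE
        nb_j <- neighbors(j)
        if (length(nb_j) >= min_pts) queue <- c(queue, nb_j)  # expand only from core points
      }
    }
  }
  labels
}

# Threshold per-base probabilities, cluster the retained bases, and call one summit per cluster.
call_summits <- function(pos, prob, threshold = 0.9, eps = 2, min_pts = 3) {
  empty <- data.frame(start = integer(0), end = integer(0), summit = integer(0))
  p <- pos[prob >= threshold]           # keep only high-probability bases
  if (length(p) == 0) return(empty)
  lab <- dbscan_1d(p, eps, min_pts)
  ids <- sort(unique(lab[lab > 0]))     # drop noise (label 0)
  if (length(ids) == 0) return(empty)
  rows <- lapply(ids, function(k) {
    x <- p[lab == k]
    # Medoid: the member minimizing total distance to all members of the cluster.
    cost <- vapply(x, function(m) sum(abs(x - m)), numeric(1))
    data.frame(start = min(x), end = max(x), summit = x[which.min(cost)])
  })
  do.call(rbind, rows)
}

# Toy example with made-up probabilities for bases 1..40.
pos <- 1:40
prob <- rep(0.1, 40)
prob[5:9] <- 0.95    # first high-probability block
prob[25:27] <- 0.97  # second high-probability block
prob[35] <- 0.99     # isolated high-probability base
res <- call_summits(pos, prob, threshold = 0.9, eps = 2, min_pts = 3)
print(res)
```

With these toy inputs the sketch reports two boundary regions (bases 5 to 9 and 25 to 27) with summits at 7 and 26. The isolated base at 35 has too few neighbors within `eps` and is discarded as noise. This quadratic loop is only for illustration; on genome-scale data, dedicated implementations such as those in the CRAN packages dbscan and cluster would be the practical choice.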


Package details

Bioconductor views: Classification, Clustering, FeatureExtraction, FunctionalGenomics, HiC, Sequencing, Software
License: MIT + file LICENSE
Package repository: GitHub (dozmorovlab/preciseTAD)
Installation: Install the latest version of this package from GitHub by entering the following in R:

install.packages("remotes")
remotes::install_github("dozmorovlab/preciseTAD")
dozmorovlab/preciseTAD documentation built on April 20, 2021, 11:01 a.m.