preciseTAD: preciseTAD: A machine learning framework for precise TAD boundary prediction

preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As an input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated on a base-level, preciseTAD predicts the probability for each base to be a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for this cell line.

Package details

AuthorSpiro Stilianoudakis [aut, cre], Mikhail Dozmorov [aut]
Bioconductor views Classification Clustering FeatureExtraction FunctionalGenomics HiC Sequencing Software
MaintainerSpiro Stilianoudakis <stilianoudasc@vcu.edu>
LicenseMIT + file LICENSE
Version1.0.0
URL https://github.com/dozmorovlab/preciseTAD
Package repositoryView on Bioconductor
Installation Install the latest version of this package by entering the following in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("preciseTAD")

Try the preciseTAD package in your browser

Any scripts or data that you put into this service are public.

preciseTAD documentation built on Nov. 8, 2020, 6:51 p.m.