README.md

(Check CALDER2 with mutiple updates here: https://github.com/CSOgroup/CALDER2)

CALDER user manuel

CALDER is a Hi-C analysis tool that allows: (1) compute chromatin domains from whole chromosome contacts; (2) derive their non-linear hierarchical organization and obtain sub-compartments; (3) compute nested sub-domains within each chromatin domain from short-range contacts. CALDER is currently implemented in R.

Alt text

(A note on the performance of Calder vs PC-based approach)

Alt text

Installation

Make sure all dependencies have been installed:

Clone its repository and install it from source:

git clone https://github.com/CSOgroup/CALDER.git

install.packages(path_to_CALDER, repos = NULL, type="source") ## install from the cloned source file

Please contact yliueagle@googlemail.com for any questions about installation.

install CALDER and dependencies automaticly:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("GenomicRanges")
install.packages("remotes")
remotes::install_github("CSOgroup/CALDER")

Usage

The input data of CALDER is a three-column text file storing the contact table of a full chromosome (zipped format is acceptable, as long as it can be read by data.table::fread). Each row represents a contact record pos_x, pos_y, contact_value, which is the same format as that generated by the dump command of juicer (https://github.com/aidenlab/juicer/wiki/Data-Extraction):

16050000    16050000    10106.306
16050000    16060000    2259.247
16060000    16060000    7748.551
16050000    16070000    1251.3663
16060000    16070000    4456.1245
16070000    16070000    4211.7393
16050000    16080000    522.0705
16060000    16080000    983.1761
16070000    16080000    1996.749
...

A demo dataset is included in the repository CALDER/inst/extdata/mat_chr22_10kb_ob.txt.gz and can be accessed by system.file("extdata", "mat_chr22_10kb_ob.txt.gz", package='CALDER') once CALDER is installed. This data contains contact values of GM12878 on chr22 binned at 10kb (Rao et al. 2014)

CALDER contains three modules: (1) compute chromatin domains; (2) derive their hierarchical organization and obtain sub-compartments; (3) compute nested sub-domains within each compartment domain.

To run three modules in a single step:

CALDER_main(contact_mat_file, 
            chr, 
            bin_size, 
            out_dir, 
            sub_domains=TRUE, 
            save_intermediate_data=FALSE,
            genome='hg19')

To run three modules in seperated steps:

# This will not compute sub-domains, but save the intermediate_data that can be used to compute sub-domains latter on
CALDER_main(contact_mat_file, 
            chr, 
            bin_size, 
            out_dir, 
            sub_domains=FALSE, 
            save_intermediate_data=TRUE,
            genome='hg19') 

# (optional depends on needs) Compute sub-domains using intermediate_data_file that was previous saved in the out_dir (named as chrxx_intermediate_data.Rds)
CALDER_sub_domains(intermediate_data_file, 
                   chr, 
                   out_dir, 
                   bin_size) 

Paramters:

Output:

chrxx_domain_hierachy.tsv

chrxx_sub_compartments.bed

chrxx_domain_boundaries.bed

chrxx_nested_boundaries.bed

chrxx_intermediate_data.Rds

chrxx_log.txt, chrxx_sub_domains_log.txt

Running time:

For the computational requirement, running CALDER on the GM12878 Hi-C dataset at bin size of 40kb took 36 minutes to derive the chromatin domains and their hierarchy for all chromosomes (i.e., CALDER Step1 and Step2); 13 minutes to derive the nested sub-domains (i.e., CALDER Step3). At the bin size of 10kb, it took 1 h 44 minutes and 55 minutes correspondingly (server information: 40 cores, 64GB Ram, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz). The evaluation was done using a single core although CALDER can be run in a parallel manner.

Demo run:

library(CALDER)

contact_mat_file = system.file("extdata", "mat_chr22_10kb_ob.txt.gz", package = 'CALDER')

CALDER_main(contact_mat_file, chr=22, bin_size=10E3, out_dir='./GM12878', sub_domains=TRUE, save_intermediate_data=FALSE)

The saved .bed files can be view directly through IGV:

Alt text

Citation

If you use CALDER in your work, please cite: https://www.nature.com/articles/s41467-021-22666-3

Contact information



CSOgroup/CALDER documentation built on June 6, 2023, 10:22 p.m.