CALDER is currently written in R.
git clone https://github.com/YuanlongLiu/CALDER.git
## install from the cloned source directory (path_to_CALDER is the path to the cloned repository)
install.packages(path_to_CALDER, repos = NULL, type = "source")
Please contact yuanlong.liu@unil.ch with any questions about installation.
The input data of CALDER is a three-column text file storing the contact information of a full chromosome (a zipped file is acceptable, as long as it can be read by data.table::fread). Each row represents a contact record pos_x, pos_y, contact_value, which is the same format as that generated by the dump command of juicer (https://github.com/aidenlab/juicer/wiki/Data-Extraction):
16050000 16050000 10106.306
16050000 16060000 2259.247
16060000 16060000 7748.551
16050000 16070000 1251.3663
16060000 16070000 4456.1245
16070000 16070000 4211.7393
16050000 16080000 522.0705
16060000 16080000 983.1761
16070000 16080000 1996.749
...
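CALDER reads this file itself, so no conversion is needed before calling it. To make the record layout concrete, the sketch below turns the sample rows above into a dense symmetric contact matrix. The helper records_to_matrix is hypothetical and not part of CALDER, and it is not CALDER's internal representation; it assumes, as in the sample, that each pair of bins appears once with pos_x <= pos_y.

```r
## Minimal sketch (hypothetical helper, not part of CALDER): turn juicer-dump-style
## records (pos_x, pos_y, contact_value) into a dense symmetric contact matrix.
records_to_matrix <- function(records, bin_size) {
  start <- min(records$pos_x, records$pos_y)
  ## map genomic start positions to 1-based bin indices
  i <- (records$pos_x - start) %/% bin_size + 1
  j <- (records$pos_y - start) %/% bin_size + 1
  n <- max(i, j)
  mat <- matrix(0, nrow = n, ncol = n)
  ## each pair is listed once (pos_x <= pos_y), so fill both triangles
  mat[cbind(i, j)] <- records$contact_value
  mat[cbind(j, i)] <- records$contact_value
  mat
}

records <- read.table(text = "
16050000 16050000 10106.306
16050000 16060000 2259.247
16060000 16060000 7748.551
16050000 16070000 1251.3663
16060000 16070000 4456.1245
16070000 16070000 4211.7393
", col.names = c("pos_x", "pos_y", "contact_value"))

mat <- records_to_matrix(records, bin_size = 10000)
print(mat)
```

A dense matrix is used here only because it is easy to inspect. For a full chromosome at 10 kb, a sparse representation would use far less memory.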
A demo dataset is included in the repository at CALDER/inst/extdata/mat_chr22_10kb_ob.txt.gz and, once CALDER is installed, can be accessed with system.file("extdata", "mat_chr22_10kb_ob.txt.gz", package='CALDER'). This dataset contains contact values of GM12878 on chr22 (Rao et al. 2014).
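Because bin_size must be consistent with the contact matrix, it can help to check the resolution of an input file before running CALDER. The sketch below is an assumption-based check, not a CALDER function: infer_bin_size (hypothetical) takes the smallest gap between distinct bin positions, which equals the bin size as long as at least two adjacent bins are present. Reading the gzipped demo file with data.table::fread assumes data.table and R.utils are installed; the block is skipped otherwise.

```r
## Hypothetical helper (not part of CALDER): infer the bin size as the
## smallest gap between distinct bin start positions in the contact records.
infer_bin_size <- function(records) {
  pos <- sort(unique(c(records[[1]], records[[2]])))
  min(diff(pos))
}

## system.file() returns "" when CALDER is not installed
demo_file <- system.file("extdata", "mat_chr22_10kb_ob.txt.gz", package = "CALDER")
if (nzchar(demo_file) &&
    requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("R.utils", quietly = TRUE)) {  # fread needs R.utils for .gz
  contacts <- data.table::fread(demo_file,
                                col.names = c("pos_x", "pos_y", "contact_value"))
  print(head(contacts))
  print(infer_bin_size(contacts))  # should agree with the 10 kb in the file name
}
```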
CALDER contains three modules: (1) computing compartment domains; (2) deriving their hierarchical organization and obtaining sub-compartments; (3) computing nested sub-domains within each compartment domain.
## compute compartment domains, sub-compartments and nested sub-domains in one run
CALDER_main(contact_mat_file, chr, bin_size, out_dir, sub_domains=TRUE, save_intermediate_data=FALSE)
## do not compute sub-domains, but save the intermediate_data that can be used to compute sub-domains later on
CALDER_main(contact_mat_file, chr, bin_size, out_dir, sub_domains=FALSE, save_intermediate_data=TRUE)
## (optional, depending on needs) compute sub-domains using the previously saved intermediate_data
CALDER_sub_domains(intermediate_data, chr, out_dir)
contact_mat_file: path to the contact matrix file of a chromosome
chr: chromosome number, either numeric or character; it will be added to the output name
bin_size: numeric, the size of a bin, consistent with the contact matrix
out_dir: the output directory
sub_domains: logical, whether to compute nested sub-domains
save_intermediate_data: logical. If TRUE, an intermediate_data file will be saved. This file can be used for computing nested sub-domains later on

In the output, compartment_label encodes the hierarchy of compartment domains; for example, B.2.2.2 and B.2.2.1 are two sub-branches of B.2.2. The pos_end column specifies all compartment domain borders, except when it is marked as gap, which indicates the border of a gap chromosome region that has too few contacts and was excluded from the analysis (e.g., due to low mappability, deletion, or technical flaws).

Computational requirements: running CALDER on the GM12878 Hi-C dataset at a bin size of 40 kb took 36 minutes to derive the compartment domains and their hierarchy for all chromosomes (i.e., CALDER Step 1 and Step 2) and 13 minutes to derive the nested sub-domains (i.e., CALDER Step 3). At a bin size of 10 kb, these steps took 1 h 44 min and 55 min, respectively (server: 40 cores, 64 GB RAM, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz). These timings were measured using a single core, although CALDER can be run in parallel.
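The dotted compartment labels described above can be navigated with simple string operations. The helpers below are hypothetical (not part of CALDER) and assume only what the example states: each dot-separated field adds one level of branching, so B.2.2.2 and B.2.2.1 are children of B.2.2.

```r
## Hypothetical helpers (not part of CALDER) for dotted labels such as "B.2.2.2".
label_parent <- function(label) {
  ## drop the last ".<field>"; top-level labels such as "B" have no parent
  ifelse(grepl(".", label, fixed = TRUE), sub("\\.[^.]*$", "", label), NA_character_)
}

label_depth <- function(label) {
  ## number of dot-separated fields, e.g. "B.2.2.2" -> 4
  lengths(strsplit(label, ".", fixed = TRUE))
}

is_descendant <- function(label, ancestor) {
  ## require the trailing dot so that "B.2.20" is not treated as a child of "B.2.2"
  startsWith(label, paste0(ancestor, "."))
}

labels <- c("B.2.2.2", "B.2.2.1", "B.2.20", "A")
print(label_parent(labels))
print(label_depth(labels))
print(is_descendant(labels, "B.2.2"))
```

Grouping domains by label_parent() collapses sibling sub-branches into their common parent, which is one way to compare sub-compartments at a coarser level.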
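library(CALDER)
## path to the demo contact matrix shipped with the package
contact_mat_file = system.file("extdata", "mat_chr22_10kb_ob.txt.gz", package = 'CALDER')
## run all three modules on chr22 at 10 kb resolution; results are written to the current directory
CALDER_main(contact_mat_file, chr=22, bin_size=10E3, out_dir='./', sub_domains=TRUE, save_intermediate_data=FALSE)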
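Since CALDER processes one chromosome per call and can be run in parallel, a whole genome can be handled by looping over chromosomes. The sketch below is a hypothetical wrapper, not part of CALDER: the per-chromosome file naming (mirroring the demo file) is an assumption, and it uses parallel::mclapply, which forks worker processes and therefore only supports mc.cores = 1 on Windows. The run_fun argument exists so the wrapper can be tried without CALDER installed.

```r
## Hypothetical batch runner (not part of CALDER): one CALDER_main job per chromosome.
## The per-chromosome file naming below mirrors the demo file and is an assumption.
run_calder_all <- function(chrs, mat_dir, bin_size, out_dir, mc.cores = 1,
                           run_fun = CALDER::CALDER_main) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  run_one <- function(chr) {
    contact_mat_file <- file.path(mat_dir,
      sprintf("mat_chr%s_%skb_ob.txt.gz", chr, bin_size / 1000))
    ## chr is added to the output names, so all chromosomes write to one out_dir
    run_fun(contact_mat_file, chr = chr, bin_size = bin_size, out_dir = out_dir,
            sub_domains = TRUE, save_intermediate_data = FALSE)
  }
  ## forked workers: mc.cores > 1 is not supported on Windows
  parallel::mclapply(chrs, run_one, mc.cores = mc.cores)
}
```

For example, run_calder_all(c(1:22, "X"), mat_dir = "contacts", bin_size = 40E3, out_dir = "calder_out", mc.cores = 4) would launch up to four chromosomes at a time; memory use grows with the number of concurrent jobs.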
The saved .bed files can be viewed directly.
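To inspect the .bed output in R rather than in a viewer, a small reader is enough. The sketch below is hypothetical and assumes the files follow the standard tab-separated BED layout (chrom, start, end, then optional columns such as a name) with optional track or browser header lines; the exact columns CALDER writes are not specified here.

```r
## Hypothetical reader (not part of CALDER), assuming standard tab-separated BED.
read_bed <- function(path) {
  lines <- readLines(path)
  ## drop optional browser/track header lines
  lines <- lines[!grepl("^(track|browser)", lines)]
  bed <- read.table(text = lines, sep = "\t", header = FALSE,
                    quote = "", comment.char = "")
  names(bed)[1:3] <- c("chrom", "start", "end")
  bed
}
```

Domain lengths then follow as bed$end - bed$start.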
If you use CALDER in your work, please cite: [ref to be added]