Authors: Rafal Zaborowski, Bartek Wilczynski
Institution: Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
License: MIT + file LICENSE
For more information please contact r.zaborowski@mimuw.edu.pl or bartek@mimuw.edu.pl
DIADEM is an R package for differential analysis of Hi-C data. It takes advantage of the significant correlations between the main diagonals of different Hi-C data sets (cell lines, experiments, etc.). The number of diagonals used (the maximum genomic distance between interacting regions) depends on the chromosome and on data quality, but it is usually about 5% of the total number of bins in a given chromosome contact map. DIADEM uses a GLM to model the relationship between corresponding cells of a pair of Hi-C data sets at a given genomic distance and then quantifies deviations from the model in a probabilistic way. The only required input is raw Hi-C contact map files in numpy npz format.
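To make the idea concrete, here is a minimal sketch in base R on simulated data. It is not DIADEM's implementation: the Poisson family, the simulated matrices and the helper `kth_diagonal` are assumptions made purely for illustration.

```r
# Illustrative sketch only (base R, simulated counts); not DIADEM code.
set.seed(1)
n <- 200                                       # number of bins (hypothetical)
d <- abs(outer(seq_len(n), seq_len(n), "-"))   # bin-to-bin distance
mu <- 200 / (d + 1)                            # expected counts decay with distance
map.a <- matrix(rpois(n * n, mu), n, n)        # first simulated contact map
map.b <- matrix(rpois(n * n, 1.5 * mu), n, n)  # second map, higher coverage

# hypothetical helper: cells at genomic distance k (the k-th upper diagonal)
kth_diagonal <- function(m, k) m[col(m) - row(m) == k]

max.dist <- round(0.05 * n)                    # about 5% of bins, as in the text
k <- 3                                         # any distance from 1 to max.dist
x <- kth_diagonal(map.a, k)
y <- kth_diagonal(map.b, k)

# assumed model: Poisson GLM of counts in map B on counts in map A
fit <- glm(y ~ x, family = poisson())
expected <- fitted(fit)

# upper-tail probability P(Y >= y) of each observed cell under the fit
p.upper <- ppois(y - 1, lambda = expected, lower.tail = FALSE)
head(sort(p.upper))
```

Cells with very small tail probabilities are those that deviate most from the fitted relationship at that distance. See the manuscript for the model DIADEM actually uses.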
For more details, examples and a quick start, refer to the vignette (invoke `browseVignettes(package="DIADEM")`). You can also browse the documentation of individual functions or objects within the package using standard R syntax (e.g. `help(foo)` or `?foo`), or have a look at the reference manual. To produce it, invoke `R CMD Rd2pdf path-to-package-directory` from the command line, specifying the path where the package has been installed, usually something like ~/R/x86_64-pc-linux-gnu-library/3.6.2/DIADEM. This creates the reference manual file DIADEM.pdf in the directory where you invoked the build command.
An in-depth description of our model, together with detailed analysis and motivation, is given in the manuscript available at: https://www.biorxiv.org/content/10.1101/654699v3.
The code is written in R, but data storage is done with numpy, so the main requirements are (versions for which tests were performed are given in parentheses):
Additionally, the following R packages are required:
The following additional packages are required to run the examples, produce plots and build the vignette:
NOTE: Some of the above R packages require GSL (GNU Scientific Library). Before installation, make sure that libgsl-dev is installed (`sudo apt-get install libgsl-dev` on Ubuntu).
Two ways of installation are possible (both require the R package devtools to be installed):
from the GitHub repository:
```r
devtools::install_github("rz6/DIADEM", build_vignettes = TRUE)
```
from source: clone the repository (:warning: NOTE: it must be cloned with the --recursive flag, i.e. `git clone --recursive https://github.com/rz6/DIADEM.git`), by default to directory: diadem, cd to the directory containing the cloned repo, open R and run:
```r
devtools::install("diadem", build_vignettes = TRUE)
```
:warning: This repository contains a submodule, which must be cloned as well for the package to compile. Therefore, this repository MUST be cloned with the --recursive flag.
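Before running either installation command, it may help to confirm that devtools is available. A minimal check in base R (the variable name `has.devtools` is an assumption made for illustration):

```r
# TRUE if the devtools namespace can be loaded, FALSE otherwise
has.devtools <- requireNamespace("devtools", quietly = TRUE)
if (!has.devtools) {
  message("devtools is not installed; run install.packages(\"devtools\") first")
}
```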
Import the DIADEM package and list the functions it exports:
library("DIADEM")
getNamespaceExports("DIADEM")
A good introduction with some examples and a more precise description can be found in the vignette. To open it, call:
browseVignettes(package="DIADEM")
If the above does not render the vignette, you can find it in your package installation directory (typically something like ~/R/x86_64-pc-linux-gnu-library/3.6.2/DIADEM) under the path doc/DIADEM.html.
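If you are unsure where the package was installed, base R can report the location. The sketch below uses `system.file()`, which returns an empty string when the package or file cannot be found. The path doc/DIADEM.html is the one given above, and the variable names are only illustrative.

```r
# installation directory of the package ("" if DIADEM is not installed)
pkg.dir <- system.file(package = "DIADEM")
# expected location of the rendered vignette inside that directory
vignette.path <- system.file("doc", "DIADEM.html", package = "DIADEM")

if (nzchar(vignette.path)) {
  message("Vignette found at: ", vignette.path)
} else {
  message("Vignette not found; was the package built with build_vignettes = TRUE?")
}
```

The value of `pkg.dir` is also the directory to pass to `R CMD Rd2pdf` when building the reference manual.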
The DIADEM package contains a sample Hi-C contact map as a built-in R dataset.
It can be accessed as shown below:
Hi-C contact maps in sparse format
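```r
library("DIADEM")

# load the sample dataset shipped with the package
data(sample_hic_data, package = "DIADEM")
msc.df <- sample_hic_data[["MSC-HindIII-1"]]
# file in the session's temporary directory that will hold the npz data
mtx.fname.msc <- file.path(tempdir(), "MSC-HindIII-1_40kb-raw.npz")
chr.sizes <- sample_hic_data[["chromosome.sizes"]]
# convert the contact map of every chromosome from sparse to dense format
l <- lapply(names(msc.df), function(chromosome) {
  sparse2dense(msc.df[[chromosome]], N = chr.sizes[[chromosome]])
})
names(l) <- names(msc.df)
# save the dense matrices to the npz file
save_npz(l, mtx.fname.msc)
```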
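As a rough illustration of what a sparse-to-dense conversion does, the base R sketch below fills a symmetric matrix from a table of non-zero cells. The column names `i`, `j` and `val` and the toy values are assumptions for illustration. They are not necessarily the layout used by DIADEM, and this is not the implementation of `sparse2dense`.

```r
# toy sparse representation: one row per non-zero cell (hypothetical columns)
sparse <- data.frame(i = c(1, 1, 2, 3),
                     j = c(1, 3, 2, 4),
                     val = c(10, 2, 7, 5))
N <- 4                                           # matrix dimension (number of bins)

dense <- matrix(0, nrow = N, ncol = N)
dense[cbind(sparse$i, sparse$j)] <- sparse$val   # fill the listed cells
dense[cbind(sparse$j, sparse$i)] <- sparse$val   # mirror them to keep symmetry
dense
```

The sparse form stores only the listed cells, which is why it is much smaller than the dense matrix when most of the contact map is empty.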
Reading Hi-C matrices from npz file
```r
sparse.msc <- read_npz(mtx.fname.msc)
dense.msc <- read_npz(mtx.fname.msc, sparse.format = FALSE)
```