In tanaylab/shaman: shaman - Sampling HiC contAct Matrices for Aparametric Normalization

This package provides a normalization scheme, along with basic analysis and statistical visualization of HiC experimental data. The normalization workflow consists of the following steps:

Start with observed HiC data.
Shuffle observed data to generate expected data.
Compute the normalized score of each observed data point.
Visualize normalized contact maps.
Quantify contact enrichment acoording to features.

Installation

Requirements:

The native environment for this package is the misha track database, i.e. start with an observed 2D track, generate ane expected 2D track and compute a score track. This process is CPU intensive and therefore requires SGE to be availabe, for distributed computing. However, inline functions which compute the expected and score values for a specific region are provided as well. The requirements are for all functions to work:

Perl
R packages:
- devtools.
- misha.
- RANN.
- data.table.
- ggplot2
- reshape2
- grid
- Gviz

Installing misha package:

Since misha package is not in CRAN, we will need to install it directly as follows: Install the package:

install.packages("http://www.wisdom.weizmann.ac.il/~nettam/shaman/misha_3.4.3.tar.gz", repos=NULL) #Installs misha package from file

Installing Gviz from bioconductor:

source("https://bioconductor.org/biocLite.R")
biocLite("Gviz")

Installing shaman package:

Download and install shaman:

devtools::install_bitubucket("tanaylab/shaman", ref='default', vignette=TRUE)
library(shaman)

Options

The following options are available for this package:

system parameters

shaman.sge_support = 0 - Set to 1 if Sun Grid Engine is available for distributed computing.
shaman.sge_flags = "" - SGE paramters to be passed to qsub (e.g. threads, queue, etc').

Workflow

Distributed mode (based on sun grid engine)

Set configuration parameter "shaman.sge_support" to be 1.
Generate expected HiC misha 2D track

shuffle_hic_track(track_db="db", obs_track_nm="obs", work_dir="work_dir")

Generate score 2D track

score_hic_track(track_db="db", work_dir="work_dir", score_track_nm="score_track", obs_track_nms=c("obs"))

Visualize region

point_score = gextract("score_track", region, colnames="score")
plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))

Example of normalized map with annotations

Local mode

Generating an expected matrix and computing its score for a given interval: a local computation is possible on a small region:

point_score = shuffle_and_score_hic_mat(obs_track_nms="obs", interval=interval, work_dir="work_dir")

Visualize region

plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))

Spatial contact enrichment - used to measure contact enrichemnt between two sets of genomic features.

Relies on an existance of an expected (shuffled) 2d track. Builds a grid comprising of all combinations of intervals from feature 1 and feature 2 that fall within a band defined by min_dist and max_dist. For each point on the grid, look at th surrounding window, defined by range parameter. Discard all windows that do not contain a point with a score (defined in scotre_track_nm) above the score_filter parameter. This allows for focusing on potentially enriched pairs. Discect the window into small bins, size in base pairs defined by the resolution parameter, and count the number of observed contacts, and the number of expected contacts in each bin. All windows are then summed together, generating a single matrix of observed and expected contacts, which is returned by function. Note that grid contains only points in which feature 1 position is smaller than feature 2 position.

Create data grid for two sets of features, and visualize it:

grid = shaman_generate_feature_grid(feature1, feature2, obs_track, exp_track, range=25000, resolution=500)
shaman_plot_feature_grid(grid, range=25000, grid_resolution=500, plot_resolution=1000)

Example of spatial contact enrichment for converging CTCF pairs: log10(obs) on the left, log2(obs/exp) on the right

tanaylab/shaman documentation built on April 2, 2022, 1:32 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

tanaylab/shaman
shaman - Sampling HiC contAct Matrices for Aparametric Normalization

In tanaylab/shaman: shaman - Sampling HiC contAct Matrices for Aparametric Normalization

Installation

Requirements:

Installing misha package:

Installing Gviz from bioconductor:

Installing shaman package:

Options

system parameters

Workflow

Distributed mode (based on sun grid engine)

Local mode

Spatial contact enrichment - used to measure contact enrichemnt between two sets of genomic features.

R Package Documentation

Browse R Packages

We want your feedback!

tanaylab/shaman shaman - Sampling HiC contAct Matrices for Aparametric Normalization

In tanaylab/shaman: shaman - Sampling HiC contAct Matrices for Aparametric Normalization

Installation

Requirements:

Installing misha package:

Installing Gviz from bioconductor:

Installing shaman package:

Options

system parameters

Workflow

Distributed mode (based on sun grid engine)

Local mode

Spatial contact enrichment - used to measure contact enrichemnt between two sets of genomic features.

R Package Documentation

Browse R Packages

We want your feedback!

tanaylab/shaman
shaman - Sampling HiC contAct Matrices for Aparametric Normalization