This package provides a normalization scheme, along with basic analysis and statistical visualization of HiC experimental data. The normalization workflow consists of the following steps:

  1. Start with observed HiC data.
  2. Shuffle observed data to generate expected data.
  3. Compute the normalized score of each observed data point.
  4. Visualize normalized contact maps.
  5. Quantify contact enrichment according to features.

Installation

Requirements:

The native environment for this package is the misha track database: you start with an observed 2D track, generate an expected 2D track, and compute a score track. This process is CPU-intensive and therefore requires SGE (Sun Grid Engine) to be available for distributed computing. However, inline functions that compute the expected and score values for a specific region are provided as well. The requirements for all functions to work are:

Installing misha package:

Since the misha package is not on CRAN, install it directly from the archive file:

install.packages("http://www.wisdom.weizmann.ac.il/~nettam/shaman/misha_3.4.3.tar.gz", repos=NULL) #Installs misha package from file

Installing Gviz from Bioconductor:

source("https://bioconductor.org/biocLite.R")
biocLite("Gviz")
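On recent R versions, Bioconductor packages are installed through the BiocManager package from CRAN, which replaced biocLite as of Bioconductor 3.8. A sketch of the equivalent Gviz installation, assuming such a version of R is in use:

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install Gviz from Bioconductor.
BiocManager::install("Gviz")
```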

Installing shaman package:

Download and install shaman:

devtools::install_bitbucket("tanaylab/shaman", ref='default', vignette=TRUE)
library(shaman)
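Because the distributed functions depend on SGE, it can be useful to check whether it is reachable from the R session before choosing a mode. A minimal sketch, assuming the SGE installation exposes its usual `qsub` submission command on the PATH; the helper `has_command` is hypothetical and not part of shaman:

```r
# Hypothetical helper (not part of shaman): TRUE if `cmd` is found on the PATH.
has_command <- function(cmd) {
  unname(nzchar(Sys.which(cmd)))
}

# Assumption: SGE provides the `qsub` command for job submission.
if (has_command("qsub")) {
  message("qsub found: distributed mode should be usable")
} else {
  message("qsub not found: consider the inline (local) functions")
}
```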

Options

The following options are available for this package:

system parameters

Workflow

Distributed mode (based on Sun Grid Engine)

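# Shuffle the observed track to generate the expected track (runs on SGE)
shuffle_hic_track(track_db="db", obs_track_nm="obs", work_dir="work_dir")
# Compute the normalized score track from the observed and expected data
score_hic_track(track_db="db", work_dir="work_dir", score_track_nm="score_track", obs_track_nms=c("obs"))
# Extract the scores within the region of interest
point_score = gextract("score_track", region, colnames="score")
# Plot the normalized map together with the track and annotation panels
plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))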

Example of normalized map with annotations

Local mode

# Shuffle and score the observed data for a single interval, locally
point_score = shuffle_and_score_hic_mat(obs_track_nms="obs", interval=interval, work_dir="work_dir")
# Plot the normalized map together with the track and annotation panels
plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))
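
Both modes take a 2D genomic interval (`region`, `interval`) that the snippets do not define. A sketch of how one might be created with misha, assuming an hg19 track database already exists; the database path and the coordinates below are hypothetical placeholders:

```r
library(misha)

# Hypothetical path: the root directory of an existing misha track database.
gsetroot("/path/to/trackdb")

# Hypothetical coordinates: a 2D interval spanning chr1:1,000,000-2,000,000
# on both axes.
region <- gintervals.2d("chr1", 1e6, 2e6, "chr1", 1e6, 2e6)

# The same interval can be passed wherever `interval` is expected.
interval <- region
```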

Spatial contact enrichment - used to measure contact enrichment between two sets of genomic features.

Relies on the existence of an expected (shuffled) 2D track. Builds a grid comprising all combinations of intervals from feature 1 and feature 2 that fall within a band defined by min_dist and max_dist. For each point on the grid, looks at the surrounding window, defined by the range parameter. Discards all windows that do not contain a point with a score (taken from the track given in score_track_nm) above the score_filter parameter. This focuses the analysis on potentially enriched pairs. Dissects each window into small bins, whose size in base pairs is defined by the resolution parameter, and counts the number of observed contacts and the number of expected contacts in each bin. All windows are then summed together, generating a single matrix of observed and expected contacts, which is returned by the function. Note that the grid contains only points in which the feature 1 position is smaller than the feature 2 position.
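
The grid construction and window binning described here can be illustrated on toy data in base R. This is only a sketch under stated assumptions, not shaman's implementation: features are reduced to single positions on one chromosome, the distance band is treated as inclusive, the score_filter step is omitted, and all variable and function names are hypothetical.

```r
# Toy feature positions (bp) on a single chromosome.
feature1 <- c(100000, 250000, 900000)
feature2 <- c(120000, 400000, 950000)
min_dist <- 50000
max_dist <- 500000
win_range <- 25000    # half-width of the window around each grid point
resolution <- 5000    # bin size in bp
n_bins <- 2 * win_range / resolution

# All feature 1 x feature 2 combinations, keeping only points where
# feature 1 is upstream of feature 2 and the distance is within the band.
grid <- expand.grid(pos1 = feature1, pos2 = feature2)
d <- grid$pos2 - grid$pos1
grid <- grid[d >= min_dist & d <= max_dist, ]

# Toy contacts: each row is one contact at coordinates (x, y).
observed <- data.frame(x = c(101000, 99000, 251000, 903000, 500000),
                       y = c(401000, 398000, 395000, 951000, 600000))
expected <- data.frame(x = c(100500, 249000),
                       y = c(400500, 402000))

# Count the contacts falling in each bin of the window centred on (px, py).
count_window <- function(px, py, contacts) {
  dx <- contacts$x - px
  dy <- contacts$y - py
  inside <- dx >= -win_range & dx < win_range &
            dy >= -win_range & dy < win_range
  bx <- floor((dx[inside] + win_range) / resolution) + 1
  by <- floor((dy[inside] + win_range) / resolution) + 1
  m <- matrix(0, n_bins, n_bins)
  for (i in seq_along(bx)) m[bx[i], by[i]] <- m[bx[i], by[i]] + 1
  m
}

# Sum the per-window matrices over all grid points.
sum_windows <- function(contacts) {
  Reduce(`+`, Map(function(px, py) count_window(px, py, contacts),
                  grid$pos1, grid$pos2))
}

obs_total <- sum_windows(observed)
exp_total <- sum_windows(expected)
```

Summing windows that are centred on different grid points is what lets weak but consistent enrichment around feature pairs accumulate into a single matrix.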

Create data grid for two sets of features, and visualize it:

grid = shaman_generate_feature_grid(feature1, feature2, obs_track, exp_track, range=25000, resolution=500)
shaman_plot_feature_grid(grid, range=25000, grid_resolution=500, plot_resolution=1000)

Example of spatial contact enrichment for converging CTCF pairs: log10(obs) on the left, log2(obs/exp) on the right
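
The two quantities named in the caption can be computed directly from summed count matrices. A small sketch on hypothetical toy matrices, which do not reflect the actual structure of the object returned by shaman:

```r
# Hypothetical summed counts per bin (toy values).
obs_counts <- matrix(c(8, 4, 2, 16), nrow = 2)
exp_counts <- matrix(c(4, 4, 4, 4), nrow = 2)

# Left panel quantity: log10 of the observed counts.
log_obs <- log10(obs_counts)

# Right panel quantity: log2 ratio of observed to expected counts.
# Positive values mean more contacts than expected, negative values fewer.
enrichment <- log2(obs_counts / exp_counts)
```

Bins with zero counts give infinite or undefined values under these transformations, so real data may need such bins to be masked or otherwise handled before plotting.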



tanaylab/shaman documentation built on April 2, 2022, 1:32 a.m.