This package provides a normalization scheme, along with basic analysis and statistical visualization of HiC experimental data. The normalization workflow consists of the following steps:
The native environment for this package is the misha track database: start with an observed 2D track, generate an expected 2D track, and compute a score track. This process is CPU intensive and therefore requires SGE (Sun Grid Engine) to be available for distributed computing. However, inline functions that compute the expected and score values for a specific region are provided as well. The requirements for all functions to work are:
Since the misha package is not on CRAN, install it directly as follows:
install.packages("http://www.wisdom.weizmann.ac.il/~nettam/shaman/misha_3.4.3.tar.gz", repos=NULL) #Installs misha package from file
source("https://bioconductor.org/biocLite.R")
biocLite("Gviz")
Download and install shaman:
devtools::install_bitbucket("tanaylab/shaman", ref='default', vignette=TRUE)
library(shaman)
The following options are available for this package:
shuffle_hic_track(track_db="db", obs_track_nm="obs", work_dir="work_dir")
score_hic_track(track_db="db", work_dir="work_dir", score_track_nm="score_track", obs_track_nms=c("obs"))
point_score = gextract("score_track", region, colnames="score")
plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))
point_score = shuffle_and_score_hic_mat(obs_track_nms="obs", interval=interval, work_dir="work_dir")
plot_map_score_with_annotations("hg19", point_score$points, region, misha_tracks=list("K56.k27ac", "rna-seq"), annotations=list("ctcf_pos", "ctcf_neg"), a_colors=c("#4572A7", "#AA4643"))
Relies on the existence of an expected (shuffled) 2D track. Builds a grid of all combinations of intervals from feature 1 and feature 2 that fall within a band defined by min_dist and max_dist. For each point on the grid, looks at the surrounding window, whose size is defined by the range parameter. Discards all windows that do not contain a point with a score (taken from the track named in score_track_nm) above the score_filter parameter. This focuses the analysis on potentially enriched pairs. Dissects each remaining window into small bins, whose size in base pairs is defined by the resolution parameter, and counts the number of observed contacts and the number of expected contacts in each bin. All windows are then summed together, generating a single matrix of observed and expected contacts, which is returned by the function. Note that the grid contains only points in which the feature 1 position is smaller than the feature 2 position.
Create data grid for two sets of features, and visualize it:
grid = shaman_generate_feature_grid(feature1, feature2, obs_track, exp_track, range=25000, resolution=500)
shaman_plot_feature_grid(grid, range=25000, grid_resolution=500, plot_resolution=1000)
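To make the feature-grid procedure described above more concrete, here is a minimal toy sketch in base R. It is not shaman's implementation: the function toy_feature_grid, its arguments, and the toy data frames are hypothetical, and contacts are treated as plain (x, y) points on a single chromosome rather than misha 2D tracks. It only illustrates the band filter, the score filter, and the per-bin summation of observed and expected contacts.

```r
# Toy sketch only: all names below are hypothetical and not part of shaman.
toy_feature_grid <- function(f1, f2, obs_pts, exp_pts, score_pts,
                             min_dist, max_dist, win_range, resolution,
                             score_filter) {
  # Assumption: the window width is a whole number of bins.
  stopifnot((2 * win_range) %% resolution == 0)
  n_bins <- (2 * win_range) %/% resolution

  # All (feature 1, feature 2) pairs with p1 < p2 inside the distance band.
  pairs <- expand.grid(p1 = f1, p2 = f2)
  d <- pairs$p2 - pairs$p1
  pairs <- pairs[d > 0 & d >= min_dist & d <= max_dist, , drop = FALSE]

  # TRUE for points that fall in the window centred on grid point (p1, p2).
  in_window <- function(pts, p1, p2) {
    dx <- pts$x - p1
    dy <- pts$y - p2
    dx >= -win_range & dx < win_range & dy >= -win_range & dy < win_range
  }

  # Count the points of one window in an n_bins x n_bins matrix of offsets.
  bin_counts <- function(pts, p1, p2) {
    keep <- in_window(pts, p1, p2)
    ix <- floor((pts$x[keep] - p1 + win_range) / resolution) + 1
    iy <- floor((pts$y[keep] - p2 + win_range) / resolution) + 1
    matrix(tabulate((iy - 1) * n_bins + ix, nbins = n_bins^2), n_bins, n_bins)
  }

  obs_sum <- matrix(0, n_bins, n_bins)
  exp_sum <- matrix(0, n_bins, n_bins)
  for (i in seq_len(nrow(pairs))) {
    p1 <- pairs$p1[i]
    p2 <- pairs$p2[i]
    # Discard windows with no scored point above score_filter.
    win_scores <- score_pts$score[in_window(score_pts, p1, p2)]
    if (!any(win_scores > score_filter)) next
    # Sum observed and expected counts over all retained windows.
    obs_sum <- obs_sum + bin_counts(obs_pts, p1, p2)
    exp_sum <- exp_sum + bin_counts(exp_pts, p1, p2)
  }
  list(obs = obs_sum, exp = exp_sum)
}

# Made-up toy data: positions in base pairs on one chromosome.
f1 <- c(1000, 5000)
f2 <- c(3000, 9000)
obs_pts <- data.frame(x = c(1000, 1100, 5050, 900),
                      y = c(3000, 3200, 9000, 2950))
exp_pts <- data.frame(x = c(1000, 4900), y = c(3100, 9100))
score_pts <- data.frame(x = c(1000, 5000), y = c(3000, 9000),
                        score = c(80, 10))

res <- toy_feature_grid(f1, f2, obs_pts, exp_pts, score_pts,
                        min_dist = 1000, max_dist = 5000,
                        win_range = 500, resolution = 250,
                        score_filter = 50)
res$obs
res$exp
```

In this toy input only the pair (1000, 3000) passes both the distance band and the score filter, so the returned matrices hold the contacts of that single window. With real data many windows are summed into the same matrices.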