shaman_shuffle_hic_mat_for_track: Generate an expected matrix from observed data as a process...

shaman_shuffle_hic_mat_for_track    R Documentation

Generate an expected matrix from observed data as a process for generating an expected track

Description

shuffle_hic_mat_for_track

Usage

shaman_shuffle_hic_mat_for_track(
  track_db,
  track,
  work_dir,
  chrom,
  start1,
  end1,
  start2,
  end2,
  min_dist = 1024,
  max_dist = max(gintervals.all()$end),
  dist_resolution = NA,
  decay_smooth = NA,
  proposal_iterations = 10000000,
  shuffle = 80,
  hic_mcmc_max_resolution = 400,
  raw_ext = "raw",
  shuffled_ext = "shuffled",
  grid_small = 500000,
  grid_high = 1000000,
  grid_increase = 500000,
  grid_step_iter = 40,
  sort_uniq = FALSE
)

Arguments

track_db

Directory of the misha database.

track

Name of observed 2D genomic track for the hic data.

work_dir

Centralized directory to store temporary files.

chrom

The chromosome of the matrix.

start1

The start coordinate of the first dimension.

end1

The end coordinate of the first dimension.

start2

The start coordinate of the second dimension.

end2

The end coordinate of the second dimension.

min_dist

The minimum distance between contact end points.

max_dist

The maximum distance between contact end points.

dist_resolution

Number of bins in each log2 distance unit. If NA, value is determined based on observed data (recommended).

decay_smooth

Number of bins to use for smoothing the MCMC target function: the decay curve. If NA, value is determined based on observed data (recommended).

proposal_iterations

Number of MCMC sampling iterations between proposal corrections.

shuffle

Number of shuffling rounds for each observed point.

hic_mcmc_max_resolution

Maximum number of bins for each log2 distance unit.

raw_ext

File extension of the observed data.

shuffled_ext

File extension of the shuffled data.

grid_small

Initial size of the maximum distance between contact pairs considered for switching.

grid_high

Final size of the maximum distance between contact pairs considered for switching.

grid_increase

Increment by which the grid size is increased.

grid_step_iter

Number of iterations at each grid size.

sort_uniq

Binary flag, indicating whether the shuffled matrix file should be sorted and contacts combined. This is required prior to importing the track to misha, and should be applied to full chromosomes only.
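
The effect described for sort_uniq can be illustrated on a toy table. This is only a sketch in base R, not the package's implementation: the column names are hypothetical, and it assumes that "combining" contacts means summing the counts of identical coordinate pairs.

```r
# Toy shuffled contacts; column names are assumptions made for this sketch.
contacts <- data.frame(
  chrom1 = c("chr1", "chr1", "chr1", "chr1"),
  start1 = c(5000, 1000, 5000, 1000),
  chrom2 = c("chr1", "chr1", "chr1", "chr1"),
  start2 = c(9000, 3000, 9000, 7000),
  obs    = c(1, 2, 3, 1)
)

# Combine duplicate contacts by summing their counts (assumed meaning of "combined").
combined <- aggregate(obs ~ chrom1 + start1 + chrom2 + start2,
                      data = contacts, FUN = sum)

# Sort by the first coordinate, then by the second.
combined <- combined[order(combined$chrom1, combined$start1, combined$start2), ]
rownames(combined) <- NULL
print(combined)
```

The duplicated pair (5000, 9000) collapses into a single row, so the result has one row per unique contact, ordered by coordinate.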

Details

This function generates an expected 2D hic matrix from observed hic data. It should not be called externally. The observed data is a combination of the observed contacts in scope and the already shuffled near-cis contacts (stored in work_dir), from which we sample to maintain the decay probability curve.
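
To make the notion of a decay curve over log2 distance bins concrete, the following base R sketch bins contact distances using a fixed number of bins per log2 unit and tabulates the fraction of contacts per bin. The helper log2_dist_bin and the value dist_resolution = 4 are hypothetical and chosen for illustration; the package's internal binning and smoothing may differ.

```r
# Hypothetical helper: assign each contact distance to a log2-spaced bin,
# with `dist_resolution` bins per log2 unit, starting at `min_dist`.
log2_dist_bin <- function(dist, min_dist = 1024, dist_resolution = 4) {
  floor((log2(dist) - log2(min_dist)) * dist_resolution) + 1
}

# Toy distances between contact end points (in bp).
dist <- c(1024, 1500, 2048, 4096, 4096, 100000)
bins <- log2_dist_bin(dist)

# Empirical decay curve: fraction of contacts falling into each distance bin.
decay <- table(bins) / length(dist)
print(decay)
```

A shuffling procedure that maintains the decay curve would aim to leave these per-bin fractions approximately unchanged.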

Each step creates temporary files of the shuffled matrices which are then joined to a track. Temporary files are deleted upon track creation.
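
The write, join, and delete pattern described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the file names and columns are hypothetical, a temporary directory stands in for work_dir, and a data frame stands in for the misha track that the real function creates.

```r
# Stand-in for work_dir; the real function receives this path as an argument.
work_dir <- tempfile("shaman_work_")
dir.create(work_dir)

# Each step writes one temporary file (hypothetical names and columns).
for (step in 1:3) {
  part <- data.frame(start1 = step * 1000, start2 = step * 2000, obs = 1)
  write.table(part, file.path(work_dir, paste0("step_", step, ".shuffled")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Join the per-step files into a single table (stand-in for track creation).
files <- list.files(work_dir, pattern = "\\.shuffled$", full.names = TRUE)
joined <- do.call(rbind, lapply(files, read.delim))

# Remove the temporary files once the joined result exists.
unlink(work_dir, recursive = TRUE)
print(joined)
```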


tanaylab/shaman documentation built on April 2, 2022, 1:32 a.m.