TADpole: Call hierarchical TADs


View source: R/TADpole.R

Description

Computes a constrained hierarchical clustering of genomic regions in a Hi-C experiment, selecting the optimal amount of information to retain from the Hi-C matrix and the most informative number of TADs.

Usage

TADpole(
  mat_file,
  max_pcs = 200,
  min_clusters = 2,
  bad_frac = 0.01,
  chr,
  start,
  end,
  resol,
  centromere_search = FALSE
)

Arguments

mat_file

Path to the input file, which must be a tab-delimited matrix.

max_pcs

The maximum number of principal components to retain for the analysis.

min_clusters

Minimum number of clusters into which to partition the chromosome.

bad_frac

Fraction of the matrix to flag as bad rows/columns.

chr

string with the chromosome name.

start

numeric start position of the region or of the chromosome.

end

numeric end position of the region or of the chromosome.

resol

numeric resolution/binning of the Hi-C experiment.

centromere_search

Split the matrix at the centromere into two smaller matrices representing the chromosomal arms. Useful when working with large (>15000 bins) datasets.

Details

The 'centromere_search' parameter splits the matrix in two at the region with the longest stretch of bad (low-signal) rows/columns. It does so regardless of whether this stretch represents a true centromere. This feature is useful when processing an entire chromosome, but be cautious about interpreting the partitions as the two chromosomal arms (p and q) when working with smaller regions.
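
To make the splitting rule concrete, the following is a minimal base-R sketch of how a longest stretch of bad bins could be located and used to cut a matrix in two. It is not TADpole's internal code: the helper name 'longest_bad_stretch' is hypothetical, and the rule for flagging a bin as bad (total row signal at or below the 'bad_frac' quantile) is an assumption made for illustration only.

```r
# Hypothetical helper, NOT part of the TADpole package.
# Returns the first and last bin index of the longest run of "bad" bins.
longest_bad_stretch <- function(mat, bad_frac = 0.01) {
  signal <- rowSums(mat)
  # Assumption: a bin is "bad" when its total signal is at or below
  # the bad_frac quantile of all row sums.
  bad <- signal <= quantile(signal, probs = bad_frac, names = FALSE)
  runs <- rle(bad)                      # consecutive runs of TRUE/FALSE
  ends <- cumsum(runs$lengths)          # last bin of each run
  starts <- ends - runs$lengths + 1     # first bin of each run
  bad_runs <- which(runs$values)        # indices of the runs made of bad bins
  longest <- bad_runs[which.max(runs$lengths[bad_runs])]
  c(start = starts[longest], end = ends[longest])
}

# Toy symmetric 10 x 10 matrix with an empty stretch at bins 5 to 7.
toy <- matrix(1, nrow = 10, ncol = 10)
toy[5:7, ] <- 0
toy[, 5:7] <- 0

gap <- longest_bad_stretch(toy, bad_frac = 0.3)

# Cut the matrix on either side of the gap.
left_idx <- seq_len(gap[["start"]] - 1)
right_idx <- (gap[["end"]] + 1):nrow(toy)
left_arm <- toy[left_idx, left_idx]
right_arm <- toy[right_idx, right_idx]
```

In this toy case the gap is found because it is the only low-signal stretch. On real data the same logic would pick the longest low-signal stretch wherever it lies, which is why the resulting halves should not automatically be read as the p and q arms.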

Value

A 'tadpole' object that defines the clustering of genomic regions.

Examples

# Files under inst/ are installed at the package root, so the lookup starts at "extdata".
mat_file <- system.file("extdata", "raw_chr18:460-606_20kb.tsv", package = "TADpole")
tadpole <- TADpole(mat_file, chr = "chr18", start = 496, end = 606, resol = 20000)

3DGenomes/TADpole documentation built on Jan. 30, 2020, 8:17 p.m.