Identify Topological Domains from a Hi-C Contact Matrix
data: A TopDomData object, or the pathname to a normalized Hi-C contact matrix file as read by
window.size: The number of bins to extend (as a non-negative integer). The recommended range is 5 to 20.
outFile: (optional) The filename, without extension, of the three result files optionally produced. See details below.
statFilter: (logical) Specifies whether non-significant topological-domain boundaries should be dropped.
...: Additional arguments passed to
debug: If
A named list of class TopDom with data.frame elements binSignal, domain, and bed.
The binSignal data frame (N-by-7) holds the mean contact frequency,
local extreme, and p-value for every bin. The first four columns
hold the basic bin information given by the matrix file: bin ID (id),
chromosome (chr), start coordinate (from.coord), and end coordinate
(to.coord). The last three columns (local.ext, mean.cf, and p-value)
hold values computed by the TopDom algorithm.
The columns are:
id: Bin ID
chr: Chromosome
from.coord: Start coordinate of bin
to.coord: End coordinate of bin
local.ext:
  -1: Local minimum.
  -0.5: Gap region.
  0: General bin.
  1: Local maximum.
mean.cf: Average of contact frequencies between lower and upper regions for bin i = 1,2,...,N.
p-value: p-value computed by the Wilcoxon rank-sum test.
See Shin et al. (2016) for more details.
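As a minimal sketch of how these columns might be used, the following builds a toy data frame with the seven columns listed above and picks out bins flagged as local minima with a small p-value. All values are invented for illustration (they are not TopDom output), and the 0.05 cutoff is an assumption made for this example only.

```r
## Toy stand-in for the binSignal element; all values are made up
binSignal <- data.frame(
  id         = 1:6,
  chr        = "chr19",
  from.coord = seq(0, by = 40e3, length.out = 6),
  to.coord   = seq(40e3, by = 40e3, length.out = 6),
  local.ext  = c(0, -1, 0, 1, -0.5, -1),
  mean.cf    = c(12.1, 3.4, 9.8, 15.2, 0, 4.0),
  `p-value`  = c(0.80, 0.01, 0.60, 0.90, 1.00, 0.20),
  check.names = FALSE  ## keep the column name "p-value" as documented
)

## Local minima (local.ext == -1) with a small p-value (assumed cutoff)
is_min     <- binSignal$local.ext == -1
is_signif  <- binSignal[["p-value"]] < 0.05
candidates <- binSignal[is_min & is_signif, ]
print(candidates)

## Gap bins are coded as -0.5
gaps <- binSignal[binSignal$local.ext == -0.5, ]
print(gaps$id)
```

Accessing the p-value column with `[[` avoids problems with the hyphen in its name.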
The domain data frame (D-by-7):
Every bin is categorized by basic building block, such as gap, domain,
or boundary.
Each row describes one basic building block.
The first five columns hold the basic information about the block;
the tag column indicates the class of the building block.
id: Identifier of block
chr: Chromosome
from.id: Start bin index of the block
from.coord: Start coordinate of the block
to.id: End bin index of the block
to.coord: End coordinate of the block
tag: Category of the block. Three types of block exist:
  gap
  domain
  boundary
size: Size of the block
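The block structure can be explored with base R. The sketch below uses a toy data frame with the columns listed above; the values are invented, and computing size as to.coord minus from.coord is an assumption made for illustration rather than something stated in this help page.

```r
## Toy stand-in for the domain element; all values are made up
domain <- data.frame(
  id         = 1:4,
  chr        = "chr19",
  from.id    = c(1L, 3L, 4L, 10L),
  from.coord = c(0, 80e3, 120e3, 360e3),
  to.id      = c(2L, 3L, 9L, 12L),
  to.coord   = c(80e3, 120e3, 360e3, 480e3),
  tag        = c("gap", "boundary", "domain", "domain"),
  stringsAsFactors = FALSE
)

## Assumption: block size is the coordinate span of the block
domain$size <- domain$to.coord - domain$from.coord

## Number of blocks per category
tag_counts <- table(domain$tag)
print(tag_counts)

## Largest block tagged as a domain
domains_only <- subset(domain, tag == "domain")
largest <- domains_only[which.max(domains_only$size), ]
print(largest)
```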
The bed data frame (D-by-4) is a representation of the domain data frame
in the BED file format.
It has four columns:
chrom: The name of the chromosome.
chromStart: The starting position of the feature in the chromosome.
  The first base in a chromosome is numbered 0.
chromEnd: The ending position of the feature in the chromosome.
  The chromEnd base is not included in the feature. For example,
  the first 100 bases of a chromosome are defined as chromStart=0,
  chromEnd=100, and span the bases numbered 0-99.
name: Defines the name of the BED line. This label is displayed to
  the left of the BED line in the UCSC Genome Browser window when the
  track is open to full display mode or directly to the left of the
  item in pack mode.
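The half-open, zero-based convention described for chromStart and chromEnd can be checked with a little arithmetic. The toy data frame below is invented for illustration; its first row is the "first 100 bases" example from the text.

```r
## Toy BED-style data frame; values are made up
bed <- data.frame(
  chrom      = c("chr19", "chr19"),
  chromStart = c(0L, 100L),
  chromEnd   = c(100L, 250L),
  name       = c("gap", "domain"),
  stringsAsFactors = FALSE
)

## Because chromEnd is excluded, the length is simply end minus start
feature_length <- bed$chromEnd - bed$chromStart

## The last base that belongs to each feature (zero-based numbering)
last_base <- bed$chromEnd - 1L

print(cbind(bed, feature_length, last_base))
```

With this convention, adjacent features can share a coordinate (here 100) without overlapping.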
If argument outFile is non-NULL, then the three elements (binSignal,
domain, and bed) returned are also written to tab-delimited files
with file names ‘<outFile>.binSignal’, ‘<outFile>.domain’, and
‘<outFile>.bed’, respectively. None of the files have row names,
and all but the BED file have column names.
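The file layout described above can be mimicked with base R's write.table(). This is a sketch of the layout only, not the package's own writer: the toy data frames are invented, the files go to a temporary location, and quote = FALSE is an assumption.

```r
## Hypothetical output prefix in a temporary directory
outFile <- tempfile("topdom_demo")

## Toy tables; values are made up
binSignal <- data.frame(id = 1:2, chr = "chr19", stringsAsFactors = FALSE)
bed <- data.frame(
  chrom = "chr19", chromStart = c(0L, 100L), chromEnd = c(100L, 250L),
  name = c("gap", "domain"), stringsAsFactors = FALSE
)

## Tab-delimited, no row names, with column names
write.table(binSignal, file = paste0(outFile, ".binSignal"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = TRUE)

## The BED file is written without column names
write.table(bed, file = paste0(outFile, ".bed"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

## Read the files back to confirm the layout
header_line <- readLines(paste0(outFile, ".binSignal"), n = 1L)
bed_back <- read.table(paste0(outFile, ".bed"), sep = "\t", header = FALSE,
                       stringsAsFactors = FALSE)
print(header_line)
print(bed_back)
```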
The window.size parameter is by design the only tuning parameter in the
TopDom method and affects the amount of smoothing applied when calculating
the TopDom bin signals. The binning window extends symmetrically downstream
and upstream from the bin such that the bin signal is the average of
window.size^2 contact frequencies.
For details, see Equation (1) and Figure 1 in Shin et al. (2016).
Typically, the number of identified TDs decreases while their average
length increases as this window-size parameter increases (Figure 2).
The default is window.size = 5 (bins), which is motivated as:
"Considering the previously reported minimum TD size (approx. 200 kb)
(Dixon et al., 2012) and our bin size of 40 kb, w[indow.size] = 5 is a
reasonable setting" (Shin et al., 2016).
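To make the averaging concrete, here is a minimal sketch of a bin signal computed as the mean of the window.size^2 contact frequencies between the window.size bins up to and including bin i and the window.size bins after it. The helper bin_signal() is hypothetical and is not the package's implementation; the toy matrix and the handling of bins near the matrix edges are assumptions. The last lines reproduce the arithmetic behind the quoted default.

```r
## Hypothetical helper: mean contact frequency between the upstream and
## downstream windows of bin i (truncated at the matrix edges; assumption)
bin_signal <- function(counts, i, window.size) {
  n <- nrow(counts)
  if (i >= n) return(NA_real_)            ## no downstream bins available
  upstream   <- max(1L, i - window.size + 1L):i
  downstream <- (i + 1L):min(n, i + window.size)
  mean(counts[upstream, downstream])
}

## Toy symmetric contact matrix: counts decay with distance from diagonal
counts <- outer(1:10, 1:10, function(a, b) 10 - abs(a - b))

## With window.size = 2, bin 5 averages the 2^2 = 4 entries
## counts[4:5, 6:7], that is 8, 7, 9, and 8
signal_5 <- bin_signal(counts, i = 5L, window.size = 2L)
print(signal_5)

## Arithmetic behind the quoted default: 200 kb minimum TD size
## divided by 40 kb bins
min_td_size <- 200e3
bin_size    <- 40e3
w <- min_td_size / bin_size
print(w)
```

A larger window averages over more entries further from the diagonal, which is consistent with the smoothing effect described above.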
Hanjun Shin, Harris Lazaris, and Gangqing Hu. R package, help, and code refactoring by Henrik Bengtsson.
Shin et al., TopDom: an efficient and deterministic method for identifying topological domains in genomes, Nucleic Acids Research, 44(7): e70, April 2016. DOI: 10.1093/nar/gkv1505, PMCID: PMC4838359, PMID: 26704975
Shin et al., R script ‘TopDom_v0.0.2.R’, 2017 (originally from http://zhoulab.usc.edu/TopDom/; later available on https://github.com/jasminezhoulab/TopDom via https://zhoulab.dgsom.ucla.edu/pages/software)
Shin et al., TopDom Manual, 2016-07-08 (originally from http://zhoulab.usc.edu/TopDom/TopDom%20Manual_v0.0.2.pdf; later available on https://github.com/jasminezhoulab/TopDom via https://zhoulab.dgsom.ucla.edu/pages/software)
Hanjun Shin, Understanding the 3D genome organization in topological domain level, Doctor of Philosophy Dissertation, University of Southern California, March 2017, http://digitallibrary.usc.edu/cdm/ref/collection/p15799coll40/id/347735
Dixon JR, Selvaraj S, Yue F, Kim A, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature; 485(7398):376-80, April 2012. DOI: 10.1038/nature11082, PMCID: PMC3356448, PMID: 22495300.
path <- system.file("exdata", package = "TopDom", mustWork = TRUE)
## Original count data (on a subset of the bins to speed up example)
chr <- "chr19"
pathname <- file.path(path, sprintf("nij.%s.gz", chr))
data <- readHiC(pathname, chr = chr, binSize = 40e3, bins = 1:500)
print(data) ## a TopDomData object
## Find topological domains using the TopDom method
fit <- TopDom(data, window.size = 5L)
print(fit) ## a TopDom object
## Display the largest domain
td <- subset(subset(fit$domain, tag == "domain"), size == max(size))
print(td) ## a data.frame
## Subset TopDomData object
data_s <- subsetByRegion(data, region = td, margin = 0.9999)
print(data_s) ## a TopDomData object
vp <- grid::viewport(angle = -45, width = 0.7, y = 0.3)
gg <- ggCountHeatmap(data_s)
gg <- gg + ggDomain(td, color = "#cccc00") + ggDomainLabel(td)
print(gg, newpage = TRUE, vp = vp)
gg <- ggCountHeatmap(data_s, colors = list(mid = "white", high = "black"))
gg_td <- ggDomain(td, delta = 0.08)
dx <- attr(gg_td, "gg_params")$dx
gg <- gg + gg_td + ggDomainLabel(td, vjust = 2.5)
print(gg, newpage = TRUE, vp = vp)
## Subset TopDom object
fit_s <- subsetByRegion(fit, region = td, margin = 0.9999)
print(fit_s) ## a TopDom object
for (kk in seq_len(nrow(fit_s$domain))) {
  gg <- gg + ggDomain(fit_s$domain[kk, ], dx = dx * (4 + kk %% 2), color = "red", size = 1)
}
print(gg, newpage = TRUE, vp = vp)
gg <- ggCountHeatmap(data_s)
gg_td <- ggDomain(td, delta = 0.08)
dx <- attr(gg_td, "gg_params")$dx
gg <- gg + gg_td + ggDomainLabel(td, vjust = 2.5)
fit_s <- subsetByRegion(fit, region = td, margin = 0.9999)
for (kk in seq_len(nrow(fit_s$domain))) {
  gg <- gg + ggDomain(fit_s$domain[kk, ], dx = dx * (4 + kk %% 2), color = "blue", size = 1)
}
print(gg, newpage = TRUE, vp = vp)