View source: R/compartmentalize.R
compartmentalize | R Documentation |
Use contact matrices to identify self-interacting genomic compartments
compartmentalize(data, centers=2, dist.correct=TRUE,
cov.correct=TRUE, robust.cov=5, ...)
data |
an InteractionSet object containing bin pair data, like that produced by |
centers |
an integer scalar, specifying the number of clusters to form in |
dist.correct |
a logical scalar, indicating whether abundances should be corrected for distance biases |
cov.correct |
a logical scalar, indicating whether abundances should be corrected for coverage biases |
robust.cov |
a numeric scalar, specifying the multiple of MADs beyond which coverage outliers are removed |
... |
other arguments to pass to |
This function uses the interaction space to partition each linear chromosome into compartments. Bins in the same compartment interact more frequently with each other compared to bins in different compartments. This forms a checkerboard-like pattern in the interaction space that can be used to define the genomic intervals in each compartment. Typically, one compartment is gene-rich and is defined as “open”, while the other is gene-poor and defined as “closed”.
Compartment identification is done by setting up a ContactMatrix object, where each row/column represents a bin and each matrix entry contains the frequency of contacts between bins.
Bins (i.e., rows) with similar interaction profiles (i.e., entries across columns) are clustered together with the k-means method.
Bins with the same ID in the output compartment vector are in the same compartment.
Note that clustering is done separately for each chromosome, so bins with the same ID across different chromosomes cannot be interpreted as being in the same compartment.
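The clustering step can be illustrated with base R alone. The following is a minimal sketch on a hypothetical toy matrix (the names truth, same and mat are invented for illustration), not the function's actual implementation. It simulates a checkerboard pattern and clusters the rows with kmeans.

```r
# Toy symmetric contact matrix with a checkerboard pattern (hypothetical data).
set.seed(100)
nbins <- 20
truth <- rep(c(1L, 2L), each = 5, length.out = nbins)  # alternating blocks of 5 bins
same <- outer(truth, truth, "==")                      # TRUE if two bins share a compartment

# Bins in the same compartment get high contact counts, others get low counts.
mat <- matrix(rpois(nbins^2, lambda = ifelse(same, 50, 5)), nrow = nbins)
mat <- (mat + t(mat)) / 2                              # enforce symmetry

# Each row is the interaction profile of one bin; cluster the rows.
km <- kmeans(mat, centers = 2, nstart = 10)
compartment <- km$cluster

# Cross-tabulate the assigned IDs against the simulated truth.
table(compartment, truth)
```

The cluster labels themselves are arbitrary, so only the grouping of bins is meaningful. This is also why IDs cannot be compared across chromosomes.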
If dist.correct=TRUE, frequencies are normalized to mitigate the effect of distance and to improve the visibility of long-range interactions.
This is done by computing the residuals of the distance-dependent trend (see filterTrended for more details).
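As a rough illustration of what taking residuals from a distance-dependent trend means, the sketch below uses simulated log-frequencies and estimates the trend as the mean value at each bin-to-bin distance. This per-distance mean is a simplification chosen for the example and is not necessarily how filterTrended fits its trend. All variable names are hypothetical.

```r
# Hypothetical log-frequencies that decay with distance between bins.
set.seed(200)
nbins <- 30
idx <- seq_len(nbins)
d <- abs(outer(idx, idx, "-"))                         # distance (in bins) for each bin pair
logfreq <- 8 - 2 * log1p(d) + matrix(rnorm(nbins^2, sd = 0.1), nrow = nbins)
logfreq <- (logfreq + t(logfreq)) / 2                  # enforce symmetry

# Simplified trend: the mean log-frequency of all bin pairs at the same distance.
trend <- matrix(ave(as.vector(logfreq), as.vector(d)), nrow = nbins)

# Residuals are what remains after removing the distance effect.
resid <- logfreq - trend
```

After this step, entries far from the diagonal are on the same scale as those near it, so long-range structure is no longer swamped by short-range contacts.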
If cov.correct=TRUE, frequencies are also normalized to reduce coverage biases between bins.
This is done by computing the average coverage of each row/column, and dividing each matrix entry by the square roots of the averages for the relevant row and column.
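The coverage correction described above can be sketched in a few lines of base R. The example assumes a plain numeric matrix on the original (non-log) scale and a hypothetical multiplicative per-bin bias. It is an illustration of the square-root scaling, not the package's code.

```r
# Hypothetical symmetric matrix in which contacts scale with a per-bin bias.
set.seed(300)
nbins <- 10
bias <- runif(nbins, 0.5, 2)
mat <- outer(bias, bias) * 100

# Average coverage of each row (identical to the column averages for symmetric input).
avg.cov <- rowMeans(mat)

# Divide entry (i, j) by sqrt(avg.cov[i]) * sqrt(avg.cov[j]).
corrected <- mat / sqrt(outer(avg.cov, avg.cov))

# The spread of (log) row coverage is smaller after correction.
c(before = sd(log(rowMeans(mat))), after = sd(log(rowMeans(corrected))))
```

In this toy setup the single-pass square-root scaling shrinks the coverage differences between bins rather than removing them completely.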
Extremely low-coverage regions such as telomeres and centromeres can confound k-means clustering.
To protect against this, all bins with (distance-corrected) coverages that are more than robust.cov MADs away from the median coverage of each chromosome are identified and removed.
These bins will be marked with NA in the returned compartment for that chromosome.
To turn off robustification, set robust.cov to NA.
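The outlier rule can be sketched as follows, using simulated per-bin coverages. Two details are assumptions of this example: base R's mad is used with its default scaling constant, and the compartment IDs are placeholders rather than real clustering output.

```r
# Hypothetical per-bin coverages, with two very low-coverage bins at the end.
set.seed(400)
coverage <- c(rnorm(48, mean = 100, sd = 5), 2, 1)
robust.cov <- 5

# Flag bins more than robust.cov MADs away from the median coverage.
med <- median(coverage)
spread <- mad(coverage, center = med)   # default scaling constant (assumption)
keep <- abs(coverage - med) <= robust.cov * spread

# Placeholder compartment IDs; removed bins are reported as NA.
ids <- rep(c(1L, 2L), length.out = length(coverage))
ids[!keep] <- NA
tail(ids)
```

In real use, only the bins in keep would be passed to the clustering step.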
By default, centers is set to 2 to model the open and closed compartments.
While a larger value can be used to obtain more clusters, care is required as the interpretation of the resulting compartments becomes more difficult.
If desired, users can also apply their own clustering methods on the matrix returned in the output.
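As one example of an alternative clustering method, the sketch below splits bins by the sign of the leading eigenvector of the correlation matrix, in the spirit of the approach of Lieberman-Aiden et al. (2009). It runs on a hypothetical toy matrix of normalized values. With real output, an ordinary matrix would first have to be extracted from the returned ContactMatrix, for example with as.matrix, and any removed bins handled; both steps are assumptions not shown here.

```r
# Hypothetical normalized matrix with a checkerboard pattern.
set.seed(500)
nbins <- 20
truth <- rep(c(1L, 2L), each = 5, length.out = nbins)
same <- outer(truth, truth, "==")
mat <- matrix(rnorm(nbins^2, mean = ifelse(same, 1, -1), sd = 0.3), nrow = nbins)
mat <- (mat + t(mat)) / 2                              # enforce symmetry

# Leading eigenvector of the correlation matrix between bin profiles.
ev <- eigen(cor(mat), symmetric = TRUE)$vectors[, 1]

# Assign compartments by sign; the labels 1 and 2 are arbitrary.
compartment <- ifelse(ev > 0, 1L, 2L)
table(compartment, truth)
```

The sign of an eigenvector is arbitrary, so additional information such as gene density is needed to decide which group is open or closed.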
A named list of lists is returned, where each internal list corresponds to a chromosome in data and contains compartment, an integer vector of compartment IDs for all bins in that chromosome; and matrix, a ContactMatrix object containing (normalized) contact frequencies for the intra-chromosomal space.
Entries in compartment are named according to the matching index of regions(data).
Aaron Lun
Lieberman-Aiden E et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293.
Lajoie BR, Dekker J, Kaplan N (2014). The hitchhiker's guide to Hi-C analysis: practical guidelines. Methods 72, 65-75.
squareCounts, filterTrended, kmeans
# Loading diffHic (assumed to also attach InteractionSet and GenomicRanges).
library(diffHic)

# Dummying up some data.
set.seed(3426)
npts <- 100
npairs <- 5000
nlibs <- 4
anchor1 <- sample(npts, npairs, replace=TRUE)
anchor2 <- sample(npts, npairs, replace=TRUE)
data <- InteractionSet(
    list(counts=matrix(rpois(npairs*nlibs, runif(npairs, 10, 100)), nrow=npairs)),
    GInteractions(anchor1=anchor1, anchor2=anchor2,
        regions=GRanges(c(rep("chrA", 80), rep("chrB", 20)),
            IRanges(c(1:80, 1:20), c(1:80, 1:20))), mode="reverse"),
    colData=DataFrame(totals=runif(nlibs, 1e6, 2e6)), metadata=List(width=1))
data <- unique(data)
# Running compartmentalization.
out <- compartmentalize(data)
head(out$chrA$compartment)
dim(out$chrA$matrix)
head(out$chrB$compartment)
dim(out$chrB$matrix)
# Turning off each correction in turn.
test <- compartmentalize(data, cov.correct=FALSE)
test <- compartmentalize(data, dist.correct=FALSE)
test <- compartmentalize(data, robust.cov=NA)