normalizeCNV | R Documentation |
Compute normalization offsets to remove CNV-driven and abundance-dependent biases
normalizeCNV(data, margins, prior.count=3, span=0.3, maxk=500,
assay.data=1, assay.marg=1, ...)
matchMargins(data, margins)
data |
an InteractionSet object produced by |
margins |
a RangedSummarizedExperiment object produced by |
prior.count |
a numeric scalar specifying the prior count to use in computing marginal log-ratios |
span |
a numeric scalar between 0 and 1, describing the span of the fit |
maxk |
a integer scalar specifying the number of vertices to use during local fitting |
assay.data |
a string or integer scalar specifying the matrix to use from |
assay.marg |
a string or integer scalar specifying the matrix to use from |
... |
other arguments to pass to |
Each bin pair in data
is associated with three covariates.
The first two are the marginal log-ratios of the corresponding bins, i.e., the log-ratio of the marginal counts between two libraries.
These represent the relative CNVs in the interacting regions between libraries.
To avoid redundancy, the first covariate is the larger marginal log-ratio whereas the second is the smaller.
The third covariate is the average abundance across all libraries.
Each bin pair is also associated with a response, i.e., the log-ratio of the interaction counts between two libraries.
A loess-like surface is fitted to the response against the three covariates, using the locfit
function.
The aim is to eliminate systematic differences between libraries at any combination of covariate values.
This removes CNV-driven biases as well as trended biases with respect to the abundance.
The fitted value can then be used as a GLM offset for each bin pair.
The objects in data
and margins
should be constructed with the same width
and param
arguments in their respective functions.
This ensures that the regions are the same, so that the marginal counts can be directly used.
Matching of the bins in each bin pair in data
to indices of margins
is performed using matchMargins
.
Note that the marginal counts are not directly computed from data
as filtering of bin pairs may be performed beforehand.
In practice, normalization offsets are computed for each library relative to a single reference “average” library.
This average library is constructed by using the average abundance as the (log-)count for both the bin pair and marginal counts.
The space of all pairs of CNV log-ratios is also rotated by 45 degrees prior to smoothing.
This improves the performance of the approximations used by locfit
.
The fit parameters can be changed by varying span
, maxk
and additional arguments in locfit
.
Higher values of span
will increase smoothness, at the cost of sensitivity.
Increases in maxk
may be required to obtain a more accurate approximation when fitting large datasets.
In all cases, a loess fit of degree 1 is used.
For use by downstream functions, the offset matrix can be stored as an assay named "offset"
in the data
object.
For normalizeCNV
, a numeric matrix is returned with the same dimensions as counts(data)
.
This contains log-based GLM offsets for each bin pair in each library.
For matchMargins
, a data frame is returned with integer fields anchor1
and anchor2
.
Each field specifies the index in margins
corresponding to the bin regions for each bin pair in data
.
Aaron Lun
locfit
,
lp
,
squareCounts
,
marginCounts
# Dummying up some data.
set.seed(3423746)
npts <- 100
npairs <- 5000
nlibs <- 4
anchor1 <- sample(npts, npairs, replace=TRUE)
anchor2 <- sample(npts, npairs, replace=TRUE)
data <- InteractionSet(
list(counts=matrix(rpois(npairs*nlibs, runif(npairs, 10, 100)), nrow=npairs)),
GInteractions(anchor1=anchor1, anchor2=anchor2,
regions=GRanges("chrA", IRanges(1:npts, 1:npts)), mode="reverse"),
colData=DataFrame(totals=runif(nlibs, 1e6, 2e6)))
margins <- SummarizedExperiment(matrix(rpois(npts*nlibs, 100), nrow=npts),
colData=DataFrame(totals=data$totals), rowRanges=regions(data))
# Running normalizeCNV.
out <- normalizeCNV(data, margins)
head(out)
head(normalizeCNV(data, margins, prior.count=1))
head(normalizeCNV(data, margins, span=0.5))
# Store offsets as the 'offset' assay for use by, e.g., asDGEList.
assays(data)$offset <- out
data
# Occasionally locfit will complain; increase maxk to compensate.
data <- InteractionSet(matrix(rpois(npairs*nlibs, 20), nrow=npairs),
GInteractions(anchor1=anchor1, anchor2=anchor2,
regions=GRanges("chrA", IRanges(1:npts, 1:npts)), mode="reverse"),
colData=DataFrame(totals=runif(nlibs, 1e6, 2e6)))
tryCatch(head(normalizeCNV(data, margins, maxk=100)), error=function(e) e)
head(normalizeCNV(data, margins, maxk=1000))
# Matching margins.
matched <- matchMargins(data, margins)
head(matched)
anchor1.counts <- margins[matched$anchor1,]
anchor2.counts <- margins[matched$anchor2,]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.