Description Usage Arguments Value Author(s) See Also Examples
For each position in the space underlying a set of ranges, counts the number of ranges that cover it.
1 2 3 4 5 6 7 8 9 | coverage(x, shift=0L, width=NULL, weight=1L, ...)
## S4 method for signature 'IntegerRanges'
coverage(x, shift=0L, width=NULL, weight=1L,
method=c("auto", "sort", "hash", "naive"))
## S4 method for signature 'IntegerRangesList'
coverage(x, shift=0L, width=NULL, weight=1L,
method=c("auto", "sort", "hash", "naive"))
|
x |
A IntegerRanges, Views, or IntegerRangesList object.
See |
shift, weight |
If |
width |
Specifies the length of the returned coverage vector(s).
|
method |
If The The Using |
... |
Further arguments to be passed to or from other methods. |
If x
is a IntegerRanges or Views object:
An integer- or numeric-Rle object depending on whether weight
is an integer or numeric vector.
If x
is a IntegerRangesList object:
An RleList object with one coverage vector per list element
in x
, and with x
names propagated to it. The i-th coverage
vector can be either an integer- or numeric-Rle object, depending
on the type of weight[[i]]
(after weight
has gone thru
as.list
and recycling, like described previously).
H. Pagès and P. Aboyoun
coverage-methods in the GenomicRanges
package for more coverage
methods.
The slice
function for slicing the Rle or
RleList object returned by coverage
.
IntegerRanges, IPos, IntegerRangesList, Rle, and RleList objects.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | ## ---------------------------------------------------------------------
## A. COVERAGE OF AN IRanges OBJECT
## ---------------------------------------------------------------------
x <- IRanges(start=c(-2L, 6L, 9L, -4L, 1L, 0L, -6L, 10L),
width=c( 5L, 0L, 6L, 1L, 4L, 3L, 2L, 3L))
coverage(x)
coverage(x, shift=7)
coverage(x, shift=7, width=27)
coverage(x, shift=c(-4, 2)) # 'shift' gets recycled
coverage(x, shift=c(-4, 2), width=12)
coverage(x, shift=-max(end(x)))
coverage(restrict(x, 1, 10))
coverage(reduce(x), shift=7)
coverage(gaps(shift(x, 7), start=1, end=27))
## With weights:
coverage(x, weight=as.integer(10^(0:7))) # integer-Rle
coverage(x, weight=c(2.8, -10)) # numeric-Rle, 'shift' gets recycled
## ---------------------------------------------------------------------
## B. FLOATING POINT ARITHMETIC CAN BRING A SURPRISE
## ---------------------------------------------------------------------
## Please be aware that rounding errors in floating point arithmetic can
## lead to some surprising results when computing a weighted coverage:
y <- IRanges(c(4, 10), c(18, 15))
w1 <- 0.958
w2 <- 1e4
cvg <- coverage(y, width=100, weight=c(w1, w2))
cvg # non-zero coverage at positions 19 to 100!
## This is an artefact of floating point arithmetic and the algorithm
## used to compute the weighted coverage. It can be observed with basic
## floating point arithmetic:
w1 + w2 - w2 - w1 # very small non-zero value!
## Note that this only happens with the "sort" and "hash" methods but
## not with the "naive" method:
coverage(y, width=100, weight=c(w1, w2), method="sort")
coverage(y, width=100, weight=c(w1, w2), method="hash")
coverage(y, width=100, weight=c(w1, w2), method="naive")
## These very small non-zero coverage values in the no-coverage regions
## of the numeric-Rle object returned by coverage() are not always
## present. But when they are, they can cause problems downstream or
## in unit tests. For example downstream code that relies on things
## like 'cvg != 0' to find regions with coverage won't work properly.
## This can be mitigated either by selecting the "naive" method (be aware
## that this can slow down things significantly) or by "cleaning" 'cvg'
## first e.g. with something like 'cvg <- round(cvg, digits)' where
## 'digits' is a carefully chosen number of digits:
cvg <- round(cvg, digits=3)
## Note that this rounding will also have the interesting side effect of
## reducing the memory footprint of the Rle object in general (because
## some runs might get merged into a single run as a consequence of the
## rounding).
## ---------------------------------------------------------------------
## C. COVERAGE OF AN IPos OBJECT
## ---------------------------------------------------------------------
pos_runs <- IRanges(c(1, 5, 9), c(10, 8, 15))
ipos <- IPos(pos_runs)
coverage(ipos)
## ---------------------------------------------------------------------
## D. COVERAGE OF AN IRangesList OBJECT
## ---------------------------------------------------------------------
x <- IRangesList(A=IRanges(3*(4:-1), width=1:3), B=IRanges(2:10, width=5))
cvg <- coverage(x)
cvg
stopifnot(identical(cvg[[1]], coverage(x[[1]])))
stopifnot(identical(cvg[[2]], coverage(x[[2]])))
coverage(x, width=c(50, 9))
coverage(x, width=c(NA, 9))
coverage(x, width=9) # 'width' gets recycled
## Each list element in 'shift' and 'weight' gets recycled to the length
## of the corresponding element in 'x'.
weight <- list(as.integer(10^(0:5)), -0.77)
cvg2 <- coverage(x, weight=weight)
cvg2 # 1st coverage vector is an integer-Rle, 2nd is a numeric-Rle
identical(mapply(coverage, x=x, weight=weight), as.list(cvg2))
## ---------------------------------------------------------------------
## E. SOME MATHEMATICAL PROPERTIES OF THE coverage() FUNCTION
## ---------------------------------------------------------------------
## PROPERTY 1: The coverage vector is not affected by reordering the
## input ranges:
set.seed(24)
x <- IRanges(sample(1000, 40, replace=TRUE), width=17:10)
cvg0 <- coverage(x)
stopifnot(identical(coverage(sample(x)), cvg0))
## Of course, if the ranges are shifted and/or assigned weights, then
## this doesn't hold anymore, unless the 'shift' and/or 'weight'
## arguments are reordered accordingly.
## PROPERTY 2: The coverage of the concatenation of 2 IntegerRanges
## objects 'x' and 'y' is the sum of the 2 individual coverage vectors:
y <- IRanges(sample(-20:280, 36, replace=TRUE), width=28)
stopifnot(identical(coverage(c(x, y), width=100),
coverage(x, width=100) + coverage(y, width=100)))
## Note that, because adding 2 vectors in R recycles the shortest to
## the length of the longest, the following is generally FALSE:
identical(coverage(c(x, y)), coverage(x) + coverage(y)) # FALSE
## It would only be TRUE if the 2 coverage vectors that we add had the
## same length, which would only happen by chance. By using the same
## 'width' value when we computed the 2 coverages previously, we made
## sure they had the same length.
## Because of properties 1 & 2, we have:
x1 <- x[c(TRUE, FALSE)] # pick up 1st, 3rd, 5th, etc... ranges
x2 <- x[c(FALSE, TRUE)] # pick up 2nd, 4th, 6th, etc... ranges
cvg1 <- coverage(x1, width=100)
cvg2 <- coverage(x2, width=100)
stopifnot(identical(coverage(x, width=100), cvg1 + cvg2))
## PROPERTY 3: Multiplying the weights by a scalar has the effect of
## multiplying the coverage vector by the same scalar:
weight <- runif(40)
cvg3 <- coverage(x, weight=weight)
stopifnot(all.equal(coverage(x, weight=-2.68 * weight), -2.68 * cvg3))
## Because of properties 1 & 2 & 3, we have:
stopifnot(identical(coverage(x, width=100, weight=c(5L, -11L)),
5L * cvg1 - 11L * cvg2))
## PROPERTY 4: Using the sum of 2 weight vectors produces the same
## result as using the 2 weight vectors separately and summing the
## 2 results:
weight2 <- 10 * runif(40) + 3.7
stopifnot(all.equal(coverage(x, weight=weight + weight2),
cvg3 + coverage(x, weight=weight2)))
## PROPERTY 5: Repeating any input range N number of times is
## equivalent to multiplying its assigned weight by N:
times <- sample(0:10L, length(x), replace=TRUE)
stopifnot(all.equal(coverage(rep(x, times), weight=rep(weight, times)),
coverage(x, weight=weight * times)))
## In particular, if 'weight' is not supplied:
stopifnot(identical(coverage(rep(x, times)), coverage(x, weight=times)))
## PROPERTY 6: If none of the input range actually gets clipped during
## the "shift and clip" process, then:
##
## sum(cvg) = sum(width(x) * weight)
##
stopifnot(sum(cvg3) == sum(width(x) * weight))
## In particular, if 'weight' is not supplied:
stopifnot(sum(cvg0) == sum(width(x)))
## Note that this property is sometimes used in the context of a
## ChIP-Seq analysis to estimate "the number of reads in a peak", that
## is, the number of short reads that belong to a peak in the coverage
## vector computed from the genomic locations (a.k.a. genomic ranges)
## of the aligned reads. Because of property 6, the number of reads in
## a peak is approximately the area under the peak divided by the short
## read length.
## PROPERTY 7: If 'weight' is not supplied, then disjoining or reducing
## the ranges before calling coverage() has the effect of "shaving" the
## coverage vector at elevation 1:
table(cvg0)
shaved_cvg0 <- cvg0
runValue(shaved_cvg0) <- pmin(runValue(cvg0), 1L)
table(shaved_cvg0)
stopifnot(identical(coverage(disjoin(x)), shaved_cvg0))
stopifnot(identical(coverage(reduce(x)), shaved_cvg0))
## ---------------------------------------------------------------------
## F. SOME SANITY CHECKS
## ---------------------------------------------------------------------
dummy_coverage <- function(x, shift=0L, width=NULL)
{
y <- IRanges:::unlist_as_integer(shift(x, shift))
if (is.null(width))
width <- max(c(0L, y))
Rle(tabulate(y, nbins=width))
}
check_real_vs_dummy <- function(x, shift=0L, width=NULL)
{
res1 <- coverage(x, shift=shift, width=width)
res2 <- dummy_coverage(x, shift=shift, width=width)
stopifnot(identical(res1, res2))
}
check_real_vs_dummy(x)
check_real_vs_dummy(x, shift=7)
check_real_vs_dummy(x, shift=7, width=27)
check_real_vs_dummy(x, shift=c(-4, 2))
check_real_vs_dummy(x, shift=c(-4, 2), width=12)
check_real_vs_dummy(x, shift=-max(end(x)))
## With a set of distinct single positions:
x3 <- IRanges(sample(50000, 20000), width=1)
stopifnot(identical(sort(start(x3)), which(coverage(x3) != 0L)))
|
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
integer-Rle of length 14 with 6 runs
Lengths: 2 2 4 1 3 2
Values : 3 1 0 1 2 1
integer-Rle of length 21 with 10 runs
Lengths: 3 1 2 1 2 2 4 1 3 2
Values : 1 0 1 2 3 1 0 1 2 1
integer-Rle of length 27 with 11 runs
Lengths: 3 1 2 1 2 2 4 1 3 2 6
Values : 1 0 1 2 3 1 0 1 2 1 0
integer-Rle of length 14 with 4 runs
Lengths: 1 9 1 3
Values : 0 1 0 1
integer-Rle of length 12 with 4 runs
Lengths: 1 9 1 1
Values : 0 1 0 1
integer-Rle of length 0 with 0 runs
Lengths:
Values :
integer-Rle of length 10 with 5 runs
Lengths: 2 2 4 1 1
Values : 3 1 0 1 2
integer-Rle of length 21 with 5 runs
Lengths: 3 1 7 4 6
Values : 1 0 1 0 1
integer-Rle of length 27 with 6 runs
Lengths: 3 1 7 4 6 6
Values : 0 1 0 1 0 1
integer-Rle of length 14 with 6 runs
Lengths: 2 2 4 1 3 2
Values : 110001 10000 0 100 10000100 100
numeric-Rle of length 14 with 6 runs
Lengths: 2 2 4 1 3 2
Values : -4.4 2.8 0.0 2.8 -7.2 2.8
numeric-Rle of length 100 with 5 runs
Lengths: 3 6 6 3 82
Values : 0.00000e+00 9.58000e-01 1.00010e+04 9.58000e-01 5.38458e-13
[1] 5.384582e-13
numeric-Rle of length 100 with 5 runs
Lengths: 3 6 6 3 82
Values : 0.00000e+00 9.58000e-01 1.00010e+04 9.58000e-01 5.38458e-13
numeric-Rle of length 100 with 5 runs
Lengths: 3 6 6 3 82
Values : 0.00000e+00 9.58000e-01 1.00010e+04 9.58000e-01 5.38458e-13
numeric-Rle of length 100 with 5 runs
Lengths: 3 6 6 3 82
Values : 0.000 0.958 10000.958 0.958 0.000
integer-Rle of length 15 with 3 runs
Lengths: 4 6 5
Values : 1 2 1
RleList of length 2
$A
integer-Rle of length 12 with 7 runs
Lengths: 1 1 1 2 5 1 1
Values : 1 0 1 0 1 0 1
$B
integer-Rle of length 14 with 10 runs
Lengths: 1 1 1 1 1 5 1 1 1 1
Values : 0 1 2 3 4 5 4 3 2 1
RleList of length 2
$A
integer-Rle of length 50 with 8 runs
Lengths: 1 1 1 2 5 1 1 38
Values : 1 0 1 0 1 0 1 0
$B
integer-Rle of length 9 with 6 runs
Lengths: 1 1 1 1 1 4
Values : 0 1 2 3 4 5
RleList of length 2
$A
integer-Rle of length 12 with 7 runs
Lengths: 1 1 1 2 5 1 1
Values : 1 0 1 0 1 0 1
$B
integer-Rle of length 9 with 6 runs
Lengths: 1 1 1 1 1 4
Values : 0 1 2 3 4 5
RleList of length 2
$A
integer-Rle of length 9 with 5 runs
Lengths: 1 1 1 2 4
Values : 1 0 1 0 1
$B
integer-Rle of length 9 with 6 runs
Lengths: 1 1 1 1 1 4
Values : 0 1 2 3 4 5
RleList of length 2
$A
integer-Rle of length 12 with 8 runs
Lengths: 1 1 1 2 3 2 1 1
Values : 10000 0 1000 0 100 10 0 1
$B
numeric-Rle of length 14 with 10 runs
Lengths: 1 1 1 1 1 5 1 1 1 1
Values : 0.00 -0.77 -1.54 -2.31 -3.08 -3.85 -3.08 -2.31 -1.54 -0.77
[1] TRUE
[1] FALSE
Warning message:
In coverage(x) + coverage(y) :
longer object length is not a multiple of shorter object length
cvg0
0 1 2 3 4 5
554 268 98 13 8 1
shaved_cvg0
0 1
554 388
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.