This function uses tools in the intervals package to quickly identify clusters – contiguous collections of positions or intervals which are separated by no more than a given distance from their neighbors to either side.

`x`: An appropriate object.

`w`: Maximum permitted distance between a cluster member and its neighbors to either side.

`which`: Should indices into the

`check_valid`: Should

A cluster is defined to be a maximal collection, with at least two
members, of components of `x` which are separated by no more than `w`.
Note that when `x` represents intervals, an interval must actually
*contain a point* at distance `w` or less from a neighboring interval to
be assigned to the same cluster. If the ends of both intervals in
question are open and exactly at distance `w`, they will not be deemed
to be cluster co-members. See the example below.
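
For numeric positions, the definition above can be sketched in a few lines of base R. This is only an illustration of the definition, not the package's code: the helper name `sketch_clusters` is hypothetical, and the sketch ignores intervals and open end points entirely.

```r
# Hypothetical helper illustrating the cluster definition for numeric
# positions; it is not part of the intervals package.
sketch_clusters <- function(x, w) {
  x <- sort(x)
  if (length(x) < 2) return(list())
  # Start a new group whenever the gap to the previous position exceeds w.
  group <- cumsum(c(TRUE, diff(x) > w))
  groups <- unname(split(x, group))
  # A cluster must have at least two members, so drop isolated positions.
  groups[lengths(groups) >= 2]
}

# Positions 1 and 3 form one cluster, 10 to 12 form another, 30 is isolated.
sketch_clusters(c(1, 3, 10, 11, 12, 30), w = 2)
```

Grouping on gaps between sorted neighbors makes each group maximal by construction, which is why no further merging step is needed here.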

A list whose components are the clusters. Each component is thus a
subset of `x`, or, if `which == TRUE`, a vector of indices into the `x`
object. (The indices correspond to row numbers when `x` is of class
`"Intervals_virtual"`.)

Implementation is by a call to `reduce` followed by a call to
`interval_overlap`. The `clusters` methods are included to illustrate
the utility of the core functions in the intervals package, although
they are also useful in their own right.
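
To give a feel for how a merge step followed by an overlap step can produce clusters, here is a rough base-R sketch for numeric positions. It is an assumption about the general idea only: the function name `sketch_clusters_by_merge` is hypothetical, the widening of each position to `[x, x + w]` is a choice made for this illustration, and the package's actual methods may work differently.

```r
# Hypothetical sketch of a "merge, then look up membership" approach;
# this is not the source code of the intervals package.
sketch_clusters_by_merge <- function(x, w) {
  x <- sort(x)
  if (length(x) < 2) return(list())
  # Step 1: widen each position to the closed interval [x, x + w].
  left <- x
  right <- x + w
  # Step 2 (merge): because left is sorted, a new merged interval begins
  # wherever left exceeds the running maximum of all earlier right ends.
  new_block <- c(TRUE, left[-1] > cummax(right)[-length(right)])
  merged_left <- left[new_block]
  # Step 3 (overlap): find the merged interval that contains each position.
  block <- findInterval(x, merged_left)
  members <- unname(split(x, block))
  # Keep only merged intervals that hold at least two positions.
  members[lengths(members) >= 2]
}

sketch_clusters_by_merge(c(1, 3, 10, 11, 12, 30), w = 2)
```

Two positions end up in the same merged interval exactly when a chain of gaps no larger than `w` connects them, which matches the cluster definition given earlier.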

```
library( intervals )

# Numeric method
w <- 20
x <- sample( 1000, 100 )
c1 <- clusters( x, w )
# Check results
sapply( c1, function( x ) all( diff(x) <= w ) )
d1 <- diff( sort(x) )
all.equal(
  as.numeric( d1[ d1 <= w ] ),
  unlist( sapply( c1, diff ) )
)
# Intervals method, starting with a reduced object so we know that all
# intervals are disjoint and sorted.
B <- 100
left <- runif( B, 0, 1e4 )
right <- left + rexp( B, rate = 1/10 )
y <- reduce( Intervals( cbind( left, right ) ) )
gaps <- function(x) x[-1,1] - x[-nrow(x),2]
hist( gaps(y), breaks = 30 )
w <- 200
c2 <- clusters( y, w )
head( c2 )
sapply( c2, function(x) all( gaps(x) <= w ) )
# Clusters and open end points. See "Details".
z <- Intervals(
  matrix( 1:4, 2, 2, byrow = TRUE ),
  closed = c( TRUE, FALSE )
)
z
clusters( z, 1 )
closed(z)[1] <- FALSE
z
clusters( z, 1 )
```
