In mcavallaro/rancovr: Cluster detection in R with RAndom Neighbourhood COVeRing

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
devtools::load_all()

The package rancovr provides a family of functions for detecting anomalies with the RAndom Neighbourhood COVeRing. This method is particularly useful in epidemiological studies where one needs to evaluate whether a set of infection events occurring in a certain space and time is of concern and deviates from a null model prediction. In other words rancovr can help researchers compare observed data against null model predictions and determine whether the events are atypical or deviate significantly from what would be expected under the null model.

This vignette only provides a brief overview of the package. For a more comprehensive understanding of the methodology and its applications, we refer to references [1] and [[2]]. The package includes two primary functions, CreateCylinders and warning.score. The CreateCylinders function is used to generate cylindrical shapes that cover the spatio-temporal events of interest. Once the cylindrical shapes have been generated, the warning.score function can be used to aggregate the information from all cylinders and calculate a continuous score between zero and one for each event. This score provides a measure of the event's deviation from the expected behavior under a given null model.

Create dataset

The RAndom Neighbourhood COVeRing is designed to compare counts of discrete events with a baseline. In rancovr the baseline can be defined using a spatio-temporal grid, where the spatial component is expanded into rows using the expand.grid function (built-in in base R) and the columns represent time steps.

x <- seq(0, 10, length.out = 20)
y <- seq(0, 10, length.out = 20)
tt = seq(1, 20)
d1 <- expand.grid(x = x, y = y)
d1$names = paste('loc', as.character(1:20),  sep = '-')
baseline.matrix = sapply(tt, function( . ){
  exp(- (d1$x - 5)^2 - (d1$y - 5)^2 )  *  cos( 0.2 * . )^2} )
rownames(baseline.matrix) = d1$names
colnames(baseline.matrix) = as.character(tt)
baseline.matrix[1:5, 1:5]

The values of baseline.matrix at row i and column j represent the expected number of events at location i during time step j. To visualize the baseline at a specific time, the built-in R function image can be used to plot the matrix. In the resulting plot, darker colors indicate higher expected event counts.

b=seq(-0.5,max(baseline.matrix),length=13)
image(x, y, matrix(baseline.matrix[,5], ncol=20), main='t=5', breaks=b)
image(x, y, matrix(baseline.matrix[,15], ncol=20), main='t=15', breaks=b)

To generate observations from this baseline, we use the built-in R function rpois. Since these observations are generated from the baseline, they should not raise any alarms.

observation.matrix = matrix(
  rpois(length(baseline.matrix), baseline.matrix),
  ncol = ncol(baseline.matrix))

rownames(observation.matrix) = d1$names
colnames(observation.matrix) = as.character(tt)

We can use the image function to illustrate the distribution of events generated from the baseline and compare it with the original baseline.

b=seq(-0.5, max(observation.matrix), length = 13)
image(x, y, matrix(observation.matrix[,5], ncol = 20), main = 't=5', breaks = b)
image(x, y, matrix(observation.matrix[,15], ncol = 20), main = 't=15', breaks = b)

The aggregated observations and baseline are similar, within statistical fluctuations.

plot(colSums(observation.matrix), ylab='', xlab='t')
lines(colSums(baseline.matrix))
legend('bottomright', c('observation', 'baseline'), pch=c(1, NA), lty=c(NA, 1))

We will now introduce a set of discrete events at selected locations and times that do not conform to the known baseline. This set of events will be referred to as a "novel cluster", and it is important that the algorithm is capable of detecting these points.

observation.matrix[30:35, 10:15] = observation.matrix[30:35, 10:15] + rpois(36, 0.5)

t = 5
image(x, y, matrix(observation.matrix[,t], ncol=20), main = paste0('t=', as.character(t)), breaks = b)
t = 15
image(x, y, matrix(observation.matrix[,t], ncol=20), main = paste0('t=', as.character(t)), breaks = b)

plot(colSums(observation.matrix), ylab='', xlab='t')
lines(colSums(baseline.matrix))
legend('bottomright', c('observation', 'baseline'), pch=c(1, NA), lty=c(NA, 1))

Retrospective usage

We now utilise rancovr to identify the cluster of atypical events. Retrospectively, we generate all possible cylinders up until a specified date TT using the function CreateCylinders with the only.last=FALSE option.

cylinders = CreateCylinders(observation.matrix,
                            baseline.matrix, range(tt), n.cylinders = 1000,
                            coord.df = d1, size_factor = 4, GT = 5)
head(cylinders)

Each cylinder encompasses a number of observed events. The expected number of events in each cylinder is determined by the Poisson statistics and can be plotted against the actual number of observed events using the plotCylindersCI function. This can help us identify cylinders that contain a significantly higher number of events than what would be expected based on the baseline.

plotCylindersCI(cylinders )

We then apply the function warning.score2 to all grid locations to extract warning scores given the cylinders.

tmp = apply(d1, 1, FUN=warning.score2, 20, cylinders)
d1[, 'retrospective'] = tmp[1,]
head(d1)

Prospective usage

rancover can be also prospectively, the same as what you would have got if you'd done it in realtime. Draw cylinders up until a partial date TT with only.last=TRUE option and apply warning.score2 at each partial date. Setting only.last=TRUE generate cylinders that do not extend past the time TT. These cylinders cover present or past events and extend up until the current day (and not in the future).

for (TT in tt){
  cylinders = CreateCylinders(observation.matrix,
                            baseline.matrix, range(tt), n.cylinders = 1000,
                            coord.df = d1, size_factor = 4, GT = 5, only.last = T, TT = TT)
  colname =  paste0('warning.score.', as.character(TT))
  tmp = apply(d1, 1, FUN=warning.score2, TT, cylinders)
  d1[,colname] = tmp[1,]
}
head(d1)

Vignette Info

sessionInfo()

References

[1] M. Cavallaro, J. Coelho, D. Ready, V. Decraene, T. Lamagni, N. D. McCarthy, D. Todkill, M. J. Keeling (2022) Cluster detection with random neighbourhood covering: Application to invasive Group A Streptococcal disease. PLoS Comput Biol 18(11): e1010726. https://doi.org/10.1371/journal.pcbi.1010726

[2] M. Cavallaro, L. Dyson, M. J. Keeling (2023) Manuscript in preparation.

mcavallaro/rancovr documentation built on April 17, 2025, 7:21 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com