DCluster | R Documentation |

DCluster is a collection of several methods related to the detection of spatial clusters of diseases. Many widely used methods, such as Openshaw's GAM, Besag and Newell, Kulldorff and Nagarwalla, and others have been implemented.

Besides calculating these statistics, the bootstrap can be used to test their departure from the null hypothesis, which is that there is no clustering in the study area. Four sampling methods can be used to perform the simulations: permutation, Multinomial, Poisson and Poisson-Gamma.

Minor modifications have been made to the methods so that they use the standardized expected number of cases instead of the population, since it gives a better approximation to the expected number of cases.

We always assume that we are working on a study region which is divided
into *n* non-overlapping smaller areas where data are measured. The data
measured are usually the number of people suffering from a disease, or deaths.
This will be referred to as the *Observed number of cases*. For a given area,
its observed number of cases will be denoted by `O_i`, and the sum of these
quantities over the whole study region will be `O_+`.

*Population* and *Standardized Expected number of cases* are defined in the
same way, and will be denoted by `P_i` and `E_i`, respectively. The sums of
these quantities are represented by `P_+` and `E_+`.

The basic assumption for the data is that they are independent
observations from a Poisson distribution whose mean is `\theta_i E_i`,
where `\theta_i` is the relative risk. That is,

`O_i \sim Po(\theta_i E_i); \ i=1, \ldots , n`

The null hypothesis is usually that of equal relative risks, that is

`H_0: \theta_1= \ldots = \theta_n = \lambda`

`\lambda` may be considered to be known (one, which means standard
risk) or unknown. In the latter case, `E_i` must be corrected slightly
by multiplying it by the overall relative risk `\frac{O_+}{E_+}`.
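As a rough illustration of the correction for unknown `\lambda`, the following base R sketch rescales a vector of expected counts by `O_+ / E_+`. The vectors `O` and `E` are invented example data, and `lambda_hat`, `E_corrected` and `smr` are hypothetical names used only here; none of them is provided by the package.

```r
# Hypothetical observed and standardized expected counts for n = 5 areas
O <- c(12, 7, 30, 4, 15)
E <- c(10.2, 8.1, 24.6, 5.3, 13.0)

# Overall relative risk O_+ / E_+, used as the estimate of lambda
lambda_hat <- sum(O) / sum(E)

# Corrected expected counts: after rescaling they sum to O_+
E_corrected <- E * lambda_hat

# Area-level relative risks measured against the corrected expectations
smr <- O / E_corrected
```

After the correction the expected counts add up to the observed total, so the null hypothesis is tested conditionally on the overall risk level.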

Function names follow a common format, which is as follows:

*method name*.stat: Calculate the statistic itself.

*method name*.boot: Perform a non-parametric bootstrap.

*method name*.pboot: Perform a parametric bootstrap.

Openshaw's G.A.M. has been implemented in a general form in a function called
*gam*, which some other methods (Kulldorff & Nagarwalla, Besag & Newell) also
use, since they are based on a window scan of the whole region. At every point
of the grid, a function is called to determine whether that point is a cluster
or not. The name of this function is *shortened method name.iscluster*.

This function calculates the local value of the statistic involved and
its significance by means of the bootstrap. The interface provided, through
function *gam*, is quite straightforward to use, and it can handle the
three methods mentioned as well as others supplied by the user.

Four bootstrap models are provided in order to estimate the sampling distributions of the statistics. The first one is a non-parametric bootstrap, which performs permutations over the observed number of cases, while the other three are parametric bootstraps based on the Multinomial, Poisson and Poisson-Gamma distributions.

The permutation method simply takes the observed numbers of cases and permutes them among all regions, to assess whether risk is uniform across the whole study area. It should be used with care, since a permuted count can exceed the population in very sparsely populated areas.
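A minimal base R sketch of one permutation replicate, and of the caveat about small populations, might look like this. The vectors `O` and `P` are invented example data and the variable names are hypothetical; the package's own bootstrap functions are not used here.

```r
set.seed(1)

# Hypothetical observed counts and populations for 5 areas;
# the fourth area has a very small population
O <- c(12, 7, 30, 4, 15)
P <- c(5000, 3200, 11000, 20, 6400)

# One permutation replicate: the same counts reassigned to areas at random
O_perm <- sample(O)

# Flag areas where the permuted count exceeds the population at risk
exceeds <- O_perm > P
```

Any `TRUE` in `exceeds` marks a replicate that assigns more cases to an area than it has inhabitants, which is the situation the paragraph above warns about.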

Multinomial sampling is based on conditioning the Poisson framework
on `O_+`. This way `(O_1, \ldots, O_n)` follows a multinomial distribution
of size `O_+` and probabilities `(\frac{E_1}{E_+}, \ldots, \frac{E_n}{E_+})`.
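The multinomial scheme can be sketched in base R with `rmultinom`. The data below are invented for illustration and `O_mult` is a hypothetical name.

```r
set.seed(1)

# Hypothetical observed and standardized expected counts for 5 areas
O <- c(12, 7, 30, 4, 15)
E <- c(10.2, 8.1, 24.6, 5.3, 13.0)

# One replicate: distribute O_+ cases among areas with probabilities E_i / E_+
O_mult <- as.vector(rmultinom(1, size = sum(O), prob = E / sum(E)))
```

Because the draw is conditioned on `O_+`, every replicate has exactly the same total number of cases as the observed data.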

Poisson sampling simply generates the observed number of cases in each
area from a Poisson distribution whose mean is `E_i`.
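A base R sketch of Poisson sampling, using invented expected counts and hypothetical variable names, could be:

```r
set.seed(42)

# Hypothetical standardized expected counts for 5 areas
E <- c(10.2, 8.1, 24.6, 5.3, 13.0)

# One replicate: independent Poisson counts with means E_i
O_pois <- rpois(length(E), lambda = E)

# Many replicates (one column each); the per-area averages approach E_i
sims <- replicate(2000, rpois(length(E), lambda = E))
sim_means <- rowMeans(sims)
```

Unlike the multinomial scheme, the total number of cases is not fixed here and varies from one replicate to the next.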

Poisson-Gamma sampling is based on the Poisson-Gamma model proposed
by *Clayton and Kaldor* (1987):

`O_i|\theta_i \sim Po(\theta_i E_i)`

`\theta_i \sim Ga(\nu, \alpha)`

The distribution of `O_i` unconditional on `\theta_i` is Negative Binomial
with size `\nu` and probability `\frac{\alpha}{\alpha+E_i}`. The two
parameters can be estimated using an Empirical Bayes approach from the
Expected and Observed number of cases. Function *empbaysmooth* is provided
for this purpose.
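The two equivalent ways of drawing from this model can be sketched in base R as follows. The values of `nu` and `alpha` are arbitrary, chosen only for illustration (in practice they would be estimated from the data), the Gamma distribution is assumed to be parameterized by shape `\nu` and rate `\alpha`, and all variable names are hypothetical.

```r
set.seed(1)

# Hypothetical standardized expected counts for 5 areas
E <- c(10.2, 8.1, 24.6, 5.3, 13.0)

# Assumed parameter values, for illustration only
nu <- 4
alpha <- 4

# Direct draw from the marginal Negative Binomial distribution
p <- alpha / (alpha + E)
O_nb <- rnbinom(length(E), size = nu, prob = p)

# Equivalent two-stage draw: relative risks first, then Poisson counts
theta <- rgamma(length(E), shape = nu, rate = alpha)
O_pg <- rpois(length(E), lambda = theta * E)
```

Both draws have marginal mean `\nu E_i / \alpha`, but with more variability than plain Poisson sampling because the relative risks themselves are random.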

One of the parameters passed to many of the functions in this package,
usually called *data*, is a data frame which contains the data for each
of the regions used in the analysis. Its columns must be labeled as follows:

**Observed**: Observed number of cases.

**Expected**: Standardised expected number of cases.

**Population**: Population at risk.

**x**: Easting coordinate of the region centroid.

**y**: Northing coordinate of the region centroid.
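A data frame with this layout could be built as in the sketch below. All values are invented, and `dat` and `required` are hypothetical names; only the column labels come from the description above.

```r
# Hypothetical data for 5 regions, using the column labels described above
dat <- data.frame(
  Observed   = c(12, 7, 30, 4, 15),
  Expected   = c(10.2, 8.1, 24.6, 5.3, 13.0),
  Population = c(5000, 3200, 11000, 2100, 6400),
  x          = c(431.5, 436.2, 440.8, 428.9, 445.1),
  y          = c(4582.3, 4579.0, 4590.6, 4575.4, 4586.7)
)

# Check that every required column is present before using the data
required <- c("Observed", "Expected", "Population", "x", "y")
stopifnot(all(required %in% names(dat)))
```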

Clayton, David and Kaldor, John (1987). Empirical Bayes Estimates of Age-standardized Relative Risks for Use in Disease Mapping. Biometrics 43, 671-681.

Lawson et al. (eds.) (1999). Disease Mapping and Risk Assessment for Public Health. John Wiley and Sons, Inc.

Lawson, A. B. (2001). Statistical Methods in Spatial Epidemiology. John Wiley and Sons, Inc.
