DCluster | R Documentation |
DCluster is a collection of several methods related to the detection of spatial clusters of diseases. Many widely used methods, such as Openshaw's GAM, Besag and Newell, Kulldorff and Nagarwalla, and others have been implemented.
Besides calculating these statistics, the bootstrap can be used to test their departure from the null hypothesis, which is that there is no clustering in the study area. Four sampling methods can be used to perform the simulations: permutation, Multinomial, Poisson and Poisson-Gamma.
Minor modifications have been made to the methods so that they use the standardized expected number of cases instead of the population, since it gives a better approximation to the expected number of cases.
We will always suppose that we are working on a study region which is divided
into n non-overlapping smaller areas where data are measured. The data
measured are usually counts of people suffering from a disease, or of deaths. These will
be referred to as the Observed number of cases. For a given area, its observed
number of cases will be denoted by O_i, and the sum of these
quantities over the whole study region will be O_+.
Population and Standardized Expected number of cases can be defined in the
same way; they will be denoted by P_i and E_i, respectively. The sums of
these quantities over the whole study region are denoted by P_+ and E_+.
The basic assumption for the data is that they are independent
observations from a Poisson distribution whose mean is
\theta_i E_i, where \theta_i is the relative risk. That is,
O_i \sim Po(\theta_i E_i); \ i=1, \ldots , n
The null hypothesis is usually that of equal relative risks, that is,
H_0: \theta_1= \ldots = \theta_n = \lambda
\lambda may be considered to be known (equal to one, which means standard
risk) or unknown. In the latter case, E_i must be corrected slightly
by multiplying it by the overall relative risk \frac{O_+}{E_+}.
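The correction for an unknown \lambda can be sketched in base R as follows. The vectors O and E hold made-up counts for four areas and are purely illustrative; they are not data shipped with the package.

```r
# Hypothetical observed and expected counts for n = 4 areas
O <- c(12, 5, 9, 18)
E <- c(10, 8, 10, 12)

# Overall relative risk O_+ / E_+
lambda_hat <- sum(O) / sum(E)

# Corrected expected counts: each E_i is multiplied by O_+ / E_+
E_corrected <- E * lambda_hat
```

After the correction, the expected counts add up to the total number of observed cases O_+.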
Function names follow a common format, which is as follows:
method name.stat: Calculate the statistic itself.
method name.boot: Perform a non-parametric bootstrap.
method name.pboot: Perform a parametric bootstrap.
Openshaw's G.A.M. has been implemented in a general way in a function called gam, which some other methods (Kulldorff & Nagarwalla, Besag & Newell) also use, since they are based on a window scan of the whole region. At every point of the grid, a function is called to determine whether that point is a cluster or not. The name of this function is the shortened method name followed by .iscluster.
This function calculates the local value of the statistic involved and its significance by means of the bootstrap. The interface provided through the function gam is quite straightforward to use, and it can handle the three methods mentioned as well as others supplied by users.
Four bootstrap models have been provided in order to estimate the sampling distributions of the statistics. The first one is a non-parametric bootstrap, which performs permutations of the observed number of cases, while the other three are parametric bootstraps based on the Multinomial, Poisson and Poisson-Gamma distributions.
The permutation method simply takes the observed numbers of cases and permutes them among all regions, to assess whether the risk is uniform across the whole study area. It should be used with care, since a permutation may assign more observed cases than population to very sparsely populated areas.
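A minimal base R sketch of one permutation replicate is given below. The vectors O and P are made-up values, and the final check only illustrates the caveat about sparsely populated areas; this is not the package's own implementation.

```r
set.seed(1)

# Hypothetical observed counts and populations for 4 areas
O <- c(12, 5, 9, 18)
P <- c(5000, 15, 3000, 8000)

# One permutation replicate: the same counts, reassigned to areas at random
O_perm <- sample(O)

# Flag replicates in which some area receives more cases than inhabitants
impossible <- any(O_perm > P)
```

The permuted vector always contains exactly the same values as O, so the total O_+ is preserved.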
Multinomial sampling is based on conditioning the Poisson framework
on O_+. This way, (O_1, \ldots, O_n) follows a multinomial distribution
of size O_+ and probabilities (\frac{E_1}{E_+}, \ldots, \frac{E_n}{E_+}).
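One multinomial replicate can be drawn in base R with rmultinom. The counts below are made up for illustration.

```r
set.seed(1)

# Hypothetical observed and expected counts for 4 areas
O <- c(12, 5, 9, 18)
E <- c(10, 8, 10, 12)

# One replicate: O_+ cases allocated to areas with probabilities E_i / E_+
O_sim <- as.vector(rmultinom(1, size = sum(O), prob = E / sum(E)))
```

Because the draw is conditioned on O_+, every replicate has the same total number of cases as the observed data.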
Poisson sampling simply generates the observed number of cases in each area
from a Poisson distribution whose mean is E_i.
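A corresponding base R sketch for one Poisson replicate, again with made-up expected counts:

```r
set.seed(1)

# Hypothetical expected counts for 4 areas
E <- c(10, 8, 10, 12)

# One replicate: an independent Poisson draw per area with mean E_i
O_sim <- rpois(length(E), lambda = E)
```

Unlike the multinomial scheme, the total number of simulated cases is not fixed and varies from one replicate to the next.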
Poisson-Gamma sampling is based on the Poisson-Gamma model proposed by Clayton and Kaldor (1987):
O_i|\theta_i \sim Po(\theta_i E_i)
\theta_i \sim Ga(\nu, \alpha)
The marginal distribution of O_i, that is, not conditioned on \theta_i, is
Negative Binomial with size \nu and probability \frac{\alpha}{\alpha+E_i}.
The two parameters can be estimated using an Empirical Bayes approach from
the Expected and Observed number of cases. Function empbaysmooth is provided
for this purpose.
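The sketch below draws one Poisson-Gamma replicate in base R in two equivalent ways. It assumes that \nu is the shape and \alpha the rate of the Gamma distribution, which is the parameterisation consistent with the Negative Binomial probability given above. The values of nu and alpha are made up; in practice they would be estimated from the data.

```r
set.seed(1)

# Hypothetical expected counts and Gamma parameters (assumed: shape nu, rate alpha)
E <- c(10, 8, 10, 12)
nu <- 2
alpha <- 2

# Direct draw from the marginal Negative Binomial distribution
O_nb <- rnbinom(length(E), size = nu, prob = alpha / (alpha + E))

# Equivalent two-stage draw: relative risks first, then Poisson counts
theta <- rgamma(length(E), shape = nu, rate = alpha)
O_two_stage <- rpois(length(E), lambda = theta * E)
```

Both vectors are draws from the same distribution, whose mean in area i is \nu E_i / \alpha.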
One of the parameters passed to many of the functions in this package, usually called data, is a data frame which contains the data for each of the regions used in the analysis. Its columns must be labelled as follows:
Observed: Observed number of cases.
Expected: Standardised expected number of cases.
Population: Population at risk.
x: Easting coordinate of the region centroid.
y: Northing coordinate of the region centroid.
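A data frame of this shape can be built in base R as sketched below. The column labels Observed, Expected, Population, x and y are assumed here, and all values are made up for illustration.

```r
# Hypothetical data for 4 regions; column labels are assumed
regions <- data.frame(
  Observed   = c(12, 5, 9, 18),            # observed number of cases
  Expected   = c(10, 8, 10, 12),           # standardised expected number of cases
  Population = c(5000, 3200, 4100, 6000),  # population at risk
  x          = c(0, 1, 0, 1),              # easting of the region centroid
  y          = c(0, 0, 1, 1)               # northing of the region centroid
)

# Check that every required column is present before running an analysis
required <- c("Observed", "Expected", "Population", "x", "y")
all_present <- all(required %in% names(regions))
```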
Clayton, David and Kaldor, John (1987). Empirical Bayes Estimates of Age-standardized Relative Risks for Use in Disease Mapping. Biometrics 43, 671-681.
Lawson et al (eds.) (1999). Disease Mapping and Risk Assessment for Public Health. John Wiley and Sons, Inc.
Lawson, A. B. (2001). Statistical Methods in Spatial Epidemiology. John Wiley and Sons, Inc.