relrisk.ppp: Nonparametric Estimate of Spatially-Varying Relative Risk

relrisk.pppR Documentation

Nonparametric Estimate of Spatially-Varying Relative Risk

Description

Given a multitype point pattern, this function estimates the spatially-varying probability of each type of point, or the ratios of such probabilities, using kernel smoothing. The default smoothing bandwidth is selected by cross-validation.

Usage

## S3 method for class 'ppp'
relrisk(X, sigma = NULL, ...,
           at = c("pixels", "points"),
           weights = NULL, varcov = NULL, 
           relative=FALSE,
           adjust=1, edge=TRUE, diggle=FALSE,
           se=FALSE, wtype=c("value", "multiplicity"),
           casecontrol=TRUE, control=1, case, fudge=0)

Arguments

X

A multitype point pattern (object of class "ppp" which has factor valued marks).

sigma

Optional. The numeric value of the smoothing bandwidth (the standard deviation of isotropic Gaussian smoothing kernel). Alternatively sigma may be a function which can be used to select a different bandwidth for each type of point. See Details.

...

Arguments passed to bw.relrisk to select the bandwidth, or passed to density.ppp to control the pixel resolution.

at

Character string specifying whether to compute the probability values at a grid of pixel locations (at="pixels") or only at the points of X (at="points").

weights

Optional. Weights for the data points of X. A numeric vector, an expression, or a pixel image.

varcov

Optional. Variance-covariance matrix of anisotopic Gaussian smoothing kernel. Incompatible with sigma.

relative

Logical. If FALSE (the default) the algorithm computes the probabilities of each type of point. If TRUE, it computes the relative risk, the ratio of probabilities of each type relative to the probability of a control.

adjust

Optional. Adjustment factor for the bandwidth sigma.

edge

Logical value indicating whether to apply edge correction.

diggle

Logical. If TRUE, use the Jones-Diggle improved edge correction, which is more accurate but slower to compute than the default correction.

se

Logical value indicating whether to compute standard errors as well.

wtype

Character string (partially matched) specifying how the weights should be interpreted for the calculation of standard error. See Details.

casecontrol

Logical. Whether to treat a bivariate point pattern as consisting of cases and controls, and return only the probability or relative risk of a case. Ignored if there are more than 2 types of points. See Details.

control

Integer, or character string, identifying which mark value corresponds to a control.

case

Integer, or character string, identifying which mark value corresponds to a case (rather than a control) in a bivariate point pattern. This is an alternative to the argument control in a bivariate point pattern. Ignored if there are more than 2 types of points.

fudge

Optional. A single numeric value, or a numeric vector with one entry for each type of point. This value will be added to the estimates of point process intensity, before calculation of the relative risk.

Details

The command relrisk is generic and can be used to estimate relative risk in different ways.

This function relrisk.ppp is the method for point pattern datasets. It computes nonparametric estimates of relative risk by kernel smoothing (Bithell, 1990, 1991; Diggle, 2003; Baddeley, Rubak and Turner, 2015).

If X is a bivariate point pattern (a multitype point pattern consisting of two types of points) then by default, the points of the first type (the first level of marks(X)) are treated as controls or non-events, and points of the second type are treated as cases or events. Then by default this command computes the spatially-varying probability of a case, i.e. the probability p(u) that a point at spatial location u will be a case. If relative=TRUE, it computes the spatially-varying relative risk of a case relative to a control, r(u) = p(u)/(1- p(u)).

If X is a multitype point pattern with m > 2 types, or if X is a bivariate point pattern and casecontrol=FALSE, then by default this command computes, for each type j, a nonparametric estimate of the spatially-varying probability of an event of type j. This is the probability p_j(u) that a point at spatial location u will belong to type j. If relative=TRUE, the command computes the relative risk of an event of type j relative to a control, r_j(u) = p_j(u)/p_k(u), where events of type k are treated as controls. The argument control determines which type k is treated as a control.

If at = "pixels" the calculation is performed for every spatial location u on a fine pixel grid, and the result is a pixel image representing the function p(u) or a list of pixel images representing the functions p_j(u) or r_j(u) for j = 1,\ldots,m. An infinite value of relative risk (arising because the probability of a control is zero) will be returned as NA.

If at = "points" the calculation is performed only at the data points x_i. By default the result is a vector of values p(x_i) giving the estimated probability of a case at each data point, or a matrix of values p_j(x_i) giving the estimated probability of each possible type j at each data point. If relative=TRUE then the relative risks r(x_i) or r_j(x_i) are returned. An infinite value of relative risk (arising because the probability of a control is zero) will be returned as Inf.

Estimation is performed by a simple Nadaraja-Watson type kernel smoother (Bithell, 1990, 1991; Diggle, 2003; Baddeley, Rubak and Turner, 2015, section 14.4). The smoothing bandwidth can be specified in any of the following ways:

  • sigma is a single numeric value, giving the standard deviation of the isotropic Gaussian kernel.

  • sigma is a numeric vector of length 2, giving the standard deviations in the x and y directions of a Gaussian kernel.

  • varcov is a 2 by 2 matrix giving the variance-covariance matrix of the Gaussian kernel.

  • sigma is a function which selects the bandwidth. Bandwidth selection will be applied separately to each type of point. An example of such a function is bw.diggle.

  • sigma and varcov are both missing or null. Then a common smoothing bandwidth sigma will be selected by cross-validation using bw.relrisk.

  • An infinite smoothing bandwidth, sigma=Inf, is permitted and yields a constant estimate of relative risk.

If se=TRUE then standard errors will also be computed, based on asymptotic theory, assuming a Poisson process.

The optional argument weights may provide numerical weights for the points of X. It should be a numeric vector of length equal to npoints(X).

The argument weights can also be an expression. It will be evaluated in the data frame as.data.frame(X) to obtain a vector of weights. The expression may involve the symbols x and y representing the Cartesian coordinates, and the symbol marks representing the mark values.

The argument weights can also be a pixel image (object of class "im"). numerical weights for the data points will be extracted from this image (by looking up the pixel values at the locations of the data points in X).

Value

If se=FALSE (the default), the format is described below. If se=TRUE, the result is a list of two entries, estimate and SE, each having the format described below.

If X consists of only two types of points, and if casecontrol=TRUE, the result is a pixel image (if at="pixels") or a vector (if at="points"). The pixel values or vector values are the probabilities of a case if relative=FALSE, or the relative risk of a case (probability of a case divided by the probability of a control) if relative=TRUE.

If X consists of more than two types of points, or if casecontrol=FALSE, the result is:

  • (if at="pixels") a list of pixel images, with one image for each possible type of point. The result also belongs to the class "solist" so that it can be printed and plotted.

  • (if at="points") a matrix of probabilities, with rows corresponding to data points x_i, and columns corresponding to types j.

The pixel values or matrix entries are the probabilities of each type of point if relative=FALSE, or the relative risk of each type (probability of each type divided by the probability of a control) if relative=TRUE.

If relative=FALSE, the resulting values always lie between 0 and 1. If relative=TRUE, the results are either non-negative numbers, or the values Inf or NA.

Standard error

If se=TRUE, the standard error of the estimate will also be calculated. The calculation assumes a Poisson point process.

If weights are given, then the calculation of standard error depends on the interpretation of the weights. This is controlled by the argument wtype.

  • If wtype="value" (the default), the weights are interpreted as numerical values observed at the data locations. Roughly speaking, standard errors are proportional to the absolute values of the weights.

  • If wtype="multiplicity" the weights are interpreted as multiplicities so that a weight of 2 is equivalent to having a pair of duplicated points at the data location. Roughly speaking, standard errors are proportional to the square roots of the weights. Negative weights are not permitted.

The default rule is now wtype="value" but previous versions of relrisk.ppp (in spatstat.explore versions 3.1-0 and earlier) effectively used wtype="multiplicity".

Author(s)

\adrian

and \rolf.

References

\baddrubaturnbook

Bithell, J.F. (1990) An application of density estimation to geographical epidemiology. Statistics in Medicine 9, 691–701.

Bithell, J.F. (1991) Estimation of relative risk functions. Statistics in Medicine 10, 1745–1751.

Diggle, P.J. (2003) Statistical analysis of spatial point patterns, Second edition. Arnold.

Diggle, P.J., Zheng, P. and Durr, P. (2005) Non-parametric estimation of spatial segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK. Applied Statistics 54, 645–658.

See Also

There is another method relrisk.ppm for point process models which computes parametric estimates of relative risk, using the fitted model.

See also bw.relrisk, density.ppp, Smooth.ppp, eval.im

Examples

   p.oak <- relrisk(urkiola, 20)
   if(interactive()) {
      plot(p.oak, main="proportion of oak")
      plot(eval.im(p.oak > 0.3), main="More than 30 percent oak")
      plot(split(lansing), main="Lansing Woods")
      p.lan <- relrisk(lansing, 0.05, se=TRUE)
      plot(p.lan$estimate, main="Lansing Woods species probability")
      plot(p.lan$SE, main="Lansing Woods standard error")
      wh <- im.apply(p.lan$estimate, which.max)
      types <- levels(marks(lansing))
      wh <- eval.im(types[wh])
      plot(wh, main="Most common species")
   }

spatstat.explore documentation built on Oct. 22, 2024, 9:07 a.m.