kppm: Fit Cluster or Cox Point Process Model

Description

Fit a homogeneous or inhomogeneous cluster process or Cox point process model to a point pattern.

Usage

  kppm(X, ...)

  ## S3 method for class 'formula'
kppm(X,
                clusters = c("Thomas","MatClust","Cauchy","VarGamma","LGCP"),
                ...,
                data=NULL)

  ## S3 method for class 'ppp'
kppm(X,
       trend = ~1,
       clusters = c("Thomas","MatClust","Cauchy","VarGamma","LGCP"),
       data = NULL,
       ...,
       covariates=data,
       subset,
       method = c("mincon", "clik2", "palm", "adapcl"),
       penalised = FALSE,
       improve.type = c("none", "clik1", "wclik1", "quasi"),
       improve.args = list(),
       weightfun=NULL,
       control=list(),
       stabilize=TRUE,
       algorithm,
       trajectory=FALSE,
       statistic="K",
       statargs=list(),
       rmax = NULL,
       epsilon=0.01,
       covfunargs=NULL,
       use.gam=FALSE,
       nd=NULL, eps=NULL,
       ppm.improve.type=c("none", "ho", "enet"),
       ppm.improve.args=list())

## S3 method for class 'quad'
kppm(X,
       trend = ~1,
       clusters = c("Thomas","MatClust","Cauchy","VarGamma","LGCP"),
       data = NULL,
       ...,
       covariates=data,
       subset,
       method = c("mincon", "clik2", "palm", "adapcl"),
       penalised = FALSE,
       improve.type = c("none", "clik1", "wclik1", "quasi"),
       improve.args = list(),
       weightfun=NULL,
       control=list(),
       stabilize=TRUE,
       algorithm,
       trajectory=FALSE,
       statistic="K",
       statargs=list(),
       rmax = NULL,
       epsilon=0.01,
       covfunargs=NULL,
       use.gam=FALSE,
       nd=NULL, eps=NULL,
       ppm.improve.type=c("none", "ho", "enet"),
       ppm.improve.args=list())

Arguments

X

A point pattern dataset (object of class "ppp" or "quad") to which the model should be fitted, or a formula in the R language defining the model. See Details.

trend

An R formula, with no left hand side, specifying the form of the log intensity.

clusters

Character string determining the cluster model. Partially matched. Options are "Thomas", "MatClust", "Cauchy", "VarGamma" and "LGCP".

data, covariates

The values of spatial covariates (other than the Cartesian coordinates) required by the model. A named list of pixel images, functions, windows, tessellations or numeric constants.

...

Additional arguments. See Details.

subset

Optional. A subset of the spatial domain, to which the model-fitting should be restricted. A window (object of class "owin") or a logical-valued pixel image (object of class "im"), or an expression (possibly involving the names of entries in data) which can be evaluated to yield a window or pixel image.

method

The fitting method. Either "mincon" for minimum contrast, "clik2" for second order composite likelihood, "adapcl" for adaptive second order composite likelihood, or "palm" for Palm likelihood. Partially matched.

penalised

Logical value specifying whether the objective function (the composite likelihood or contrast) should be modified by adding a penalty against extreme values of cluster scale.

improve.type

Method for updating the initial estimate of the trend. Initially the trend is estimated as if the process is an inhomogeneous Poisson process. The default, improve.type = "none", is to use this initial estimate. Otherwise, the trend estimate is updated by improve.kppm, using information about the pair correlation function. Options are "clik1" (first order composite likelihood, essentially equivalent to "none"), "wclik1" (weighted first order composite likelihood) and "quasi" (quasi likelihood).

improve.args

Additional arguments passed to improve.kppm when improve.type != "none". See Details.

weightfun

Optional weighting function w in the composite likelihoods or Palm likelihood. A function in the R language, or one of the strings "threshold" or "taper". See Details.

control

List of control parameters passed to the optimization function optim.

stabilize

Logical value specifying whether to numerically stabilize the optimization algorithm, by specifying suitable default values of control$fnscale and control$parscale.

algorithm

Character string determining the mathematical algorithm used to solve the fitting problem. If method="mincon", "clik2" or "palm", this argument is passed to the generic optimization function optim (as its argument method), with default "Nelder-Mead". If method="adapcl", it is passed to the equation solver nleqslv (as its argument method), with default "Broyden".

trajectory

Logical value specifying whether to save the history of all function evaluations performed by the optimization algorithm.

statistic

Name of the summary statistic to be used for minimum contrast estimation: either "K" or "pcf".

statargs

Optional list of arguments to be used when calculating the statistic. See Details.

rmax

Maximum value of interpoint distance to use in the composite likelihood.

epsilon

Tuning parameter for the adaptive composite likelihood method.

covfunargs, use.gam, nd, eps

Arguments passed to ppm when fitting the intensity.

ppm.improve.type, ppm.improve.args

Arguments controlling the initial fit of the trend. Passed to ppm as the arguments improve.type and improve.args respectively.

Details

This function fits a clustered point process model to the point pattern dataset X.

The model may be either a Neyman-Scott cluster process or another Cox process. The type of model is determined by the argument clusters. Currently the options are clusters="Thomas" for the Thomas process, clusters="MatClust" for the Matérn cluster process, clusters="Cauchy" for the Neyman-Scott cluster process with Cauchy kernel, clusters="VarGamma" for the Neyman-Scott cluster process with Variance Gamma kernel (requires an additional argument nu to be passed through the dots; see rVarGamma for details), and clusters="LGCP" for the log-Gaussian Cox process (may require additional arguments passed through ...; see rLGCP for details on argument names). The first four models are Neyman-Scott cluster processes.

The algorithm first estimates the intensity function of the point process using ppm. The argument X may be a point pattern (object of class "ppp") or a quadrature scheme (object of class "quad"). The intensity is specified by the trend argument. If the trend formula is ~1 (the default) then the model is homogeneous. The algorithm begins by estimating the intensity as the number of points divided by the area of the window. Otherwise, the model is inhomogeneous. The algorithm begins by fitting a Poisson process with log intensity of the form specified by the formula trend. (See ppm for further explanation).

The argument X may also be a formula in the R language. The right hand side of the formula gives the trend as described above. The left hand side of the formula gives the point pattern dataset to which the model should be fitted.
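For example, a minimal sketch of the formula interface with a spatial covariate supplied through data (assuming the bei point pattern and the bei.extra covariate images from spatstat.data are available):

  # intensity depends on terrain elevation; clusters follow a Thomas process
  fit <- kppm(bei ~ elev, data = bei.extra, clusters = "Thomas")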

If improve.type="none" this is the final estimate of the intensity. Otherwise, the intensity estimate is updated, as explained in improve.kppm. Additional arguments to improve.kppm are passed as a named list in improve.args.

The cluster parameters of the model are then fitted either by minimum contrast estimation, or by a composite likelihood method (maximum composite likelihood, maximum Palm likelihood, or by solving the adaptive composite likelihood estimating equation).

Minimum contrast:

If method = "mincon" (the default) clustering parameters of the model will be fitted by minimum contrast estimation, that is, by matching the theoretical K-function of the model to the empirical K-function of the data, as explained in mincontrast.

For a homogeneous model ( trend = ~1 ) the empirical K-function of the data is computed using Kest, and the parameters of the cluster model are estimated by the method of minimum contrast.

For an inhomogeneous model, the inhomogeneous K function is estimated by Kinhom using the fitted intensity. Then the parameters of the cluster model are estimated by the method of minimum contrast using the inhomogeneous K function. This two-step estimation procedure is due to Waagepetersen (2007).

If statistic="pcf" then instead of using the K-function, the algorithm will use the pair correlation function pcf for homogeneous models and the inhomogeneous pair correlation function pcfinhom for inhomogeneous models. In this case, the smoothing parameters of the pair correlation can be controlled using the argument statargs, as shown in the Examples.

Additional arguments ... will be passed to clusterfit to control the minimum contrast fitting algorithm.

The optimisation is performed by the generic optimisation algorithm optim.
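For example, a minimal sketch of a minimum contrast fit based on the pair correlation function, with the smoothing parameter passed through statargs (the value of stoyan is illustrative):

  fit <- kppm(redwood ~ 1, "Thomas", method = "mincon",
              statistic = "pcf", statargs = list(stoyan = 0.2))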

Second order composite likelihood:

If method = "clik2" the clustering parameters of the model will be fitted by maximising the second-order composite likelihood (Guan, 2006). The log composite likelihood is

\sum_{i,j} w(d_{ij}) \log\rho(d_{ij}; \theta) - \left( \sum_{i,j} w(d_{ij}) \right) \log \int_D \int_D w(\|u-v\|) \rho(\|u-v\|; \theta)\, du\, dv

where the sums are taken over all pairs of data points x_i, x_j separated by a distance d_{ij} = \| x_i - x_j\| less than rmax, and the double integral is taken over all pairs of locations u,v in the spatial window of the data. Here \rho(d;\theta) is the pair correlation function of the model with cluster parameters \theta.

The function w in the composite likelihood is a weighting function and may be chosen arbitrarily. It is specified by the argument weightfun. If this is missing or NULL then the default is a threshold weight function, w(d) = 1(d \le R), where R is rmax/2. If it is specified, the argument weightfun should be a function in the R language with one argument. Alternatively weightfun may be one of the strings "threshold" or "taper" representing the functions w(d) = 1(d \le R) and w(d) = min(1, R/d) respectively.

The optimisation is performed by the generic optimisation algorithm optim.
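For example, a sketch of a second order composite likelihood fit using the built-in tapered weight function (the cutoff rmax = 0.1 is illustrative):

  fit <- kppm(redwood ~ 1, "Thomas", method = "clik2",
              weightfun = "taper", rmax = 0.1)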

Palm likelihood:

If method = "palm" the clustering parameters of the model will be fitted by maximising the Palm loglikelihood (Tanaka et al, 2008)

\sum_{i,j} w(x_i, x_j) \log \lambda_P(x_j \mid x_i; \theta) - \sum_i \int_D w(x_i, u) \lambda_P(u \mid x_i; \theta) \, {\rm d} u

with the same notation as above. Here \lambda_P(u|v;\theta) is the Palm intensity of the model at location u given there is a point at v.

The optimisation is performed by the generic optimisation algorithm optim.
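For example, a sketch of a Palm likelihood fit, restricting pairs of points to interpoint distances below rmax (the value is illustrative):

  fit <- kppm(redwood ~ 1, "Thomas", method = "palm", rmax = 0.1)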

Adaptive Composite likelihood:

If method = "cladap" the clustering parameters of the model will be fitted by solving the adaptive second order composite likelihood estimating equation (Lavancier et al, 2021). The estimating function is

\sum_{u, v} w(\epsilon \frac{| g(0; \theta) - 1 |}{g(\|u-v\|; \theta)-1}) \frac{\nabla_\theta g(\|u-v\|;\theta)}{g(\|u-v\|;\theta)} - \int_D \int_D w(\epsilon \frac{ | g(u,v; \theta) - 1|}{g(\|u-v\|; \theta)-1}) \nabla_\theta g(\|u-v\|; \theta) \rho(u) \rho(v)\, du\, dv

where the sum is taken over all distinct pairs of points. Here g(d;\theta) is the pair correlation function with parameters \theta, \nabla_\theta g(d; \theta) denotes its gradient with respect to \theta, and \rho(u) denotes the fitted intensity function of the model.

The tuning parameter \epsilon is independent of the data. It can be specified by the argument epsilon and has default value 0.01.

The function w in the estimating function is a weighting function with bounded support [-1,1]. It is specified by the argument weightfun. If this is missing or NULL then the default is w(d) = 1(\|d\| \le 1) \exp(1/(d^2-1)). The estimating equation is solved using the nonlinear equation solver nleqslv from the package nleqslv. The package nleqslv must be installed in order to use this option.

If penalised=TRUE, the fitting procedure is modified by adding a penalty against extreme values of the cluster scale, as proposed by Baddeley et al (2022).

If trajectory=TRUE, the resulting object contains the history of all points in the cluster parameter space which were evaluated by the optimization algorithm. The trajectory can be extracted by traj(fit) or traj(objsurf(fit)) where fit is the fitted model object.
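For example, a sketch combining the penalty and the saved trajectory:

  fit <- kppm(redwood ~ 1, "Thomas", penalised = TRUE, trajectory = TRUE)
  h <- traj(fit)   # history of cluster parameter values visited by the optimiser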

Value

An object of class "kppm" representing the fitted model. There are methods for printing, plotting, predicting, simulating and updating objects of this class.

Cluster parameters for Neyman-Scott models

For Neyman-Scott models, the fitting procedure searches for the best-fitting values of the parameters that control the intensity of parents and the physical scale of the clusters. (Any parameters that control the shape of the clusters must be specified separately and are assumed to be fixed.)

The fitted object fit contains the fitted cluster parameters as the element fit$par in the format described below. Initial estimates for these cluster parameters can be specified using the argument startpar in the same format.

The cluster parameters will be stored in a named numeric vector par of length 2. The first value is always kappa, the intensity of parents (cluster centres). The format is as follows:

  • for clusters="Thomas", a vector c(kappa, sigma2) where sigma2 is the square of the cluster standard deviation;

  • for clusters="MatClust", a vector c(kappa, R) where R is the radius of the cluster;

  • for clusters="Cauchy", a vector c(kappa, eta2) where eta2 = code{4 * scale^2} where scale is the scale parameter for the model as used in rCauchy;

  • for clusters="VarGamma", a vector c(kappa, eta) where eta is equivalent to the scale parameter omega used in rVarGamma.

For clusters="VarGamma" it will be necessary to specify the shape parameter nu as described in the help for rVarGamma. This is specified separately as an argument nu in the call to kppm.

Optimization algorithm

The following details allow greater control over the fitting procedure.

For the first three fitting methods (method="mincon", "clik2" and "palm"), the optimisation is performed by the generic optimisation algorithm optim. The behaviour of this algorithm can be controlled by the following arguments to kppm:

  • startpar determines the initial estimates of the cluster parameters.

  • algorithm determines the particular optimization method. This argument is passed to optim as the argument method. Options are listed in the help for optim. The default is the Nelder-Mead simplex method.

  • control is a named list of control parameters, documented in the help for optim. Useful control arguments include trace, maxit and abstol.

  • lower and upper specify bounds for the cluster parameters, when algorithm="L-BFGS-B" or algorithm="Brent", as described in the help for optim.
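For example, a sketch of a minimum contrast fit using the bounded L-BFGS-B algorithm (the bounds and control values are illustrative):

  fit <- kppm(redwood ~ 1, "Thomas", method = "mincon",
              algorithm = "L-BFGS-B",
              lower = c(1, 1e-4), upper = c(1000, 1),
              control = list(maxit = 500, trace = 1))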

For method="adapcl", the estimating equation is solved using the nonlinear equation solver nleqslv from the package nleqslv. The package nleqslv must be installed in order to use this option. The behaviour of this algorithm can be controlled by the following arguments to kppm:

  • startpar determines the initial estimates of the cluster parameters.

  • algorithm determines the method for solving the equation. This argument is passed to nleqslv as the argument method. Options are listed in the help for nleqslv.

  • globStrat determines the global strategy to be applied. This argument is passed to nleqslv as the argument global. Options are listed in the help for nleqslv.

  • control is a named list of control parameters, documented in the help for nleqslv.
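For example, a sketch of an adaptive composite likelihood fit with explicit solver settings (the choices of algorithm, globStrat and control are illustrative):

  if(requireNamespace("nleqslv", quietly = TRUE)) {
    fit <- kppm(redwood ~ 1, "Thomas", method = "adapcl",
                epsilon = 0.01,
                algorithm = "Broyden",   # passed to nleqslv as 'method'
                globStrat = "dbldog",    # passed to nleqslv as 'global'
                control = list(maxit = 100))
  }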

Log-Gaussian Cox Models

To fit a log-Gaussian Cox model, specify clusters="LGCP" and use additional arguments to specify the covariance structure. These additional arguments can be given individually in the call to kppm, or they can be collected together in a list called covmodel.

For example a Matérn model with parameter \nu=0.5 could be specified either by kppm(X, clusters="LGCP", model="matern", nu=0.5) or by kppm(X, clusters="LGCP", covmodel=list(model="matern", nu=0.5)).

The argument model specifies the type of covariance model: the default is model="exp" for an exponential covariance. Additional arguments specify the shape parameters of the covariance model. For example if model="matern" then the additional argument nu is required.

The available models are as follows:

model="exponential":

the exponential covariance function

C(r) = \sigma^2 \exp(-r/h)

where \sigma^2 is the (fitted) variance parameter, and h is the (fitted) scale parameter. No shape parameters are required.

model="gauss":

the Gaussian covariance function

C(r) = \sigma^2 \exp(-(r/h)^2)

where \sigma^2 is the (fitted) variance parameter, and h is the (fitted) scale parameter. No shape parameters are required.

model="stable":

the stable covariance function

C(r) = \sigma^2 \exp(-(r/h)^\alpha)

where \sigma^2 is the (fitted) variance parameter, h is the (fitted) scale parameter, and \alpha is the shape parameter alpha. The parameter alpha must be given, either as a stand-alone argument, or as an entry in the list covmodel.

model="gencauchy":

the generalised Cauchy covariance function

C(r) = \sigma^2 (1 + (r/h)^\alpha)^{-\beta/\alpha}

where \sigma^2 is the (fitted) variance parameter, h is the (fitted) scale parameter, and \alpha and \beta are the shape parameters alpha and beta. The parameters alpha and beta must be given, either as stand-alone arguments, or as entries in the list covmodel.

model="matern":

the Whittle-Matérn covariance function

C(r) = \sigma^2 \frac{1}{2^{\nu-1} \Gamma(\nu)} (\sqrt{2 \nu} \, r/h)^\nu K_\nu(\sqrt{2\nu}\, r/h)

where \sigma^2 is the (fitted) variance parameter, h is the (fitted) scale parameter, and \nu is the shape parameter nu. The parameter nu must be given, either as a stand-alone argument, or as an entry in the list covmodel.

Note that it is not possible to use anisotropic covariance models because the kppm technique assumes the pair correlation function is isotropic.
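For example, a sketch of a log-Gaussian Cox fit with a generalised Cauchy covariance, collecting the covariance arguments in covmodel (the shape parameter values are illustrative):

  fit <- kppm(redwood ~ 1, clusters = "LGCP",
              covmodel = list(model = "gencauchy", alpha = 1, beta = 1))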

Error and warning messages

See ppm.ppp for a list of common error messages and warnings originating from the first stage of model-fitting.

Author(s)

Adrian Baddeley, Rolf Turner and Ege Rubak, with contributions from Abdollah Jalilian and Rasmus Waagepetersen. Adaptive composite likelihood method contributed by Chiara Fend and modified by Adrian Baddeley. Penalised optimization developed by Adrian Baddeley, Tilman Davies and Martin Hazelton.

References

Baddeley, A., Davies, T.M., Hazelton, M.L., Rakshit, S. and Turner, R. (2022) Fundamental problems in fitting spatial cluster process models. Spatial Statistics 52, 100709. DOI: 10.1016/j.spasta.2022.100709

Guan, Y. (2006) A composite likelihood approach in fitting spatial point process models. Journal of the American Statistical Association 101, 1502–1512.

Guan, Y., Jalilian, A. and Waagepetersen, R. (2015) Quasi-likelihood for spatial point processes. Journal of the Royal Statistical Society, Series B 77, 677-697.

Jalilian, A., Guan, Y. and Waagepetersen, R. (2012) Decomposition of variance for spatial Cox processes. Scandinavian Journal of Statistics 40, 119–137.

Lavancier, F., Poinas, A., and Waagepetersen, R. (2021) Adaptive estimating function inference for nonstationary determinantal point processes. Scandinavian Journal of Statistics, 48 (1), 87–107.

Tanaka, U., Ogata, Y. and Stoyan, D. (2008) Parameter estimation and model selection for Neyman-Scott point processes. Biometrical Journal 50, 43–57.

Waagepetersen, R. (2007) An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics 63, 252–258.

See Also

Methods for kppm objects: plot.kppm, fitted.kppm, predict.kppm, simulate.kppm, update.kppm, vcov.kppm, methods.kppm, as.ppm.kppm, as.fv.kppm, Kmodel.kppm, pcfmodel.kppm.

See also improve.kppm for improving the fit of a kppm object.

Minimum contrast fitting algorithm: higher level interface clusterfit; low-level algorithm mincontrast.

Alternative fitting algorithms: thomas.estK, matclust.estK, lgcp.estK, cauchy.estK, vargamma.estK, thomas.estpcf, matclust.estpcf, lgcp.estpcf, cauchy.estpcf, vargamma.estpcf.

Summary statistics: Kest, Kinhom, pcf, pcfinhom.

For fitting Poisson or Gibbs point process models, see ppm.

Examples

  online <- interactive()
  if(!online) op <- spatstat.options(npixel=32, ndummy.min=16)

  # method for point patterns
  kppm(redwood, ~1, "Thomas")
  # method for formulas
  kppm(redwood ~ 1, "Thomas")

  # different models for clustering
  if(online) kppm(redwood ~ x, "MatClust") 
  kppm(redwood ~ x, "MatClust", statistic="pcf", statargs=list(stoyan=0.2)) 
  kppm(redwood ~ x, clusters="Cauchy", statistic="K")
  kppm(redwood, clusters="VarGamma", nu = 0.5, statistic="pcf")

  # log-Gaussian Cox process (LGCP) models
  kppm(redwood ~ 1, "LGCP", statistic="pcf")
  kppm(redwood ~ x, "LGCP", statistic="pcf",
                            model="matern", nu=0.3,
                            control=list(maxit=10))

  # Different fitting techniques
  fitc <- kppm(redwood ~ 1, "Thomas", method="clik2")
  fitp <- kppm(redwood ~ 1, "Thomas", method="palm")
  # penalised fit
  fitmp <- kppm(redwood ~ 1, "Thomas", penalised=TRUE)
  # quasi-likelihood improvement 
  fitq <- kppm(redwood ~ x, "Thomas", improve.type = "quasi")
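  # adaptive composite likelihood (a sketch; requires the 'nleqslv' package;
  # epsilon is the tuning parameter of the adaptive method)
  if(online && requireNamespace("nleqslv", quietly=TRUE)) {
    fita <- kppm(redwood ~ 1, "Thomas", method="adapcl", epsilon=0.01)
  }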

  if(!online) spatstat.options(op)
