# ppmlasso: Fit point process models with LASSO penalties In ppmlasso: Point Process Models with LASSO Penalties

## Description

The ppmlasso function fits point process models (either Poisson or area-interaction models) with a sequence of LASSO, adaptive LASSO or elastic net penalties (a "regularisation path").

## Usage

 1 2 3 4 5 6 7 8 ppmlasso(formula, sp.xy, env.grid, sp.scale, coord = c("X", "Y"), data = ppmdat(sp.xy = sp.xy, sp.scale = sp.scale, back.xy = env.grid, coord = c("X", "Y"), sp.file = NA, quad.file = NA, file.name = "TestPPM"), lamb = NA, n.fits = 200, ob.wt = NA, criterion = "bic", alpha = 1, family = "poisson", tol = 1e-09, gamma = 0, init.coef = NA, mu.min = 1e-16, mu.max = 1/mu.min, r = NA, interactions = NA, availability = NA, max.it = 25, standardise = TRUE, n.blocks = NA, block.size = sp.scale * 100, seed = 1) 

## Arguments

 formula The formula of the fitted model. For a point process model, the correct form is ~ variables. sp.xy A matrix of species locations containing at least one column representing longitude and one column representing latitude. Environmental variables are interpolated to the locations of sp.xy using the env.var function, unless the data argument is supplied. env.grid The geo-referenced matrix of environmental grids. This matrix is used to generate quadrature points using the sample.quad function, interpolate environmental data to the species locations of sp.xy using the env.var function, and calculate observation weights using the ppmdat function, unless the data argument is supplied. This creates a data matrix data which provides the variables for the formula argument. sp.scale The spatial resolution at which to define the regular grid of quadrature points. sample.quad will subsample from the rows of data that coincide with a regular grid at a resolution of sp.scale. coord A vector containing the names of the longitude and latitude coordinates. data An optional data matrix generated from the ppmdat function. Supplying a matrix to data is an alternative way of providing the environmental variables used in the formula argument, instead of specifying sp.xy and env.grid. lamb A vector of penalty values that will be used to create the regularisation path. If lamb = NA, the penalty values are automatically generated from the data and the n.fits argument. n.fits The number of models fitted in the regularisation path. If lamb = NA, the n.fits penalty values will be equally spaced on a logarithmic scale from exp(-10) to λ_{max}, the smallest penalty that shrinks all parameter coefficients to zero. ob.wt Quadrature weights, usually inherited from the ppmdat function. criterion The penalisation criteria to be optimised by the regularisation path. The options include "aic", "bic", "blockCV", "hqc", "gcv", "nlgcv" and "msi". alpha The elastic net parameter. The form of the penalty is α*λ*∑_{j = 1}^p |β_j| + (1 - α)*λ*∑_{j = 1}^p (β_j)^2. The default value alpha = 1 corresponds to a LASSO penalty, while alpha = 0 corresponds to a ridge regression penalty. family The family of models to be fitted – family = "poisson" for Poisson point process models or family = "area.inter" for area-interaction models. tol The convergence threshold for the descent algorithm. The algorithm continues for a maximum of max.it iterations until the difference in likelihood between successive fits falls below tol. gamma The exponent of the adaptive weights for the adaptive LASSO penalty. The default value gamma = 0 corresponds to a normal LASSO penalty. init.coef The initial coefficients used for an adaptive LASSO penalty. mu.min The threshold for small fitted values. Any fitted value less than the threshold is set to mu.min. mu.max The threshold for large fitted values. Any fitted value larger than the threshold will be set to mu.max. r The radius of point interactions, required if family = "area.inter". interactions A vector of point interactions calculated from the point.interactions function necessary for fitting area-interaction models. If interactions = NA and family = "area.inter", point interactions will be automatically calculated for radius r to the locations of data. availability An optional binary matrix used in calculating point interactions indicating whether locations are available (1) or not (0). See point.interactions for more details. max.it The maximum number of iterations of the descent algorithm for fitting the model. standardise A logical argument indicating whether the environmental variables should be standardised to have mean 0 and variance 1. It is recommended that variables are standardised for analysis. n.blocks This argument controls the number of cross validation groups into which the spatial blocks are divided if the criterion argument is set to "blockCV". See details. block.size The length of the edges for the spatial blocks created if the criterion argument is set to "blockCV". Only square spatial blocks are currently supported. See details. seed The random seed used for controlling the allocation of spatial blocks to cross validation groups if the criterion argument is set to "blockCV".

## Details

This function fits a regularisation path of point process models provided a list of species locations and a geo-referenced grid of environmental data. It is assumed that Poisson point process models (Warton & Shepherd, 2010) fit intensity as a log-linear model of environmental covariates, and that area-interaction models (Widom & Rowlinson, 1970; Baddeley & van Lieshout, 1995) fit conditional intensity as a log-linear model of environmental covariates and point interactions. Parameter coefficients are estimated by maximum likelihood for Poisson point process models and by maximum pseudolikelihood (Besag, 1977) for area-interaction models. The expressions for both the likelihood and pseudolikelihood involve an intractable integral which is approximated using a quadrature scheme (Berman & Turner, 1992).

Each model in the regularisation path is fitted by extending the Osborne descent algorithm (Osborne, 2000) to generalised linear models with penalised iteratively reweighted least squares.

Three classes of penalty p(β) are available for the vector of parameter coefficients β:

For the LASSO (Tibshirani, 1996), p(β) = λ*∑_{j = 1}^p |β_j|

For the adaptive LASSO (Zou, 2006), p(β) = λ*∑_{j = 1}^p w_j*|β_j|, where w_j = 1/|\hat{β}_{init, j}|^γ for some initial estimate of parameters \hat{β}_{init}.

For the elastic net (Zou & Hastie, 2005), α*λ*∑_{j = 1}^p |β_j| + (1 - α)*λ*∑_{j = 1}^p (β_j)^2. Note that this form of the penalty is a restricted case of the general elastic net penalty.

There are various criteria available for managing the bias-variance tradeoff (Renner, 2013). The default choice is BIC, the Bayesian Information Criterion, which has been shown to have good performance.

An alternative criterion useful when data are sparse is MSI, the maximum score of the intercept model (Renner, in prep). For a set of m presence locations, the MSI penalty is λ_{MSI} = λ_{max}/√{m}, where λ_{max} is the smallest penalty that shrinks all environmental coefficients to zero. The MSI penalty differs from the other criteria in that does not require an entire regularisation path to be fitted.

It is also possible to control the magnitude of the penalty by spatial cross validation by setting the criterion argument to "blockCV". The study region is then divided into square blocks with edge lengths controlled by the block.size argument, which are assigned to one of a number of cross validation groups controlled by the n.groups argument. The penalty which maximises the predicted log-likelihood is chosen.

## Value

An object of class "ppmlasso", with elements:

 betas A matrix of fitted coefficients of the n.fits models. lambdas A vector containing the n.fits penalty values. likelihoods A vector containing the likelihood of n.fits fitted models. pen.likelihoods A vector containing the penalised likelihood of n.fits fitted models. beta A vector containing the coefficients of the model that optimises the specified criterion. lambda The penalty value of the model that optimises the specified criterion. mu A vector of fitted values from the model that optimises the specified criterion. likelihood The likelihood of the model that optimises the specified criterion. criterion The specified criterion of the function call. family The specified family of the function call. gamma The specified gamma of the function call. alpha The specified alpha of the function call. init.coef The specified init.coef of the function call. criterion.matrix A matrix with n.fits rows corresponding to the observed values of AIC, BIC, HQC, GCV, and non-linear GCV. data The design matrix. For the point process models fitted with this function, mu = e^{data*beta}. pt.interactions The calculated point interactions. wt The vector of quadrature weights. pres A vector indicating presence (1) or quadrature point (0). x A vector of point longitudes. y A vector of point latitudes. r The radius of point interactions. call The function call. formula The formula argument. s.means If standardise = TRUE, the means of each column of data prior to standardisation. s.sds If standardise = TRUE, the standard deviations of each column of data prior to standardisation. cv.group The cross validation group associated with each point in the data set. n.blocks The number of cross validation groups specified.

Ian W. Renner

## References

Baddeley, A.J. & van Lieshout, M.N.M. (1995). Area-interaction point processes. Annals of the Institute of Statistical Mathematics 47, 601-619.

Berman, M. & Turner, T.R. (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistics Society, Series C 41, 31-38.

Besag, J. (1977). Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute 47, 77-91.

Osborne, M.R., Presnell, B., & Turlach, B.A. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics 9, 319-337.

Renner, I.W. & Warton, D.I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274-281.

Renner, I.W. (2013). Advances in presence-only methods in ecology. http://unsworks.unsw.edu.au/fapi/datastream/unsworks:11510/SOURCE01

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267-288.

Warton, D.I. & Shepherd, L.C. (2010). Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology. Annals of Applied Statistics 4, 1383-1402.

Widom, B. & Rowlinson, J.S. (1970). New model for the study of liquid-vapor phase transitions. The Journal of Chemical Physics 52, 1670-1684.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418-1429.

Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67, 301-320.

print.ppmlasso for printing features of the fitted regularisation path.

predict.ppmlasso for predicting intensity for a set of new data.

envelope.ppmlasso for constructing a K-envelope of the model which optimises the given criterion from the spatstat package.

diagnose.ppmlasso for diagnostic plots from the spatstat package.

## Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 # Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree=2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20) #Fit a regularisation path of area-interaction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree=2) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains\$availability, n.fits = 20) 

### Example output

Loading required package: spatstat

spatstat 1.59-0       (nickname: 'J'ai omis les oeufs de caille')
For an introduction to spatstat, type 'beginner'

Note: R version 3.4.4 (2018-03-15) is more than 9 months old; we strongly recommend upgrading to the latest version
Calculating species environmental data for variable: FC
Calculating species environmental data for variable: D_MAIN_RDS
Calculating species environmental data for variable: D_URBAN
Calculating species environmental data for variable: RAIN_ANN
Calculating species environmental data for variable: TMP_MAX
Calculating species environmental data for variable: TMP_MIN
[1] "Output saved in the file SpEnvData.RData"
[1] "Output saved in the file TestPPM.RData"
Fitting Models: 1 of 20
Fitting Models: 2 of 20
Fitting Models: 3 of 20
Fitting Models: 4 of 20
Fitting Models: 5 of 20
Fitting Models: 6 of 20
Fitting Models: 7 of 20
Fitting Models: 8 of 20
Fitting Models: 9 of 20
Fitting Models: 10 of 20
Fitting Models: 11 of 20
Fitting Models: 12 of 20
Fitting Models: 13 of 20
Fitting Models: 14 of 20
Fitting Models: 15 of 20
Fitting Models: 16 of 20
Fitting Models: 17 of 20
Fitting Models: 18 of 20
Fitting Models: 19 of 20
Fitting Models: 20 of 20
Calculating species environmental data for variable: FC
Calculating species environmental data for variable: D_MAIN_RDS
Calculating species environmental data for variable: D_URBAN
Calculating species environmental data for variable: RAIN_ANN
Calculating species environmental data for variable: TMP_MAX
Calculating species environmental data for variable: TMP_MIN
[1] "Output saved in the file SpEnvData.RData"
[1] "Output saved in the file TestPPM.RData"
Calculating point interactions
Fitting Models: 1 of 20
Fitting Models: 2 of 20
Fitting Models: 3 of 20
Fitting Models: 4 of 20
Fitting Models: 5 of 20
Fitting Models: 6 of 20
Fitting Models: 7 of 20
Fitting Models: 8 of 20
Fitting Models: 9 of 20
Fitting Models: 10 of 20
Fitting Models: 11 of 20
Fitting Models: 12 of 20
Fitting Models: 13 of 20
Fitting Models: 14 of 20
Fitting Models: 15 of 20
Fitting Models: 16 of 20
Fitting Models: 17 of 20
Fitting Models: 18 of 20
Fitting Models: 19 of 20
Fitting Models: 20 of 20


ppmlasso documentation built on May 2, 2019, 8:20 a.m.