View source: R/ppmlasso_functions.R
ppmlasso  R Documentation 
The ppmlasso function fits point process models (either Poisson or areainteraction models) with a sequence of LASSO, adaptive LASSO or elastic net penalties (a "regularisation path").
ppmlasso(formula, sp.xy, env.grid, sp.scale, coord = c("X", "Y"), data = ppmdat(sp.xy = sp.xy, sp.scale = sp.scale, back.xy = env.grid, coord = c("X","Y"), sp.file = NA, quad.file = NA, datfilename = "PPMDat", writefile = writefile), lamb = NA, n.fits = 200, ob.wt = NA, criterion = "bic", alpha = 1, family = "poisson", tol = 1.e9, gamma = 0, init.coef = NA, mu.min = 1.e16, mu.max = 1/mu.min, r = NA, interactions = NA, availability = NA, max.it = 25, min.lamb = 10, standardise = TRUE, n.blocks = NA, block.size = sp.scale*100, seed = 1, writefile = TRUE)
formula 
The formula of the fitted model. For a point process model, the correct form is 
sp.xy 
A matrix of species locations containing at least one column representing
longitude and one column representing latitude. Environmental variables are
interpolated to the locations of 
env.grid 
The georeferenced matrix of environmental grids. This matrix is used to
generate quadrature points using the 
sp.scale 
The spatial resolution at which to define the regular grid of quadrature
points. 
coord 
A vector containing the names of the longitude and latitude coordinates. 
data 
An optional data matrix generated from the 
lamb 
A vector of penalty values that will be used to create the regularisation path.
If 
n.fits 
The number of models fitted in the regularisation path. If 
ob.wt 
Quadrature weights, usually inherited from the 
criterion 
The penalisation criteria to be optimised by the regularisation path. The
options include 
alpha 
The elastic net parameter. The form of the penalty is α*λ*∑_{j = 1}^p β_j + (1  α)*λ*∑_{j = 1}^p (β_j)^2. The default value 
family 
The family of models to be fitted – 
tol 
The convergence threshold for the descent algorithm. The algorithm continues
for a maximum of 
gamma 
The exponent of the adaptive weights for the adaptive LASSO penalty. The
default value 
init.coef 
The initial coefficients used for an adaptive LASSO penalty. 
mu.min 
The threshold for small fitted values. Any fitted value less than the threshold
is set to 
mu.max 
The threshold for large fitted values. Any fitted value larger than the threshold
will be set to 
r 
The radius of point interactions, required if 
interactions 
A vector of point interactions calculated from the 
availability 
An optional binary matrix used in calculating point interactions indicating
whether locations are available (1) or not (0). See 
max.it 
The maximum number of iterations of the descent algorithm for fitting the model. 
min.lamb 
The power x of smallest penalty e^x among the 
standardise 
A logical argument indicating whether the environmental variables should be standardised to have mean 0 and variance 1. It is recommended that variables are standardised for analysis. 
n.blocks 
This argument controls the number of cross validation groups into which the spatial blocks
are divided if the 
block.size 
The length of the edges for the spatial blocks created if the 
seed 
The random seed used for controlling the allocation of spatial blocks to cross validation groups
if the 
writefile 
A logical argument passed to the 
This function fits a regularisation path of point process models provided a list of species locations and a georeferenced grid of environmental data. It is assumed that Poisson point process models (Warton & Shepherd, 2010) fit intensity as a loglinear model of environmental covariates, and that areainteraction models (Widom & Rowlinson, 1970; Baddeley & van Lieshout, 1995) fit conditional intensity as a loglinear model of environmental covariates and point interactions. Parameter coefficients are estimated by maximum likelihood for Poisson point process models and by maximum pseudolikelihood (Besag, 1977) for areainteraction models. The expressions for both the likelihood and pseudolikelihood involve an intractable integral which is approximated using a quadrature scheme (Berman & Turner, 1992).
Each model in the regularisation path is fitted by extending the Osborne descent algorithm (Osborne, 2000) to generalised linear models with penalised iteratively reweighted least squares.
Three classes of penalty p(β) are available for the vector of parameter coefficients β:
For the LASSO (Tibshirani, 1996), p(β) = λ*∑_{j = 1}^p β_j
For the adaptive LASSO (Zou, 2006), p(β) = λ*∑_{j = 1}^p w_j*β_j, where w_j = 1/\hat{β}_{init, j}^γ for some initial estimate of parameters \hat{β}_{init}.
For the elastic net (Zou & Hastie, 2005), α*λ*∑_{j = 1}^p β_j + (1  α)*λ*∑_{j = 1}^p (β_j)^2. Note that this form of the penalty is a restricted case of the general elastic net penalty.
There are various criteria available for managing the biasvariance tradeoff (Renner, 2013). The default choice is BIC, the Bayesian Information Criterion, which has been shown to have good performance.
An alternative criterion useful when data are sparse is MSI, the maximum score of the intercept model (Renner, in prep). For a set of m presence locations, the MSI penalty is λ_{MSI} = λ_{max}/√{m}, where λ_{max} is the smallest penalty that shrinks all environmental coefficients to zero. The MSI penalty differs from the other criteria in that does not require an entire regularisation path to be fitted.
It is also possible to control the magnitude of the penalty by spatial cross validation by setting the
criterion
argument to "blockCV"
. The study region is then divided into square blocks with edge
lengths controlled by the block.size
argument, which are assigned to one of a number of cross validation
groups controlled by the n.groups
argument. The penalty which maximises the predicted loglikelihood is
chosen.
An object of class "ppmlasso"
, with elements:
betas 
A matrix of fitted coefficients of the 
lambdas 
A vector containing the 
likelihoods 
A vector containing the likelihood of 
pen.likelihoods 
A vector containing the penalised likelihood of 
beta 
A vector containing the coefficients of the model that optimises the specified 
lambda 
The penalty value of the model that optimises the specified 
mu 
A vector of fitted values from the model that optimises the specified 
likelihood 
The likelihood of the model that optimises the specified 
criterion 
The specified 
family 
The specified 
gamma 
The specified 
alpha 
The specified 
init.coef 
The specified 
criterion.matrix 
A matrix with 
data 
The design matrix. For the point process models fitted with this function,

pt.interactions 
The calculated point interactions. 
wt 
The vector of quadrature weights. 
pres 
A vector indicating presence (1) or quadrature point (0). 
x 
A vector of point longitudes. 
y 
A vector of point latitudes. 
r 
The radius of point interactions. 
call 
The function call. 
formula 
The 
s.means 
If 
s.sds 
If 
cv.group 
The cross validation group associated with each point in the data set. 
n.blocks 
The number of cross validation groups specified. 
Ian W. Renner
Baddeley, A.J. & van Lieshout, M.N.M. (1995). Areainteraction point processes. Annals of the Institute of Statistical Mathematics 47, 601619.
Berman, M. & Turner, T.R. (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistics Society, Series C 41, 3138.
Besag, J. (1977). Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute 47, 7791.
Osborne, M.R., Presnell, B., & Turlach, B.A. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics 9, 319337.
Renner, I.W. & Warton, D.I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274281.
Renner, I.W. (2013). Advances in presenceonly methods in ecology. https://unsworks.unsw.edu.au/fapi/datastream/unsworks:11510/SOURCE01
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267288.
Warton, D.I. & Shepherd, L.C. (2010). Poisson point process models solve the "pseudoabsence problem" for presenceonly data in ecology. Annals of Applied Statistics 4, 13831402.
Widom, B. & Rowlinson, J.S. (1970). New model for the study of liquidvapor phase transitions. The Journal of Chemical Physics 52, 16701684.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 14181429.
Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67, 301320.
print.ppmlasso
for printing features of the fitted regularisation path.
predict.ppmlasso
for predicting intensity for a set of new data.
envelope.ppmlasso
for constructing a Kenvelope of the model which optimises
the given criterion from the spatstat
package.
diagnose.ppmlasso
for diagnostic plots from the spatstat
package.
# Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) #Fit a regularisation path of areainteraction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains$availability, n.fits = 20, writefile = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.