S.RAB: Fit a spatial generalised linear model with anisotropic basis...

View source: R/S.RAB.R

S.RABR Documentation

Fit a spatial generalised linear model with anisotropic basis functions to data for computationally efficient localised spatial smoothing, where the parameters are estimated by penalised maximum likelihood estimation with a ridge regression penalty.

Description

Fit a spatial generalised linear model to areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and a set of K anisotropic spatial basis functions. The basis functions are constructed from the set of geodesic distances between all pairs of areal units and a vector of ancillary data V, and the latter should have a similar spatial pattern to the residual (after covariate adjustment) spatial structure in the data on the linear predictor scale. Parameter estimtion is carried out via penalised maximum likelihood methods, and the basis function coefficients are constrained by a ridge regression penalty to prevent overfitting. The glmnet package is used for parameter estimation. Missing (NA) values are allowed in the response, and predictions are made for these values. This model implements localised spatial smoothing and allows for boundaries in the data surface using a computationally efficient approach.

Usage

S.RAB(formula, family, data=NULL, trials=NULL, W, V, nlambda=100, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response, offset and each covariate is a vector of length K*1. The response can contain missing (NA) values.

family

One of either "binomial", "gaussian", or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length as the response containing the total number of trials for each area. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

V

A vector of ancillary data of length K, which should have a similar spatial pattern to the residual (after covariate adjustment) spatial structure in the data on the linear predictor scale.

nlambda

The number of possible values to use for the penalty parameter lambda in the glmnet() estimation function. Defaults to 100.

verbose

Logical, should the function update the user on its progress.

Value

beta.hat

The estimated regression parameters.

sigma2.hat

The estimated error variance in the Gaussian data likelihood model. If a Gaussian model is not specified it is NA.

lambda.hat

The estimated ridge regression penalty parameter.

I

The level of residual spatial autocorrelation as measured by Moran's I statistic.

fitted.values

The fitted values from the model.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area. The types are "response" (raw), and "pearson".

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model.string

A text string describing the model fit.

X

The design matrix of covariates and spatial basis functions.

model

The fitted model object from the glmnet() function.

Author(s)

Duncan Lee

References

Lee, D (2024). Computationally efficient localised spatial smoothing of disease rates using anisotropic basis functions and penalised regression fitting, Spatial Statistics, 59, 100796.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### Load other libraries required
library(MASS)

#### Set up a square lattice region
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)

#### set up distance and neighbourhood (W, based on sharing a common border) matrices
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	
	
#### Generate the spatial covariance structure
dists <- as.numeric(distance[upper.tri(distance)])
dists.quant <- quantile(dists, 0.05)
rho <- log(0.75) / -dists.quant
Sigma <- exp(-rho * distance)

#### Generate the boundaries
groups <-rep(0, K) 
groups[Grid$Var2>5] <- 1

#### Generate the covariates and response data
x1 <- rnorm(K)
x2 <- rnorm(K)
phi <- mvrnorm(n=1, mu=rep(0,K), Sigma=0.1 * exp(-rho * distance))
logit <- x1 +  x2 + phi + 0.4 * groups
prob <- exp(logit) / (1 + exp(logit))
trials <- rep(100,K)
Y <- rbinom(n=K, size=trials, prob=prob)

#### Generate the ancillary data
V <- rnorm(n=K, mean=phi + 0.4*groups , sd=rep(0.05,K))

#### Run the RAB model
mod <- S.RAB(formula=Y~x1+x2, family="binomial", data=NULL, trials=trials, W=W, 
        V=V, nlambda=50, verbose=TRUE)

CARBayes documentation built on May 29, 2024, 7:44 a.m.