Fit copCAR model to discrete areal data.

Description

Fit the copCAR model to areal data consisting of Poisson or Bernoulli marginal observations.

Usage

1
2
3
copCAR(formula, A, family, method = c("CML", "DT", "CE"),
  conf.int = c("none", "bootstrap", "asymptotic"), data, offset = NULL,
  control = list())

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of the model specification are given under "Details".

A

the symmetric binary adjacency matrix for the underlying graph. adjacency.matrix generates an adjacency matrix for the m by n square lattice.

family

the marginal distribution of the observations at the areal units and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.) Supported families are binomial and poisson.

method

the method for inference. copCAR supports the continous extension ("CE"), distributional transform ("DT"), and composite marginal likelihood ("CML").

conf.int

the method for computing confidence intervals. "asympototic" is appropriate for the continous extension. "bootstrap" performs a parametric boostrap appropriate for the distributional transform and composite marginal likelihood.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which copCAR is called.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of observations. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

control

a list of parameters for controlling the fitting process.

conf.level

the value 1 - α used for computing confidence intervals. Defaults to 0.95.

boot.iter

the size of the parametric bootstrap sample. This applies when conf.int = "bootstrap". Defaults to 500.

m

the number of independent standard uniforms used to approximate the expected likelhood when method = "CE". This applies when method = "CE". Defaults to 1000.

mcse

logical. Should the Monte Carlo standard error for ρ be computed? Use only when method = "CE". Defaults to FALSE. The Monte Carlo standard error is calculated using a sample size of mcse.iter.

mcse.iter

the Monte Carlo standard error sample size for ρ when method = "CE" and mcse = TRUE. Defaults to 100.

rhoMax

the value ρ^{\max}, which is the maximum value of ρ used to approximate the CAR variances when method = "CE" or method = "DT". If missing, assumed to be 0.999.

epsilon

the tolerance ε > 0 used to approximate the CAR variances when method = "CE" or method = "DT". If missing, assumed to be 0.001.

verbose

a logical value indicating whether to print bootstrap or mcse progress to the screen. Defaults to FALSE.

Details

This function performs frequentist inference for the copCAR model of Hughes (2014), a copula-based areal regression model that uses the conditional autoregression (CAR) from the spatial generalized linear mixed model (Besag, 1974). Specifically, copCAR uses the CAR copula, a Caussian copula based on the proper CAR. The CAR copula is specified as

C_{Q^{-1}}(u) = Φ_{Q^{-1}}(Φ_{σ_1}^{-1}(u_1), …, Φ_{σ_n}^{-1}(u_n)),

where Φ_{σ_i} denotes the cdf of the normal distribution with mean zero and variance σ_i^2, Q = D - ρ A such that τ Q is the precision matrix of the proper CAR, A is the adjacency matrix for the underlying graph, D = diag(d_1, …, d_n) where d_i is the degree of vertex i of the underlying graph, and u = (u_1, …, u_n)' is a realization of the copula such that z_i = F_i^{-1}(u_i) for the marginal observation z_i having desired marginal distribution function F_i. For Bernoulli marginals, the expectation is (1 + \exp(-x_i'β))^{-1}; for Poisson marginals, the expectation is \exp(x_i'β), where β = (β_1, …, β_p)' is the regression coefficient. Note that the CAR variances (σ_1^2, …, σ_n^2)' = vecdiag(Q^{-1}) are not free parameters but are determined by the spatial dependence parameter ρ.

The spatial dependence parameter ρ \in [0, 1) and regression coefficient β = (β_1, …, β_p)' \in R^p can be estimated using the continous extension (CE) (Madsen, 2009), distribtional transform (DT) (Kazianka and Pilz, 2010), or composite marginal likelihood (CML) (Varin, 2008).

The CE approach optimizes an approximate maximum likelihood by sampling m independent standard uniform vectors of length n used to transform the discrete observations into continous random variables via convolution (Denuit and Lambert, 2005). The size of m can be choosen by computing Monte Carlo standard errors (Flegal et al., 2008). If the Monte Carlo standard error of the estimate for ρ is small relative to the sample mean, that is, if the estimated coefficient of variation is sufficiently small, the current value of m is sufficiently large. The CE is exact up to Monte Carlo standard error, but is computationally intensive and not suitable for Bernoulli marginals. If requested, asymptotic confidence intervals for the parameters are computed using the observed inverse Fisher information.

The DT stochastically "smoothes" the jumps of the discrete distribution function, an approach that goes at least as far back as Ferguson (1967). The DT-based approximation performs well for Poisson marginals. Since the log-likelihood is misspecified, the asympototic covariance matrix is the Godambe information matrix (Godambe, 1960). This is estimated using a parametric bootstrap for the variance of the score when computing confidence intervals for the parameters.

The CML approach specifies the likelihood as a product of pairwise likelihoods of adjacent observations, and performs well for both Poisson and Bernoulli data. Similar to the DT, the log-likelihood is misspecified, so the confidence intervals for the parameters are computed via a parametric bootstrap.

In the CE and DT approaches, the CAR variances are approximated by (\tilde{σ}_1^2, …, \tilde{σ}_n^2)' such that (σ_i^2 - \tilde{σ}_i^2) < ε for every i = 1, …, n for a specified tolerance ε > 0 and every ρ \in [0, ρ^{\max}).

Value

copCAR returns an object of S3 class "copCAR", which is a list containing the following components:

coefficients

the point estimate of (ρ, β')'.

conf.int

(if conf.int is not "none") the confidence intervals for (ρ, β')'.

conf.type

the type of confidence interval specified.

conf.level

(if conf.int is not "none") the confidence level, 1 - α.

mcse

(if method = "CE" and mcse = TRUE) the Monte Carlo standard error of ρ.

mcse.iter

(if method = "CE" and mcse = TRUE) the Monte Carlo standard error sample size.

mcse.cv

(if method = "CE" and mcse = TRUE) the estimated coefficient of variation of ρ.

I.inv

(if conf.int = "asymptotic") the estimated inverse observed Fisher information matrix (hessian) for (Φ^{-1}(ρ), β')'.

G.inv

(if conf.int = "bootstrap") the estimated inverse Godambe information matrix (sandwich) for (Φ^{-1}(ρ), β')'.

se

(if conf.int = "bootstrap" or conf.int = "asymptotic") the estimated standard errors for (Φ^{-1}(ρ), β')'.

boot.iter

(if conf.int = "bootstrap") the number of parametric bootstrap samples.

Z

the response vector used.

X

the design matrix.

model

the model frame.

npar

the number of model parameters.

marginal.linear.predictors

linear predictors for the margins.

marginal.fitted.values

fitted values for the margins.

call

the matched call.

formula

the formula supplied.

method

the method used for inference.

convergence

the integer code returned by optim subsequent to optimizing the log-likelihood.

message

a character string to go along with convergence.

terms

the terms object used.

data

the data argument.

xlevels

(where relevant) a record of the levels of the factors used in fitting.

control

a list containing the names and values of the control parameters.

value

the value of the negative log-likelihood at its minimum.

References

Besag, J. (1974) Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Soceity, Series B, Methodological, 36(2), 192–236.

Denuit, M. and Lambert, P. (2005) Constraints on concordance measures in bivariate discrete data. Journal of Multivariate Analysis, 93, 40–57.

Ferguson, T. (1967) Mathematical statistics: a decision theoretic approach, New York: Academic Press.

Flegal, J., Haran, M., and Jones, G. (2008) Markov Chain Monte Carlo: can we trust the third significant figure? Statistical Science, 23(2), 250–260.

Godambe, V. (1960) An optimum property of regular maximum likelihood estimation. The Annals of Mathmatical Statistics, 31(4), 1208–1211.

Kazianka, H. and Pilz, J. (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stochastic Environmental Research and Risk Assessment, 24(5), 661–673.

Madsen, L. (2009) Maximum likelihood estimation of regression parameters with spatially dependent discrete data. Journal of Agricultural, Biological, and Environmental Statistics, 14(4), 375–391.

Varin, C. (2008) On composite marginal likelihoods. Advances in Statistical Analysis, 92(1), 1–28.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
## Not run: 
# Simulate data and fit copCAR model.

# Use the 20 x 20 square lattice as the underlying graph.
m = 20
A = adjacency.matrix(m)

# Set dependence parameter and regression coefficients.
rho = 0.8
beta = c(1, 1)

# Create design matrix by assigning coordinates to each vertex
# such that the coordinates are restricted to the unit square.
x = rep(0:(m - 1) / (m - 1), times = m)
y = rep(0:(m - 1) / (m - 1), each = m)
X = cbind(x, y)

# Draw Poisson data from copCAR model.
Z = rcopCAR(rho, beta, X, A, family = poisson(link = "log"))

# Fit the copCAR model using the continous extension and
# compute 95% (default) aysmptotic CI for rho and beta.
fit.CE = copCAR(Z ~ X - 1, A, family = poisson, method = "CE", conf.int = "asymptotic")
summary(fit.CE)

# Fit the copCAR model using the distributional transform and
# compute 90% CI for rho and beta using 100 bootstrap iterations.
fit.DT = copCAR(Z ~ X - 1, A, family = poisson, method = "DT", conf.int = "bootstrap",
                control = list(conf.level = 0.90, boot.iter = 100))
summary(fit.DT)

# Fit the copCAR model using composite marginal likelihood and
# do not compute a CI for rho and beta.
fit.CML = copCAR(Z ~ X - 1, A, family = poisson, method = "CML", conf.int = "none")
summary(fit.CML)

## End(Not run)