Empirical Bayes estimation for SGLMM

Share:

Description

Empirical Bayes estimation for SGLMM

Usage

1
2
3
4
5
6
7
8
ebsglmm(formula, family = c("gaussian", "binomial", "poisson", "Gamma",
  "GEV.binomial", "GEVD.binomial", "Wallace.binomial"), data, weights, subset,
  atsample, parskel, paroptim, corrfcn = c("matern", "spherical",
  "powerexponential"), Nout, Nthin = 1, Nbi = 0, Npro, Nprt = 1,
  Nprb = 0, betm0, betQ0, ssqdf, ssqsc, zstart, dispersion = 1,
  bfsize1 = 0.8, reference = 1, bfmethod = c("RL", "MW"),
  transf = FALSE, useCV = TRUE, longlat = FALSE, control = list(),
  verbose = TRUE)

Arguments

formula

A representation of the model in the form response ~ terms. The response must be set to NA's at the prediction locations (see the example in mcsglmm for how to do this using stackdata). At the observed locations the response is assumed to be a total of replicated measurements. The number of replications is inputted using the argument weights.

family

The distribution of the data. The "GEVbinomial" family is the binomial family with link the GEV link (see Details).

data

An optional data frame containing the variables in the model.

weights

An optional vector of weights. Number of replicated samples for Gaussian and gamma, number of trials for binomial, time length for Poisson.

subset

An optional vector specifying a subset of observations to be used in the fitting process.

atsample

A formula in the form ~ x1 + x2 + ... + xd with the coordinates of the sampled locations.

parskel

A data frame with the components "linkp", "phi", "omg", and "kappa", corresponding to the link function, the spatial range, the relative nugget, and the spatial smoothness parameters. The latter can be omitted if not used in the correlation function. Let k denote the number of rows. Then, k different MCMC samples will be taken from the models with parameters fixed at those values. For a square grid the output from the function expand.grid can be used here.

paroptim

A named list with the components "linkp", "phi", "omg", "kappa". Each component must be numeric with length 1, 2, or 3 with elements in increasing order but for the binomial family linkp is also allowed to be the character "logit" and "probit". If its length is 1, then the corresponding parameter is considered to be fixed at that value. If 2, then the two numbers denote the lower and upper bounds for the optimisation of that parameter (infinities are allowed). If 3, these correspond to lower bound, starting value, upper bound for the estimation of that parameter.

corrfcn

Spatial correlation function. See Details.

Nout

A scalar or vector of size k. Number of MCMC samples to take for each run of the MCMC algorithm for the estimation of the Bayes factors. See argument parskel.

Nthin

A scalar or vector of size k. The thinning of the MCMC algorithm for the estimation of the Bayes factors.

Nbi

A scalar or vector of size k. The burn-in of the MCMC algorithm for the estimation of the Bayes factors.

Npro

A scalar. The number of Gibbs samples to take for estimation of the conjugate parameters and for prediction at the unsampled locations while the other parameters are fixed at their empirical Bayes estimates.

Nprt

The thinning of the Gibbs algorithm for the estimation of the conjugate parameters and for prediction.

Nprb

The burn-in of the Gibbs algorithm for the estimation of the conjugate parameters and for prediction.

betm0

Prior mean for beta (a vector or scalar).

betQ0

Prior standardised precision (inverse variance) matrix. Can be a scalar, vector or matrix. The first two imply a diagonal with those elements. Set this to 0 to indicate a flat improper prior.

ssqdf

Degrees of freedom for the scaled inverse chi-square prior for the partial sill parameter.

ssqsc

Scale for the scaled inverse chi-square prior for the partial sill parameter.

zstart

Optional starting value for the MCMC for the GRF. This can be either a scalar, a vector of size n where n is the number of sampled locations, or a matrix with dimensions n by k where k is the number of the skeleton points in parskel.

dispersion

The fixed dispersion parameter.

bfsize1

A scalar or vector of length k with all integer values or all values in (0, 1]. How many samples (or what proportion of the sample) to use for estimating the Bayes factors at the first stage. The remaining sample will be used for estimating the Bayes factors in the second stage. Setting it to 1 will perform only the first stage.

reference

An integer between 1 and k. Which model to be used as a reference, i.e. the one that goes in the denominator of the Bayes factors.

bfmethod

Which method to use to calculate the Bayes factors: Reverse logistic or Meng-Wong.

transf

Whether to use the transformed sample mu for the computations. Otherwise it uses z.

useCV

Whether to use control variates for finer corrections.

longlat

How to compute the distance between locations. If FALSE, Euclidean distance, if TRUE Great Circle distance. See spDists.

control

A list of control parameters for the optimisation. See optim.

verbose

Whether to print messages when completing each stage on screen.

Details

Currently the following spatial correlation functions are implemented. Below, h denotes the distance between locations, d is the dimensionality of the locations, phi is the spatial range parameter and kappa is an additional parameter. The correlation r(u) beween locations with distance u apart is

Matern

r(h) = (1/(2^(kappa-1) * Gamma(kappa))) * ((h/phi)^kappa) * K_{kappa}(h/phi)

spherical

r(h) = 1 - 1.5 * (h/phi) + 0.5(h/phi)^3 if h < phi , 0 otherwise

Note that this is a valid correlation only for d <= 3.

powerexponential

r(h) = exp{-(h/phi)^kappa}

Note that this is a valid correlation only for 0 < kappa <= 2.

The GEV (Generalised Extreme Value) link is defined by

mu = 1 - \exp[-max(0, 1 + nu x)^(1/nu)]

for any real nu. At nu = 0 it reduces to the complementary log-log link.

Value

A list with components

  • parest The parameter estimates

  • skeleton The skeleton points used with the corresponding logarithm of the Bayes factors at those points.

  • optim The output from the optim function.

  • mcmcsample The MCMC samples for the remaining parameters and the random field. These samples correspond to the Gibbs and Metropolis-Hasting samples after fixing the parameters estimated by empirical Bayes at their empirical Bayes estimates.

  • sys_time The time taken to complete the MCMC sampling, calculation of the importance weights, the optimization and the final MCMC sampling.

References

Roy, V., Evangelou, E., and Zhu, Z. (2015). Efficient estimation and prediction for the Bayesian spatial generalized linear mixed model with flexible link functions. Biometrics. http://dx.doi.org/10.1111/biom.12371

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
## Not run: 
data(rhizoctonia)

### Define the model
corrf <- "spherical"
kappa <- 0
ssqdf <- 1
ssqsc <- 1
betm0 <- 0
betQ0 <- .01

### Skeleton points
philist <- c(100,140,180)
linkp <- "logit"
omglist <- c(0,.5,1)
parlist <- expand.grid(phi = philist, linkp = linkp, omg = omglist,
                       kappa = kappa)
paroptim <- list(linkp = linkp, phi = c(100, 200), omg = c(0, 2),
                 kappa = kappa)

### MCMC sizes
Nout <- Npro <- 100
Nthin <- Nprt <- 1
Nbi <- Nprb <- 0

est <- ebsglmm(Infected ~ 1, 'binomial', rhizoctonia, weights = Total,
               atsample = ~ Xcoord + Ycoord, parskel = parlist,
               paroptim = paroptim, corrfcn = corrf,
               Nout = Nout, Nthin = Nthin, Nbi = Nbi,
               Npro = Npro, Nprt = Nprt, Nprb = Nprb,
               betm0 = betm0, betQ0 = betQ0, ssqdf = ssqdf, ssqsc = ssqsc,
               dispersion = 1, useCV=TRUE)

## End(Not run)