# binary.probit.Bayes: Bayesian estimation for the two-levels binary probit model In PrevMap: Geostatistical Modelling of Spatially Referenced Prevalence Data

## Description

This function performs Bayesian estimation for a geostatistical binary probit model. It also allows to specify a two-levels model so as to include individual-level and household-level (or any other unit comprising a group of individuals, e.g. village, school, compound, etc...) variables.

## Usage

 1 2 binary.probit.Bayes(formula, coords, data, ID.coords, control.prior, control.mcmc, kappa, low.rank = FALSE, knots = NULL, messages = TRUE) 

## Arguments

 formula an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. coords an object of class formula indicating the geographic coordinates. data a data frame containing the variables in the model. ID.coords vector of ID values for the unique set of spatial coordinates obtained from create.ID.coords. These must be provided in order to specify spatial random effects at household-level. Warning: the household coordinates must all be distinct otherwise see jitterDupCoords. Default is NULL. control.prior output from control.prior. control.mcmc output from control.mcmc.Bayes. kappa value for the shape parameter of the Matern covariance function. low.rank logical; if low.rank=TRUE a low-rank approximation is required. Default is low.rank=FALSE. knots if low.rank=TRUE, knots is a matrix of spatial knots used in the low-rank approximation. Default is knots=NULL. messages logical; if messages=TRUE then status messages are printed on the screen (or output device) while the function is running. Default is messages=TRUE.

## Details

This function performs Bayesian estimation for the parameters of the geostatistical binary probit model. Let i and j denote the indices of the i-th household and j-th individual within that household. The response variable Y_{ij} is a binary indicator taking value 1 if the individual has been tested positive for the disease of interest and 0 otherwise. Conditionally on a zero-mean stationary Gaussian process S(x_{i}), Y_{ij} are mutually independent Bernoulli variables with probit link function Φ^{-1}(\cdot), i.e.

Φ^{-1}(p_{ij}) = d_{ij}'β + S(x_{i}),

where d_{ij} is a vector of covariates, both at individual- and household-level, with associated regression coefficients β. The Gaussian process S(x) has isotropic Matern covariance function (see matern) with variance sigma2, scale parameter phi and shape parameter kappa.

Priors definition. Priors can be defined through the function control.prior. The hierarchical structure of the priors is the following. Let θ be the vector of the covariance parameters c(sigma2,phi); each component of θ has independent priors that can be freely defined by the user. However, in control.prior uniform and log-normal priors are also available as default priors for each of the covariance parameters. The vector of regression coefficients beta has a multivariate Gaussian prior with mean beta.mean and covariance matrix beta.covar.

Updating regression coefficents and random effects using auxiliary variables. To update β and S(x_{i}), we use an auxiliary variable technique based on Rue and Held (2005). Let V_{ij} denote a set of random variables that conditionally on β and S(x_{i}), are mutually independent Gaussian with mean d_{ij}'β + S(x_{i}) and unit variance. Then, Y_{ij}=1 if V_{ij} > 0 and Y_{ij}=0 otherwise. Using this representation of the model, we use a Gibbs sampler to simulate from the full conditionals of β, S(x_{i}) and V_{ij}. See Section 4.3 of Rue and Held (2005) for more details.

Updating the covariance parameters with a Metropolis-Hastings algorithm. In the MCMC algorithm implemented in binary.probit.Bayes, the transformed parameters

(θ_{1}, θ_{2})=(\log(σ^2)/2,\log(σ^2/φ^{2 κ}))

are independently updated using a Metropolis Hastings algorithm. At the i-th iteration, a new value is proposed for each parameter from a univariate Gaussian distrubion with variance h_{i}^2. This is tuned using the following adaptive scheme

h_{i} = h_{i-1}+c_{1}i^{-c_{2}}(α_{i}-0.45),

where α_{i} is the acceptance rate at the i-th iteration, 0.45 is the optimal acceptance rate for a univariate Gaussian distribution, whilst c_{1} > 0 and 0 < c_{2} < 1 are pre-defined constants. The starting values h_{1} for each of the parameters θ_{1} and θ_{2} can be set using the function control.mcmc.Bayes through the arguments h.theta1, h.theta2 and h.theta3. To define values for c_{1} and c_{2}, see the documentation of control.mcmc.Bayes.

Low-rank approximation. In the case of very large spatial data-sets, a low-rank approximation of the Gaussian spatial process S(x) might be computationally beneficial. Let (x_{1},…,x_{m}) and (t_{1},…,t_{m}) denote the set of sampling locations and a grid of spatial knots covering the area of interest, respectively. Then S(x) is approximated as ∑_{i=1}^m K(\|x-t_{i}\|; φ, κ)U_{i}, where U_{i} are zero-mean mutually independent Gaussian variables with variance sigma2 and K(.;φ, κ) is the isotropic Matern kernel (see matern.kernel). Since the resulting approximation is no longer a stationary process (but only approximately), sigma2 may take very different values from the actual variance of the Gaussian process to approximate. The function adjust.sigma2 can then be used to (approximately) explore the range for sigma2. For example if the variance of the Gaussian process is 0.5, then an approximate value for sigma2 is 0.5/const.sigma2, where const.sigma2 is the value obtained with adjust.sigma2.

## Value

An object of class "Bayes.PrevMap". The function summary.Bayes.PrevMap is used to print a summary of the fitted model. The object is a list with the following components:

estimate: matrix of the posterior samples of the model parameters.

S: matrix of the posterior samples for each component of the random effect.

const.sigma2: vector of the values of the multiplicative factor used to adjust the values of sigma2 in the low-rank approximation.

y: binary observations.

D: matrix of covariarates.

coords: matrix of the observed sampling locations.

kappa: shape parameter of the Matern function.

ID.coords: set of ID values defined through the argument ID.coords.

knots: matrix of spatial knots used in the low-rank approximation.

h1: vector of values taken by the tuning parameter h.theta1 at each iteration.

h2: vector of values taken by the tuning parameter h.theta2 at each iteration.

call: the matched call.

## Author(s)

Emanuele Giorgi [email protected]

Peter J. Diggle [email protected]

## References

Giorgi, E., Diggle, P.J. (2017). PrevMap: an R package for prevalence mapping. Journal of Statistical Software. 78(8), 1-29. doi: 10.18637/jss.v078.i08

Rue, H., Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall, London.

Higdon, D. (1998). A process-convolution approach to modeling temperatures in the North Atlantic Ocean. Environmental and Ecological Statistics 5, 173-190.

control.mcmc.Bayes, control.prior,summary.Bayes.PrevMap, matern, matern.kernel, create.ID.coords.