# Bayesian estimation for the two-levels binary probit model

### Description

This function performs Bayesian estimation for a geostatistical binary probit model. It also allows to specify a two-levels model so as to include individual-level and household-level (or any other unit comprising a group of individuals, e.g. village, school, compound, etc...) variables.

### Usage

1 2 | ```
binary.probit.Bayes(formula, coords, data, ID.coords, control.prior,
control.mcmc, kappa, low.rank = FALSE, knots = NULL, messages = TRUE)
``` |

### Arguments

`formula` |
an object of class |

`coords` |
an object of class |

`data` |
a data frame containing the variables in the model. |

`ID.coords` |
vector of ID values for the unique set of spatial coordinates obtained from |

`control.prior` |
output from |

`control.mcmc` |
output from |

`kappa` |
value for the shape parameter of the Matern covariance function. |

`low.rank` |
logical; if |

`knots` |
if |

`messages` |
logical; if |

### Details

This function performs Bayesian estimation for the parameters of the geostatistical binary probit model. Let *i* and *j* denote the indices of the *i*-th household and *j*-th individual within that household. The response variable *Y_{ij}* is a binary indicator taking value 1 if the individual has been tested positive for the disease of interest and 0 otherwise. Conditionally on a zero-mean stationary Gaussian process *S(x_{i})*, *Y_{ij}* are mutually independent Bernoulli variables with probit link function *Φ^{-1}(\cdot)*, i.e.

*Φ^{-1}(p_{ij}) = d_{ij}'β + S(x_{i}),*

where *d_{ij}* is a vector of covariates, both at individual- and household-level, with associated regression coefficients *β*. The Gaussian process *S(x)* has isotropic Matern covariance function (see `matern`

) with variance `sigma2`

, scale parameter `phi`

and shape parameter `kappa`

.

**Priors definition.** Priors can be defined through the function `control.prior`

. The hierarchical structure of the priors is the following. Let *θ* be the vector of the covariance parameters `c(sigma2,phi)`

; each component of *θ* has independent priors that can be freely defined by the user. However, in `control.prior`

uniform and log-normal priors are also available as default priors for each of the covariance parameters. The vector of regression coefficients `beta`

has a multivariate Gaussian prior with mean `beta.mean`

and covariance matrix `beta.covar`

.

**Updating regression coefficents and random effects using auxiliary variables.** To update *β* and *S(x_{i})*, we use an auxiliary variable technique based on Rue and Held (2005). Let *V_{ij}* denote a set of random variables that conditionally on *β* and *S(x_{i})*, are mutually independent Gaussian with mean *d_{ij}'β + S(x_{i})* and unit variance. Then, *Y_{ij}=1* if *V_{ij} > 0* and *Y_{ij}=0* otherwise. Using this representation of the model, we use a Gibbs sampler to simulate from the full conditionals of *β*, *S(x_{i})* and *V_{ij}*. See Section 4.3 of Rue and Held (2005) for more details.

**Updating the covariance parameters with a Metropolis-Hastings algorithm.** In the MCMC algorithm implemented in `binary.probit.Bayes`

, the transformed parameters

*(θ_{1}, θ_{2})=(\log(σ^2)/2,\log(σ^2/φ^{2 κ}))*

are independently updated using a Metropolis Hastings algorithm. At the *i*-th iteration, a new value is proposed for each parameter from a univariate Gaussian distrubion with variance *h_{i}^2*. This is tuned using the following adaptive scheme

*h_{i} = h_{i-1}+c_{1}i^{-c_{2}}(α_{i}-0.45),*

where *α_{i}* is the acceptance rate at the *i*-th iteration, 0.45 is the optimal acceptance rate for a univariate Gaussian distribution, whilst *c_{1} > 0* and *0 < c_{2} < 1* are pre-defined constants. The starting values *h_{1}* for each of the parameters *θ_{1}* and *θ_{2}* can be set using the function `control.mcmc.Bayes`

through the arguments `h.theta1`

, `h.theta2`

and `h.theta3`

. To define values for *c_{1}* and *c_{2}*, see the documentation of `control.mcmc.Bayes`

.

**Low-rank approximation.**
In the case of very large spatial data-sets, a low-rank approximation of the Gaussian spatial process *S(x)* might be computationally beneficial. Let *(x_{1},…,x_{m})* and *(t_{1},…,t_{m})* denote the set of sampling locations and a grid of spatial knots covering the area of interest, respectively. Then *S(x)* is approximated as *∑_{i=1}^m K(\|x-t_{i}\|; φ, κ)U_{i}*, where *U_{i}* are zero-mean mutually independent Gaussian variables with variance `sigma2`

and *K(.;φ, κ)* is the isotropic Matern kernel (see `matern.kernel`

). Since the resulting approximation is no longer a stationary process (but only approximately), `sigma2`

may take very different values from the actual variance of the Gaussian process to approximate. The function `adjust.sigma2`

can then be used to (approximately) explore the range for `sigma2`

. For example if the variance of the Gaussian process is `0.5`

, then an approximate value for `sigma2`

is `0.5/const.sigma2`

, where `const.sigma2`

is the value obtained with `adjust.sigma2`

.

### Value

An object of class "Bayes.PrevMap".
The function `summary.Bayes.PrevMap`

is used to print a summary of the fitted model.
The object is a list with the following components:

`estimate`

: matrix of the posterior samples of the model parameters.

`S`

: matrix of the posterior samples for each component of the random effect.

`const.sigma2`

: vector of the values of the multiplicative factor used to adjust the values of `sigma2`

in the low-rank approximation.

`y`

: binary observations.

`D`

: matrix of covariarates.

`coords`

: matrix of the observed sampling locations.

`kappa`

: shape parameter of the Matern function.

`ID.coords`

: set of ID values defined through the argument `ID.coords`

.

`knots`

: matrix of spatial knots used in the low-rank approximation.

`h1`

: vector of values taken by the tuning parameter `h.theta1`

at each iteration.

`h2`

: vector of values taken by the tuning parameter `h.theta2`

at each iteration.

`call`

: the matched call.

### Author(s)

Emanuele Giorgi e.giorgi@lancaster.ac.uk

Peter J. Diggle p.diggle@lancaster.ac.uk

### References

Rue, H., Held, L. (2005). *Gaussian Markov Random Fields: Theory and Applications.* Chapman & Hall, London.

Higdon, D. (1998). *A process-convolution approach to modeling temperatures in the North Atlantic Ocean.* Environmental and Ecological Statistics 5, 173-190.

### See Also

`control.mcmc.Bayes`

, `control.prior`

,`summary.Bayes.PrevMap`

, `matern`

, `matern.kernel`

, `create.ID.coords`

.