BayesSAE: MCMC Draws Based on Area-Level Models
In BayesSAE: Bayesian Analysis of Small Area Estimation

Description Usage Arguments Details Value Author(s) References Examples

This function obtains MCMC draws of specific small area area-level models defined by the sampling model and linking model.

1
2
3

BayesSAE(formula, innov = "normal", df = NULL, b = NA, spatial = FALSE, tran = 
     "F", prox = NULL, beta.start = NULL, theta.start = NULL, lam.start = runif(1), 
     prior = NULL, mcmc = 5000, burnin = 2500, thin = 5, data)

`formula`	a symbolic description of the model to be fitted (of type y ~ x \| z). `y` is the response variable in the sampling model while `x` is the design matrix in the linking model, and `z` is the estimated variance of direct estimation in the sampling model. See Rao (2003) for details about the sampling model and linking model in small area estimation.
`innov`	distribution of innovations in the sampling model. to be chosen between `"normal"` and "t".
`df`	a vector containing degrees of freedom for the `t` innovation in the sampling model if `innov = "t"`.
`b`	an optional weights vector to be used in the fitting process. Number of domains must be the same as that of the direct estimators. By default is a vector of ones. See Details.
`spatial`	a logical variable indicating whether it's a spatial model or not.
`tran`	the transformation to be taken on the responsive variable in the linking model, to be chosen between `"F"`, `"log"` or `"logit"`. `"F"` is the default value and indicates no transformation taken. See Details.
`prox`	`l * 2` matrix defining the neighbourhood matrix. See also Details.
`beta.start`	initial values for beta's. By default is the coefficients of regression model where the response vector is directly regressed on the design matrix in the linking model.
`theta.start`	initial values for theta's. By default is the response vector.
`lam.start`	initial value for lambda in the spatial model. See You and Chapman (2006).
`prior`	a list of objects specifying priors. See Details.
`mcmc`	user-specified number of MCMC draws.
`burnin`	The number of burnin iterations for the sampler. See Gelman (2006).
`thin`	the thinning interval used in the simulation. See Gelman (2006).
`data`	an optional data frame, list or environment containing variables in the model.

Let θ_i donates variable of interest for each domain i, x_i the regressors, β the regression coefficient, v_i i.i.d normal innovations. If argument b is specified, the linking model is of the form: θ_i = x_i β + b_i v_i.

If tran = "log", the linking model will be: \log(θ_i) = x_i β + b_i v_i. tran = "logit" means that logit transformation will be taken and the model will be: logit(θ_i) = x_i β + b_i v_i. Both are unmatched area level models. See Rao (2003).

The neighbourhood matrix has the ith diagonal element equal to the number of neighbours of area i, and off-diagonal elements equal to -1 if the corresponding areas are neighbours otherwise 0. See You and Chapman (2006).

The ith tuple in the argument prox indicates that area prox[i, 1] and area prox[i, 2] are neighbours. Duplicated tuples will be omitted. For example, if the first row of prox is (1, 2) while the second is (2, 1), the second row will be deleted. The two elements within each tuple are not supposed to be the same.

Initial values are crucial to MCMC convergence. EBLUP predictors of θ's and β's can provides good starting values for MCMC procedure.

The list prior should include following attributes for basic Fay-Herriot model:

beta.type: to be chosen between "non_in" or "normal". If beta.type = "non_in", non-informative prior would be specified for β. Otherwise, prior for β would be normal distribution.
beta.prior: a list contains components beta0 and eps1 if beta.type = "normal". As a result β will be distributed with mean μ = β_0 and covariance matrix Σ = diag(rep(1/eps1, p)) where p is length of β including the intercept term.
sigv.type: to be chosen between "inv_gamma" and "unif". σ_v^2 is the variance of residual in the linking model. If sigv.type = "inv_gamma", inverse gamma prior would be specified for σ_v^2. Otherwise, σ_v^2 would be uniformly distributed.
sigv.prior: a list containing components a0 and b0 as shape and rate parameter in the gamma prior. if sigv.type = "inv_gamma". Otherwise the list should contain the eps2 component and consequently σ_v^2 would be uniformly distributed on (0, 1 / eps2)

Besides, the prior list should also include attribute sig2.prior to specify priors for σ_i^2 in the You-Chapman model. See You and Chapman (2006). The sig2.prior is also a list contains the components ai and bi. Both ai and bi are vectors whose length are the same as number of domains. Thus, prior for σ_i^2 would be inverse gamma distribution with shape parameter ai[i] and rate parameter bi[i]. Default value of elements in ai and bi are all 0.05.

The function returns a object of class "BayesSAE" containing the following components:

`mcmc`	an mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package
`type`	character string indicating the type of the model. For instance "UFH" indicates that it's an unmatched Fay-Herriot model while "SYC" refers to spatial You-Chapman model
`Y`	response vector in the sampling model
`X`	design matrix in the linking model
`Z`	variance of direct estimation in the sampling model
`lam.rate`	the acceptance rate of λ since draws of λ are generated by M-H algorithm
`D_avg`	average deviance, defined as -2/n ∑_{i=1}^n p(y\|θ^{i}) where θ^{i} donates the ith posterior draw of θ.
`D_theta.hat`	discrepancy between data and model depending on a point estimator for θ, defined as -2 p(y\|θ^') where θ^' is the point estimator for θ. Here we calculate the posterior mean as the point estimator
`DIC`	deviance information criterion, defined as `2D_avg - D_theta.hat`. Small `DIC` value indicates strong out-of-sample predictive power
`theta.HB`	a vector of length the same as number of domains provides Rao-Blackwell estimators for each area.
`call`	the original function call
`HB`	Rao-Blackwellization of theta's
`spatial`	logical variable indicating whether the model is spatial or not
`tran`	character indicating the transformation of response variable in the linking model

If it's an unmatched model, θ_i's are generated by M-H algorithm and theta.rate as a vector of length m provides acceptance rate for each θ_i respectively is involved

Chengchun Shi

You, Y. and Chapman, B. (2006) Small Area Estimation Using Area Level Models and Estimated Sampling Variances. Survey Methodology, 32: 97-103.

Rao, J. N. K. (2003) Small Area Estimation. John Wiley and Sons.

Gelman, A. and Carlin, J. B. and Stern, H. S. and Rubin, D. B. (2006). Bayesian Data Analysis, CRC Press Company.

# load data set
data(SAIPE)
m <- length(SAIPE$SACPR)

# basic Fay-Herriort models (FH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000)

# You-Chapman models (YC)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t", 
    df = rep(50, m))
	
# spatial model with unknown sampling variance (SYC)
# define the neighbourhood matrix
prox <- cbind(sample(1:51, 50, replace = TRUE), sample(1:51, 50, replace = TRUE))
prox <- prox[prox[,1] != prox[,2], ]
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t",
    df = rep(50, m), spatial = TRUE, prox = prox)

# Unmatched models (UFH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, tran = "log")