hbm_betalogitnorm: Small Area Estimation using Hierarchical Bayesian under Beta...


hbm_betalogitnorm R Documentation

Small Area Estimation using Hierarchical Bayesian under Beta Distribution

Description

This function implements a Hierarchical Bayesian Small Area Estimation (HBSAE) model under a beta distribution using Bayesian inference with the brms package.

The response variable (y) modeled with the beta distribution must lie strictly between 0 and 1 (0 < y < 1). The function is intended for proportion data.
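
As a minimal sketch of the range requirement, the base R check below confirms that every observed value lies strictly inside (0, 1) before the data are passed to the function. The vector y is made up for illustration and is not part of the package.

```r
# Hypothetical proportions; in practice this would be a column of `data`
y <- c(0.12, 0.47, 0.85, NA, 0.33)

# TRUE only if every observed (non-missing) value is strictly between 0 and 1
in_open_unit_interval <- all(y > 0 & y < 1, na.rm = TRUE)

# Positions of observed values that violate the requirement
violations <- which(!is.na(y) & (y <= 0 | y >= 1))
```

Values equal to exactly 0 or 1 would show up in violations and would need to be handled before fitting.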

The function utilizes the Bayesian regression modeling framework provided by brms, which interfaces with 'Stan' for efficient Markov Chain Monte Carlo sampling. The brm() function from brms is used to estimate posterior distributions based on user-defined hierarchical and spatial structures.

Usage

hbm_betalogitnorm(
  response,
  predictors,
  n = NULL,
  deff = NULL,
  link_phi = "identity",
  group = NULL,
  sre = NULL,
  sre_type = NULL,
  car_type = NULL,
  sar_type = NULL,
  M = NULL,
  data,
  handle_missing = NULL,
  m = 5,
  prior = NULL,
  control = list(),
  chains = 4,
  iter = 4000,
  warmup = floor(iter/2),
  cores = 1,
  sample_prior = "no",
  stanvars = NULL,
  ...
)

Arguments

response

The dependent (outcome) variable in the model. This variable represents the main response being predicted or analyzed.

predictors

A list of independent (explanatory) variables used in the model. These variables form the fixed effects in the regression equation.

n

The number of sample units in each region in the survey. In the examples it is supplied as the name of a column in data (e.g., n = "n").

deff

The design effect. In the examples it is supplied as the name of a column in data (e.g., deff = "deff").

link_phi

Link function for the second parameter (phi), typically representing precision, shape, or dispersion depending on the family used (e.g., "log", "identity"). Default: "identity".

group

The name of the grouping variable (e.g., area, cluster, region) used to define the hierarchical structure for random effects. This variable should correspond to a column in the input data and is typically used to model area-level variation through random intercepts.

sre

An optional grouping factor mapping observations to spatial locations. If not specified, each observation is treated as a separate location. It is recommended to always specify a grouping factor to allow for handling of new data in postprocessing methods.

sre_type

The type of spatial random effect used in the model. The function currently supports "sar" and "car".

car_type

Type of the CAR structure. Currently implemented are "escar" (exact sparse CAR), "esicar" (exact sparse intrinsic CAR), "icar" (intrinsic CAR), and "bym2".

sar_type

Type of the SAR structure. Either "lag" (for SAR of the response values) or "error" (for SAR of the residuals).

M

The spatial matrix. For SAR models, M is a spatial weights matrix whose entries express the strength of the spatial relationship between locations. For CAR models, M is an adjacency matrix containing only 0 and 1 to indicate whether two locations are neighbors. SAR therefore allows spatial influence of varying intensity, whereas CAR encodes direct adjacency only. If sre is specified, the row names of M must match the levels of the grouping factor.
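
The row-name requirement can be checked before fitting. The sketch below is illustrative only: the area labels, the 3 x 3 adjacency matrix, and the data frame are made up, and the check uses base R rather than any function from hbsaems or brms.

```r
# Hypothetical data with a spatial grouping column named "sre"
dat <- data.frame(sre = factor(c("a1", "a2", "a3")), y = c(0.2, 0.5, 0.7))

# Hypothetical binary adjacency matrix: a1-a2 and a2-a3 are neighbours
M <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(levels(dat$sre), levels(dat$sre)))

# Row names of M should cover the same labels as the grouping factor
names_match <- setequal(rownames(M), levels(dat$sre))

# A CAR adjacency matrix is typically symmetric
is_symmetric <- isSymmetric(M)
```

If names_match is FALSE, the labels in M and in the grouping column should be reconciled before the matrix is passed as M.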

data

The dataset used for model fitting.

handle_missing

Mechanism to handle missing data (NA values) to ensure model stability and avoid estimation errors. Three approaches are supported.

The "deleted" approach performs complete case analysis by removing all rows with any missing values before model fitting. This is done using a simple filter such as complete.cases(data). It is recommended when the missingness mechanism is Missing Completely At Random (MCAR).

The "multiple" approach applies multiple imputation before model fitting. Several imputed datasets are created (e.g., using the mice package or the brm_multiple() function in brms), the model is fitted separately to each dataset, and the results are combined. This method is suitable when data are Missing At Random (MAR).

The "model" approach uses model-based imputation within the Bayesian model itself. Missing values are incorporated using the mi() function in the model formula (e.g., y ~ mi(x1) + mi(x2)), allowing the missing values to be jointly estimated with the model parameters. This method also assumes a MAR mechanism and is applicable only for continuous variables.

If data are suspected to be Missing Not At Random (MNAR), none of the above approaches directly apply. Further exploration, such as explicitly modeling the missingness process or conducting sensitivity analyses, is recommended.
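
To illustrate what the "deleted" approach amounts to, here is a minimal base R sketch of complete case analysis. The data frame and its column names are hypothetical, and the function's internal filtering may differ in detail.

```r
# Hypothetical data with missing values in the response and one predictor
dat <- data.frame(
  y  = c(0.21, NA, 0.64, 0.38),
  x1 = c(1.5, 2.0, NA, 0.7)
)

# Keep only the rows that have no missing value in any column
dat_complete <- dat[complete.cases(dat), ]

# Number of rows dropped by the filter
n_dropped <- nrow(dat) - nrow(dat_complete)
```

Comparing n_dropped with nrow(dat) gives a quick sense of how much information complete case analysis discards, which can help when choosing between the three approaches.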

m

Number of imputations to perform when using the "multiple" approach for handling missing data (default: 5). This parameter is only used if handle_missing = "multiple". It determines how many imputed datasets will be generated. Each imputed dataset is analyzed separately, and the posterior draws are then combined to account for both within-imputation and between-imputation variability, following Rubin’s rules. A typical choice is between 5 and 10 imputations, but more may be needed for higher missingness rates.

prior

Priors for the model parameters (default: NULL). Should be specified using the brms::prior() function or a list of such objects. For example, prior = prior(normal(0, 1), class = "b") sets a Normal(0,1) prior on the regression coefficients. Multiple priors can be combined using c(), e.g., prior = c(prior(normal(0, 1), class = "b"), prior(exponential(1), class = "sd")). If NULL, default priors from brms will be used.

control

A list of control parameters for the sampler (default: list())

chains

Number of Markov chains (default: 4)

iter

Total number of iterations per chain (default: 4000)

warmup

Number of warm-up iterations per chain (default: floor(iter/2))
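
As a small worked example of how the defaults interact, the sketch below reproduces the arithmetic in base R. It assumes no thinning and does not call the sampler.

```r
chains <- 4
iter   <- 4000
warmup <- floor(iter / 2)  # warm-up iterations per chain

# Draws retained per chain and in total after discarding warm-up
draws_per_chain <- iter - warmup
total_draws     <- chains * draws_per_chain
```

With these defaults, half of each chain is discarded as warm-up, so raising iter without setting warmup explicitly also raises the warm-up length.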

cores

Number of CPU cores to use (default: 1)

sample_prior

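Indicates whether draws from the priors should be sampled in addition to the posterior draws, as in brms::brm(), where the options are "no", "yes", and "only" (default: "no").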

stanvars

An optional stanvar or combination of stanvar objects used to define the hyperpriors for the hyperparameter phi. By default, if phi is not fixed, a gamma prior is used: phi ~ gamma(alpha, beta), where alpha and beta can be defined via stanvars. Use "+" to combine multiple stanvar definitions.

For example: stanvar(scode = "alpha ~ gamma(2, 1);", block = "model") + stanvar(scode = "beta ~ gamma(1, 1);", block = "model")

To use the default hyperprior for phi, set stanvars = NULL.

...

Additional arguments passed to the brm() function.

Value

An hbmfit object.

Author(s)

Sofi Zamzanah

References

Liu, B. (2009). Hierarchical Bayes Estimation and Empirical Best Prediction of Small-Area Proportions. College Park, University of Maryland.

Rao, J. N. K., & Molina, I. (2015). Small Area Estimation. John Wiley & Sons, page 390.

Gelman, A. (2006). Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper). Bayesian Analysis, 1(3), 527–528.

Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models.

Examples



# Load the example dataset
library(hbsaems)
data("data_betalogitnorm")

# Prepare the dataset
data <- data_betalogitnorm

# Fit a beta model
model1 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  data = data
)
summary(model1)

# If n and deff are available, they can be supplied as follows
model1 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  data = data
)
summary(model1)

# The remaining examples assume that n and deff are available.
# If they are not, simply omit the n and deff arguments from the call.

# Fit a beta model with a grouping variable as random effect
model2 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  group = "group",
  data = data
)
summary(model2)

# Fit Beta Model With Missing Data
data_miss <- data
data_miss[5:7, "y"] <- NA

# a. Handle missing data by deletion (only when the response has missing values)
model3 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  data = data_miss,
  handle_missing = "deleted"
)
summary(model3)

# b. Handle missing data using multiple imputation (default m = 5)
model4 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  data = data_miss,
  handle_missing = "multiple"
)
summary(model4)

# c. Handle missing data during model fitting using mi()
data_miss <- data
data_miss$x1[3:5] <- NA
data_miss$x2[14:17] <- NA
model5 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  group = "group",
  data = data_miss,
  handle_missing = "model"
)

# Fit Logit-Normal Model With Spatial Effect
data("adjacency_matrix_car")
M <- adjacency_matrix_car

model6 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  n = "n",
  deff = "deff",
  sre = "sre",
  sre_type = "car",
  M = M,
  data = data
)
summary(model6)
summary(model6)


# Supply stanvars to set the prior distributions of alpha and beta
# (stanvar() comes from brms)
model7 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  data = data,
  stanvars = brms::stanvar(scode = "alpha ~ gamma(2, 1);", block = "model") +
    brms::stanvar(scode = "beta ~ gamma(1, 1);", block = "model")
)
summary(model7)

# Supply stanvars to set the prior distribution of beta only
# (stanvar() comes from brms)
model8 <- hbm_betalogitnorm(
  response = "y",
  predictors = c("x1", "x2", "x3"),
  data = data,
  stanvars = brms::stanvar(scode = "beta ~ gamma(1, 1);", block = "model")
)
summary(model8)



hbsaems documentation built on Aug. 8, 2025, 7:28 p.m.