hbm: hbm : Hierarchical Bayesian Small Area Models




Description

This function provides flexible modeling approaches for estimating area-level statistics while incorporating auxiliary information and spatial structure. It fits Bayesian models with the brms package and supports Gaussian, Bernoulli, Poisson, and other distributions. It also accommodates spatial random effects (CAR and SAR) and missing-data handling (deletion, model-based imputation, and multiple imputation).

Usage

hbm(
  formula,
  hb_sampling = "gaussian",
  hb_link = "identity",
  link_phi = "log",
  re = NULL,
  sre = NULL,
  sre_type = NULL,
  car_type = NULL,
  sar_type = NULL,
  M = NULL,
  data,
  prior = NULL,
  handle_missing = NULL,
  m = 5,
  control = list(),
  chains = 4,
  iter = 4000,
  warmup = floor(iter/2),
  cores = 1,
  sample_prior = "no",
  ...
)

Arguments

formula

Formula specifying the model structure linking the direct estimates to the auxiliary variables. The formula must be provided as a brmsformula or formula object. For multivariate models, such as models that also describe auxiliary variables, combine multiple bf() formulas with the + operator. Examples: formula(y ~ x1 + x2 + x3), bf(y ~ x1 + x2 + x3), or bf(y | mi() ~ mi(x1)) + bf(x1 | mi() ~ x2).

hb_sampling

A character string naming the distribution family of the response variable to be used in the model (e.g., "gaussian", "bernoulli", "poisson")

hb_link

A specification for the model link function. This can be a name/expression or a character string. The links available for each family are those of the corresponding brms family; see brms::brmsfamily for the links supported by each family.

link_phi

Link function for the second distributional parameter (phi), which typically represents precision, shape, or dispersion depending on the family used (e.g., "log", "identity").
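
To make the roles of hb_link and link_phi concrete, the base-R sketch below shows what a "logit" link (for example with hb_sampling = "bernoulli") and a "log" link on phi do to a linear predictor. It does not call hbm or brms; the linear-predictor values are arbitrary illustrations.

```r
# Linear predictors on the link scale (arbitrary illustrative values)
eta <- c(-2, 0, 2)

# Bernoulli family with a "logit" link: the mean is the inverse logit of eta
p <- plogis(eta)          # 1 / (1 + exp(-eta)), always strictly between 0 and 1
eta_back <- qlogis(p)     # applying the logit link maps the mean back to eta

# link_phi = "log": the linear predictor for phi is exponentiated,
# so phi is positive by construction
eta_phi <- log(10)
phi <- exp(eta_phi)
```

The log link is a natural default for phi because precision, shape, and dispersion parameters must be positive, and exp() guarantees that for any value of the linear predictor.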

re

Random effects formula specifying the grouping structure in the data. For example, re = ~(1|area), where "area" is the grouping variable or cluster ID indicating that observations within the same area share a common random effect. If not specified, each row will be treated as its own group, meaning a separate random effect is estimated for each observation.
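
The difference between specifying re and leaving it NULL comes down to how many random-effect levels are estimated. The base-R sketch below counts them for a small hypothetical dataset; treating re = NULL as equivalent to grouping by a row identifier (shown here with a hypothetical obs_id column) is an illustration of the description above, not a statement about how hbm builds the model internally.

```r
# Hypothetical toy data: four observations in three areas
d <- data.frame(
  y    = c(1.2, 0.8, 1.5, 1.1),
  area = c("A", "A", "B", "C")
)

# re = ~(1 | area): observations in the same area share one random effect
n_effects_area <- length(unique(d$area))

# re = NULL: conceptually like grouping by a row identifier,
# so every observation gets its own random effect
d$obs_id <- seq_len(nrow(d))   # hypothetical column name
n_effects_obs <- length(unique(d$obs_id))
```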

sre

An optional grouping factor mapping observations to spatial locations. If not specified, each observation is treated as a separate location. It is recommended to always specify a grouping factor to allow for handling of new data in postprocessing methods.

sre_type

Type of spatial random effect used in the model. Currently supported: "sar" and "car".

car_type

Type of the CAR structure. Currently implemented are "escar" (exact sparse CAR), "esicar" (exact sparse intrinsic CAR), "icar" (intrinsic CAR), and "bym2".
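
The difference between a proper and an intrinsic CAR can be seen in their precision matrices. The base-R sketch below uses the standard textbook forms, D - alpha * A for a proper CAR and D - A for the intrinsic CAR, where A is a 0/1 adjacency matrix and D holds the number of neighbors on its diagonal. The four-area chain and alpha = 0.9 are assumptions, and brms may scale or parameterize these matrices differently; "bym2" is not shown.

```r
# Hypothetical adjacency for four areas on a line: 1-2, 2-3, 3-4
A <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
D <- diag(rowSums(A))          # number of neighbors per area

# Intrinsic CAR ("icar"-style): rows sum to zero, so Q is singular
Q_icar <- D - A

# Proper CAR ("escar"-style): with |alpha| < 1, Q is positive definite
alpha <- 0.9
Q_car <- D - alpha * A

eig_icar <- eigen(Q_icar, symmetric = TRUE, only.values = TRUE)$values
eig_car  <- eigen(Q_car,  symmetric = TRUE, only.values = TRUE)$values
```

The zero eigenvalue of the intrinsic version is why an ICAR prior is improper and needs an extra constraint (such as a sum-to-zero constraint on the effects), whereas the proper CAR defines a valid multivariate normal distribution.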

sar_type

Type of the SAR structure. Either "lag" (for SAR of the response values) or "error" (for SAR of the residuals).
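
The two SAR variants differ in where the spatial term enters. In the lag form, y = rho * W y + X beta + e, so y = (I - rho * W)^(-1) (X beta + e); in the error form, y = X beta + u with u = rho * W u + e. The base-R sketch below simulates both for a hypothetical four-area chain with a row-standardized weight matrix W; rho, beta, and the covariate are assumed values.

```r
# Hypothetical row-standardized weights for four areas on a line
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)            # each row now sums to 1
I <- diag(4)

X    <- cbind(1, c(0.5, 1.0, 1.5, 2.0))   # intercept + one covariate
beta <- c(1, 2)
rho  <- 0.5
set.seed(1)
e <- rnorm(4, sd = 0.1)

# "lag": spatial dependence in the response itself
y_lag <- solve(I - rho * W, X %*% beta + e)

# "error": spatial dependence only in the residuals
u <- solve(I - rho * W, e)
y_err <- X %*% beta + u
```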

M

Spatial matrix. For SAR, M is a spatial weight matrix whose entries express the strength of spatial influence between pairs of locations. For CAR, M is a binary adjacency matrix (1 if two locations are neighbors, 0 otherwise). In short, SAR weights can encode influences of different intensity, whereas CAR focuses on direct adjacency. If sre is specified, the row names of M must match the levels of the grouping factor.
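
The requirements on M can be checked before calling hbm. The base-R sketch below builds a 0/1 adjacency matrix for four hypothetical areas, verifies that it is symmetric with a zero diagonal, derives a row-standardized weight matrix (a common SAR convention, assumed here rather than required by hbm), and checks that the row names match the levels of a hypothetical grouping factor.

```r
# Hypothetical areas and neighbor pairs
areas <- c("A", "B", "C", "D")
M_car <- matrix(0, nrow = 4, ncol = 4, dimnames = list(areas, areas))
neighbours <- rbind(c("A", "B"), c("B", "C"), c("C", "D"))
M_car[neighbours] <- 1             # index by row/column names
M_car[neighbours[, 2:1]] <- 1      # make adjacency symmetric

# CAR: binary, symmetric, no self-neighbors
stopifnot(isSymmetric(M_car), all(M_car %in% c(0, 1)), all(diag(M_car) == 0))

# SAR: weights; here each row is scaled to sum to 1
M_sar <- M_car / rowSums(M_car)

# Row names must match the levels of the grouping factor passed via sre
area <- factor(c("B", "A", "D", "C", "A"))
names_match <- identical(rownames(M_car), levels(area))
```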

data

Dataset used for model fitting

prior

Priors for the model parameters (default: NULL). Should be specified using the brms::prior() function or a list of such objects. For example, prior = prior(normal(0, 1), class = "b") sets a Normal(0,1) prior on the regression coefficients. Multiple priors can be combined using c(), e.g., prior = c(prior(normal(0, 1), class = "b"), prior(exponential(1), class = "sd")). If NULL, default priors from brms will be used.

handle_missing

Mechanism for handling missing data (NA values) to keep estimation stable and avoid errors. Three approaches are supported:

"deleted": complete-case analysis. All rows with any missing value are removed before model fitting, using a simple filter such as complete.cases(data). Recommended when data are Missing Completely At Random (MCAR).

"multiple": multiple imputation before model fitting. Several imputed datasets are created (e.g., with the mice package), the model is fitted separately to each dataset (as brms::brm_multiple() does), and the results are combined. Suitable when data are Missing At Random (MAR).

"model": model-based imputation within the Bayesian model itself. Missing values are declared with mi() in the model formula (e.g., y ~ mi(x1) + mi(x2)), so they are estimated jointly with the model parameters. This approach also assumes MAR and applies only to continuous variables.

If data are suspected to be Missing Not At Random (MNAR), none of these approaches applies directly. Further work, such as explicitly modeling the missingness process or conducting sensitivity analyses, is recommended.
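
Before choosing an approach it helps to see how much data each one keeps. The base-R sketch below, on a small hypothetical data frame, reports the share of missing values per column and applies the complete-case filter that the "deleted" option is described as using.

```r
# Hypothetical data with missing values in the response and one covariate
d <- data.frame(
  y  = c(1.1, NA, 0.7, 1.4, NA),
  x1 = c(0.2, 0.5, NA, 0.9, 1.0),
  x2 = 1:5
)

# Share of missing values per column
miss_share <- colMeans(is.na(d))

# "deleted": keep only rows without any NA
d_cc <- d[complete.cases(d), ]
```

Here three of the five rows are dropped, which illustrates why deletion can waste information and is only advisable when missingness is plausibly MCAR and rare.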

m

Number of imputations to perform when using the "multiple" approach for handling missing data (default: 5). This parameter is only used if handle_missing = "multiple". It determines how many imputed datasets will be generated. Each imputed dataset is analyzed separately, and the posterior draws are then combined to account for both within-imputation and between-imputation variability, following Rubin’s rules. A typical choice is between 5 and 10 imputations, but more may be needed for higher missingness rates.
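
For intuition on combining within- and between-imputation variability, the base-R sketch below implements the classical Rubin's rules for a single parameter: the pooled estimate is the mean of the m estimates, and the total variance is W + (1 + 1/m) * B. The estimates and standard errors are hypothetical. Bayesian fits typically pool the posterior draws from all imputations rather than applying these formulas directly, so this is an illustration of the idea, not hbm's internal code.

```r
# Classical Rubin's rules for one parameter across m imputed datasets
rubin_pool <- function(est, se) {
  m <- length(est)
  qbar    <- mean(est)                 # pooled point estimate
  within  <- mean(se^2)                # average within-imputation variance (W)
  between <- var(est)                  # between-imputation variance (B)
  total   <- within + (1 + 1 / m) * between
  list(estimate = qbar, within = within, between = between, total_var = total)
}

# Hypothetical estimates of one coefficient from m = 5 imputed datasets
est <- c(1.02, 0.98, 1.05, 1.00, 0.95)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.10)
pooled <- rubin_pool(est, se)
```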

control

A list of control parameters for the sampler (default: list())

chains

Number of Markov chains (default: 4)

iter

Total number of iterations per chain (default: 4000)

warmup

Number of warm-up iterations per chain (default: floor(iter/2))
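
The chains, iter, and warmup arguments together determine how many posterior draws are available for inference. The arithmetic below uses the defaults and assumes no thinning (one kept draw per post-warmup iteration).

```r
chains <- 4
iter   <- 4000
warmup <- floor(iter / 2)               # default warm-up per chain
draws_per_chain <- iter - warmup        # post-warmup draws kept per chain
total_draws <- chains * draws_per_chain # draws available for inference
```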

cores

Number of CPU cores to use (default: 1)

sample_prior

Character. Indicates whether draws from the priors should be sampled in addition to posterior draws. The options are: "no" (default): do not draw from the priors, so only posterior draws are obtained; "yes": draw from both the priors and the posterior; "only": draw solely from the priors, ignoring the likelihood, which among other things allows generating draws from the prior predictive distribution.
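
With sample_prior = "only", brms performs the sampling; the base-R sketch below only illustrates what a prior predictive simulation is. It assumes a hypothetical no-intercept model y = b * x + error, with the Normal(0, 1) coefficient prior used as an example under the prior argument and an assumed exponential(1) prior on the residual standard deviation.

```r
set.seed(42)
n_draws <- 1000
x <- c(0.5, 1.0, 1.5)          # hypothetical covariate values for three areas

# Draw parameters from their priors only (no likelihood involved)
b     <- rnorm(n_draws, mean = 0, sd = 1)   # mirrors prior(normal(0, 1), class = "b")
sigma <- rexp(n_draws, rate = 1)            # assumed exponential(1) prior on the residual SD

# For each prior draw, simulate a replicated outcome for every area
y_rep <- sapply(x, function(xi) rnorm(n_draws, mean = b * xi, sd = sigma))
```

Comparing the spread of y_rep with the plausible range of the direct estimates is a quick way to judge whether the chosen priors are reasonable before fitting.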

...

Additional arguments

Details

Hierarchical Bayesian Small Area Models

Value

An hbmfit object containing:

model

Summary of brms object.

handle_missing

The missing-data handling option (handle_missing) used in the model.

data

Data passed to the hbm function.

Author(s)

Achmad Syahrul Choir, Saniyyah Sri Nurhayati, and Sofi Zamzanah

References

Rao, J. N. K., & Molina, I. (2015). Small Area Estimation. John Wiley & Sons.

Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1-28.

Examples



# Load the packages and the example dataset
library(hbsaems)
library(brms)  # provides bf()
data("data_fhnorm")

# Prepare the dataset
data <- data_fhnorm

# Fit the Basic Model
model <- hbm(
formula = bf(y ~ x1 + x2 + x3), # Formula model
hb_sampling = "gaussian", # Gaussian family for continuous outcomes
hb_link = "identity", # Identity link function (no transformation)
data = data, # Dataset
chains = 4, # Number of MCMC chains
iter = 4000, # Total MCMC iterations
warmup = 2000, # Number of warmup iterations
cores = 2 # Parallel processing
)
summary(model)

# Fit the Basic Model With Defined Random Effect
model_with_defined_re <- hbm(
formula = bf(y ~ x1 + x2 + x3), # Formula model
hb_sampling = "gaussian", # Gaussian family
hb_link = "identity", # Identity link
re = ~(1 | group), # Defined random effect
data = data,
chains = 4,
iter = 4000,
warmup = 2000,
cores = 2
)
summary(model_with_defined_re)

# Fit the Model with Missing Data
# a. Handling missing by deletion
data_miss <- data
data_miss$y[3:5] <- NA 
model_deleted <- hbm(
formula = bf(y ~ x1 + x2 + x3),
hb_sampling = "gaussian",
hb_link = "identity",
re = ~(1 | group),
data = data_miss,
handle_missing = "deleted",
chains = 4,
iter = 4000,
warmup = 2000,
cores = 2
)
summary(model_deleted)

# b. Handling missing using multiple imputation
model_multiple <- hbm(
formula = bf(y ~ x1 + x2 + x3),
hb_sampling = "gaussian",
hb_link = "identity",
re = ~(1 | group),
data = data_miss,
handle_missing = "multiple",
chains = 4,
iter = 4000,
warmup = 2000,
cores = 2
)
summary(model_multiple)

# c. Handling missing during modeling
data_miss$y[3:5] <- NA 
data_miss$x1[6:7] <- NA
model_model <- hbm(
formula = bf(y | mi() ~ mi(x1) + x2 + x3) +
bf(x1 | mi() ~ x2 + x3),
hb_sampling = "gaussian",
hb_link = "identity",
re = ~(1 | group),
data = data_miss,
handle_missing = "model",
chains = 4,
iter = 4000,
warmup = 2000,
cores = 2
)
summary(model_model)

# Fit the Model with Spatial Effect
# a. CAR (Conditional Autoregressive)
data("adjacency_matrix_car")
adjacency_matrix_car

model_spatial_car <- hbm(
formula = bf(y ~ x1 + x2 + x3 ), 
hb_sampling = "gaussian", 
hb_link = "identity", 
data = data, 
sre = "sre",
sre_type = "car",
M = adjacency_matrix_car,
chains = 4, 
iter = 4000, 
warmup = 2000, 
cores = 2 
)
summary(model_spatial_car)

# b. SAR (Simultaneous Autoregressive)
data("spatial_weight_sar")
spatial_weight_sar

model_spatial_sar <- hbm(
formula = bf(y ~ x1 + x2 + x3 ), 
hb_sampling = "gaussian", 
hb_link = "identity", 
data = data, 
sre_type = "sar",
M = spatial_weight_sar,
chains = 4, 
iter = 4000, 
warmup = 2000, 
cores = 2 
)
summary(model_spatial_sar)



hbsaems documentation built on Aug. 8, 2025, 7:28 p.m.