knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE )
The saeHB.Spatial.Beta package provides several functions to estimate small area proportions using Hierarchical Bayesian (HB) methods under spatial and non-spatial models. This package is specifically designed to accommodate survey design effects (DEFF) and handle non-sampled areas.
In this vignette, we will demonstrate a complete analytical workflow:
First, load the package along with the provided synthetic dataset (databeta). We will also load ggplot2 for visualization. After loading the data, we calculate the variance and the Relative Standard Error (RSE) of the direct estimates to serve as our baseline for comparison.
library(saeHB.Spatial.Beta) library(ggplot2) # Load data data("databeta") # Calculate Variance of Direct Estimator for proportion data considering DEFF # var(y) = [y * (1 - y) / n_i] * deff var_direct <- (databeta$y * (1 - databeta$y) / databeta$n_i) * databeta$deff # Calculate Relative Standard Error (RSE) of Direct Estimation databeta$rse_direct <- (sqrt(var_direct) / databeta$y) * 100
We begin by fitting a baseline non-spatial model that accommodates the survey design effect. Here, we use the default Markov Chain Monte Carlo (MCMC) iterations.
mod_ns_deff <- betadeff_nonspatial( formula = y ~ x1 + x2, deff = "deff", n_i = "n_i", data = databeta )
Extract the RSE for the non-spatial estimates:
rse_ns_deff <- (mod_ns_deff$est$Est.Error / mod_ns_deff$est$Estimate) * 100
To determine if a spatial model is necessary, we evaluate the spatial autocorrelation of the random effects ($v$) from the non-spatial model.
We use the Monte Carlo Permutation approach. This method is highly recommended because it computes the p-value empirically by shuffling the spatial units, making it robust. The Analytical approach (randomization) can be used as an alternative (mc = FALSE) for very large datasets where computational speed is a priority.
# Load the spatial weight matrix for the diagnostic test data("weight_mat") W_listw <- spdep::mat2listw(weight_mat, style = "W") # Extract the mean of the random effects (v) v_ns_deff <- as.numeric(mod_ns_deff$randeff$Estimate) # Monte Carlo Permutation Approach set.seed(123) moran_result <- moran_test(x = v_ns_deff, listw = W_listw, mc = TRUE, nsim = 999) print(moran_result)
A significant p-value confirms that the non-spatial model left unexplained spatial structure, heavily justifying the use of spatial models.
We will now fit two spatial models: the Simultaneous Autoregressive (SAR) model and the Leroux Conditional Autoregressive (CAR) model. The SAR model requires a row-standardized weight matrix, while the Leroux CAR model requires a binary adjacency matrix.
# Load the binary adjacency matrix for the Leroux CAR model data("adjacency_mat") # 1. Fit Spatial SAR Model mod_sar_deff <- betadeff_sar( formula = y ~ x1 + x2, deff = "deff", n_i = "n_i", proxmat = weight_mat, data = databeta ) # 2. Fit Spatial Leroux CAR Model mod_leroux_deff <- betadeff_lerouxcar( formula = y ~ x1 + x2, deff = "deff", n_i = "n_i", proxmat = adjacency_mat, data = databeta )
Extract the RSE for both spatial models:
rse_sar_deff <- (mod_sar_deff$est$Est.Error / mod_sar_deff$est$Estimate) * 100 rse_leroux_deff <- (mod_leroux_deff$est$Est.Error / mod_leroux_deff$est$Estimate) * 100
We compare the performance of the Direct Estimator, the Non-Spatial Model, and the Spatial Models. A lower RSE indicates a more reliable and precise estimate.
# Combine RSEs into a single data frame for plotting df_rse <- data.frame( Area = seq_along(databeta$y), Direct = databeta$rse_direct, Non_Spatial = rse_ns_deff, Spatial_SAR = rse_sar_deff, Spatial_Leroux = rse_leroux_deff ) # Order by Direct RSE for better visualization df_rse <- df_rse[order(df_rse$Direct), ] df_rse$Area_Index <- seq_len(nrow(df_rse)) # Plotting the RSE Comparison ggplot(df_rse, aes(x = Area_Index)) + # Direct Estimation geom_line(aes(y = Direct, color = "Direct"), linewidth = 0.8, alpha = 0.6) + geom_point(aes(y = Direct, color = "Direct"), size = 2, alpha = 0.6) + # Non-Spatial geom_line(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), linewidth = 0.8, alpha = 0.8) + geom_point(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), size = 2, alpha = 0.8) + # Spatial SAR geom_line(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), linewidth = 1) + geom_point(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), size = 2) + # Spatial Leroux CAR geom_line(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), linewidth = 1) + geom_point(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), size = 2) + scale_color_manual( name = "Estimator", values = c("Direct" = "#E69F00", "HB Beta Deff Non-Spatial" = "#56B4E9", "HB Beta Deff Spatial SAR" = "#009E73", "HB Beta Deff Spatial Leroux" = "#D55E00") ) + labs( title = "Comparison of Relative Standard Error (RSE)", subtitle = "Lower RSE indicates higher precision", x = "Area (Ordered by Direct RSE)", y = "RSE (%)" ) + theme_minimal() + theme(legend.position = "bottom")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.