Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR"
In surveySimR: Estimation of Population Total under Complex Sampling Design

title: "Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR" author: "Nobin Ch Paul" date: "r Sys.Date()" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8}

1. Introduction

In survey sampling, a population is a well-defined set of identifiable and observable units relevant to a survey’s objective. Information about its parameters can be obtained through a Census (complete enumeration) or by selecting a representative sample using sampling methods. Populations may be finite or infinite, and sampling is performed on defined units listed in a sampling frame.

Simple Random Sampling Without Replacement (SRSWOR) is a fundamental sampling design, where every subset of size ( n ) from a population of size ( N ) has equal chance of being selected, with no repeated selections.

Two primary paradigms in inference are:

Design-based: Population values are fixed; randomness comes from the sampling process. One example is the Horvitz-Thompson (HT) estimator
Model-based: Population values are random variables modeled using probability distributions.

In both, auxiliary information can improve estimation accuracy. Common estimators leveraging this include:

Ratio estimator
Regression estimator

This study uses Monte Carlo simulation to compare their performance in estimating the population total under SRSWOR sampling design.

2. Estimators and Their Formulas

Let:

( N ): Population size
( n ): Sample size
( y_i ): Value of study variable for unit ( i )
( x_i ): Value of auxiliary variable for unit ( i )
( \bar{y}_s, \bar{x}_s ): Sample means of ( y ) and ( x )
( \bar{Y}, \bar{X} ): Population means of ( y ) and ( x )
( X = \sum_{i=1}^N x_i ): Known total of auxiliary variable

2.1. Horvitz-Thompson Estimator

The Horvitz-Thompson (HT) estimator for a finite population total is given by:

[ \hat{Y}{HT} = \sum{i \in s} \frac{y_i}{\pi_i} ]

where

( s ) is the sample drawn from the population,
( y_i ) is the observed value of the variable of interest for unit ( i ),
( \pi_i ) is the inclusion probability of unit ( i ).

Under SRSWOR, formula can be written as

[ \hat{Y}_{HT} = N \cdot \bar{y}_s ]

This estimator is unbiased under SRSWOR.

2.2. Ratio Estimator

[ \hat{Y}_{R} = X \cdot \frac{\bar{y}_s}{\bar{x}_s} ]

Efficient when ( y ) and ( x ) are positively correlated and the relationship passes through the origin.

2.3. Regression Estimator

[ \hat{Y}_{reg} = N \left[\bar{y}_s + b(\bar{X} - \bar{x}_s)\right] ]

where

[ b = \frac{\sum (x_i - \bar{x}_s)(y_i - \bar{y}_s)}{\sum (x_i - \bar{x}_s)^2} ]

Performs well under linear relationship between ( x ) and ( y ).

3. Performance Metrics Used in Simulation

| Metric | Description | |---------------|-------------| | Est_Total | Average estimated total over all simulations | | RB | Relative Bias (%): ( \frac{\text{Mean Estimate} - \text{True Total}}{\text{True Total}} \times 100 ) | | RRMSE | Relative Root Mean Square Error (%): ( \frac{\sqrt{\text{MSE}}}{\text{True Total}} \times 100 ) | | Skewness | Asymmetry of estimator distribution | | Kurtosis | Peakedness of estimator distribution | | CR | Coverage Rate: Proportion of times the 95% CI includes the true total | | For_var | Theoretical (design-based) variance | | Emp_var | Empirical variance from simulation | | Est_var | Mean of estimated variances across simulations | | perc_CV | Coefficient of Variation: ( \frac{SD}{Mean} \times 100 ) |

4. Monte Carlo Simulation Framework

A population is synthetically generated with size ( N = 1000 ). An auxiliary variable ( x ) is drawn from a normal distribution, and the study variable ( y ) is linearly related to ( x ) with added noise.

A sample of size ( n = 100 ) is drawn without replacement from the population, and using this sample, an estimate finite population total of all estimators has been calculated. This procedure was repeated independently, say 1000 times (simulation number).Measures like percentage relative bias (%RB) and percentage relative root mean square error (%RRMSE) have been used to compare the performance of the estimators like HT, Ratio and Regression estimators.

5. Results Summary

knitr::opts_chunk$set(echo = TRUE)

Function: `survey_sim_est`

This function performs a simulation study to evaluate the performance of three survey estimators: Horvitz-Thompson (HT), Ratio, and Regression estimators. It generates statistics like Relative Bias (RB), Relative Root Mean Square Error (RRMSE), Coefficient of Variation (CV), Skewness, Kurtosis, and Coverage Probability over a given number of simulations.

Function Definition

survey_sim_est <- function(Y, X, n = 40, SimNo = 500, seed = 123) {
  if (length(Y) != length(X)) stop("Y and X must be of the same length.")

  set.seed(seed)

  N <- length(Y)
  Y.total <- sum(Y)
  X.total <- sum(X)
  True.Total <- rep(Y.total, SimNo)

  data <- data.frame(ID = 1:N, Y = Y, X = X)

  est1 <- est2 <- est3 <- var_est1 <- var_est2 <- var_est3 <- numeric(SimNo)

  for (h in 1:SimNo) {
    set.seed(h)
    sa <- sample(1:N, n, replace = FALSE)
    samp <- data[sa, ]
    di <- rep(N / n, n)

    y_samp <- samp$Y
    x_samp <- samp$X
    s2_y <- var(y_samp)
    s2_x <- var(x_samp)
    rho_hat <- cor(y_samp, x_samp)
    R_hat <- mean(y_samp) / mean(x_samp)

    # HT Estimator
    est1[h] <- sum(di * y_samp)
    var_est1[h] <- N^2 * ((1 / n) - (1 / N)) * s2_y

    # Ratio Estimator
    ratio1 <- di * y_samp
    ratio2 <- di * x_samp
    est2[h] <- (sum(ratio1) / sum(ratio2)) * X.total
    var_est2[h] <- N^2 * ((1 / n) - (1 / N)) * (s2_y + (R_hat^2 * s2_x) - (2 * R_hat * rho_hat * sqrt(s2_y * s2_x)))

    # Regression Estimator
    b_hat <- rho_hat * sqrt(s2_y) / sqrt(s2_x)
    est3[h] <- est1[h] + b_hat * (X.total - sum(ratio2))
    var_est3[h] <- N^2 * ((1 / n) - (1 / N)) * s2_y * (1 - rho_hat^2)
  }

  # Population-level parameters
  pop_s2_y <- var(Y)
  pop_s2_x <- var(X)
  R_pop <- mean(Y) / mean(X)
  rho_pop <- cor(Y, X)

  var_HT <- N^2 * ((1 / n) - (1 / N)) * pop_s2_y
  var_ratio <- N^2 * ((1 / n) - (1 / N)) * (pop_s2_y + R_pop^2 * pop_s2_x - 2 * R_pop * rho_pop * sqrt(pop_s2_y * pop_s2_x))
  var_reg <- N^2 * ((1 / n) - (1 / N)) * pop_s2_y * (1 - rho_pop^2)

  # Evaluation metrics
  RB <- 100 * colMeans(cbind(est1, est2, est3) - True.Total) / mean(True.Total)
  RRMSE <- 100 * sqrt(colMeans((cbind(est1, est2, est3) - True.Total)^2)) / mean(True.Total)
  CV <- 100 * apply(cbind(est1, est2, est3), 2, sd) / colMeans(cbind(est1, est2, est3))
  skew <- apply(cbind(est1, est2, est3), 2, moments::skewness)
  kurt <- apply(cbind(est1, est2, est3), 2, moments::kurtosis)
  emp_var <- apply(cbind(est1, est2, est3), 2, var)
  est_var <- c(mean(var_est1), mean(var_est2), mean(var_est3))
  coverage <- colMeans(abs(cbind(est1, est2, est3) - True.Total) <= 1.96 * sqrt(cbind(var_est1, var_est2, var_est3)))

  result <- data.frame(
    Est_Total = c(mean(est1), mean(est2), mean(est3)),
    RB = RB,
    RRMSE = RRMSE,
    Skewness = skew,
    Kurtosis = kurt,
    Coverage = coverage,
    PopVar = c(var_HT, var_ratio, var_reg),
    EmpVar = emp_var,
    EstVar = est_var,
    CV = CV
  )

  rownames(result) <- c("HT", "Ratio", "Regression")
  return(round(result, 3))
}

Example Usage

set.seed(123)
N <- 1000
X <- rnorm(N, mean = 50, sd = 10)
Y <- 3 + 1.5 * X + rnorm(N, mean = 0, sd = 10)

result <- survey_sim_est(Y = Y, X = X, n = 50, SimNo = 100)
print(result)

6. Interpretation of Results

The Horvitz-Thompson estimator is unbiased but may exhibit higher variance.
The Ratio estimator is more efficient when the relationship between ( y ) and ( x ) is strong and passes through origin.
The Regression estimator generally shows lowest RRMSE.

Coverage Rate near 95% indicates good interval performance, while low RB and RRMSE signify estimators unbiasedness and precision.

7. Conclusion

This simulation illustrates the benefits of using auxiliary information in improving the estimation of finite population totals. While the Horvitz-Thompson estimator offers unbiasedness under design-based framework, the ratio and regression estimators leverage the auxiliary variable to reduce variance significantly when assumptions are met.

*Launch the Shiny app
library(surveySimR)
surveySimR::run_survey_sim_app()
* Alternative way of launching shiny app
shiny::runApp(system.file("shiny", package = "surveySimR"))

References

Cochran, W. G. (1977). Sampling Techniques. 3rd Edition. Wiley.
Singh, D. and Chaudhary, F.S. (1986). Theory and Analysis of Sample Survey Designs. John Wiley & Sons, Inc.
Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S. and Asok, C. (1984). Sampling Theory of Surveys with Applications. Iowa State University Press and Indian Society of Agricultural Statistics.
Särndal, C.-E., Swensson, B., & Wretman, J. (1992). Model Assisted Survey Sampling. Springer.

Any scripts or data that you put into this service are public.

surveySimR documentation built on June 8, 2025, 10:37 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

surveySimR
Estimation of Population Total under Complex Sampling Design

Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR"
In surveySimR: Estimation of Population Total under Complex Sampling Design

1. Introduction

2. Estimators and Their Formulas

2.1. Horvitz-Thompson Estimator

2.2. Ratio Estimator

2.3. Regression Estimator

3. Performance Metrics Used in Simulation

4. Monte Carlo Simulation Framework

5. Results Summary

Function: `survey_sim_est`

Function Definition

Example Usage

6. Interpretation of Results

7. Conclusion

References

Try the surveySimR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

surveySimR Estimation of Population Total under Complex Sampling Design

Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR" In surveySimR: Estimation of Population Total under Complex Sampling Design

1. Introduction

2. Estimators and Their Formulas

2.1. Horvitz-Thompson Estimator

2.2. Ratio Estimator

2.3. Regression Estimator

3. Performance Metrics Used in Simulation

4. Monte Carlo Simulation Framework

5. Results Summary

Function: survey_sim_est

Function Definition

Example Usage

6. Interpretation of Results

7. Conclusion

References

Try the surveySimR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

surveySimR
Estimation of Population Total under Complex Sampling Design

Estimation of Finite Population Total under Complex Sampling Design viz. SRSWOR"
In surveySimR: Estimation of Population Total under Complex Sampling Design

Function: `survey_sim_est`