epi.ssclus1estb: Sample size to estimate a binary outcome using one-stage...
In epiR: Tools for the Analysis of Epidemiological Data

epi.ssclus1estb

R Documentation

Sample size to estimate a binary outcome using one-stage cluster sampling

Description

Sample size to estimate a binary outcome using one-stage cluster sampling.

Usage

epi.ssclus1estb(N.psu = NA, b, Py, epsilon, error = "relative", 
   rho, nfractional = FALSE, conf.level = 0.95)

Arguments

`N.psu`	scalar integer, the total number of primary sampling units eligible for inclusion in the study. If `N = NA` the eligible primary sampling unit population size is assumed to be infinite.
`b`	scalar integer or vector of length two, the number of individual listing units in each cluster to be sampled. See details, below.
`Py`	scalar number, an estimate of the unknown population proportion.
`epsilon`	scalar number, the maximum difference between the estimate and the unknown population value expressed in absolute or relative terms.
`error`	character string. Options are `absolute` for absolute error and `relative` for relative error.
`rho`	scalar number, the intracluster correlation coefficient.
`nfractional`	logical, return fractional sample size.
`conf.level`	scalar, defining the level of confidence in the computed result.

Details

In many situations it is common for sampling units to be aggregated into clusters. Typical examples include individuals within households, children within classes (within schools) and cows within herds. We use the term primary sampling unit (PSU) to refer to what gets sampled first (clusters) and secondary sampling unit (SSU) to refer to what gets sampled second (individual listing units within each cluster). In this documentation the terms primary sampling unit and cluster are used interchangeably. Similarly, the terms secondary sampling unit and individual listing units are used interchangeably.

b as a scalar integer represents the total number of individual listing units from each cluster to be sampled. If b is a vector of length two the first element represents the mean number of individual listing units to be sampled from each cluster and the second element represents the standard deviation of the number of individual listing units to be sampled from each cluster.

At least 25 PSUs (clusters) are recommended for one-stage cluster sampling designs. If less than 25 PSUs are returned by the function a warning is issued.

If a value is entered for N.psu a finite population correction factor (Cochran 1977) is applied to the calculated sample size estimate. That is:

n_{adj} = \frac{n}{1 + \left(\frac{n}{N}\right)}

Value

A list containing the following:

`n.psu`	the total number of primary sampling units (clusters) to be sampled for the specified level of confidence and relative error.
`n.ssu`	the total number of secondary sampling units to be sampled for the specified level of confidence and relative error.
`DEF`	the design effect.
`rho`	the intracluster correlation coefficient, as entered by the user.

References

Cochran WG (1977). Sampling Techniques. John Wiley and Sons, London.

Levy PS, Lemeshow S (1999). Sampling of Populations Methods and Applications. Wiley Series in Probability and Statistics, London, pp. 258.

Machin D, Campbell MJ, Tan SB, Tan SH (2018). Sample Sizes for Clinical, Laboratory ad Epidemiological Studies, Fourth Edition. Wiley Blackwell, London, pp. 195 - 214.

Examples

## EXAMPLE 1:
## An aid project has distributed cook stoves in a single province in a 
## resource-poor country. At the end of three years, the donors would like 
## to know what proportion of households are still using their donated  
## stove. A cross-sectional study is planned where villages in the province 
## will be sampled and all households (approximately 75 per village) will be 
## visited to determine whether or not the donated stove is still in use.
## A pilot study of the prevalence of stove usage in five villages 
## showed that 0.46 of householders were still using their stove. The 
## intracluster correlation for a study of this type is unknown, but thought
## to be relatively high (rho = 0.20).

# If the donor wanted to be 90% confident that the survey estimate of stove
## usage was within 10% of the true population value, how many villages 
## (i.e., clusters) would need to be sampled?

epi.ssclus1estb(N.psu = NA, b = 75, Py = 0.46, epsilon = 0.10, 
   error = "relative", rho = 0.20, nfractional = FALSE, conf.level = 0.90)

## A total of 67 villages need to be sampled to meet the specifications 
## of this study.

## Now imagine the situation where the number of households per village 
## varies. We are told that the average number of households per village is
## 75 with the 0.025 quartile 40 households and the 0.975 quartile 180 
## households. The expected standard deviation of the number of households
## per village is (180 - 40) / 4 = 35. How many villages need to be sampled?

epi.ssclus1estb(N.psu = NA, b = c(75,35), Py = 0.46, epsilon = 0.10, 
   error = "relative", rho = 0.20, nfractional = FALSE, conf.level = 0.90)

## A total of 81 villages need to be sampled to meet the specifications 
## of this study.

## Now imagine the situation where this study is to be carried out on a 
## remote island where the total number of villages is 220. Recalculate
## your sample size.

epi.ssclus1estb(N.psu = 220, b = c(75,35), Py = 0.46, epsilon = 0.10, 
   error = "relative", rho = 0.20, nfractional = FALSE, conf.level = 0.90)

## A total of 60 villages need to be sampled to meet the specifications 
## of this study.

epiR documentation built on July 17, 2026, 9:07 a.m.