stratify | R Documentation |
This function implements the "cumulative square root frequency method" (Dalenius & Hodges, 1959) for determining the approximately optimal stratification of elements for stratified random sampling with Neyman allocation.
stratify(x, strata, breaks)
x | An auxiliary variable to be used for stratification.
strata | Number of strata.
breaks | Breaks for the auxiliary variable expressed as a vector of cut points.
See Dalenius and Hodges (1959) or Cochran (1977) for details. Ideally the auxiliary variable should be strongly correlated with the target variable.
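The rule can be illustrated with a short sketch. This is not the source of stratify(); the helper name cum_sqrt_f and the nearest-boundary tie handling are assumptions made for illustration, and it only accepts breaks given as a vector of cut points. It tabulates the auxiliary variable into classes, accumulates the square roots of the class frequencies, and places the stratum boundaries at the class limits closest to equally spaced points on that cumulative scale.

```r
# Hypothetical helper illustrating the cumulative square root frequency rule.
# It is NOT the package's stratify(); names and tie handling are assumptions.
cum_sqrt_f <- function(x, strata, breaks) {
  # Frequencies of x within the classes defined by the cut points
  classes <- cut(x, breaks = breaks, include.lowest = TRUE)
  f <- as.vector(table(classes))
  # Cumulative sum of the square roots of the class frequencies
  csf <- cumsum(sqrt(f))
  # Equally spaced targets on the cumulative sqrt(f) scale
  targets <- csf[length(csf)] * seq_len(strata - 1) / strata
  # For each target, find the class whose cumulative value is closest
  idx <- vapply(targets,
                function(target) which.min(abs(csf - target)),
                integer(1))
  # The upper limit of class i is breaks[i + 1]
  breaks[idx + 1]
}

# Same grouped data as the first example in this help page
x <- rep(seq(2.5, 97.5, by = 5), c(3464, 2516, 2157, 1581, 1142,
  746, 512, 376, 265, 207, 126, 107, 82, 50, 39, 25, 16, 19, 2, 3))
cuts <- cum_sqrt_f(x, strata = 5, breaks = seq(0, 100, by = 5))
print(cuts)  # 5 15 25 45
```

The sketch returns only the interior cut points. Assigning elements to strata could then be done with cut(x, breaks = c(min(breaks), cuts, max(breaks)), include.lowest = TRUE), again as an assumption about how one might use the result rather than a description of what stratify() returns.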
A list object including a data frame giving the strata assignment of the elements and the cut points that define the strata in terms of the auxiliary variable.
Cochran, W. G. (1977). Sampling techniques (3rd Edition). New York: Wiley.
Dalenius, T. & Hodges, J. L. Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.
# replication of an example from Cochran (1977)
x <- rep(seq(2.5, 97.5, by = 5), c(3464, 2516, 2157, 1581, 1142,
746, 512, 376, 265, 207, 126, 107, 82, 50, 39, 25, 16, 19, 2, 3))
stratify(x, strata = 5, breaks = seq(0, 100, by = 5))
# artificial data with a normally-distributed auxiliary variable
set.seed(101)
x <- rnorm(10000, 20, 3)
stratify(x, strata = 4, breaks = 25)