# strata.rule: Non-Iterative Methods of Strata Construction In stratification: Univariate Stratification of Survey Populations

## Description

These functions first determine boundaries to stratify a population. Then, in a second independent step, the sample sizes are calculated given a target CV, or the CV is computed given the total sample size. The function `strata.cumrootf` uses the cumulative root frequency method by Dalenius and Hodges (1959) and `strata.geo` uses the geometric method by Gunning and Horgan (2004). A model can be specified for the relationship between the stratification variable X and the survey variable Y, but this model has no impact on the first step of boundary determination. It only influences the calculation of n or of the CV, through the use of anticipated means and variances of Y instead of the empirical means and variances of X.
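To make the first step concrete for the geometric method, here is a minimal base-R sketch of boundaries in geometric progression between the smallest and largest values of X, following the rule described by Gunning and Horgan (2004). The helper `geo_boundaries` is hypothetical and is not the package's implementation; the convention that a unit equal to a boundary goes to the upper stratum is also an assumption made for illustration.

```r
# Hypothetical helper, not part of the stratification package:
# geometric stratum boundaries for a strictly positive variable x.
geo_boundaries <- function(x, Ls) {
  stopifnot(all(x > 0), Ls >= 2)
  a <- min(x)
  r <- (max(x) / a)^(1 / Ls)  # common ratio of the progression
  a * r^(1:(Ls - 1))          # the Ls - 1 interior boundaries
}

x  <- c(1, 2, 4, 8, 16, 32, 64, 128, 256)
bh <- geo_boundaries(x, Ls = 4)
bh

# Assumed convention for illustration: stratum h collects the units
# with b[h-1] <= x < b[h].
stratum <- findInterval(x, bh) + 1
table(stratum)
```

Because the boundaries depend only on `min(x)`, `max(x)` and `Ls`, nothing in this step involves the model, the allocation or the response rates.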

## Usage

```r
strata.cumrootf(x, n = NULL, CV = NULL, Ls = 3, certain = NULL,
       alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls),
       model = c("none", "loglinear", "linear", "random"),
       model.control = list(), nclass = NULL)

strata.geo(x, n = NULL, CV = NULL, Ls = 3, certain = NULL,
       alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls),
       model = c("none", "loglinear", "linear", "random"),
       model.control = list())
```

## Arguments

| Argument | Description |
|---|---|
| `x` | A vector containing the values of the stratification variable X for every unit in the population. |
| `n` | A numeric: the target sample size. It has no default value. Either `n` or `CV` must be supplied. |
| `CV` | A numeric: the target coefficient of variation. It has no default value. Either `CV` or `n` must be supplied. |
| `Ls` | A numeric: the number of sampled strata (take-none and certain strata are not counted in `Ls`, but here no take-none stratum can be added to the stratified design, so `Ls` is in fact always equal to L). The default is 3. |
| `certain` | A vector giving the position, in the vector `x`, of the units that must be included in the sample (see `stratification-package`). By default `certain` is `NULL`, which means that no units are chosen a priori to be in the sample. |
| `alloc` | A list specifying the allocation scheme. The list must contain 3 numerics for the 3 exponents `q1`, `q2` and `q3` in the general allocation scheme (see `stratification-package`). The default is Neyman allocation (`q1`=`q3`=0.5 and `q2`=0). |
| `rh` | A vector giving the anticipated response rates in each of the `Ls` sampled strata. A single number can be given if the rates do not vary among strata. The default is 1 in each stratum. |
| `model` | A character string identifying the model used to describe the discrepancy between the stratification variable X and the survey variable Y. It can be `"none"` if one assumes Y=X, `"loglinear"` for the loglinear model with mortality, `"linear"` for the heteroscedastic linear model or `"random"` for the random replacement model (see `stratification-package` for a description of these models). The default is `"none"`. |
| `model.control` | A list of model parameters (see `stratification-package`). The default values of the parameters correspond to the model Y=X. |
| `nclass` | A numeric for the cumulative root frequency method only: the number of classes (Dalenius and Hodges 1959). The default (see Details) is `min(Ls*15, Nu)` where `Nu` is the number of unique values in the `x`-vector from which units in the certainty stratum, if any, have been removed. |
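As an illustration of how the three exponents in `alloc` act, the sketch below assumes that the general allocation scheme referenced in `stratification-package` makes the stratum sample sizes proportional to `Nh^(2*q1) * meanh^(2*q2) * sdh^(2*q3)`. That form is not spelled out on this page, so treat it as an assumption. The helper `alloc_props` and the stratum figures are hypothetical.

```r
# Hypothetical helper, not part of the stratification package.
# Assumed form of the general allocation scheme:
#   a_h proportional to Nh^(2*q1) * meanh^(2*q2) * sdh^(2*q3)
alloc_props <- function(Nh, meanh, sdh, q1, q2, q3) {
  gamma <- Nh^(2 * q1) * meanh^(2 * q2) * sdh^(2 * q3)
  gamma / sum(gamma)
}

# Made-up stratum sizes, means and standard deviations
Nh    <- c(500, 300, 200)
meanh <- c(10, 50, 200)
sdh   <- c(2, 10, 60)

# q1 = q3 = 0.5, q2 = 0: proportional to Nh * sdh (Neyman allocation)
neyman <- alloc_props(Nh, meanh, sdh, q1 = 0.5, q2 = 0, q3 = 0.5)

# q1 = 0.5, q2 = q3 = 0: proportional to Nh (proportional allocation)
proportional <- alloc_props(Nh, meanh, sdh, q1 = 0.5, q2 = 0, q3 = 0)

rbind(neyman, proportional)
```

Under the same assumption, the setting `q1 = q2 = 0.35`, `q3 = 0` used in the examples would allocate proportionally to the stratum totals of Y raised to the power 0.7.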

## Details

The efficiency of the cumulative root frequency method depends on the number of classes `nclass` (see Dalenius and Hodges (1959) for a description of these classes). However, there is no theory about how to choose the best value for `nclass` (Hedlin 2000). This is a limitation of the method.
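The role of `nclass` is easier to see in a small base-R sketch of the cumulative root frequency rule: group X into `nclass` equal-width classes, accumulate the square roots of the class frequencies, and cut that cumulative scale into `Ls` equal parts. The helper `cumrootf_boundaries` is hypothetical, and details such as how a cut point is matched to a class limit are assumptions; the package may handle them differently.

```r
# Hypothetical base-R sketch of the cum sqrt(f) rule; not the package code.
cumrootf_boundaries <- function(x, Ls, nclass) {
  # Equal-width classes over the range of x
  breaks <- seq(min(x), max(x), length.out = nclass + 1)
  f <- tabulate(findInterval(x, breaks, all.inside = TRUE), nbins = nclass)
  csf <- cumsum(sqrt(f))                      # cumulative root frequency
  targets <- csf[nclass] * (1:(Ls - 1)) / Ls  # equal cuts on the cum sqrt(f) scale
  # Assumption: use the upper limit of the class whose cumulative value
  # is closest to each target
  idx <- sapply(targets, function(t) which.min(abs(csf - t)))
  breaks[idx + 1]
}

set.seed(1)
x <- rlnorm(2000, meanlog = 3, sdlog = 1)  # simulated skewed population

b30  <- cumrootf_boundaries(x, Ls = 3, nclass = 30)
b300 <- cumrootf_boundaries(x, Ls = 3, nclass = 300)
rbind(b30, b300)
```

Since boundaries can only fall on class limits, a different `nclass` generally moves them, which in turn changes the sample size needed to reach a target CV.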

## Value

| Value | Description |
|---|---|
| `bh` | A vector of the L-1 stratum boundaries proposed by the method. |
| `nclassh` | A vector for the cumulative root frequency method only: the number of classes in each stratum (Dalenius and Hodges 1959). |
| `Nh` | A vector of length L containing the population sizes Nh, i.e. the number of units in each stratum. |
| `nh` | A vector of length L containing the sample sizes nh, i.e. the number of units to sample in each stratum. See `stratification-package` for information about the rounding used to get these integer values. |
| `n` | The total sample size (`sum(nh)`). |
| `nhnonint` | A vector of length L containing the non-integer values of the sample sizes, obtained directly from applying the allocation rule (see `stratification-package`). |
| `certain.info` | A vector giving statistics for the certainty stratum (see `stratification-package`). It contains `Nc`, the number of units chosen a priori to be in the sample, and `meanc`, the anticipated mean of Y for these units. |
| `opti.nh` | The final value of the criterion to optimize (either the total sample size n if a target `CV` was given or the RRMSE if a target `n` was given) calculated with the integer stratum sample sizes `nh`. |
| `opti.nhnonint` | The final value of the criterion to optimize (either the total sample size n if a target `CV` was given or the RRMSE if a target `n` was given) calculated with the non-integer stratum sample sizes `nhnonint`. |
| `meanh` | A vector of length L containing the anticipated means of Y in each stratum. |
| `varh` | A vector of length L containing the anticipated variances of Y in each stratum. |
| `mean` | A numeric: the anticipated global mean value of Y. |
| `stderr` | A numeric: the standard error of the anticipated global mean of Y. |
| `CV` | The anticipated coefficient of variation for the mean of Y, i.e. `stderr` divided by `mean`. |
| `stratumID` | A factor, having the same length as the input `x`, whose values are either 1, 2, ..., L or `"certain"`. The value `"certain"` is given to units a priori chosen to be in the sample. This factor identifies, for each observation, the stratum to which it has been assigned. |
| `takeall` | The number of take-all strata in the final solution. Note: It is possible that n_h=N_h for non take-all strata because the condition for an automatic addition of a take-all stratum is n_h>N_h. |
| `call` | The function call (object of class "call"). |
| `date` | A character string that contains the system date and time when the function ended. |
| `args` | A list of all the argument values input to the function or set by default. |
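To show how `Nh`, `nh`, `meanh`, `varh`, `mean`, `stderr` and `CV` relate, the sketch below applies the textbook variance formula for a stratified mean under simple random sampling without replacement in each stratum. It assumes the simplest setting (Y=X, full response, no certainty stratum), and the helper `strat_cv` is hypothetical; the package's own computation also covers models, response rates and certainty units, which this sketch ignores.

```r
# Hypothetical helper: anticipated CV of the stratified mean, assuming
# Y = X, rh = 1 in every stratum and no certainty stratum.
strat_cv <- function(Nh, nh, meanh, varh) {
  N <- sum(Nh)
  mean_y <- sum(Nh * meanh) / N                           # global mean
  var_mean <- sum((Nh / N)^2 * (1 / nh - 1 / Nh) * varh)  # variance of the mean
  sqrt(var_mean) / mean_y                                 # stderr / mean
}

# Made-up two-stratum design; the second stratum is fully sampled
# (nh = Nh), so it contributes nothing to the variance.
cv <- strat_cv(Nh = c(100, 100), nh = c(10, 100),
               meanh = c(10, 30), varh = c(4, 9))
cv
```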

## Author(s)

Sophie Baillargeon Sophie.Baillargeon@mat.ulaval.ca and
Louis-Paul Rivest Louis-Paul.Rivest@mat.ulaval.ca

## References

Baillargeon, S. and Rivest, L.-P. (2011). The construction of stratified designs in R with the package stratification. Survey Methodology, 37(1), 53-65.

Dalenius, T. and Hodges, J.L., Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.

Gunning, P. and Horgan, J.M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology, 30(2), 159-166.

Hedlin, D. (2000). A procedure for stratification by an extended Ekman rule. Journal of Official Statistics, 16, 15-29.

## See Also

`print.strata`, `plot.strata`, `strata.LH`

## Examples

```r
library(stratification)  # provides the MRTS and Sweden data sets

### Example for strata.cumrootf
res <- matrix(NA, nrow=20, ncol=2)
i <- 1
for ( n in seq(100,2000,100)){
  cum <- strata.cumrootf(x=MRTS, CV=0.01, Ls=4, alloc=c(0.5,0,0.5), nclass=n)
  res[i,] <- c(n,cum$n)
  i <- i + 1
}
plot(res, ylab="suggested sample size n", xlab="number of classes", main=expression(
     paste("Example of the effect of nclass on n for the cum",sqrt(f)," method")))

### Example for strata.geo
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="none")
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="loglinear",
           model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), rh=0.85,
           model="loglinear", model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
# When non-response or a model is added, the stratum boundaries do not change,
# only the nh's do.

### Example of how a certainty stratum can be useful with these methods
strata.cumrootf(x=Sweden$REV84, CV=0.05, Ls=4, alloc=c(0.35,0.35,0),
                model="none", nclass=50)
strata.cumrootf(x=sort(Sweden$REV84), CV=0.05, Ls=4, alloc=c(0.35,0.35,0),
                certain=282:284, model="none", nclass=50)
# The certainty stratum is used here to ensure that the three large units in the
# Sweden$REV84 population are in the sample, since no take-all stratum can be forced
# in the stratified design with the cumulative root frequency or geometric method.
# We see that this reduces the suggested sample size n by more than half
# (47 vs 19). This example was presented in Baillargeon and Rivest (2011).
```