strata.rule | R Documentation |
These functions first determine boundaries to stratify a population. Then, in a second, independent step, they either calculate the sample sizes needed to reach a target CV or compute the CV obtained with a given total sample size. The function strata.cumrootf uses the cumulative root frequency method of Dalenius and Hodges (1959), and strata.geo uses the geometric method of Gunning and Horgan (2004). A model can be specified for the relationship between the stratification variable X and the survey variable Y, but this model has no impact on the first step, the boundary determination. It only influences the calculation of n or of the CV, through the use of anticipated means and variances of Y instead of the empirical means and variances of X.
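As an illustration of the first step for the geometric method, here is a minimal base R sketch. It assumes that the boundaries follow a geometric progression between the minimum and the maximum of X, as in Gunning and Horgan (2004), and that all values of X are strictly positive. The helper name geo_boundaries is hypothetical and is not part of the package; strata.geo may treat edge cases differently.

```r
# Hypothetical helper, written for illustration only (not the package's code).
geo_boundaries <- function(x, Ls) {
  stopifnot(is.numeric(x), all(x > 0), Ls >= 2)
  a <- min(x)
  r <- (max(x) / a)^(1 / Ls)  # common ratio of the geometric progression
  a * r^(1:(Ls - 1))          # the Ls - 1 interior stratum boundaries
}

# Toy population made up for this sketch: min = 1, max = 256, so r = 4.
x <- c(1, 2, 4, 8, 16, 32, 64, 128, 256)
bh <- geo_boundaries(x, Ls = 4)
bh  # boundaries at 4, 16 and 64, up to floating-point rounding
```

The boundaries depend only on the extremes of X and on Ls, which matches the point above that neither the model nor the allocation affects this first step.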
strata.cumrootf(x, n = NULL, CV = NULL, Ls = 3, certain = NULL, alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls), model = c("none", "loglinear", "linear", "random"), model.control = list(), nclass = NULL)
strata.geo(x, n = NULL, CV = NULL, Ls = 3, certain = NULL, alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls), model = c("none", "loglinear", "linear", "random"), model.control = list())
x |
A vector containing the values of the stratification variable X for every unit in the population. |
n |
A numeric: the target sample size. It has no default value. The argument |
CV |
A numeric: the target coefficient of variation. It has no default value. The argument |
Ls |
A numeric: the number of sampled strata (take-none and certain strata are not counted in |
certain |
A vector giving the position, in the vector |
alloc |
A list specifying the allocation scheme. The list must contain 3 numerics for the 3 exponents |
rh |
A vector giving the anticipated response rates in each of the |
model |
A character string identifying the model used to describe the discrepancy between the stratification variable X and the survey variable Y. It can be |
model.control |
A list of model parameters (see |
nclass |
A numeric for the cumulative root frequency method only: the number of classes (Dalenius and Hodges 1959). The default (see Details) is |
The efficiency of the cumulative root frequency method depends on the number of classes nclass (see Dalenius and Hodges (1959) for a description of these classes). However, there is no theory on how to choose the best value for nclass (Hedlin 2000). This is a limitation of the method.
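To make the role of nclass concrete, the following base R sketch implements a simplified version of the cumulative root frequency rule. It assumes classes of equal width over the range of X and places each boundary at the upper limit of the class whose cumulative square-root frequency is closest to an equally spaced target. The helper cumrootf_boundaries and the simulated population are hypothetical; the implementation in strata.cumrootf may differ in its details.

```r
# Hypothetical helper, written for illustration only (not the package's code).
cumrootf_boundaries <- function(x, Ls, nclass) {
  # Split the range of x into nclass classes of equal width.
  breaks <- seq(min(x), max(x), length.out = nclass + 1)
  cls <- findInterval(x, breaks, rightmost.closed = TRUE)
  freq <- tabulate(cls, nbins = nclass)
  # Cumulate the square roots of the class frequencies.
  cumroot <- cumsum(sqrt(freq))
  # Equally spaced targets on the cumulative scale, one per boundary.
  targets <- cumroot[nclass] * (1:(Ls - 1)) / Ls
  idx <- vapply(targets, function(t) which.min(abs(cumroot - t)), integer(1))
  breaks[idx + 1]  # upper limits of the selected classes
}

# Simulated skewed population, made up for this sketch.
set.seed(1)
x <- rlnorm(2000)
# One column of boundaries per value of nclass.
sapply(c(20, 50, 200), function(k) cumrootf_boundaries(x, Ls = 3, nclass = k))
```

Because boundaries can only fall on class limits, changing nclass moves them, and with very few classes two boundaries may even coincide.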
bh |
A vector of the L-1 stratum boundaries proposed by the method. |
nclassh |
A vector for the cumulative root frequency method only: the number of classes in each stratum (Dalenius and Hodges 1959). |
Nh |
A vector of length L containing the population sizes Nh, i.e. the number of units in each stratum. |
nh |
A vector of length L containing the sample sizes nh, i.e. the number of units to sample in each stratum. See |
n |
The total sample size ( |
nhnonint |
A vector of length L containing the non-integer values of the sample sizes, obtained directly from applying the allocation rule (see |
certain.info |
A vector giving statistics for the certainty stratum (see |
opti.nh |
The final value of the criterion to optimize (either the total sample size n if a target
opti.nhnonint |
The final value of the criterion to optimize (either the total sample size n if a target
meanh |
A vector of length L containing the anticipated means of Y in each stratum. |
varh |
A vector of length L containing the anticipated variances of Y in each stratum. |
mean |
A numeric: the anticipated global mean value of Y. |
stderr |
A numeric: the standard error of the anticipated global mean of Y. |
CV |
The anticipated coefficient of variation for the mean of Y, i.e. |
stratumID |
A factor, having the same length as the input |
takeall |
The number of take-all strata in the final solution. Note: It is possible that n_h=N_h for non take-all strata because the condition for an automatic addition of a take-all stratum is n_h>N_h. |
call |
The function call (object of class "call"). |
date |
A character string that contains the system date and time when the function ended. |
args |
A list of all the argument values input to the function or set by default. |
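The relationship between Nh, nh, meanh, varh, mean, stderr and CV can be sketched in base R. The sketch uses the textbook variance of a stratified mean under simple random sampling without replacement within strata, assumes full response (rh = 1) and no certainty stratum, and takes the CV to be the standard error divided by the mean. All numbers are invented for illustration, and the package's own computation may differ.

```r
# Hypothetical strata summaries, invented for illustration.
Nh    <- c(500, 300, 200)    # population sizes
nh    <- c(20, 30, 50)       # sample sizes
meanh <- c(10, 50, 200)      # anticipated stratum means of Y
varh  <- c(16, 400, 10000)   # anticipated stratum variances of Y

N <- sum(Nh)
mean_y <- sum(Nh * meanh) / N
# Variance of the stratified mean, with finite population correction.
var_mean <- sum((Nh / N)^2 * (1 - nh / Nh) * varh / nh)
stderr_y <- sqrt(var_mean)
cv <- stderr_y / mean_y
c(mean = mean_y, stderr = stderr_y, CV = cv)
```

When a target CV is supplied instead of n, the second step goes the other way: it looks for sample sizes nh that bring this quantity down to the target.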
Sophie Baillargeon Sophie.Baillargeon@mat.ulaval.ca and
Louis-Paul Rivest Louis-Paul.Rivest@mat.ulaval.ca
Baillargeon, S. and Rivest, L.-P. (2011). The construction of stratified designs in R with the package stratification. Survey Methodology, 37(1), 53-65.
Dalenius, T. and Hodges, J.L., Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.
Gunning, P. and Horgan, J.M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology, 30(2), 159-166.
Hedlin, D. (2000). A procedure for stratification by an extended Ekman rule. Journal of Official Statistics, 16(1), 15-29.
print.strata, plot.strata, strata.LH
library(stratification)  # provides strata.cumrootf, strata.geo, MRTS and Sweden

### Example for strata.cumrootf
# Suggested sample size n for nclass = 100, 200, ..., 2000
res <- matrix(NA, nrow=20, ncol=2)
i <- 1
for (n in seq(100,2000,100)) {
  cum <- strata.cumrootf(x=MRTS, CV=0.01, Ls=4, alloc=c(0.5,0,0.5), nclass=n)
  res[i,] <- c(n,cum$n)
  i <- i + 1
}
plot(res, ylab="suggested sample size n", xlab="number of classes",
     main=expression(paste("Example of the effect of nclass on n for the cum",sqrt(f)," method")))

### Example for strata.geo
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="none")
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="loglinear",
           model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), rh=0.85, model="loglinear",
           model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
# When non-response or a model is added, the stratum boundaries do not change,
# only the nh's do.

### Example of how a certainty stratum can be useful with these methods
strata.cumrootf(x=Sweden$REV84, CV=0.05, Ls=4, alloc=c(0.35,0.35,0), model="none", nclass=50)
strata.cumrootf(x=sort(Sweden$REV84), CV=0.05, Ls=4, alloc=c(0.35,0.35,0), certain=282:284,
                model="none", nclass=50)
# The certainty stratum is used here to ensure that the three large units in the
# Sweden$REV84 population are in the sample, since no take-all stratum can be forced
# in the stratified design with the cumulative root frequency or geometric method.
# This reduces the suggested sample size n by more than half (47 vs 19).
# This example was presented in Baillargeon and Rivest (2011).