StratSel: Stratification and selection


Description

Merges two data frames (the PSU population data frame and the allocation data frame), then stratifies the PSUs (Primary Sampling Units) and selects a fixed number of sample PSUs per stratum using Sampford's method (unequal probabilities, without replacement, fixed sample size), as implemented by the UPsampford function of the R package sampling. For the stratification, the function computes, for each estimation domain, a threshold on the PSU measure of size. PSUs whose measure of size exceeds this threshold are identified as SR (Self-Representative): each forms a stratum by itself and enters the sample with probability one. The remaining NSR (Non-Self-Representative) PSUs are ordered by measure of size and divided into strata of approximately constant size, close to the corrected threshold, with PSU sizes within each stratum as homogeneous as possible.
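The stratification rule above can be sketched in base R for a single domain. This is a minimal illustration under stated assumptions, not StratSel's code: the helper stratify_domain is hypothetical, the measure of size is treated as a count of final units, and the threshold lambda = min_sample * PSUsamplestratum / f (with f the domain sampling fraction) is an assumed rule, chosen because a stratum of total size lambda then yields about min_sample final units in each of its PSUsamplestratum sample PSUs. Assigning NSR PSUs to strata by the midpoint of their cumulative size is likewise an illustrative choice.

```r
# Hypothetical helper (NOT StratSel's implementation): SR/NSR split and
# NSR stratification for ONE domain.
# ASSUMPTION: lambda = min_sample * PSUsamplestratum / f, where
# f = domain sample size / domain total size.
stratify_domain <- function(size, min_sample, PSUsamplestratum, n_dom) {
  f <- n_dom / sum(size)                        # domain sampling fraction
  lambda <- min_sample * PSUsamplestratum / f   # size threshold
  sr <- size > lambda                           # SR: size exceeds threshold
  stratum <- integer(length(size))
  stratum[sr] <- seq_len(sum(sr))               # each SR PSU is its own stratum

  nsr <- which(!sr)
  if (length(nsr) > 0) {
    # Order NSR PSUs by decreasing measure of size
    ord <- nsr[order(size[nsr], decreasing = TRUE)]
    total <- sum(size[ord])
    n_str <- max(1, round(total / lambda))      # number of NSR strata
    lambda_corr <- total / n_str                # corrected threshold
    # Assign each PSU by the midpoint of its cumulative size, so that
    # each stratum's total size is close to lambda_corr
    mid <- cumsum(size[ord]) - size[ord] / 2
    s <- ceiling(mid / lambda_corr)
    s <- match(s, unique(s))                    # relabel, dropping empty strata
    stratum[ord] <- sum(sr) + s                 # NSR strata follow SR strata
  }
  data.frame(size = size, SR = sr, stratum = stratum)
}

# Usage with made-up sizes for one domain
res <- stratify_domain(size = c(5000, 1200, 900, 800, 700, 600, 500, 400, 300, 200),
                       min_sample = 8, PSUsamplestratum = 2, n_dom = 53)
print(res)
```

With these toy values the threshold is 3200, so the PSU of size 5000 becomes SR, and the nine NSR PSUs are split into two strata with total sizes 2900 and 2700.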
If the number of NSR PSUs in a stratum is greater than the number of sample PSUs per stratum, the vector of inclusion probabilities (pik), the argument of the UPsampford function, must be defined; otherwise, the NSR PSUs become SR PSUs. If some pik values are greater than 1, pik is recalculated for the PSUs belonging to the same domain and stratum as the PSUs with pik > 1, until no pik value is greater than 1; the UPsampford function is then applied.
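The inclusion-probability correction and the selection step can be sketched in base R. Both helpers are hypothetical and only illustrate the ideas: pik_capped follows the usual iterative capping approach (units whose pik exceeds 1 are set to 1 and the remaining sample size is redistributed proportionally among the other units), and sampford_select implements Sampford's rejective procedure (one draw with probabilities pik/n, then n - 1 draws with replacement with probabilities proportional to pik/(1 - pik), accepted only if all drawn units are distinct). StratSel itself relies on sampling::UPsampford, which this sketch does not reproduce.

```r
# Hypothetical helper: inclusion probabilities proportional to size,
# recomputed until none exceeds 1. Units capped at 1 are taken with
# certainty; the remaining n - k slots are shared among the other units.
pik_capped <- function(size, n) {
  pik <- n * size / sum(size)
  certain <- rep(FALSE, length(size))
  while (any(pik[!certain] > 1)) {
    certain <- certain | pik > 1
    pik[certain] <- 1
    rest <- !certain
    pik[rest] <- (n - sum(certain)) * size[rest] / sum(size[rest])
  }
  pik
}

# Hypothetical helper: Sampford's rejective selection, returning a 0/1 vector.
sampford_select <- function(pik) {
  s <- as.integer(pik >= 1)              # certainty units are always selected
  idx <- which(pik > 0 & pik < 1)
  p <- pik[idx]
  n <- round(sum(p))                     # remaining fixed sample size
  if (n == 0) return(s)
  if (n == 1) {                          # one draw with probabilities p
    s[idx[sample.int(length(idx), 1, prob = p)]] <- 1L
    return(s)
  }
  repeat {
    first <- sample.int(length(idx), 1, prob = p)          # prob p_i / n
    others <- sample.int(length(idx), n - 1, replace = TRUE,
                         prob = p / (1 - p))
    draw <- c(first, others)
    if (!anyDuplicated(draw)) break      # accept only distinct units
  }
  s[idx[draw]] <- 1L
  s
}

# Usage with made-up sizes: the first PSU gets pik > 1 and is capped
set.seed(42)
pik <- pik_capped(c(60, 10, 10, 10, 10), n = 2)
s <- sampford_select(pik)
s2 <- sampford_select(pik_capped(1:10, n = 4))
```

In the first call the raw pik of the largest unit is 1.2, so it is capped at 1 and the other four units each get 0.25; the selected sample always contains the certainty unit plus one other unit.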

Usage

StratSel(dataPop, idpsu, dom, final_pop, size, PSUsamplestratum, min_sample, 
		min_sample_index = FALSE, dataAll, domAll, f_sample,
		planned_min_sample = NULL, launch = TRUE)

Arguments

dataPop

PSU population data frame.

idpsu

Formula identifying the primary sampling unit (PSU) identifier.

dom

Formula identifying the domain variable.

final_pop

Formula identifying the second-stage (final) population units.

size

Formula identifying the measure of size of the first-stage units (PSUs).

PSUsamplestratum

Number of sample PSUs to select in each stratum.

min_sample

Number of final sample units to observe in each PSU.

min_sample_index

Logical value; FALSE (the default) indicates the absence of planned_min_sample, TRUE indicates that planned_min_sample is used instead of min_sample.

dataAll

Allocation data frame.

domAll

Formula identifying the domain variable for the allocation data frame.

f_sample

Formula identifying the final sample units.

planned_min_sample

Formula identifying the planned number of final sample units to observe in each PSU, which may vary across domains; NULL (the default) means that a fixed number, given by min_sample, is used.

launch

Logical value controlling how the procedure is launched. If TRUE (the default), the launch is partial (see 'Details').
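The formula arguments above name columns of the input data frames. One plausible way to resolve them, shown only as a hypothetical sketch (merge_by_domain is not part of the package, and the toy values are made up, reusing the column names of the Examples), is to extract the variable name with all.vars() and merge the population and allocation data frames on the domain variable with base R's merge().

```r
# Hypothetical helper: resolve one-sided formulas such as ~ dom to column
# names and merge population and allocation data by domain.
merge_by_domain <- function(dataPop, dom, dataAll, domAll) {
  merge(dataPop, dataAll, by.x = all.vars(dom), by.y = all.vars(domAll))
}

# Toy data (made-up values) using the column names from the Examples
population_toy <- data.frame(comune_num = 1:4, dom = c(1, 1, 2, 2),
                             fam = c(400, 250, 900, 300),
                             pop = c(1000, 600, 2100, 700))
allocation_toy <- data.frame(dom = c(1, 2), campdom = c(40, 60))

merged <- merge_by_domain(population_toy, ~ dom, allocation_toy, ~ dom)

# A size formula such as ~ pop can be resolved the same way
size_col <- merged[[all.vars(~ pop)]]
```

Each PSU row then carries its domain's allocated sample size alongside its own size variables.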

Details

The procedure can be launched in two separate steps: with launch = TRUE the user sees a first output and can then decide whether to modify the input parameters or to continue the procedure by setting launch = FALSE.

Value

An object of class list or data.frame, depending on the argument launch. If launch = TRUE, the only output is a domain-level data frame of the Self-Representative (SR) and Non-Self-Representative (NSR) PSUs before stratification. If launch = FALSE, the output is a list of four data frames. The first component is the same domain-level data frame returned when launch = TRUE. The second is a domain-level data frame of SR and NSR PSUs after stratification, with the same totals as the first output plus additional information such as the final-unit sample sizes, given separately for SR and NSR PSUs, and their mean. The third component gives, for each stratum, the number of sample PSUs selected and the total number of PSUs. The fourth provides, for each PSU, information such as the inclusion probability and the sampling fraction.

Author(s)

Raffaella Cianchetta

References

Falorsi, S. and Russo, A. (2001). Il disegno di rilevazione per indagini Panel sulle famiglie. Rivista di Statistica Ufficiale, N. 3, pp. 55-90.
Sampford, M. (1967). On sampling without replacement with unequal probabilities of selection. Biometrika, 54, 499-513.
Tillé, Y. and Matei, A. (2012). sampling: Survey Sampling. R package version 2.5. http://CRAN.R-project.org/package=sampling.

Examples

# Start StratSel
## Not run: 
library(FS4)
data(population)
data(allocation)

# The function with a fixed number of final sample units (min_sample= 8)
# to observe in each PSU and a partial launch of the procedure, with
# only one output data frame (see 'Value')
Output_list <- StratSel(dataPop= population, idpsu= ~ comune_num, dom= ~ dom,
               final_pop= ~ fam, size= ~ pop, PSUsamplestratum= 6, min_sample= 8,
               min_sample_index= FALSE, dataAll= allocation, domAll= ~ dom,
               f_sample= ~ campdom, planned_min_sample= NULL, launch= TRUE)

# Equivalently, relying on the default values:
Output_list <- StratSel(dataPop= population, idpsu= ~ comune_num, dom= ~ dom,
               final_pop= ~ fam, size= ~ pop, PSUsamplestratum= 6, min_sample= 8,
               dataAll= allocation, domAll= ~ dom, f_sample= ~ campdom)


# The function with a fixed number of final sample units (min_sample= 8)
# to observe in each PSU and a full launch of the procedure, with
# the output list composed of four data frames (see 'Value')
Output_list <- StratSel(dataPop= population, idpsu= ~ comune_num, dom= ~ dom,
               final_pop= ~ fam, size= ~ pop, PSUsamplestratum= 6, min_sample= 8,
               dataAll= allocation, domAll= ~ dom, f_sample= ~ campdom, launch= FALSE)


# The function with a variable number of final sample units (planned_min_sample=
# ~ planned_final_sample) and a partial launch of the procedure
Output_list <- StratSel(dataPop= population, idpsu= ~ comune_num, dom= ~ dom,
               final_pop= ~ fam, size= ~ pop, PSUsamplestratum= 6, min_sample= NULL,
               min_sample_index= TRUE, dataAll= allocation, domAll= ~ dom,
               f_sample= ~ campdom, planned_min_sample= ~ planned_final_sample)


# The function with a variable number of final sample units (planned_min_sample=
# ~ planned_final_sample) and a full launch of the procedure, with the output list
# composed of four data frames
Output_list <- StratSel(dataPop= population, idpsu= ~ comune_num, dom= ~ dom,
               final_pop= ~ fam, size= ~ pop, PSUsamplestratum= 6, min_sample= NULL,
               min_sample_index= TRUE, dataAll= allocation, domAll= ~ dom,
               f_sample= ~ campdom, planned_min_sample= ~ planned_final_sample,
               launch= FALSE)

## End(Not run)

barcaroli/R2BEATold documentation built on Jan. 2, 2021, 7:09 p.m.