strata.data: Stratification of Univariate Survey Population Using the Data

View source: R/strata.data.R

Description

This function takes the univariate population data (argument data) and a fixed total sample size (n) and computes, directly from the data, the optimum stratum boundaries (OSB) for a given number of strata (argument h), the optimum stratum sample sizes (nh), and related quantities. The approach follows Khan et al. (2008): the stratification problem is formulated as a Mathematical Programming Problem (MPP) using the best-fit frequency distribution and its parameters estimated from the data. This MPP is then solved for the OSB using a Dynamic Programming (DP) solution procedure.
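To make the dynamic programming idea concrete, here is a minimal base-R sketch. It is not the stratifyR algorithm: the package works with a fitted distribution, whereas this toy works on the sorted observations themselves. The objective (the sum of W_h * S_h over strata, which corresponds to Neyman allocation) is an assumption made for illustration, and toy_osb is a hypothetical helper name, not part of the package.

```r
# Toy dynamic program (illustration only, NOT the stratifyR implementation):
# split the sorted data into h contiguous strata so that sum_h W_h * S_h
# is as small as possible. W_h is the stratum weight, S_h the stratum
# (population) standard deviation.
toy_osb <- function(y, h) {
  y <- sort(y)
  N <- length(y)
  s1 <- c(0, cumsum(y))    # prefix sums of y
  s2 <- c(0, cumsum(y^2))  # prefix sums of y^2

  # W_h * S_h for a stratum made of sorted observations a..b
  seg_cost <- function(a, b) {
    m  <- b - a + 1
    ss <- (s2[b + 1] - s2[a]) - (s1[b + 1] - s1[a])^2 / m
    (m / N) * sqrt(max(ss, 0) / m)
  }

  # best[k, j]: minimum cost of splitting y[1..j] into k strata
  # last_cut[k, j]: index of the last observation in stratum k - 1
  best     <- matrix(Inf, nrow = h, ncol = N)
  last_cut <- matrix(NA_integer_, nrow = h, ncol = N)
  for (j in 1:N) best[1, j] <- seg_cost(1, j)
  if (h >= 2) {
    for (k in 2:h) {
      for (j in k:N) {
        for (i in (k - 1):(j - 1)) {
          val <- best[k - 1, i] + seg_cost(i + 1, j)
          if (val < best[k, j]) {
            best[k, j]     <- val
            last_cut[k, j] <- i
          }
        }
      }
    }
  }

  # Backtrack from the full data set to recover the cut positions
  idx <- integer(0)
  j <- N
  if (h >= 2) {
    for (k in h:2) {
      j   <- last_cut[k, j]
      idx <- c(j, idx)
    }
  }

  # Report each boundary as the midpoint between neighbouring observations
  list(boundaries = (y[idx] + y[idx + 1]) / 2, objective = best[h, N])
}

set.seed(1)
y <- rweibull(200, shape = 2, scale = 1.5)
toy_osb(y, h = 2)
```

The triple loop costs on the order of h * N^2 evaluations, so this sketch is only practical for small vectors; it is meant to show the recursion, not to replace strata.data.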

Usage

strata.data(data, h, n, cost = FALSE, ch = NULL)

Arguments

data

A vector of values of the survey variable y for which the OSB are determined.

h

A numeric: denotes the number of strata to be created.

n

A numeric: denotes a fixed total sample size.

cost

A logical: defaults to cost=FALSE. For a stratum-cost problem, set cost=TRUE, in which case the ch argument must also be provided.

ch

A numeric: denotes a vector of stratum costs. When cost=FALSE, it has a default of NULL.

Value

strata.data returns the Optimum Strata Boundaries (OSB), stratum weights (Wh), stratum variances (Vh), Optimum Sample Sizes (nh), stratum population sizes (Nh) and sampling fractions (fh).
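As a rough guide to what these quantities mean, the base-R sketch below computes them by hand for a single boundary chosen arbitrarily. It assumes Neyman allocation for nh and the ordinary sample variance for Vh; this page does not show how the package computes them internally, so treat the formulas as illustrative. The boundary value 1.4 is hypothetical, not an OSB.

```r
set.seed(1)
y <- rweibull(1000, shape = 2, scale = 1.5)
n <- 300          # fixed total sample size
boundary <- 1.4   # hypothetical boundary, chosen by hand for illustration

# Assign each unit to stratum 1 (y <= boundary) or stratum 2 (y > boundary)
stratum <- cut(y, breaks = c(-Inf, boundary, Inf), labels = FALSE)

Nh <- as.vector(table(stratum))           # stratum population sizes
Wh <- Nh / length(y)                      # stratum weights
Vh <- as.vector(tapply(y, stratum, var))  # stratum variances

# Neyman allocation (an assumption here): nh proportional to Wh * sqrt(Vh)
nh <- n * Wh * sqrt(Vh) / sum(Wh * sqrt(Vh))
fh <- nh / Nh                             # sampling fractions

data.frame(stratum = 1:2, Nh, Wh, Vh, nh = round(nh), fh = round(fh, 3))
```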

Author(s)

Karuna Reddy <karuna.reddy@usp.ac.fj>
MGM Khan <khan_mg@usp.ac.fj>

See Also

strata.distr

Examples

## Not run: 
data <- rweibull(1000, shape=2, scale = 1.5)
hist(data)
obj <- strata.data(data, h = 2, n=300)
summary(obj)
#-------------------------------------------------------------
data(anaemia)
Iron <- anaemia$Iron
res <- strata.data(Iron, h = 2, n=350)
summary(res)
#-------------------------------------------------------------
data(SHS) #Household Spending data from stratification package
weight <- SHS$WEIGHT
hist(weight); length(weight)
res <- strata.data(weight, h = 2, n=500)
summary(res)
#-------------------------------------------------------------
data(sugarcane)
Production <- sugarcane$Production
hist(Production)
res <- strata.data(Production, h = 2, n=1000)
summary(res)
#-------------------------------------------------------------
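#The function can be used dynamically to visualize the strata boundaries,
#for 2 strata, over the density (or observations) of the "mag" variable
#from the quakes data (requires the purrr and ggplot2 packages).
library(purrr)   # provides %>% and pluck()
library(ggplot2) # provides ggplot(), geom_density(), geom_vline()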
output <- quakes %>%
          pluck("mag") %>%
          strata.data(h = 2, n = 300)
quakes %>% 
      ggplot(aes(x = mag)) +
      geom_density(fill = "blue", colour = "black", alpha = 0.3) +
      geom_vline(xintercept = output$OSB, linetype = "dotted", color = "red")
#-------------------------------------------------------------

## End(Not run)

stratifyR documentation built on Dec. 11, 2021, 9:25 a.m.