Description Usage Arguments Note Author(s) Examples
The stratified function samples from a data.frame in which one of the columns can be used as a "stratification" or "grouping" variable. The result is a new data.frame with the specified number of samples from each group.
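To make the idea concrete, here is a minimal base-R sketch of per-group sampling. It is not the package's implementation of stratified: the helper name sample_per_group and the toy data are hypothetical, and it only handles a single grouping column with a fixed count per group.

```r
# Hypothetical helper, NOT the package's stratified():
# draw up to n rows from each level of one grouping column.
sample_per_group <- function(df, group, n) {
  # Split the row indices by the grouping column
  idx_by_group <- split(seq_len(nrow(df)), df[[group]])
  # From each group, draw at most n row indices without replacement
  picked <- lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), min(n, length(idx)))]
  })
  # Return the selected rows in their original order
  df[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(42)
toy <- data.frame(ID = 1:12, G = rep(c("a", "b", "c"), times = c(6, 4, 2)))
out <- sample_per_group(toy, "G", 3)
table(out$G)  # a: 3, b: 3, c: 2 (group "c" only has 2 rows)
```

Using sample.int on the group length, rather than sample on the indices themselves, avoids the surprising behaviour of sample when a group contains a single row.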
df | The source data.frame.
group | Your grouping variables. Generally, if you are using more than one variable to create your "strata", you should list them in the order of slowest varying to quickest varying. This can be a vector of names or column indexes.
size | The desired sample size.
select | A named list containing levels from the "group" variables in which you are interested. The list names must be present as variable names for the input data.frame.
replace | Logical. Should the sampling be done with replacement?
bothSets | Logical. Should just the samples be returned, or a
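As an illustration of what a named list passed to select expresses, the following base-R sketch keeps only the rows whose values match every entry of the list. The helper filter_by_levels and the toy data are hypothetical and are not part of the package. The sketch only assumes, as the argument description states, that the list names are column names of the input.

```r
# Hypothetical helper, NOT part of the package:
# keep rows whose values match every entry of a named list of levels.
filter_by_levels <- function(df, select) {
  # The list names must be column names of the input data.frame
  stopifnot(all(names(select) %in% names(df)))
  keep <- rep(TRUE, nrow(df))
  for (nm in names(select)) {
    # A row survives only if it matches one of the requested levels
    keep <- keep & df[[nm]] %in% select[[nm]]
  }
  df[keep, , drop = FALSE]
}

toy <- data.frame(E = c("M", "F", "M", "M"),
                  D = c("CA", "TX", "NY", "TX"))
res <- filter_by_levels(toy, list(E = "M", D = c("CA", "TX")))
res  # rows 1 and 4: E is "M" and D is "CA" or "TX"
```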
Slightly different sizes than requested
Because of how computers deal with floating-point arithmetic, and because R uses a "round to even" approach, the size per stratum that results when specifying a proportionate sample may be slightly higher or lower than you might have expected.
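The "round to even" behaviour is easy to see with base R's round(). The group sizes below are made up for illustration, and the 50% proportion is chosen so that every product is an exact half. How stratified itself derives per-group sizes internally is not shown here.

```r
# Made-up group sizes; a 50% proportion gives exact halves
group_sizes <- c(AA = 1, BB = 3, CC = 5, DD = 7)
exact <- group_sizes * 0.5   # 0.5 1.5 2.5 3.5
rounded <- round(exact)      # 0   2   2   4  (halves go to the even neighbour)
rounded
```

Here AA ends up with no rows and CC is rounded down while BB and DD are rounded up, which is the kind of per-group deviation the note describes.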
Ananda Mahto
# Generate a couple of sample data.frames to play with
set.seed(1)
dat1 <- data.frame(ID = 1:100,
                   A = sample(c("AA", "BB", "CC", "DD", "EE"), 100, replace = TRUE),
                   B = rnorm(100), C = abs(round(rnorm(100), digits = 1)),
                   D = sample(c("CA", "NY", "TX"), 100, replace = TRUE),
                   E = sample(c("M", "F"), 100, replace = TRUE))
dat2 <- data.frame(ID = 1:20,
                   A = c(rep("AA", 5), rep("BB", 10),
                         rep("CC", 3), rep("DD", 2)))
# What do the data look like in general?
summary(dat1)
summary(dat2)
# Let's take a 10% sample from all -A- groups in dat1
stratified(dat1, "A", .1)
# Let's take a 10% sample from only "AA" and "BB" groups from -A- in dat1
stratified(dat1, "A", .1, select = list(A = c("AA", "BB")))
# Let's take 5 samples from all -D- groups in dat1,
# specified by column number
stratified(dat1, group = 5, size = 5)
# Let's take a sample from all -A- groups in dat1,
# where we specify the number wanted from each group
stratified(dat1, "A", size = c(3, 5, 4, 5, 2))
# Use a two-column strata: -E- and -D-
# -E- varies more slowly, so it is better to put that first
stratified(dat1, c("E", "D"), size = .15)
# Use a two-column strata (-E- and -D-) but only interested in
# cases where -E- == "M"
stratified(dat1, c("E", "D"), .15, select = list(E = "M"))
## As above, but where -E- == "M" and -D- == "CA" or "TX"
stratified(dat1, c("E", "D"), .15,
           select = list(E = "M", D = c("CA", "TX")))
# Use a three-column strata: -E-, -D-, and -A-
s.out <- stratified(dat1, c("E", "D", "A"), size = 2)
list(head(s.out), tail(s.out))
# How many samples were taken from each stratum?
table(interaction(s.out[c("E", "D", "A")]))
# Can we verify the message about group sizes?
names(which(table(interaction(dat1[c("E", "D", "A")])) < 2))
names(which(table(interaction(s.out[c("E", "D", "A")])) < 2))