The stratified function samples from a data.table in which one or more columns can be used as a "stratification" or "grouping" variable. The result is a new data.table with the specified number of samples from each group.
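To make the idea concrete, here is a minimal base-R sketch of stratified sampling with one grouping column. The helper name `sample_per_group` and the toy data are hypothetical and exist only for illustration; this is not the implementation of stratified, which works on data.tables and accepts more options.

```r
# Hypothetical helper (not part of any package): draw `n` rows from each
# level of a single grouping column, without replacement.
sample_per_group <- function(df, group, n) {
  # Row indices of df, split by the values of the grouping column
  idx_by_group <- split(seq_len(nrow(df)), df[[group]])
  # Pick n row indices within each group
  picked <- lapply(idx_by_group, function(idx) idx[sample.int(length(idx), n)])
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

set.seed(42)
toy <- data.frame(id = 1:12, g = rep(c("a", "b", "c"), each = 4))
out <- sample_per_group(toy, "g", 2)
table(out$g)  # two rows from each of "a", "b", and "c"
```

Sampling row indices per group and subsetting once keeps the original columns intact, which is the same shape of result described above: a table containing the requested number of rows from every group.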
indt | The input
group | The column or columns that should be used to create the groups. Can be a character vector of column names (recommended) or a numeric vector of column positions. Generally, if you are using more than one variable to create your "strata", you should list them in the order of slowest varying to quickest varying.
size | The desired sample size.
select | A named list containing levels from the
replace | Logical. Should sampling be with replacement? Defaults to
keep.rownames | Logical. If the input is a
bothSets | Logical. Should both the sampled and non-sampled sets be returned as a
... | Optional arguments to
If bothSets = TRUE, a list of two data.tables; otherwise, a data.table.
Slightly different sizes than requested: Because of how computers handle floating-point arithmetic, and because R rounds halves to the nearest even number ("round to even"), the size per stratum that results when specifying a proportionate sample may be one sample higher or lower than you might have expected.
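The rounding behaviour is easy to see in isolation. The sketch below assumes, purely for illustration, that the per-stratum size is computed as round(group size * proportion); the group sizes are made up, and a proportion of 0.5 is used because it produces exact halves.

```r
# round() sends exact halves to the nearest even integer
round(c(0.5, 1.5, 2.5, 3.5))
# [1] 0 2 2 4

# Assumed rule (illustration only): per-stratum size = round(n * proportion)
group_sizes <- c(AA = 3, BB = 5, CC = 7)
per_group <- round(group_sizes * 0.5)
per_group
# AA BB CC
#  2  2  4
```

Group BB has more rows than AA yet receives the same number of samples, because 2.5 rounds down to 2 while 1.5 rounds up to 2.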
Ananda Mahto
sampling::strata() from the "sampling" package; dplyr::sample_n() and dplyr::sample_frac() from "dplyr".
# Generate a sample data.frame to play with
set.seed(1)
DF <- data.frame(
  ID = 1:100,
  A = sample(c("AA", "BB", "CC", "DD", "EE"), 100, replace = TRUE),
  B = rnorm(100), C = abs(round(rnorm(100), digits = 1)),
  D = sample(c("CA", "NY", "TX"), 100, replace = TRUE),
  E = sample(c("M", "F"), 100, replace = TRUE))
# Take a 10% sample from all -A- groups in DF
stratified(DF, "A", .1)
# Take a 10% sample from only "AA" and "BB" groups from -A- in DF
stratified(DF, "A", .1, select = list(A = c("AA", "BB")))
# Take 5 samples from all -D- groups in DF, specified by column number
stratified(DF, group = 5, size = 5)
# Use a two-column strata: -E- and -D-
stratified(DF, c("E", "D"), size = .15)
# Use a two-column strata (-E- and -D-) but only use cases where -E- == "M"
stratified(DF, c("E", "D"), .15, select = list(E = "M"))
## As above, but where -E- == "M" and -D- == "CA" or "TX"
stratified(DF, c("E", "D"), .15, select = list(E = "M", D = c("CA", "TX")))
# Use a three-column strata: -E-, -D-, and -A-
stratified(DF, c("E", "D", "A"), size = 2)
## Not run:
# The following will produce errors
stratified(DF, "D", c(5, 3))
stratified(DF, "D", c(5, 3, 2))
## End(Not run)
# Sizes using a named vector
stratified(DF, "D", c(CA = 5, NY = 3, TX = 2))
# Works with multiple groups as well
stratified(DF, c("D", "E"),
           c("NY F" = 2, "NY M" = 3, "TX F" = 1, "TX M" = 1,
             "CA F" = 5, "CA M" = 1))