Description Usage Arguments Details Value Examples
View source: R/sample_implicit.R
sample_implicit
draws a random sample of n units from a data.frame in a way that maximizes
variation on variables of interest. For example, it can randomly sample schools in a way that
ensures the sampled schools have as much variation as possible on key characteristics, like the
percent of students of color or average achievement. Implicit stratification is a common method
to sample units in an educational setting: the NCES frequently uses this approach when
deciding who to survey or test, including for
NCES.
1 | sample_implicit(data, n, ..., size_var = NULL, random_num = 1)
|
data |
is the data.frame on which rows will be sampled |
n |
is the number of rows to be sampled |
... |
are the variables on which to implicitly stratify. In effect, these are the variables on which the data is first sorted. The order in which the variables are listed matters: the first variable listed will have the most variability in the sampled data, so you should list the variables on which you want to stratify in order of decreasing importance, as the variables listed near the end won't have as large of an effect on the stratification. |
size_var |
is a variable indicating the size of the row. This allows you to select a sample that accounts for differences in the size of each unit. For example, if each row represents a school, an appropriate size_var could be the number of students attending the school so that schools serving more students are more likley to be selected. This is important when you are doing multiple stages of sampling, like first sampling schools and then sampling classrooms within schools. Without setting the size_va in this example, each shcool would be equally likely selected, meaning classrooms in small schools would be more likely to be selected because their small school with only a few classrooms has the same chance as being selected as a large school with many classrooms. Default is NULL. |
random_num |
is a random number to control the random sampling process so that results are reproducible. Default is 1. |
sample_implicit
implicitly samples units by first sorting the data on the key variables
indicated. It uses a serpentine sort, which alternates between ascending and descending orders
so that any two adjacent rows in the sorted data are as similar as possible. See
serpentine
for more details about serpentine sorting and
vignette("sample_implicit")
for a longer discussion of why it's useful. Serpentine
sorting is commonly used by NCES to achieve implicit stratification.
A data.frame with equal size as the original data, but sorted differently and with a new
variabled called in_sample
that is TRUE if the row was selected for the same or FALSE
otherwise.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | # Sample 7 cars after implicitly stratifying on gear and mpg.
sampled_cars <- sample_implicit(data = mtcars, n = 7, am, mpg)
sampled_cars
# Once the sample is complete, it's easy to compare sampled to non-sampled cars
library(dplyr)
sampled_cars %>%
group_by(in_sample) %>%
summarize(mean_mpg = mean(mpg))
# Using implicit stratification gets us more variation on variables of interest than just randomly
# selecting rows. For example, if we chose 3 cars, we might not get variability on the variables
# of interest. In this case, sample_implicit got us more variablity on mpg than a simple random
# sample
set.seed(12)
simplesample <- sample_n(mtcars, 3)
implicitsample <- sample_implicit(data = mtcars, n = 3, am, mpg)
count(simplesample, am)
implicitsample %>%
filter(in_sample) %>%
count(am)
# You'll get different, but reproducible results if you change the random number
sampled_cars1 <- sample_implicit(data = mtcars, n = 5, am, mpg)
sampled_cars2 <- sample_implicit(data = mtcars, n = 5, am, mpg, random_num = 2)
sampled_cars1
sampled_cars2
# If you have a variable that represents size, it's easy to account for that when selecting
# the sample
sample_implicit(data = mtcars, n = 5, am, mpg, size_var = hp)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.