resample_data: Resample data, including hierarchical data

Description

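This function resamples any data frame. By default it draws a single resample of N rows with replacement (a bootstrap); when N is omitted, as in the first example below, the resample has as many rows as data. More complex resampling strategies, such as resampling clusters or several levels of a hierarchy, are specified through the N and ID_labels arguments.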
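The default mode is an ordinary row bootstrap. The base-R sketch below illustrates that idea only; bootstrap_rows() is a hypothetical helper written for this illustration, not part of fabricatr, and it is not how resample_data() is implemented internally.

```r
# Hypothetical helper: one bootstrap resample of the rows of a data frame.
# Illustrates the idea only; this is not fabricatr's implementation.
bootstrap_rows <- function(data, N = nrow(data)) {
  # Draw N row indices with replacement, then subset the data frame
  idx <- sample.int(nrow(data), size = N, replace = TRUE)
  out <- data[idx, , drop = FALSE]
  rownames(out) <- NULL  # drop duplicated row names such as "3.1"
  out
}

set.seed(1)
baseline <- data.frame(ID = 1:50, Y_pre = rnorm(50))
boot <- bootstrap_rows(baseline)                # same size as the input
boot_100 <- bootstrap_rows(baseline, N = 100)   # fixed number of rows
```

Because rows are drawn with replacement, some original rows appear more than once and others not at all.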

Usage

resample_data(data, N, ID_labels = NULL, unique_labels = FALSE)

Arguments

data

The data.frame to resample, for example one created with fabricate().

N

The number of units to resample. If N is a single number and no labels are provided, it is the number of rows (unit observations) to return. If N is named, or if the ID_labels argument is supplied (in which case N and ID_labels must be the same length), each element of N is the number of units to resample at the corresponding level of the hierarchy. A unit is then a distinct value of that level's ID variable, and, unless a lower level is also resampled, all rows belonging to each resampled unit are kept; this is useful for, e.g., cluster resampling. If N is the constant ALL for a level, every unit at that level is kept without resampling and passed through to the next level of resampling.
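When N names a level, whole units are drawn rather than single rows. The base-R sketch below shows single-level cluster resampling in that spirit; resample_clusters() is a hypothetical helper for illustration, not fabricatr code, and the toy data frame stands in for the output of fabricate().

```r
# Hypothetical helper: resample whole clusters with replacement, keeping
# every row of each drawn cluster once per draw. Not fabricatr's code.
resample_clusters <- function(data, id_col, n_clusters) {
  ids <- unique(data[[id_col]])
  # Index into ids instead of calling sample(ids, ...) so a single
  # cluster ID is not mistaken for "sample from 1:id"
  drawn <- ids[sample.int(length(ids), size = n_clusters, replace = TRUE)]
  pieces <- lapply(drawn, function(id) data[data[[id_col]] == id, , drop = FALSE])
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(2)
clustered <- data.frame(
  clusters = rep(1:25, times = 4),  # 25 clusters of 4 rows each
  population = runif(100, min = 50000, max = 1000000)
)
cluster_resample <- resample_clusters(clustered, "clusters", 5)
```

Five clusters of four rows each yield 20 rows; a cluster drawn twice contributes its rows twice.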

ID_labels

A character vector naming the variables that define the data hierarchy, ordered from highest to lowest level (e.g., from cities to citizens).
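The highest-to-lowest ordering matters because each level is resampled within the units drawn at the level above it. The base-R sketch below illustrates this for two levels; resample_two_levels() is a hypothetical helper, not fabricatr code, and passing n_outer = NA is an assumption of this sketch standing in for ALL (keep every outer unit).

```r
# Hypothetical helper: draw outer units (e.g. cities), then inner units
# (e.g. citizens) within each drawn outer unit. n_outer = NA stands in
# for ALL in this sketch. Not fabricatr's implementation.
resample_two_levels <- function(data, outer, inner, n_outer, n_inner) {
  outer_ids <- unique(data[[outer]])
  drawn_outer <- if (is.na(n_outer)) {
    outer_ids  # pass every outer unit through without resampling
  } else {
    outer_ids[sample.int(length(outer_ids), size = n_outer, replace = TRUE)]
  }
  pieces <- lapply(drawn_outer, function(o) {
    rows <- data[data[[outer]] == o, , drop = FALSE]
    inner_ids <- unique(rows[[inner]])
    drawn_inner <- inner_ids[sample.int(length(inner_ids), size = n_inner, replace = TRUE)]
    do.call(rbind, lapply(drawn_inner, function(i) rows[rows[[inner]] == i, , drop = FALSE]))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(3)
my_data <- data.frame(
  cities = rep(1:20, each = 30),  # 20 cities of 30 citizens
  citizens = 1:600,
  age = runif(600, min = 18, max = 85)
)
two_level <- resample_two_levels(my_data, "cities", "citizens", n_outer = 3, n_inner = 5)
passthrough <- resample_two_levels(my_data, "cities", "citizens", n_outer = NA, n_inner = 10)
```

Here two_level has 3 x 5 = 15 rows, and passthrough keeps all 20 cities with 10 citizens each.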

unique_labels

A logical value, defaulting to FALSE. If TRUE, fabricatr will create an extra column, named <ID_label>_unique, for each level resampled on; it gives every resampled unit its own label, so that a unit drawn more than once can be told apart from its copies.
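Why unique labels help: after resampling with replacement, a unit drawn twice appears twice under the same ID, so grouping by that ID would merge the two copies. The base-R sketch below numbers the draws to separate them; add_unique_label() is a hypothetical helper, the numbering scheme is an assumption of this sketch, and only the column name mimics the <ID_label>_unique convention described above.

```r
# Hypothetical helper: stack the rows of each drawn unit and give each
# draw its own label in a column named <id_col>_unique. Not fabricatr code.
add_unique_label <- function(data, id_col, drawn) {
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- data[data[[id_col]] == drawn[k], , drop = FALSE]
    rows[[paste0(id_col, "_unique")]] <- k  # one label per draw
    rows
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

cities <- data.frame(cities = rep(c("A", "B", "C"), each = 2), x = 1:6)
# Suppose city "A" was drawn twice and city "C" once
res <- add_unique_label(cities, "cities", drawn = c("A", "A", "C"))
```

The two copies of city "A" share cities = "A" but have different cities_unique values, so per-draw statistics stay separate.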

Value

A data.frame

Examples


# Resample a dataset of size N without any hierarchy
baseline_survey <- fabricate(N = 50, Y_pre = rnorm(N))
bootstrapped_data <- resample_data(baseline_survey)

# Specify a fixed number of observations to return
baseline_survey <- fabricate(N = 50, Y_pre = rnorm(N))
bootstrapped_data <- resample_data(baseline_survey, N = 100)

# Resample by a single level of a hierarchical dataset (e.g. resampling
# clusters of observations): N specifies a number of clusters to return

clustered_survey <- fabricate(
  clusters = add_level(N=25),
  cities = add_level(N=round(runif(25, 1, 5)),
                     population=runif(n = N, min=50000, max=1000000))
)

cluster_resample <- resample_data(clustered_survey, N = 5, ID_labels = "clusters")

# Alternatively, pass the level to resample as a name:
cluster_resample_2 <- resample_data(clustered_survey, N=c(clusters = 5))

# Resample a hierarchical dataset on multiple levels
my_data <- fabricate(
  cities = add_level(N = 20, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = 30, age = runif(n = N, min = 18, max = 85))
)

# Specify the levels you wish to resample:
my_data_2 <- resample_data(my_data, N = c(3, 5),
                           ID_labels = c("cities", "citizens"))

# To keep every unit at a given level without resampling it, use the ALL
# constant. This example keeps all 20 cities and resamples 10 citizens
# within each city:

passthrough_resample_data <- resample_data(my_data, N = c(cities=ALL, citizens=10))

# To add columns with unique labels (for example, to compute city-level
# statistics when the same city has been drawn more than once), use the
# unique_labels = TRUE argument; here this adds a cities_unique column.

unique_resample <- resample_data(my_data, N = c(cities=5), unique_labels = TRUE)


DeclareDesign/fabricatr documentation built on Jan. 31, 2024, 4 a.m.