View source: R/aaa_fabricate.R
| fabricate | R Documentation |
fabricate helps you simulate a dataset before you collect it. You can
either start with your own data and add simulated variables to it (by passing
data to fabricate()) or start from scratch by defining
N. Create hierarchical data with multiple levels, such as
citizens within cities within states, using add_level(), or modify
existing hierarchical data using modify_level(). You can use any R
function to create each variable. Use cross_levels() and
link_levels() to build more complex designs such as panel or
cross-classified data.
fabricate(..., data = NULL, N = NULL, ID_label = NULL)
add_level(N = NULL, ..., nest = TRUE)
modify_level(..., by = NULL)
nest_level(N = NULL, ...)
... | Variable or level-generating arguments, such as |
data | (optional) user-provided data that forms the basis of the fabrication, e.g. you can add variables to existing data. Provide either |
N | (optional) number of units to draw. If provided as |
ID_label | (optional) variable name for ID variable, e.g. citizen_ID. Set to NA to suppress the creation of an ID variable. |
nest | (Default TRUE) Boolean determining whether data in an |
by | (optional) quoted name of variable |
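As a brief illustration of the ID_label argument described above, the sketch below first names the ID variable and then suppresses it. The variable names (citizen_ID, income) are illustrative only, and the sketch assumes the fabricatr package is installed.

```r
library(fabricatr)

# Name the ID variable explicitly
named_df <- fabricate(N = 3, ID_label = "citizen_ID", income = rnorm(N))
names(named_df)

# Set ID_label to NA to suppress the creation of an ID variable
no_id_df <- fabricate(N = 3, ID_label = NA, income = rnorm(N))
names(no_id_df)
```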
We also provide several built-in functions to easily create variables, including
draw_binary, draw_count, draw_likert,
and the intra-cluster correlated variables draw_binary_icc and
draw_normal_icc.
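The built-in helpers listed above can be called inside fabricate() like any other R function. The following is a minimal sketch, assuming the fabricatr package is installed; the variable names (village, voted, visits, income, employed), the parameter values, and the seed are made up for illustration.

```r
library(fabricatr)

set.seed(42)  # arbitrary seed so the draws are reproducible

survey_df <- fabricate(
  N = 20,
  # Five clusters of four units each (hypothetical grouping variable)
  village = rep(1:5, each = 4),
  # Independent binary and count draws
  voted = draw_binary(prob = 0.6, N = N),
  visits = draw_count(mean = 3, N = N),
  # Draws that are correlated within each village
  income = draw_normal_icc(mean = 10, clusters = village, sd = 3, ICC = 0.3),
  employed = draw_binary_icc(prob = 0.5, clusters = village, ICC = 0.3)
)
head(survey_df)
```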
data.frame
link_levels
# Draw a single-level dataset with a covariate
building_df <- fabricate(
  N = 100,
  height_ft = runif(N, 300, 800)
)
head(building_df)

# Start with existing data instead
building_modified <- fabricate(
  data = building_df,
  rent = rnorm(N, mean = height_ft * 100, sd = height_ft * 30)
)

# Draw a two-level hierarchical dataset
# containing cities within regions
multi_level_df <- fabricate(
  regions = add_level(N = 5),
  cities = add_level(N = 2, pollution = rnorm(N, mean = 5))
)
head(multi_level_df)

# Start with existing data and add a nested level:
company_df <- fabricate(
  data = building_df,
  company_id = add_level(
    N = 10,
    is_headquarters = sample(c(0, 1), N, replace = TRUE)
  )
)

# Start with existing data and add variables to hierarchical data
# at levels which are already present in the existing data.
# Note: do not provide N when adding variables to an existing level
fabricate(
  data = multi_level_df,
  regions = modify_level(watershed = sample(c(0, 1), N, replace = TRUE)),
  cities = modify_level(runoff = rnorm(N))
)

# fabricatr can add variables that are higher-level summaries of lower-level
# variables via a split-modify-combine logic and the by argument
multi_level_df <- fabricate(
  regions = add_level(N = 5, elevation = rnorm(N)),
  cities = add_level(N = 2, pollution = rnorm(N, mean = 5)),
  cities = modify_level(by = "regions", regional_pollution = mean(pollution))
)

# fabricatr can also make panel or cross-classified data. For more
# information about syntax for this functionality please read our vignette
# or check documentation for link_levels:
cross_classified <- fabricate(
  primary_schools = add_level(N = 50, ps_quality = runif(N, 0, 10)),
  secondary_schools = add_level(
    N = 100,
    ss_quality = runif(N, 0, 10),
    nest = FALSE
  ),
  students = link_levels(
    N = 2000,
    by = join_using(ps_quality, ss_quality, rho = 0.5),
    student_quality = ps_quality + 3 * ss_quality + rnorm(N)
  )
)
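
The examples above cover cross-classified data with link_levels(); panel data can be sketched in a similar way with cross_levels(), which pairs every unit of one level with every unit of another. This is a minimal sketch, assuming the fabricatr package is installed and that cross_levels() accepts a join_using() specification in the same way link_levels() does above. The level and variable names (countries, years, obs, country_fe, year_shock, outcome) are illustrative.

```r
library(fabricatr)

panel_df <- fabricate(
  countries = add_level(N = 3, country_fe = runif(N, 1, 10)),
  # nest = FALSE keeps years as a separate, non-nested level
  years = add_level(N = 4, year_shock = runif(N, 1, 10), nest = FALSE),
  # Cross every country with every year: 3 x 4 = 12 observations
  obs = cross_levels(
    by = join_using(countries, years),
    outcome = country_fe + year_shock + rnorm(N, 0, 2)
  )
)
nrow(panel_df)
```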