knitr::opts_chunk$set(echo = TRUE)
The respectables package provides a framework to
For this vignette we load the respectables and dplyr packages:
library(respectables)
library(dplyr)
Note that the respectables package is still under development.
Let's start by defining a simple dataset dm with a single variable id.
# generate n ids of the form "id-1", "id-2", ...
gen_id <- function(n) {
  paste0("id-", 1:n)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
Note that the argument n is supplied by respectables; in this case it is equal to N.
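To make the role of n concrete, here is a minimal base R sketch of how a recipe runner could call a generating function. The helper run_recipe_row is hypothetical and written only for illustration; it is not the respectables implementation, which may work differently.

```r
# Same generator as in the vignette, using seq_len() so that n = 0 is safe.
gen_id <- function(n) {
  paste0("id-", seq_len(n))
}

# Hypothetical stand-in for a recipe runner (NOT part of respectables):
# the runner, not the user, supplies n, and appends any extra arguments.
run_recipe_row <- function(func, N, func_args = list()) {
  do.call(func, c(list(n = N), func_args))
}

ids <- run_recipe_row(gen_id, N = 2)
print(ids)
```

The point of the sketch is only that the generating function never needs to know N directly: it receives it through its n argument.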
We can use the recipe dm_recipe again to create a different dataset:
gen_table_data(N = 5, recipe = dm_recipe)
We will now add the variables height and weight to the dm recipe:
# generate height (in m) and a weight (in kg) derived from a random BMI
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
Note that we used random number generators in gen_hw, hence rerunning gen_table_data will give different values:
gen_table_data(N = 2, recipe = dm_recipe)
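If reproducible values are needed, the usual base R approach is to fix the seed before generating. The sketch below uses a base R rewrite of gen_hw (no dplyr) so that it runs on its own. Whether the same holds for gen_table_data is an assumption: it should, as long as the generating functions draw only from R's global random number generator.

```r
# Base R version of gen_hw from the vignette, rewritten without dplyr
# so that this example is self-contained.
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

# Same seed, same draws.
set.seed(42)
first_run <- gen_hw(2)
set.seed(42)
second_run <- gen_hw(2)

print(identical(first_run, second_run))
```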
We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.
# derive age from height (purely illustrative)
gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height * 25)
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)
Note that respectables supplies the arguments n and .df on the fly. respectables also determines the evaluation order of the variables from the dependency structure. That is, respectables does not guarantee that the resulting data frame is built row by row in the order given in the recipe.
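Ordering by dependencies can be pictured as follows. This is a minimal base R sketch of one way to do it, using a hypothetical helper resolve_order. It is not taken from respectables, whose actual resolution strategy may differ.

```r
# Hypothetical dependency resolver (NOT the respectables implementation).
# variables and dependencies are lists with one element per recipe row.
resolve_order <- function(variables, dependencies) {
  done <- character(0)                # variables generated so far
  remaining <- seq_along(variables)   # recipe rows still to run
  ordered <- integer(0)               # resulting evaluation order
  while (length(remaining) > 0) {
    # a row is ready when all of its dependencies already exist
    is_ready <- vapply(
      remaining,
      function(i) all(dependencies[[i]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) {
      stop("circular or unmet dependencies in recipe")
    }
    ordered <- c(ordered, ready)
    done <- c(done, unlist(variables[ready]))
    remaining <- setdiff(remaining, ready)
  }
  ordered
}

# The age row is listed first but needs height, so it has to run last.
variables <- list("age", "id", c("height", "weight"))
dependencies <- list("height", character(0), character(0))
evaluation_order <- resolve_order(variables, dependencies)
print(evaluation_order)
```

Here the rows are evaluated in the order 2, 3, 1, which is why the position of a row in the recipe should not be relied upon.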
If we want configurable variable generating functions, we can specify their arguments in the recipe:
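# The default is qualified as grDevices::colors(): a default written as
# colors = colors() would refer to the argument itself and fail when used.
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

dm_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "id", no_deps, gen_id, no_args,
  c("height", "weight"), no_deps, gen_hw, no_args,
  "age", "height", gen_age, no_args,
  # func_args names match the formal argument colors of gen_color
  "color", no_deps, gen_color, list(colors = c("blue", "red"))
)

gen_table_data(N = 4, recipe = dm_recipe)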
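One R detail matters for generating functions with defaults. A default that calls a function with the same name as the argument, as in colors = colors(), fails with a recursive default argument error as soon as the default is actually needed. The base R sketch below shows the failure and one way around it, namely qualifying the call as grDevices::colors(). The names gen_color_recursive and gen_color_safe are made up for this illustration.

```r
# Default refers to the argument itself: errors when the default is used.
gen_color_recursive <- function(n, colors = colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Namespace-qualified default: no self-reference.
gen_color_safe <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# TRUE if the call with the default argument raised an error.
recursive_failed <- tryCatch(
  {
    gen_color_recursive(2)
    FALSE
  },
  error = function(e) TRUE
)
print(recursive_failed)

# Passing arguments the way a recipe's func_args list would be applied
# (do.call is an assumption about the mechanism, not respectables code).
configured <- do.call(
  gen_color_safe,
  c(list(n = 4), list(colors = c("blue", "red")))
)
print(configured)
```

Note that the recursive version still works when colors is passed explicitly, so the problem only appears once the default is relied upon.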
The miss_recipe argument of gen_table_data can be used to inject missing values as the last step of data generation. That is, the data generation recipe is executed first and the missing values are injected afterwards. Hence, all variables are available at execution time, and the .df argument is supplied to func.
# return a logical vector that alternates TRUE, FALSE, TRUE, ...
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

dm_na_recipe <- tribble(
  ~variables, ~func, ~func_args,
  "age", gen_alternate_na, no_args
)

gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)
Note that this currently only works with one variable per row in the missing recipe. We are still working on this feature to allow more complex missingness structures to be defined.
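The effect of such a missing recipe row can be sketched in base R. The helper inject_na is hypothetical, and the sketch assumes that TRUE in the returned logical vector marks a value to be replaced by NA. It illustrates the idea only and is not the respectables code.

```r
# Missingness function from the vignette: alternating TRUE / FALSE.
gen_alternate_na <- function(.df) {
  rep(c(TRUE, FALSE), length.out = nrow(.df))
}

# Hypothetical helper (NOT part of respectables): apply one missing
# recipe row to an already generated data frame.
# Assumption: TRUE means "set this value to NA".
inject_na <- function(df, variable, func) {
  mask <- func(.df = df)
  df[[variable]][mask] <- NA
  df
}

# A small, fully generated dataset stands in for the recipe output.
dm <- data.frame(
  id = paste0("id-", 1:4),
  age = c(40, 42, 45, 47)
)

dm_with_na <- inject_na(dm, "age", gen_alternate_na)
print(dm_with_na)
```

Because the mask is computed from the finished data frame, the missingness function can condition on any variable that was generated before.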
For this example we create a data frame aseq with the variable seqterm taking the values c("step 1", ..., "step i"), where i is extracted from the variable id.
dm <- gen_table_data(N = 3, recipe = dm_recipe)

# grow dataset
gen_seq <- function(.db) {
  dm <- .db$dm
  ni <- as.numeric(substring(dm$id, 4))
  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = unlist(sapply(ni, seq, from = 1))
  )
  left_join(dm, df_grow, by = "id")
}

aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func, ~func_args,
  "dm", "id", gen_seq, no_args
)

gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}

aseq_recipe <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "seqterm", "seq", gen_seq_term, no_args
)

gen_reljoin_table(joinrec = aseq_scf_recipe, tblrec = aseq_recipe, db = list(dm = dm))
The steps here are:
1. Use joinrec to grow a new data frame, say A, possibly from db.
2. Call gen_table_data with the following arguments:
   - A for df
   - tblrec for recipe
   - miss_recipe
Note that this functionality is under development. Currently aseq_scf_recipe needs to be a tibble with one row, and the foreign_key is not used.
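The two steps can be mimicked in base R to show what the resulting table looks like for three ids. This is a sketch of the data flow only, under the assumption that ids have the form "id-i" as produced by gen_id. It does not call or reproduce gen_reljoin_table.

```r
# Stand-in for the dm dataset: only the id column matters here.
dm <- data.frame(id = paste0("id-", 1:3))

# Step 1: grow the data frame, one row per step for each id.
ni <- as.numeric(substring(dm$id, 4))   # 1, 2, 3
grown <- data.frame(
  id = rep(dm$id, ni),
  seq = sequence(ni)                    # 1, 1 2, 1 2 3
)

# Step 2: add the variable that depends on seq.
grown$seqterm <- paste("step", grown$seq)

print(grown)
```

The id "id-i" contributes i rows, so three ids yield 1 + 2 + 3 = 6 rows.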
dplyr
This section needs further work.
Let's map the following code into respectables recipes:
iris %>% mutate(SPECIES = toupper(Species)) %>% head()
There are multiple ways to map this to the respectables framework.
# upper-case the column named by varname; ... absorbs unused arguments
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

rcp <- tribble(
  ~variables, ~dependencies, ~func, ~func_args,
  "SPECIES", "Species", gen_toupper, list(varname = "Species")
)

gen_table_data(recipe = rcp, df = iris) %>%
  head()
Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
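The effect of the ellipsis can be checked in base R without respectables. In this sketch gen_strict is a made-up variant without ..., and do.call stands in for however the framework passes arguments, which is an assumption.

```r
# With ...: extra arguments such as n are silently absorbed.
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# Without ... (hypothetical variant): extra arguments are an error.
gen_strict <- function(varname, .df) {
  toupper(.df[[varname]])
}

# Arguments as a framework might pass them, including an unused n.
call_args <- list(varname = "Species", .df = head(iris, 3), n = 3)

upper_species <- do.call(gen_toupper, call_args)
print(upper_species)

# TRUE if the strict variant rejected the unused argument n.
strict_failed <- tryCatch(
  {
    do.call(gen_strict, call_args)
    FALSE
  },
  error = function(e) TRUE
)
print(strict_failed)
```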