knitr::opts_chunk$set(echo = TRUE)

Introduction

The respectables package provides a framework to generate data frames from recipes, as illustrated in this vignette.

For this vignette we load the respectables and dplyr packages:

library(respectables)
library(dplyr)

Note the respectables package is still under development.

Simple Dataset

Let's start by defining a simple dataset dm with a single variable id.

gen_id <- function(n) {
  paste0("id-", 1:n)
}

dm_recipe <- tribble(
  ~variables, ~dependencies,  ~func,   ~func_args,
  "id",       no_deps,        gen_id,  no_args
)

gen_table_data(N = 2, recipe = dm_recipe)

Note that the argument n is supplied by respectables; in this case it is equal to N.

We can reuse the recipe dm_recipe to create a different dataset:

gen_table_data(N = 5, recipe = dm_recipe)
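Because generating functions are ordinary R functions, they can be checked on their own before being placed in a recipe. The following minimal sketch calls gen_id directly with the values that respectables would supply for n in the two calls above; the names ids_2 and ids_5 are only illustrative.

```r
gen_id <- function(n) {
  paste0("id-", 1:n)
}

# Same values of n that gen_table_data passes for N = 2 and N = 5
ids_2 <- gen_id(2)
ids_5 <- gen_id(5)

print(ids_2)
print(ids_5)
```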

Adding Multiple Variables

We will now add the variables height and weight to the dm recipe:

gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))

  data.frame(height = runif(n, min = 1.5, max = 1.95)) %>%
    mutate(weight = bmi * height^2)
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,  no_args,
   c("height", "weight"),   no_deps,        gen_hw,  no_args
)

gen_table_data(N = 2, recipe = dm_recipe)

Note that we used random number generators in gen_hw, hence rerunning gen_table_data will give different values:

gen_table_data(N = 2, recipe = dm_recipe)
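If reproducible values are needed, the usual base R approach is to fix the random seed before generating data. The sketch below calls a base R variant of gen_hw directly (without dplyr) so that it runs on its own. Whether gen_table_data leaves the seed untouched is an assumption that this vignette does not confirm.

```r
# Base R variant of gen_hw from above (same logic, no dplyr needed)
gen_hw <- function(n) {
  bmi <- 17 + abs(rnorm(n, mean = 3, sd = 3))
  height <- runif(n, min = 1.5, max = 1.95)
  data.frame(height = height, weight = bmi * height^2)
}

# Fixing the seed before each call makes the random draws repeatable
set.seed(42)
first <- gen_hw(2)

set.seed(42)
second <- gen_hw(2)

identical(first, second)
```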

Variable Dependencies

We will now continue our dm example by defining the variable age, which for illustrative purposes depends on height.

gen_age <- function(n, .df) {
  .df %>%
    transmute(age = height*25)
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,  no_args,
   c("height", "weight"),   no_deps,        gen_hw,  no_args,
  "age",                    "height",       gen_age, no_args
)

gen_table_data(N = 2, recipe = dm_recipe)

Note that respectables creates the arguments n and .df on the fly. Also, respectables determines the evaluation order of the variables from the dependency structure. That is, respectables does not guarantee that the resulting data frame is built row by row in the order of the recipe.
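To illustrate what dependency-based ordering means, here is a minimal base R sketch of one possible scheme: repeatedly pick the recipe rows whose dependencies have already been generated. The helper resolve_order is hypothetical and is not the respectables implementation; it only shows why "age" can appear first in a recipe and still be evaluated after "height".

```r
# Hypothetical helper: return recipe row indices in an order in which
# every row's dependencies are generated before the row itself.
resolve_order <- function(variables, dependencies) {
  done <- character(0)            # variables generated so far
  ord <- integer(0)               # row indices in evaluation order
  remaining <- seq_along(variables)

  while (length(remaining) > 0) {
    is_ready <- vapply(
      remaining,
      function(i) all(dependencies[[i]] %in% done),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) {
      stop("circular or unsatisfied dependency")
    }
    ord <- c(ord, ready)
    done <- c(done, unlist(variables[ready]))
    remaining <- setdiff(remaining, ready)
  }
  ord
}

# "age" is listed first but depends on "height"
variables <- list("age", "id", c("height", "weight"))
dependencies <- list("height", character(0), character(0))

eval_order <- resolve_order(variables, dependencies)
print(eval_order)
```

In this sketch the rows for "id" and for "height"/"weight" are evaluated before the row for "age", regardless of their position in the recipe.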

Configurable Arguments

If we want the variable-generating functions to be configurable, we can specify their arguments in the func_args column of the recipe:

gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

dm_recipe <- tribble(
  ~variables,               ~dependencies,  ~func,   ~func_args,
  "id",                     no_deps,        gen_id,     no_args,
   c("height", "weight"),   no_deps,        gen_hw,     no_args,
  "age",                    "height",       gen_age,    no_args,
  "color",                  no_deps,        gen_color,  list(colors = c("blue", "red"))
)

gen_table_data(N = 4, recipe = dm_recipe)
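The effect of func_args can be pictured as extra named arguments appended to the call of the generating function. The sketch below uses do.call for this; how respectables builds the call internally is not shown in this vignette, so the do.call line is an assumption made for illustration only.

```r
gen_color <- function(n, colors = grDevices::colors()) {
  data.frame(color = sample(colors, n, replace = TRUE))
}

# Assumption: the framework supplies n and appends the recipe's func_args
func_args <- list(colors = c("blue", "red"))
color_df <- do.call(gen_color, c(list(n = 4), func_args))

print(color_df)
```

Every sampled value is then either "blue" or "red", because the default palette is overridden by the recipe.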

Injecting Missing Data

The miss_recipe argument in gen_table_data can be used to inject missing values in the last step when creating data with gen_table_data. That is, first the data generation recipe is executed and then the missing data is injected. Hence, all variables are available at execution time and the .df argument is supplied to the func.

gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

dm_na_recipe <- tribble(
  ~variables,       ~func,             ~func_args,
  "age",            gen_alternate_na,  no_args
)

gen_table_data(N = 4, recipe = dm_recipe, miss_recipe = dm_na_recipe)

Note that this currently only works with one variable per row in the missing recipe. We are still working on this feature to allow more complex missing-data structures to be defined.
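As a rough picture of the injection step, the logical vector returned by the function in the missing recipe can be read as a mask over the rows of the variable. The base R sketch below assumes that TRUE marks the values to be replaced by NA; this interpretation and the example data frame are assumptions, not taken from the respectables source.

```r
gen_alternate_na <- function(.df) {
  n <- nrow(.df)
  rep(c(TRUE, FALSE), length.out = n)
}

# Illustrative data standing in for the output of the data recipe
df <- data.frame(id = paste0("id-", 1:4), age = c(40, 42, 45, 47))

# Assumption: TRUE marks the rows whose value is replaced by NA
is_missing <- gen_alternate_na(df)
df$age[is_missing] <- NA

print(df)
```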

Scaffolding

For this example we create a data frame aseq with the variable seqterm being c("step 1", ..., "step i"), where i is extracted from the variable id.

dm <- gen_table_data(N = 3, recipe = dm_recipe)

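# Scaffolding function: grow dm so that each id gets one row per step
gen_seq <- function(.db) {

  dm <- .db$dm

  # number of steps per id, taken from the numeric suffix of the id
  ni <- as.numeric(substring(dm$id, 4))

  df_grow <- data.frame(
    id = rep(dm$id, ni),
    seq = unlist(lapply(ni, seq_len))
  )

  left_join(dm, df_grow, by = "id")
}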

aseq_scf_recipe <- tribble(
  ~foreign_tbl, ~foreign_key, ~func,     ~func_args,
  "dm",         "id",         gen_seq,   no_args     
)

gen_seq_term <- function(.df, ...) {
  data.frame(seqterm = paste("step", .df$seq))
}

aseq_recipe <- tribble(
  ~variables,      ~dependencies,  ~func,            ~func_args,
  "seqterm",       "seq",          gen_seq_term,     no_args
)

gen_reljoin_table(joinrec = aseq_scf_recipe, tblrec = aseq_recipe, db = list(dm = dm))

The steps here are:

  1. use joinrec to grow a new data frame, say A, possibly from db
  2. call gen_table_data with the following arguments
    • A for df
    • tblrec for recipe
    • forward miss_recipe

Note that this functionality is under development. Currently aseq_scf_recipe needs to be a tibble with one row, and the foreign_key is currently not used.
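The two steps above can be traced by hand in base R. The sketch below is not the respectables implementation: it uses merge in place of left_join, a dm with only the id column, and it appends seqterm directly instead of calling gen_table_data.

```r
# Stand-in for dm with three subjects
dm <- data.frame(id = paste0("id-", 1:3))

# Step 1: grow the scaffold, one row per step and id
ni <- as.numeric(substring(dm$id, 4))
df_grow <- data.frame(
  id = rep(dm$id, ni),
  seq = unlist(lapply(ni, seq_len))
)
aseq <- merge(dm, df_grow, by = "id", all.x = TRUE)

# Step 2: derive the recipe variable from its dependency seq
aseq$seqterm <- paste("step", aseq$seq)

print(aseq)
```

With ids id-1, id-2 and id-3 the grown data frame has 1 + 2 + 3 = 6 rows.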

Comparison with dplyr

This section needs further work.

Let's map the following code into respectables recipes:

iris %>% 
  mutate(SPECIES = toupper(Species)) %>%
  head()

There are multiple solutions to map this to the respectables framework.

gen_toupper <- function(varname, .df, ...) {
   toupper(.df[[varname]])
}

rcp <- tribble(
    ~variables, ~dependencies,  ~func,          ~func_args,
    "SPECIES",  "Species",       gen_toupper,   list(varname = "Species") 
)

gen_table_data(recipe = rcp, df = iris) %>%
  head()

Note that in gen_toupper we use the ellipsis ... to absorb unused arguments such as n.
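The role of the ellipsis can be checked in plain R. In the sketch below the argument list is assembled by hand to mimic a caller that always passes n; the way respectables actually builds the call is an assumption here, and gen_toupper_strict is a hypothetical variant without the ellipsis.

```r
gen_toupper <- function(varname, .df, ...) {
  toupper(.df[[varname]])
}

# Hypothetical variant without the ellipsis
gen_toupper_strict <- function(varname, .df) {
  toupper(.df[[varname]])
}

# Assumption: the caller always passes n in addition to .df and func_args
args <- list(n = nrow(iris), .df = iris, varname = "Species")

species_upper <- do.call(gen_toupper, args)
head(species_upper)

# Without ..., the extra argument n causes an "unused argument" error
strict_result <- tryCatch(do.call(gen_toupper_strict, args), error = function(e) e)
inherits(strict_result, "error")
```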



Roche/respectables documentation built on Oct. 2, 2024, 8:57 p.m.