knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options("tibble.print_min" = 5, "tibble.print_max" = 5)
library(magrittr)
library(cohortBuilder)
When working with an already defined cohort, you may want to change its configuration (e.g. a filter value) without recreating the cohort from scratch.
cohortBuilder offers various methods that perform common Cohort management operations.
To present the functionality, we'll work with the librarian_cohort object below:
librarian_source <- set_source(
  as.tblist(librarian)
)

librarian_cohort <- librarian_source %>%
  cohort(
    step(
      filter(
        "discrete", id = "author", dataset = "books",
        variable = "author", value = "Dan Brown"
      ),
      filter(
        "discrete", id = "program", dataset = "borrowers",
        variable = "program", value = "premium", keep_na = FALSE
      )
    ),
    step(
      filter(
        "range", id = "copies", dataset = "books",
        variable = "copies", range = c(-Inf, 5)
      )
    ),
    run_flow = TRUE
  )
To manage filter configuration, you can call the following methods:
update_filter - to update a filter's configuration,
add_filter - to add a new filter to the selected step,
rm_filter - to remove a filter from an existing step.
Updating a filter:
librarian_cohort %>%
  update_filter(
    step_id = 1, filter_id = "author",
    value = c("Dan Brown", "Khaled Hosseini")
  )
sum_up(librarian_cohort)
Adding a new filter:
librarian_cohort %>%
  add_filter(
    filter(
      "date_range", id = "issue_date", dataset = "issues",
      variable = "date", range = c(as.Date("2010-01-01"), Inf)
    ),
    step_id = 2
  )
sum_up(librarian_cohort)
Removing a filter:
librarian_cohort %>%
  rm_filter(step_id = 2, filter_id = "copies")
sum_up(librarian_cohort)
By default, the above configuration changes don't trigger data recalculation, so we need to call the run method.
Calling run triggers the computation of all steps. In our case we've updated only the second step, so we can optimize the workflow by skipping the calculation of the previous steps with the min_step_id parameter:
run(librarian_cohort, min_step_id = 2)
get_data(librarian_cohort)
Note. If you want to run the data computation directly after calling one of the above methods, just set run_flow = TRUE within the method.
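As a minimal sketch of the note above, the snippet below updates a filter and recomputes the data in a single call. The one-step demo_cohort object and the authors variable are names introduced here for illustration only; everything else uses the calls already shown in this document.

```r
library(magrittr)
library(cohortBuilder)

# Illustrative one-step cohort (demo_cohort is a name assumed for this sketch).
demo_cohort <- set_source(as.tblist(librarian)) %>%
  cohort(
    step(
      filter(
        "discrete", id = "author", dataset = "books",
        variable = "author", value = "Dan Brown"
      )
    ),
    run_flow = TRUE
  )

# Update the filter and trigger the recalculation in the same call,
# so no separate run() is needed afterwards.
demo_cohort %>%
  update_filter(
    step_id = 1, filter_id = "author",
    value = c("Dan Brown", "Khaled Hosseini"),
    run_flow = TRUE
  )

# The filtered books table should now contain only the selected authors.
authors <- unique(get_data(demo_cohort)$books$author)
print(authors)
```

Without run_flow = TRUE the configuration would change but the data returned by get_data would remain stale until run is called.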
Similarly to filters, you can operate on the Cohort to manage steps.
cohortBuilder offers the add_step and rm_step methods to add a new step or remove an existing one, respectively.
librarian_cohort %>%
  rm_step(step_id = 1)
sum_up(librarian_cohort)
Note. Removing a step other than the last one renumbers all remaining step ids (so that step numbering always starts at 1).
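The renumbering can be illustrated with a small sketch: after the first of two steps is removed, the former second step is addressed as step 1. The demo_cohort and books names are assumptions made for this example, and passing range through update_filter assumes that filter parameters are updated under the same names used when the filter was defined.

```r
library(magrittr)
library(cohortBuilder)

# Illustrative two-step cohort (demo_cohort is a name assumed for this sketch).
demo_cohort <- set_source(as.tblist(librarian)) %>%
  cohort(
    step(
      filter(
        "discrete", id = "author", dataset = "books",
        variable = "author", value = "Dan Brown"
      )
    ),
    step(
      filter(
        "range", id = "copies", dataset = "books",
        variable = "copies", range = c(-Inf, 5)
      )
    ),
    run_flow = TRUE
  )

# Remove the first step; the "copies" step becomes step 1.
demo_cohort %>%
  rm_step(step_id = 1)

# Address the former second step by its new id.
demo_cohort %>%
  update_filter(step_id = 1, filter_id = "copies", range = c(-Inf, 3))

# Recalculate and inspect the result of the (now only) step.
run(demo_cohort)
books <- get_data(demo_cohort)$books
sum_up(demo_cohort)
```

Code that stores step ids should therefore re-read them after any rm_step call on an earlier step.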
librarian_cohort %>%
  add_step(
    step(
      filter(
        "discrete", id = "author", dataset = "books",
        variable = "author", value = "Dan Brown"
      ),
      filter(
        "discrete", id = "program", dataset = "borrowers",
        variable = "program", value = "premium", keep_na = FALSE
      )
    )
  )
sum_up(librarian_cohort)
Note. All the methods used for managing steps and filters can also be called on the Source object itself.
See vignette("cohort-configuration").
The last Cohort configuration component, the source, can also be managed within the Cohort itself.
With the update_source method you can change the source defined in an existing Cohort.
Below we update the cohort with a Source that has the source_code parameter defined.
This argument is responsible for generating the source object definition printed in the reproducible code (you can use it when the default method doesn't print reasonable output).
code(librarian_cohort, include_methods = NULL)

new_source <- set_source(
  as.tblist(librarian),
  source_code = quote({
    source <- list()
    source$dtconn <- as.tblist(librarian)
  })
)

update_source(librarian_cohort, new_source)
sum_up(librarian_cohort)
code(librarian_cohort, include_methods = NULL)
Note that updating the source doesn't remove the Cohort configuration (steps and filters).
If you want to clear the configuration, just set keep_steps = FALSE:
update_source(librarian_cohort, new_source, keep_steps = FALSE)
sum_up(librarian_cohort)
You can also use update_source to add a Source to an empty Cohort:
new_source <- set_source(
  as.tblist(librarian)
)
empty_cohort <- cohort()
update_source(empty_cohort, new_source)
code(empty_cohort, include_methods = NULL)
The update_source method can also be useful if you want to update the source along with the steps and filters configuration.
In this case, it is good practice to keep the configuration directly in the Source:
source_one <- set_source(
  as.tblist(librarian)
) %>%
  add_step(
    step(
      filter(
        "discrete", id = "author", dataset = "books",
        variable = "author", value = "Dan Brown"
      ),
      filter(
        "discrete", id = "program", dataset = "borrowers",
        variable = "program", value = "premium", keep_na = FALSE
      )
    )
  )

source_two <- set_source(
  as.tblist(librarian)
) %>%
  add_step(
    step(
      filter(
        "range", id = "copies", dataset = "books",
        variable = "copies", range = c(-Inf, 5)
      )
    )
  )

my_cohort <- cohort(source_one)
sum_up(my_cohort)

update_source(my_cohort, source_two)
sum_up(my_cohort)