library(dplyr)
library(dtrackr)
knitr::opts_chunk$set(echo = TRUE)
Most of the behaviour of dtrackr can be specified at the individual call level
using the .headline and .messages glue specifications to define a format.
Sometimes however this is annoying to do for all the stages in a flow chart and
a global configuration of behaviour is desirable.
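The examples in this document set options globally and then restore them. As a minimal sketch of that idiom in base R: options() returns the previous values of whatever it changes, so they can be put back afterwards. The helper name with_temporary_options is hypothetical and not part of dtrackr.

```r
# Hypothetical helper (not part of dtrackr): apply some options only while
# a function runs, then restore the previous values.
with_temporary_options <- function(new_options, fn) {
  # options() accepts a named list and returns the old values invisibly
  old <- options(new_options)
  # restore the old values when this function exits, even on error
  on.exit(options(old), add = TRUE)
  fn()
}

# Read the option back while it is temporarily set
result <- with_temporary_options(
  list(dtrackr.strata_sep = ", "),
  function() getOption("dtrackr.strata_sep")
)
```

Wrapping the restore in on.exit() means the previous settings come back even if the pipeline inside fails part way through.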
One of the areas where default behaviour may be undesirable is the naming of
groups. The default setting combines the group name {.group} and the group
value {.value} into a concatenated colon separated string as demonstrated
below:
# these are the defaults
old = options(
  dtrackr.strata_glue="{.group}:{.value}",
  dtrackr.strata_sep="; "
)

survival::cgd %>%
  track() %>%
  group_by(treat) %>%
  comment() %>%
  group_by(sex,.add = TRUE) %>%
  comment(
    .messages = c(
      "{.count} patients",
      "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()

# reset options
options(old)
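To see how the glue template and the separator combine, the following sketch applies the same two settings by hand using only glue and base R. It illustrates the format strings, on the assumption that dtrackr builds the label in a comparable way; it is not dtrackr's internal code. The variable names strata_glue, strata_sep and strata_label are illustrative.

```r
library(glue)

# The two settings from the example above
strata_glue <- "{.group}:{.value}"
strata_sep <- "; "

# One entry per grouping variable, as if grouped by treat and then sex
.group <- c("treat", "sex")
.value <- c("placebo", "male")

# glue() is vectorised, giving one "name:value" string per grouping variable;
# paste() then joins them with the separator
strata_label <- paste(glue(strata_glue), collapse = strata_sep)
strata_label
```

Replacing strata_glue with "{tolower(.value)}" and strata_sep with ", " in this sketch gives a label containing only the lower-cased group values.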
In situations like this, where you are faceting on factors or strings,
dropping the group name may make the flowchart clearer. In the following
example we only include the group value, force it to lower case, and use a comma
to separate multiple facets. We have manually overridden the messages in the
grouping stages, by providing a .messages parameter, to specify what we are
faceting by in a more natural way:
# only include the group value in the description of the group
old = options(
  dtrackr.strata_glue="{tolower(.value)}",
  dtrackr.strata_sep=", "
)

survival::cgd %>%
  track() %>%
  group_by(treat, .messages = "case or control") %>%
  comment() %>%
  group_by(sex, .add = TRUE,
    .messages = "by {tolower(.cols)}"
    #.cols contains a csv string of the grouping variables
  ) %>%
  comment(
    .messages = c(
      "{.count} patients",
      "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()

# reset options
options(old)
N.B. this setting affects the "strata" label of the group, which in turn affects the flowchart branching. If the label is not unique from one group to another, strange behaviour will be observed.
With the group strata label defined you can set other defaults. In the flowchart above the "583 items" labels are generated by the default message setting, and the headings for the groups by the default headline setting. In this example we change these to alter the default text.
old = options(
  dtrackr.strata_glue="{tolower(.value)}",
  dtrackr.strata_sep=", ",
  dtrackr.default_message = "containing {.count} patients",
  dtrackr.default_headline = "subgroup: {.strata}"
)

survival::cgd %>%
  track() %>%
  group_by(
    treat,
    .messages = "case or control"
  ) %>%
  comment() %>%
  group_by(
    sex,
    .add = TRUE,
    .messages = "by gender"
  ) %>%
  comment(
    .messages = c(
      "{.count} patients",
      "{sprintf('%1.0f',.count/.total*100)}% of the total")) %>%
  ungroup() %>%
  flowchart()

# N.b. this setting includes some unwanted headlines in the ungrouped stages of
# the flow chart. If a headline evaluates to "" then the headline is suppressed
# and we can get rid of unwanted headlines. An example of doing this is as
# follows:
# options(dtrackr.default_headline = "{ifelse(.strata != '', glue::glue('subgroup: {.strata}'), '')}")

# reset options
options(old)
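The conditional headline mentioned in the comment above can be checked on its own. The sketch below evaluates that glue string directly for a grouped and an ungrouped case. The helper render_headline is hypothetical, and supplying .strata as a plain variable is an assumption made for illustration; inside dtrackr the value comes from the pipeline.

```r
library(glue)

# The conditional headline specification from the comment above
headline_glue <- "{ifelse(.strata != '', glue::glue('subgroup: {.strata}'), '')}"

# Hypothetical helper: evaluate the specification for a given strata label
render_headline <- function(.strata) {
  # glue() looks up .strata in this function's environment
  as.character(glue(headline_glue))
}

# A grouped stage has a non-empty strata label
grouped_headline <- render_headline("placebo, male")

# An ungrouped stage has an empty strata label, so the headline is empty too
ungrouped_headline <- render_headline("")
```

Because the ungrouped case evaluates to an empty string, it meets the condition described above for a headline to be suppressed.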
Subgroup counts are a slightly neater way of doing this. Their default layout
can be modified using dtrackr.default_count_subgroup.
old = options(
  dtrackr.default_headline = "{.strata}",
  dtrackr.default_count_subgroup = "{tolower(.name)}: {.count}/{.subtotal}"
)

survival::cgd %>%
  track() %>%
  group_by(
    treat,
    .messages = "case or control"
  ) %>%
  comment() %>%
  count_subgroup(
    sex
  ) %>%
  ungroup() %>%
  flowchart()

# reset options
options(old)
Elsewhere we discuss the possibility of capturing excluded items for debugging.
This behaviour can be added to any pipeline with the capture_exclusions()
function. Alternatively it can be globally enabled with the following option.
Usual caveats about performance apply.
options(dtrackr.exclusions=TRUE)
Sometimes in a pipeline we have an exclusion criterion which is not triggered, or is not triggered for a particular subgroup. In this case the default is not to show the zero items that were excluded. However sometimes it is reassuring to know that a filter was applied even if it results in nothing:
options(dtrackr.show_zero_exclusions=FALSE)
In count_subgroup() and group_by() statements a large number of items can be
generated if a grouping variable has a lot of possible values. This can cause
performance and legibility issues for the resulting graph. It is usually the
result of an interim stage of the data pipeline where grouping is used for a
fine-scale summarisation operation and the data is quickly ungrouped, e.g.
dataset %>% group_by(nearly_unique_id) %>% filter(row_number()==1), or a
time series where things need to be aggregated by date, e.g.
timeseries %>% group_by(date) %>% summarise(count = n()). The maximum number of
groups that dtrackr will attempt to keep track of is configurable but defaults
to 16:
options(dtrackr.max_supported_groupings = 16)
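As a rough way to see whether a grouping step is likely to exceed this limit, the number of groups can be compared with the option value before building the pipeline. This sketch uses only dplyr and base R, with the built-in mtcars data as a stand-in dataset. The fallback of 16 mirrors the default stated above, and whether dtrackr counts groups in exactly this way is an assumption.

```r
library(dplyr)

# Read the configured limit, falling back to the default stated above
max_groups <- getOption("dtrackr.max_supported_groupings", default = 16)

# Stand-in data: group by two variables and count the resulting groups
grouped <- mtcars %>% group_by(cyl, gear)
group_count <- n_groups(grouped)

# TRUE when the number of groups is within the configured limit
within_limit <- group_count <= max_groups
within_limit
```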
Various messages about what dtrackr is doing are produced but suppressed by
default. They can be enabled with the following flag.
options(dtrackr.verbose=TRUE)