knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(admiral)
library(admiraldev)
This vignette discusses some of the common programming concepts and conventions that have been adopted within the {admiral} family of packages. It is intended to be a user-facing version of the Programming Strategy vignette; users can read the latter once they are more familiar with the package to explore any topics of interest in more depth. For answers to some common {admiral} questions, visit the FAQ page in the same drop-down menu as this vignette.
It is expected that the input dataset is not grouped. Otherwise an error is issued.
The output dataset is ungrouped. The observations are not ordered in a dedicated way. In particular, the order of the observations of the input dataset may not be preserved.
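As a minimal sketch of the grouping requirement, assuming a hypothetical dataset `advs` that was grouped earlier in a script, the grouping can be checked and removed with {dplyr} before the dataset is passed on to a derivation function:

```r
library(dplyr)

# Hypothetical input data, used only for illustration
advs <- tibble(
  USUBJID = c("01", "01", "02"),
  AVAL = c(1, 2, 3)
)

# A dataset that was grouped in an earlier step
grouped <- group_by(advs, USUBJID)
is_grouped_df(grouped)

# Remove the grouping before passing the dataset on
ungrouped <- ungroup(grouped)
is_grouped_df(ungrouped)
```

Calling `ungroup()` on a dataset that is not grouped is harmless, so it can be added defensively at the end of a {dplyr} pipeline.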
## {admiral} Functions and Options

As a general principle, the behavior of {admiral} functions is determined only by their input, not by any global object, i.e. all inputs such as datasets, variable names, options, etc. must be provided to the function through its arguments. Correspondingly, functions in general do not have side effects such as creating or modifying global objects, printing, or writing files.
An exception to the above principle is our approach to package options (see `get_admiral_option()` and `set_admiral_options()`), which allow for user-defined defaults on commonly used function arguments. For instance, the option `subject_keys` is currently pre-defined as `exprs(STUDYID, USUBJID)`, but can be modified using `set_admiral_options(subject_keys = exprs(...))` at the top of a script.
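The following sketch shows how the `subject_keys` option might be inspected, overridden and restored. It assumes {admiral} is installed; `USUBJID2` is a hypothetical variable name chosen purely for illustration.

```r
library(admiral)
library(rlang)

# Inspect the current default subject keys
get_admiral_option("subject_keys")

# Override the default for the rest of the script;
# USUBJID2 is a hypothetical variable used only for illustration
set_admiral_options(subject_keys = exprs(STUDYID, USUBJID2))
custom_keys <- get_admiral_option("subject_keys")
custom_keys

# Restore the pre-defined value
set_admiral_options(subject_keys = exprs(STUDYID, USUBJID))
```

Because the option changes behavior without appearing in each function call, setting it once at the top of a script keeps the change easy to find.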
For a full discussion on admiral Inputs, Outputs and Options, see this section on our developer-facing Programming Strategy.
When using the {haven} package to read SAS datasets into R, SAS-style character missing values, i.e. `""`, are not converted into proper R `NA` values. Rather, they are kept as is. This is problematic for any downstream data processing because R handles `""` just like any other string. Thus, before any data manipulation is performed, SAS blanks should be converted to R `NA`s using {admiral}'s `convert_blanks_to_na()` function, e.g.
dm <- haven::read_sas("dm.sas7bdat") %>% convert_blanks_to_na()
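To illustrate what this conversion achieves, here is a rough base R stand-in applied to a small hypothetical data frame. It is a sketch of the idea only, not the implementation of `convert_blanks_to_na()`.

```r
# Hypothetical demographics data as it might look after reading from SAS
dm_raw <- data.frame(
  USUBJID = c("01-1301", "01-1302"),
  RACE = c("ASIAN", ""),
  stringsAsFactors = FALSE
)

# Replace blank strings with NA in every character column
dm_clean <- dm_raw
is_chr <- vapply(dm_clean, is.character, logical(1))
dm_clean[is_chr] <- lapply(
  dm_clean[is_chr],
  function(x) replace(x, x %in% "", NA)
)

# The blank is treated as an ordinary string before the conversion
is.na(dm_raw$RACE)
# ... and as a missing value afterwards
is.na(dm_clean$RACE)
```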
Note that a comparison involving an `NA` value, for example with `==` or `!=`, returns `NA` rather than `TRUE` or `FALSE`.
visits <- c("Baseline", NA, "Screening", "Week 1 Day 7")
visits != "Baseline"
In contrast, `is.na()` returns `TRUE` if the input is `NA` and `FALSE` otherwise.
is.na(visits)
Thus, to select all visits which are not `"Baseline"`, including the missing ones, the following condition would need to be used.
visits != "Baseline" | is.na(visits)
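The effect becomes visible when such a condition is used to subset a dataset. The sketch below uses a small hypothetical tibble and `dplyr::filter()`, which drops rows where the condition evaluates to `NA`:

```r
library(dplyr)

# Hypothetical visit data, used only for illustration
visit_data <- tibble(
  VISIT = c("Baseline", NA, "Screening", "Week 1 Day 7")
)

# The row with the missing visit is silently dropped
without_na_check <- filter(visit_data, VISIT != "Baseline")
nrow(without_na_check)

# The row with the missing visit is kept
with_na_check <- filter(visit_data, VISIT != "Baseline" | is.na(VISIT))
nrow(with_na_check)
```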
Also note that most aggregation functions, like `mean()` or `max()`, return `NA` if any element of the input vector is missing.
mean(c(1, NA, 2))
To avoid this behavior one has to explicitly set `na.rm = TRUE`.
mean(c(1, NA, 2), na.rm = TRUE)
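One related edge case is worth a short sketch: when every element is missing, `na.rm = TRUE` leaves nothing to aggregate, and base R returns `NaN` or an infinite value instead of `NA`. The vector below is hypothetical.

```r
# A hypothetical vector in which every value is missing
all_missing <- c(NA_real_, NA_real_)

# With na.rm = TRUE the mean of an empty set is NaN
mean_result <- mean(all_missing, na.rm = TRUE)
mean_result

# max() of an empty set is -Inf and issues a warning, suppressed here
max_result <- suppressWarnings(max(all_missing, na.rm = TRUE))
max_result
```

Checking for such groups before summarizing avoids carrying `NaN` or `-Inf` into derived variables.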
This is very important to keep in mind when using {admiral}'s aggregation functions such as `derive_summary_records()`.

For handling of `NA`s in sorting variables see Sort Order.
## `expr()`, `exprs()`, `!!` and `!!!`

### `expr()` and `exprs()`
`expr()` is a function from the {rlang} package, which is used to create an expression. The expression is not evaluated; rather, it is passed on to the derivation function, which evaluates it in its own environment. `exprs()` is the plural version of `expr()`, so it accepts multiple comma-separated items and returns a list of expressions.
library(rlang)

adae <- data.frame(USUBJID = "XXX-1", AEDECOD = "HEADACHE")

# Return the adae object
adae

# Return an expression
expr(adae)
When used within the context of an {admiral} derivation function, `expr()` and `exprs()` allow the function to evaluate the expressions in the context of the input dataset. As an example, `expr()` and `exprs()` allow users to pass variable names of datasets to the function without wrapping them in quotation marks.
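A small sketch may make "evaluating an expression in the context of a dataset" more concrete. It uses `rlang::eval_tidy()` on a hypothetical data frame; this illustrates the general mechanism and is not a claim about how any particular {admiral} function is implemented.

```r
library(rlang)

# Hypothetical dataset, used only for illustration
advs <- data.frame(
  USUBJID = c("01", "02", "03"),
  AVAL = c(1, 5, 10)
)

# The condition is captured, not evaluated:
# no object called AVAL needs to exist at this point
cond <- expr(AVAL > 2)

# Evaluate the captured condition using the columns of advs
keep <- eval_tidy(cond, data = advs)
advs[keep, ]
```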
The expressions framework is powerful because users are able to intuitively "inject code" into {admiral} functions (through the function parameters) using syntax very similar to what they would write in open code, the only exception possibly being an outer `exprs()` wrapper. For instance, in the `derive_vars_merged()` call below, the user is merging `adsl` with `ex` and is able to filter `ex` prior to the merge using an expression passed to the `filter_add` parameter. Because `filter_add` accepts expressions, the user has full power to filter their dataset as they please. In the same vein, the user is able to create any new variables they wish after the merge using the `new_vars` argument, to which they pass a list of expressions containing "standard" R code.
```{r, eval = FALSE}
derive_vars_merged(
  adsl,
  dataset_add = ex,
  filter_add = !is.na(EXENDTM),
  by_vars = exprs(STUDYID, USUBJID),
  new_vars = exprs(
    TRTEDTM = EXENDTM,
    TRTETMF = EXENTMF,
    COMPTRT = if_else(!is.na(EXENDTM), "Y", "N")
  ),
  order = exprs(EXENDTM),
  mode = "last"
)
```

### Bang-Bang (`!!`) and Bang-Bang-Bang (`!!!`) {#unquoting}

Sometimes you may want to construct an expression using other, pre-existing expressions. However, it's not immediately clear how to achieve this because expressions inherently pause evaluation of your code before it's executed:

```r
a <- expr(2)
b <- expr(3)

expr(a + b)
# NOT 2 + 3
```
This is where `!!` (bang-bang) comes in: provided again by the {rlang} package, it allows you to inject the contents of an expression into another expression, meaning that by using `!!` you can modify the code inside an expression before R evaluates it. By using `!!` you are unquoting an expression, i.e. evaluating it before you pass it onwards.
expr(!!a + !!b)
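The difference between the quoted and the unquoted version can be checked directly. The sketch below compares the code each expression holds, using only {rlang} and base R:

```r
library(rlang)

a <- expr(2)
b <- expr(3)

# Without unquoting, the names a and b are captured as-is
quoted <- expr(a + b)

# With !!, the contents of a and b are injected first
injected <- expr(!!a + !!b)

# Both evaluate to 5 here, but only because a and b exist in this environment
eval(quoted)
eval(injected)

# The difference lies in the code that each expression holds
identical(quoted, quote(a + b))
identical(injected, quote(2 + 3))
```

The injected version no longer depends on `a` and `b` being available wherever the expression is eventually evaluated.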
You can see an example of where `!!` comes in handy within {admiral} code in Common Pitfall 1, where the contents of an expression are unquoted so that they can be passed to `derive_vars_merged()`.
`!!!` (bang-bang-bang) is the plural version of `!!` and can be used to unquote a list of expressions:
exprs(!!!list(a, b))
Within {admiral}, this operator can be useful if we need to unquote a list of variables (stored as expressions) to use them inside of an {admiral} or even {dplyr} call. One example is the {admiral} subject keys:
get_admiral_option("subject_keys")
If we want to use the subject keys stored within this {admiral} option to subset a dataset, we need to use `!!!` to unquote this list. Let's construct a dummy example to illustrate the point:
adcm <- data.frame(STUDYID = "XXX", USUBJID = "XXX-1", CMTRT = "ASPIRIN")
adcm

# This doesn't work as we are not unquoting the subject keys
adcm %>% select(get_admiral_option("subject_keys"))

# This works because we are unquoting the subject keys
adcm %>% select(!!!get_admiral_option("subject_keys"))
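The same unquoting pattern carries over to other {dplyr} verbs. As a sketch, the example below defines a list of keys directly with `exprs()` (standing in for the stored subject keys) and splices it into `group_by()`; the data are hypothetical.

```r
library(dplyr)
library(rlang)

# A list of key variables stored as expressions
keys <- exprs(STUDYID, USUBJID)

# Hypothetical concomitant medication data
adcm_demo <- data.frame(
  STUDYID = "XXX",
  USUBJID = c("XXX-1", "XXX-1", "XXX-2"),
  CMTRT = c("ASPIRIN", "IBUPROFEN", "ASPIRIN")
)

# Splice the keys into group_by() to count records per subject
cm_counts <- adcm_demo %>%
  group_by(!!!keys) %>%
  summarise(n_cm = n(), .groups = "drop")

cm_counts
```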
You can see another example of `!!!` in action in this line of the {admiral} ADEX template script, where it is used to dynamically control the by variables passed to an {admiral} function.
In summary, although the expressions framework may seem slightly clunky and mysterious to begin with, it allows for such power and flexibility that it forms a key part of the {admiral} package. For a comprehensive treatment of expressions, see Chapter 18 and Chapter 19 of the Advanced R textbook. Chapter 19 specifically covers the concept of unquoting in much more detail.
Expressions are very powerful, but this can also lead to misunderstandings about their functionality. Let's set up some dummy data to explore common issues that new (or experienced!) programmers may encounter when dealing with expressions.
library(dplyr, warn.conflicts = FALSE)
library(admiral)

vs <- tribble(
  ~USUBJID,  ~VSTESTCD, ~VISIT,      ~VSSTRESN, ~VSSTRESU, ~VSDTC,
  "01-1301", "WEIGHT",  "SCREENING", 82.1,      "kg",      "2013-08-29",
  "01-1301", "WEIGHT",  "WEEK 2",    81.19,     "kg",      "2013-09-15",
  "01-1301", "WEIGHT",  "WEEK 4",    82.56,     "kg",      "2013-09-24",
  "01-1302", "BMI",     "SCREENING", 20.1,      "kg/m2",   "2013-08-29",
  "01-1302", "BMI",     "WEEK 2",    20.2,      "kg/m2",   "2013-09-15",
  "01-1302", "BMI",     "WEEK 4",    19.9,      "kg/m2",   "2013-09-24"
)

dm <- tribble(
  ~USUBJID,  ~AGE,
  "01-1301", 18
)
When writing more complex {admiral} code it can be easy to mistakenly pass the wrong input to an argument that expects an expression. For example, the code below fails because `my_expression` is not an expression; it is the name of an object in the global environment containing an expression.
my_expression <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")

derive_vars_merged(
  dm,
  dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
  by_vars = exprs(USUBJID),
  filter_add = my_expression
)
To fix this code, we need to unquote `my_expression` so that the expression that it is holding is passed correctly to `derive_vars_merged()`:
derive_vars_merged(
  dm,
  dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
  by_vars = exprs(USUBJID),
  filter_add = !!my_expression
)
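Why the unquoting matters can be seen with a toy function that captures its argument. `capture_filter()` below is a hypothetical helper written for this sketch, and it assumes, as an illustration only, that an argument is captured with `rlang::enexpr()`:

```r
library(rlang)

# Hypothetical helper: returns whatever code was supplied to `filter`
capture_filter <- function(filter) {
  enexpr(filter)
}

my_expression <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING")

# Without !!, the function only sees the name of the object
seen_without <- capture_filter(my_expression)
is_symbol(seen_without)

# With !!, the function sees the condition stored in the object
seen_with <- capture_filter(!!my_expression)
identical(seen_with, my_expression)
```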
In a similar vein to above, even if an actual expression is passed as an argument, you must make sure that it can be evaluated within the dataset of interest. This may seem trivial, but it is a common pitfall because expressions delay evaluation of code and so can delay the identification of issues. For instance, consider this example:
filter_vs_and_merge <- function(my_expression) {
  derive_vars_merged(
    dm,
    dataset_add = select(vs, USUBJID, VSTESTCD, VISIT),
    by_vars = exprs(USUBJID),
    filter_add = !!my_expression
  )
}

# This works
filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING"))

# This fails
filter_vs_and_merge(expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT == "PREDOSE"))
The second call fails because hidden within the expression is a mention of `VSTPT`, which was dropped from `vs` in `filter_vs_and_merge()`.
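One way to catch this kind of problem early is to compare the variables mentioned in an expression with the columns that will actually be available. The sketch below uses base R's `all.vars()`; the `available` vector is written out by hand here and simply mirrors the columns kept by the `select()` call above.

```r
library(rlang)

cond <- expr(VSTESTCD == "WEIGHT" & VISIT == "SCREENING" & VSTPT == "PREDOSE")

# Columns that remain after select(vs, USUBJID, VSTESTCD, VISIT)
available <- c("USUBJID", "VSTESTCD", "VISIT")

# Variables referenced in the expression but absent from the dataset
missing_vars <- setdiff(all.vars(cond), available)
missing_vars
```

An empty result would indicate that every variable in the condition can be found in the dataset.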