knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
As {admiral}
is intended to be contributed by the user community, this
article is meant for developers that want to either expand {admiral}
functionalities or build on top of {admiral}
.
In order to keep the framework robust across the whole community,
we have defined a programming strategy that should be followed in such cases.
These contributions could include, for example, company specific derivations of ADaM datasets.
{admiral}
function or a company specific function.The functions are categorized by keywords. The keywords can be specified in the
function header via the @keywords
field. Each function must use at least one
of the following keywords. Please note that the keywords are handled
case-sensitive. Thus they must be in lower case.
| Keyword | Description |
|-----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| general_utility
| A general function which performs a general functionality, like copy variables, check missing, create parameters. |
| metadata
| A function that provides Metadata functionality. |
| derivation
| A function that derives one or more ADaM variable(s) |
| assertion
| Throws an error if a certain condition is not met and NULL
otherwise |
| warning
| Returns a warning message based on invalid data input or if existing variables are overwritten. |
| predicate
| Returns TRUE
if the argument is a valid input, FALSE
otherwise. |
| Across ADaM dataset structures keywords: adam
| A function that is applicable across ADaM datasets. |
| ADaM dataset structure specific keywords: adls
, occds
, bds
, other
| A function specific to one of the ADaM dataset structures according to CDISC. Use the dataset structure as the keyword. |
| ADaM dataset specific keywords: adex
, adlb
, ... | A function specific to an ADaM dataset according to CDISC. Use the name of the corresponding ADaM dataset name as keyword |
| timing
| Function is related to timing, e.g., imputing dates, converting dates, deriving duration,, ... |
| computation
| Function which performs a computation that is used by other functions to derive an ADaM variable |
| Therapeutic Area specific functions: oncology
, infectious_diseases
, neuroscience
| Functions that provide a particular algorithm for a specific therapeutic area. |
| Company specific keywords | |
Functions in {admiral}
must follow the modular approach, i.e. functions which add
variables should add only a single variable except if other variables are
closely related to the derived variable and can not be derived without it. For
example AAGE
(analysis age) and AAGEU
(analysis age units) should be derived
by the same function because it does not make sense to derive AAGEU
without
AAGE
.
Variables like BASE
(baseline value) and BNRIND
(baseline reference range
indicator) must be derived by two separate function calls as it is reasonable to
derive only one of them.
temp_
and must be removed from the output dataset.temp_
, an error must be issued.derive_baseline()
. | Function name prefix | Description |
|----------------------------------------------|-----------------------------------------------------------------------------------------------------|
| assert_
/ warn_
/ is_
| Functions that check other functions’ inputs |
| derive_
| Functions that take a dataset as input and return a new dataset with additional rows and/or columns |
| derive_var_
(e.g. derive_var_trtdurd
) | Functions which add a single variable |
| derive_vars_
(e.g. derive_vars_dt
) | Functions which add multiple variables |
| derive_param_
(e.g. derive_param_os
) | Functions which add a single parameter |
| compute_
/ calculate_
/ ... | Functions that take vectors as input and return a vector |
Please note that the appropriate var/vars prefix should be used for all cases in which the function creates any variable(s), regardless of the presence of a new_var
parameter in the function call.
The default value of optional parameters should be NULL
.
There is a recommended parameter order that all contributors are asked to adhere to (in order to keep consistency across functions):
dataset
(and any additional datasets denoted by dataset_*
)by_vars
order
new_var
(and any related new_var_*
parameters)filter
(and any additional filters denoted by filter_*
)start_date
before end_date
(or related).Names of variables inside a dataset should be passed as symbols rather than strings, i.e. AVAL
rather than "AVAL"
.
If a parameter accepts one or more variables as input, the variables should be wrapped inside vars()
.
For example:
new_var = TEMPBL
by_vars = vars(PARAMCD, AVISIT)
filter = PARAMCD == "TEMP"
order = vars(AVISIT, desc(AESEV))
Parameter must not accept expressions for assigning the value of the new variable. Instead separate parameters need to be provided for defining the value. For example, if a function derives a variable which may be imputed, the following is not acceptable.
... new_var = vars(mydtm = convert_dtc_to_dtm(impute_dtc(cmstdtc, date_imputation = "last", time_imputation = "last"))), ...
Separate parameters for the imputation must be provided, e.g.:
... new_var = mydtm, source_var = cmstdtc, date_imputation = "last", time_imputation = "last", ...
Each function parameter needs to be tested with assert_
type of function.
Each expression needs to be tested for the following
(there are many utility functions in {admiral}
available to the contributor):
The only exception to this is derive_var_basetype()
where we allowed the use of rlang::exprs()
.
The reason is that in this case the user needs to have the flexibility to provide not just symbols but
usually more complicated filtering conditions (that may be based on multiple input parameters).
The first parameter of derive_
functions should be the input dataset and it should be named dataset
.
If more than one input dataset is required, the other input dataset should start with dataset_
, e.g., dataset_ex.
Parameters for specifying items to add should start with new_
.
If a variable is added, the second part of the parameter name should be var, if a parameter is added, it should be param.
For example: new_var
, new_var_unit
, new_param
.
Parameters which expect a boolean or boolean vector must start with a verb, e.g., is_imputed
or impute_date
.
| Parameter Name | Description |
|------------------|--------------------------------------------------------------------------------------------------------------------|
| dataset
| The input dataset. Expects a data.frame or a tibble. |
| by_vars
| Variables to group by. |
| order
| List of expressions for sorting a dataset, e.g., vars(PARAMCD, AVISITN, desc(AVAL))
. |
| new_var
| Name of a single variable to be added to the dataset. |
| new_vars
| List of variables to be added to the dataset. |
| new_var_unit
| Name of the unit variable to be added. It should be the unit of the variable specified for the new_var parameter. |
| filter
| Expression to filter a dataset, e.g., PARAMCD == "TEMP"
. |
| start_date
| The start date of an event/interval. Expects a date object. |
| end_date
| The end date of an event/interval. Expects a date object. |
| start_dtc
| (Partial) start date/datetime in ISO 8601 format. |
| dtc
| (Partial) date/datetime in ISO 8601 format. |
| date
| Date of an event / interval. Expects a date object. |
| set_values_to
| List of variable name-value pairs. |
All source code should be formatted according to the tidyverse style guide. The lintr package will be used to check and enforce this.
In line with the fail-fast design principle, function inputs should be checked for validity and, if there’s an invalid input, the function should stop immediately with an error. An exception is the case where a variable to be added by a function already exists in the input dataset: here only a warning should be displayed and the function should continue executing.
Inputs should be checked either using asserthat::assert_that()
or custom assertion functions defined in R/assertions.R
.
These custom assertion functions should either return an error in case of an invalid input or return nothing.
For the most common types of input parameters like a single variable, a list of variables, a dataset, ... functions for checking are available (see assertions).
Parameters which expect keywords should handle them in a case-insensitive manner, e.g., both
date_imputation = "FIRST"
and date_imputation = "first"
should be accepted.
The assert_character_scalar()
function helps with handling parameters in a
case-insensitive manner.
A parameter should not be checked in an outer function if the parameter name is the same as in the inner function.
This rule is applicable only if both functions are part of {admiral}
.
Every function that is exported from the package must have an accompanying header that should be formatted according to the roxygen2 convention.
In addition to the roxygen2 parameters, @author
and @keywords
are also used.
Author is the owner of the function while the keywords are used to categorize the function. Please see section "Categorization of functions".
An example is given below:
#' Derive Analysis Study Day #' #' Adds the analysis study day (`ADY`) to the dataset, i.e. study day #' of analysis date. #' #' @param dataset Input dataset #' #' The columns specified by the `reference_date` and the `date` parameter are #' expected. #' #' @param reference_date The start date column, e.g., date of first #'treatment #' #' A date or date-time object column is expected. #' #' The default is `TRTSDT`. #' #' @param date The end date column for which the study day should be #'derived #' #' A date or date-time object column is expected. #' #' The default is `ADT` #' #' @details The study day is derived as number of days from the start date #' to the end date. If it is non-negative, one is added, i.e. the study day of the #' start date is 1. #' #' @author Stefan Bundfuss #' #' @return The input dataset with `ADY` column added #' #' @keywords bds timing #' #' @export #' #' @examples #' data <- tibble::tribble( #' ~TRTSDT, ~ADT, #' lubridate::ymd("2020-01-01"), lubridate::ymd("2020-02-24") #' ) #' #' derive_var_ady(data)
The following fields are mandatory:
@param
: One entry per function parameter.
The following attributes should be described: expected data type (e.g. data.frame
, logical
, numeric
etc.), default value (if any), permitted values (if applicable), optionality (i.e. is this a required parameter).
If the expected input is a dataset then the required variables should be clearly stated.@details
: A natural-language description of the derivation used inside the function.@author
: The person who wrote the function. In case a function is later on updated by another person the name should be appended to the list of authors.@keyword
: One or more keywords applicable to the function.@return
: A description of the return value of the function.
Any newly added variable(-s) should be mentioned here.@examples
: A fully self-contained example of how to use the function.
Self-contained means that, if this code is executed in a new R session, it will run without errors.
That means any packages need to be loaded with library()
and any datasets needed either to be created directly inside the example code or loaded using data()
.
If a dataset is created in the example, it should be done so using the function tibble::tribble()
.
Make sure to align columns as this ensures quick code readability. Copying descriptions should be avoided as it makes the documentation hard to
maintain. For example if the same parameter with the same description is used by
more than one function, the parameter should be described for one function and
the other functions should use @inheritParams <function name where the
parameter is described>
.
Please note that if @inheritParams func_first
is used in the header of the
func_second()
function, those parameter descriptions of func_first()
are
included in the documentation of func_second()
for which
func_second()
and@param
tag for the parameter is included in the header of
func_second()
.The order of the @param
tags should be the same as in the function definition.
The @inheritParams
tags should be after the @param
. This does not affect the
order of the parameter description in the rendered documentation but makes it
easier to maintain the headers.
Variable names, expressions, functions, and any other code must be enclosed which backticks. This will render it as code.
For functions which derive a specific CDISC variable, the title must state the label of the variable without the variable name. The variable should be stated in the description.
Missing values (NA
s) need to be explicitly shown.
Regarding character vectors converted from SAS files: SAS treats missing character values as blank.
Those are imported into R as empty strings (""
) although in nature they are missing values (NA
).
All empty strings that originate like this need to be converted to proper R missing values NA
.
Organizing functions into files is more of an art than a science. Thus, there are no hard rules but just recommendations. First and foremost, there are two extremes that should be avoided: putting each function into its own file and putting all functions into a single file. Apart from that the following recommendations should be taken into consideration when deciding upon file structuring:
dplyr::bind_rows()
and dplyr::bind_cols()
), put them into one fileIt is the responsibility of both the author of a new function and reviewer to ensure that these recommendations are put into practice.
Package dependencies have to be documented in the DESCRIPTION
file.
If a package is used only in examples and/or unit tests then it should be listed in Suggests
, otherwise in Imports
.
Functions from other packages have to be explicitly imported by using the @importFrom
tag in the R/admiral-package.R
file.
To import the if_else()
and mutate()
function from dplyr
the following line would have to be included in that file:
#' @importFrom dplyr if_else mutate
.
Functions should only perform the derivation logic and not add any kind of metadata, e.g. labels.
The only exception is derive_vars_suppqual()
which takes it from its argument dataset_suppqual
,
where the labels are stored.
A function requires a set of unit tests to verify it produces the expected result. See Writing Unit Tests in {admiral} for details.
As {admiral}
is still evolving, functions or parameters may need to be removed or replaced with more
efficient options from one release to another. In such cases, the relevant function or parameter
must be marked as deprecated. A warning will be issued until the next release and an error will be
generated thereafter. Information about deprecation timelines must be added to the warning/error message.
If a function or parameter is removed, the documentation must be updated to indicate the function or the parameter is now deprecated and which new function/parameter should be used instead.
The documentation will be updated at:
#' Title of the function #' #' @description #'*Deprecated*, please use `new_fun()` instead. #' description of the function in plain English
@param
level for a parameter.@param old_param *Deprecated*, please use `new_param` instead.
When a function or parameter is deprecated, the function must be updated to issue an error or a warning to inform the user.
There should be a test case added in tests/testthat/test-deprecation.R
that checks whether this warning/error is issued as appropriate when using the deprecated function or parameter.
If the deprecated function still exists in the package besides the new function, a warning must be issued. If it has been removed, an error must be generated.
### BEGIN DEPRECATION # Warning if the deprecated function still exists deprecate_warn("x.y.z", "fun_xxx()", "new_fun_xxx()") # Error if the deprecated function does not exist anymore deprecate_stop("x.y.z", "fun_xxx()", "new_fun_xxx()") ### END DEPRECATION
For the former case above, please pass any input arguments into a call to the new function so that it still runs and pushes users towards adopting the new function.
fun_xxx <- function(dataset, new_var) { deprecate_warn("x.y.z", "fun_xxx()", "new_fun_xxx()") new_fun_xxx(dataset, new_var = new_var) }
If a parameter is removed and is not replaced, an error must be generated:
### BEGIN DEPRECATION if (!missing(old_param)) { deprecate_stop("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )") } ### END DEPRECATION
If the parameter is renamed or replaced, a warning must be issued and the new parameter takes
the value of the old parameter until the next release.
Note: parameters which are not passed as vars()
argument (e.g. new_var = VAR1
or filter = AVAL >10
)
will need to be quoted.
### BEGIN DEPRECATION if (!missing(old_param)) { deprecate_warn("x.y.z", "fun_xxx(old_param = )", "fun_xxx(new_param = )") # old_param is given using vars() new_param <- old_param # old_param is NOT given using vars() new_param <- enquo(old_param) } ### END DEPRECATION
Please take the following list as recommendation and try to adhere to its rules if possible.
assert_data_frame(dataset, required_vars = vars(var1, var2), optional = TRUE)
).dplyr::if_else()
should be used when there are only two conditions.
Try to always set the missing
parameter whenever appropriate.Each function should be considered as readable code by default.
All R code that produces ADaM datasets should be based on readable code for their 1st line code. Producing Readable Code should not be part or the responsibility of any QC activities or 2nd line programming.
ADaMs in R will be highly modularized. This means code needs to be commented across the set of functions that produces the final ADaM dataset.
This guidance is built on the assumption that each ADaM dataset will have one R script that will call a set of functions needed to produce the corresponding ADaM dataset.
The main R-script would contain all function calls to create the ADaM dataset. In the header, describe the ADaM dataset that will be produced:
The code itself should be described in meaningful, plain English, so that a reviewer can understand how the piece of code works, e.g.
Code within a function that creates a specific variable should have a starting comment and an ending comment, e.g.
# calculate X # describe how the code works in meaningful plain english "<code>" # end of X
If meaningful, comments can cover multiple variables within a piece of code
# creates X, Y, Z # describe how the code works in meaningful plain english "<code>" # end of X, Y, Z
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.