View source: R/derive_summary_records.R
derive_summary_records | R Documentation |
It is not uncommon to have an analysis need whereby one needs to derive an
analysis value (AVAL
) from multiple records. The ADaM basic dataset
structure variable DTYPE
is available to indicate when a new derived
records has been added to a dataset, if the derivation deviates from the
standard derivation of the parameter.
derive_summary_records(
dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter_add = NULL,
set_values_to,
missing_values = NULL
)
dataset |
Input dataset If the argument is not specified (or set to
|
dataset_add |
Additional dataset The variables specified for
|
dataset_ref |
Reference dataset The variables specified for
|
by_vars |
Grouping variables Variables to consider for generation of groupwise summary
records. Providing the names of variables in
|
filter_add |
Filter condition as logical expression to apply during
summary calculation. By default, filtering expressions are computed within
For example,
|
set_values_to |
Variables to be set The specified variables are set to the specified values for the new observations. Set a list of variables to some specified value for the new records
For example: set_values_to = exprs( AVAL = sum(AVAL), DTYPE = "AVERAGE", )
|
missing_values |
Values for missing summary values For observations of the reference dataset (
|
For the newly derived records, only variables specified within by_vars
or
set_values_to
will be populated. All other variables will be set to NA
.
A data frame with derived records appended to original dataset.
The following examples use the ECG dataset below as a basis.
library(tibble, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) adeg <- tribble( ~USUBJID, ~PARAM, ~AVISIT, ~EGDTC, ~AVAL, "XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:50", 385, "XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:52", 399, "XYZ-1001", "QTcF Int. (msec)", "Baseline", "2016-02-24T07:56", 396, "XYZ-1001", "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:48", 393, "XYZ-1001", "QTcF Int. (msec)", "Visit 2", "2016-03-08T09:51", 388, "XYZ-1001", "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:48", 394, "XYZ-1001", "QTcF Int. (msec)", "Visit 3", "2016-03-22T10:51", 402, "XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 399, "XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T07:58", 200, "XYZ-1002", "QTcF Int. (msec)", "Baseline", "2016-02-22T08:01", 392, "XYZ-1002", "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:53", 414, "XYZ-1002", "QTcF Int. (msec)", "Visit 3", "2016-03-24T10:56", 402 ) %>% mutate(ADTM = convert_dtc_to_dtm(EGDTC))
A derived record is generated for each subject, containing the mean of the triplicate ECG
interval values (AVAL
) and the latest measurement's time (ADTM
) by using summary functions
within the set_values_to
argument.
derive_summary_records( adeg, dataset_add = adeg, by_vars = exprs(USUBJID, PARAM, AVISIT), set_values_to = exprs( AVAL = mean(AVAL, na.rm = TRUE), ADTM = max(ADTM), DTYPE = "AVERAGE" ) ) %>% arrange(USUBJID, AVISIT) #> # A tibble: 17 × 7 #> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE #> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> #> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA> #> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA> #> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA> #> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. 2016-02-24 07:56:00 AVER… #> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA> #> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA> #> 7 XYZ-1001 QTcF Int. (msec) Visit 2 <NA> 390. 2016-03-08 09:51:00 AVER… #> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA> #> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA> #> 10 XYZ-1001 QTcF Int. (msec) Visit 3 <NA> 398 2016-03-22 10:51:00 AVER… #> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA> #> 12 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA> #> 13 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA> #> 14 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. 2016-02-22 08:01:00 AVER… #> 15 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA> #> 16 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA> #> 17 XYZ-1002 QTcF Int. (msec) Visit 3 <NA> 408 2016-03-24 10:56:00 AVER…
Functions such as all()
and any()
are also often useful when creating
summary records. For instance, the above example can be extended to flag which derived
records were affected by outliers. Note that the outlier flag is created before AVAL
is set for the summary record. Otherwise, referencing AVAL
later on would pick up the
AVAL
from the summary record rather than the source records.
derive_summary_records( adeg, dataset_add = adeg, by_vars = exprs(USUBJID, PARAM, AVISIT), set_values_to = exprs( OUTLIEFL = if_else(any(AVAL >= 500 | AVAL <= 300), "Y", "N"), AVAL = mean(AVAL, na.rm = TRUE), ADTM = max(ADTM), DTYPE = "AVERAGE" ) ) %>% arrange(USUBJID, AVISIT) #> # A tibble: 17 × 8 #> USUBJID PARAM AVISIT EGDTC AVAL ADTM OUTLIEFL DTYPE #> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> <chr> #> 1 XYZ-1001 QTcF Int. (ms… Basel… 2016… 385 2016-02-24 07:50:00 <NA> <NA> #> 2 XYZ-1001 QTcF Int. (ms… Basel… 2016… 399 2016-02-24 07:52:00 <NA> <NA> #> 3 XYZ-1001 QTcF Int. (ms… Basel… 2016… 396 2016-02-24 07:56:00 <NA> <NA> #> 4 XYZ-1001 QTcF Int. (ms… Basel… <NA> 393. 2016-02-24 07:56:00 N AVER… #> 5 XYZ-1001 QTcF Int. (ms… Visit… 2016… 393 2016-03-08 09:48:00 <NA> <NA> #> 6 XYZ-1001 QTcF Int. (ms… Visit… 2016… 388 2016-03-08 09:51:00 <NA> <NA> #> 7 XYZ-1001 QTcF Int. (ms… Visit… <NA> 390. 2016-03-08 09:51:00 N AVER… #> 8 XYZ-1001 QTcF Int. (ms… Visit… 2016… 394 2016-03-22 10:48:00 <NA> <NA> #> 9 XYZ-1001 QTcF Int. (ms… Visit… 2016… 402 2016-03-22 10:51:00 <NA> <NA> #> 10 XYZ-1001 QTcF Int. (ms… Visit… <NA> 398 2016-03-22 10:51:00 N AVER… #> 11 XYZ-1002 QTcF Int. (ms… Basel… 2016… 399 2016-02-22 07:58:00 <NA> <NA> #> 12 XYZ-1002 QTcF Int. (ms… Basel… 2016… 200 2016-02-22 07:58:00 <NA> <NA> #> 13 XYZ-1002 QTcF Int. (ms… Basel… 2016… 392 2016-02-22 08:01:00 <NA> <NA> #> 14 XYZ-1002 QTcF Int. (ms… Basel… <NA> 330. 2016-02-22 08:01:00 Y AVER… #> 15 XYZ-1002 QTcF Int. (ms… Visit… 2016… 414 2016-03-24 10:53:00 <NA> <NA> #> 16 XYZ-1002 QTcF Int. (ms… Visit… 2016… 402 2016-03-24 10:56:00 <NA> <NA> #> 17 XYZ-1002 QTcF Int. (ms… Visit… <NA> 408 2016-03-24 10:56:00 N AVER…
filter_add
)The filter_add
argument can be used to restrict the records that are being summarized.
For instance, the mean of the triplicates above can be computed only for the baseline records
by passing filter_add = AVISIT == "Baseline"
.
derive_summary_records( adeg, dataset_add = adeg, by_vars = exprs(USUBJID, PARAM, AVISIT), filter_add = AVISIT == "Baseline", set_values_to = exprs( AVAL = mean(AVAL, na.rm = TRUE), DTYPE = "AVERAGE" ) ) %>% arrange(USUBJID, AVISIT) #> # A tibble: 14 × 7 #> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE #> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> #> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA> #> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA> #> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA> #> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. NA AVER… #> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA> #> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA> #> 7 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA> #> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA> #> 9 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA> #> 10 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA> #> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA> #> 12 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. NA AVER… #> 13 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA> #> 14 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
Summary functions can also be used within filter_add
to filter based on conditions
applied to the whole of the by group specified in by_vars
. For instance, the mean of
the triplicates can be computed only for by groups which do indeed contain three records
by passing filter_add = n() > 2
.
derive_summary_records( adeg, dataset_add = adeg, by_vars = exprs(USUBJID, PARAM, AVISIT), filter_add = n() > 2, set_values_to = exprs( AVAL = mean(AVAL, na.rm = TRUE), DTYPE = "AVERAGE" ) ) %>% arrange(USUBJID, AVISIT) #> # A tibble: 14 × 7 #> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE #> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> #> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA> #> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA> #> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA> #> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. NA AVER… #> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA> #> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA> #> 7 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA> #> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA> #> 9 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA> #> 10 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA> #> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA> #> 12 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. NA AVER… #> 13 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA> #> 14 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA>
dataset_ref
and missing_values
)Adding records for groups which are not in the source data can be achieved by
specifying a reference dataset in the dataset_ref
argument. For example, specifying
the input dataset adeg_allparamvis
(containing an extra "Visit 2"
for patient
1002
) ensures a summary record is derived for that visit as well. For these records,
the values of the analysis variables to be populated should be specified within the
missing_values
argument. Here, DTYPE = "PHANTOM"
was chosen as AVAL
is set to
missing.
adeg_allparamvis <- tribble( ~USUBJID, ~PARAM, ~AVISIT, "XYZ-1001", "QTcF Int. (msec)", "Baseline", "XYZ-1001", "QTcF Int. (msec)", "Visit 2", "XYZ-1001", "QTcF Int. (msec)", "Visit 3", "XYZ-1002", "QTcF Int. (msec)", "Baseline", "XYZ-1002", "QTcF Int. (msec)", "Visit 2", "XYZ-1002", "QTcF Int. (msec)", "Visit 3" ) derive_summary_records( adeg, dataset_add = adeg, dataset_ref = adeg_allparamvis, by_vars = exprs(USUBJID, PARAM, AVISIT), set_values_to = exprs( AVAL = mean(AVAL, na.rm = TRUE), ADTM = max(ADTM), DTYPE = "AVERAGE" ), missing_values = exprs( AVAL = NA, ADTM = NA, DTYPE = "PHANTOM" ) ) %>% arrange(USUBJID, AVISIT) #> # A tibble: 18 × 7 #> USUBJID PARAM AVISIT EGDTC AVAL ADTM DTYPE #> <chr> <chr> <chr> <chr> <dbl> <dttm> <chr> #> 1 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 385 2016-02-24 07:50:00 <NA> #> 2 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-24 07:52:00 <NA> #> 3 XYZ-1001 QTcF Int. (msec) Baseline 2016-02-2… 396 2016-02-24 07:56:00 <NA> #> 4 XYZ-1001 QTcF Int. (msec) Baseline <NA> 393. 2016-02-24 07:56:00 AVER… #> 5 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 393 2016-03-08 09:48:00 <NA> #> 6 XYZ-1001 QTcF Int. (msec) Visit 2 2016-03-0… 388 2016-03-08 09:51:00 <NA> #> 7 XYZ-1001 QTcF Int. (msec) Visit 2 <NA> 390. 2016-03-08 09:51:00 AVER… #> 8 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 394 2016-03-22 10:48:00 <NA> #> 9 XYZ-1001 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-22 10:51:00 <NA> #> 10 XYZ-1001 QTcF Int. (msec) Visit 3 <NA> 398 2016-03-22 10:51:00 AVER… #> 11 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 399 2016-02-22 07:58:00 <NA> #> 12 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 200 2016-02-22 07:58:00 <NA> #> 13 XYZ-1002 QTcF Int. (msec) Baseline 2016-02-2… 392 2016-02-22 08:01:00 <NA> #> 14 XYZ-1002 QTcF Int. (msec) Baseline <NA> 330. 2016-02-22 08:01:00 AVER… #> 15 XYZ-1002 QTcF Int. (msec) Visit 2 <NA> NA NA PHAN… #> 16 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 414 2016-03-24 10:53:00 <NA> #> 17 XYZ-1002 QTcF Int. (msec) Visit 3 2016-03-2… 402 2016-03-24 10:56:00 <NA> #> 18 XYZ-1002 QTcF Int. (msec) Visit 3 <NA> 408 2016-03-24 10:56:00 AVER…
derive_var_merged_summary()
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd()
,
derive_expected_records()
,
derive_extreme_event()
,
derive_extreme_records()
,
derive_locf_records()
,
derive_param_bmi()
,
derive_param_bsa()
,
derive_param_computed()
,
derive_param_doseint()
,
derive_param_exist_flag()
,
derive_param_exposure()
,
derive_param_framingham()
,
derive_param_map()
,
derive_param_qtc()
,
derive_param_rr()
,
derive_param_wbc_abs()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.