View source: R/derive_extreme_event.R
derive_extreme_event | R Documentation |
Add the first available record from events for each by group as new records. All variables of the selected observation are kept. It can be used for selecting the extreme observation from a series of user-defined events. This distinguishes derive_extreme_event() from derive_extreme_records(), where extreme records are derived based on a certain order of existing variables.
derive_extreme_event(
dataset = NULL,
by_vars,
events,
tmp_event_nr_var = NULL,
order,
mode,
source_datasets = NULL,
check_type = "warning",
set_values_to = NULL,
keep_source_vars = exprs(everything())
)
dataset |
Input dataset The variables specified by the
|
by_vars |
Grouping variables
|
events |
Conditions and new values defining events A list of For
|
tmp_event_nr_var |
Temporary event number variable The specified variable is added to all source datasets and is set to the number of the event before selecting the records of the event. It can be used in The variable is not included in the output dataset.
|
order |
Sort order If a particular event from For handling of
|
mode |
Selection mode (first or last) If a particular event from
|
source_datasets |
Source datasets A named list of datasets is expected. The
|
check_type |
Check uniqueness? If
|
set_values_to |
Variables to be set The specified variables are set to the specified values for the new observations. Set a list of variables to some specified value for the new records
For example: set_values_to = exprs( PARAMCD = "WOBS", PARAM = "Worst Observations" )
|
keep_source_vars |
Variables to keep from the source dataset For each event the specified variables are kept from the selected
observations. The variables specified for
|
1. For each event select the observations to consider:
   a. If the event is of class event, the observations of the source dataset are restricted by condition and then the first or last (mode) observation per by group (by_vars) is selected. If the event is of class event_joined, filter_joined() is called to select the observations.
   b. The variables specified by the set_values_to field of the event are added to the selected observations.
   c. The variable specified for tmp_event_nr_var is added and set to the number of the event.
   d. Only the variables specified for the keep_source_vars field of the event, the by variables (by_vars), and the variables created by set_values_to are kept. If keep_source_vars = NULL is used for an event in derive_extreme_event(), the value of the keep_source_vars argument of derive_extreme_event() is used.
2. All selected observations are bound together.
3. For each group (with respect to the variables specified for the by_vars parameter) the first or last observation (with respect to the order specified for the order parameter and the mode specified for the mode parameter) is selected.
4. The variables specified by the set_values_to parameter are added to the selected observations.
5. The observations are added to the input dataset.
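The steps above can be sketched in base R. This is only an illustration of the select, bind, order, pick sequence, not the admiral implementation: the event list structure (a `condition` function plus an `avalc` value) is a hypothetical stand-in for event() objects, only two events are used, and the per-event first/last selection and keep_source_vars handling are left out.

```r
# Toy input (base R only; admiral is not required for this sketch)
adqs <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2", "3"),
  PARAMCD = c("NO SLEEP", "WAKE UP 3X", "NO SLEEP",
              "WAKE UP 3X", "WAKE UP 3X", "NO SLEEP"),
  AVALC = c("N", "N", "N", "Y", "Y", NA),
  ADY = c(1, 2, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for event() objects: a condition and a value to set
events <- list(
  list(
    condition = function(d) d$PARAMCD == "WAKE UP 3X" & d$AVALC == "Y",
    avalc = "Waking up three times"
  ),
  list(
    condition = function(d) rep(TRUE, nrow(d)), # catch-all event
    avalc = "Missing"
  )
)

# Step 1: per event, restrict by condition, set the event's values,
# and record the event number. Step 2: bind all candidates together.
candidates <- do.call(rbind, lapply(seq_along(events), function(nr) {
  ev <- events[[nr]]
  sel <- adqs[which(ev$condition(adqs)), , drop = FALSE]
  sel$AVALC <- rep(ev$avalc, nrow(sel))
  sel$event_nr <- rep(nr, nrow(sel))
  sel
}))

# Step 3: order by event number, then latest day, and keep the first
# candidate per subject (mode = "first")
candidates <- candidates[
  order(candidates$USUBJID, candidates$event_nr, -candidates$ADY),
]
new_records <- candidates[!duplicated(candidates$USUBJID), ]

# Step 4: global values; the temporary event number is dropped
new_records$PARAMCD <- "WSP"
new_records$event_nr <- NULL

# Step 5: append the new records to the input
result <- rbind(adqs, new_records)
```

Because the sketch has no "No sleeping problems" event, subject 1 falls through to the catch-all event here.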
Note: This function creates temporary datasets which may be much bigger than the input datasets. If this causes memory issues, please try setting the admiral option save_memory to TRUE (see set_admiral_options()). This reduces the memory consumption but increases the run-time.
The input dataset with the best or worst observation of each by group added as new observations.
event() objects
For each subject, the observation containing the worst sleeping problem (if any exists) should be identified and added as a new record, retaining all variables from the original observation. If the worst sleeping problem occurs more than once, or the subject has no sleeping problems, the observation from the latest day is taken.
The groups for which new records are added are specified by the by_vars argument. Here a record should be added for each subject, so by_vars = exprs(STUDYID, USUBJID) is specified.
The sets of possible sleeping problems are passed through the events argument as event() objects. Each event contains a condition which may or may not be satisfied by each record (or possibly a group of records) within the input dataset dataset. Summary functions such as any() and all() are often handy within conditions, as is done here for the third event, which checks that the subject had no sleeping issues. The final event uses a catch-all condition = TRUE to ensure that a new record is derived for every subject. Note that in this example no condition compares values across records, so it is sufficient to use event() objects rather than event_joined(); see the next example for a more complex condition.
If any subject has one or more records satisfying the conditions from events, we can select just one record using the order argument. In this example, the first argument passed to order is event_nr, which is a temporary variable created through the tmp_event_nr_var argument, which numbers the events consecutively. Since mode = "first", we only consider the first event for which a condition is satisfied. Within that event, we consider only the observation with the latest day, because the second argument for the order is desc(ADY).
Once a record is identified as satisfying an event's condition, a new observation is created by the following process:
1. the selected record is copied,
2. the variables specified in the event's set_values_to (here, AVAL and AVALC) are created/updated,
3. the variables specified in keep_source_vars (here, ADY, which is kept because of the tidyselect expression everything()), plus by_vars and the variables from set_values_to, are kept,
4. the variables specified in the global set_values_to (here, PARAM and PARAMCD) are created/updated.
library(admiral) # provides derive_extreme_event(), event(), exprs()
library(tibble, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

adqs1 <- tribble(
  ~USUBJID, ~PARAMCD,     ~AVALC,        ~ADY,
  "1",      "NO SLEEP",   "N",           1,
  "1",      "WAKE UP 3X", "N",           2,
  "2",      "NO SLEEP",   "N",           1,
  "2",      "WAKE UP 3X", "Y",           2,
  "2",      "WAKE UP 3X", "Y",           3,
  "3",      "NO SLEEP",   NA_character_, 1
) %>%
  mutate(STUDYID = "AB42")

derive_extreme_event(
  adqs1,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(AVALC = "No sleeping problems", AVAL = 3)
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problem"
  ),
  keep_source_vars = exprs(everything())
) %>%
  select(-STUDYID)
#> # A tibble: 9 × 6
#>   USUBJID PARAMCD    AVALC                   ADY  AVAL PARAM
#>   <chr>   <chr>      <chr>                 <dbl> <dbl> <chr>
#> 1 1       NO SLEEP   N                         1    NA <NA>
#> 2 1       WAKE UP 3X N                         2    NA <NA>
#> 3 2       NO SLEEP   N                         1    NA <NA>
#> 4 2       WAKE UP 3X Y                         2    NA <NA>
#> 5 2       WAKE UP 3X Y                         3    NA <NA>
#> 6 3       NO SLEEP   <NA>                      1    NA <NA>
#> 7 1       WSP        No sleeping problems      2     3 Worst Sleeping Problem
#> 8 2       WSP        Waking up three times     3     2 Worst Sleeping Problem
#> 9 3       WSP        Missing                   1    99 Worst Sleeping Problem
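To make the order of operations in the new-observation process concrete, here is a minimal base R sketch for the record selected for subject 2. It assumes nothing about admiral internals; the variable names `selected`, `event_values` and `global_values` are hypothetical and only mirror the four steps listed before the example.

```r
by_vars <- c("STUDYID", "USUBJID")

# The record selected for subject 2 (latest day among the matching records)
selected <- data.frame(
  STUDYID = "AB42", USUBJID = "2", PARAMCD = "WAKE UP 3X",
  AVALC = "Y", ADY = 3,
  stringsAsFactors = FALSE
)

# 1. copy the selected record
new_obs <- selected

# 2. apply the event's set_values_to (creates AVAL, updates AVALC)
event_values <- list(AVALC = "Waking up three times", AVAL = 2)
for (nm in names(event_values)) new_obs[[nm]] <- event_values[[nm]]

# 3. keep by_vars, keep_source_vars (everything() keeps all source
#    variables, including ADY) and the variables set by the event
keep_source_vars <- names(selected)
keep <- unique(c(by_vars, keep_source_vars, names(event_values)))
new_obs <- new_obs[, keep, drop = FALSE]

# 4. apply the global set_values_to (creates PARAM, updates PARAMCD)
global_values <- list(PARAMCD = "WSP", PARAM = "Worst Sleeping Problem")
for (nm in names(global_values)) new_obs[[nm]] <- global_values[[nm]]

new_obs
```

The result corresponds to row 8 of the output above (before STUDYID is dropped).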
event_joined())
We'll now extend the above example. Specifically, we consider a new possible worst sleeping problem, namely if a subject experiences no sleep on consecutive days.
The "consecutive days" portion of the condition requires records to be compared with each other. This is done by using an event_joined() object, so that the adqs2 dataset is joined onto itself. (No dataset_name is passed to event_joined() in the call below, so the input dataset is used as the source.) The condition now checks for two no sleep records, and crucially compares the ADY values to see if they differ by one day. The .join syntax distinguishes between the ADY value of the parent and joined datasets. As the condition involves AVALC, PARAMCD and ADY, we specify these variables with join_vars, and finally, because we wish to compare all records with each other, we select join_type = "all".
adqs2 <- tribble(
  ~USUBJID, ~PARAMCD,     ~AVALC, ~ADY,
  "4",      "WAKE UP",    "N",    1,
  "4",      "NO SLEEP",   "Y",    2,
  "4",      "NO SLEEP",   "Y",    3,
  "5",      "NO SLEEP",   "N",    1,
  "5",      "NO SLEEP",   "Y",    2,
  "5",      "WAKE UP 3X", "Y",    3,
  "5",      "NO SLEEP",   "Y",    4
) %>%
  mutate(STUDYID = "AB42")

derive_extreme_event(
  adqs2,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event_joined(
      join_vars = exprs(AVALC, PARAMCD, ADY),
      join_type = "all",
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y" &
        PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
        ADY == ADY.join + 1,
      set_values_to = exprs(AVALC = "No sleep two nights in a row", AVAL = 0)
    ),
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(
        AVALC = "No sleeping problems", AVAL = 3
      )
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problem"
  ),
  keep_source_vars = exprs(everything())
) %>%
  select(-STUDYID)
#> # A tibble: 9 × 6
#>   USUBJID PARAMCD    AVALC                          ADY  AVAL PARAM
#>   <chr>   <chr>      <chr>                        <dbl> <dbl> <chr>
#> 1 4       WAKE UP    N                                1    NA <NA>
#> 2 4       NO SLEEP   Y                                2    NA <NA>
#> 3 4       NO SLEEP   Y                                3    NA <NA>
#> 4 5       NO SLEEP   N                                1    NA <NA>
#> 5 5       NO SLEEP   Y                                2    NA <NA>
#> 6 5       WAKE UP 3X Y                                3    NA <NA>
#> 7 5       NO SLEEP   Y                                4    NA <NA>
#> 8 4       WSP        No sleep two nights in a row     3     0 Worst Sleeping Pr…
#> 9 5       WSP        No sleep                         4     1 Worst Sleeping Pr…
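The self-join behind the .join variables can be illustrated with base R's merge(). This is a rough sketch of the idea only, assuming a plain join of each subject's records with all of that subject's records (as with join_type = "all"); it is not how filter_joined() is implemented.

```r
adqs2 <- data.frame(
  USUBJID = c("4", "4", "4", "5", "5", "5", "5"),
  PARAMCD = c("WAKE UP", "NO SLEEP", "NO SLEEP",
              "NO SLEEP", "NO SLEEP", "WAKE UP 3X", "NO SLEEP"),
  AVALC = c("N", "Y", "Y", "N", "Y", "Y", "Y"),
  ADY = c(1, 2, 3, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Join the dataset onto itself within subject; variables from the joined
# copy get the ".join" suffix, those of the parent record keep their name
joined <- merge(adqs2, adqs2, by = "USUBJID", suffixes = c("", ".join"))

# Same condition as in the event_joined() call above
hits <- joined[
  with(
    joined,
    PARAMCD == "NO SLEEP" & AVALC == "Y" &
      PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
      ADY == ADY.join + 1
  ),
]

hits[, c("USUBJID", "ADY", "ADY.join")]
```

Only subject 4 has "no sleep" records on consecutive days (days 2 and 3); subject 5's "no sleep" records on days 2 and 4 are separated by a different record, so the condition is never met for that subject.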
event() objects
Here we consider a Hy's Law use case. We are interested in knowing whether a subject's Alanine Aminotransferase (ALT) has ever been above twice the upper limit of normal range. If so, i.e. if CRIT1FL is Y, we are interested in the record for the first time this occurs, and if not, we wish to retain the last record. As such, for this case we need to vary our usage of the mode argument depending on the event().
In the first event(), since we simply seek the first time that CRIT1FL is "Y", it's enough to specify the condition, because we inherit order and mode from the main derive_extreme_event() call, which will automatically select the first occurrence by AVISITN.
In the second event(), we select the last record among the records where CRIT1FL is "N" by additionally specifying mode = "last" within the event().
Note the usage of keep_source_vars = exprs(AVISITN) rather than everything() as in the previous example. This is done to ensure CRIT1 and CRIT1FL are not populated for the new records.
adhy <- tribble(
  ~USUBJID, ~AVISITN, ~CRIT1,              ~CRIT1FL,
  "1",      1,        "ALT > 2 times ULN", "N",
  "1",      2,        "ALT > 2 times ULN", "N",
  "2",      1,        "ALT > 2 times ULN", "N",
  "2",      2,        "ALT > 2 times ULN", "Y",
  "2",      3,        "ALT > 2 times ULN", "N",
  "2",      4,        "ALT > 2 times ULN", "Y"
) %>%
  mutate(
    PARAMCD = "ALT",
    PARAM = "ALT (U/L)",
    STUDYID = "AB42"
  )

derive_extreme_event(
  adhy,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event(
      condition = CRIT1FL == "Y",
      set_values_to = exprs(AVALC = "Y")
    ),
    event(
      condition = CRIT1FL == "N",
      mode = "last",
      set_values_to = exprs(AVALC = "N")
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, AVISITN),
  mode = "first",
  keep_source_vars = exprs(AVISITN),
  set_values_to = exprs(
    PARAMCD = "ALT2",
    PARAM = "ALT > 2 times ULN"
  )
) %>%
  select(-STUDYID)
#> # A tibble: 8 × 7
#>   USUBJID AVISITN CRIT1             CRIT1FL PARAMCD PARAM             AVALC
#>   <chr>     <dbl> <chr>             <chr>   <chr>   <chr>             <chr>
#> 1 1             1 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 2 1             2 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 3 2             1 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 4 2             2 ALT > 2 times ULN Y       ALT     ALT (U/L)         <NA>
#> 5 2             3 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 6 2             4 ALT > 2 times ULN Y       ALT     ALT (U/L)         <NA>
#> 7 1             2 <NA>              <NA>    ALT2    ALT > 2 times ULN N
#> 8 2             2 <NA>              <NA>    ALT2    ALT > 2 times ULN Y
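The interplay between the per-event mode and the overall mode can be sketched in base R as follows. The helper `select_event()` is hypothetical and only mimics the behaviour described in this example (first "Y" record, otherwise last "N" record, keeping only AVISITN from the source); it is not part of admiral.

```r
adhy <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2", "2"),
  AVISITN = c(1, 2, 1, 2, 3, 4),
  CRIT1FL = c("N", "N", "N", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per subject, pick the first or last record with the
# given flag, keeping only the by variable and AVISITN (keep_source_vars)
select_event <- function(data, flag, mode, event_nr) {
  hits <- data[data$CRIT1FL == flag, , drop = FALSE]
  hits <- hits[order(hits$USUBJID, hits$AVISITN), , drop = FALSE]
  dup <- duplicated(hits$USUBJID, fromLast = (mode == "last"))
  out <- hits[!dup, c("USUBJID", "AVISITN"), drop = FALSE]
  out$AVALC <- rep(flag, nrow(out))
  out$event_nr <- rep(event_nr, nrow(out))
  out
}

candidates <- rbind(
  select_event(adhy, "Y", mode = "first", event_nr = 1), # inherited mode
  select_event(adhy, "N", mode = "last", event_nr = 2)   # event-level mode
)

# Overall selection: order by event number, then visit, take the first
candidates <- candidates[
  order(candidates$USUBJID, candidates$event_nr, candidates$AVISITN),
]
new_records <- candidates[!duplicated(candidates$USUBJID), ]
new_records
```

Subject 2 has a "Y" record, so the first event wins with visit 2; subject 1 has none, so the last "N" record (visit 2) is used, matching rows 7 and 8 of the output above.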
first/last_cond_upper, join_type, source_datasets)
The final example showcases a use of derive_extreme_event() to calculate the Confirmed Best Overall Response (CBOR) in an ADRS dataset, as is common in many oncology trials. This example builds on all the previous ones and thus assumes a baseline level of confidence with derive_extreme_event().
The following ADSL and ADRS datasets will be used throughout:
adsl <- tribble(
  ~USUBJID, ~TRTSDTC,
  "1",      "2020-01-01",
  "2",      "2019-12-12",
  "3",      "2019-11-11",
  "4",      "2019-12-30",
  "5",      "2020-01-01",
  "6",      "2020-02-02",
  "7",      "2020-02-02",
  "8",      "2020-02-01"
) %>%
  mutate(
    TRTSDT = ymd(TRTSDTC),
    STUDYID = "AB42"
  )

adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-01", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-02-16", "NE",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2020-01-01", "SD",
  "2",      "2020-02-01", "PR",
  "2",      "2020-03-01", "SD",
  "2",      "2020-03-13", "CR",
  "4",      "2020-01-01", "PR",
  "4",      "2020-03-01", "NE",
  "4",      "2020-04-01", "NE",
  "4",      "2020-05-01", "PR",
  "5",      "2020-01-01", "PR",
  "5",      "2020-01-10", "PR",
  "5",      "2020-01-20", "PR",
  "6",      "2020-02-06", "PR",
  "6",      "2020-02-16", "CR",
  "6",      "2020-03-30", "PR",
  "7",      "2020-02-06", "PR",
  "7",      "2020-02-16", "CR",
  "7",      "2020-04-01", "NE",
  "8",      "2020-02-16", "PD"
) %>%
  mutate(
    ADT = ymd(ADTC),
    STUDYID = "AB42",
    PARAMCD = "OVR",
    PARAM = "Overall Response by Investigator"
  ) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(TRTSDT)
  )
Since the CBOR derivation contains multiple complex parts, it's convenient to make use of the description argument within each event object to describe what condition is being checked.
For the confirmed Complete Response (CR), for each "CR" record in the original ADRS dataset that will be identified by the first part of the condition argument (AVALC == "CR"), we need to use the first_cond_upper argument to limit the group of observations to consider alongside it. Namely, we need to look up to and including the second CR (AVALC.join == "CR") at least 28 days after the first one (ADT.join >= ADT + 28). The observations up to and including the first one satisfying first_cond_upper then form our "join group", meaning that the remaining portions of condition which reference joined variables are limited to this group. In particular, within condition we use all() to check that all observations are either "CR" or "NE", and count_vals() to ensure at most one is "NE".
Note that the selection of join_type = "after" is critical here, because the restriction implied by join_type is applied before the one implied by first_cond_upper. Picking the first subject (who was correctly identified as a confirmed responder) as an example, selecting join_type = "all" instead of "after" would mean the first "PR" record from "2020-01-01" would also be considered when evaluating the all(AVALC.join %in% c("CR", "NE")) portion of condition. In turn, the condition would no longer be satisfied, and in this case, following the later event logic shows the subject would be considered a partial responder instead.
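The effect of join_type on the CR confirmation can be checked with a small base R sketch for subject 1. The helper `is_confirmed_cr()` is hypothetical and simplified: it assumes the records are sorted by ADT, applies the join_type restriction first and the first_cond_upper restriction second, and treats a record as unconfirmed when no joined record satisfies first_cond_upper. It is not the filter_joined() implementation.

```r
# Subject 1's assessments, sorted by date
adrs1 <- data.frame(
  ADT = as.Date(c("2020-01-01", "2020-02-01", "2020-02-16",
                  "2020-03-01", "2020-04-01")),
  AVALC = c("PR", "CR", "NE", "CR", "SD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is the CR in row i confirmed?
is_confirmed_cr <- function(i, data, join_type = c("after", "all")) {
  join_type <- match.arg(join_type)
  if (data$AVALC[i] != "CR") {
    return(FALSE)
  }
  # Restriction 1 (join_type): which records are joined to record i
  rows <- seq_len(nrow(data))
  idx <- if (join_type == "after") rows[rows > i] else rows
  joined <- data[idx, , drop = FALSE]
  # Restriction 2 (first_cond_upper): keep the joined records up to and
  # including the first CR at least 28 days after record i
  upper <- which(joined$AVALC == "CR" & joined$ADT >= data$ADT[i] + 28)
  if (length(upper) == 0) {
    return(FALSE)
  }
  joined <- joined[seq_len(upper[1]), , drop = FALSE]
  # Remaining condition: only CR/NE in the join group, at most one NE
  all(joined$AVALC %in% c("CR", "NE")) && sum(joined$AVALC == "NE") <= 1
}

is_confirmed_cr(2, adrs1, join_type = "after") # CR from 2020-02-01
is_confirmed_cr(2, adrs1, join_type = "all")   # PR from 2020-01-01 is joined too
```

With "after" the join group for the 2020-02-01 record is the NE and the CR from 2020-03-01, so the response is confirmed; with "all" the earlier PR enters the group and the all() check fails.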
The Partial Response (PR) event is very similar, the difference being that the first portion of condition now references "PR" and first_cond_upper accepts a confirmatory "PR" or "CR" at least 28 days later. Note that now we must add "PR" as an option within the all() condition to account for confirmatory "PR"s.
The Stable Disease (SD), Progressive Disease (PD) and Not Evaluable (NE) events are simpler and just require event() calls.
Finally, we use a catch-all event() with condition = TRUE and dataset_name = "adsl" to identify those subjects who do not appear in ADRS and list their CBOR as "MISSING". Note that dataset_name is set to "adsl", which is a new source dataset. As such, it's important in the main derive_extreme_event() call to list adsl as another source dataset with source_datasets = list(adsl = adsl).
derive_extreme_event(
  adrs,
  by_vars = exprs(STUDYID, USUBJID),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, ADT),
  mode = "first",
  source_datasets = list(adsl = adsl),
  events = list(
    event_joined(
      description = paste(
        "CR needs to be confirmed by a second CR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join == "CR" & ADT.join >= ADT + 28,
      condition = AVALC == "CR" &
        all(AVALC.join %in% c("CR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(AVALC = "CR")
    ),
    event_joined(
      description = paste(
        "PR needs to be confirmed by a second CR or PR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join %in% c("CR", "PR") & ADT.join >= ADT + 28,
      condition = AVALC == "PR" &
        all(AVALC.join %in% c("CR", "PR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(AVALC = "PR")
    ),
    event(
      description = paste(
        "CR, PR, or SD are considered as SD if occurring at least 28 days",
        "after treatment start"
      ),
      condition = AVALC %in% c("CR", "PR", "SD") & ADT >= TRTSDT + 28,
      set_values_to = exprs(AVALC = "SD")
    ),
    event(
      condition = AVALC == "PD",
      set_values_to = exprs(AVALC = "PD")
    ),
    event(
      condition = AVALC %in% c("CR", "PR", "SD", "NE"),
      set_values_to = exprs(AVALC = "NE")
    ),
    event(
      description = "Set response to MISSING for patients without records in ADRS",
      dataset_name = "adsl",
      condition = TRUE,
      set_values_to = exprs(AVALC = "MISSING"),
      keep_source_vars = exprs(TRTSDT)
    )
  ),
  set_values_to = exprs(
    PARAMCD = "CBOR",
    PARAM = "Best Confirmed Overall Response by Investigator"
  )
) %>%
  filter(PARAMCD == "CBOR") %>%
  select(-STUDYID, -ADTC)
#> # A tibble: 8 × 6
#>   USUBJID AVALC   ADT        PARAMCD PARAM                            TRTSDT
#>   <chr>   <chr>   <date>     <chr>   <chr>                            <date>
#> 1 1       CR      2020-02-01 CBOR    Best Confirmed Overall Response… 2020-01-01
#> 2 2       SD      2020-02-01 CBOR    Best Confirmed Overall Response… 2019-12-12
#> 3 3       MISSING NA         CBOR    Best Confirmed Overall Response… 2019-11-11
#> 4 4       SD      2020-05-01 CBOR    Best Confirmed Overall Response… 2019-12-30
#> 5 5       NE      2020-01-01 CBOR    Best Confirmed Overall Response… 2020-01-01
#> 6 6       PR      2020-02-06 CBOR    Best Confirmed Overall Response… 2020-02-02
#> 7 7       NE      2020-02-06 CBOR    Best Confirmed Overall Response… 2020-02-02
#> 8 8       PD      2020-02-16 CBOR    Best Confirmed Overall Response… 2020-02-01
Equivalent examples for using the check_type argument can be found in derive_extreme_records().
event(), event_joined(), derive_vars_extreme_event()
BDS-Findings Functions for adding Parameters/Records: default_qtc_paramcd(), derive_expected_records(), derive_extreme_records(), derive_locf_records(), derive_param_bmi(), derive_param_bsa(), derive_param_computed(), derive_param_doseint(), derive_param_exist_flag(), derive_param_exposure(), derive_param_framingham(), derive_param_map(), derive_param_qtc(), derive_param_rr(), derive_param_wbc_abs(), derive_summary_records()