View source: R/derive_extreme_event.R
| derive_extreme_event | R Documentation |
Add the first available record from events for each by group as new
records; all variables of the selected observation are kept. It can be used
for selecting the extreme observation from a series of user-defined events.
This distinguishes derive_extreme_event() from derive_extreme_records(),
where extreme records are derived based on a certain order of existing
variables.
derive_extreme_event(
  dataset = NULL,
  by_vars,
  events,
  tmp_event_nr_var = NULL,
  order,
  mode,
  source_datasets = NULL,
  check_type = "warning",
  set_values_to = NULL,
  keep_source_vars = exprs(everything())
)
dataset |
Input dataset
The variables specified by the
|
by_vars |
Grouping variables
|
events |
Conditions and new values defining events
A list of
For
|
tmp_event_nr_var |
Temporary event number variable
The specified variable is added to all source datasets and is set to the number of the event before selecting the records of the event. It can be used in
The variable is not included in the output dataset.
|
order |
Sort order
If a particular event from
For handling of
|
mode |
Selection mode (first or last)
If a particular event from
|
source_datasets |
Source datasets
A named list of datasets is expected. The
|
check_type |
Check uniqueness?
If
|
set_values_to |
Variables to be set
The specified variables are set to the specified values for the new observations.
For example:
set_values_to = exprs(
  PARAMCD = "WOBS",
  PARAM = "Worst Observations"
)
|
keep_source_vars |
Variables to keep from the source dataset
For each event the specified variables are kept from the selected
observations. The variables specified for
|
1. For each event select the observations to consider:
   a. If the event is of class event, the observations of the source dataset
      are restricted by condition and then the first or last (mode)
      observation per by group (by_vars) is selected.
      If the event is of class event_joined, filter_joined() is called to
      select the observations.
   b. The variables specified by the set_values_to field of the event
      are added to the selected observations.
   c. The variable specified for tmp_event_nr_var is added and set to
      the number of the event.
   d. Only the variables specified for the keep_source_vars field of the
      event, the by variables (by_vars), and the variables created by
      set_values_to are kept. If keep_source_vars = NULL is used for an event
      in derive_extreme_event(), the value of the keep_source_vars argument of
      derive_extreme_event() is used.
2. All selected observations are bound together.
3. For each group (with respect to the variables specified for the
   by_vars parameter) the first or last observation (with respect to the
   order specified for the order parameter and the mode specified for the
   mode parameter) is selected.
4. The variables specified by the set_values_to parameter are added to
   the selected observations.
5. The observations are added to the input dataset.
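The steps above can be mimicked in a few lines of base R. This is only a simplified sketch and not admiral's implementation: conditions are written as plain functions, the per-event first/last selection is skipped because the final ordering gives the same result here, and the names events, pick_event and new_records are made up for illustration. The data are the sleep records used in the first example.

```r
# Sleep questionnaire records (same values as adqs1 in the first example).
adqs1 <- data.frame(
  USUBJID = c("1", "1", "2", "2", "2", "3"),
  PARAMCD = c("NO SLEEP", "WAKE UP 3X", "NO SLEEP", "WAKE UP 3X", "WAKE UP 3X", "NO SLEEP"),
  AVALC = c("N", "N", "N", "Y", "Y", NA),
  ADY = c(1, 2, 1, 2, 3, 1)
)

# Hypothetical event definitions: a condition evaluated per subject plus the
# values to set on the selected record. The list position is the event number.
events <- list(
  list(cond = function(d) d$PARAMCD == "NO SLEEP" & d$AVALC == "Y",
       AVALC = "No sleep", AVAL = 1),
  list(cond = function(d) d$PARAMCD == "WAKE UP 3X" & d$AVALC == "Y",
       AVALC = "Waking up three times", AVAL = 2),
  list(cond = function(d) rep(all(d$AVALC == "N"), nrow(d)),
       AVALC = "No sleeping problems", AVAL = 3),
  list(cond = function(d) rep(TRUE, nrow(d)),
       AVALC = "Missing", AVAL = 99)
)

# For one subject: collect the records hit by each event, tag them with the
# event number, bind them, and keep the first by (event_nr, desc(ADY)).
pick_event <- function(d) {
  per_event <- lapply(seq_along(events), function(nr) {
    ev <- events[[nr]]
    hits <- d[which(ev$cond(d)), ]   # which() drops FALSE and NA
    if (nrow(hits) == 0) return(NULL)
    hits$AVALC <- ev$AVALC
    hits$AVAL <- ev$AVAL
    hits$event_nr <- nr
    hits
  })
  candidates <- do.call(rbind, Filter(Negate(is.null), per_event))
  candidates <- candidates[order(candidates$event_nr, -candidates$ADY), ]
  candidates[1, setdiff(names(candidates), "event_nr")]
}

new_records <- do.call(rbind, lapply(split(adqs1, adqs1$USUBJID), pick_event))

# Global set_values_to step; admiral would then append these to the input.
new_records$PARAMCD <- "WSP"
new_records$PARAM <- "Worst Sleeping Problem"
new_records
```

Subject 3 falls through to the catch-all event because comparisons with a missing AVALC yield NA, which is treated like FALSE when records are selected.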
Note: This function creates temporary datasets which may be much bigger
than the input datasets. If this causes memory issues, please try setting
the admiral option save_memory to TRUE (see set_admiral_options()).
This reduces the memory consumption but increases the run-time.
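A minimal sketch of switching the option on, assuming an installed admiral version that already provides the save_memory option named in the note:

```r
library(admiral)

# Trade longer run time for lower memory use in functions that build
# large temporary datasets, such as derive_extreme_event().
set_admiral_options(save_memory = TRUE)
```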
The input dataset with the best or worst observation of each by group added as new observations.
event() objects
For each subject, the observation containing the worst sleeping problem (if any exist) should be identified and added as a new record, retaining all variables from the original observation. If the worst sleeping problem occurs multiple times, or there are no sleeping problems, take the observation occurring on the latest day.
The groups for which new records are added are specified by the by_vars
argument. Here for each subject a record should be added. Thus
by_vars = exprs(STUDYID, USUBJID) is specified.
The sets of possible sleeping problems are passed through the events
argument as event() objects. Each event contains a condition which
may or may not be satisfied by each record (or possibly a group of
records) within the input dataset dataset. Summary functions such as
any() and all() are often handy to use within conditions, as is done
here for the third event, which checks that the subject had no sleeping
issues. The final event uses a catch-all condition = TRUE to ensure all
subjects have a new record derived. Note that in this example, as no
condition compares values across records, it is sufficient to use
event() objects rather than event_joined();
see the next example for a more complex condition.
If any subject has one or more records satisfying the conditions from
events, we can select just one record using the order argument. In this
example, the first argument passed to order is event_nr, which is a
temporary variable created through the tmp_event_nr_var argument, which
numbers the events consecutively. Since mode = "first", we only consider
the first event for which a condition is satisfied. Within that event, we
consider only the observation with the latest day, because the second
argument for the order is desc(ADY).
Once a record is identified as satisfying an event's condition, a new observation is created by the following process:
1. the selected record is copied,
2. the variables specified in the event's set_values_to (here,
   AVAL and AVALC) are created/updated,
3. the variables specified in keep_source_vars (here, all source variables
   including ADY, due to the use of the tidyselect expression everything()),
   plus by_vars and the variables from set_values_to, are kept,
4. the variables specified in the global set_values_to (here,
   PARAM and PARAMCD) are created/updated.
library(admiral)
library(tibble, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

adqs1 <- tribble(
  ~USUBJID, ~PARAMCD,     ~AVALC,        ~ADY,
  "1",      "NO SLEEP",   "N",              1,
  "1",      "WAKE UP 3X", "N",              2,
  "2",      "NO SLEEP",   "N",              1,
  "2",      "WAKE UP 3X", "Y",              2,
  "2",      "WAKE UP 3X", "Y",              3,
  "3",      "NO SLEEP",   NA_character_,    1
) %>%
  mutate(STUDYID = "AB42")

derive_extreme_event(
  adqs1,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(AVALC = "No sleeping problems", AVAL = 3)
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problem"
  ),
  keep_source_vars = exprs(everything())
) %>%
  select(-STUDYID)
#> # A tibble: 9 × 6
#>   USUBJID PARAMCD    AVALC                   ADY  AVAL PARAM
#>   <chr>   <chr>      <chr>                 <dbl> <dbl> <chr>
#> 1 1       NO SLEEP   N                         1    NA <NA>
#> 2 1       WAKE UP 3X N                         2    NA <NA>
#> 3 2       NO SLEEP   N                         1    NA <NA>
#> 4 2       WAKE UP 3X Y                         2    NA <NA>
#> 5 2       WAKE UP 3X Y                         3    NA <NA>
#> 6 3       NO SLEEP   <NA>                      1    NA <NA>
#> 7 1       WSP        No sleeping problems      2     3 Worst Sleeping Problem
#> 8 2       WSP        Waking up three times     3     2 Worst Sleeping Problem
#> 9 3       WSP        Missing                   1    99 Worst Sleeping Problem
event_joined())
We'll now extend the above example. Specifically, we consider a new possible worst sleeping problem, namely if a subject experiences no sleep on consecutive days.
The "consecutive days" portion of the condition requires records to be
compared with each other. This is done by using an event_joined() object,
specifically by passing dataset_name = adqs2 to it so that the adqs2
dataset is joined onto itself. The condition now checks for two
no sleep records, and crucially compares the ADY values to see if
they differ by one day. The .join syntax distinguishes between the
ADY value of the parent and joined datasets. As the condition involves
AVALC, PARAMCD and ADY, we specify these variables with join_vars,
and finally, because we wish to compare all records with each other, we
select join_type = "all".
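The idea of joining a dataset onto itself and comparing ADY with ADY.join can be sketched with base R's merge(). This is an illustration of the comparison only, not of how event_joined() or filter_joined() are implemented; the names joined and consecutive are made up, and the data repeat the adqs2 values used in this example.

```r
adqs2 <- data.frame(
  USUBJID = c("4", "4", "4", "5", "5", "5", "5"),
  PARAMCD = c("WAKE UP", "NO SLEEP", "NO SLEEP", "NO SLEEP", "NO SLEEP",
              "WAKE UP 3X", "NO SLEEP"),
  AVALC = c("N", "Y", "Y", "N", "Y", "Y", "Y"),
  ADY = c(1, 2, 3, 1, 2, 3, 4)
)

# Pair every record with every record of the same subject. Columns from the
# second copy get the ".join" suffix, mirroring the naming used in condition.
joined <- merge(adqs2, adqs2, by = "USUBJID", suffixes = c("", ".join"))

# Keep pairs where both records report no sleep and the days differ by one.
consecutive <- subset(
  joined,
  PARAMCD == "NO SLEEP" & AVALC == "Y" &
    PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
    ADY == ADY.join + 1
)
consecutive[, c("USUBJID", "ADY", "ADY.join")]
```

Only subject 4 has a qualifying pair (days 2 and 3); subject 5 has no sleep on days 2 and 4, which are not consecutive.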
adqs2 <- tribble(
  ~USUBJID, ~PARAMCD,     ~AVALC, ~ADY,
  "4",      "WAKE UP",    "N",       1,
  "4",      "NO SLEEP",   "Y",       2,
  "4",      "NO SLEEP",   "Y",       3,
  "5",      "NO SLEEP",   "N",       1,
  "5",      "NO SLEEP",   "Y",       2,
  "5",      "WAKE UP 3X", "Y",       3,
  "5",      "NO SLEEP",   "Y",       4
) %>%
  mutate(STUDYID = "AB42")

derive_extreme_event(
  adqs2,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event_joined(
      join_vars = exprs(AVALC, PARAMCD, ADY),
      join_type = "all",
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y" &
        PARAMCD.join == "NO SLEEP" & AVALC.join == "Y" &
        ADY == ADY.join + 1,
      set_values_to = exprs(AVALC = "No sleep two nights in a row", AVAL = 0)
    ),
    event(
      condition = PARAMCD == "NO SLEEP" & AVALC == "Y",
      set_values_to = exprs(AVALC = "No sleep", AVAL = 1)
    ),
    event(
      condition = PARAMCD == "WAKE UP 3X" & AVALC == "Y",
      set_values_to = exprs(AVALC = "Waking up three times", AVAL = 2)
    ),
    event(
      condition = all(AVALC == "N"),
      set_values_to = exprs(
        AVALC = "No sleeping problems", AVAL = 3
      )
    ),
    event(
      condition = TRUE,
      set_values_to = exprs(AVALC = "Missing", AVAL = 99)
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, desc(ADY)),
  mode = "first",
  set_values_to = exprs(
    PARAMCD = "WSP",
    PARAM = "Worst Sleeping Problem"
  ),
  keep_source_vars = exprs(everything())
) %>%
  select(-STUDYID)
#> # A tibble: 9 × 6
#>   USUBJID PARAMCD    AVALC                          ADY  AVAL PARAM
#>   <chr>   <chr>      <chr>                        <dbl> <dbl> <chr>
#> 1 4       WAKE UP    N                                1    NA <NA>
#> 2 4       NO SLEEP   Y                                2    NA <NA>
#> 3 4       NO SLEEP   Y                                3    NA <NA>
#> 4 5       NO SLEEP   N                                1    NA <NA>
#> 5 5       NO SLEEP   Y                                2    NA <NA>
#> 6 5       WAKE UP 3X Y                                3    NA <NA>
#> 7 5       NO SLEEP   Y                                4    NA <NA>
#> 8 4       WSP        No sleep two nights in a row     3     0 Worst Sleeping Pr…
#> 9 5       WSP        No sleep                         4     1 Worst Sleeping Pr…
event() objects
Here we consider a Hy's Law use case. We are interested in
knowing whether a subject's Alanine Aminotransferase (ALT) has ever been
above twice the upper limit of the normal range. If so, i.e. if
CRIT1FL is "Y", we are interested in the record for the first
time this occurs, and if not, we wish to retain the last record.
As such, for this case we need to vary our usage of the
mode argument depending on the event().
In the first event(), since we simply seek the first time that
CRIT1FL is "Y", it's enough to specify the condition,
because we inherit order and mode from the main
derive_extreme_event() call, which will automatically
select the first occurrence by AVISITN.
In the second event(), we select the last record among the
full set of records where CRIT1FL is "N" by additionally
specifying mode = "last" within the event().
Note now the usage of keep_source_vars = exprs(AVISITN)
rather than everything() as in the previous example. This
is done to ensure CRIT1 and CRIT1FL are not populated for
the new records.
adhy <- tribble(
  ~USUBJID, ~AVISITN, ~CRIT1,              ~CRIT1FL,
  "1",             1, "ALT > 2 times ULN", "N",
  "1",             2, "ALT > 2 times ULN", "N",
  "2",             1, "ALT > 2 times ULN", "N",
  "2",             2, "ALT > 2 times ULN", "Y",
  "2",             3, "ALT > 2 times ULN", "N",
  "2",             4, "ALT > 2 times ULN", "Y"
) %>%
  mutate(
    PARAMCD = "ALT",
    PARAM = "ALT (U/L)",
    STUDYID = "AB42"
  )

derive_extreme_event(
  adhy,
  by_vars = exprs(STUDYID, USUBJID),
  events = list(
    event(
      condition = CRIT1FL == "Y",
      set_values_to = exprs(AVALC = "Y")
    ),
    event(
      condition = CRIT1FL == "N",
      mode = "last",
      set_values_to = exprs(AVALC = "N")
    )
  ),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, AVISITN),
  mode = "first",
  keep_source_vars = exprs(AVISITN),
  set_values_to = exprs(
    PARAMCD = "ALT2",
    PARAM = "ALT > 2 times ULN"
  )
) %>%
  select(-STUDYID)
#> # A tibble: 8 × 7
#>   USUBJID AVISITN CRIT1             CRIT1FL PARAMCD PARAM             AVALC
#>   <chr>     <dbl> <chr>             <chr>   <chr>   <chr>             <chr>
#> 1 1             1 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 2 1             2 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 3 2             1 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 4 2             2 ALT > 2 times ULN Y       ALT     ALT (U/L)         <NA>
#> 5 2             3 ALT > 2 times ULN N       ALT     ALT (U/L)         <NA>
#> 6 2             4 ALT > 2 times ULN Y       ALT     ALT (U/L)         <NA>
#> 7 1             2 <NA>              <NA>    ALT2    ALT > 2 times ULN N
#> 8 2             2 <NA>              <NA>    ALT2    ALT > 2 times ULN Y
first/last_cond_upper, join_type, source_datasets)
The final example showcases a use of derive_extreme_event()
to calculate the Confirmed Best Overall Response (CBOR) in an
ADRS dataset, as is common in many oncology trials. This example
builds on all the previous ones and thus assumes a baseline level
of confidence with derive_extreme_event().
The following ADSL and ADRS datasets will be used
throughout:
adsl <- tribble(
  ~USUBJID, ~TRTSDTC,
  "1",      "2020-01-01",
  "2",      "2019-12-12",
  "3",      "2019-11-11",
  "4",      "2019-12-30",
  "5",      "2020-01-01",
  "6",      "2020-02-02",
  "7",      "2020-02-02",
  "8",      "2020-02-01"
) %>%
  mutate(
    TRTSDT = ymd(TRTSDTC),
    STUDYID = "AB42"
  )

adrs <- tribble(
  ~USUBJID, ~ADTC,        ~AVALC,
  "1",      "2020-01-01", "PR",
  "1",      "2020-02-01", "CR",
  "1",      "2020-02-16", "NE",
  "1",      "2020-03-01", "CR",
  "1",      "2020-04-01", "SD",
  "2",      "2020-01-01", "SD",
  "2",      "2020-02-01", "PR",
  "2",      "2020-03-01", "SD",
  "2",      "2020-03-13", "CR",
  "4",      "2020-01-01", "PR",
  "4",      "2020-03-01", "NE",
  "4",      "2020-04-01", "NE",
  "4",      "2020-05-01", "PR",
  "5",      "2020-01-01", "PR",
  "5",      "2020-01-10", "PR",
  "5",      "2020-01-20", "PR",
  "6",      "2020-02-06", "PR",
  "6",      "2020-02-16", "CR",
  "6",      "2020-03-30", "PR",
  "7",      "2020-02-06", "PR",
  "7",      "2020-02-16", "CR",
  "7",      "2020-04-01", "NE",
  "8",      "2020-02-16", "PD"
) %>%
  mutate(
    ADT = ymd(ADTC),
    STUDYID = "AB42",
    PARAMCD = "OVR",
    PARAM = "Overall Response by Investigator"
  ) %>%
  derive_vars_merged(
    dataset_add = adsl,
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(TRTSDT)
  )
Since the CBOR derivation contains multiple complex parts, it's
convenient to make use of the description argument within each event object
to describe what condition is being checked.
For the Complete Response (CR), each "CR" record in the original ADRS
dataset is identified by the first part of the condition argument
(AVALC == "CR"), and we need to use the first_cond_upper argument to limit the
group of observations to consider alongside it. Namely, we need to look up to
and including the second CR (AVALC.join == "CR") at least 28 days after the first
one (ADT.join >= ADT + 28). The observations up to and including the first one
satisfying first_cond_upper then form our "join group", meaning that the
remaining portions of condition which reference joined variables are limited
to this group.
In particular, within condition we use all() to check that all observations
are either "CR" or "NE", and count_vals() to ensure at most one is
"NE".
Note that the selection of join_type = "after" is critical here, due to the
fact that the restriction implied by join_type is applied before the one
implied by first_cond_upper. Picking the first subject (who was correctly
identified as a confirmed responder) as an example, selecting
join_type = "all" instead of "after" would mean the first "PR" record
from "2020-01-01" would also be considered when evaluating the
all(AVALC.join %in% c("CR", "NE")) portion of condition. In turn, the
condition would not be satisfied anymore, and in this case, following the
later event logic shows the subject would be considered a partial responder
instead.
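The effect of join_type on the join group for subject 1 can be traced with a small base R sketch. The helpers join_group() and confirmed() are hypothetical and written only for this illustration; they are not admiral functions. The sketch assumes, as described above, that the join_type restriction is applied first and the first_cond_upper cut-off second, and for simplicity it leaves the record itself out of the joined records.

```r
# Subject 1 from the ADRS example.
resp <- data.frame(
  ADT = as.Date(c("2020-01-01", "2020-02-01", "2020-02-16",
                  "2020-03-01", "2020-04-01")),
  AVALC = c("PR", "CR", "NE", "CR", "SD")
)

# Hypothetical helper: join group for the record in row i.
join_group <- function(data, i, join_type) {
  # Step 1: restriction implied by join_type.
  joined <- if (join_type == "after") {
    data[seq_len(nrow(data)) > i, ]
  } else {
    data[-i, ]
  }
  # Step 2: cut at the first record satisfying first_cond_upper, i.e. a "CR"
  # at least 28 days later. An empty group is returned if there is none.
  confirms <- joined$AVALC == "CR" & joined$ADT >= data$ADT[i] + 28
  if (!any(confirms)) return(joined[0, ])
  joined[seq_len(which(confirms)[1]), ]
}

# Hypothetical helper: the parts of condition that use joined variables.
confirmed <- function(group) {
  nrow(group) > 0 &&
    all(group$AVALC %in% c("CR", "NE")) &&
    sum(group$AVALC == "NE") <= 1
}

# Row 2 is the "CR" on 2020-02-01.
confirmed(join_group(resp, 2, "after"))  # group: NE, CR
confirmed(join_group(resp, 2, "all"))    # group: PR, NE, CR
```

With "after" the group is the "NE" on 2020-02-16 and the "CR" on 2020-03-01, so the check passes; with "all" the earlier "PR" enters the group and the all() check fails.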
The Partial Response (PR) event is very similar; the difference is that the
first portion of condition now references "PR" and first_cond_upper
accepts a confirmatory "PR" or "CR" at least 28 days later. Note that now we must add
"PR" as an option within the all() condition to account for confirmatory
"PR"s.
The Stable Disease (SD), Progressive Disease (PD) and Not Evaluable (NE)
events are simpler and just require event() calls.
Finally, we use a catch-all event() with condition = TRUE and
dataset_name = "adsl" to identify those subjects who do not appear in ADRS
and list their CBOR as "MISSING". Note here the fact that dataset_name is
set to "adsl", which is a new source dataset. As such it's important in the
main derive_extreme_event() call to list adsl as another source dataset
with source_datasets = list(adsl = adsl).
derive_extreme_event(
  adrs,
  by_vars = exprs(STUDYID, USUBJID),
  tmp_event_nr_var = event_nr,
  order = exprs(event_nr, ADT),
  mode = "first",
  source_datasets = list(adsl = adsl),
  events = list(
    event_joined(
      description = paste(
        "CR needs to be confirmed by a second CR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join == "CR" & ADT.join >= ADT + 28,
      condition = AVALC == "CR" &
        all(AVALC.join %in% c("CR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(AVALC = "CR")
    ),
    event_joined(
      description = paste(
        "PR needs to be confirmed by a second CR or PR at least 28 days later,",
        "at most one NE is acceptable between the two assessments"
      ),
      join_vars = exprs(AVALC, ADT),
      join_type = "after",
      first_cond_upper = AVALC.join %in% c("CR", "PR") & ADT.join >= ADT + 28,
      condition = AVALC == "PR" &
        all(AVALC.join %in% c("CR", "PR", "NE")) &
        count_vals(var = AVALC.join, val = "NE") <= 1,
      set_values_to = exprs(AVALC = "PR")
    ),
    event(
      description = paste(
        "CR, PR, or SD are considered as SD if occurring at least 28 days",
        "after treatment start"
      ),
      condition = AVALC %in% c("CR", "PR", "SD") & ADT >= TRTSDT + 28,
      set_values_to = exprs(AVALC = "SD")
    ),
    event(
      condition = AVALC == "PD",
      set_values_to = exprs(AVALC = "PD")
    ),
    event(
      condition = AVALC %in% c("CR", "PR", "SD", "NE"),
      set_values_to = exprs(AVALC = "NE")
    ),
    event(
      description = "Set response to MISSING for patients without records in ADRS",
      dataset_name = "adsl",
      condition = TRUE,
      set_values_to = exprs(AVALC = "MISSING"),
      keep_source_vars = exprs(TRTSDT)
    )
  ),
  set_values_to = exprs(
    PARAMCD = "CBOR",
    PARAM = "Best Confirmed Overall Response by Investigator"
  )
) %>%
  filter(PARAMCD == "CBOR") %>%
  select(-STUDYID, -ADTC)
#> # A tibble: 8 × 6
#>   USUBJID AVALC   ADT        PARAMCD PARAM                            TRTSDT
#>   <chr>   <chr>   <date>     <chr>   <chr>                            <date>
#> 1 1       CR      2020-02-01 CBOR    Best Confirmed Overall Response… 2020-01-01
#> 2 2       SD      2020-02-01 CBOR    Best Confirmed Overall Response… 2019-12-12
#> 3 3       MISSING NA         CBOR    Best Confirmed Overall Response… 2019-11-11
#> 4 4       SD      2020-05-01 CBOR    Best Confirmed Overall Response… 2019-12-30
#> 5 5       NE      2020-01-01 CBOR    Best Confirmed Overall Response… 2020-01-01
#> 6 6       PR      2020-02-06 CBOR    Best Confirmed Overall Response… 2020-02-02
#> 7 7       NE      2020-02-06 CBOR    Best Confirmed Overall Response… 2020-02-02
#> 8 8       PD      2020-02-16 CBOR    Best Confirmed Overall Response… 2020-02-01
Equivalent examples for using the check_type argument can be found in
derive_extreme_records().
event(), event_joined(), derive_vars_extreme_event()
BDS-Findings Functions for adding Parameters/Records:
default_qtc_paramcd(),
derive_expected_records(),
derive_extreme_records(),
derive_locf_records(),
derive_param_bmi(),
derive_param_bsa(),
derive_param_computed(),
derive_param_doseint(),
derive_param_exist_flag(),
derive_param_exposure(),
derive_param_framingham(),
derive_param_map(),
derive_param_qtc(),
derive_param_rr(),
derive_param_wbc_abs(),
derive_summary_records()