View source: R/derive_joined.R
derive_vars_joined | R Documentation |
The function adds variables from an additional dataset to the input dataset. The selection of the observations from the additional dataset can depend on variables from both datasets. For example, add the lowest value (nadir) before the current observation.
derive_vars_joined(
dataset,
dataset_add,
by_vars = NULL,
order = NULL,
new_vars = NULL,
tmp_obs_nr_var = NULL,
join_vars = NULL,
join_type,
filter_add = NULL,
first_cond_lower = NULL,
first_cond_upper = NULL,
filter_join = NULL,
mode = NULL,
exist_flag = NULL,
true_value = "Y",
false_value = NA_character_,
missing_values = NULL,
check_type = "warning"
)
dataset |
Input dataset The variables specified by the |
dataset_add |
Additional dataset The variables specified by the |
by_vars |
Grouping variables The two datasets are joined by the specified variables. Variables can be renamed by naming the element, i.e.
Permitted Values: list of variables created by |
order |
Sort order If the argument is set to a non-null value, for each observation of the
input dataset the first or last observation from the joined dataset is
selected with respect to the specified order. The specified variables are
expected in the additional dataset ( If an expression is named, e.g., For handling of Permitted Values: list of expressions created by |
new_vars |
Variables to add The specified variables from the additional dataset are added to the output
dataset. Variables can be renamed by naming the element, i.e., For example And Values of the added variables can be modified by specifying an expression.
For example, If the argument is not specified or set to Permitted Values: list of variables or named expressions created by |
tmp_obs_nr_var |
Temporary observation number The specified variable is added to the input dataset ( The variable is not included in the output dataset. To include it specify
it for |
join_vars |
Variables to use from additional dataset Any extra variables required from the additional dataset for If an expression is named, e.g., The variables are not included in the output dataset. Permitted Values: list of variables or named expressions created by |
join_type |
Observations to keep after joining The argument determines which of the joined observations are kept with
respect to the original observation. For example, if For example for confirmed response or BOR in the oncology setting or
confirmed deterioration in questionnaires the confirmatory assessment must
be after the assessment. Thus Whereas, sometimes you might allow for confirmatory observations to occur
prior to the observation. For example, to identify AEs occurring on or
after seven days before a COVID AE. Thus Permitted Values: |
filter_add |
Filter for additional dataset ( Only observations from Variables created by The condition can include summary functions like Permitted Values: a condition |
first_cond_lower |
Condition for selecting range of data (before) If this argument is specified, the other observations are restricted from the first observation before the current observation where the specified condition is fulfilled up to the current observation. If the condition is not fulfilled for any of the other observations, no observations are considered. This argument should be specified if |
first_cond_upper |
Condition for selecting range of data (after) If this argument is specified, the other observations are restricted up to the first observation where the specified condition is fulfilled. If the condition is not fulfilled for any of the other observations, no observations are considered. This argument should be specified if |
filter_join |
Filter for the joined dataset The specified condition is applied to the joined dataset. Therefore
variables from both datasets Variables created by The condition can include summary functions like Permitted Values: a condition |
mode |
Selection mode Determines if the first or last observation is selected. If the If the Permitted Values: |
exist_flag |
Exist flag If the argument is specified (e.g., Permitted Values: Variable name |
true_value |
True value The value for the specified variable Permitted Values: An atomic scalar |
false_value |
False value The value for the specified variable Permitted Values: An atomic scalar |
missing_values |
Values for non-matching observations For observations of the input dataset ( Permitted Values: named list of expressions, e.g.,
|
check_type |
Check uniqueness? If This argument is ignored if Permitted Values: |
The variables specified by order
are added to the additional dataset
(dataset_add
).
The variables specified by join_vars
are added to the additional dataset
(dataset_add
).
The records from the additional dataset (dataset_add
) are restricted to
those matching the filter_add
condition.
The input dataset and the (restricted) additional dataset are left joined
by the grouping variables (by_vars
). If no grouping variables are
specified, a full join is performed.
If first_cond_lower
is specified, for each observation of the input
dataset the joined dataset is restricted to observations from the first
observation where first_cond_lower
is fulfilled (the observation fulfilling
the condition is included) up to the observation of the input dataset. If for
an observation of the input dataset the condition is not fulfilled, the
observation is removed.
If first_cond_upper
is specified, for each observation of the input
dataset the joined dataset is restricted to observations up to the first
observation where first_cond_upper
is fulfilled (the observation
fulfilling the condition is included). If for an observation of the input
dataset the condition is not fulfilled, the observation is removed.
For an example see the last example in the "Examples" section.
The joined dataset is restricted by the filter_join
condition.
If order
is specified, for each observation of the input dataset the
first or last observation (depending on mode
) is selected.
The variables specified for new_vars
are created (if requested) and
merged to the input dataset. I.e., the output dataset contains all
observations from the input dataset. For observations without a matching
observation in the joined dataset the new variables are set as specified by
missing_values
(or to NA
for variables not in missing_values
).
Observations in the additional dataset which have no matching observation in
the input dataset are ignored.
The output dataset contains all observations and variables of the
input dataset and additionally the variables specified for new_vars
from
the additional dataset (dataset_add
).
derive_var_joined_exist_flag()
, filter_joined()
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag()
,
derive_var_joined_exist_flag()
,
derive_var_merged_ef_msrc()
,
derive_var_merged_exist_flag()
,
derive_var_merged_summary()
,
derive_var_obs_number()
,
derive_var_relative_flag()
,
derive_vars_computed()
,
derive_vars_merged()
,
derive_vars_merged_lookup()
,
derive_vars_transposed()
library(tibble)
library(lubridate)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
# Add AVISIT (based on time windows), AWLO, and AWHI
adbds <- tribble(
~USUBJID, ~ADY,
"1", -33,
"1", -2,
"1", 3,
"1", 24,
"2", NA,
)
windows <- tribble(
~AVISIT, ~AWLO, ~AWHI,
"BASELINE", -30, 1,
"WEEK 1", 2, 7,
"WEEK 2", 8, 15,
"WEEK 3", 16, 22,
"WEEK 4", 23, 30
)
derive_vars_joined(
adbds,
dataset_add = windows,
join_type = "all",
filter_join = AWLO <= ADY & ADY <= AWHI
)
# derive the nadir after baseline and before the current observation
adbds <- tribble(
~USUBJID, ~ADY, ~AVAL,
"1", -7, 10,
"1", 1, 12,
"1", 8, 11,
"1", 15, 9,
"1", 20, 14,
"1", 24, 12,
"2", 13, 8
)
derive_vars_joined(
adbds,
dataset_add = adbds,
by_vars = exprs(USUBJID),
order = exprs(AVAL),
new_vars = exprs(NADIR = AVAL),
join_vars = exprs(ADY),
join_type = "all",
filter_add = ADY > 0,
filter_join = ADY.join < ADY,
mode = "first",
check_type = "none"
)
# add highest hemoglobin value within two weeks before AE,
# take earliest if more than one
adae <- tribble(
~USUBJID, ~ASTDY,
"1", 3,
"1", 22,
"2", 2
)
adlb <- tribble(
~USUBJID, ~PARAMCD, ~ADY, ~AVAL,
"1", "HGB", 1, 8.5,
"1", "HGB", 3, 7.9,
"1", "HGB", 5, 8.9,
"1", "HGB", 8, 8.0,
"1", "HGB", 9, 8.0,
"1", "HGB", 16, 7.4,
"1", "HGB", 24, 8.1,
"1", "ALB", 1, 42,
)
derive_vars_joined(
adae,
dataset_add = adlb,
by_vars = exprs(USUBJID),
order = exprs(AVAL, desc(ADY)),
new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
join_type = "all",
filter_add = PARAMCD == "HGB",
filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
mode = "last"
)
# Add APERIOD, APERIODC based on ADSL
adsl <- tribble(
~USUBJID, ~AP01SDT, ~AP01EDT, ~AP02SDT, ~AP02EDT,
"1", "2021-01-04", "2021-02-06", "2021-02-07", "2021-03-07",
"2", "2021-02-02", "2021-03-02", "2021-03-03", "2021-04-01"
) %>%
mutate(across(ends_with("DT"), ymd)) %>%
mutate(STUDYID = "xyz")
period_ref <- create_period_dataset(
adsl,
new_vars = exprs(APERSDT = APxxSDT, APEREDT = APxxEDT)
)
period_ref
adae <- tribble(
~USUBJID, ~ASTDT,
"1", "2021-01-01",
"1", "2021-01-05",
"1", "2021-02-05",
"1", "2021-03-05",
"1", "2021-04-05",
"2", "2021-02-15",
) %>%
mutate(
ASTDT = ymd(ASTDT),
STUDYID = "xyz"
)
derive_vars_joined(
adae,
dataset_add = period_ref,
by_vars = exprs(STUDYID, USUBJID),
join_vars = exprs(APERSDT, APEREDT),
join_type = "all",
filter_join = APERSDT <= ASTDT & ASTDT <= APEREDT
)
# Add day since last dose (LDRELD)
adae <- tribble(
~USUBJID, ~ASTDT, ~AESEQ,
"1", "2020-02-02", 1,
"1", "2020-02-04", 2
) %>%
mutate(ASTDT = ymd(ASTDT))
ex <- tribble(
~USUBJID, ~EXSDTC,
"1", "2020-01-10",
"1", "2020-01",
"1", "2020-01-20",
"1", "2020-02-03"
)
## Please note that EXSDT is created via the order argument and then used
## for new_vars, filter_add, and filter_join
derive_vars_joined(
adae,
dataset_add = ex,
by_vars = exprs(USUBJID),
order = exprs(EXSDT = convert_dtc_to_dt(EXSDTC)),
join_type = "all",
new_vars = exprs(LDRELD = compute_duration(
start_date = EXSDT, end_date = ASTDT
)),
filter_add = !is.na(EXSDT),
filter_join = EXSDT <= ASTDT,
mode = "last"
)
# first_cond_lower and first_cond_upper argument
myd <- tribble(
~subj, ~day, ~val,
"1", 1, "++",
"1", 2, "-",
"1", 3, "0",
"1", 4, "+",
"1", 5, "++",
"1", 6, "-",
"2", 1, "-",
"2", 2, "++",
"2", 3, "+",
"2", 4, "0",
"2", 5, "-",
"2", 6, "++"
)
# derive last "++" day before "0" where all results in between are "+" or "++"
derive_vars_joined(
myd,
dataset_add = myd,
by_vars = exprs(subj),
order = exprs(day),
mode = "first",
new_vars = exprs(prev_plus_day = day),
join_vars = exprs(val),
join_type = "before",
first_cond_lower = val.join == "++",
filter_join = val == "0" & all(val.join %in% c("+", "++"))
)
# derive first "++" day after "0" where all results in between are "+" or "++"
derive_vars_joined(
myd,
dataset_add = myd,
by_vars = exprs(subj),
order = exprs(day),
mode = "last",
new_vars = exprs(next_plus_day = day),
join_vars = exprs(val),
join_type = "after",
first_cond_upper = val.join == "++",
filter_join = val == "0" & all(val.join %in% c("+", "++"))
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.