View source: R/derive_vars_joined_summary.R
derive_vars_joined_summary | R Documentation |
The function summarizes variables from an additional dataset and adds the summarized values as new variables to the input dataset. The selection of the observations from the additional dataset can depend on variables from both datasets. For example, all doses before the current observation can be selected and their sum added to the input dataset.
derive_vars_joined_summary(
  dataset,
  dataset_add,
  by_vars = NULL,
  order = NULL,
  new_vars,
  tmp_obs_nr_var = NULL,
  join_vars = NULL,
  join_type,
  filter_add = NULL,
  first_cond_lower = NULL,
  first_cond_upper = NULL,
  filter_join = NULL,
  missing_values = NULL,
  check_type = "warning"
)
dataset | Input dataset. The variables specified by the |
dataset_add | Additional dataset. The variables specified by the |
by_vars | Grouping variables. The two datasets are joined by the specified variables. Variables can be renamed by naming the element, i.e. |
order | Sort order. The specified variables are used to determine the order of the records if If an expression is named, e.g., For handling of |
new_vars | Variables to add. The new variables can be defined by named expressions, i.e., |
tmp_obs_nr_var | Temporary observation number. The specified variable is added to the input dataset ( The variable is not included in the output dataset. To include it specify it for |
join_vars | Variables to use from additional dataset. Any extra variables required from the additional dataset for If an expression is named, e.g., The variables are not included in the output dataset. |
join_type | Observations to keep after joining. The argument determines which of the joined observations are kept with respect to the original observation. For example, if |
filter_add | Filter for additional dataset ( Only observations from Variables created by The condition can include summary functions like |
first_cond_lower | Condition for selecting range of data (before). If this argument is specified, the other observations are restricted from the first observation before the current observation where the specified condition is fulfilled up to the current observation. If the condition is not fulfilled for any of the other observations, no observations are considered. This argument should be specified if |
first_cond_upper | Condition for selecting range of data (after). If this argument is specified, the other observations are restricted up to the first observation where the specified condition is fulfilled. If the condition is not fulfilled for any of the other observations, no observations are considered. This argument should be specified if |
filter_join | Filter for the joined dataset. The specified condition is applied to the joined dataset. Therefore variables from both datasets Variables created by The condition can include summary functions like |
missing_values | Values for non-matching observations. For observations of the input dataset ( |
check_type | Check uniqueness? If The uniqueness is checked only if |
The variables specified by order are added to the additional dataset (dataset_add).
The variables specified by join_vars are added to the additional dataset (dataset_add).
The records from the additional dataset (dataset_add) are restricted to those matching the filter_add condition.
The input dataset and the (restricted) additional dataset are left joined by the grouping variables (by_vars). If no grouping variables are specified, a full join is performed.
If first_cond_lower is specified, for each observation of the input dataset the joined dataset is restricted to observations from the first observation where first_cond_lower is fulfilled (the observation fulfilling the condition is included) up to the observation of the input dataset. If for an observation of the input dataset the condition is not fulfilled, the observation is removed.
If first_cond_upper is specified, for each observation of the input dataset the joined dataset is restricted to observations up to the first observation where first_cond_upper is fulfilled (the observation fulfilling the condition is included). If for an observation of the input dataset the condition is not fulfilled, the observation is removed.
For an example see the last example in the "Examples" section.
The joined dataset is restricted by the filter_join condition.
The variables specified for new_vars are created and merged to the input dataset. I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the joined dataset the new variables are set as specified by missing_values (or to NA for variables not in missing_values). Observations in the additional dataset which have no matching observation in the input dataset are ignored.
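The core of the steps above can be pictured with plain dplyr verbs. The following is only a conceptual sketch, not admiral's implementation: it covers the left join by the grouping variable, the filter_join restriction, and the new_vars summary, and it ignores order, join_type, first_cond_lower, first_cond_upper, and missing_values. The helper column tmp_obs_nr and the small adex and adae datasets are assumptions made for illustration, and relationship = "many-to-many" assumes dplyr 1.1.0 or later.

```r
library(tibble)
library(dplyr, warn.conflicts = FALSE)

# Illustrative data (assumption): doses and adverse events by study day
adex <- tribble(
  ~USUBJID, ~ADY, ~AVAL,
  "1",         1,    10,
  "1",         8,    20,
  "1",        15,    10,
  "2",         8,     5
)

adae <- tribble(
  ~USUBJID, ~ADY, ~AEDECOD,
  "1",         2, "Fatigue",
  "1",         9, "Influenza",
  "2",         4, "Parasomnia"
)

# Hypothetical helper column identifying each input observation so that
# the summary can be merged back afterwards
adae_nr <- mutate(adae, tmp_obs_nr = row_number())

cumdose <- adae_nr %>%
  # left join by the grouping variable; clashing variables from the
  # additional dataset get the ".join" suffix
  left_join(
    adex,
    by = "USUBJID",
    suffix = c("", ".join"),
    relationship = "many-to-many"
  ) %>%
  # plays the role of filter_join: keep doses up to the day of the event
  filter(ADY.join <= ADY) %>%
  # plays the role of new_vars: summarize per input observation
  group_by(tmp_obs_nr) %>%
  summarise(CUMDOSA = sum(AVAL, na.rm = TRUE), .groups = "drop")

# Merge the summary back; input observations without joined records get NA
result <- adae_nr %>%
  left_join(cumdose, by = "tmp_obs_nr") %>%
  select(-tmp_obs_nr)

result
```

The last left join is what keeps every observation of the input dataset in the output, including those for which no joined record passes the filter.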
Note: This function creates temporary datasets which may be much bigger than the input datasets. If this causes memory issues, please try setting the admiral option save_memory to TRUE (see set_admiral_options()). This reduces the memory consumption but increases the run-time.
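As a minimal sketch of the note above, the option could be switched on as follows. It assumes an admiral version that provides the save_memory option and that get_admiral_option() can be used to read it back.

```r
library(admiral)

# Trade run-time for lower memory consumption in the "joined" functions
set_admiral_options(save_memory = TRUE)

# Read the current setting back (assumes get_admiral_option() knows this option)
get_admiral_option("save_memory")
```

The setting applies to the current R session, so it only needs to be set once before calling the derivation.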
The output dataset contains all observations and variables of the input dataset and additionally the variables specified for new_vars derived from the additional dataset (dataset_add).
The examples focus on the functionality specific to this function. For examples of functionality common to all "joined" functions like filter_join, filter_add, join_vars, ... please see the examples of derive_vars_joined().
Cumulative dose (CUMDOSA)
Deriving the cumulative actual dose up to the day of the adverse event in the ADAE dataset.
USUBJID is specified for by_vars to join the ADAE and the ADEX datasets by subject.
filter_join is specified to restrict the ADEX dataset to the days up to the adverse event. ADY.join refers to the study day in ADEX.
The new variable CUMDOSA is defined by the new_vars argument. It is set to the sum of AVAL.
As ADY from ADEX is used in filter_join (but not in new_vars), it needs to be specified for join_vars.
The join_type is set to "all" to consider all records in the joined dataset. join_type = "before" can't be used here because then doses on the same day as the adverse event would be excluded.
library(admiral) # provides derive_vars_joined_summary() and exprs()
library(tibble)
library(dplyr, warn.conflicts = FALSE)

adex <- tribble(
  ~USUBJID, ~ADY, ~AVAL,
  "1",         1,    10,
  "1",         8,    20,
  "1",        15,    10,
  "2",         8,     5
)

adae <- tribble(
  ~USUBJID, ~ADY, ~AEDECOD,
  "1",         2, "Fatigue",
  "1",         9, "Influenza",
  "1",        15, "Theft",
  "1",        15, "Fatigue",
  "2",         4, "Parasomnia",
  "3",         2, "Truancy"
)

derive_vars_joined_summary(
  dataset = adae,
  dataset_add = adex,
  by_vars = exprs(USUBJID),
  filter_join = ADY.join <= ADY,
  join_type = "all",
  join_vars = exprs(ADY),
  new_vars = exprs(CUMDOSA = sum(AVAL, na.rm = TRUE))
)
#> # A tibble: 6 × 4
#>   USUBJID   ADY AEDECOD    CUMDOSA
#>   <chr>   <dbl> <chr>        <dbl>
#> 1 1           2 Fatigue         10
#> 2 1           9 Influenza       30
#> 3 1          15 Theft           40
#> 4 1          15 Fatigue         40
#> 5 2           4 Parasomnia      NA
#> 6 3           2 Truancy         NA
Values for records without matching records (missing_values)
By default, the new variables are set to NA for records without matching records in the restricted additional dataset. This can be changed by specifying the missing_values argument.
derive_vars_joined_summary(
  dataset = adae,
  dataset_add = adex,
  by_vars = exprs(USUBJID),
  filter_join = ADY.join <= ADY,
  join_type = "all",
  join_vars = exprs(ADY),
  new_vars = exprs(CUMDOSE = sum(AVAL, na.rm = TRUE)),
  missing_values = exprs(CUMDOSE = 0)
)
#> # A tibble: 6 × 4
#>   USUBJID   ADY AEDECOD    CUMDOSE
#>   <chr>   <dbl> <chr>        <dbl>
#> 1 1           2 Fatigue         10
#> 2 1           9 Influenza       30
#> 3 1          15 Theft           40
#> 4 1          15 Fatigue         40
#> 5 2           4 Parasomnia       0
#> 6 3           2 Truancy          0
Selecting records (join_type = "before", join_type = "after")
The join_type argument can be used to select records from the additional dataset. For example, if join_type = "before" is specified, only records before the current observation are selected. If join_type = "after" is specified, only records after the current observation are selected.
To illustrate this, a variable (SELECTED_DAYS) is derived which contains the selected days.
mydata <- tribble(
  ~DAY,
  1,
  2,
  3,
  4,
  5
)

derive_vars_joined_summary(
  mydata,
  dataset_add = mydata,
  order = exprs(DAY),
  join_type = "before",
  new_vars = exprs(SELECTED_DAYS = paste(DAY, collapse = ", "))
)
#> # A tibble: 5 × 2
#>     DAY SELECTED_DAYS
#>   <dbl> <chr>
#> 1     1 <NA>
#> 2     2 1
#> 3     3 1, 2
#> 4     4 1, 2, 3
#> 5     5 1, 2, 3, 4

derive_vars_joined_summary(
  mydata,
  dataset_add = mydata,
  order = exprs(DAY),
  join_type = "after",
  new_vars = exprs(SELECTED_DAYS = paste(DAY, collapse = ", "))
)
#> # A tibble: 5 × 2
#>     DAY SELECTED_DAYS
#>   <dbl> <chr>
#> 1     1 2, 3, 4, 5
#> 2     2 3, 4, 5
#> 3     3 4, 5
#> 4     4 5
#> 5     5 <NA>
Restricting to a range of records (first_cond_lower, first_cond_upper)
The first_cond_lower and first_cond_upper arguments can be used to restrict the joined dataset to a certain range of records. For example, if first_cond_lower is specified, the joined dataset is restricted to the records from the last observation before the current record where the condition is fulfilled up to the current record.
Please note:
If the condition is not fulfilled for any of the records, no records are selected.
The restriction implied by join_type is applied first.
If a variable is contained in both dataset and dataset_add like DAY in the example below, DAY refers to the value from dataset and DAY.join to the value from dataset_add.
To illustrate this, a variable (SELECTED_DAYS) is derived which contains the selected days.
derive_vars_joined_summary(
  mydata,
  dataset_add = mydata,
  order = exprs(DAY),
  join_type = "before",
  first_cond_lower = DAY.join == 2,
  new_vars = exprs(SELECTED_DAYS = paste(sort(DAY), collapse = ", "))
)
#> # A tibble: 5 × 2
#>     DAY SELECTED_DAYS
#>   <dbl> <chr>
#> 1     1 <NA>
#> 2     2 <NA>
#> 3     3 2
#> 4     4 2, 3
#> 5     5 2, 3, 4

derive_vars_joined_summary(
  mydata,
  dataset_add = mydata,
  order = exprs(DAY),
  join_type = "after",
  first_cond_upper = DAY.join == 4,
  new_vars = exprs(SELECTED_DAYS = paste(DAY, collapse = ", "))
)
#> # A tibble: 5 × 2
#>     DAY SELECTED_DAYS
#>   <dbl> <chr>
#> 1     1 2, 3, 4
#> 2     2 3, 4
#> 3     3 4
#> 4     4 <NA>
#> 5     5 <NA>

derive_vars_joined_summary(
  mydata,
  dataset_add = mydata,
  order = exprs(DAY),
  join_type = "all",
  first_cond_lower = DAY.join == 2,
  first_cond_upper = DAY.join == 4,
  new_vars = exprs(SELECTED_DAYS = paste(sort(DAY), collapse = ", "))
)
#> # A tibble: 5 × 2
#>     DAY SELECTED_DAYS
#>   <dbl> <chr>
#> 1     1 2, 3, 4
#> 2     2 2, 3, 4
#> 3     3 2, 3, 4
#> 4     4 2, 3, 4
#> 5     5 2, 3, 4
For each planned visit the average score within the week before the visit should be derived if at least three assessments are available.
Please note that the condition for the number of assessments is specified in new_vars and not in filter_join. This is because the number of assessments within the week before the visit should be counted but not the number of assessments available for the subject.
planned_visits <- tribble(
  ~AVISIT,  ~ADY,
  "WEEK 1",    8,
  "WEEK 4",   29,
  "WEEK 8",   57
) %>%
  mutate(USUBJID = "1", .before = AVISIT)

adqs <- tribble(
  ~ADY, ~AVAL,
     1,    10,
     2,    12,
     4,     9,
     5,     9,
     7,    10,
    25,    11,
    27,    10,
    29,    10,
    41,     8,
    42,     9,
    44,     5
) %>%
  mutate(USUBJID = "1")

derive_vars_joined_summary(
  planned_visits,
  dataset_add = adqs,
  by_vars = exprs(USUBJID),
  filter_join = ADY - 7 <= ADY.join & ADY.join < ADY,
  join_type = "all",
  join_vars = exprs(ADY),
  new_vars = exprs(AVAL = if_else(n() >= 3, mean(AVAL, na.rm = TRUE), NA))
)
#> # A tibble: 3 × 4
#>   USUBJID AVISIT   ADY  AVAL
#>   <chr>   <chr>  <dbl> <dbl>
#> 1 1       WEEK 1     8    10
#> 2 1       WEEK 4    29    NA
#> 3 1       WEEK 8    57    NA
derive_vars_joined(), derive_var_merged_summary(), derive_var_joined_exist_flag(), filter_joined()
General Derivation Functions for all ADaMs that returns variable appended to dataset:
derive_var_extreme_flag(), derive_var_joined_exist_flag(), derive_var_merged_ef_msrc(), derive_var_merged_exist_flag(), derive_var_merged_summary(), derive_var_obs_number(), derive_var_relative_flag(), derive_vars_cat(), derive_vars_computed(), derive_vars_joined(), derive_vars_merged(), derive_vars_merged_lookup(), derive_vars_transposed()