dh.makeStrata | R Documentation |
For many analyses you may want to create strata of repeated measures data within specified bands. For example, you may have BMI measures between ages 0-18, but want to create a variable for each subject which is their BMI between ages 9-11. This function automates this process.
dh.makeStrata(
df = NULL,
id_var = NULL,
age_var = NULL,
var_to_subset = NULL,
bands = NULL,
mult_action = NULL,
mult_vals = NULL,
keep_vars = NULL,
new_obj = NULL,
band_action = NULL,
conns = NULL,
checks = TRUE,
df_name = NULL
)
df |
Character specifying a server-side data frame. |
id_var |
Character giving the name of the column within 'df' which uniquely identifies each subject. |
age_var |
Character specifying age or time variable in 'df'. |
var_to_subset |
Character specifying the variable in 'df' from which to derive strata. |
bands |
Numeric vector of alternating lower and upper values specifying the bands in which to derive strata of 'var_to_subset'. |
mult_action |
Character specifying how to handle cases where a subject has more than one measurement within a specified band. Use "earliest" to take the earliest measurement, "latest" to take the latest measurement and "nearest" to take the measurement nearest to the value(s) specified in 'mult_vals'. |
mult_vals |
Numeric vector specifying, for each age band, the value to which the selected measurement should be closest when a subject has more than one value in that band. Required only if mult_action is "nearest". The order and length of the vector should correspond to the order and number of the bands. |
keep_vars |
Optionally, a vector of variable names within df to include within each stratum created. |
new_obj |
Character specifying name for created server-side object. |
band_action |
Character specifying how the values provided in 'bands' are evaluated when creating the strata. |
conns |
DataSHIELD connections object. |
checks |
Logical; if TRUE checks are performed prior to running the function. Default is TRUE. |
df_name |
Retired argument name. Please use ‘new_obj’ instead. |
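As an illustration of how 'bands' and 'mult_vals' line up, the base R sketch below pairs an alternating lower/upper vector into bands and checks that one target value is supplied per band. The band limits, the sanity checks, and the names "growth", "child_id", "age", "bmi" and "bmi_strata" are hypothetical and chosen only for this example; 'band_action' and 'conns' are left out because their values depend on your setup.

```r
# Three bands (0-1, 2-5 and 9-11) given as alternating lower/upper values
bands <- c(0, 1, 2, 5, 9, 11)

# Pair the values up to see which bands the vector describes
band_tbl <- data.frame(
  lower = bands[seq(1, length(bands), by = 2)],
  upper = bands[seq(2, length(bands), by = 2)]
)

# One target value per band, in the same order as the bands
mult_vals <- c(0.5, 3.5, 10)

# Sanity checks chosen for this sketch (not necessarily those run by the function)
stopifnot(
  length(bands) %% 2 == 0,
  all(band_tbl$lower < band_tbl$upper),
  length(mult_vals) == nrow(band_tbl)
)

# Hypothetical argument values collected in a list
args <- list(
  df = "growth", id_var = "child_id", age_var = "age", var_to_subset = "bmi",
  bands = bands, mult_action = "nearest", mult_vals = mult_vals,
  new_obj = "bmi_strata"
)
```

Once 'band_action' and 'conns' are added, a list like this could be passed to the function with do.call().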
The steps performed by this function are equivalent to the following dplyr chain:
df %>% group_by(band, id) %>% arrange() %>% slice(1)
One of the complexities of this operation is how to deal with cases where subjects have multiple observations within a specified band. This is handled by first sorting the group so that the required value is first. When the data is reshaped to wide format all but the first value for subjects with multiple observations within a band are dropped.
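The sort-then-keep-first idea can be tried locally on toy data. The base R sketch below is not the server-side implementation; the helper pick_one() and the data are hypothetical, and it handles a single band only. It orders each subject's rows so that the required measurement comes first, then drops the rest.

```r
# Toy long-format data for one band (9-11): subject 1 has three measurements
toy <- data.frame(
  id  = c(1, 1, 1, 2, 3),
  age = c(9.1, 10.2, 10.9, 10.1, 9.9),
  bmi = c(16.0, 17.0, 17.5, 18.2, 15.4)
)

# Hypothetical helper: keep one row per subject following a mult_action-style rule
pick_one <- function(data, action = c("earliest", "latest", "nearest"),
                     target = NULL) {
  action <- match.arg(action)
  if (action == "nearest" && is.null(target)) {
    stop("'target' is required when action is \"nearest\"")
  }
  # Sort key: the smallest key marks the row to keep
  key <- switch(action,
    earliest = data$age,
    latest   = -data$age,
    nearest  = abs(data$age - target)
  )
  sorted <- data[order(data$id, key), ]
  # Keep only the first row for each subject
  sorted[!duplicated(sorted$id), ]
}

earliest <- pick_one(toy, "earliest")
latest   <- pick_one(toy, "latest")
nearest  <- pick_one(toy, "nearest", target = 10)
```

For subject 1, the three rules select the measurements at ages 9.1, 10.9 and 10.2 respectively; subjects with a single measurement are unaffected.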
Note that for big datasets this will take a long time to run.
Server-side data frame in wide format containing the derived variables. For each band specified, at least two variables will be returned: var_to_subset and age_var. The suffix .lower_band identifies the band for that variable.
If argument keep_vars is not NULL, then additional variables will be added to the data frame representing these variables within the strata created.
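To give a feel for the wide layout, the sketch below reshapes toy data locally with base R's reshape(). It assumes the suffix is the numeric lower bound of the band joined with a dot; the column names produced by dh.makeStrata on the server may differ, and the data and variable names here are hypothetical.

```r
# One selected row per subject and band; 'band' holds the lower bound of the band
picked <- data.frame(
  id   = c(1, 1, 2),
  band = c(0, 9, 9),
  bmi  = c(14.1, 17.0, 18.2),
  age  = c(0.5, 10.2, 10.1)
)

# Wide format: one row per subject, one bmi/age pair per band
wide <- reshape(
  picked,
  idvar = "id", timevar = "band",
  direction = "wide", sep = "."
)

names(wide)
```

Subject 2 has no measurement in the band starting at 0, so the corresponding columns are NA for that row.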
Other data manipulation functions: dh.dropCols(), dh.makeAgePolys(), dh.makeIQR(), dh.quartileSplit(), dh.renameVars(), dh.tidyEnv(), dh.zByGroup()