get_candidate_covariates
generates the list of candidate empirical covariates based on their prevalence
within each domain (dimension). This is the first step in the automated covariate selection process. See the 'Automated Covariate Selection'
section below for more details on the overall process.
get_candidate_covariates(
  df,
  domainVarname,
  eventCodeVarname,
  patientIdVarname,
  patientIdVector,
  n = 200,
  min_num_patients = 100
)
df
The input data.frame.
domainVarname
The name of the variable (field) in df that contains the domain of the covariate.
eventCodeVarname
The name of the variable in df that contains the covariate codes (e.g. CCS, ICD9).
patientIdVarname
The name of the variable in df that contains the patient identifier.
patientIdVector
The 1-D vector with all the patient identifiers. The length of this vector should be equal to the number of distinct patients in df.
n
The maximum number of empirical candidate baseline covariates that should be returned within each domain. By default, n is 200.
min_num_patients
The minimum number of patients in whom a covariate must have occurred for it to be considered for selection. By default, min_num_patients is 100.
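As an illustration of the input these arguments describe, here is a minimal sketch of a long-format table with one row per patient event. The data, the object names toy_df and toy_ids, and the column names are hypothetical and chosen only for this example.

```r
# Hypothetical toy input: one row per patient event (long format).
toy_df <- data.frame(
  person_id  = c(1, 1, 2, 2, 3, 3, 4),
  domain     = c("dx", "rx", "dx", "dx", "dx", "rx", "px"),
  event_code = c("19900", "A10", "19900", "401", "19900", "A10", "P77"),
  stringsAsFactors = FALSE
)

# patientIdVector should hold each patient identifier exactly once,
# so its length matches the number of distinct patients in the table.
toy_ids <- unique(toy_df$person_id)
length(toy_ids)
```

With such a table, the three *Varname arguments would simply be the strings "domain", "event_code" and "person_id".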
The theoretical details of the high-dimensional propensity score (HDPS) algorithm are given in the publication listed in the References
section.
get_candidate_covariates
implements what is described in the 'Identify candidate empirical covariates' section
of the article.
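To make the idea of prevalence-based candidate selection concrete, the following base R sketch counts distinct patients per code within each domain, drops codes seen in fewer than min_num_patients patients, and keeps the n most prevalent codes per domain. It is an assumption about the general shape of the step, not the package's implementation: sketch_candidates is a hypothetical helper, and the column names person_id, domain and event_code are hard-coded for brevity.

```r
# Hypothetical helper, NOT part of autoCovariateSelection.
# Assumes columns named person_id, domain and event_code.
sketch_candidates <- function(df, n = 200, min_num_patients = 100) {
  # One row per patient/domain/code so repeated events do not inflate counts
  uniq <- unique(df[, c("person_id", "domain", "event_code")])
  # Number of distinct patients for each domain/code pair
  counts <- aggregate(person_id ~ domain + event_code, data = uniq, FUN = length)
  names(counts)[names(counts) == "person_id"] <- "n_patients"
  # Drop codes that occur in too few patients
  counts <- counts[counts$n_patients >= min_num_patients, , drop = FALSE]
  if (nrow(counts) == 0) return(counts)
  # Within each domain, keep the n most prevalent codes
  top <- lapply(split(counts, counts$domain), function(d) {
    d <- d[order(-d$n_patients, d$event_code), , drop = FALSE]
    head(d, n)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

# Hypothetical toy data
toy <- data.frame(
  person_id  = c(1, 1, 1, 2, 2, 3, 3, 4),
  domain     = c("dx", "dx", "rx", "dx", "dx", "dx", "rx", "dx"),
  event_code = c("19900", "19900", "A10", "19900", "401", "19900", "A10", "401"),
  stringsAsFactors = FALSE
)
res <- sketch_candidates(toy, n = 1, min_num_patients = 2)
res
```

In the toy data, code 19900 occurs in three patients and code 401 in two within 'dx', so with n = 1 only 19900 is kept for that domain, alongside A10 for 'rx'.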
A named list containing three R objects:
covars
A 1-D vector containing the names of the selected baseline covariates from each domain. For each domain in df, the number of covars is at most n.
covars_data
The data.frame filtered from df to contain only the selected covars. The values of the eventCodeVarname field are prefixed with the corresponding domain name. For example, if the event code is 19900 and the domain is 'dx', then the covariate name will be 'dx_19900'.
patientIds
The patient ids present in the original input df. This is exactly the same as the input patientIdVector.
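The domain-prefixed naming described for covars_data can be reproduced with base R's paste(). This is only an illustration of the naming pattern with hypothetical values, not code taken from the package.

```r
# Hypothetical domain and event code values
domain     <- c("dx", "rx")
event_code <- c("19900", "A10")

# Prefix each event code with its domain, separated by an underscore
covar_name <- paste(domain, event_code, sep = "_")
covar_name
# [1] "dx_19900" "rx_A10"
```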
The three steps in automated covariate selection are listed below, each with the function that implements it:
Identify candidate empirical covariates: get_candidate_covariates
Assess recurrence: get_recurrence_covariates
Prioritize covariates: get_prioritised_covariates
Dennis Robert dennis.robert.nm@gmail.com
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc
library("autoCovariateSelection")
library("dplyr") # provides %>%, select() and distinct() used below
data(rwd)
head(rwd, 3)
# select the elements that are unique to each patient: treatment and outcome
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
patientIds <- basetable$person_id
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
  eventCodeVarname = "event_code", patientIdVarname = "person_id",
  patientIdVector = patientIds, n = 100, min_num_patients = 10)
out1 <- step1$covars_data # this will be the input to get_recurrence_covariates()