View source: R/smdi_diagnose.R
smdi_diagnose | R Documentation |
This function bundles and calls all three group diagnostics and returns the most important summary metrics. For more information and details, please refer to the individual functions.
Important: do not include variables such as ID variables, ZIP codes, or dates in data.
smdi_diagnose(
  data = NULL,
  covar = NULL,
  median = TRUE,
  includeNA = FALSE,
  train_test_ratio = c(0.7, 0.3),
  set_seed = 42,
  ntree = 1000,
  n_cores = 1,
  model = c("logistic", "linear", "cox"),
  form_lhs = NULL,
  exponentiated = FALSE
)
data | dataframe or tibble object with partially observed/missing variables |
covar | character covariate or covariate vector with partially observed variable/column name(s) to investigate. If NULL, the function automatically includes all columns with at least one missing observation, and all remaining covariates are used as predictors |
median | logical, whether the median (TRUE; recommended default) or the mean (FALSE) of all absolute standardized mean differences (asmd) should be computed (see smdi_asmd()) |
includeNA | logical, should missingness of other partially observed covariates be explicitly modeled when computing absolute standardized mean differences (default is FALSE) |
train_test_ratio | numeric vector indicating the train/test split ratio for the random forest missingness prediction model; the default is c(.7, .3) |
set_seed | seed for reproducibility of the random forest missingness prediction model (defaults to 42) |
ntree | integer, number of trees for the random forest missingness prediction model (defaults to 1000 trees) |
n_cores | integer; if > 1, computations are parallelized across the number of cores specified in n_cores (UNIX systems only) |
model | character describing which outcome model to fit to assess the association between the covar missingness indicator and the outcome. Currently supported model types are logistic, linear and cox (see smdi_outcome) |
form_lhs | string specifying the left-hand side of the outcome formula (see smdi_outcome) |
exponentiated | logical, should the results of the outcome regression assessing the association between missingness and outcome be exponentiated (default is FALSE) |
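The covar = NULL behaviour described above (every column with at least one missing observation is investigated, the rest serve as predictors) can be pictured with a few lines of base R. This is only an illustrative sketch on a made-up data frame; the column names are hypothetical and this is not the package's internal code.

```r
# Toy data frame; column names are hypothetical.
toy <- data.frame(
  age_num  = c(61, 70, 58, 55),
  egfr_cat = c(1, NA, 0, 1),
  pdl1_num = c(NA, 40, 35, NA)
)

# Count missing values per column.
n_missing <- colSums(is.na(toy))

# Columns with at least one missing observation would be investigated ...
partially_observed <- names(n_missing)[n_missing > 0]

# ... and the remaining, fully observed columns would act as predictors.
predictors <- setdiff(names(toy), partially_observed)

partially_observed
predictors
```

Running a check like this before calling smdi_diagnose() makes it easy to confirm which covariates will be picked up when covar is left as NULL.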
Wrapper for the individual diagnostic functions.
smdi object including a summary table of all three smdi group diagnostics:
Group 1 diagnostic:
asmd_mean or asmd_median: mean/median absolute standardized mean difference (and min, max) of patient characteristics between those without (1) and with (0) the observed covariate
hotteling_p: p-value of Hotelling's test. Rejecting H0 means that Hotelling's test detects a significant difference in the distribution of patient characteristics between patients without (1) and with (0) the observed covariate
Group 2 diagnostic:
rf_auc: the area under the receiver operating characteristic curve (AUC) as a measure of the ability to predict the missingness of the partially observed covariate
Group 3 diagnostic:
estimate_univariate: univariate association between missingness indicator of covar and outcome
estimate_adjusted: association between missingness indicator of covar and outcome conditional on other fully observed covariates and missing indicator variables of other partially observed covariates
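To make the Group 1 metric more concrete, the following sketch computes an absolute standardized mean difference for a single continuous covariate by hand, using the common definition (absolute difference in group means divided by the square root of the average of the two group variances). The data and variable names are made up, and smdi_asmd() may differ in details such as the handling of categorical covariates.

```r
# Toy data: x is a fully observed covariate; miss is the missingness
# indicator of a partially observed covariate (1 = missing, 0 = observed).
x    <- c(1, 2, 3, 4, 5, 6)
miss <- c(0, 0, 0, 1, 1, 1)

# Group means of x among patients with and without the covariate observed.
mean_obs  <- mean(x[miss == 0])
mean_miss <- mean(x[miss == 1])

# Pooled standard deviation: square root of the average group variance.
pooled_sd <- sqrt((var(x[miss == 0]) + var(x[miss == 1])) / 2)

# Absolute standardized mean difference for this one covariate.
asmd <- abs(mean_miss - mean_obs) / pooled_sd
asmd
```

The summary table then reports the median (or mean) of such values across all patient characteristics, together with the minimum and maximum.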
TBD
smdi_asmd
smdi_hotelling
smdi_little
smdi_rf
smdi_outcome
library(smdi)

smdi_diagnose(
  data = smdi_data,
  covar = "egfr_cat",
  model = "cox",
  form_lhs = "Surv(eventtime, status)"
)
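The note above about excluding ID variables, ZIP codes, and dates can be handled before the call. A minimal base R sketch, assuming a made-up data frame whose column names (patient_id, zip_code, index_date, and so on) are purely hypothetical:

```r
# Toy data; all column names here are hypothetical.
toy <- data.frame(
  patient_id = 1:4,
  zip_code   = c("02115", "02116", "02117", "02118"),
  index_date = as.Date("2020-01-01") + 0:3,
  age_num    = c(61, 70, NA, 55),
  egfr_cat   = c(1, NA, 0, 1),
  stringsAsFactors = FALSE
)

# Columns to exclude by name, plus any column stored as a Date.
drop_by_name <- c("patient_id", "zip_code")
is_date <- vapply(toy, inherits, logical(1), what = "Date")

# Keep only the columns that are neither named for removal nor dates.
keep <- !(names(toy) %in% drop_by_name) & !is_date
analysis_data <- toy[, keep, drop = FALSE]

names(analysis_data)
```

The reduced data frame (here analysis_data) is what would then be passed as the data argument.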