post_imp_diag: Post imputation diagnostics
In missCompare: Intuitive Missing Data Imputation Framework


post_imp_diag serves as post imputation diagnostics. The function compares the original dataset (with missing data) with the imputed dataset. The function outputs statistics and visualizations that will help the user compare the original and the imputed datasets.

post_imp_diag(X_orig, X_imp, scale = TRUE, n.boot = 100)

`X_orig`	Dataframe - the original data that contains missing values.
`X_imp`	Dataframe - the imputed data with no missing values.
`scale`	Boolean with default TRUE. If TRUE, all numeric variables in the original dataframe (with missingness) are centered and scaled to mean = 0 and standard deviation = 1. Set this to TRUE or FALSE depending on whether the imputed dataframe holds scaled or unscaled values (which is controlled by the scale argument in `impute_data`). Factor variables will not be scaled.
`n.boot`	Number of bootstrap iterations to generate mean pairwise Pearson correlation coefficients and 95% confidence intervals for variable pairs from the original and the imputed dataframes.
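As an illustration of what `scale = TRUE` implies, here is a minimal base R sketch, not the package's internal code. The data frame `df` and the helper `scale_numeric` are hypothetical names introduced for this example. It standardizes numeric columns using the observed (non-missing) values and leaves factor columns untouched.

```r
# Hypothetical example data with missing values and one factor column
df <- data.frame(
  age   = c(23, 35, NA, 51, 42),
  bmi   = c(21.5, NA, 27.3, 30.1, 24.8),
  group = factor(c("a", "b", "a", NA, "b"))
)

# Center and scale numeric columns only; factors are returned unchanged.
# scale() ignores NA when computing the mean and standard deviation,
# and missing entries stay missing.
scale_numeric <- function(d) {
  is_num <- vapply(d, is.numeric, logical(1))
  d[is_num] <- lapply(d[is_num], function(x) as.numeric(scale(x)))
  d
}

df_scaled <- scale_numeric(df)
```

If the imputed dataframe was produced on the raw scale, the equivalent choice would be `scale = FALSE`, so that both dataframes are compared in the same units.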

This function takes the original dataframe and the imputed dataframe and produces plots that allow the user to compare the distributions of the original values and the imputed values for each numeric variable. If factors are present in the dataframes, the function will recognize this and create bar charts for them. In addition, the function will calculate bootstrapped pairwise Pearson correlation coefficients between numeric variables in the original dataframe (with missingness) and in the imputed dataframe, and plot these so the user can assess whether the imputation distorted the original data structure. The function will also visualize variable clusters in the original dataframe and in the imputed one. If the imputation algorithm performs well, the variable distributions and the variable clusters should be similar in both.
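The bootstrapped correlation step can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's implementation: the helper `boot_cor`, the simulated data, and the use of a percentile interval for the 95% confidence interval are all choices made for this example.

```r
set.seed(1)

# Hypothetical numeric data with some values set to missing
n <- 200
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)
y[sample(n, 30)] <- NA
df_orig <- data.frame(x = x, y = y)

# Bootstrap the Pearson correlation of one variable pair:
# resample rows with replacement, compute r on complete pairs,
# then summarize with the mean and a percentile 95% interval.
boot_cor <- function(d, v1, v2, n.boot = 100) {
  r <- replicate(n.boot, {
    idx <- sample(nrow(d), replace = TRUE)
    cor(d[idx, v1], d[idx, v2], use = "pairwise.complete.obs")
  })
  c(mean  = mean(r),
    lower = unname(quantile(r, 0.025)),
    upper = unname(quantile(r, 0.975)))
}

res <- boot_cor(df_orig, "x", "y", n.boot = 100)
```

Running the same helper on the imputed dataframe and comparing the two sets of coefficients pair by pair gives the kind of comparison the correlation plot is meant to show.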

`Histograms`	List of histograms of all numeric variables. The histograms show the original values and the imputed values overlaid for each variable in the dataframe
`Boxplots`	List of boxplots of all numeric variables. The boxplots show the original values and the imputed values for each variable in the dataframe. As usual, the boxplots show the median, the IQR and the range of values
`Barcharts`	List of bar charts of all categorical (factor) variables. The bar charts show the original categories and the imputed categories for each categorical variable in the dataframe. Bar charts will only be output if scale is set to FALSE and both the original and imputed data contain the same factor variables
`Statistics`	List of output statistics for all variables. A named vector containing means and standard deviations of the original and imputed values, P value from Welch's t test and D test statistic from a Kolmogorov–Smirnov test comparing the original and the imputed values by variable
`Variable_clusters_orig`	Variable clusters based on the original dataframe (with missingness). Regardless of the argument scale being set to TRUE or FALSE, the clusters are assessed based on normalized data
`Variable_clusters_imp`	Variable clusters based on the imputed dataframe. Regardless of the argument scale being set to TRUE or FALSE, the clusters are assessed based on normalized data
`Correlation_stats`	Mean pairwise Pearson's correlation coefficients and 95% confidence intervals from the original dataframe (with missingness) and the imputed dataframe
`Correlation_plot`	Scatter plot of mean pairwise Pearson's correlation coefficients from the original dataframe (with missingness) and the imputed dataframe. The blue line represents a line with slope 1 and intercept 0. The red line is a fitted line of the correlation coefficient pairs. The error bars around the points represent the individual 95% confidence intervals drawn from bootstrapping the correlation coefficients
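To make the contents of `Statistics` more concrete, the sketch below computes the same kinds of quantities for a single variable in base R. It is only an approximation of what the function reports: the simulated vectors `orig_vals` and `imp_vals` and the element names of `stats` are hypothetical, and the exact values the package compares may differ.

```r
set.seed(42)

# Hypothetical variable: observed values and values produced by imputation
orig_vals <- rnorm(150, mean = 10,   sd = 2)
imp_vals  <- rnorm(40,  mean = 10.3, sd = 1.8)

tt <- t.test(orig_vals, imp_vals)   # Welch's t test (var.equal = FALSE is the default)
ks <- ks.test(orig_vals, imp_vals)  # two-sample Kolmogorov-Smirnov test

# Named vector in the spirit of the per-variable Statistics output
stats <- c(
  mean_orig = mean(orig_vals), sd_orig = sd(orig_vals),
  mean_imp  = mean(imp_vals),  sd_imp  = sd(imp_vals),
  welch_p   = tt$p.value,
  ks_D      = unname(ks$statistic)
)
```

A small Welch P value or a large D statistic would suggest that the imputed values are distributed differently from the original ones for that variable.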

# diagnostics <- post_imp_diag(X_orig = df_miss, X_imp = df_imputed, scale=TRUE)
# diagnostics$Histograms$variable_X
# diagnostics$Boxplots$variable_Z
# diagnostics$Statistics$variable_Y

missCompare documentation built on Dec. 1, 2020, 9:09 a.m.