get_data
Extracts descriptive metadata from the dataframe, including information on missing data.
X |
Original dataframe with samples in rows and variables as columns. Can also use the resulting object from the |
matrixplot_sort |
Boolean with default TRUE. If TRUE, the matrix plot will be sorted by missing/non-missing status. If FALSE, the original order of rows will be retained |
plot_transform |
Boolean with default TRUE. If TRUE, the matrix plot will plot all variables scaled (mean = 0, SD = 1). If FALSE, the matrix plot will show the variables on their original scale |
This function takes the original dataframe and extracts descriptive metadata: dimensions, missingness fractions overall and by variable, number of missing values overall and by variable, missing data patterns, missing data correlations, and missing data visualizations.
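To make the simpler descriptive quantities concrete, here is a minimal base R sketch computed on a toy dataframe. It is not the package's implementation; the dataframe `df` and all variable names are hypothetical and chosen only for illustration.

```r
# Toy dataframe with samples in rows and variables in columns (illustrative only).
df <- data.frame(
  age    = c(34, NA, 51, 47, NA),
  weight = c(70.2, 81.5, NA, 64.0, 77.3),
  height = c(NA, NA, NA, 168, 175)
)

# Logical matrix that is TRUE wherever a value is missing.
na_mask <- is.na(df)

n_rows         <- nrow(df)                  # total number of samples
n_cols         <- ncol(df)                  # total number of variables
complete_cases <- sum(complete.cases(df))   # rows with no missing value in any column
total_na       <- sum(na_mask)              # number of missing values overall
na_per_var     <- colSums(na_mask)          # named vector: missing values per variable
frac_missing   <- mean(na_mask)             # overall fraction of missingness (0 to 1)
frac_per_var   <- colMeans(na_mask)         # named vector: fraction missing per variable

# Names of variables with more than 50% of their values missing.
vars_above_half <- names(frac_per_var)[frac_per_var > 0.5]

print(frac_per_var)
print(vars_above_half)
```

Working from a single logical mask keeps the overall and per-variable counts consistent with each other, since every quantity is a sum or mean over the same matrix.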
Complete_cases |
Number of complete cases (samples with no missing data in any columns) |
Rows |
Total number of rows (samples) in the dataframe |
Columns |
Total number of columns (variables) in the dataframe |
Corr_matrix |
Correlation matrix of all variables. The matrix contains Pearson correlation coefficients computed pairwise between variables |
Fraction_missingness |
Total fraction of missingness expressed as a number between 0 and 1, where 1 means 100% of data is missing and 0 means there are no missing values |
Fraction_missingness_per_variable |
Fraction of missingness per variable. A (named) numeric vector whose length equals the number of columns. Each variable's missingness is expressed as a number between 0 and 1, where 1 means 100% of data is missing and 0 means there are no missing values |
Total_NA |
Total number of missing values in the dataframe |
NA_per_variable |
Number of missing values per variable in the dataframe. A (named) numeric vector whose length equals the number of columns |
MD_Pattern |
Missing data pattern calculated using mice::md.pattern (see |
NA_Correlations |
Correlation matrix of variables vs. variables converted to boolean based on missingness status (yes/no). Point-biserial correlation coefficients for each variable pair are obtained using the complete observations in that pair. Higher correlation coefficients can indicate a MAR (missing at random) missingness pattern |
NA_Correlation_plot |
Plot based on NA_Correlations |
min_PDM_thresholds |
Small dataframe offering guidance on how to set min_PDM thresholds in the next steps of the pipeline. The first column lists min_PDM thresholds, and the second column lists the percentages that would be retained by setting min_PDM to the respective values. These percentages are relative to all rows with at least one missing value (complete observations are excluded), so a value of e.g. 80% means that the most common patterns, covering 80% of the rows with missing data, are represented in the simulation step |
Vars_above_half |
Character vector of variable names with missingness higher than 50% |
Matrix_plot |
Matrix plot where missing values are colored gray and available values are colored based on value range |
Cluster_plot |
Cluster plot of co-missingness. Variables with shared missingness patterns will branch closer to the bottom of the plot, while variables with no shared patterns will be joined by branches high in the plot |
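The idea behind NA_Correlations can be sketched in base R: each variable is converted to a 0/1 missingness indicator, and the indicators are correlated with the observed values of the variables using pairwise complete observations. The Pearson correlation between a binary indicator and a continuous variable is the point-biserial coefficient. This is only an illustration on simulated data, not necessarily how get_data computes its result; the dataframe, the simulated missingness mechanism, and the `_is_NA` column suffix are all assumptions.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnorm(n)
z <- rnorm(n)

# Simulated MAR-like mechanism: y is missing more often when x is large.
y[x > 0.5 & runif(n) < 0.8] <- NA
# z is missing completely at random in 20 rows.
z[sample(n, 20)] <- NA

df <- data.frame(x, y, z)

# 0/1 missingness indicators; keep only variables that actually have missing values,
# because a constant indicator has zero variance and no defined correlation.
ind <- is.na(df) * 1
ind <- ind[, colSums(ind) > 0, drop = FALSE]
colnames(ind) <- paste0(colnames(ind), "_is_NA")

# Rows: original variables. Columns: missingness indicators.
# A variable paired with its own indicator yields NA, since the indicator is
# constant (always 0) wherever the variable is observed; R warns about this,
# so the warning is suppressed here.
na_cor <- suppressWarnings(cor(df, ind, use = "pairwise.complete.obs"))

print(round(na_cor, 2))
```

In this simulation the coefficient for `x` against `y_is_NA` is clearly positive, which is the kind of signal that hints at missingness in one variable depending on the observed values of another.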
cleaned <- clean(clindata_miss, missingness_coding = -9)
metadata <- get_data(cleaned)
metadata <- get_data(cleaned, matrixplot_sort = FALSE)