clean: Dataframe cleaning for missing data handling
In missCompare: Intuitive Missing Data Imputation Framework

`clean` converts non-standard missing-value codes to NA, converts variable types, and removes rows and columns whose missingness exceeds pre-specified thresholds.

clean(
  X,
  var_remove = NULL,
  var_removal_threshold = 0.5,
  ind_removal_threshold = 1,
  missingness_coding = NA
)

`X`	Original dataframe with samples in rows and variables in columns
`var_remove`	Variables to remove (e.g. ID). Supply as a character vector, e.g. c('ID', 'character_variable')
`var_removal_threshold`	Variable removal threshold with default 0.5 (range between 0 and 1). Variables (columns) above this missingness fraction will be removed during the cleaning process
`ind_removal_threshold`	Individual removal threshold with default 1 (range between 0 and 1). Individuals (rows) above this missingness fraction will be removed during the cleaning process
`missingness_coding`	Non-NA value(s) used to code missing data in the original dataframe, which will be changed to NA (e.g. -9). Accepts a single value (missingness_coding = -9) or multiple values (missingness_coding = c(-9, -99, -999))
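
Both thresholds are compared against missingness fractions, so it can help to inspect those fractions before choosing values. The following is a minimal base R sketch on a hypothetical toy dataframe (`toy` and its columns are illustrative and not part of missCompare); it assumes missing values are already coded as NA.

```r
# Hypothetical toy dataframe, used only for illustration
toy <- data.frame(
  a = c(1, NA, 3, NA, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c(1, 2, 3, 4, NA)
)

# Fraction of missing values per variable (column) and per individual (row)
var_missing <- colMeans(is.na(toy))
ind_missing <- rowMeans(is.na(toy))

# Variables above a 0.5 threshold, and individuals above a 0.9 threshold
vars_above <- names(var_missing)[var_missing > 0.5]
inds_above <- which(ind_missing > 0.9)

print(var_missing)
print(vars_above)   # only "b" has more than half of its values missing
print(inds_above)   # empty: no row is more than 90% missing
```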

Imputation works best on a clean, filtered dataframe: variables and samples with very high missingness fractions degrade the performance of most missing data imputation algorithms. This function removes rows (samples) and columns (variables) whose missingness exceeds the pre-specified thresholds, and converts any user-specified non-NA missing-value codes (e.g. -9) to NA. Note that all factor variables are coerced to numeric variables.
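
The steps described above can be approximated in base R. This is a sketch under stated assumptions, not the missCompare source code: the helper name `sketch_clean`, the toy data, the order of the steps, and the use of factor level codes for the numeric conversion are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the implementation of missCompare::clean
sketch_clean <- function(X,
                         var_removal_threshold = 0.5,
                         ind_removal_threshold = 1,
                         missingness_coding = NA) {
  # 1. Recode user-specified missing-value codes to NA
  X[] <- lapply(X, function(col) {
    col[col %in% missingness_coding] <- NA
    col
  })
  # 2. Coerce factors to numeric (assumption: integer level codes are used)
  X[] <- lapply(X, function(col) if (is.factor(col)) as.numeric(col) else col)
  # 3. Drop variables whose missingness fraction is above the threshold
  X <- X[, colMeans(is.na(X)) <= var_removal_threshold, drop = FALSE]
  # 4. Drop individuals whose missingness fraction is above the threshold
  X[rowMeans(is.na(X)) <= ind_removal_threshold, , drop = FALSE]
}

# Hypothetical toy data with -9 used as the missing-value code
toy <- data.frame(
  age = c(34, -9, 51, 47),
  sex = factor(c("f", "m", "m", "f")),
  bmi = c(-9, -9, -9, 22.5),
  sbp = c(120, 135, -9, 118)
)

out <- sketch_clean(toy, missingness_coding = -9)
print(out)  # bmi is dropped (75% missing); sex is now numeric
```

In this sketch, a variable or individual is kept when its missingness fraction is equal to the threshold and removed only when it is above it, which matches the wording "above this missingness fraction" in the argument descriptions.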

A cleaned dataframe in which missing values are coded as NA and rows/columns above the pre-specified missingness thresholds have been removed

# basic settings
cleaned <- clean(clindata_miss, missingness_coding = -9)

# setting very conservative removal thresholds
cleaned <- clean(clindata_miss,
                 var_removal_threshold = 0.10,
                 ind_removal_threshold = 0.9,
                 missingness_coding = -9)