impute.wknn: The function performs k-Nearest Neighbours imputation...
In wkNNMI: A Mutual Information-Weighted k-NN Imputation Algorithm


View source: R/imputation.wknn.mi.R

This function implements an adaptive weighted k-nearest neighbours (wk-NN) imputation algorithm for clinical register data, developed to handle missing values of continuous, ordinal and categorical features, both static and dynamic, jointly. For each subject with missing data to be imputed, the method builds a feature vector from the information collected over the subject's first *window_size* time units of visits. This vector is used as a sample in a k-nearest neighbours procedure to select, among the other patients, those with the most similar temporal evolution of the disease. An *ad hoc* similarity metric is used to compare samples; it handles the mixed data types, tolerates multiple missing values and includes the cross-information among features.
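The neighbour-weighted step described above can be illustrated with a minimal base-R sketch. It assumes that the distances to the candidate neighbours are already available and uses inverse-distance weights. Both the helper name `weighted.knn.impute` and the weighting scheme are illustrative assumptions; they are not the package's mutual-information-based similarity metric.

```r
# Hypothetical helper (not part of wkNNMI): impute one continuous value as the
# weighted mean of the K nearest neighbours' observed values.
weighted.knn.impute <- function(distances, values, K = 3) {
  # Drop neighbours whose value is itself missing.
  observed <- !is.na(values)
  distances <- distances[observed]
  values <- values[observed]

  # Keep the K closest neighbours (fewer if not enough are available).
  nearest <- order(distances)[seq_len(min(K, length(distances)))]

  # Assumed weighting: inverse distance, with a small offset to avoid
  # division by zero.
  weights <- 1 / (distances[nearest] + 1e-8)

  weighted.mean(values[nearest], w = weights)
}

# Toy input: five candidate neighbours, one of which has no observed value.
distances <- c(0.5, 2.0, 1.0, 4.0, 0.25)
values <- c(22.0, 30.0, 24.0, 35.0, NA)
imputed.value <- weighted.knn.impute(distances, values, K = 3)
```

Closer neighbours contribute more to the imputed value, which is the general idea behind the "w.mean" option; the actual weights used by the package may be computed differently.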

impute.wknn(
  dataset.to.impute,
  window_size = 3,
  t.thresh = 1,
  imputation.method = "wknn.MI",
  cont.imp.type = "w.mean",
  ord.imp.type = "w.mean",
  static.features,
  dynamic.features,
  continuous.features,
  categorical.features,
  ordinal.features,
  time.feature,
  sub.id.feature,
  make.unique.separator = ".",
  K = 15,
  parallel = FALSE
)

`dataset.to.impute`	data frame containing missing values.
`window_size`	size of the time window to be imputed. Defaults to 3 (months).
`t.thresh`	time threshold parameter. Defaults to 1 (months).
`imputation.method`	imputation type, one of "wknn.MI", "wknn.simple" or "knn.random". Defaults to "wknn.MI".
`cont.imp.type`	imputation type for the continuous features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
`ord.imp.type`	imputation type for the ordinal features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
`static.features`	list of the static feature names.
`dynamic.features`	list of the dynamic feature names.
`continuous.features`	list of the continuous feature names.
`categorical.features`	list of the categorical feature names.
`ordinal.features`	list of the ordinal feature names.
`time.feature`	name of the time feature.
`sub.id.feature`	name of the subject ID feature.
`make.unique.separator`	symbol to be used for the make unique function (must not be present in the feature names). Defaults to ".".
`K`	number of neighbours to use. Defaults to 15.
`parallel`	if TRUE, the iterations are performed in parallel. An appropriate parallel backend, such as doMC or doSNOW, must be registered beforehand. Defaults to FALSE.
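As a hedged illustration of the `parallel` argument, the sketch below registers a doSNOW backend before the imputation would be run. It assumes that the package picks up whichever foreach backend is currently registered, which is what the wording of the argument suggests but is not confirmed here. The worker count of 2 is arbitrary.

```r
# Sketch: register a doSNOW backend before calling impute.wknn(parallel = TRUE).
# The worker count (2) is an arbitrary choice for illustration.
workers <- 1L  # stays at 1 (sequential execution) if doSNOW is unavailable
if (requireNamespace("doSNOW", quietly = TRUE)) {
  cl <- snow::makeCluster(2, type = "SOCK")
  doSNOW::registerDoSNOW(cl)
  workers <- foreach::getDoParWorkers()

  # With the backend registered, the imputation could then be run with
  # parallel = TRUE, for example:
  # imputed <- impute.wknn(dataset.to.impute = patient.data,
  #                        <feature arguments as in the example>,
  #                        parallel = TRUE)

  # Release the workers and fall back to sequential execution.
  snow::stopCluster(cl)
  foreach::registerDoSEQ()
}
```

Stopping the cluster afterwards avoids leaving idle worker processes behind once the imputation has finished.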

Returns the imputed data.frame.

Author: Sebastian Daberdaku

#' This example shows how a user can use the impute.wknn() function to impute an
#' instance of a clinical register composed of static and dynamic, mixed-type
#' clinical data.

library(wkNNMI)
data(patient.data)
#' The user must define which features are static/dynamic and
#' continuous/categorical/ordinal.
static.features = c(
  "sex",
  "bmi_premorbid",
  "bmi_diagnosis",
  "fvc_diagnosis",
  "familiality",
  "genetics",
  "ftd",
  "onset_site",
  "onset_age"
)
dynamic.features = c(
  "niv",
  "peg",
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)
continuous.features = c("bmi_premorbid",
                        "bmi_diagnosis",
                        "fvc_diagnosis",
                        "onset_age")
categorical.features = c("sex",
                         "familiality",
                         "genetics",
                         "ftd",
                         "onset_site",
                         "niv",
                         "peg")
ordinal.features = c(
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)

#' In what follows, the impute.wknn() function is used to impute the missing
#' values in the patient.data dataset within a 3-month time window.
#' Note that missing values in visits outside this window are not
#' imputed.
imputed.patient.data <-
  impute.wknn(
    dataset.to.impute = patient.data,  # dataset to impute
    window_size = 3,                   # how many months of patient data to impute
    K = 5,                             # number of neighbours to consider for the imputation
    static.features = static.features,
    dynamic.features = dynamic.features,
    continuous.features = continuous.features,
    categorical.features = categorical.features,
    ordinal.features = ordinal.features,
    time.feature = "visit_time",       # the time feature
    sub.id.feature = "subID",          # the subject ID feature
    parallel = FALSE
  )
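One way to check the effect of the imputation is to count the missing cells inside the time window before and after the call. The helper below is a hypothetical sketch, not part of wkNNMI. It assumes that the window covers the visits whose time value is at most `window_size`, which may differ from how the package delimits the window. It is demonstrated on a small made-up data frame.

```r
# Hypothetical helper (not part of wkNNMI): count NA cells among the visits
# whose time falls within the first `window_size` time units.
count.missing.in.window <- function(df, time.feature, window_size) {
  in.window <- !is.na(df[[time.feature]]) & df[[time.feature]] <= window_size
  sum(is.na(df[in.window, , drop = FALSE]))
}

# Toy data: two subjects, two visits each, one ordinal score with gaps.
toy <- data.frame(
  subID = c(1, 1, 2, 2),
  visit_time = c(0, 5, 1, 2),
  alsfrs_1 = c(NA, NA, 3, NA)
)
missing.in.window <- count.missing.in.window(toy, "visit_time", window_size = 3)
```

Applied to the example above, the same helper could be called on `patient.data` and on `imputed.patient.data` with `time.feature = "visit_time"` and `window_size = 3` to compare the two counts.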