impute.subject: The function performs k-Nearest Neighbours imputation...
In wkNNMI: A Mutual Information-Weighted k-NN Imputation Algorithm

View source: R/imputation.wknn.mi.R

This function implements an adaptive weighted k-nearest neighbours (wk-NN) imputation algorithm for clinical register data, developed to jointly handle missing values in continuous, ordinal and categorical features, whether static or dynamic. For each subject with missing data to be imputed, the method builds a feature vector from the information collected over the subject's first *window_size* time units of visits. This vector is used as a sample in a k-nearest neighbours procedure to select, among the other patients, those whose disease evolved most similarly over time. An *ad hoc* similarity metric is used to compare samples: it handles the mixed nature of the data and the presence of multiple missing values, and it includes the cross-information among features.
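The general idea of weighted k-NN imputation can be illustrated with a minimal base R sketch. It is not the wkNNMI algorithm: it assumes purely numeric features, a plain root-mean-square distance over the features observed in both rows, and inverse-distance weights, whereas the package uses its own similarity metric. The function name `knn_impute_row` and the toy data are hypothetical.

```r
# Toy sketch of weighted k-NN imputation (NOT the wkNNMI similarity metric).
# Rows of `candidates` are subjects, columns are numeric features; NA = missing.
knn_impute_row <- function(target, candidates, k) {
  # Distance computed only over features observed in both target and candidate.
  dists <- apply(candidates, 1, function(cand) {
    shared <- !is.na(target) & !is.na(cand)
    if (!any(shared)) return(Inf)
    sqrt(mean((target[shared] - cand[shared])^2))
  })

  # Keep the k closest candidates; closer neighbours receive larger weights.
  nearest <- order(dists)[seq_len(min(k, length(dists)))]
  weights <- 1 / (dists[nearest] + 1e-8)

  # Replace each missing value with the weighted mean of the neighbours' values.
  for (j in which(is.na(target))) {
    vals <- candidates[nearest, j]
    ok <- !is.na(vals) & weights > 0
    if (any(ok)) target[j] <- sum(weights[ok] * vals[ok]) / sum(weights[ok])
  }
  target
}

candidates <- rbind(
  c(1.0, 2.0, 10),
  c(1.1, 2.1, 12),
  c(9.0, 9.0, 50)
)
target <- c(1.0, 2.0, NA)

# With k = 2 the distant third candidate is ignored.
imputed <- knn_impute_row(target, candidates, k = 2)
```

Because the first candidate matches the target exactly on the observed features, it dominates the weighted mean and the imputed third value lands very close to 10.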

impute.subject(
  subject.to.impute,
  candidates,
  method = "wknn.MI",
  window_size = 3,
  t.thresh = 1,
  cont.imp.type = "w.mean",
  ord.imp.type = "w.mean",
  static.features = NULL,
  dynamic.features = NULL,
  continuous.features = NULL,
  categorical.features = NULL,
  ordinal.features = NULL,
  time.feature,
  sub.id.feature,
  make.unique.separator = ".",
  K
)

`subject.to.impute`	data frame containing the visits of the subjects with missing values to be imputed.
`candidates`	data frame containing all the visits to be used as candidates for the imputation.
`method`	imputation type, one of "wknn.MI", "wknn.simple" or "knn.random". Defaults to "wknn.MI".
`window_size`	size of the time window to be imputed. Defaults to 3 (months).
`t.thresh`	time threshold parameter. Defaults to 1 (months).
`cont.imp.type`	imputation type for the continuous features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
`ord.imp.type`	imputation type for the ordinal features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
`static.features`	list of the static feature names.
`dynamic.features`	list of the dynamic feature names.
`continuous.features`	list of the continuous feature names.
`categorical.features`	list of the categorical feature names.
`ordinal.features`	list of the ordinal feature names.
`time.feature`	name of the time feature.
`sub.id.feature`	name of the subject ID feature.
`make.unique.separator`	symbol to be used for the make unique function (must not be present in the feature names). Defaults to ".".
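`K`	number of neighbours to use. Documented as defaulting to 15, but the usage above shows no default value, so it is safest to supply it explicitly.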
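To make the four options of `cont.imp.type` and `ord.imp.type` concrete, the sketch below shows one way the values found among the selected neighbours could be combined into a single imputed value. The helper `combine_neighbours` and its inputs are hypothetical and written only for illustration; how the package itself computes neighbour weights or breaks ties is not described here.

```r
# Hypothetical helper: combine neighbour values into one imputed value.
# `values` are the neighbours' observations of one feature,
# `weights` are their (assumed) similarity weights.
combine_neighbours <- function(values, weights, type = "w.mean") {
  keep <- !is.na(values)          # neighbours missing this feature are skipped
  values <- values[keep]
  weights <- weights[keep]
  switch(type,
    "mean"   = mean(values),
    "w.mean" = sum(weights * values) / sum(weights),
    "median" = median(values),
    "mode"   = {
      counts <- table(values)     # most frequent value; first one wins ties
      as.numeric(names(counts)[which.max(counts)])
    },
    stop("unknown imputation type: ", type)
  )
}

neighbour_values  <- c(2, 3, 3, 4, NA)
neighbour_weights <- c(0.4, 0.3, 0.2, 0.1, 0.5)

res_mean   <- combine_neighbours(neighbour_values, neighbour_weights, "mean")
res_wmean  <- combine_neighbours(neighbour_values, neighbour_weights, "w.mean")
res_median <- combine_neighbours(neighbour_values, neighbour_weights, "median")
res_mode   <- combine_neighbours(neighbour_values, neighbour_weights, "mode")
```

On this toy input the plain mean, median and mode all equal 3, while the weighted mean is 2.7 because the most similar neighbour holds the lowest value. A weighted mean of ordinal scores is generally not an integer, so the choice of type matters for ordinal features.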

the imputed data.frame

Sebastian Daberdaku

#' This example shows how a user can use the impute.subject() function to impute
#' the visits of a single patient by using the data from another clinical
#' register.

library(wkNNMI)
data(patient.data)
data(new.patient)
#' The user must define which features are static/dynamic and
#' continuous/categorical/ordinal.
static.features = c(
  "sex",
  "bmi_premorbid",
  "bmi_diagnosis",
  "fvc_diagnosis",
  "familiality",
  "genetics",
  "ftd",
  "onset_site",
  "onset_age"
)
dynamic.features = c(
  "niv",
  "peg",
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)
continuous.features = c("bmi_premorbid",
                        "bmi_diagnosis",
                        "fvc_diagnosis",
                        "onset_age")
categorical.features = c("sex",
                         "familiality",
                         "genetics",
                         "ftd",
                         "onset_site",
                         "niv",
                         "peg")
ordinal.features = c(
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)

#' In what follows, the impute.subject() function is used to impute the missing
#' values in the visits of a new patient within a 3-month time window.
#' Please note that missing values in visits outside this window will not
#' be imputed.
imputed.patient.data <-
  impute.subject(
    subject.to.impute = new.patient,
    # data frame containing two visits with missing data to be imputed
    candidates = patient.data,
    # dataset of patients to be used as candidates for the wkNNMI algorithm
    window_size = 3,
    # how many months of patient data to impute
    K = 5,
    # number of neighbours to consider for the imputation
    static.features = static.features,
    dynamic.features = dynamic.features,
    continuous.features = continuous.features,
    categorical.features = categorical.features,
    ordinal.features = ordinal.features,
    time.feature = "visit_time",
    # the time feature
    sub.id.feature = "subID"
  )
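
In the example above, every feature appears exactly once in the static/dynamic split and exactly once in the continuous/categorical/ordinal split. Whether the package enforces this is not stated, so the following is only a sketch of a sanity check that could be run before calling impute.subject(). The helper `check_feature_sets` is hypothetical, and the short vectors are a reduced version of the example's feature lists.

```r
# Hypothetical sanity check: each feature should appear once per classification.
check_feature_sets <- function(static, dynamic,
                               continuous, categorical, ordinal) {
  by_time <- c(static, dynamic)
  by_type <- c(continuous, categorical, ordinal)
  list(
    duplicated_by_time = unique(by_time[duplicated(by_time)]),
    duplicated_by_type = unique(by_type[duplicated(by_type)]),
    missing_type       = setdiff(by_time, by_type),  # no data type assigned
    missing_time       = setdiff(by_type, by_time)   # not static or dynamic
  )
}

issues <- check_feature_sets(
  static      = c("sex", "onset_age"),
  dynamic     = c("niv", "alsfrs_1"),
  continuous  = c("onset_age"),
  categorical = c("sex", "niv"),
  ordinal     = c("alsfrs_1")
)

# TRUE when every component of `issues` is empty.
all_consistent <- all(lengths(issues) == 0)
```

Any non-empty component of `issues` names the features that are duplicated or left unclassified.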