impute.subject: The function performs k-Nearest Neighbours imputation...

View source: R/imputation.wknn.mi.R

Description

This function implements an adaptive weighted k-nearest neighbours (wk-NN) imputation algorithm for clinical register data, developed to handle missing values of continuous, ordinal and categorical features, both static and dynamic, jointly. For each subject with missing data to be imputed, the method builds a feature vector from the information collected over the first *window_size* time units of his/her visits. This vector is used as the sample in a k-nearest neighbours procedure that selects, among the other patients, those whose disease evolved most similarly over time. An *ad hoc* similarity metric is used to compare samples; it handles the mixed nature of the data and the presence of multiple missing values, and it includes the cross-information among features.
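The general idea of weighted nearest-neighbour imputation can be sketched in a few lines of base R. This is a toy illustration only, not the package's algorithm: the function name `toy.wknn.impute`, the distance (root mean squared difference over the features observed in both samples) and the weights `1 / (1 + distance)` are all assumptions made for the example, and the sketch handles numeric features only.

```r
# Toy weighted k-NN imputation for a single numeric sample.
# ASSUMPTION: the distance and the weighting below are illustrative choices,
# not the ad hoc similarity metric implemented in wkNNMI.
toy.wknn.impute <- function(target, candidates, K) {
  # target: named numeric vector that may contain NAs
  # candidates: numeric matrix with the same columns, one row per subject
  dists <- apply(candidates, 1, function(row) {
    shared <- !is.na(target) & !is.na(row)
    if (!any(shared)) return(Inf)  # nothing to compare on
    sqrt(mean((target[shared] - row[shared])^2))
  })

  # Keep the K closest candidates and turn distances into weights.
  nearest <- order(dists)[seq_len(min(K, nrow(candidates)))]
  weights <- 1 / (1 + dists[nearest])

  # Fill each missing entry with the weighted mean of the neighbours' values.
  for (j in which(is.na(target))) {
    values <- candidates[nearest, j]
    ok <- !is.na(values)
    if (any(ok) && sum(weights[ok]) > 0) {
      target[j] <- sum(weights[ok] * values[ok]) / sum(weights[ok])
    }
  }
  target
}

candidates <- rbind(
  c(bmi = 24, fvc = 90, age = 60),
  c(bmi = 25, fvc = 95, age = 62),
  c(bmi = 40, fvc = 40, age = 30)
)
target <- c(bmi = 24, fvc = NA, age = 60)
imputed <- toy.wknn.impute(target, candidates, K = 2)
```

With `K = 2` the third candidate is ignored, and the imputed `fvc` lies between the two nearest candidates' values, closer to that of the identical first candidate because it receives the larger weight.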

Usage

impute.subject(
  subject.to.impute,
  candidates,
  method = "wknn.MI",
  window_size = 3,
  t.thresh = 1,
  cont.imp.type = "w.mean",
  ord.imp.type = "w.mean",
  static.features = NULL,
  dynamic.features = NULL,
  continuous.features = NULL,
  categorical.features = NULL,
  ordinal.features = NULL,
  time.feature,
  sub.id.feature,
  make.unique.separator = ".",
  K
)

Arguments

subject.to.impute

data frame containing the visits of the subjects with missing values to be imputed.

candidates

data frame containing all the visits to be used as candidates for the imputation.

method

imputation type, one of "wknn.MI", "wknn.simple" or "knn.random". Defaults to "wknn.MI".

window_size

size of the time window to be imputed. Defaults to 3 (months).

t.thresh

time threshold parameter. Defaults to 1 (months).

cont.imp.type

imputation type for the continuous features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".

ord.imp.type

imputation type for the ordinal features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".

static.features

list of the static feature names.

dynamic.features

list of the dynamic feature names.

continuous.features

list of the continuous feature names.

categorical.features

list of the categorical feature names.

ordinal.features

list of the ordinal feature names.

time.feature

name of the time feature (for example "visit_time").

sub.id.feature

name of the subject ID feature (for example "subID").

make.unique.separator

separator string used when making names unique; it must not occur in any feature name. Defaults to ".".

K

number of neighbours to use. Described as defaulting to 15, but the usage signature lists K without a default, so supply it explicitly.
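To make the four option names accepted by cont.imp.type and ord.imp.type concrete, the following sketch shows what each one computes on a small set of neighbour values. The helper `toy.aggregate` and the example weights are hypothetical; how the package derives its weights, and whether it rounds a weighted mean for ordinal features, is not stated on this page.

```r
# Hypothetical helper: combine the neighbours' values for one feature.
# ASSUMPTION: this mirrors the option names only, not the package internals.
toy.aggregate <- function(values, weights, type = "w.mean") {
  switch(
    type,
    "mean"   = mean(values),
    "w.mean" = weighted.mean(values, weights),
    "median" = median(values),
    "mode"   = {
      # most frequent value; ties resolve to the smallest value
      counts <- table(values)
      as.numeric(names(counts)[which.max(counts)])
    },
    stop("unknown imputation type: ", type)
  )
}

values  <- c(2, 3, 3, 4)          # values observed in four neighbours
weights <- c(0.4, 0.3, 0.2, 0.1)  # made-up similarity weights

results <- sapply(
  c("mean", "w.mean", "median", "mode"),
  function(type) toy.aggregate(values, weights, type)
)
```

Here the plain mean, median and mode all equal 3, while the weighted mean is pulled towards the most heavily weighted neighbour and equals 2.7.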

Value

the imputed data.frame

Author(s)

Sebastian Daberdaku

Examples

#' This example shows how a user can use the impute.subject() function to impute
#' the visits of a single patient by using the data from another clinical
#' register.

data(patient.data)
data(new.patient)
#' The user must define which features are static/dynamic and
#' continuous/categorical/ordinal.
static.features = c(
  "sex",
  "bmi_premorbid",
  "bmi_diagnosis",
  "fvc_diagnosis",
  "familiality",
  "genetics",
  "ftd",
  "onset_site",
  "onset_age"
)
dynamic.features = c(
  "niv",
  "peg",
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)
continuous.features = c("bmi_premorbid",
                        "bmi_diagnosis",
                        "fvc_diagnosis",
                        "onset_age")
categorical.features = c("sex",
                         "familiality",
                         "genetics",
                         "ftd",
                         "onset_site",
                         "niv",
                         "peg")
ordinal.features = c(
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)

#' In what follows, the impute.subject() function is used to impute the missing
#' values in the visits of a new patient in a 3 months wide time window.
#' Please note that missing values in the visits outside of this window will not
#' be imputed.
imputed.patient.data <-
  impute.subject(
    # data frame containing two visits with missing data to be imputed
    subject.to.impute = new.patient,
    # dataset of patients to be used as candidates for the wkNNMI algorithm
    candidates = patient.data,
    # how many months of patient data to impute
    window_size = 3,
    # number of neighbours to consider for the imputation
    K = 5,
    static.features = static.features,
    dynamic.features = dynamic.features,
    continuous.features = continuous.features,
    categorical.features = categorical.features,
    ordinal.features = ordinal.features,
    # the time feature
    time.feature = "visit_time",
    # the subject ID feature
    sub.id.feature = "subID"
  )
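In the example, every feature is classified twice: once as static or dynamic, and once as continuous, categorical or ordinal. This page does not say whether impute.subject() validates that the two classifications cover the same names, so a small check before the call may be useful. The helper below, `check.feature.partition`, is hypothetical and not part of wkNNMI; it is shown on a shortened subset of the feature names.

```r
# Hypothetical helper (not part of wkNNMI): report features that are listed
# twice within a classification or that appear in only one of the two.
check.feature.partition <- function(static, dynamic,
                                    continuous, categorical, ordinal) {
  by.time <- c(static, dynamic)
  by.type <- c(continuous, categorical, ordinal)
  list(
    duplicated.by.time = by.time[duplicated(by.time)],
    duplicated.by.type = by.type[duplicated(by.type)],
    missing.type = setdiff(by.time, by.type),  # no continuous/categorical/ordinal label
    missing.time = setdiff(by.type, by.time)   # no static/dynamic label
  )
}

problems <- check.feature.partition(
  static = c("sex", "onset_age"),
  dynamic = c("niv", "alsfrs_1"),
  continuous = c("onset_age"),
  categorical = c("sex", "niv"),
  ordinal = c("alsfrs_1")
)

# TRUE when the two classifications agree and contain no duplicates
consistent <- all(lengths(problems) == 0)
```

The same call can be made with the full static.features, dynamic.features, continuous.features, categorical.features and ordinal.features vectors defined in the example.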

wkNNMI documentation built on March 26, 2020, 6:26 p.m.