View source: R/imputation.wknn.mi.R
This function implements an adaptive weighted k-nearest neighbours (wk-NN) imputation algorithm for clinical register data, developed to handle missing values of continuous, ordinal and categorical features, whether static or dynamic, jointly. For each subject with missing data to be imputed, the method builds a feature vector from the information collected during the first *window_size* time units of his/her visits. This vector is used as the query sample in a k-nearest neighbours procedure that selects, among the other patients, those whose disease evolved most similarly over time. An *ad hoc* similarity metric is used to compare samples: it handles the mixed nature of the data and the presence of multiple missing values, and it includes the cross-information among features.
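The general idea of similarity-weighted k-NN imputation described above can be illustrated with a minimal sketch. This is not the package's implementation: the helpers `toy.similarity()` and `toy.wknn.impute()` are hypothetical, work on continuous features only, and replace the *ad hoc* metric with a simple range-scaled similarity averaged over the features observed in both samples.

```r
# Hypothetical toy similarity (an assumption, not the package's metric):
# mean of (1 - range-scaled absolute difference) over features observed in both.
toy.similarity <- function(x, y, ranges) {
  shared <- !is.na(x) & !is.na(y)
  if (!any(shared)) return(0)
  d <- pmin(abs(x[shared] - y[shared]) / ranges[shared], 1)
  mean(1 - d)
}

# Hypothetical toy imputer: fills each missing entry of `target` with the
# similarity-weighted mean of the K most similar candidates that observe it.
toy.wknn.impute <- function(target, candidates, K) {
  ranges <- apply(candidates, 2, function(col) diff(range(col, na.rm = TRUE)))
  ranges[ranges == 0] <- 1  # avoid division by zero for constant features
  sims <- apply(candidates, 1, toy.similarity, y = target, ranges = ranges)
  for (j in which(is.na(target))) {
    usable <- which(!is.na(candidates[, j]))
    nn <- head(usable[order(sims[usable], decreasing = TRUE)], K)
    w <- sims[nn]
    if (sum(w) == 0) w[] <- 1  # fall back to a plain mean
    target[j] <- weighted.mean(candidates[nn, j], w = w)
  }
  target
}

# Toy data: four candidate samples, one target with feature "b" missing.
candidates <- rbind(c(10, 1, 40), c(12, 2, 44), c(30, 9, 90), c(11, 1, 42))
colnames(candidates) <- c("a", "b", "c")
target <- c(a = 11, b = NA, c = 41)

imputed <- toy.wknn.impute(target, candidates, K = 2)
print(imputed)
```

With `K = 2`, the two candidates closest to the target on `a` and `c` both have `b = 1`, so the missing value is filled with 1 and the observed entries are left untouched.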
impute.subject(
  subject.to.impute,
  candidates,
  method = "wknn.MI",
  window_size = 3,
  t.thresh = 1,
  cont.imp.type = "w.mean",
  ord.imp.type = "w.mean",
  static.features = NULL,
  dynamic.features = NULL,
  continuous.features = NULL,
  categorical.features = NULL,
  ordinal.features = NULL,
  time.feature,
  sub.id.feature,
  make.unique.separator = ".",
  K
)
subject.to.impute | data frame containing the visits of the subjects with missing values to be imputed.
candidates | data frame containing all the visits to be used as candidates for the imputation.
method | imputation type, to be chosen between "wknn.MI", "wknn.simple" or "knn.random". Defaults to "wknn.MI".
window_size | size of the time window to be imputed. Defaults to 3 (months).
t.thresh | time threshold parameter. Defaults to 1 (months).
cont.imp.type | imputation type for the continuous features, to be chosen between "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
ord.imp.type | imputation type for the ordinal features, to be chosen between "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean".
static.features | list of the static feature names.
dynamic.features | list of the dynamic feature names.
continuous.features | list of the continuous feature names.
categorical.features | list of the categorical feature names.
ordinal.features | list of the ordinal feature names.
time.feature | name of the time feature.
sub.id.feature | name of the subject ID feature.
make.unique.separator | symbol to be used for the make unique function (must not be present in the feature names). Defaults to ".".
K | number of neighbours to use. Defaults to 15.
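To make the difference between the "mean", "w.mean", "median" and "mode" options concrete, here is a minimal sketch of how the values observed in the selected neighbours could be combined into one imputed value. The helper `toy.combine()` is hypothetical and only illustrates the four aggregation rules; how the package computes the neighbour weights internally is not shown here.

```r
# Hypothetical helper (not part of the package): combine the neighbours'
# values for one feature according to the chosen aggregation rule.
toy.combine <- function(values, weights = NULL,
                        type = c("w.mean", "mean", "median", "mode")) {
  type <- match.arg(type)
  switch(type,
    mean   = mean(values),
    w.mean = weighted.mean(values, w = weights),
    median = median(values),
    mode   = {
      counts <- table(values)
      # on ties, the smallest of the most frequent values is returned
      as.numeric(names(counts)[which.max(counts)])
    }
  )
}

# Values of one feature in four neighbours, with assumed similarity weights.
vals <- c(3, 4, 4, 1)
w <- c(0.5, 0.2, 0.2, 0.1)

toy.combine(vals, type = "mean")    # plain average
toy.combine(vals, w, "w.mean")      # 3.2: closer neighbours count more
toy.combine(vals, type = "median")
toy.combine(vals, type = "mode")    # 4: most frequent value
```

For ordinal scores, the mean-based options can return non-integer values, so "median" or "mode" may be preferable when the imputed value must be a valid level of the scale.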
Value: the imputed data.frame.
Author(s): Sebastian Daberdaku
#' This example shows how a user can use the impute.subject() function to impute
#' the visits of a single patient by using the data from another clinical
#' register.
library(wkNNMI)  # package providing impute.subject() and the example data sets
data(patient.data)
data(new.patient)
#' The user must define which features are static/dynamic and
#' continuous/categorical/ordinal.
static.features = c(
  "sex",
  "bmi_premorbid",
  "bmi_diagnosis",
  "fvc_diagnosis",
  "familiality",
  "genetics",
  "ftd",
  "onset_site",
  "onset_age"
)
dynamic.features = c(
  "niv",
  "peg",
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)
continuous.features = c("bmi_premorbid",
                        "bmi_diagnosis",
                        "fvc_diagnosis",
                        "onset_age")
categorical.features = c("sex",
                         "familiality",
                         "genetics",
                         "ftd",
                         "onset_site",
                         "niv",
                         "peg")
ordinal.features = c(
  "alsfrs_1",
  "alsfrs_2",
  "alsfrs_3",
  "alsfrs_4",
  "alsfrs_5",
  "alsfrs_6",
  "alsfrs_7",
  "alsfrs_8",
  "alsfrs_9",
  "alsfrs_10",
  "alsfrs_11",
  "alsfrs_12"
)
#' In what follows, the impute.subject() function is used to impute the missing
#' values in the visits of a new patient within a 3-month time window.
#' Note that missing values in visits outside this window will not be
#' imputed.
imputed.patient.data <-
  impute.subject(
    subject.to.impute = new.patient,
    # data frame containing two visits with missing data to be imputed
    candidates = patient.data,
    # dataset of patients to be used as candidates for the wkNNMI algorithm
    window_size = 3,
    # how many months of patient data to impute
    K = 5,
    # number of neighbours to consider for the imputation
    static.features = static.features,
    dynamic.features = dynamic.features,
    continuous.features = continuous.features,
    categorical.features = categorical.features,
    ordinal.features = ordinal.features,
    time.feature = "visit_time",
    # the time feature
    sub.id.feature = "subID"
  )
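After a call like the one above, it can be useful to check how many missing values remain inside and outside the imputation window. The sketch below uses a hypothetical helper, `toy.count.na()`, on a small made-up data frame rather than on the package data. It assumes that the window covers visits whose time lies within `window_size` units of the subject's first visit, which is an illustrative assumption and not a statement about how the package defines the window.

```r
# Hypothetical helper (not part of the package): count the NAs per column
# inside and outside the first `window_size` time units of a subject's visits.
toy.count.na <- function(df, time.feature, window_size) {
  elapsed <- df[[time.feature]] - min(df[[time.feature]])
  in.window <- elapsed <= window_size
  list(
    inside  = colSums(is.na(df[in.window, , drop = FALSE])),
    outside = colSums(is.na(df[!in.window, , drop = FALSE]))
  )
}

# Made-up visits of one subject: the last visit falls outside a 3-month window.
toy.visits <- data.frame(
  subID      = c(1, 1, 1),
  visit_time = c(0, 2, 5),
  alsfrs_1   = c(4, 3, NA)
)

na.counts <- toy.count.na(toy.visits, time.feature = "visit_time", window_size = 3)
print(na.counts)
```

Here the only missing value belongs to the visit at month 5, so it is counted under `outside`; such a value would be expected to remain missing after imputation.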