View source: R/imputation.wknn.mi.R
This function implements an adaptive weighted k-nearest neighbours (wk-NN) imputation algorithm for clinical register data, developed to handle missing values of continuous, ordinal and categorical, static and dynamic features jointly. For each subject with missing data to be imputed, the method builds a feature vector from the information collected during the subject's first *window_size* time units of visits. This vector is used as the query sample in a k-nearest neighbours procedure that selects, among the other patients, those whose disease showed the most similar temporal evolution. An *ad hoc* similarity metric was implemented for comparing samples: it handles the mixed nature of the data and the presence of multiple missing values, and it incorporates the cross-information among features.
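The description does not spell out the similarity metric, so the following is only a minimal sketch under stated assumptions. It swaps in a Gower-style dissimilarity (range-normalised absolute differences for continuous and ordinal features, a simple mismatch for categorical ones, features missing in either vector skipped) and inverse-distance weights for the K nearest candidates. The helpers `gower_dist()` and `knn_weights()` are hypothetical and not part of the package, and the cross-information among features used by the real metric is not modelled.

```r
# Hypothetical Gower-style dissimilarity between two one-row data frames.
# NOT the package's ad hoc metric: it only illustrates how mixed types and
# missing values can be combined into a single distance.
# types:  named character vector, each "continuous", "ordinal" or "categorical"
# ranges: named positive numeric vector, range of each continuous/ordinal feature
gower_dist <- function(a, b, types, ranges) {
  d <- numeric(0)
  for (f in names(types)) {
    x <- a[[f]]
    y <- b[[f]]
    if (is.na(x) || is.na(y)) next  # skip features missing in either vector
    d_f <- switch(types[[f]],
      continuous  = abs(x - y) / ranges[[f]],  # range-normalised difference
      ordinal     = abs(x - y) / ranges[[f]],  # ordinal levels coded as integers
      categorical = as.numeric(x != y)         # 0 if equal, 1 otherwise
    )
    d <- c(d, d_f)
  }
  if (length(d) == 0) return(NA_real_)  # no feature observed in both vectors
  mean(d)
}

# Hypothetical selection of the K nearest candidates with inverse-distance weights.
knn_weights <- function(target, candidates, types, ranges, K = 15) {
  d <- vapply(seq_len(nrow(candidates)), function(i)
    gower_dist(target, candidates[i, , drop = FALSE], types, ranges), numeric(1))
  ok <- which(!is.na(d))                               # drop incomparable candidates
  nn <- ok[order(d[ok])][seq_len(min(K, length(ok)))]  # indices of the K closest rows
  w <- 1 / (d[nn] + 1e-6)                              # offset avoids division by zero
  data.frame(row = nn, dist = d[nn], weight = w / sum(w))
}
```

For example, with a one-row `target` and a `candidates` data frame sharing the same columns, `knn_weights(target, candidates, types, ranges, K = 2)` returns the row indices, distances and normalised weights of the two closest candidates; candidates that share no observed feature with the target are excluded.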
impute.wknn(
dataset.to.impute,
window_size = 3,
t.thresh = 1,
imputation.method = "wknn.MI",
cont.imp.type = "w.mean",
ord.imp.type = "w.mean",
static.features,
dynamic.features,
continuous.features,
categorical.features,
ordinal.features,
time.feature,
sub.id.feature,
make.unique.separator = ".",
K = 15,
parallel = FALSE
)
| Argument | Description |
|---|---|
| dataset.to.impute | data frame containing missing values. |
| window_size | size of the time window to be imputed. Defaults to 3 (months). |
| t.thresh | time threshold parameter. Defaults to 1 (months). |
| imputation.method | imputation type, one of "wknn.MI", "wknn.simple" or "knn.random". Defaults to "wknn.MI". |
| cont.imp.type | imputation type for the continuous features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean". |
| ord.imp.type | imputation type for the ordinal features, one of "mean", "w.mean" (weighted mean), "median" or "mode". Defaults to "w.mean". |
| static.features | character vector of the static feature names. |
| dynamic.features | character vector of the dynamic feature names. |
| continuous.features | character vector of the continuous feature names. |
| categorical.features | character vector of the categorical feature names. |
| ordinal.features | character vector of the ordinal feature names. |
| time.feature | name of the time feature. |
| sub.id.feature | name of the subject ID feature. |
| make.unique.separator | separator used by the make.unique() function (must not appear in any feature name). Defaults to ".". |
| K | number of neighbours to use. Defaults to 15. |
| parallel | if TRUE, the iterations are performed in parallel. An appropriate parallel backend, such as *doMC* or *doSNOW*, must be registered beforehand. Defaults to FALSE. |
Returns the imputed data frame.
Author: Sebastian Daberdaku
#' This example shows how a user can use the impute.wknn() function to impute an
#' instance of a clinical register composed of static and dynamic, mixed-type
#' clinical data.
data(patient.data)
#' The user must define which features are static/dynamic and
#' continuous/categorical/ordinal.
static.features = c(
"sex",
"bmi_premorbid",
"bmi_diagnosis",
"fvc_diagnosis",
"familiality",
"genetics",
"ftd",
"onset_site",
"onset_age"
)
dynamic.features = c(
"niv",
"peg",
"alsfrs_1",
"alsfrs_2",
"alsfrs_3",
"alsfrs_4",
"alsfrs_5",
"alsfrs_6",
"alsfrs_7",
"alsfrs_8",
"alsfrs_9",
"alsfrs_10",
"alsfrs_11",
"alsfrs_12"
)
continuous.features = c("bmi_premorbid",
"bmi_diagnosis",
"fvc_diagnosis",
"onset_age")
categorical.features = c("sex",
"familiality",
"genetics",
"ftd",
"onset_site",
"niv",
"peg")
ordinal.features = c(
"alsfrs_1",
"alsfrs_2",
"alsfrs_3",
"alsfrs_4",
"alsfrs_5",
"alsfrs_6",
"alsfrs_7",
"alsfrs_8",
"alsfrs_9",
"alsfrs_10",
"alsfrs_11",
"alsfrs_12"
)
#' In what follows, impute.wknn() is used to impute the missing values in the
#' patient.data dataset within a 3-month time window.
#' Note that missing values in visits outside this window are not imputed.
imputed.patient.data <-
impute.wknn(
dataset.to.impute = patient.data,
# dataset to impute
window_size = 3,
# how many months of patient data to impute
K = 5,
# number of neighbours to consider for the imputation
static.features = static.features,
dynamic.features = dynamic.features,
continuous.features = continuous.features,
categorical.features = categorical.features,
ordinal.features = ordinal.features,
time.feature = "visit_time",
# the time feature
sub.id.feature = "subID",
parallel = FALSE
)
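Because visits outside the window are left untouched, it can be useful to count the missing values inside and outside each subject's first window_size time units before and after imputation. The sketch below assumes that the window starts at each subject's first visit and includes visits up to exactly window_size time units later; the package's exact boundary rule, including the role of t.thresh, is not documented here, and `count_na_by_window()` is a hypothetical helper.

```r
# Hypothetical check: count missing values inside / outside each subject's
# first `window_size` time units. ASSUMPTION: the window starts at the
# subject's first visit and is closed on the right (<= window_size).
count_na_by_window <- function(df, features, time.feature, sub.id.feature,
                               window_size = 3) {
  # time of each subject's first visit, repeated on every row of that subject
  first_visit <- ave(df[[time.feature]], df[[sub.id.feature]], FUN = min)
  in_window <- (df[[time.feature]] - first_visit) <= window_size
  n_na <- rowSums(is.na(df[, features, drop = FALSE]))  # missing values per visit
  c(inside = sum(n_na[in_window]), outside = sum(n_na[!in_window]))
}
```

For instance, running `count_na_by_window(patient.data, c(static.features, dynamic.features), "visit_time", "subID")` and the same call on `imputed.patient.data` should show how many missing values were filled inside the window, while the count outside it stays unchanged.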