pre_impute_knn: Nearest neighbors imputation


View source: R/imputation.r

Description

Nearest neighbor methods need a distance matrix of the data set they work on. When models are fitted repeatedly on subsets of the same data set, recalculating the matrix every time is unnecessary. This function therefore expects the user to calculate it once, prior to resampling, and supply it through a wrapper function.
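The reasoning above can be illustrated with a minimal base R sketch. It does not use emil; the folds and the variable names (full_dist, folds, fold_dists) are hypothetical and only show that the distances for any subset can be looked up in a matrix calculated once on the full data set.

```r
# Calculate the distance matrix once, on the full data set.
x <- iris[-5]
full_dist <- as.matrix(dist(x))

# Hypothetical resampling folds: each element holds the row indices
# of the observations used for fitting in that fold.
set.seed(1)
folds <- replicate(3, sample(nrow(x), 100), simplify = FALSE)

# Within each fold the distances are looked up by index, not recalculated.
fold_dists <- lapply(folds, function(idx) full_dist[idx, idx])
```

Subsetting the full matrix gives the same distances as calling dist on the subset, provided the features themselves are identical in every fold.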

Usage

pre_impute_knn(data, k = 0.05, distance_matrix)

Arguments

data

Fitting and testing data sets, as returned by pre_split.

k

Number of nearest neighbors to calculate the mean from. Set to a value < 1 to specify a fraction instead of a count.

distance_matrix

A matrix, a dist object, or "auto". Note that "auto" recalculates the distance matrix in each fold, which is only meaningful if the features of x vary between folds. Otherwise the recalculation just wastes time.

Details

Features with fewer than k non-missing values will be removed automatically.
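As an illustration of the behaviour described here, the following base R sketch imputes each missing value with the mean of its k nearest neighbors that have the feature observed, after dropping features with fewer than k non-missing values. The helper knn_impute_sketch is hypothetical and is not emil's implementation; details such as tie handling and the treatment of fractional k are assumptions left out of the sketch.

```r
# Hypothetical helper, not part of emil: k-NN mean imputation on numeric data.
knn_impute_sketch <- function(x, k, d = dist(x)) {
    x <- as.matrix(x)
    d <- as.matrix(d)
    # Drop features with fewer than k non-missing values.
    keep <- colSums(!is.na(x)) >= k
    x <- x[, keep, drop = FALSE]
    out <- x
    for (j in seq_len(ncol(x))) {
        observed <- which(!is.na(x[, j]))
        for (i in which(is.na(x[, j]))) {
            # The k closest observations that have feature j observed.
            nearest <- observed[order(d[i, observed])[seq_len(k)]]
            out[i, j] <- mean(x[nearest, j])
        }
    }
    out
}

x <- iris[-5]
set.seed(1)
x[sample(nrow(x), 30), 3] <- NA
# An extra feature with only two observed values, fewer than k = 4.
x$mostly_missing <- c(1, 2, rep(NA, nrow(x) - 2))
imputed <- knn_impute_sketch(x, k = 4)
```

In this sketch the mostly_missing feature is removed because it has only two non-missing values, and the remaining missing values in the third feature are filled in.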

Author(s)

Christofer Bäcklin

Examples

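library(emil)

# Introduce missing values in one feature of the iris data
x <- iris[-5]
x[sample(nrow(x), 30), 3] <- NA

# Calculate the distance matrix once, before resampling
my.dist <- dist(x)

# Supply the precalculated matrix through a pre-processing wrapper
evaluate(modeling_procedure("lda"), x = x, y = iris$Species,
    pre_process = function(...){
        pre_split(...) %>% pre_impute_knn(k = 4, distance_matrix = my.dist)
    }
)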

emil documentation built on Aug. 1, 2018, 1:03 a.m.
