classiKnn: Create a knn estimator for functional data classification.
In maierhofert/classiFunc: Classification of Functional Data


Creates an efficient k nearest neighbor estimator for functional data classification. Currently supported distance measures are all metrics implemented in `dist` and all semimetrics suggested in Fuchs et al. (2015). Additionally, all (semi-)metrics can be applied to derivatives of arbitrary order.
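The idea of computing a (semi-)metric on derivatives can be sketched in base R. This is an illustration under stated assumptions, not classiFunc's implementation: it assumes an equidistant grid, approximates the `nderiv`-th derivative with `base::diff` (in the spirit of `deriv.method = "base.diff"`), and uses the Euclidean distance of the default `custom.metric`. The helper name `deriv_equidistant` is hypothetical.

```r
# Hypothetical helper (not part of classiFunc): approximate the nderiv-th
# derivative of a curve sampled on an equidistant grid by finite differences.
deriv_equidistant <- function(x, grid, nderiv = 1L) {
  if (nderiv == 0L) return(x)            # order 0: use the raw curve
  h <- grid[2] - grid[1]                 # assumes equidistant grid spacing
  diff(x, differences = nderiv) / h^nderiv
}

# Euclidean distance, matching the default custom.metric shown in Usage
l2 <- function(x, y) sqrt(sum((x - y)^2))

grid <- seq(0, 1, length.out = 101)
x <- sin(2 * pi * grid)
y <- x + 1                               # same shape, shifted upwards by 1

d0 <- l2(deriv_equidistant(x, grid, 0L), deriv_equidistant(y, grid, 0L))
d1 <- l2(deriv_equidistant(x, grid, 1L), deriv_equidistant(y, grid, 1L))
d0  # about sqrt(101): the raw curves differ by 1 at each of 101 grid points
d1  # numerically zero: the first derivatives coincide
```

Because `d1` is (numerically) zero for two distinct curves, the distance on the first derivative is only a semimetric. It ignores vertical shifts, which can help when class membership depends on the shape of a curve rather than its level.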

classiKnn(classes, fdata, grid = 1:ncol(fdata), knn = 1L, metric = "L2",
  nderiv = 0L, derived = FALSE, deriv.method = "base.diff",
  custom.metric = function(x, y, ...) { return(sqrt(sum((x - y)^2))) },
  ...)

`classes`	[`factor(nrow(fdata))`] factor of length `nrow(fdata)` containing the classes of the observations.
`fdata`	[`matrix`] matrix containing the functional observations as rows.
`grid`	[`numeric(ncol(fdata))`] numeric vector of length `ncol(fdata)` containing the grid on which the functional observations were evaluated.
`knn`	[`integer(1)`] number of nearest neighbors to use in the k nearest neighbor algorithm.
`metric`	[`character(1)`] character string specifying the (semi-)metric to be used. For an overview of what is available, see the `method` argument in `computeDistMat`. For a full list, execute `metricChoices()`.
`nderiv`	[`integer(1)`] order of the derivative on which the metric is computed. The default is 0L, i.e. no derivation.
`derived`	[`logical(1)`] Is the data given in `fdata` already derived? The default is `FALSE`, which leads to numerical derivation if `nderiv >= 1L`, using the method chosen in `deriv.method` (with `"fda.deriv.fd"`, `deriv.fd` is applied to a `Data2fd` representation of `fdata`).
`deriv.method`	[`character(1)`] character string indicating which method should be used for derivation. Currently implemented are `"base.diff"`, the default, and `"fda.deriv.fd"`. `"base.diff"` applies `base::diff` to data observed on an equidistant grid without missing values, which is faster than converting the data into the class `fd` and deriving it with `fda::deriv.fd`. The second variant implies smoothing, which can be preferable for calculating higher-order derivatives.
`custom.metric`	[`function(x, y, ...)`] only used if `metric = "custom.metric"`. A function of two functional observations `x` and `y` that returns their distance. The default is the L2 distance. See `dist` for how to implement your own distance function.
`...`	further arguments to and from other methods. Additional arguments are passed on to `computeDistMat`; usually these are additional arguments for the specified (semi-)metric. Also, if `deriv.method == "fda.deriv.fd"` or `fdata` is not observed on a regular grid, additional arguments to `fdataTransform` can be specified, which are passed on to `Data2fd`.
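A custom metric only has to follow the `function(x, y, ...)` signature and return a single non-negative number for two curves evaluated on the same grid. The sketch below assumes that convention; `l1_metric` and `pairwise_dist` are hypothetical names, and `pairwise_dist` merely illustrates applying such a function to every pair of rows. It is not how `computeDistMat` is implemented.

```r
# Hypothetical custom metric: L1 (Manhattan) distance between two curves
# evaluated on the same grid. Extra arguments in `...` are accepted but ignored.
l1_metric <- function(x, y, ...) {
  sum(abs(x - y))
}

# Illustrative helper: apply a metric to every pair of rows of a matrix
pairwise_dist <- function(fdata, metric) {
  n <- nrow(fdata)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- metric(fdata[i, ], fdata[j, ])
    }
  }
  d
}

curves <- rbind(c(0, 1, 2), c(1, 1, 0), c(2, 2, 2))
l1_metric(curves[1, ], curves[2, ])   # 3
D <- pairwise_dist(curves, l1_metric)

# Assumed usage (check metricChoices() first):
# classiKnn(classes, fdata, metric = "custom.metric", custom.metric = l1_metric)
```

Applying the default `custom.metric` through the same helper reproduces the Euclidean distances of `stats::dist`.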

classiKnn returns an object of class "classiKnn". This is a list containing at least the following components:

call: the original function call.
classes: a factor of length nrow(fdata) coding the response of the training data set.
fdata: the raw functional data as a matrix with the individual observations as rows.
grid: numeric vector containing the grid on which fdata is observed.
proc.fdata: the preprocessed data (missing values interpolated, derived and evenly spaced). It equals this.fdataTransform(fdata); see this.fdataTransform for more details.
knn: integer coding the number of nearest neighbors used in the k nearest neighbor classification algorithm.
metric: character string coding the distance metric to be used in computeDistMat.
nderiv: integer giving the order of derivation that is applied to fdata before computing the distances between the observations.
this.fdataTransform: preprocessing function that takes new data as a matrix. It is used to transform fdata into proc.fdata and is required to preprocess new data before predicting it. This function ensures that preprocessing (derivation, respacing and interpolation of missing values) is done in exactly the same way for the original training data set and for future (test) data sets.
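The role of this.fdataTransform can be illustrated with a function factory: the grid and the derivative order are fixed once, from the training data, and the returned function applies identical preprocessing to any later data. This is a simplified sketch under assumptions, not the package's code: it only interpolates missing values linearly with `stats::approx` and takes finite-difference derivatives (no respacing or smoothing). The name `make_transform` is hypothetical.

```r
# Hypothetical function factory mimicking the idea behind this.fdataTransform
make_transform <- function(grid, nderiv = 0L) {
  force(grid)
  force(nderiv)
  function(newdata) {
    newdata <- as.matrix(newdata)
    # Linearly interpolate missing values row by row on the stored grid;
    # rule = 2 carries the nearest observed value to missing end points.
    newdata <- t(apply(newdata, 1, function(row) {
      approx(grid, row, xout = grid, rule = 2)$y
    }))
    # Finite-difference derivatives, dividing by the spacing of the current grid
    g <- grid
    for (i in seq_len(nderiv)) {
      p <- ncol(newdata)
      newdata <- sweep(newdata[, -1, drop = FALSE] - newdata[, -p, drop = FALSE],
                       2, diff(g), "/")
      g <- g[-1]
    }
    newdata
  }
}

prep <- make_transform(grid = 1:4, nderiv = 1L)
train <- rbind(c(0, NA, 2, 3), c(1, 2, 3, 4))
test <- rbind(c(4, 3, NA, 1))
prep(train)   # both rows become c(1, 1, 1)
prep(test)    # same preprocessing applied: c(-1, -1, -1)
```

Because the grid and `nderiv` live in the closure, test data cannot accidentally be preprocessed with different settings than the training data.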

Fuchs, K., J. Gertheiss, and G. Tutz (2015): Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186-197.

predict.classiKnn

# Classification of the Phoneme data
library(classiFunc)
data(Phoneme)
classes = Phoneme[,"target"]

set.seed(123)
# Use 80% of data as training set and 20% as test set
train_inds = sample(1:nrow(Phoneme), size = 0.8 * nrow(Phoneme), replace = FALSE)
test_inds = setdiff(seq_len(nrow(Phoneme)), train_inds)

# create functional data as matrix with observations as rows
fdata = Phoneme[,!colnames(Phoneme) == "target"]

# create k = 3 nearest neighbors classifier with L2 distance (default) of the
# first order derivative of the data
mod = classiKnn(classes = classes[train_inds], fdata = fdata[train_inds,],
                 nderiv = 1L, knn = 3L)

# predict the model for the test set
pred = predict(mod, newdata = fdata[test_inds,], predict.type = "prob")

## Not run: 
# Parallelize across 2 CPUs
library(parallelMap)
parallelStartSocket(cpus = 2L) # or parallelStartMulticore(cpus = 2L) on Linux/macOS
predict(mod, newdata = fdata[test_inds,], predict.type = "prob", parallel = TRUE, batches = 2L)
parallelStop()

## End(Not run)
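With `predict.type = "prob"`, a common way to turn the k nearest neighbors into class probabilities is the share of neighbors that belong to each class. The sketch below assumes that definition and plain Euclidean distance on already preprocessed curves; `knn_prob` is a hypothetical helper, and `predict.classiKnn` may differ in details such as tie handling.

```r
# Hypothetical k-NN probability estimate: share of the k nearest training
# curves (Euclidean distance on the rows) that belong to each class.
knn_prob <- function(train, classes, newdata, k = 3L) {
  classes <- as.factor(classes)
  probs <- matrix(0, nrow(newdata), nlevels(classes),
                  dimnames = list(NULL, levels(classes)))
  for (i in seq_len(nrow(newdata))) {
    # distance from the i-th new curve to every training curve
    d <- sqrt(rowSums(sweep(train, 2, newdata[i, ])^2))
    nn <- order(d)[seq_len(k)]           # indices of the k nearest neighbors
    probs[i, ] <- table(classes[nn]) / k # vote shares, in level order
  }
  probs
}

train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
classes <- c("a", "a", "b", "b")
newdata <- rbind(c(0, 0.5), c(5, 5.4))
p <- knn_prob(train, classes, newdata, k = 3L)
p                                            # rows: (2/3, 1/3) and (1/3, 2/3)
pred_class <- colnames(p)[apply(p, 1, which.max)]
pred_class                                   # "a" "b"
```

Taking the class with the highest vote share recovers the usual majority-vote prediction.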