Description
Creates an efficient kernel estimator for functional data classification. Currently supported distance measures are all metrics implemented in dist and all semimetrics suggested in Fuchs et al. (2015). Additionally, all (semi-)metrics can be used on a derivative of arbitrary order of the functional observations. As kernel functions, all kernels implemented in fda.usc are available, as well as custom kernel functions.
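The estimator described above amounts to a Nadaraya-Watson style weighted vote: each training curve is weighted by the kernel evaluated at its distance to the new curve, scaled by the bandwidth, and the class with the largest total weight wins. The following is an illustrative sketch only, with invented names (kernelClassify, dist.mat); it is not the package's internal implementation:

```r
# Sketch of a kernel classification rule (illustration, NOT classiKernel's code)
kernelClassify = function(dist.mat, classes, h, ker = function(u) dnorm(u)) {
  # dist.mat: n.test x n.train matrix of distances to the training curves
  # classes:  factor of length n.train with the training labels
  weights = ker(dist.mat / h)  # kernel-weight every training curve
  lvls = levels(classes)
  # sum the weights per class and pick the class with the largest total
  votes = sapply(lvls, function(lv)
    rowSums(weights[, classes == lv, drop = FALSE]))
  factor(lvls[max.col(as.matrix(votes))], levels = lvls)
}

# toy check: test curve 1 is close to training curve 1, test 2 to training 2
d = matrix(c(0.1, 5, 5, 0.1), nrow = 2)
cl = factor(c("a", "b"))
kernelClassify(d, cl, h = 1)  # predicts "a" for row 1 and "b" for row 2
```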
Usage

classiKernel(classes, fdata, grid, h, metric, ker, nderiv,
  derived, deriv.method, custom.metric, custom.ker, ...)

Arguments

classes
[factor | character]
response classes of the training observations, one entry per row of fdata.

fdata
[matrix]
the functional training data as a matrix with the individual observations as rows.

grid
[numeric]
grid on which fdata is observed.

h
[numeric(1)]
bandwidth of the kernel function.

metric
[character(1)]
distance measure to be used in computeDistMat; all metrics implemented in dist and all semimetrics suggested in Fuchs et al. (2015) are supported.

ker
[character(1)]
kernel function to use; all kernels implemented in fda.usc are available, as well as "custom.ker".

nderiv
[integer(1)]
order of derivation applied to fdata before computing the distances between the observations.

derived
[logical(1)]
whether fdata is already given as derived curves, so that no additional derivation is carried out.

deriv.method
[character(1)]
method used to derive the data.

custom.metric
[function(x, y, ...)]
custom distance function, used if metric = "custom.metric".

custom.ker
[function(u)]
custom kernel function, used if ker = "custom.ker".

...
further arguments to and from other methods. Hand over additional arguments to computeDistMat.
Value

classiKernel returns an object of class 'classiKernel'. This is a list containing at least the following components:
classes
a factor of length nrow(fdata) coding the response of the training data set.
fdata
the raw functional data as a matrix with the individual observations as rows.
proc.fdata
the preprocessed data (missing values interpolated, derived and evenly spaced). This data is this.fdataTransform(fdata). See this.fdataTransform for more details.
grid
numeric vector containing the grid on which fdata is observed.
h
numeric value giving the bandwidth to be used in the kernel function.
ker
character encoding the kernel function to use.
metric
character string coding the distance metric to be used in computeDistMat.
nderiv
integer giving the order of derivation that is applied to fdata before computing the distances between the observations.
this.fdataTransform
preprocessing function taking new data as a matrix. It is used to transform fdata into proc.fdata and is required to preprocess new data in order to predict it. This function ensures that preprocessing (derivation, respacing and interpolation of missing values) is done in exactly the same way for the original training data set and future (test) data sets.
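The idea behind this.fdataTransform can be sketched as a closure built once from the training setup and then applied to both training and test data. Everything below (the name makeFdataTransform, the use of approx and diff) is invented for illustration and is not the package's actual implementation:

```r
# Sketch: a preprocessing closure shared by training and test data
# (assumed names and methods, NOT classiFunc's internal code)
makeFdataTransform = function(grid, nderiv = 0L) {
  function(fdata) {
    fdata = as.matrix(fdata)
    # interpolate missing values row-wise on the common grid
    fdata = t(apply(fdata, 1, function(row) {
      if (anyNA(row)) approx(grid, row, xout = grid, rule = 2)$y else row
    }))
    # simple difference-based derivation, applied nderiv times
    if (nderiv > 0L) fdata = t(apply(fdata, 1, diff, differences = nderiv))
    fdata
  }
}

tf = makeFdataTransform(grid = 1:5, nderiv = 1L)
train = rbind(c(1, 2, NA, 4, 5))  # the NA is first interpolated to 3
tf(train)                          # first differences: 1 1 1 1
```

Because the same closure is reused at prediction time, test curves are guaranteed to be interpolated and derived exactly like the training curves.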
call
the original function call.
References

Fuchs, K., J. Gertheiss, and G. Tutz (2015): Nearest neighbor ensembles for functional data with interpretable feature selection. Chemometrics and Intelligent Laboratory Systems 146, 186-197.
See Also

predict.classiKernel
Examples

# How to implement your own kernel function
data("ArrowHead")
classes = ArrowHead[,"target"]
set.seed(123)
train_inds = sample(1:nrow(ArrowHead), size = 0.8 * nrow(ArrowHead), replace = FALSE)
test_inds = (1:nrow(ArrowHead))[!(1:nrow(ArrowHead)) %in% train_inds]
ArrowHead = ArrowHead[,!colnames(ArrowHead) == "target"]
# custom kernel
myTriangularKernel = function(u) {
return((1 - abs(u)) * (abs(u) < 1))
}
# create the model
mod1 = classiKernel(classes = classes[train_inds], fdata = ArrowHead[train_inds,],
ker = "custom.ker", h = 2, custom.ker = myTriangularKernel)
# calculate the model predictions
pred1 = predict(mod1, newdata = ArrowHead[test_inds,], predict.type = "response")
# prediction accuracy
mean(pred1 == classes[test_inds])
# create another model using an existing kernel function
mod2 = classiKernel(classes = classes[train_inds], fdata = ArrowHead[train_inds,],
ker = "Ker.tri", h = 2)
# calculate the model predictions
pred2 = predict(mod2, newdata = ArrowHead[test_inds,], predict.type = "response")
# prediction accuracy
mean(pred2 == classes[test_inds])
## Not run:
# Parallelize across 2 CPUs
library(parallelMap)
parallelStartSocket(2L) # use parallelStartMulticore on Linux
predict(mod1, newdata = ArrowHead[test_inds,], predict.type = "prob", parallel = TRUE, batches = 2L)
parallelStop()
## End(Not run)