kernel_pca: Kernel Principal Components Analysis

kernel_pca R Documentation

Kernel Principal Components Analysis

Description

An implementation of Kernel Principal Components Analysis (KPCA). This can be used to perform nonlinear dimensionality reduction or preprocessing on a given dataset.

Usage

kernel_pca(
  input,
  kernel,
  bandwidth = NA,
  center = FALSE,
  degree = NA,
  kernel_scale = NA,
  new_dimensionality = NA,
  nystroem_method = FALSE,
  offset = NA,
  sampling = NA,
  verbose = FALSE
)

Arguments

input

Input dataset to perform KPCA on (numeric matrix).

kernel

The kernel to use; see the Details section below for the list of usable kernels (character).

bandwidth

Bandwidth, for 'gaussian' and 'laplacian' kernels. Default value "1" (numeric).

center

If set, the transformed data will be centered about the origin. Default value "FALSE" (logical).

degree

Degree of polynomial, for 'polynomial' kernel. Default value "1" (numeric).

kernel_scale

Scale, for 'hyptan' kernel. Default value "1" (numeric).

new_dimensionality

If not 0, reduce the dimensionality of the output dataset by ignoring the dimensions with the smallest eigenvalues. Default value "0" (integer).

nystroem_method

If set, the Nystroem method will be used. Default value "FALSE" (logical).

offset

Offset, for 'hyptan' and 'polynomial' kernels. Default value "0" (numeric).

sampling

Sampling scheme to use for the Nystroem method: 'kmeans', 'random', or 'ordered'. Default value "kmeans" (character).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "FALSE" (logical).

Details

This program performs Kernel Principal Components Analysis (KPCA) on the specified dataset with the specified kernel. This will transform the data onto the kernel principal components, and optionally reduce the dimensionality by ignoring the kernel principal components with the smallest eigenvalues.

For the case where a linear kernel is used, this reduces to regular PCA.
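The reduction to regular PCA can be checked with a small base-R sketch. This is an illustration only and does not use or reproduce mlpack's implementation; the variable names (X, Xc, K, kpca_scores, and so on) are hypothetical, and the sketch assumes that rows are observations. It centers the data, builds the linear kernel matrix, and compares the resulting projection with the scores from prcomp().

```r
# Illustrative sketch: KPCA with a linear kernel versus ordinary PCA.
# Not mlpack code; all names here are made up for the example.
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50)   # 50 observations (rows), 3 features

# Center the data, then form the linear kernel (Gram) matrix K = Xc Xc^T.
Xc <- scale(X, center = TRUE, scale = FALSE)
K <- Xc %*% t(Xc)

# Eigendecomposition of the symmetric kernel matrix.
eig <- eigen(K, symmetric = TRUE)

# Keep the k components with the largest eigenvalues. The projection is
# each eigenvector scaled by the square root of its eigenvalue.
k <- 2
kpca_scores <- eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k]))

# Ordinary PCA scores for comparison.
pca_scores <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:k]

# The two projections agree up to the sign of each component.
max_diff <- max(abs(abs(kpca_scores) - abs(pca_scores)))
```

With a nonlinear kernel the same eigendecomposition step applies, but the centering then has to be done on the kernel matrix rather than on the raw data.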

The kernels that are supported are listed below:

* 'linear': the standard linear dot product (same as normal PCA): K(x, y) = x^T y

* 'gaussian': a Gaussian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y || ^ 2) / (2 * (bandwidth ^ 2)))

* 'polynomial': polynomial kernel; requires offset and degree: K(x, y) = (x^T y + offset) ^ degree

* 'hyptan': hyperbolic tangent kernel; requires scale and offset: K(x, y) = tanh(scale * (x^T y) + offset)

* 'laplacian': Laplacian kernel; requires bandwidth: K(x, y) = exp(-(|| x - y ||) / bandwidth)

* 'epanechnikov': Epanechnikov kernel; requires bandwidth: K(x, y) = max(0, 1 - || x - y ||^2 / bandwidth^2)

* 'cosine': cosine distance: K(x, y) = 1 - (x^T y) / (|| x || * || y ||)
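To make the formulas above concrete, here is a minimal base-R sketch of several of the listed kernels, transcribed directly from their definitions. The function names (linear_kernel, gaussian_kernel, kernel_matrix, and so on) are hypothetical helpers written for this illustration; they are not part of the mlpack API, and kernel_matrix() assumes rows are observations.

```r
# Hypothetical helper functions transcribed from the formulas above.
# x and y are numeric vectors of equal length.
linear_kernel <- function(x, y) sum(x * y)

gaussian_kernel <- function(x, y, bandwidth = 1) {
  exp(-sum((x - y)^2) / (2 * bandwidth^2))
}

polynomial_kernel <- function(x, y, offset = 0, degree = 1) {
  (sum(x * y) + offset)^degree
}

hyptan_kernel <- function(x, y, scale = 1, offset = 0) {
  tanh(scale * sum(x * y) + offset)
}

laplacian_kernel <- function(x, y, bandwidth = 1) {
  exp(-sqrt(sum((x - y)^2)) / bandwidth)
}

epanechnikov_kernel <- function(x, y, bandwidth = 1) {
  max(0, 1 - sum((x - y)^2) / bandwidth^2)
}

# Build the full n x n kernel matrix for the rows of X.
kernel_matrix <- function(X, kernel, ...) {
  n <- nrow(X)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      K[i, j] <- kernel(X[i, ], X[j, ], ...)
    }
  }
  K
}

X <- rbind(c(0, 0), c(1, 0), c(0, 2))
K_gauss <- kernel_matrix(X, gaussian_kernel, bandwidth = 1)
K_poly <- kernel_matrix(X, polynomial_kernel, offset = 1, degree = 2)
```

The double loop is written for readability rather than speed; for larger inputs a vectorized computation would be preferable.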

The parameters for each of the kernels should be specified with the options "bandwidth", "kernel_scale", "offset", or "degree" (or a combination of those parameters).

Optionally, the Nystroem method ("Using the Nystroem method to speed up kernel machines", 2001) can be used to approximate the kernel matrix by specifying the "nystroem_method" parameter. This approach uses a subset of the data as a basis to reconstruct the kernel matrix; the "sampling" parameter specifies how that subset is chosen. The sampling scheme can be one of 'kmeans', 'random', or 'ordered'.
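The idea behind the Nystroem approximation can be sketched in base R as follows. This is an illustration of the general technique with uniformly random sampling, not a reproduction of mlpack's implementation; the helper gaussian_gram() and all variable names are hypothetical, and rows are assumed to be observations. Given m sampled points, the full kernel matrix K is approximated by C W^+ C^T, where C holds the kernel values between all points and the sampled points, and W is the kernel matrix of the sampled points alone.

```r
# Illustrative sketch of the Nystroem approximation with random sampling.
# Not mlpack code; all names are made up for the example.
set.seed(42)
X <- matrix(rnorm(200 * 2), nrow = 200)   # 200 observations, 2 features

# Gaussian kernel values between every row of A and every row of B.
gaussian_gram <- function(A, B, bandwidth = 1) {
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-pmax(d2, 0) / (2 * bandwidth^2))   # pmax guards against tiny negatives
}

# Sample m basis points uniformly at random.
m <- 40
idx <- sample(nrow(X), m)

C <- gaussian_gram(X, X[idx, , drop = FALSE])   # n x m block
W <- C[idx, , drop = FALSE]                      # m x m block

# Pseudo-inverse of W via its eigendecomposition, dropping
# eigenvalues that are numerically zero.
eW <- eigen(W, symmetric = TRUE)
keep <- eW$values > 1e-8 * max(eW$values)
V <- eW$vectors[, keep, drop = FALSE]
W_pinv <- V %*% diag(1 / eW$values[keep], nrow = sum(keep)) %*% t(V)

# Low-rank reconstruction of the full kernel matrix.
K_approx <- C %*% W_pinv %*% t(C)

# Compare against the exact kernel matrix (feasible only for small n).
K_exact <- gaussian_gram(X, X)
rel_err <- norm(K_exact - K_approx, "F") / norm(K_exact, "F")
```

Only the n x m block C is ever evaluated in the approximation, which is where the savings come from when n is large; the exact matrix is computed here solely to measure the error.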

Value

A list with several components:

output

The transformed dataset (numeric matrix).

Author(s)

mlpack developers

Examples

# For example, the following command will perform KPCA on the dataset "input"
# using the Gaussian kernel and save the transformed data to
# "transformed":

## Not run: 
output <- kernel_pca(input=input, kernel="gaussian")
transformed <- output$output

## End(Not run)

mlpack documentation built on Sept. 27, 2023, 1:07 a.m.
