Gaussian_Kernel_feature: Kernel PCA using Gaussian kernel

View source: R/Gaussian.kernel.R

Gaussian_Kernel_feature R Documentation

Kernel PCA using Gaussian kernel

Description

This function transforms the input covariate space into a feature space using kernel PCA. The kernel matrix K is constructed with the Gaussian kernel. The inputs are the original covariate matrix, the number of candidate bandwidth (sigma) values for the kernel function, and the number of top eigenvalues to compute. The function returns a list containing the feature spaces and other quantities from the kernel PCA.

Usage

Gaussian_Kernel_feature(X, n_sigma = 20, k = 500)

Arguments

X

The covariate matrix with the rows corresponding to the subjects/participants and the columns corresponding to the covariates.

n_sigma

The number of bandwidth (sigma) values of the Gaussian kernel function used to generate the feature space. See Details below. Defaults to 20.

k

The number of top eigenvalues to calculate. Note that k should be smaller than the sample size; the minimum of k and the sample size is used. Defaults to 500.

Details

Kernel PCA is a simple way to implement a reproducing kernel Hilbert space (RKHS) approach, extending certain parametric methods with flexible modeling. It maps the input covariates to a feature space of non-linear transformations defined by a kernel function. The kernel function (the Gaussian kernel is implemented here) defines an inner product in the feature space between each pair of subjects. The kernel matrix K is constructed by evaluating the Gaussian kernel function, a measure of similarity, for every pair of subjects. The bandwidth of the Gaussian kernel, sigma, determines the set of functions in the transformed space.

An eigendecomposition is then performed on the kernel matrix, and the matrix of eigenvectors is the transformed feature matrix. For computational efficiency, only the top k eigenvalues are calculated.

The bandwidth sigma ranges between the 0.1 and 0.9 quantiles of the pairwise L_2 distances between samples. The set of sigma values consists of n_sigma values (from the function input) within this range, equally spaced on the log scale. For each sigma value, the numbers of eigenvectors corresponding to 90%, 95%, and 99% of the variance explained by the eigenvalues are calculated.
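The steps described above can be sketched in base R as follows. This is a minimal illustration, not the package's source code. Several points are assumptions: the kernel is parameterized as exp(-d^2 / (2 * sigma^2)), the kernel matrix is not centered, the variance explained is measured relative to the retained top k eigenvalues, and the full decomposition from eigen() is truncated rather than using a partial eigen solver. The helper name kpca_one_sigma is hypothetical.

```r
# Illustrative sketch only; not the PSLB implementation.
set.seed(1)
X <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)  # toy covariate matrix
n_sigma <- 5
k <- 50

# Pairwise Euclidean (L_2) distances between subjects
D <- as.matrix(dist(X))
d_vec <- D[upper.tri(D)]

# Candidate bandwidths: n_sigma values between the 0.1 and 0.9 quantiles
# of the distances, equally spaced on the log scale
q <- quantile(d_vec, probs = c(0.1, 0.9), names = FALSE)
sigma_seq <- exp(seq(log(q[1]), log(q[2]), length.out = n_sigma))

k_eff <- min(k, nrow(X))          # k cannot exceed the sample size
thresholds <- c(0.90, 0.95, 0.99)

# Hypothetical helper: kernel PCA for a single bandwidth
kpca_one_sigma <- function(sigma) {
  # Assumed kernel form: K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
  K <- exp(-D^2 / (2 * sigma^2))
  eig <- eigen(K, symmetric = TRUE)
  # Keep the top k_eff eigenpairs; clip tiny negative round-off values
  d <- pmax(eig$values[seq_len(k_eff)], 0)
  U <- eig$vectors[, seq_len(k_eff), drop = FALSE]
  # Cumulative share of variance (assumed relative to the retained eigenvalues)
  cum_var <- cumsum(d) / sum(d)
  r <- vapply(thresholds, function(p) which(cum_var >= p)[1], integer(1))
  features <- lapply(r, function(m) U[, seq_len(m), drop = FALSE])
  list(features = features, r = r, d = d)
}

fits <- lapply(sigma_seq, kpca_one_sigma)

# One row per sigma, one column per variance threshold
r_mat <- t(sapply(fits, function(f) f$r))
colnames(r_mat) <- c("90%", "95%", "99%")
```

Smaller bandwidths typically need more eigenvectors to reach a given share of the variance, which is why the counts are recorded separately for each sigma.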

Value

A list containing the following components:

  • "features": A list of transformed feature spaces, one for each bandwidth sigma. Each element of this list is itself a list of feature matrices whose numbers of columns correspond to 90%, 95%, and 99% of the variance explained by the eigenvalues.

  • "r": A matrix containing the number of eigenvectors corresponding to 90%, 95%, and 99% of the variance explained by the eigenvalues for each bandwidth sigma.

  • "d": A list of the eigenvalues corresponding to each bandwidth sigma.

  • "sigma_seq": The values of the bandwidth sigma used in the Gaussian kernel function.
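As a rough illustration of how the returned list might be navigated, the sketch below selects the feature matrix for one bandwidth and one variance level. The helper pick_features and the object mock_fit are hypothetical and are not part of the package. The sketch assumes that the inner lists are ordered 90%, 95%, 99%, and that "r" has one row per sigma and one column per variance level; check both against an actual result before relying on them.

```r
# Hypothetical helper, not part of PSLB: pick the feature matrix for one
# bandwidth index and one variance-explained level.
pick_features <- function(fit, sigma_index, level = c("90", "95", "99")) {
  level <- match.arg(level)
  pos <- match(level, c("90", "95", "99"))  # assumed ordering of the inner list
  feats <- fit$features[[sigma_index]][[pos]]
  # Sanity check: column count should agree with the corresponding entry of r
  stopifnot(ncol(feats) == fit$r[sigma_index, pos])
  feats
}

# Mock object with the documented component names, for illustration only
set.seed(2)
mock_fit <- list(
  features = list(
    list(matrix(rnorm(20), 10, 2), matrix(rnorm(30), 10, 3), matrix(rnorm(50), 10, 5)),
    list(matrix(rnorm(10), 10, 1), matrix(rnorm(20), 10, 2), matrix(rnorm(40), 10, 4))
  ),
  r = rbind(c(2, 3, 5), c(1, 2, 4)),
  d = list(runif(5), runif(4)),
  sigma_seq = c(0.5, 1.5)
)

# Features for the second bandwidth at the 95% level
Z <- pick_features(mock_fit, sigma_index = 2, level = "95")
dim(Z)
```

With a real result, the same call pattern would apply to the object returned by Gaussian_Kernel_feature, provided the assumed ordering holds.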

References

Chen, W. and Zhang, H. (2007) The condition of kernelizing an algorithm and an equivalence between kernel methods. In Iberian Conference on Pattern Recognition and Image Analysis, 228-245. Springer.

Examples

KS = Kang_Schafer_Simulation(n = 1000, seeds = 5050)
# Misspecified propensity score model
X = KS$Data[,7:10]
# Transform X with the Gaussian kernel using 10 candidate sigma values
x.k = Gaussian_Kernel_feature(X = X, n_sigma = 10)


fiona19832008/PSLB documentation built on April 14, 2022, 12:41 a.m.