Dirac: Kernels for categorical variables

View source: R/kernel_functions.R

Dirac (R Documentation)

Kernels for categorical variables

Description

Given a matrix or data.frame of dimension NxD (with N>1 and D>0), 'Dirac()' computes the simplest kernel for categorical data. Samples should be in the rows and features in the columns. When there is a single feature, 'Dirac()' returns 1 if two given samples share the same category (or class, or level), and 0 otherwise. When D>1, the results for the D features are combined as a sum, a mean, or a weighted mean.
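The single-feature case described above can be sketched in base R. This is only an illustration of the definition, not the kerntools implementation; the helper name `dirac_single` is hypothetical.

```r
# Hypothetical helper (not part of kerntools): Dirac kernel for ONE categorical feature.
# Entry [i, j] is 1 when samples i and j share the same category, 0 otherwise.
dirac_single <- function(x) {
  x <- as.character(x)          # treat factors and strings alike
  outer(x, x, "==") * 1         # logical comparison matrix converted to 0/1
}

x  <- c("a", "b", "a")
K1 <- dirac_single(x)
K1
```

Samples 1 and 3 share category "a", so the corresponding off-diagonal entries are 1; every diagonal entry is 1 because each sample matches itself.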

Usage

Dirac(X, comp = "mean", coeff = NULL, feat_space = FALSE)

Arguments

X

Matrix of class "character", or data.frame whose columns are of class "character" or "factor". The elements in X are assumed to be categorical in nature.

comp

When D>1, this argument indicates how the variables of the dataset are combined. Options are: "mean", "sum" and "weighted". (Default: "mean")

  • "sum" gives the same importance to all variables, and returns an unnormalized kernel matrix.

  • "mean" gives the same importance to all variables, and returns a normalized kernel matrix (all its elements range between 0 and 1).

  • "weighted" weights each variable according to the 'coeff' parameter, and returns a normalized kernel matrix.
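The three combination modes listed above can be illustrated with a small base-R sketch. It is not the kerntools source: the helper `dirac_combine` and the toy data are hypothetical, and rescaling the weights so they sum to 1 is an assumption made here to keep the weighted result between 0 and 1.

```r
# Hypothetical helper (not part of kerntools): combine per-feature Dirac kernels.
dirac_combine <- function(X, comp = c("mean", "sum", "weighted"), coeff = NULL) {
  comp <- match.arg(comp)
  X <- as.matrix(X)                       # factor/character columns become character
  D <- ncol(X)
  # One 0/1 kernel matrix per feature
  per_feature <- lapply(seq_len(D), function(j) outer(X[, j], X[, j], "==") * 1)
  if (comp == "sum")  return(Reduce(`+`, per_feature))
  if (comp == "mean") return(Reduce(`+`, per_feature) / D)
  # "weighted": ASSUMPTION, weights are rescaled to sum to 1
  stopifnot(!is.null(coeff), length(coeff) == D)
  w <- coeff / sum(coeff)
  Reduce(`+`, Map(`*`, per_feature, w))
}

# Toy data: 3 samples, 2 categorical features
X <- data.frame(color = c("red", "blue", "red"),
                size  = c("S", "S", "L"))

K_sum  <- dirac_combine(X, "sum")                         # entries count matching features
K_mean <- dirac_combine(X, "mean")                        # entries are the fraction of matches
K_w    <- dirac_combine(X, "weighted", coeff = c(3, 1))   # color counts 3x as much as size
```

With "sum", an entry counts how many of the D features match, so the diagonal equals D. With "mean" and "weighted" (under the assumption above), the diagonal equals 1.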

coeff

(optional) A vector of weights with length D, one per variable. Used when comp = "weighted".

feat_space

If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Default: FALSE)
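As background on what a feature space means for this kernel: for a single categorical feature, the Dirac kernel equals the inner product of one-hot (indicator) encodings. The base-R sketch below illustrates that identity on toy data; the exact format of the feature space returned by 'Dirac()' is not specified here, so the layout of `Z` is an assumption.

```r
# Toy categorical feature (illustrative data)
x  <- c("a", "b", "a", "c")
lv <- sort(unique(x))

# One-hot encoding: one row per sample, one column per category (assumed layout)
Z <- outer(x, lv, "==") * 1
colnames(Z) <- lv

# Kernel obtained from the explicit feature space: K = Z %*% t(Z)
K_from_Z <- tcrossprod(Z)

# Kernel obtained directly from the definition (1 if same category, 0 otherwise)
K_direct <- outer(x, x, "==") * 1

all(K_from_Z == K_direct)
```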

Value

Kernel matrix (dimension: NxN), or a list with the kernel matrix and the feature space.

References

Belanche, L. A., and Villegas, M. A. (2013). Kernel functions for categorical variables with application to problems in the life sciences. Artificial Intelligence Research and Development (pp. 171-180). IOS Press.

Examples

# Categorical data
summary(CO2)
Kdirac <- Dirac(CO2[,1:3])
## Display a subset of the kernel matrix:
Kdirac[c(1,15,50,65),c(1,15,50,65)]

kerntools documentation built on April 3, 2025, 7:52 p.m.