KFOCI: Kernel Feature Ordering by Conditional Independence

View source: R/KPC.R

KFOCI | R Documentation

Kernel Feature Ordering by Conditional Independence

Description

Variable selection with the kernel partial correlation (KPC) coefficient, using a directed K-NN graph or a minimum spanning tree (MST).

Usage

KFOCI(
  Y,
  X,
  Z = NULL,
  k = kernlab::rbfdot(1/(2 * stats::median(stats::dist(Y))^2)),
  Knn = min(ceiling(NROW(Y)/20), 20),
  num_features = NULL,
  stop = TRUE,
  numCores = parallel::detectCores(),
  verbose = FALSE
)
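
The default expressions for k and Knn in the signature above can be evaluated by hand to see what they produce. The following is a minimal base-R sketch on toy data; the names gauss_kernel and default_knn are illustrative helpers, not part of KPC, and the hand-written kernel assumes the parameterization exp(-sigma * ||y - y'||^2) that kernlab documents for rbfdot.

```r
set.seed(1)                      # arbitrary seed so the sketch is reproducible
n <- 200
Y <- matrix(rnorm(n), ncol = 1)  # toy response, n by 1

# Default kernel parameter from the signature (median heuristic):
# sigma = 1 / (2 * median pairwise distance^2).
sigma <- 1 / (2 * stats::median(stats::dist(Y))^2)

# Illustrative stand-in for kernlab::rbfdot(sigma), assuming
# k(y, y') = exp(-sigma * ||y - y'||^2).
gauss_kernel <- function(y1, y2) exp(-sigma * sum((y1 - y2)^2))

# Default number of nearest neighbours: 5% of n (rounded up), capped at 20.
default_knn <- function(n) min(ceiling(n / 20), 20)
knn_small <- default_knn(50)     # 3
knn_mid   <- default_knn(200)    # 10
knn_large <- default_knn(5000)   # 20
```

With this choice of sigma, two responses separated by the median pairwise distance have kernel value exp(-1/2), which is why the default adapts to the scale of Y.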

Arguments

Y

a matrix of responses (n by dy).

X

a matrix of predictors (n by dx).

Z

Integer vector of column indices in X to pre-condition on. These variables are always included in the conditioning set and are not re-selected. The default NULL corresponds to no pre-conditioning.

k

a function k(y, y') of class kernel. It can be a kernel implemented in kernlab, e.g., the Gaussian kernel rbfdot(sigma = 1) or the linear kernel vanilladot().

Knn

a positive integer indicating the number of nearest neighbors, or "MST". The suggested choice of Knn is 0.05n for samples of up to a few hundred observations. For large n, the suggested Knn is sublinear in n, i.e. it should grow more slowly than any linear function of n. The computing time is approximately linear in Knn, so a smaller Knn takes less time.

num_features

the number of variables to be selected from the non-pre-conditioned set; it cannot be larger than dx - |Z|. The default value is NULL, in which case it is set to dx - |Z|. If stop == TRUE (see below), num_features is the maximal number of variables to be selected (selection may stop earlier).

stop

If stop == TRUE, the automatic stopping criterion is applied: selection stops at the first instance of negative Tn (as mentioned in the paper) or once num_features variables have been selected, whichever comes first. If stop == FALSE, exactly num_features variables are selected.

numCores

number of cores used to parallelize the computation.

verbose

whether to print each selected variable during the forward stepwise algorithm.
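
The interaction between num_features and stop can be illustrated with a toy helper. This is only a sketch: count_selected is a hypothetical function, not part of KPC, the Tn values are made up, and it assumes that the variable whose Tn is negative is itself discarded.

```r
# Hypothetical helper (not part of KPC): given the sequence of statistics Tn
# that a forward pass would produce, count how many variables are kept.
count_selected <- function(Tn, num_features = length(Tn), stop = TRUE) {
  limit <- min(num_features, length(Tn))
  if (!stop) return(limit)                 # stop == FALSE: take exactly `limit`
  neg <- which(Tn[seq_len(limit)] < 0)     # positions of negative Tn
  # Assumption: the step with the first negative Tn is not kept.
  if (length(neg) == 0) limit else neg[1] - 1
}

Tn <- c(0.4, 0.2, -0.1, 0.3)               # made-up values for illustration
kept_stop    <- count_selected(Tn)                                  # 2
kept_no_stop <- count_selected(Tn, num_features = 3, stop = FALSE)  # 3
kept_capped  <- count_selected(c(0.4, 0.2, 0.1), num_features = 2)  # 2
```

In the first call the early stop applies, in the second the negative value is ignored because stop is FALSE, and in the third num_features acts as the cap because no Tn is negative.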

Details

A stepwise forward selection of variables using KPC. At each step, the algorithm selects the X_j that maximizes \hat{\rho^2}(Y, X_j \mid \text{selected } X_i). When Z is specified, the algorithm conditions on those variables throughout, i.e. the formal goal is then to find a subset S \subset \lbrace 1, \dotsc, dx\rbrace\setminus Z such that Y \perp X_{S^c}\mid (X_Z, X_S). It is suggested to normalize the predictors before applying KFOCI. Euclidean distance is used for computing the K-NN graph and the MST.
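
The greedy structure described above can be sketched generically. This is not the package's implementation: forward_select and the score argument are hypothetical, with score(j, cond) standing in for the conditional KPC estimate of candidate j given the columns in cond. The toy score at the end is made up purely to exercise the loop.

```r
# Generic greedy forward selection (illustrative, not KPC's code).
# score(j, cond): hypothetical function returning the conditional dependence
# of Y on predictor j given the predictor columns in `cond`.
forward_select <- function(dx, score, Z = integer(0),
                           num_features = dx - length(Z)) {
  selected <- integer(0)
  candidates <- setdiff(seq_len(dx), Z)    # pre-conditioned columns are never re-selected
  for (step in seq_len(num_features)) {
    # Score every remaining candidate given Z and what was already selected.
    gains <- vapply(candidates,
                    function(j) score(j, c(Z, selected)),
                    numeric(1))
    best <- candidates[which.max(gains)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Toy score: fixed per-variable weights, shrunk as the conditioning set grows.
w <- c(0.1, 0.9, 0.5, 0.3)
toy_score <- function(j, cond) w[j] / (1 + length(cond))

picked         <- forward_select(4, toy_score, num_features = 2)  # 2 3
picked_given_2 <- forward_select(4, toy_score, Z = 2L)            # 3 4 1
```

The second call shows the role of Z: column 2 is conditioned on from the start, so it never appears in the returned indices.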

Value

The algorithm returns a vector of the indices of the selected variables, taken from the non-pre-conditioned part of 1,...,dx, in the order in which they were selected. Variables at the front are expected to be more informative in predicting Y.
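
A typical way to use the returned indices, together with the normalization suggested in Details, might look like the following base-R sketch. The vector selected is a made-up placeholder for the value returned by KFOCI, not an actual result.

```r
set.seed(1)                                # arbitrary seed for reproducibility
X <- matrix(rnorm(200 * 10), ncol = 10)    # toy predictors, n = 200, dx = 10

# Centre each column and scale it to unit standard deviation.
X_std <- scale(X)

# Placeholder for the value returned by KFOCI(); replace with the real result.
selected <- c(2, 1, 3)

# Keep the selected columns in selection order; drop = FALSE keeps a matrix
# even when only one variable is selected.
X_sel <- X_std[, selected, drop = FALSE]
```

Because the indices are ordered by selection, the first column of X_sel corresponds to the variable chosen first.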

See Also

KPCgraph, KPCRKHS, KPCRKHS_VS

Examples

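library(KPC)
n = 200
p = 10
# Only the first three predictors carry signal about Y
X = matrix(rnorm(n * p), ncol = p)
Y = X[, 1] * X[, 2] + sin(X[, 1] * X[, 3])
# Knn = 1 uses the 1-nearest-neighbor graph; numCores = 1 runs on a single core
KFOCI(Y, X, k=kernlab::rbfdot(1), Knn=1, numCores=1)
# 1 2 3 (typical output; the data are random, so the result can vary between runs)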

KPC documentation built on May 3, 2026, 1:07 a.m.
