select.features: Select important features

select.features    R Documentation

Select important features

Description

Select features using a supervised or unsupervised kernel method. Supervised feature selection is performed if Y is provided; otherwise the selection is unsupervised.

Usage

select.features(
  X,
  Y = NULL,
  kx.func = c("linear", "gaussian.radial.basis", "bray"),
  ky.func = c("linear", "gaussian.radial.basis"),
  keepX = NULL,
  method = c("kernel", "kpca", "graph"),
  lambda = NULL,
  n_components = 2,
  Lg = NULL,
  mu = 1,
  max_iter = 100,
  nstep = 50,
  ...
)

Arguments

X

a numeric matrix (or data frame) used to select variables. NAs not allowed.

Y

an optional numeric matrix (or data frame) used to select variables. When Y is provided, supervised feature selection is performed. NAs not allowed. Default: NULL.
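
Since neither X nor Y may contain NAs, it can help to validate inputs before calling the function. The following is a minimal base R sketch; the helper name check_input is hypothetical and is not part of mixKernel.

```r
# Hypothetical helper (not part of mixKernel): checks that an input is
# numeric and free of missing values before it is used for selection.
check_input <- function(M) {
  M <- as.matrix(M)          # accepts a matrix or a data frame
  if (!is.numeric(M)) stop("input must be numeric")
  if (anyNA(M)) stop("input must not contain NAs")
  invisible(M)
}

X_ok  <- matrix(rnorm(20), nrow = 5)
X_bad <- X_ok
X_bad[2, 3] <- NA

check_input(X_ok)   # passes silently

# Capture the error message raised for the input containing an NA.
res <- tryCatch(check_input(X_bad), error = function(e) conditionMessage(e))
res
```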

kx.func

the kernel function name to use on X. Widely used kernel functions are pre-implemented, and can be directly used by setting kx.func to one of the following values: "linear", "gaussian.radial.basis" or "bray". Default: "linear". If Y is provided, the kernel "bray" is not allowed.

ky.func

the kernel function name to use on Y. Available kernels are: "linear", and "gaussian.radial.basis". Default: "linear". This value is ignored when Y is not provided.

keepX

the number of variables to select.

method

the method to use. Either an unsupervised variable selection method ("kernel"), a kernel PCA oriented variable selection method ("kpca") or a structure driven variable selection method ("graph"). Default: "kernel".

lambda

the penalization parameter that controls the trade-off between the minimization of the distortion and the sparsity of the solution.

n_components

the number of principal components to use. Required with method "kpca". Default: 2.

Lg

the Laplacian matrix of the graph representing relations between the input dataset variables. Required with method "graph".
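
As an illustration of what Lg may look like, the sketch below builds a graph Laplacian from a toy adjacency matrix in base R. It assumes the unnormalized form (degree matrix minus adjacency matrix); this page does not state whether a normalized Laplacian is expected instead, and the toy graph and variable names are invented for the example.

```r
# Toy undirected graph on 4 variables, given as a symmetric adjacency matrix.
A <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(paste0("v", 1:4), paste0("v", 1:4)))
edges <- rbind(c(1, 2), c(2, 3), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1   # mirror the edges to keep A symmetric

# ASSUMPTION: unnormalized graph Laplacian, degree matrix minus adjacency.
D  <- diag(rowSums(A))
Lg <- D - A
Lg
```

The rows and columns of Lg should follow the same order as the columns of X, since each node of the graph stands for one input variable.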

mu

the penalization parameter that controls the trade-off between the distortion and the influence of the graph. Default: 1.

max_iter

the maximum number of iterations. Default: 100.

nstep

the number of values used for the regularization path. Default: 50.

...

the kernel function arguments. In particular, sigma (for "gaussian.radial.basis"): a double giving the inverse kernel width used by "gaussian.radial.basis".
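
To make the role of sigma concrete, here is a minimal base R sketch of a Gaussian radial basis kernel. It assumes the common convention k(x, y) = exp(-sigma * ||x - y||^2), in which sigma acts as an inverse width; the function rbf_kernel is hypothetical and is not the implementation used by mixKernel.

```r
# Gaussian radial basis kernel, ASSUMING the convention
# k(x, y) = exp(-sigma * ||x - y||^2), with sigma as the inverse width.
rbf_kernel <- function(X, sigma = 1) {
  X  <- as.matrix(X)
  d2 <- as.matrix(dist(X))^2   # squared Euclidean distances between rows
  exp(-sigma * d2)
}

set.seed(1)
X <- matrix(rnorm(12), nrow = 4)
K_narrow <- rbf_kernel(X, sigma = 10)    # large sigma: similarities decay fast
K_wide   <- rbf_kernel(X, sigma = 0.1)   # small sigma: similarities decay slowly
```

Under this convention, a larger sigma yields a narrower kernel, so off-diagonal entries of the kernel matrix move closer to zero.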

Value

select.features returns a sorted vector of the indexes of the selected features.
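
Because the result is a vector of column indexes, it can be used directly to subset the input. The sketch below uses a hand-written index vector as a hypothetical stand-in for the returned value, so that it runs without mixKernel installed.

```r
# Hypothetical stand-in for the value returned by select.features():
# a sorted vector of column indexes.
X <- matrix(1:20, nrow = 4, dimnames = list(NULL, paste0("feat", 1:5)))
selected <- c(2L, 5L)

colnames(X)[selected]                       # names of the selected features
X_selected <- X[, selected, drop = FALSE]   # reduced data, kept as a matrix
dim(X_selected)
```

Using drop = FALSE keeps the result a matrix even when a single feature is selected.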

Author(s)

Celine Brouard <celine.brouard@inrae.fr>, Jerome Mariette <jerome.mariette@inrae.fr>, Nathalie Vialaneix <nathalie.vialaneix@inrae.fr>

References

Brouard C., Mariette J., Flamary R. and Vialaneix N. (2022). Feature selection for kernel methods in systems biology. NAR Genomics and Bioinformatics, 4(1), lqac014. DOI: 10.1093/nargab/lqac014.

See Also

compute.kernel

Examples

## These examples require the installation of python modules
## See installation instruction at: http://mixkernel.clementine.wf

data("Koren.16S")
## Not run: 
 sf.res <- select.features(Koren.16S$data.raw, kx.func = "bray", lambda = 1,
                           keepX = 40, nstep = 1)
 colnames(Koren.16S$data.raw)[sf.res]

## End(Not run)

data("nutrimouse")
## Not run: 
 grb.func <- "gaussian.radial.basis"
 genes <- center.scale(nutrimouse$gene)
 lipids <- center.scale(nutrimouse$lipid)
 sf.res <- select.features(genes, lipids, kx.func = grb.func, 
                           ky.func = grb.func, keepX = 40)
 colnames(nutrimouse$gene)[sf.res]

## End(Not run)


mixKernel documentation built on Sept. 18, 2023, 5:16 p.m.