sivs: Stable Iterative Variable Selection


sivs R Documentation

Stable Iterative Variable Selection

Description

SIVS stands for Stable Iterative Variable Selection. The function repeatedly runs a machine learning method that incorporates shrinkage, each time with a different random seed, in order to find the smallest set of features that is robustly predictive.

Usage

sivs(
  x,
  y,
  test.ratio = 1/3,
  method = "glmnet",
  iter.count = 100,
  nfolds = 10,
  sample.grouping = NULL,
  parallel.cores = "grace",
  progressbar = TRUE,
  verbose = "general",
  return.fits = FALSE,
  return.roc = FALSE,
  return.sessionInfo = TRUE,
  lib.paths = .libPaths(),
  debug.mode = FALSE,
  ...
)

Arguments

x

The input data. Each row should represent a sample and each column should represent a feature.

y

Response variable. It should be of class factor for classification and of class Surv for survival.

test.ratio

The fraction of the data to be held out and used for testing. Default is 1/3.

method

The internal machine learning method to be used. Default is "glmnet".

iter.count

How many iterations the function should go through. Default is 100.

nfolds

How many folds the training cross-validations should have. Default is 10.

sample.grouping

A character, numeric or factor vector to specify how the samples should be grouped/bundled together in the cross-validation binning. If set to NULL the grouping will be skipped. Samples with the same value will always be kept together in the same bins in cross-validation. This is especially useful when having multiple samples from the same individual. Default is NULL.
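As a rough illustration of what "kept together in the same bins" means, the sketch below assigns cross-validation folds per group rather than per sample. It is not the code sivs uses internally; `patient_id` and `assign_grouped_folds` are hypothetical names introduced only for this example.

```r
# Hypothetical grouping vector: 6 samples taken from 3 individuals.
patient_id <- c("p1", "p1", "p2", "p2", "p3", "p3")

# Illustrative helper (NOT part of sivs): assign a fold to each group,
# then map the fold back to every sample that belongs to that group.
assign_grouped_folds <- function(grouping, nfolds) {
  groups <- sample(unique(as.character(grouping)))  # shuffle the groups
  group_fold <- rep_len(seq_len(nfolds), length(groups))
  names(group_fold) <- groups
  unname(group_fold[as.character(grouping)])
}

set.seed(1)
fold_id <- assign_grouped_folds(patient_id, nfolds = 3)

# Samples that share a patient_id always share a fold.
# With sivs, the same vector would be passed as:
# sivs_object <- sivs(x = DATA, y = RESP, sample.grouping = patient_id)
```

Assigning folds at the group level is what prevents samples from one individual from appearing in both the training and the validation part of a split.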

parallel.cores

How many cores should be used in the iterative process. The value should be the number of threads in numeric form, or any of these values: "max", "grace", FALSE, NULL. If set to "max", all cores will be used; with large datasets this can make the computer unresponsive and may ultimately cause errors. If set to "grace", one core will be left free so that it can be used by other processes on the machine. If set to NULL or FALSE, the code will run sequentially, without the parallel backend. Default is "grace".
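The accepted values can be read as a mapping to a number of worker cores. The sketch below only restates the documented behaviour; `resolve_cores` is a hypothetical helper, not a function exported by sivs, and the package may resolve the value differently internally.

```r
# Illustrative helper (NOT part of sivs): turn a parallel.cores value
# into a core count, following the rules described in the documentation.
resolve_cores <- function(parallel.cores,
                          available = parallel::detectCores()) {
  # NULL or FALSE: run sequentially, i.e. a single core
  if (is.null(parallel.cores) || isFALSE(parallel.cores)) {
    return(1L)
  }
  # "max": use every available core
  if (identical(parallel.cores, "max")) {
    return(as.integer(available))
  }
  # "grace": leave one core free for other processes
  if (identical(parallel.cores, "grace")) {
    return(max(1L, as.integer(available) - 1L))
  }
  # otherwise: an explicit number of threads
  as.integer(parallel.cores)
}

# Examples on a hypothetical 8-core machine
cores_grace <- resolve_cores("grace", available = 8)
cores_max   <- resolve_cores("max", available = 8)
cores_seq   <- resolve_cores(FALSE, available = 8)
cores_three <- resolve_cores(3, available = 8)
```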

progressbar

Logical. Whether the progress bar should be shown. Default is TRUE.

verbose

Character. How detailed the progress reporting should be. The value should be a character vector of length 1. "detailed" will report every single step. "general" will report only the main steps. "none" or FALSE will suppress any reporting. Default is "general".

return.fits

Logical. Whether the fit object for each iterative run should be returned. Having the fits in the final object would significantly increase the final object size. Default is FALSE.

return.roc

Logical. Whether the ROC object for each iterative run should be returned. Having the ROC objects in the final object would significantly increase the final object size. Default is FALSE.

return.sessionInfo

Logical. Whether the output of utils::sessionInfo() should be included in the final object. This is useful for reproducibility purposes. Default is TRUE.

lib.paths

A character vector containing the paths where the dependency libraries are installed. REMEMBER to set this if you are using packrat. Default is .libPaths().

debug.mode

Logical. Whether the debug mode should be enabled. Default is FALSE.

...

Other parameters to be passed to the training method, for example the value of alpha in glmnet.

Value

An object with S3 class "sivs". Among its elements:

run.info$call

The call that produced this object.

run.info$sessionInfo

The object produced by utils::sessionInfo().

Examples

## Not run: 
# assuming that your data object is `DATA`, with rows as samples and
# columns as features, and that the response values are in a vector
# named `RESP`:

# simple default run
sivs_object <- sivs(x = DATA, y = RESP)

# simple run using only 3 CPU cores
sivs_object <- sivs(x = DATA, y = RESP, parallel.cores = 3)


# get the variable importance values
sivs_object$vimp

# get the conditions that sivs was run in
sivs_object$run.info$call
sivs_object$run.info$sessionInfo

## End(Not run)

## WORKING EXAMPLE
## Note that this example is not a meaningful use case: the iris data has only
## 4 columns, so there is no need for SIVS to take care of feature selection.
## It is included here for testing purposes only.

tmp <- subset(x = iris, subset = Species != "setosa")

tmp <- varhandle::unfactor(tmp)

sivs_obj <- sivs(x = tmp[, c("Sepal.Length", "Sepal.Width",
                             "Petal.Length", "Petal.Width")],
                 y = factor(tmp$Species),
                 family = "binomial",
                 verbose = "detailed",
                 progressbar = FALSE,
                 nfolds = 3,
                 parallel.cores = FALSE,
                 iter.count = 20)



sivs documentation built on Nov. 2, 2023, 6:05 p.m.
