var.select.mir: Variable selection with mutual impurity reduction (MIR)
In StephanSeifert/SurrogateMinimalDepth: Surrogate minimal depth variable importance

var.select.mir

R Documentation

Variable selection with mutual impurity reduction (MIR)

Description

This function performs variable selection with MIR. It uses ranger to generate the random forest and to compute the actual impurity reduction, and a modified version of rpart to find surrogate variables.

Usage

var.select.mir(
  x = NULL,
  y = NULL,
  ntree = 500,
  type = "regression",
  s = NULL,
  mtry = NULL,
  min.node.size = 1,
  num.threads = NULL,
  status = NULL,
  save.ranger = FALSE,
  save.memory = FALSE,
  min.var.p = 200,
  p.t.sel = 0.01,
  p.t.rel = 0.01,
  select.var = TRUE,
  select.rel = FALSE,
  case.weights = NULL,
  corr.rel = TRUE,
  t = 5,
  method.rel = "janitza",
  method.sel = "janitza",
  num.threads.rel = NULL
)

Arguments

`x`	matrix or data.frame of predictor variables with variables in columns and samples in rows (Note: missing values are not allowed)
`y`	vector with values of phenotype variable (Note: will be converted to factor if classification mode is used). For survival forests this is the time variable.
`ntree`	number of trees. Default is 500.
`type`	mode of prediction ("regression" or "classification"). Default is regression.
`s`	predefined number of surrogate splits (it may happen that the actual number of surrogate splits differs in individual nodes). Default is 1 % of no. of variables.
`mtry`	number of variables to possibly split at in each node. Default is the number of variables to the power of 3/4 ("^3/4"), as recommended by Ishwaran (2011). Also possible are "sqrt" and "0.5", which use the square root or half of the number of variables, respectively.
`min.node.size`	minimal node size. Default is 1.
`num.threads`	number of threads used for parallel execution. Default is number of CPUs available.
`status`	status variable, only applicable to survival data. Use 1 for event and 0 for censoring.
`save.ranger`	set TRUE if ranger object should be saved. Default is that ranger object is not saved (FALSE).
`save.memory`	use memory saving (but slower) splitting mode. No effect for survival and GWAS data. Warning: this option slows down the tree growing; use it only if you encounter memory problems. (This parameter is transferred to ranger.)
`min.var.p`	minimum number of permuted variables used to determine p-value for variable selection of important variables. Default is 200.
`p.t.sel`	p.value threshold for selection of important variables. Default is 0.01.
`p.t.rel`	p.value threshold for selection of related variables. Default is 0.01.
`select.var`	set FALSE if only importance should be calculated and no variables should be selected. Default is TRUE.
`select.rel`	set FALSE if only relations should be calculated and no variables should be selected. Default is FALSE.
`case.weights`	Weights for sampling of training observations. Observations with larger weights will be selected with higher probability in the bootstrap (or subsampled) samples for the trees.
`corr.rel`	set FALSE if non-corrected variable relations should be used for the calculation of MIR. In this case, the method "janitza" should not be used for the selection of important variables. Default is TRUE.
`t`	variable to calculate threshold for non-corrected relation analysis. Default is 5.
`method.rel`	Method to compute p-values for selection of related variables with var.relations.corr. Use "janitza" for the method by Janitza et al. (2016) or "permutation" to utilize permuted variables.
`method.sel`	Method to compute p-values for selection of important variables. Use "janitza" for the method by Janitza et al. (2016) (can only be used when corrected variable relations are utilized) or "permutation" to utilize permuted variables.
`num.threads.rel`	number of threads used for determination of relations. Default is number of CPUs available. (this process can be memory-intensive and it can be preferable to reduce this)
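
The defaults described for `s` and `mtry` depend only on the number of predictor variables. The following base R sketch works through them for a hypothetical data set with 250 variables. The rounding direction (`ceiling` for `s`, `floor` for `mtry`) is an assumption made for illustration; the documentation above does not state how fractional values are handled.

```r
# Number of predictor variables (columns of x); 250 is an arbitrary example.
p <- 250

# Default s: 1 % of the number of variables.
# Rounding up is an assumption, not taken from the documentation.
s_default <- ceiling(0.01 * p)

# The three documented mtry options; rounding down is an assumption.
mtry_default <- floor(p^(3 / 4))  # "^3/4", the default
mtry_sqrt    <- floor(sqrt(p))    # "sqrt"
mtry_half    <- floor(0.5 * p)    # "0.5"

print(c(s = s_default, mtry_34 = mtry_default,
        mtry_sqrt = mtry_sqrt, mtry_half = mtry_half))
```

With many variables, the "^3/4" option considers far more candidates per split than "sqrt", which is worth keeping in mind when run time matters.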

Value

list with the following components:

info: list with results containing:
- MIR: the calculated variable importance for each variable based on mutual impurity reduction.
- pvalue: the obtained p-values for each variable.
- selected: indicates whether each variable has been selected (1) or not (0).
- relations: a list containing the results of variable relation analysis.
- parameters: a list that contains the parameters s, type, mtry, p.t.sel, p.t.rel and method.sel that were used.
var: vector of selected variables.
ranger: ranger object.
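
To show how the components listed above fit together, the following sketch builds a hypothetical stand-in for the returned list and reads from it. All numbers and variable names (`X1` to `X3`) are invented for illustration, and the assumption that `MIR`, `pvalue` and `selected` are named vectors is not confirmed by the documentation.

```r
# Hypothetical stand-in for the list returned by var.select.mir().
# Values are made up; only the component names follow the documentation.
res <- list(
  info = list(
    MIR      = c(X1 = 0.82, X2 = 0.03, X3 = 0.41),
    pvalue   = c(X1 = 0.001, X2 = 0.640, X3 = 0.004),
    selected = c(X1 = 1, X2 = 0, X3 = 1)
  ),
  var = c("X1", "X3")
)

# Rank variables by their MIR importance, largest first.
ranking <- names(sort(res$info$MIR, decreasing = TRUE))

# Recover the selected variables from the 0/1 indicator.
from_indicator <- names(res$info$selected)[res$info$selected == 1]

print(ranking)
print(from_indicator)
```

In this mock object, the variables flagged in `info$selected` are the same ones listed in `var`.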

References

Nembrini, S. et al. (2018) The revival of the Gini importance? Bioinformatics, 34, 3711–3718. https://academic.oup.com/bioinformatics/article/34/21/3711/4994791
Seifert, S. et al. (2019) Surrogate minimal depth as an importance measure for variables in random forests. Bioinformatics, 35, 3663–3671. https://academic.oup.com/bioinformatics/article/35/19/3663/5368013

Examples

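# load the package and read the example data
library(SurrogateMinimalDepth)
data("SMD_example_data")

# select variables (usually more trees are needed)
set.seed(42)
res <- var.select.mir(
  x = SMD_example_data[, 2:ncol(SMD_example_data)],  # predictors
  y = SMD_example_data[, 1],                         # outcome in the first column
  s = 10,
  ntree = 10
)
res$var  # vector of selected variables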
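
The arguments section notes that "janitza" should not be used for selecting important variables when non-corrected relations are requested. A variant of the example above with `corr.rel = FALSE` could therefore look like the following sketch. The object names `mir_args` and `res_perm` are hypothetical, and the call is guarded so that it only runs when the package is installed; it has not been checked against the package and is meant as an illustration of the argument combination only.

```r
# Argument combination for non-corrected relations: "janitza" should not be
# used for selecting important variables here, so "permutation" is chosen.
mir_args <- list(
  s = 10,
  ntree = 10,                  # kept small for speed; usually more trees are needed
  corr.rel = FALSE,            # use non-corrected variable relations
  t = 5,                       # threshold variable for non-corrected relations
  method.sel = "permutation",  # p-values from permuted variables
  min.var.p = 200              # minimum number of permuted variables
)

# Run only if the package is available, so the sketch can be sourced anywhere.
if (requireNamespace("SurrogateMinimalDepth", quietly = TRUE)) {
  data("SMD_example_data", package = "SurrogateMinimalDepth")
  set.seed(42)
  res_perm <- do.call(
    SurrogateMinimalDepth::var.select.mir,
    c(list(x = SMD_example_data[, -1], y = SMD_example_data[, 1]), mir_args)
  )
  print(res_perm$var)
}
```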

StephanSeifert/SurrogateMinimalDepth documentation built on Aug. 7, 2023, 1:59 a.m.
