vistla: Influence path identification with the Vistla algorithm

vistla R Documentation

Description

Detects influence paths.

Usage

vistla(x, ...)

## S3 method for class 'formula'
vistla(formula, data, ..., yn)

## S3 method for class 'data.frame'
vistla(
  x,
  y,
  ...,
  flow,
  iomin,
  targets,
  estimator = c("mle", "kt"),
  verbose = FALSE,
  yn = "Y",
  ensemble,
  threads
)

## Default S3 method:
vistla(x, ...)

Arguments

x

data frame of predictors.

...

pass-through arguments, ignored.

formula

alternatively, a formula describing the task, in the form root~predictors, which follows standard R formula conventions. Accepts + to add a predictor, - to omit one, and . to include all columns of data. Use I to calculate new predictors. When the response is present in data, it is omitted from the predictors.

data

data.frame in the context of which the formula is evaluated; can be omitted when the formula does not use the . shorthand.

yn

name of the root (Y value), used in result pretty-printing and plots. Must be a single-element character vector.

y

vistla tree root, a feature from which influence paths will be traced.

flow

algorithm mode, specifying the iota function which assigns a local score to an edge of the edge graph. If in doubt, use the default, "fromdown". Consult the documentation of the flow function for more information.

iomin

score threshold below which a path is not considered further. The higher the value, the fewer paths are generated, which also reduces the time taken by the function. The default value of 0 turns off this filtering. The same effect can be later achieved with the prune function.

targets

a vector of target feature names. If given, the algorithm will stop just after reaching the last of them, rather than after tracing all paths from the root. The same effect can be later achieved with the prune function. This is a simple method to remove irrelevant paths, yet it comes with a substantial increase in computational burden.

estimator

mutual information estimator to use. "mle": maximum likelihood, requires all features to be discrete (factors or booleans). "kt": Kendall transformation, requires all features to be either ordinal (numeric, integer or ordered factor) or bi-valued (two-level factors or booleans).

verbose

when set to TRUE, turns on reporting of the algorithm progress.

ensemble

used to switch vistla to the ensemble mode, in which a number of vistla models are built over permuted realisations of the input and merged into a single consensus tree. Should be given the output of the ensemble function; as a shortcut, one can pass a single number, which will be interpreted as the number of replications, with the other ensemble parameters left at their defaults. That is, ensemble=30 is equivalent to ensemble=ensemble(n=30). Permutations are applied before estimators.

threads

number of threads to use. When missing or set to 0, vistla uses all available cores.
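
The calling conventions described by the arguments above can be sketched as follows. This is only an illustration under stated assumptions, not an example from the package: the toy data frame, its column names (A, B, N, Y), the noisy_copy helper and the seed are all made up here, and threads = 1 is chosen merely to keep the run small.

```r
library(vistla)

# Hypothetical toy data (not shipped with the package): a noisy chain
# A -> B -> Y plus an unrelated feature N, all with three levels.
set.seed(17)
n <- 500
noisy_copy <- function(x) {
  # Keep about 80% of the values, replace the rest with random levels
  ifelse(runif(length(x)) < 0.8, x, sample(1:3, length(x), replace = TRUE))
}
A <- sample(1:3, n, replace = TRUE)
B <- noisy_copy(A)
Y <- noisy_copy(B)
N <- sample(1:3, n, replace = TRUE)

# "mle" requires discrete features, so store everything as factors
toy <- data.frame(A = factor(A), B = factor(B), N = factor(N), Y = factor(Y))

# Formula interface: Y is the root, all other columns are predictors
v_formula <- vistla(Y ~ ., data = toy, threads = 1)

# Data frame interface: predictors in x, root in y, root name in yn
v_df <- vistla(toy[, c("A", "B", "N")], toy$Y, yn = "Y", threads = 1)

# Kendall transformation estimator on ordinal (here integer) features
toy_ord <- data.frame(A = A, B = B, N = N, Y = Y)
v_kt <- vistla(Y ~ ., data = toy_ord, estimator = "kt", threads = 1)

print(v_formula)
```

The formula and data frame calls are intended to describe the same task; which one is more convenient depends on whether the root is already a column of the data.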

Value

Normally, the tracing results, represented as an object of class vistla. Use the paths and path_to functions to extract individual paths, branches to get the whole tree, and mi_scores to get the basic score matrix.

When the ensemble argument is given, a hierarchy object whose scores are counts of how many times a given path was present among the ensemble replicates, possibly pruned.
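
A minimal sketch of inspecting a vistla object with the extractor functions named above. The toy data, its column names and the seed are assumptions invented for this illustration, and each extractor is called with the fitted object as its only argument; consult the individual help pages for further options. The path_to function additionally needs the name of a target feature and is left out here.

```r
library(vistla)

# Hypothetical toy data (not from the package): noisy chain A -> B -> Y
# and an unrelated feature N, all stored as three-level factors.
set.seed(17)
n <- 500
noisy_copy <- function(x) {
  ifelse(runif(length(x)) < 0.8, x, sample(1:3, length(x), replace = TRUE))
}
A <- sample(1:3, n, replace = TRUE)
B <- noisy_copy(A)
Y <- noisy_copy(B)
toy <- data.frame(
  A = factor(A), B = factor(B),
  N = factor(sample(1:3, n, replace = TRUE)),
  Y = factor(Y)
)

v <- vistla(Y ~ ., data = toy, threads = 1)

# Individual influence paths traced from the root
print(paths(v))

# The whole tree
print(branches(v))

# The basic score matrix
print(mi_scores(v))
```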

Note

The ensemble mode is faster and makes better use of multithreading than replicating vistla manually.
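
A short sketch of switching on the ensemble mode, using the two equivalent spellings given in the description of the ensemble argument. The toy data, column names and seed are assumptions made up for this illustration; because the replicates are built over permuted realisations of the input, the two results need not be identical.

```r
library(vistla)

# Hypothetical toy data (not from the package): noisy chain A -> B -> Y
# and an unrelated feature N, all stored as three-level factors.
set.seed(17)
n <- 500
noisy_copy <- function(x) {
  ifelse(runif(length(x)) < 0.8, x, sample(1:3, length(x), replace = TRUE))
}
A <- sample(1:3, n, replace = TRUE)
B <- noisy_copy(A)
Y <- noisy_copy(B)
toy <- data.frame(
  A = factor(A), B = factor(B),
  N = factor(sample(1:3, n, replace = TRUE)),
  Y = factor(Y)
)

# Shortcut: a single number is read as the number of replications
e_short <- vistla(Y ~ ., data = toy, ensemble = 30, threads = 1)

# Explicit form: pass the output of the ensemble function
e_full <- vistla(Y ~ ., data = toy, ensemble = ensemble(n = 30), threads = 1)

print(e_short)
```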

References

"Kendall transformation brings a robust categorical representation of ordinal data" M.B. Kursa. SciRep 12, 8341 (2022).


vistla documentation built on June 24, 2024, 5:17 p.m.