model.based: Decision Tree Model Based Meta-features
In rivolli/mfe: Meta-Feature Extractor

model.based

R Documentation

Description

Decision Tree (DT) model-based meta-features are measures that characterize a DT model induced from a dataset.
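
As a rough illustration of what "characteristics of a DT model" can mean, the sketch below counts leaves and internal nodes from an rpart-style frame, where terminal nodes carry the label `"<leaf>"` in the `var` column. The helper `tree_counts` is hypothetical and is not part of mfe; the tree-induction settings mfe uses internally may differ from the rpart defaults shown here.

```r
# Hypothetical helper (not part of mfe): count leaves and internal nodes
# from an rpart-style frame, which marks terminal nodes as "<leaf>".
tree_counts <- function(frame) {
  is_leaf <- frame$var == "<leaf>"
  c(leaves = sum(is_leaf), internal = sum(!is_leaf), total = length(is_leaf))
}

# Optional usage with rpart, if it is installed. The default rpart
# settings are an assumption; mfe may induce its tree differently.
if (requireNamespace("rpart", quietly = TRUE)) {
  fit <- rpart::rpart(Species ~ ., data = iris, method = "class")
  print(tree_counts(fit$frame))
}
```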

Usage

model.based(...)

## Default S3 method:
model.based(x, y, features = "all", summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
model.based(formula, data, features = "all", summary = c("mean", "sd"), ...)

Arguments

`...`	Further arguments passed to the summarization functions.
`x`	A data.frame containing only the input attributes.
`y`	A factor response vector with one label for each row/component of x.
`features`	A list of feature names, or `"all"` to include all of them. The Details section describes the valid values for this group.
`summary`	A list of summarization functions, or empty to return all values. See the post.processing method for more information. (Default: `c("mean", "sd")`)
`formula`	A formula to define the class column.
`data`	A data.frame containing the input attributes and the class.
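
The role of the `summary` argument can be pictured with a small base R sketch: a multi-valued measure yields one value per leaf, node, or class, and each named summarization function reduces that vector to a single number. The helper `summarize` and the sample values are hypothetical; this is not the package's post.processing implementation.

```r
# Hypothetical multi-valued measure, e.g. the depth of each leaf.
values <- c(1, 2, 3, 3)

# Hypothetical helper: apply each named summarization function to the
# same vector. An empty `summary` returns the raw values unchanged.
summarize <- function(x, summary = c("mean", "sd")) {
  if (length(summary) == 0) return(x)
  sapply(summary, function(f) do.call(f, list(x)))
}

summarize(values)                                        # named by "mean", "sd"
summarize(values, summary = c("min", "median", "max"))   # other functions
summarize(values, summary = c())                         # all values
```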

Details

The following features are allowed for this method:

"leaves": Number of leaves of the DT model.
"leavesBranch": Size of branches, which consists of the level of each leaf of the DT model (multi-valued).
"leavesCorrob": Leaves corroboration, which is the proportion of examples that belong to each leaf of the DT model (multi-valued).
"leavesHomo": Homogeneity, which is the number of leaves divided by the structural shape of the DT model (multi-valued).
"leavesPerClass": Leaves per class, which is the proportion of leaves of the DT model associated with each class (multi-valued).
"nodes": Number of nodes of the DT model.
"nodesPerAttr": Ratio of the number of nodes of the DT model to the number of attributes.
"nodesPerInst": Ratio of the number of nodes of the DT model to the number of instances.
"nodesPerLevel": Number of nodes of the DT model per level (multi-valued).
"nodesRepeated": Repeated nodes, which is the number of repeated attributes that appear in the DT model (multi-valued).
"treeDepth": Tree depth, which is the level of each node and leaf of the DT model (multi-valued).
"treeImbalance": Tree imbalance (multi-valued).
"treeShape": Tree shape, which is the probability of arriving at each leaf given a random walk. This is called the structural shape of the DT model (multi-valued).
"varImportance": Variable importance. It is calculated using the Gini index to estimate the amount of information used in the DT model (multi-valued).
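
Some of the multi-valued measures above can be sketched from leaf depths alone. The example below assumes a binary tree in which a random walk takes either branch of every split with probability 1/2, so a leaf at depth d is reached with probability 0.5^d. The leaf depths are made up for illustration, and the exact formulas used by the package may differ.

```r
# Hypothetical leaf depths of a small binary tree (root at depth 0).
leaf_depth <- c(1, 2, 3, 3)

# "leaves": number of leaves.
n_leaves <- length(leaf_depth)

# "leavesBranch" (sketch): the level of each leaf.
leaves_branch <- leaf_depth

# "treeShape" (sketch): probability of arriving at each leaf when each
# split is followed left or right with probability 1/2.
tree_shape <- 0.5 ^ leaf_depth

# "leavesHomo" (sketch): number of leaves divided by the structural shape.
leaves_homo <- n_leaves / tree_shape

# Each multi-valued measure is then reduced by the summary functions.
c(mean = mean(tree_shape), sd = sd(tree_shape))
```

Under this assumption the leaf probabilities of a full binary tree sum to 1, which is a quick sanity check on the depths.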

Value

A list named by the requested meta-features.

References

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.

Yonghong Peng, Peter A. Flach, Carlos Soares, and Pavel Brazdil. Improved dataset characterization for meta-learning. In 5th International Conference on Discovery Science (DS), pages 141-152, 2002.

Examples

## Extract all meta-features using formula
model.based(Species ~ ., iris)

## Extract some meta-features
model.based(iris[1:4], iris[5], c("nodes", "leaves", "treeShape"))

## Use another summarization function
model.based(Species ~ ., iris, summary=c("min", "median", "max"))

rivolli/mfe documentation built on March 29, 2022, 11:08 p.m.