model.based: Decision Tree Model Based Meta-features

model.based R Documentation

Decision Tree Model Based Meta-features

Description

Decision Tree (DT) model-based meta-features are measures that characterize a DT model induced from a dataset.

Usage

model.based(...)

## Default S3 method:
model.based(x, y, features = "all", summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
model.based(formula, data, features = "all", summary = c("mean", "sd"), ...)
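As a rough illustration of how the two call styles relate, the sketch below uses only base R to reduce a formula and a data.frame to the x and y pair that the default method expects. It reflects how S3 formula methods are commonly written and is an assumption, not code taken from the package.

```r
# Assumption: a formula method typically builds a model frame and then
# delegates to the default (x, y) method. Base R only; not package code.
mf <- model.frame(Species ~ ., data = iris)

y <- model.response(mf)       # the class column, a factor
x <- mf[, -1, drop = FALSE]   # the response is the first column of a model frame

str(x)
levels(y)
```

With x and y built this way, model.based(Species ~ ., iris) and model.based(x, y) would be expected to describe the same dataset.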

Arguments

...

Further arguments passed to the summarization functions.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names or "all" to include all of them. The Details section describes the valid values for this group of meta-features.

summary

A list of summarization functions, or empty to return all values. See the post.processing method for more information. (Default: c("mean", "sd"))

formula

A formula to define the class column.

data

A data.frame containing the input attributes and the class column.
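To make the role of the summary argument concrete, here is a minimal base R sketch of how a multi-valued measure could be reduced by a list of summarization function names. The helper summarise_values is hypothetical and written for illustration only. It is not the package's post.processing implementation.

```r
# Hypothetical helper (not part of the package): apply each named summary
# function to a vector of values. An empty summary returns the raw values.
summarise_values <- function(values, summary = c("mean", "sd"), ...) {
  if (length(summary) == 0) {
    return(values)
  }
  # sapply() over a character vector names the result after each function
  sapply(summary, function(f) do.call(f, list(values, ...)))
}

leaf_levels <- c(1, 2, 3)   # made-up values standing in for a multi-valued measure
summarise_values(leaf_levels)
summarise_values(leaf_levels, summary = c("min", "median", "max"))
```

Extra arguments given through ... (for example na.rm = TRUE) are forwarded to every summarization function, which mirrors the description of the ... argument above.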

Details

The following features are allowed for this method:

"leaves"

Number of leaves of the DT model.

"leavesBranch"

Size of branches, which is the level of each leaf of the DT model (multi-valued).

"leavesCorrob"

Leaves corroboration, which is the proportion of examples that belong to each leaf of the DT model (multi-valued).

"leavesHomo"

Homogeneity, which is the number of leaves divided by the structural shape of the DT model (multi-valued).

"leavesPerClass"

Leaves per class, which is the proportion of leaves of the DT model associated with each class (multi-valued).

"nodes"

Number of nodes of the DT model.

"nodesPerAttr"

Ratio of the number of nodes of the DT model to the number of attributes.

"nodesPerInst"

Ratio of the number of nodes of the DT model to the number of instances.

"nodesPerLevel"

Number of nodes of the DT model per level (multi-valued).

"nodesRepeated"

Repeated nodes, which is the number of repeated attributes that appear in the DT model (multi-valued).

"treeDepth"

Tree depth, which is the level of all tree nodes and leaves of the DT model (multi-valued).

"treeImbalance"

Tree imbalance (multi-valued).

"treeShape"

Tree shape, which is the probability of arriving at each leaf given a random walk. We call this the structural shape of the DT model (multi-valued).

"varImportance"

Variable importance. It is calculated using the Gini index to estimate the amount of information used in the DT model (multi-valued).
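Several of the simpler measures above can be read directly off a fitted tree. The sketch below is one possible reading of the descriptions, using an rpart classification tree as a stand-in. The choice of rpart, the decision to count only internal nodes under "nodes", and the uniform random walk used for "treeShape" are all assumptions. The package may define these measures differently.

```r
library(rpart)  # recommended package shipped with standard R distributions

# Assumption: rpart stands in for the DT learner, which the text does not name.
fit <- rpart(Species ~ ., data = iris, method = "class")

frame   <- fit$frame
is_leaf <- frame$var == "<leaf>"
node_id <- as.numeric(rownames(frame))  # rpart numbers the root 1, children 2k and 2k + 1
level   <- floor(log2(node_id))         # depth of each node, with the root at level 0

sketch <- list(
  leaves        = sum(is_leaf),
  nodes         = sum(!is_leaf),                      # assumption: internal nodes only
  leavesBranch  = level[is_leaf],                     # level of each leaf
  leavesCorrob  = frame$n[is_leaf] / nrow(iris),      # share of examples in each leaf
  nodesPerAttr  = sum(!is_leaf) / (ncol(iris) - 1),   # attributes exclude the class column
  nodesPerInst  = sum(!is_leaf) / nrow(iris),
  nodesPerLevel = as.vector(table(level[!is_leaf])),
  treeShape     = 0.5 ^ level[is_leaf]                # assumption: fair coin at every split
)

str(sketch)
```

Under these assumptions both leavesCorrob and treeShape sum to one over the leaves, which gives a quick sanity check on the sketch.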

Value

A list named by the requested meta-features.

References

Hilan Bensusan, Christophe Giraud-Carrier, and Claire Kennedy. A higher-order approach to meta-learning. In 10th International Conference on Inductive Logic Programming (ILP), pages 33-42, 2000.

Yonghong Peng, Peter A. Flach, Carlos Soares, and Pavel Brazdil. Improved dataset characterization for meta-learning. In 5th International Conference on Discovery Science (DS), pages 141-152, 2002.

See Also

Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), landmarking(), relative(), statistical()

Examples

## Extract all meta-features using the formula interface
model.based(Species ~ ., iris)

## Extract some meta-features
model.based(iris[1:4], iris[5], c("nodes", "leaves", "treeShape"))

## Use another summarization function
model.based(Species ~ ., iris, summary=c("min", "median", "max"))

rivolli/mfe documentation built on March 29, 2022, 11:08 p.m.