landmarking: Landmarking and Subsampling Landmarking Meta-features

Description Usage Arguments Details Value References See Also Examples

View source: R/landmarking.R

Description

Landmarking measures evaluate the performance of simple and fast learners, and use these performance values as meta-features.

Usage

landmarking(...)

## Default S3 method:
landmarking(
  x,
  y,
  features = "all",
  summary = c("mean", "sd"),
  size = 1,
  folds = 10,
  score = "accuracy",
  ...
)

## S3 method for class 'formula'
landmarking(
  formula,
  data,
  features = "all",
  summary = c("mean", "sd"),
  size = 1,
  folds = 10,
  score = "accuracy",
  ...
)

Arguments

...

Further arguments passed to the summarization functions.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names, or "all" to include them all.

summary

A list of summarization functions, or empty for all values. See the post.processing method for more information. (Default: c("mean", "sd"))

size

The percentage of examples subsampled. Values different from 1 generate the subsampling-based landmarking metafeatures. (Default: 1.0)

folds

The number of k equal-size subsamples in k-fold cross-validation. (Default: 10)

score

The evaluation measure used to score the classification performance. One of c("accuracy", "balanced.accuracy", "kappa"). (Default: "accuracy")

formula

A formula to define the class column.

data

A data.frame dataset containing the input attributes and the class. The Details section describes the valid values for this group.

Details

The following features are allowed for this method:

"bestNode"

Construct a single decision-tree node model induced by the most informative attribute, to establish the linear separability (multi-valued).

"eliteNN"

Elite nearest neighbor uses the most informative attribute in the dataset to induce a 1-nearest neighbor model. With this subset of informative attributes, the model is expected to be noise tolerant (multi-valued).

"linearDiscr"

Apply the Linear Discriminant classifier to construct a linear (non-axis-parallel) split in the data, to establish the linear separability (multi-valued).

"naiveBayes"

Evaluate the performance of the Naive Bayes classifier. It assumes that the attributes are independent and each example belongs to a certain class based on the Bayes probability (multi-valued).

"oneNN"

Evaluate the performance of the 1-nearest neighbor classifier. It uses the Euclidean distance to the nearest neighbor to determine how noisy the data is (multi-valued).

"randomNode"

Construct a single decision-tree node model induced by a random attribute. In combination with the "bestNode" measure, it can establish the linear separability (multi-valued).

"worstNode"

Construct a single decision-tree node model induced by the least informative attribute. In combination with the "bestNode" measure, it can establish the linear separability (multi-valued).
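A minimal sketch of the node-based combination described above (assuming the mfe package is installed): extracting only the "bestNode" and "worstNode" measures and comparing their mean accuracies gives a rough indication of how much single-attribute separability varies across the dataset.

```r
library(mfe)

# Extract only the node-based landmarking measures, summarized by the mean.
res <- landmarking(Species ~ ., iris,
                   features = c("bestNode", "worstNode"),
                   summary = "mean")

# Compare the two mean accuracies side by side; a large gap suggests the
# attributes differ strongly in how well each one alone separates the classes.
unlist(res)
```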

Value

A list named by the requested meta-features.

References

Bernhard Pfahringer, Hilan Bensusan, and Christophe Giraud-Carrier. Meta-learning by landmarking various learning algorithms. In 17th International Conference on Machine Learning (ICML), pages 743 - 750, 2000.

See Also

Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), model.based(), relative(), statistical()

Examples

## Extract all meta-features using formula
landmarking(Species ~ ., iris)

## Extract some meta-features
landmarking(iris[1:4], iris[5], c("bestNode", "randomNode", "worstNode"))

## Use another summarization function
landmarking(Species ~ ., iris, summary=c("min", "median", "max"))

## Use 2 folds and balanced accuracy
landmarking(Species ~ ., iris, folds=2, score="balanced.accuracy")

## Extract the subsampling-based landmarking
landmarking(Species ~ ., iris, size=0.7)

mfe documentation built on July 1, 2020, 10:46 p.m.