rule_double: Outlying bivariate linear continuous association rule finder


Description

This function searches for association rules on outlying bivariate linear continuous features against a binary label. The predicted label is 0, and the risk of overfitting is very high (see Kaggle's Santander Customer Satisfaction competition). Unlike the univariate rule finder, it cannot be used to score outliers first (a 300-feature matrix can grow to about 9000 features...). Verbose output is always on and cannot be disabled. If you need this function without verbose output, remove the verbose messages from the source and recompile the package.

Usage

rule_double(data, label, train_rows = length(label), iterations = 1000,
  minimal_score = 25, minimal_node = 5, false_negatives = 2,
  seed = 11111)

Arguments

data

The data.frame containing the features to make association rules on, or the scoring matrix. Missing values are not allowed.

label

The target label as an integer vector (each value must be either 0 or 1). 1 must be the minority label.

train_rows

The number of rows used for training the association rules. These must be your training set, and the value must equal length(label). Defaults to length(label).

iterations

The number of iterations allowed for limited-memory gradient descent. Defaults to 1000.

minimal_score

The association rule finder will not accept any node whose outlying score is below this value. Defaults to 25.

minimal_node

The association rule finder will not accept any node containing fewer than this number of samples. Defaults to 5.

false_negatives

The association rule will allow at most (false_negatives - 1) false negatives. A higher value makes the algorithm more permissive; a lower value makes it very difficult to converge (or to find any rule at all). Defaults to 2.

seed

The random seed for reproducibility. Defaults to 11111.
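
The constraints listed above for data, label and train_rows can be checked before the call. The following is a minimal sketch only: check_rule_inputs is a hypothetical helper that is not part of the Laurae package, and the toy inputs are made up for illustration.

```r
# Hypothetical helper (not part of the Laurae package): checks the documented
# input constraints before rule_double() is called.
check_rule_inputs <- function(data, label, train_rows = length(label)) {
  stopifnot(is.data.frame(data) || is.matrix(data))
  # Missing values are not allowed in data.
  if (any(is.na(data))) stop("data contains missing values")
  # Each label value must be either 0 or 1.
  if (!all(label %in% c(0, 1))) stop("label must contain only 0 and 1")
  # 1 must be the minority label.
  if (mean(label == 1) >= 0.5) stop("1 must be the minority label")
  # train_rows must match the label length and fit inside data.
  if (train_rows != length(label)) stop("train_rows must equal length(label)")
  if (train_rows > nrow(data)) stop("train_rows exceeds nrow(data)")
  invisible(TRUE)
}

# Toy inputs: 8 labeled training rows followed by 4 unlabeled rows.
set.seed(11111)
toy_data <- data.frame(x1 = rnorm(12), x2 = rnorm(12))
toy_label <- c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)
check_rule_inputs(toy_data, toy_label)
```

The helper stops with an error on the first violated constraint and returns TRUE invisibly otherwise, so it can sit directly in front of the rule_double() call.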

Value

A vector with nrow(data) elements: the general result for each observation using bivariate rules.

Examples

## Not run: 
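# scored_data holds the training rows first, followed by the rows to predict
rules <- rule_double(data = scored_data, label = target, iterations = 100)
# Set predictions to 0 for the non-training rows where the rule result is 0
preds[rules[(length(target) + 1):nrow(scored_data)] == 0] <- 0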

## End(Not run)
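
The indexing step in the example above can be tried without the package. This is a sketch with mock values: rules stands in for the output of rule_double, preds for predictions from some other model on the non-training rows, and it assumes (as the example suggests) that a value of 0 marks an observation whose prediction should be overridden to 0.

```r
# Mock values standing in for real outputs (all hypothetical).
n_train <- 6   # number of labeled training rows, i.e. length(target)
n_total <- 10  # total number of rows passed as data
rules <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 0)  # stand-in for rule_double() output
preds <- c(0.30, 0.10, 0.75, 0.20)        # stand-in predictions for the 4 non-training rows

# Keep only the part of the rule vector that covers the non-training rows.
test_rules <- rules[(n_train + 1):n_total]

# Override predictions to 0 wherever the rule result is 0.
preds[test_rules == 0] <- 0
print(preds)
```

Note that the assignment modifies preds in place; chaining it as preds <- preds[...] <- 0 would overwrite the whole vector with a single 0.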

Laurae2/Laurae documentation built on May 8, 2019, 7:59 p.m.