sbrl | R Documentation |
Fit the Scalable Bayesian Rule Lists (SBRL) model to the given data with the given parameters. The result is a probabilistic classifier that optimizes the posterior of a Bayesian hierarchical model over pre-mined association rules.
sbrl(tdata, iters=30000, pos_sign="1",
neg_sign="0", rule_minlen=1, rule_maxlen=1,
minsupport_pos=0.10, minsupport_neg=0.10,
lambda=10.0, eta=1.0, alpha=c(1,1), nchain=10)
tdata |
a dataframe, with a "label" column specifying the correct labels for each observation. |
iters |
the number of iterations for each MCMC chain. |
pos_sign |
the sign for the positive labels in the "label" column. |
neg_sign |
the sign for the negative labels in the "label" column. |
rule_minlen |
the minimum cardinality of the rules to be mined from the dataframe. |
rule_maxlen |
the maximum cardinality of the rules to be mined from the dataframe. |
minsupport_pos |
a number between 0 and 1 giving the minimum support, as a fraction, among the positive observations. |
minsupport_neg |
a number between 0 and 1 giving the minimum support, as a fraction, among the negative observations. |
lambda |
a hyperparameter for the expected length of the rule list. |
eta |
a hyperparameter for the expected cardinality of the rules in the optimal rule list. |
alpha |
the prior pseudo-counts for the positive and negative classes; fixed at 1. |
nchain |
an integer giving the number of MCMC chains to run. |
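As a rough illustration of how lambda and eta shape the model, the sketch below assumes the truncated Poisson priors described in Letham et al. (2015): the rule-list length follows a Poisson distribution with rate lambda, and the cardinality of each rule follows a Poisson distribution with rate eta, each renormalised over its feasible range. The helper trunc_pois, the count of 50 mined rules, and the cardinality range 1 to 3 are hypothetical choices for this example and are not part of the sbrl package.

```r
# Hypothetical helper (not part of sbrl): Poisson probabilities
# renormalised over an allowed set of values.
trunc_pois <- function(values, rate) {
  w <- dpois(values, lambda = rate)
  w / sum(w)
}

# Prior over rule-list length with lambda = 10,
# assuming (for illustration) that 50 rules were mined.
length_prior <- trunc_pois(0:50, rate = 10)

# Prior over rule cardinality with eta = 1,
# assuming rule_minlen = 1 and rule_maxlen = 3.
card_prior <- trunc_pois(1:3, rate = 1)

print(round(card_prior, 3))
print(round(length_prior[1:21], 3))  # prior mass on list lengths 0 to 20
```

Under these assumptions, larger values of lambda shift prior mass toward longer rule lists, and larger values of eta shift it toward rules with more conditions.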
Returns a list of:
rs |
a ruleset containing the rule indices and their positive-class probabilities for the best rule list found by training sbrl with the given data and parameters. |
rulenames |
a list of all the rule names mined with |
featurenames |
a list of all the feature names. |
mat_feature_rule |
a binary matrix representing which features are included in which rules. |
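To make the returned structure more concrete, the sketch below hand-builds a toy object with the same kind of content (an ordered set of rule indices with positive-class probabilities, plus a vector of rule names) and renders it as an if / else-if list. The column names rule_id and prob_pos, the rule strings, and the convention that index 0 marks the default rule are assumptions made for illustration only; calling str() on a fitted model shows the actual layout.

```r
# Toy stand-ins for the rs and rulenames components (all values are made up).
rulenames <- c("{c5=o}", "{c1=x,c9=x}", "{c3=b}")
rs <- data.frame(rule_id  = c(2, 1, 0),          # 0 = default rule (assumption)
                 prob_pos = c(0.95, 0.20, 0.60))

# Hypothetical helper: describe the ordered rule list line by line.
describe_rule_list <- function(rs, rulenames) {
  lines <- character(nrow(rs))
  for (i in seq_len(nrow(rs))) {
    id <- rs$rule_id[i]
    cond <- if (id == 0) {
      "else (default)"
    } else {
      paste(if (i == 1) "if" else "else if", rulenames[id])
    }
    lines[i] <- sprintf("%s: P(positive) = %.2f", cond, rs$prob_pos[i])
  }
  lines
}

out <- describe_rule_list(rs, rulenames)
cat(out, sep = "\n")
```

An observation is scored by the first rule in the list whose condition it satisfies, which is why the order of the rows matters.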
Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer
Hongyu Yang, Cynthia Rudin, Margo Seltzer (2017) Scalable Bayesian Rule Lists. Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3921-3930.
Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan (2015) Building Interpretable Classifiers with Rules using Bayesian Analysis. Annals of Applied Statistics.
# Let us use the tictactoe dataset
data(tictactoe)
for (name in names(tictactoe)) {tictactoe[name] <- as.factor(tictactoe[,name])}
# Train on two-thirds of the data
b <- round(2*nrow(tictactoe)/3, digits=0)
data_train <- tictactoe[1:b, ]
# Test on the remaining one third of the data
data_test <- tictactoe[(b+1):nrow(tictactoe), ]
# data_train, data_test are dataframes with factor columns
# The class column is "label"
# Run the sbrl algorithm on the training set
sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1",
neg_sign="0", rule_minlen=1, rule_maxlen=3,
minsupport_pos=0.10, minsupport_neg=0.10,
lambda=10.0, eta=1.0, nchain=25)
print(sbrl_model)
# Make predictions on the test set
yhat <- predict(sbrl_model, data_test)
# yhat will be a list of predicted negative and positive probabilities for the test data.
# Clean up
rm(list = ls())
gc()
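One way to evaluate predictions like these is to threshold the positive-class probability and compare the result with the true labels. The sketch below uses a small made-up probability table in place of real predict() output so that it runs on its own. It assumes the first column holds the negative-class probability and the second the positive-class probability, following the comment in the example above; this should be checked against the actual yhat object.

```r
# Made-up probabilities standing in for predict() output (assumption:
# column 1 = P(negative), column 2 = P(positive)).
yhat_toy <- data.frame(neg = c(0.9, 0.2, 0.4, 0.7),
                       pos = c(0.1, 0.8, 0.6, 0.3))
true_label <- c("0", "1", "0", "1")

# Predict the positive class when its probability exceeds 0.5.
pred_label <- ifelse(yhat_toy[[2]] > 0.5, "1", "0")

# Confusion table and overall accuracy.
print(table(predicted = pred_label, actual = true_label))
accuracy <- mean(pred_label == true_label)
print(accuracy)
```

With real data, true_label would be the "label" column of the test set, and the 0.5 threshold can be adjusted if the classes are imbalanced.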