R/RcppExports.R

Defines functions corels

Documented in corels

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' R Interface to 'Certifiably Optimal RulE ListS (Corels)'
#'
#' CORELS is a custom discrete optimization technique for building rule lists over a categorical feature space. The algorithm provides the optimal solution with a certificate of optimality. By leveraging algorithmic bounds, efficient data structures, and computational reuse, it achieves several orders of magnitude speedup in time and a massive reduction of memory consumption. This approach produces optimal rule lists on practical problems in seconds, and offers a novel alternative to CART and other decision tree methods.
#' @title Corels interface
#' @param rules_file Character variable with file name for training data; see corels documentation and data section below.
#' @param labels_file Character variable with file name for training data labels; see corels documentation and data section below.
#' @param log_dir Character variable with logfile directory name
#' @param meta_file Optional character variable with file name for minor data with bit vector to support equivalent points bound (see Theorem 20 in Section 3.14).
#' @param run_bfs Boolean toggle for \sQuote{breadth-first search}. Exactly one of \sQuote{breadth-first search} or \sQuote{curiosity_policy} \emph{must} be specified.
#' @param calculate_size Optional boolean toggle to calculate an upper bound on the remaining search space size, which adds a small overhead; default is to not do this.
#' @param run_curiosity Boolean toggle
#' @param curiosity_policy Integer value (between 1 and 4) for best-first search policy. Exactly one of \sQuote{breadth-first search} or \sQuote{curiosity_policy} \emph{must} be specified. The four prioritization schemes are selected by the values one for prioritizing by curiosity (see Section 5.1 of the paper), two for prioritizing by the lower bound, three for prioritizing by the objective, and four for depth-first search.
#' @param latex_out Optional boolean toggle to select LaTeX output of the output rule list.
#' @param map_type Optional integer value for the symmetry-aware map. Use zero for no symmetry-aware map (this is also the default), one for permutation map, and two for the captured vector map.
#' @param verbosity_policy Optional character variable containing one or more of the terms \sQuote{rule}, \sQuote{label}, \sQuote{minor}, \sQuote{samples}, \sQuote{progress}, \sQuote{loud}, or \sQuote{silent}.
#' @param max_num_nodes Integer value for the maximum trie cache size; execution stops when the number of nodes in the trie exceeds this number; default is 100000.
#' @param regularization Optional double value; the default of 0.01 can be thought of as a penalty equivalent to misclassifying 1\% of the data when increasing the length of a rule list by one association rule.
#' @param logging_frequency Optional integer value with default of 1000.
#' @param ablation Integer value; default is zero. A value of one excludes the minimum support bounds (see Section 3.7), and a value of two excludes the lookahead bound (see Lemma 2 in Section 3.4).
#' @return A constant bool for now
#' @seealso The corels C++ implementation at https://github.com/nlarusstone/corels, the website at https://github.com/nlarusstone/corels and the Python implementation at https://github.com/fingoldin/pycorels.
#' @references Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. *Learning Certifiably Optimal Rule Lists for Categorical Data.* JMLR 2018, http://www.jmlr.org/papers/volume18/17-716/17-716.pdf
#' Nicholas Larus-Stone, Elaine Angelino, Daniel Alabi, Margo Seltzer, Vassilios Kaxiras, Aditya Saligrama, Cynthia Rudin. *Systems Optimizations for Learning Certifiably Optimal Rule Lists*. SysML 2018 http://www.sysml.cc/doc/2018/54.pdf
#' Nicholas Larus-Stone. *Learning Certifiably Optimal Rule Lists: A Case For Discrete Optimization in the 21st Century*. Senior thesis 2017. https://dash.harvard.edu/handle/1/38811502.
#' Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, Cynthia Rudin. *Learning certifiably optimal rule lists for categorical data*. KDD 2017, https://www.kdd.org/kdd2017/papers/view/learning-certifiably-optimal-rule-lists-for-categorical-data.
#' @examples
#' library(corels)
#'
#' logdir <- tempdir()
#' rules_file <- system.file("sample_data", "compas_train.out", package="corels")
#' labels_file <- system.file("sample_data", "compas_train.label", package="corels")
#' meta_file <- system.file("sample_data", "compas_train.minor", package="corels")
#'
#' stopifnot(file.exists(rules_file),
#'           file.exists(labels_file),
#'           file.exists(meta_file),
#'           dir.exists(logdir))
#'
#' corels(rules_file, labels_file, logdir, meta_file,
#'        verbosity_policy = "silent",
#'        regularization = 0.015,
#'        curiosity_policy = 2,   # by lower bound
#'        map_type = 1)           # permutation map
#' cat("See", logdir, "for result file.\n")
corels <- function(rules_file, labels_file, log_dir, meta_file = "", run_bfs = FALSE, calculate_size = FALSE, run_curiosity = FALSE, curiosity_policy = 0L, latex_out = FALSE, map_type = 0L, verbosity_policy = 0L, max_num_nodes = 100000L, regularization = 0.01, logging_frequency = 1000L, ablation = 0L) {
    .Call(`_corels_corels`, rules_file, labels_file, log_dir, meta_file, run_bfs, calculate_size, run_curiosity, curiosity_policy, latex_out, map_type, verbosity_policy, max_num_nodes, regularization, logging_frequency, ablation)
}
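
The parameter documentation above states that exactly one of `run_bfs` or `curiosity_policy` must be specified. A minimal sketch of that check follows. The helper `check_search_policy()` is hypothetical and not part of the corels package, and it assumes that `curiosity_policy = 0L` (the default in the signature) means "not set".

```r
# Hypothetical helper, not part of the corels package.
# Assumption: curiosity_policy = 0L (the signature default) means "not set".
check_search_policy <- function(run_bfs = FALSE, curiosity_policy = 0L) {
  stopifnot(is.logical(run_bfs), length(run_bfs) == 1L,
            is.numeric(curiosity_policy), length(curiosity_policy) == 1L)
  if (!curiosity_policy %in% 0:4) {
    stop("curiosity_policy must be 0 (unset) or an integer between 1 and 4")
  }
  # Names follow the order given in the curiosity_policy documentation.
  policies <- c("curiosity", "lower bound", "objective", "depth-first search")
  use_curiosity <- curiosity_policy > 0
  # Reject both "neither set" and "both set".
  if (isTRUE(run_bfs) == use_curiosity) {
    stop("specify exactly one of run_bfs = TRUE or curiosity_policy in 1:4")
  }
  if (isTRUE(run_bfs)) "breadth-first search" else policies[curiosity_policy]
}

check_search_policy(curiosity_policy = 2L)  # prioritize by the lower bound
check_search_policy(run_bfs = TRUE)         # breadth-first search
```

Running such a check before calling `corels()` reports a misconfigured search policy as an ordinary R error before any data files are read.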

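The `regularization` parameter is described as a penalty equivalent to misclassifying a fraction of the data for each additional rule. The sketch below illustrates that trade-off, assuming the objective is the misclassification rate plus `regularization` times the number of rules. The function `rule_list_objective()` and the counts are made up for illustration and are not part of the package.

```r
# Illustrative only: objective = misclassification rate + regularization * rules.
# rule_list_objective() is a hypothetical name; the counts below are invented.
rule_list_objective <- function(n_misclassified, n_samples, n_rules,
                                regularization = 0.01) {
  n_misclassified / n_samples + regularization * n_rules
}

# With 1000 samples and regularization = 0.01, an extra rule only pays off
# if it corrects more than 10 misclassified samples.
short_list <- rule_list_objective(120, 1000, n_rules = 2)
long_list  <- rule_list_objective(105, 1000, n_rules = 3)
long_list < short_list  # the third rule fixes 15 samples, so it is worth adding
```

A larger `regularization` value therefore favors shorter rule lists, since each added rule must correct a larger share of the data to lower the objective.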

corels documentation built on Feb. 4, 2022, 5:07 p.m.