HorseRuleFit: Horseshoe RuleFit

Description

Fits the Horseshoe RuleFit model described in Nalenz and Villani, "Tree Ensembles with Rule Structured Horseshoe Regularization" (https://arxiv.org/abs/1702.05008).
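
As a schematic (our notation, not the package's): the model has the standard RuleFit form, optional linear terms plus a sparse linear combination of binary decision rules extracted from a tree ensemble, with a horseshoe prior on the coefficients:

f(x) = \sum_j \delta_j x_j + \sum_{r=1}^{R} \beta_r \, r(x), \qquad r(x) \in \{0, 1\}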

Usage

HorseRuleFit(X = NULL, y = NULL, Xtest = NULL, ytest = NULL,
  niter = 1000, burnin = 100, thin = 1, restricted = 0.001,
  shrink.prior = "HS", beta = 2, alpha = 1, linp = 1, ensemble = "RF",
  L = 4, S = 6, ntree = 250, minsup = 0.025, mix = 0.5,
  linterms = NULL, intercept = FALSE, ytransform = "linear")

Arguments

X

A matrix or data frame containing the predictor variables to be used.

y

A vector containing the response values. If y is numeric, regression is performed; otherwise classification.

Xtest

Optional matrix or data frame containing the predictor variables of the test set.

ytest

Optional vector containing the response values of the test set.

niter

Number of iterations for the horseshoe sampler.

burnin

Number of initial samples to be discarded as burn-in.

thin

Thinning parameter.

restricted

Threshold for restricted Gibbs sampling. In each iteration only coefficients with scale > restricted are updated. Set restricted = 0 for unrestricted Gibbs sampling.

shrink.prior

Specifies the shrinkage prior to be used for regularization. Currently the options "HS" for the Horseshoe and "HS+" for the Horseshoe+ are supported (see the prior sketch after the argument list).

beta

Hyperparameter to control the extra shrinkage on the rule complexity, measured as the rule length.

alpha

Hyperparameter to control the extra shrinkage on rules that cover only few observations. Set alpha = beta = 0 for the standard horseshoe without the rule structure prior (see the sketch after the argument list).

linp

Hyperparameter to control prior shrinkage of linear terms. Set linp > 1 if strong linear effects are assumed.

ensemble

Which ensemble method should be used to generate the rules? Options are "RF", "GBM", or "both".

L

Parameter controlling the complexity of the generated rules. Higher values lead to more complex rules.

S

Parameter controlling the minimum number of observations in the tree growing process.

ntree

Number of trees in the ensemble step from which the rules are extracted.

minsup

Rules with support < minsup are removed. Can be used to prevent overfitting.

mix

If ensemble = "both", mix*ntree trees are generated via random forest and (1-mix)*ntree trees via gradient boosting.

linterms

Specifies the columns in X that should be included as linear terms in the HorseRule model. Specified columns need to be numeric; categorical variables have to be transformed (e.g. to dummies) before being included as linear effects.

intercept

If TRUE, an intercept is included. Note that y is by default shifted to have mean 0, so an intercept is not necessary for regression; for classification it is highly recommended (see the minimal classification example after the argument list).

ytransform

Choose "log" for logarithmic transform of y.

Value

An object of class HorseRuleFit, which is a list of the following components:

bhat

Posterior mean of the regression coefficients.

postdraws

List containing the posterior samples of the regression coefficients, the error variance sigma, and the shrinkage parameter tau.

rules

Vector containing the decision rules.

df

Matrix containing original training data and the decision rule covariates (normalized).

y

Response values of the training data.

prior

Vector containing the rule structure prior for the individual rules.

modelstuff

List containing the parameters used and the values used for normalization (means and standard deviations).

pred

If test data was supplied, the predicted values.

err

If ytest was also supplied, additionally a test error score (RMSE for regression, misclassification rate for classification).

Examples

library(MASS)
library(horserule)
data(Boston)
# Split into train and test data
set.seed(1)  # fix the RNG so the split is reproducible
N = nrow(Boston)
train = sample(1:N, 400)
Xtrain = Boston[train,-14]
ytrain = Boston[train, 14]
Xtest = Boston[-train, -14]
ytest = Boston[-train, 14]

# Run the HorseRuleFit with 100 trees.
# Increase the number of trees and the number of posterior samples for a better model fit.
hrres = HorseRuleFit(X = Xtrain, y=ytrain,
                    thin=1, niter=100, burnin=10,
                    L=5, S=6, ensemble = "both", mix=0.3, ntree=100,
                    intercept=FALSE, linterms=1:13, ytransform = "log",
                    alpha=1, beta=2, linp = 1, restricted = 0)

# Calculate the out-of-sample RMSE.
# Note: the burnin argument of predict must be smaller than the number of
# posterior samples (niter = 100 above), otherwise every draw is discarded.
pred = predict(hrres, Xtest, burnin = 10, postmean = TRUE)
sqrt(mean((pred - ytest)^2))

# Look at the most important rules/linear effects.
importance_hs(hrres)

# Look at the input variable importance.
Variable_importance(hrres, var_names=colnames(Xtrain))
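
# The fitted object is a list with the components described under Value,
# so individual parts can be inspected directly:
head(hrres$rules)   # the extracted decision rules
head(hrres$bhat)    # posterior means of the coefficients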

Example output

                                             Rule    2.5% Imp   50% Imp 97.5% Imp         bhat
1                                   X[,13]>14.915 0.017741760 1.0000000 1.0000000 -0.298571192
2   X[,6]<=6.825 & X[,8]>1.46205 & X[,13]<=14.915 0.632873322 0.9313999 1.0000000 -0.265037766
3                      X[,5]<=0.657 & X[,6]>6.722 0.347844167 0.5417166 0.9638981  0.209080996
4                      X[,6]>6.81 & X[,13]<=14.92 0.020011357 0.5714349 0.8972277 -0.216688327
5                   X[,1]<=6.009615 & X[,6]>6.722 0.012881494 0.3591535 0.7106812 -0.125093181
6      X[,1]>9.87002 & X[,5]>0.675 & X[,13]>15.53 0.133436088 0.2333702 0.3803345 -0.142542105
7                                      Linear:X 9 0.094403967 0.2295709 0.4744710  0.003582599
8                   X[,13]>7.865 & X[,13]<=19.245 0.004297878 0.2017054 0.4042945  0.055214359
9                                   X[,13]<=7.865 0.003928697 0.1716683 0.4432554  0.054064584
10                                   X[,11]>19.65 0.020404644 0.1194420 0.3425073 -0.039479130
