sbfc: Selective Bayesian Forest Classifier (SBFC) algorithm



Description

Runs the SBFC algorithm on a discretized data set. To discretize your data, use the data_disc function.

Usage

sbfc(
  data,
  nstep = NULL,
  thin = 50,
  burnin_denom = 5,
  cv = T,
  thinoutputs = F,
  alpha = 5,
  y_penalty = 1,
  x_penalty = 4
)

Arguments

data

Discretized data set, a list with the following components:

TrainX

Matrix containing the training data.

TrainY

Vector containing the class labels for the training data.

TestX

Matrix containing the test data (optional).

TestY

Vector containing the class labels for the test data (optional).
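The list layout described above can be illustrated with a small toy data set built in base R. This is a sketch under assumptions: the feature values are small integer category codes starting at 0 (check what data_disc actually produces for your data), and the names toy_data, n_train, n_test and p are hypothetical and used only for illustration.

```r
# Toy discretized data set in the list format described above.
# Assumption: features are integer category codes starting at 0;
# real data should be discretized with data_disc.
set.seed(1)
n_train <- 20   # hypothetical number of training rows
n_test  <- 5    # hypothetical number of test rows
p       <- 3    # hypothetical number of features

TrainX <- matrix(sample(0:2, n_train * p, replace = TRUE), nrow = n_train, ncol = p)
TrainY <- sample(0:1, n_train, replace = TRUE)
TestX  <- matrix(sample(0:2, n_test * p, replace = TRUE), nrow = n_test, ncol = p)
TestY  <- sample(0:1, n_test, replace = TRUE)

# TestX and TestY are optional; omit them to use cross-validation instead
toy_data <- list(TrainX = TrainX, TrainY = TrainY, TestX = TestX, TestY = TestY)

# Consistency checks worth running before calling sbfc(toy_data)
stopifnot(
  nrow(toy_data$TrainX) == length(toy_data$TrainY),
  nrow(toy_data$TestX) == length(toy_data$TestY),
  ncol(toy_data$TrainX) == ncol(toy_data$TestX)
)
str(toy_data)
```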

nstep

Number of MCMC steps, default max(10000, 10 * ncol(TrainX)).
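The default for nstep can be computed directly from the formula stated above. The helper default_nstep is hypothetical and simply restates that formula: with few features the floor of 10000 steps applies, and only data sets with more than 1000 features get a larger default.

```r
# Hypothetical helper restating the documented default:
# nstep = max(10000, 10 * ncol(TrainX))
default_nstep <- function(TrainX) {
  max(10000, 10 * ncol(TrainX))
}

default_nstep(matrix(0, nrow = 1, ncol = 30))    # 10 * 30 = 300 < 10000, so 10000
default_nstep(matrix(0, nrow = 1, ncol = 2000))  # 10 * 2000 = 20000
```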

thin

Thinning factor for the MCMC (default=50).

burnin_denom

Denominator of the fraction of total MCMC steps discarded as burn-in (default=5, i.e. 1/5 of the steps are discarded).
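A rough sketch of how nstep, burnin_denom and thin combine. It assumes that the first nstep / burnin_denom steps are discarded as burn-in and that every thin-th remaining step is kept; the package's exact indexing may differ by one, and mcmc_schedule is a hypothetical helper, not part of sbfc.

```r
# Hypothetical bookkeeping helper (not part of the sbfc package).
# Assumption: burn-in = first nstep / burnin_denom steps; afterwards
# every thin-th step is kept.
mcmc_schedule <- function(nstep, thin = 50, burnin_denom = 5) {
  burnin  <- floor(nstep / burnin_denom)
  post    <- nstep - burnin
  thinned <- floor(post / thin)
  c(burnin = burnin, post_burnin = post, thinned_samples = thinned)
}

# With the defaults and nstep = 10000:
# burnin = 2000, post_burnin = 8000, thinned_samples = 160
mcmc_schedule(10000)
```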

cv

Whether to perform cross-validation on the training set when no test set is provided (default=TRUE).

thinoutputs

Return thinned MCMC outputs (parents, groups, trees, logposterior), rather than all outputs (default=FALSE).

alpha

Dirichlet hyperparameter (default=5).

y_penalty

Prior coefficient for y-edges, which penalizes signal group size (default=1).

x_penalty

Prior coefficient for x-edges, which penalizes tree size (default=4).

Details

Data must be discretized before running SBFC.

If the test data matrix TestX is provided, SBFC is trained on the entire training set TrainX and predicts class labels for the test data. If the test class vector TestY is also provided, the test accuracy is computed.

If TestX is not provided and cv is TRUE, SBFC performs cross-validation on the training set TrainX and returns predicted classes and accuracy for the training data.
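The decision logic above can be summarized as a small function. sbfc_mode is a hypothetical helper that only mirrors the rules stated in this section; it does not call the package. The case with no TestX and cv = FALSE is not described here, so it is returned as "unspecified".

```r
# Hypothetical helper (not part of sbfc) mirroring the documented rules
# for which mode sbfc() runs in.
sbfc_mode <- function(data, cv = TRUE) {
  has_test_x <- !is.null(data[["TestX"]])
  has_test_y <- !is.null(data[["TestY"]])
  if (has_test_x && has_test_y) {
    "test_accuracy"      # train on TrainX, predict TestX, report accuracy
  } else if (has_test_x) {
    "test_predict"       # train on TrainX, predict TestX only
  } else if (cv) {
    "cross_validation"   # cross-validation on TrainX
  } else {
    "unspecified"        # not covered by the description above
  }
}

sbfc_mode(list(TrainX = matrix(0, 2, 2), TrainY = c(0, 1)))
```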

Value

An object of class sbfc with the following components:

accuracy

Classification accuracy (on the test set if provided, otherwise cross-validation accuracy on training set).

predictions

Vector of class label predictions (for the test set if provided, otherwise for the training set).

probabilities

Matrix of class label probabilities (for the test set if provided, otherwise for the training set).
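As an illustration of how a probability matrix relates to hard predictions and accuracy, the sketch below picks the most probable class per row and compares it with the true labels. It uses made-up numbers and assumes that column k of the matrix corresponds to the k-th entry of class_levels (a hypothetical name); check the column order of the real output before relying on this.

```r
# Made-up class-probability matrix: 4 observations x 2 classes.
# Assumption: column k corresponds to class_levels[k].
probs <- matrix(c(0.9, 0.1,
                  0.3, 0.7,
                  0.6, 0.4,
                  0.2, 0.8),
                ncol = 2, byrow = TRUE)
class_levels <- c(0, 1)
truth <- c(0, 1, 1, 1)

# Most probable class per row, then the fraction of correct labels
pred <- class_levels[max.col(probs, ties.method = "first")]
accuracy <- mean(pred == truth)

pred      # 0 1 0 1
accuracy  # 0.75
```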

runtime

Total runtime of the algorithm in seconds.

parents

Matrix representing the structures sampled by MCMC, where parents[i,j] is the index of the parent of node j at iteration i (0 if node is a root).
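To read a sampled structure, each row of parents can be turned into a parent-to-child edge list, skipping roots (parent 0). The parents matrix below is a made-up toy example and edges_at is a hypothetical helper.

```r
# Toy parents matrix: 2 MCMC iterations (rows) x 4 nodes (columns).
# parents[i, j] is the parent of node j at iteration i; 0 marks a root.
parents <- matrix(c(0, 1, 1, 0,
                    0, 0, 2, 3),
                  nrow = 2, byrow = TRUE)

# Hypothetical helper: edge list (parent -> child) at iteration i
edges_at <- function(parents, i) {
  child <- which(parents[i, ] != 0)
  data.frame(parent = parents[i, child], child = child)
}

edges_at(parents, 1)   # edges 1 -> 2 and 1 -> 3; nodes 1 and 4 are roots
edges_at(parents, 2)   # edges 2 -> 3 and 3 -> 4; nodes 1 and 2 are roots
```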

groups

Matrix representing the structures sampled by MCMC, where groups[i,j] indicates which group node j belongs to at iteration i (0 is noise, 1 is signal).

trees

Matrix representing the structures sampled by MCMC, where trees[i,j] indicates which tree node j belongs to at iteration i.
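One common way to summarize the groups matrix is the fraction of sampled iterations in which each feature is in the signal group. This is a generic MCMC summary, not necessarily how the package itself ranks features. The toy matrix is made up; with real output you may want to drop burn-in rows first.

```r
# Toy groups matrix: 4 MCMC iterations x 3 nodes (0 = noise, 1 = signal).
groups <- matrix(c(1, 0, 0,
                   1, 1, 0,
                   1, 0, 0,
                   0, 1, 0),
                 nrow = 4, byrow = TRUE)

# Fraction of sampled iterations in which each node is in the signal group
signal_freq <- colMeans(groups == 1)
signal_freq   # 0.75 0.50 0.00
```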

logposterior

Vector representing the log posterior at each iteration of the MCMC.

Parameters

Values of the parameters used in the run: nstep, thin, burnin_denom, cv, thinoutputs, alpha, y_penalty, x_penalty.

If cross-validation is performed (cv=TRUE and no test set is provided), the MCMC samples from the first fold are returned (parents, groups, trees, logposterior).

Examples

data(madelon)
madelon_result = sbfc(madelon)
data(heart)
heart_result = sbfc(heart, cv=FALSE)

sbfc documentation built on Jan. 16, 2022, 1:06 a.m.