honestRF-honestRF: honestRF-Constructor


Description

Initialize a 'honestRF' object.

Usage

honestRF(x, y, ntree = 500, replace = TRUE, sampsize = if (replace)
  nrow(x) else ceiling(0.632 * nrow(x)), sample.fraction = NULL,
  mtry = max(floor(ncol(x)/3), 1), nodesizeSpl = 3, nodesizeAvg = 3,
  splitratio = 1, seed = as.integer(runif(1) * 1000), verbose = FALSE,
  nthread = 0, splitrule = "variance", middleSplit = FALSE,
  reuseHonestRF = NULL)

Arguments

x

A data frame of all training predictors.

y

A vector of all training responses.

ntree

The number of trees to grow in the forest. The default value is 500.

replace

An indicator of whether sampling of training data is with replacement. The default value is TRUE.

sampsize

The number of observations to draw from the training data. If sampling with replacement, the default value is the number of rows of x. If sampling without replacement, the default value is ceiling(0.632 * nrow(x)), roughly 63.2% of the rows.
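The default expression from the Usage section can be evaluated on its own to see what it yields. A minimal sketch: `default_sampsize` is a hypothetical helper name used only for illustration, not a function of the package.

```r
# Hypothetical helper that mirrors the default expression for `sampsize`
# shown in the Usage section.
default_sampsize <- function(x, replace = TRUE) {
  if (replace) nrow(x) else ceiling(0.632 * nrow(x))
}

x <- data.frame(a = 1:100, b = rnorm(100))
default_sampsize(x, replace = TRUE)   # 100
default_sampsize(x, replace = FALSE)  # 64
```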

sample.fraction

If given, sampsize is ignored and set to round(length(y) * sample.fraction). It must be a real number between 0 and 1. The default value is NULL.
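The override described here might look like the following sketch. `resolve_sampsize` is a hypothetical helper, and the exact bounds check (0 excluded, 1 included) is an assumption; the description only says "between 0 and 1".

```r
# Hypothetical helper: sample.fraction, when supplied, replaces sampsize.
resolve_sampsize <- function(y, sampsize, sample.fraction = NULL) {
  if (is.null(sample.fraction)) {
    return(sampsize)
  }
  # Assumed validation; the package may check the range differently.
  stopifnot(is.numeric(sample.fraction),
            sample.fraction > 0, sample.fraction <= 1)
  round(length(y) * sample.fraction)
}

y <- rnorm(200)
resolve_sampsize(y, sampsize = 200)                          # 200
resolve_sampsize(y, sampsize = 200, sample.fraction = 0.25)  # 50
```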

mtry

The number of variables randomly selected at each split point. The default value is one third of the number of features in the training data, rounded down, with a minimum of 1.
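The default expression from the Usage section behaves as follows for wide and narrow inputs. `default_mtry` is a hypothetical name for this sketch.

```r
# Hypothetical helper that mirrors the default expression for `mtry`.
default_mtry <- function(x) max(floor(ncol(x) / 3), 1)

wide   <- data.frame(matrix(0, nrow = 5, ncol = 10))
narrow <- data.frame(matrix(0, nrow = 5, ncol = 2))
default_mtry(wide)    # 3
default_mtry(narrow)  # 1, because floor(2 / 3) is 0 and the minimum is 1
```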

nodesizeSpl

The minimum number of observations contained in terminal nodes. The default value is 3.

nodesizeAvg

The minimum size of terminal nodes for the averaging dataset. The default value is 3.

splitratio

The proportion of the training data used as the splitting dataset. It is a ratio between 0 and 1. If the ratio is 1, the splitting dataset is the entire sampled set and the averaging dataset is empty. If the ratio is 0, the splitting dataset is empty and all the data is used for the averaging dataset (this is not a sensible setting, since no data is available for splitting). The default value is 1.
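One way to picture the effect of splitratio is as a partition of the sampled row indices. The sketch below is an illustration under stated assumptions, not the package's implementation: `split_honest` is a hypothetical helper, and the rounding and random shuffling are assumptions.

```r
# Hypothetical helper: partition sampled row indices into a splitting set
# and an averaging set according to `splitratio`.
split_honest <- function(sampled_idx, splitratio) {
  stopifnot(splitratio >= 0, splitratio <= 1)
  n <- length(sampled_idx)
  n_split <- round(splitratio * n)           # assumed rounding rule
  shuffled <- sampled_idx[sample.int(n)]     # assumed random assignment
  list(
    splitting = shuffled[seq_len(n_split)],
    averaging = shuffled[seq_len(n - n_split) + n_split]
  )
}

set.seed(1)
parts <- split_honest(1:10, splitratio = 0.5)
lengths(parts)                                 # 5 indices in each set
all_split <- split_honest(1:10, splitratio = 1)
length(all_split$averaging)                    # 0, the averaging set is empty
```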

seed

The random seed. The default value is as.integer(runif(1) * 1000), an integer drawn at random from 0 to 999.

verbose

Whether to run the training process in verbose mode. The default value is FALSE.

nthread

The number of threads used to train and predict with the forest. The default value is 0, which uses all cores.

splitrule

Specifies the loss function according to which the splits of the random forest are made. Only "variance" is implemented at this point, and it is the default value.

middleSplit

Whether the split value is the average of the two feature values. If FALSE, the split value is drawn from a uniform distribution between the two feature values. The default value is FALSE.
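The two behaviours can be sketched for a single pair of adjacent feature values. `choose_split_value` is a hypothetical helper written for illustration; how the package selects the two feature values is not covered here.

```r
# Hypothetical helper: pick a split value between two feature values.
choose_split_value <- function(lower, upper, middleSplit = FALSE) {
  if (middleSplit) {
    (lower + upper) / 2                   # midpoint of the two values
  } else {
    runif(1, min = lower, max = upper)    # uniform draw between them
  }
}

mid <- choose_split_value(2, 4, middleSplit = TRUE)  # 3
set.seed(42)
v <- choose_split_value(2, 4)  # a random value between 2 and 4
```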

reuseHonestRF

An existing 'honestRF' object whose data frame is reused instead of creating a new one. This saves memory when working on the same dataset. The default value is NULL.

Format

An object of class NULL of length 0.

Value

A 'honestRF' object.


soerenkuenzel/hte documentation built on June 12, 2018, 4:26 p.m.