honestRF_R-honestRF_R: honestRF_R Constructor


Description

Initialize a 'honestRF_R' object.

Usage

honestRF_R(x, y, ntree = 500, replace = TRUE, sampsize = if (replace)
  nrow(x) else ceiling(0.632 * nrow(x)), mtry = max(floor(ncol(x)/3), 1),
  nodesizeSpl = 5, nthread = 1, splitrule = "variance",
  avgfunc = avgMean, splitratio = 1, nodesizeAvg = 5)

Arguments

x

A data frame of all training predictors.

y

A vector of all training responses.

ntree

The number of trees to grow in the forest. The default value is 500.

replace

An indicator of whether sampling of training data is with replacement. The default value is TRUE.

sampsize

The total number of samples to draw from the training data. If sampling with replacement, the default value is the number of rows of the training data. If sampling without replacement, the default value is ceiling(0.632 * nrow(x)), i.e. about 63.2% of the number of rows of the training data.
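The default shown in Usage can be checked by hand. The following is a minimal sketch in base R; the helper name default_sampsize is hypothetical and only mirrors the default expression from the Usage section, it is not part of the package.

```r
# Hypothetical helper that mirrors the default expression for 'sampsize'
# shown in Usage; it is not a function of the package.
default_sampsize <- function(x, replace = TRUE) {
  if (replace) nrow(x) else ceiling(0.632 * nrow(x))
}

# Toy predictor data frame with 150 rows.
x <- data.frame(a = seq_len(150), b = rep(c(0, 1), 75))

default_sampsize(x, replace = TRUE)   # 150: as many draws as rows
default_sampsize(x, replace = FALSE)  # 95: ceiling(0.632 * 150)
```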

mtry

The number of variables randomly selected at each split point. The default value is one third of the number of features in the training data, rounded down, with a minimum of 1.
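As a quick illustration of the default, the sketch below evaluates the expression from Usage on a few data frames. The helper name default_mtry is hypothetical; iris and mtcars are the standard datasets shipped with R.

```r
# Hypothetical helper that mirrors the default expression for 'mtry'
# shown in Usage; it is not a function of the package.
default_mtry <- function(x) max(floor(ncol(x) / 3), 1)

default_mtry(iris[, -5])                    # 4 predictors  -> 1
default_mtry(mtcars[, -1])                  # 10 predictors -> 3
default_mtry(data.frame(a = 1:3, b = 4:6))  # 2 predictors  -> 1 (floor gives 0, the minimum of 1 applies)
```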

nodesizeSpl

The minimum number of observations contained in terminal nodes. The default value is 5.

nthread

The number of threads to use in parallel computing. The default value is 1.

splitrule

A string specifying how the best split is chosen among all candidate feature values. The current version supports only 'variance', which minimizes the overall MSE after splitting. The default value is 'variance'.

avgfunc

A function used to average the observations in a node. The function is used for prediction. Its input should be a data frame of predictors 'x' and a vector of outcomes 'y', and its output should be a scalar. The default function takes the mean of the vector 'y'.
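A custom averaging function only has to follow the contract described above: take 'x' and 'y', return a scalar. The sketch below defines a median-based alternative; the name avgMedian and the toy node data are hypothetical and not part of the package.

```r
# Hypothetical averaging function: takes the predictors 'x' (unused here)
# and the outcomes 'y' of a node and returns a single number.
avgMedian <- function(x, y) {
  stats::median(y)
}

# Toy node contents with one outlying outcome.
node_x <- data.frame(x1 = c(0.1, 0.4, 0.3, 0.9))
node_y <- c(1, 2, 3, 100)

avgMedian(node_x, node_y)  # 2.5
mean(node_y)               # 26.5, what a plain mean of 'y' would give
```

Assuming the constructor accepts any function with this signature, such a function would be supplied as avgfunc = avgMedian.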

splitratio

The proportion of the sampled training data used as the splitting dataset. It is a ratio between 0 and 1. If the ratio is 1, the splitting dataset is the entire sampled set and the averaging dataset is empty. If the ratio is 0, the splitting dataset is empty and all the data is used for the averaging dataset (this is not recommended, since no data is then available for splitting). The default value is 1.
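To make the role of the ratio concrete, here is a minimal sketch of dividing a sampled set of row indices into a splitting part and an averaging part. This page does not describe how the package performs the division, so the random assignment below and the helper name partition_sample are assumptions for illustration only.

```r
# Hypothetical helper: divides sampled row indices into a splitting set
# and an averaging set. The random assignment is an assumption; the
# package may partition the sample differently.
partition_sample <- function(sampled_idx, splitratio) {
  stopifnot(splitratio >= 0, splitratio <= 1)
  n <- length(sampled_idx)
  n_split <- round(splitratio * n)
  pos <- sample.int(n, n_split)  # positions assigned to the splitting set
  list(
    splitting = sampled_idx[pos],
    averaging = sampled_idx[setdiff(seq_len(n), pos)]
  )
}

set.seed(1)
lengths(partition_sample(1:100, 0.5))  # 50 splitting, 50 averaging
lengths(partition_sample(1:100, 1))    # 100 splitting, 0 averaging
lengths(partition_sample(1:100, 0))    # 0 splitting, 100 averaging
```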

nodesizeAvg

The minimum size of terminal nodes for the averaging dataset. The default value is 5.

Format

An object of class NULL of length 0.

Value

A 'honestRF_R' object.


soerenkuenzel/hte documentation built on June 12, 2018, 4:26 p.m.