Classification with Random Forest based on Top Scoring Pairs
tsp.randomForest(x, y = NULL, xtest = NULL, ytest = NULL, ntree = 500,
    type = "classification",
    mtry = if (!is.null(y) && !is.factor(y)) max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
    replace = TRUE, classwt = NULL, cutoff, strata,
    sampsize = if (replace) nrow(x) else ceiling(0.632 * nrow(x)),
    nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
    maxnodes = NULL, importance = FALSE, localImp = FALSE, nPerm = 1,
    proximity = FALSE, oob.prox = proximity, norm.votes = TRUE,
    do.trace = FALSE, keep.forest = !is.null(y) && is.null(xtest),
    keep.inbag = FALSE, ...)
x |
a data frame or a matrix of predictors, or a formula describing the model to be fitted |
y |
A response vector. If omitted, tsp.randomForest will run in unsupervised mode. |
xtest |
a data frame or matrix (like x) containing predictors for the test set. |
ytest |
response for the test set. |
ntree |
Number of trees to grow. |
type |
Turns on the "classification" mode in "randomForest". |
mtry |
Number of top scoring pairs randomly sampled as candidates at each split. |
replace |
Should sampling of cases be done with or without replacement? |
classwt |
Priors of the classes. Need not add up to one. Ignored for regression. |
cutoff |
(Classification only) A vector of length equal to number of classes. The |
strata |
A (factor) variable that is used for stratified sampling. |
sampsize |
Size(s) of sample to draw. For classification, if sampsize is a vector whose length equals the number of strata, then sampling is stratified by strata, and the elements of sampsize give the numbers to be drawn from each stratum. |
nodesize |
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). |
maxnodes |
Maximum number of terminal nodes trees in the forest can have. |
importance |
Should importance of top scoring pairs be assessed? |
localImp |
Should casewise importance measure be computed? |
nPerm |
Number of times the OOB data are permuted per tree for assessing top scoring pair importance. |
proximity |
Should proximity measure among the rows be calculated? |
oob.prox |
Should proximity be calculated only on "out-of-bag" data? |
norm.votes |
If TRUE (default), the final votes are expressed as fractions. If FALSE, raw vote counts are returned (useful for combining results from different runs). Ignored for regression. |
do.trace |
If set to TRUE, gives more verbose output as randomForest is run. If set to an integer, running output is printed every do.trace trees. |
keep.forest |
If set to FALSE, the forest will not be retained in the output object. If xtest is given, defaults to FALSE. |
keep.inbag |
Should an n by ntree matrix be returned that keeps track of which samples are "in-bag" in which trees (but not how many times, if sampling with replacement)? |
... |
Additional arguments. |
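To make the arguments above more concrete, here is a minimal sketch of the general idea of growing a random forest on pairwise-comparison features. It is not the internal code of tsp.randomForest: it calls the standard randomForest() directly, and the simulated data, the feature names (g1_lt_g2, ...), the score used to rank pairs, and the choice to keep the five best pairs are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the implementation of tsp.randomForest.
library(randomForest)

set.seed(1)
n <- 60; p <- 6
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", 1:p)))
y <- factor(as.integer(x[, 1] < x[, 2]))  # class is driven by one pair ordering

# One binary feature per pair (i, j): 1 if x[, i] < x[, j], else 0.
pairs <- combn(p, 2)
z <- apply(pairs, 2, function(ij) as.integer(x[, ij[1]] < x[, ij[2]]))
colnames(z) <- apply(pairs, 2, function(ij) paste0("g", ij[1], "_lt_g", ij[2]))

# Pair score: absolute difference between the two classes in the
# frequency of the ordering (assumed ranking rule for this sketch).
score <- abs(colMeans(z[y == "1", , drop = FALSE]) -
             colMeans(z[y == "0", , drop = FALSE]))
top <- z[, order(score, decreasing = TRUE)[1:5], drop = FALSE]

# Forest on the selected pairs, using the classification defaults from the
# usage above: mtry = floor(sqrt(number of features)), nodesize = 1.
fit <- randomForest(top, y, ntree = 500, mtry = floor(sqrt(ncol(top))),
                    replace = TRUE, nodesize = 1, importance = TRUE)
print(fit)
```

In this sketch mtry counts pair features rather than raw predictors, which matches the description of mtry above.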
call |
the original call to |
type |
one of |
predicted |
the predicted values of the input data based on out-of-bag samples. |
importance |
a matrix with |
importanceSD |
The “standard errors” of the permutation-based
importance measure. For classification, a |
localImp |
a p by n matrix containing the casewise importance
measures, the [i,j] element of which is the importance of i-th
variable on the j-th case. |
ntree |
number of trees grown. |
mtry |
number of predictors sampled for splitting at each node. |
forest |
(a list that contains the entire forest; |
err.rate |
(classification only) vector of error rates of the prediction on the input data, the i-th element being the (OOB) error rate for all trees up to the i-th. |
confusion |
(classification only) the confusion matrix of the prediction (based on OOB data). |
votes |
(classification only) a matrix with one row for each input data point and one column for each class, giving the fraction or number of (OOB) ‘votes’ from the random forest. |
oob.times |
number of times cases are ‘out-of-bag’ (and thus used in computing OOB error estimate) |
proximity |
if |
mse |
(regression only) vector of mean square errors: sum of squared
residuals divided by |
rsq |
(regression only) “pseudo R-squared”: 1 - |
test |
if test set is given (through the |
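The components listed above carry the same names as those of an object returned by randomForest(). As a hedged illustration, the sketch below fits a plain randomForest() classifier on the built-in iris data and inspects a few of them; it assumes, without confirmation from this page, that an object returned by tsp.randomForest can be inspected in the same way.

```r
# Illustrative sketch on a plain randomForest object (assumed to be
# structured like the value described above).
library(randomForest)

set.seed(1)
fit <- randomForest(iris[, 1:4], iris$Species, ntree = 100,
                    importance = TRUE, norm.votes = TRUE)

fit$ntree             # number of trees grown
fit$mtry              # number of predictors sampled at each split
head(fit$predicted)   # OOB-based predictions for the input data
fit$confusion         # OOB confusion matrix
head(fit$votes)       # one row per case, one column per class
range(fit$oob.times)  # how often each case was out-of-bag
dim(fit$err.rate)     # in randomForest: one row per tree
head(fit$importance)  # importance measures (importance = TRUE)
```

With norm.votes = TRUE each row of votes sums to one; setting it to FALSE returns raw counts instead.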
Xiaolin Yang, Han Liu
Breiman, L. (2001), Random Forests, Machine Learning.
Breiman, L. (2002), "Manual On Setting Up, Using, And Understanding Random Forests V3.1", http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf.
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16
Loading required package: tree
Loading required package: randomForest
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
Loading required package: gbm
Loaded gbm 2.1.5
1 2 3 4 5 6 7 8 9 10
1 1 0 1 0 0 0 1 0 1
Levels: 0 1