View source: R/max.subtree.rfsrc.R
max.subtree.rfsrc | R Documentation |
Extract maximal subtree information from an RF-SRC object. Used for variable selection and for identifying interactions between variables.
## S3 method for class 'rfsrc'
max.subtree(object,
max.order = 2, sub.order = FALSE, conservative = FALSE, ...)
object |
An object of class rfsrc, such as the one returned by rfsrc(). |
max.order |
Non-negative integer specifying the highest order depth to compute. Defaults to 2 (first- and second-order depths). Set max.order = 0 to return the first-order depth of each variable in each tree. |
sub.order |
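Logical. If TRUE, returns the relative minimal depth of each variable within the maximal subtree of every other variable (the sub.order matrix). Used to identify relationships between variables. |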
conservative |
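Logical. If TRUE, a more conservative threshold is used when selecting strong variables (compare the two calls in the regression example). |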
... |
Additional arguments passed to or from other methods. |
A maximal subtree for a variable x is a subtree whose root node splits on x and that is not contained in a larger subtree whose root node also splits on x. The largest possible maximal subtree is the full tree, which occurs when the root node splits on x. A variable can have several maximal subtrees, or none at all if it is never used for splitting. See Ishwaran et al. (2010, 2011) for further discussion.
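To make the definition concrete, here is a minimal base-R sketch on a hypothetical nine-node tree stored as a data frame of node ids, parent ids, and split variables. The tree layout and the helper names ancestors() and maximal_subtrees() are illustrative assumptions, not randomForestSRC internals: a node roots a maximal subtree for v if it splits on v and none of its ancestors does.

```r
# Hypothetical toy tree: one row per node; split is NA for terminal nodes.
tree <- data.frame(
  node   = 1:9,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4),
  split  = c("x1", "x2", "x2", "x1", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Ancestors of a node, found by walking up the parent links to the root.
ancestors <- function(tree, id) {
  out <- integer(0)
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    out <- c(out, p)
    p <- tree$parent[tree$node == p]
  }
  out
}

# Roots of the maximal subtrees for variable v: nodes that split on v
# and have no ancestor that also splits on v.
maximal_subtrees <- function(tree, v) {
  cand <- tree$node[!is.na(tree$split) & tree$split == v]
  keep <- vapply(cand, function(id) {
    anc <- ancestors(tree, id)
    !any(tree$split[tree$node %in% anc] %in% v)
  }, logical(1))
  cand[keep]
}

maximal_subtrees(tree, "x1")  # 1: the whole tree (node 4 lies inside it)
maximal_subtrees(tree, "x2")  # 2 3: two separate maximal subtrees
maximal_subtrees(tree, "x3")  # integer(0): x3 is never used for splitting
```

Node 4 also splits on x1, but it sits inside the maximal subtree rooted at node 1, so it does not start a new one.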
The minimal depth of a maximal subtree, called the first-order depth, measures the predictive strength of a variable. It is the distance from the root node (depth 0) to the parent of the closest maximal subtree for x. Smaller values indicate a stronger predictive impact. A variable is flagged as strong if its minimal depth is below the mean of the minimal depth distribution.
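The threshold idea can be sketched with the form of the minimal depth distribution given in Ishwaran et al. (2010): if a tree has l_d non-terminal nodes at depth d and a noise variable v is chosen to split any such node with probability pi_d, then P(D_v = d) = [product over j < d of (1 - pi_j)^(l_j)] x [1 - (1 - pi_d)^(l_d)]. The base-R code below assumes a constant split probability 1/p, uses made-up node counts, and normalizes the probabilities so they sum to one. These are simplifying assumptions for illustration; the package's own threshold calculation may differ.

```r
# Minimal depth distribution, following the form in Ishwaran et al. (2010).
# Assumptions (illustrative only): a constant split probability for a noise
# variable and hypothetical counts of non-terminal nodes per depth.
min_depth_pmf <- function(l, prob) {
  # l[d + 1] = number of non-terminal nodes at depth d (root depth = 0)
  pmf <- numeric(length(l))
  for (d in 0:(length(l) - 1)) {
    # v is not used at any of the nodes above depth d ...
    none_above <- prod((1 - prob)^l[seq_len(d)])
    # ... and splits at least one of the l_d nodes at depth d
    pmf[d + 1] <- none_above * (1 - (1 - prob)^l[d + 1])
  }
  pmf
}

l <- c(1, 2, 4, 8, 6, 3)   # hypothetical non-terminal node counts by depth
p <- 10                    # hypothetical number of variables
pmf <- min_depth_pmf(l, prob = 1 / p)
pmf <- pmf / sum(pmf)      # assumption: condition on v splitting somewhere
threshold <- sum((seq_along(pmf) - 1) * pmf)  # mean of the distribution

# Hypothetical minimal depths; variables at or below the threshold are "strong"
md <- c(x1 = 0.4, x2 = 1.1, x3 = 3.5)
names(md)[md <= threshold]  # "x1" "x2"
```

With these inputs the threshold is a little above 2, so only x3 is treated as weak.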
The second-order depth is the distance from the root node to the second-closest maximal subtree of x. Use the max.order option to control how many order depths are returned (for example, max.order = 2 returns both first- and second-order depths). Set max.order = 0 to retrieve the first-order depth of each variable in each tree.
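A literal reading of the order-depth definition can be sketched in base R: collect the depths of all maximal subtrees of a variable, sort them, and keep the first max_order of them, padding with NA when the variable has fewer maximal subtrees. The toy tree and the helper names node_depth() and order_depths() are hypothetical. This single-tree sketch does not reproduce the aggregation over trees that randomForestSRC presumably performs for the default output.

```r
# Hypothetical toy tree: x3 has two maximal subtrees at different depths.
tree <- data.frame(
  node   = 1:9,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4),
  split  = c("x1", "x2", "x3", "x3", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Depth of every node: number of edges from the root (root = 0).
node_depth <- function(tree) {
  vapply(tree$node, function(id) {
    d <- 0
    p <- tree$parent[tree$node == id]
    while (!is.na(p)) {
      d <- d + 1
      p <- tree$parent[tree$node == p]
    }
    d
  }, numeric(1))
}

# Order depths of v: sorted depths of the maximal subtrees of v, NA-padded.
order_depths <- function(tree, v, max_order = 2) {
  depth <- node_depth(tree)
  on_v <- !is.na(tree$split) & tree$split == v
  # a node roots a maximal subtree if it splits on v and no ancestor does
  is_max <- vapply(seq_len(nrow(tree)), function(i) {
    if (!on_v[i]) return(FALSE)
    p <- tree$parent[i]
    while (!is.na(p)) {
      if (on_v[tree$node == p]) return(FALSE)
      p <- tree$parent[tree$node == p]
    }
    TRUE
  }, logical(1))
  d <- sort(depth[is_max])
  d[seq_len(max_order)]  # indexing past the end yields NA
}

# Rows: variables; columns: first- and second-order depth
od <- t(sapply(c("x1", "x2", "x3"), function(v) order_depths(tree, v)))
od  # x1: 0 NA; x2: 1 NA; x3: 1 2
```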
Set sub.order = TRUE to obtain the relative minimal depth of each variable j within the maximal subtree of another variable i. This returns a p x p matrix (where p is the number of variables) whose entry (i,j) is the normalized relative depth of j in i's subtree. The diagonal entry (i,i) gives the depth of i relative to the root node. Read the matrix across rows to assess inter-variable relationships: small (i,j) entries suggest an interaction between variables i and j. See find.interaction for further details.
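The sub.order idea can be sketched on a single hypothetical tree: for each variable i, find its maximal subtrees and record how far below their roots each other variable j first splits, with the diagonal holding the minimal depth of i from the root node. This is an assumption-laden illustration: the entries are left unnormalized and come from one tree, whereas the package reports normalized values computed over the forest. The helper names are hypothetical.

```r
# Hypothetical toy tree (same layout as a nine-node binary tree).
tree <- data.frame(
  node   = 1:9,
  parent = c(NA, 1, 1, 2, 2, 3, 3, 4, 4),
  split  = c("x1", "x2", "x3", "x3", NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

depth_of <- function(tree, id) {
  d <- 0
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) { d <- d + 1; p <- tree$parent[tree$node == p] }
  d
}

# TRUE if node anc is a proper ancestor of node id
is_ancestor <- function(tree, anc, id) {
  p <- tree$parent[tree$node == id]
  while (!is.na(p)) {
    if (p == anc) return(TRUE)
    p <- tree$parent[tree$node == p]
  }
  FALSE
}

# Maximal subtree roots of v: split nodes on v with no ancestor splitting on v
maximal_roots <- function(tree, v) {
  cand <- tree$node[!is.na(tree$split) & tree$split == v]
  nested <- vapply(cand, function(id)
    any(vapply(cand, function(a) is_ancestor(tree, a, id), logical(1))), logical(1))
  cand[!nested]
}

# Unnormalized sketch: entry (i, j) = smallest depth of a j-split below the
# root of a maximal subtree of i, measured from that root; NA if none.
sub_order <- function(tree, vars) {
  m <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  for (i in vars) {
    roots <- maximal_roots(tree, i)
    if (length(roots) == 0) next
    # diagonal: minimal depth of i measured from the root node
    m[i, i] <- min(vapply(roots, function(r) depth_of(tree, r), numeric(1)))
    for (j in setdiff(vars, i)) {
      nodes_j <- tree$node[!is.na(tree$split) & tree$split == j]
      rel <- numeric(0)
      for (r in roots) {
        inside <- nodes_j[vapply(nodes_j, function(n) is_ancestor(tree, r, n), logical(1))]
        rel <- c(rel, vapply(inside, function(n) depth_of(tree, n) - depth_of(tree, r), numeric(1)))
      }
      if (length(rel) > 0) m[i, j] <- min(rel)
    }
  }
  m
}

m <- sub_order(tree, c("x1", "x2", "x3"))
m  # row x1: 0 1 1; row x2: NA 1 1; row x3: NA NA 1
```

Reading across row x2, the small x3 entry reflects that x3 splits immediately below the x2 split, which is the kind of pattern the text above associates with a possible interaction.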
For competing risks, all analyses are unconditional (non-event specific).
Invisibly returns a list with the following components:
order |
Matrix of order depths for each variable up to max.order. If max.order = 0, it instead holds the first-order depth of each variable in each tree. |
count |
Average number of maximal subtrees per variable, normalized by tree size. |
nodes.at.depth |
List of vectors recording the number of non-terminal nodes at each depth level for each tree. |
sub.order |
Matrix of average minimal depths of each variable relative to others (i.e., conditional minimal depth matrix). |
threshold |
Threshold value for selecting strong variables based on the mean of the minimal depth distribution. |
threshold.1se |
Conservative threshold equal to the mean minimal depth plus one standard error. |
topvars |
Character vector of selected variable names using the threshold criterion. |
topvars.1se |
Character vector of selected variable names using the threshold.1se criterion. |
percentile |
Percentile value of minimal depth for each variable. |
density |
Estimated density of the minimal depth distribution. |
second.order.threshold |
Threshold used for selecting strong second-order depth variables. |
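The selection rule behind topvars and topvars.1se can be sketched in base R with made-up minimal depths and thresholds: keep the variables whose first-order depth is at or below the cutoff. Using <= follows the comment in the survival example; the numbers and the helper select_vars() are hypothetical and are not produced by randomForestSRC, and ordering the result from strongest to weakest is a choice of this sketch.

```r
# Hypothetical first-order (minimal) depths, one per variable
min_depth <- c(x1 = 1.2, x2 = 0.9, x3 = 2.8, x4 = 3.1, x5 = 3.4)

# Hypothetical cutoffs; threshold.1se is the larger, mean-plus-one-SE value
threshold     <- 2.5
threshold.1se <- 2.9

# Keep variables at or below the cutoff, strongest (smallest depth) first
select_vars <- function(md, cutoff) names(sort(md[md <= cutoff]))

select_vars(min_depth, threshold)      # "x2" "x1"
select_vars(min_depth, threshold.1se)  # "x2" "x1" "x3"
```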
Hemant Ishwaran and Udaya B. Kogalur
Ishwaran H., Kogalur U.B., Gorodeski E.Z., Minn A.J. and Lauer M.S. (2010). High-dimensional variable selection for survival data. J. Amer. Statist. Assoc., 105:205-217.
Ishwaran H., Kogalur U.B., Chen X. and Minn A.J. (2011). Random survival forests for high-dimensional data. Statist. Anal. Data Mining, 4:115-132.
holdout.vimp.rfsrc, vimp.rfsrc
## ------------------------------------------------------------
## survival analysis
## first and second order depths for all variables
## ------------------------------------------------------------
library(randomForestSRC)
data(veteran, package = "randomForestSRC")
v.obj <- rfsrc(Surv(time, status) ~ . , data = veteran)
v.max <- max.subtree(v.obj)
# first and second order depths
print(round(v.max$order, 3))
# the minimal depth is the first order depth
print(round(v.max$order[, 1], 3))
# strong variables have minimal depth less than or equal
# to the following threshold
print(v.max$threshold)
# this corresponds to the set of variables
print(v.max$topvars)
## ------------------------------------------------------------
## regression analysis
## try different levels of conservativeness
## ------------------------------------------------------------
mtcars.obj <- rfsrc(mpg ~ ., data = mtcars)
max.subtree(mtcars.obj)$topvars
max.subtree(mtcars.obj, conservative = TRUE)$topvars