library(aorsf)
The default variable importance technique, ANOVA, is calculated while you fit an oblique random forest ensemble.
fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id) fit
ANOVA is the default because it is fast, but it may not be as decisive as the permutation and negation techniques for variable selection.
the 'raw' variable importance values can be accessed from the fit object
fit$get_importance_raw()
these are 'raw' because values for factors have not been aggregated into a single value. Currently there is one value for k-1 levels of a k level factor. For example, you can see edema_1 and edema_0.5 in the importance values above because edema is a factor variable with levels of 0, 0.5, and 1.
To get aggregated values across all levels of each factor,
importance
element from the orsf
fit:r
# this assumes you used group_factors = TRUE in orsf()
fit$importance
orsf_vi()
with group_factors set to TRUE
(the default)r
orsf_vi(fit)
Note that you can make the default returned importance values ungrouped by setting group_factors
to FALSE
in the orsf_vi
functions or the orsf
function.
You can fit an oblique random forest without VI, then add VI later
fit_no_vi <- orsf(pbc_orsf, Surv(time, status) ~ . - id, importance = 'none') # Note: you can't call orsf_vi_anova() on fit_no_vi because anova # VI can only be computed while the forest is being grown. orsf_vi_negate(fit_no_vi) orsf_vi_permute(fit_no_vi)
fit an oblique random forest and compute vi at the same time
fit_permute_vi <- orsf(pbc_orsf, Surv(time, status) ~ . - id, importance = 'permute') # get the vi instantly (i.e., it doesn't need to be computed again) orsf_vi_permute(fit_permute_vi)
You can still get negation VI from this fit, but it needs to be computed
orsf_vi_negate(fit_permute_vi)
The default prediction accuracy functions work well most of the time:
fit_standard <- orsf(penguins_orsf, bill_length_mm ~ ., tree_seeds = 1) # Default method for prediction accuracy with VI is R-squared orsf_vi_permute(fit_standard)
But sometimes you want to do something specific and the defaults just won't work. For these cases, you can compute VI with any function you'd like to measure prediction accuracy by supplying a valid function to the oobag_fun
input. For example, we use mean absolute error below. Higher values are considered good when aorsf
computes prediction accuracy, so we make our function return a pseudo R-squared based on mean absolute error:
rsq_mae <- function(y_mat, w_vec, s_vec){ mae_standard <- mean(abs((y_mat - mean(y_mat)) * w_vec)) mae_fit <- mean(abs((y_mat - s_vec) * w_vec)) 1 - mae_fit / mae_standard } fit_custom <- orsf_update(fit_standard, oobag_fun = rsq_mae) # not much changes, but the difference between variables shrinks # and the ordering of sex and island has swapped orsf_vi_permute(fit_custom)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.