library(aorsf)

Examples

ANOVA importance

The default variable importance technique, ANOVA, is calculated while you fit an oblique random forest ensemble.

fit <- orsf(pbc_orsf, Surv(time, status) ~ . - id)

fit

ANOVA is the default because it is fast, but it may not be as decisive as the permutation and negation techniques for variable selection.

Raw VI values

the 'raw' variable importance values can be accessed from the fit object

fit$get_importance_raw()

these are 'raw' because values for factors have not been aggregated into a single value. Currently there is one value for k-1 levels of a k level factor. For example, you can see edema_1 and edema_0.5 in the importance values above because edema is a factor variable with levels of 0, 0.5, and 1.

Collapse VI across factor levels

To get aggregated values across all levels of each factor,

r # this assumes you used group_factors = TRUE in orsf() fit$importance

r orsf_vi(fit)

Note that you can make the default returned importance values ungrouped by setting group_factors to FALSE in the orsf_vi functions or the orsf function.

Add VI to an oblique random forest

You can fit an oblique random forest without VI, then add VI later

fit_no_vi <- orsf(pbc_orsf,
                  Surv(time, status) ~ . - id,
                  importance = 'none')

# Note: you can't call orsf_vi_anova() on fit_no_vi because anova
# VI can only be computed while the forest is being grown.

orsf_vi_negate(fit_no_vi)

orsf_vi_permute(fit_no_vi)

Oblique random forest and VI all at once

fit an oblique random forest and compute vi at the same time

fit_permute_vi <- orsf(pbc_orsf,
                       Surv(time, status) ~ . - id,
                       importance = 'permute')

# get the vi instantly (i.e., it doesn't need to be computed again)
orsf_vi_permute(fit_permute_vi)

You can still get negation VI from this fit, but it needs to be computed

orsf_vi_negate(fit_permute_vi)

Custom functions for VI

The default prediction accuracy functions work well most of the time:

fit_standard <- orsf(penguins_orsf, bill_length_mm ~ ., tree_seeds = 1)

# Default method for prediction accuracy with VI is R-squared
orsf_vi_permute(fit_standard)

But sometimes you want to do something specific and the defaults just won't work. For these cases, you can compute VI with any function you'd like to measure prediction accuracy by supplying a valid function to the oobag_fun input. For example, we use mean absolute error below. Higher values are considered good when aorsf computes prediction accuracy, so we make our function return a pseudo R-squared based on mean absolute error:

rsq_mae <- function(y_mat, w_vec, s_vec){

 mae_standard <- mean(abs((y_mat - mean(y_mat)) * w_vec))
 mae_fit <- mean(abs((y_mat - s_vec) * w_vec))

 1 - mae_fit / mae_standard

}

fit_custom <- orsf_update(fit_standard, oobag_fun = rsq_mae)

# not much changes, but the difference between variables shrinks
# and the ordering of sex and island has swapped
orsf_vi_permute(fit_custom)


bcjaeger/aorsf documentation built on April 3, 2025, 4:16 p.m.