importance.rhf: Time-localized VarPro importance for random hazard forests

Description

Computes time-localized variable importance for a random hazard forest (RHF). The method applies VarPro (variable priority) importance while restricting rule and near-miss memberships to pseudo-individuals whose start-stop intervals overlap selected windows of the master time grid `time.interest`. The working response used for VarPro importance is taken directly from the RHF object as the logarithm of the upstream integrated hazard exposure, which gives a fast, localized view of how variable importance evolves over time.

Usage

importance.rhf(o,
 cache = NULL,
 time.index = NULL,
 trim = 0.1,
 sort = TRUE,
 max.rules.tree,
 max.tree,
 eps = 1e-6,
 y.external = NULL,
 verbose = FALSE,
 ...)

varpro.cache.rhf(o,
 max.rules.tree = 150L,
 max.tree = 150L,
 y.external = NULL,
 eps = 1e-6,
 verbose = FALSE)

## S3 method for class 'importance.rhf'
print(x,
 top = 10L,
 rank.by = c("q90", "median", "mean", "max"),
 digits = 4L,
 scientific.threshold = 1e4,
 ...)

## S3 method for class 'importance.rhf'
as.data.frame(x,
  row.names = NULL,
  optional = FALSE,
  format = c("long", "variable_by_time", "time_by_variable"),
  ...)

dotmatrix.importance.rhf(x,
 vars = NULL,
 top_n_union = 15L,
 variable.labels = NULL,
 time.labels = NULL,
 sort_by = c("q90", "sum", "max", "mean", "median", "alphabetical", "cluster", "none"),
 sort_abs = TRUE,
 transform = c("none", "log10"),
 color_by = c("value", "sign", "single", "none"),
 point_color = "steelblue4",
 value_colors = c("grey85", "steelblue4"),
 sign_colors = c("firebrick3", "grey90", "steelblue4"),
 cex.range = c(0.6, 3.2),
 size.cap = 0.99,
 color.cap = 0.99,
 alpha = 0.9,
 show.grid = TRUE,
 grid.col = "grey92",
 legend = TRUE,
 display.note = TRUE,
 xlab = "",
 ylab = "",
 main = "RHF time-localized VarPro importance",
 axis.cex = 0.7,
 var.cex = 0.7,
 time.label.srt = 45,
 save_plot = FALSE,
 out.file = "rhf_time_varpro_dotmatrix.pdf",
 width = 11,
 height = NULL,
 mar = NULL,
 legend.width = 0.7,
 ...)

## S3 method for class 'importance.rhf'
plot(x,
 type = c("dotmatrix", "lines"),
 vars = NULL,
 top = 10L,
 rank.by = c("q90", "median", "mean", "max"),
 curve = c("step", "line", "lowess"),
 smooth.f = 2/3,
 display.cap = 0.99,
 display.note = TRUE,
 xlab = NULL,
 ylab = NULL,
 lty = 1,
 lwd = 2,
 ...)

Arguments

`o`	An RHF object with class `"rhf"`.
`cache`	Optional cache object returned by `varpro.cache.rhf()`. If `NULL`, the cache is built internally. Supplying a cache avoids rebuilding it when `importance.rhf()` is called repeatedly on the same forest.
`time.index`	Optional vector identifying which windows of the time grid `o$time.interest` are to be analyzed. This may be an integer index vector or a logical vector of length `length(o$time.interest)`. If omitted, all windows are used.
`trim`	Tuning parameter passed to the underlying VarPro importance workhorse. `trim` controls winsorized aggregation across trees.
`sort`	Logical. If `TRUE`, variables are ordered within each window in decreasing importance before the long-format output is assembled.
`max.rules.tree`, `max.tree`	Arguments controlling rule extraction when the cache is built.
`y.external`	Optional externally supplied working response. When `NULL`, the working response is built internally from the RHF object's integrated hazard exposure values.
`eps`	Nonnegative value added before taking the logarithm of the integrated hazard exposure when `y.external` is not supplied.
`verbose`	Logical. If `TRUE`, reports cache construction and per-window progress.
`x`	An object of class `"importance.rhf"`.
`top`, `rank.by`	Arguments used by `print()` and by `plot(type = "lines")`. By default, printing ranks variables robustly over time using `rank.by = "q90"`. The line plot also uses `rank.by` when `vars` is omitted.
`digits`, `scientific.threshold`	Formatting controls for `print()`, used to keep very large importance values readable.
`row.names`, `optional`	Included for compatibility with `as.data.frame()`.
`format`	Output format for `as.data.frame()`. `"long"` returns the long-format table, `"variable_by_time"` returns a data frame whose rows are variables and columns are times, and `"time_by_variable"` returns the transpose with window metadata.
`type`, `vars`, `top_n_union`	Arguments controlling which variables are displayed and which plot is produced. `type = "dotmatrix"` gives the time-by-variable dot-matrix display; `type = "lines"` gives a line, step, or smoothed view for selected variables. When `vars` is omitted, the line plot chooses `top` variables using `rank.by`, while the dot-matrix plot uses the union of the top `top_n_union` variables across time.
`curve`, `smooth.f`, `lty`, `lwd`, `display.cap`, `display.note`	Arguments for `type = "lines"`. `curve` chooses between step, ordinary line, and lowess-smoothed displays; `smooth.f` is passed to `stats::lowess()` when needed; `lty` and `lwd` control line type and line width. `display.cap` applies display-only quantile capping to stabilize the vertical scale in the presence of extreme spikes, and `display.note` toggles the on-plot note when capping is applied. The same `display.note` flag is also used by the dot-matrix plot.
`variable.labels`, `time.labels`, `sort_by`, `sort_abs`	Arguments controlling variable labeling and ordering in the dot-matrix plot. Variable labels may be supplied as a named vector or a two-column data frame. Variables may be ordered by robust aggregate importance, alphabetically, or by hierarchical clustering, or left in their existing order.
`transform`, `color_by`, `point_color`, `value_colors`, `sign_colors`, `cex.range`, `size.cap`, `color.cap`, `alpha`, `legend`	Arguments controlling dot size, color encoding, display-only quantile capping, transparency, and the optional right-side legend in the dot-matrix plot.
`xlab`, `ylab`, `main`, `axis.cex`, `var.cex`, `time.label.srt`, `show.grid`, `grid.col`, `mar`, `legend.width`, `width`, `height`, `save_plot`, `out.file`	Display, layout, and export options for the plotting helpers. By default the dot-matrix plot uses blank axis labels, draws light guide lines, computes margins automatically, and can optionally be written to file with `save_plot = TRUE`.
`...`	Additional arguments passed to internal calculations or plotting routines.
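The `trim` argument is described above as controlling winsorized aggregation across trees. The following is a minimal base-R sketch of what a winsorized mean does in general; it is not the package's internal code. The helper name `winsorized.mean` and the per-tree values are hypothetical, and clamping at the `trim` and `1 - trim` quantiles is an assumed convention.

```r
## Winsorized mean (illustrative; assumed convention):
## clamp values to the [trim, 1 - trim] quantiles, then average.
winsorized.mean <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA_real_)
  q <- quantile(x, probs = c(trim, 1 - trim), names = FALSE)
  mean(pmin(pmax(x, q[1]), q[2]))
}

## Hypothetical per-tree importance values for one variable in one window.
per.tree <- c(0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.95, 1.05, 1.0, 25)

mean(per.tree)              # pulled upward by the single extreme tree
winsorized.mean(per.tree)   # the extreme tree is clamped before averaging
```

With `trim = 0` the sketch reduces to the ordinary mean; larger values clamp more of each tail.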

Details

This routine implements a fast localization strategy for RHF VarPro importance. The master time grid is taken from time.interest. For a window corresponding to a selected grid index, the method keeps only those pseudo-individuals whose start-stop interval overlaps that window, while reusing the same sampled rules and near-miss sets already obtained from the RHF fit.
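The overlap filter described above can be sketched in base R on toy data. This is an illustration only: the data frame `pseudo`, the helpers `window.bounds` and `overlaps.window`, and the half-open window convention (left endpoint taken as the previous grid point, or 0 for the first window) are assumptions, not the package's internals.

```r
## Toy pseudo-individuals in start-stop (counting process) format.
## All names below are hypothetical and for illustration only.
pseudo <- data.frame(
  id    = c(1, 1, 2, 2, 3),
  start = c(0, 2, 0, 3, 1),
  stop  = c(2, 5, 3, 6, 4)
)

## Toy master time grid standing in for o$time.interest.
time.interest <- c(1, 2, 4, 6)

## Assumed convention: window k is (left, right], with right = grid[k]
## and left = the previous grid point (0 for the first window).
window.bounds <- function(k, grid) {
  c(left = if (k == 1) 0 else grid[k - 1], right = grid[k])
}

## A record overlaps the window when it starts before the right endpoint
## and stops after the left endpoint.
overlaps.window <- function(start, stop, w) {
  start < w[["right"]] & stop > w[["left"]]
}

w3   <- window.bounds(3, time.interest)
keep <- overlaps.window(pseudo$start, pseudo$stop, w3)
pseudo[keep, ]
```

Only the logical vector `keep` changes from window to window; the records themselves, and any rule memberships defined on them, are reused unchanged.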

For RHF objects the underlying VarPro importance calculation follows a regression-style approach in which the working response is the logarithm of the integrated hazard exposure, and local rule importance is computed by comparing this working response in a rule versus its near-miss set. Time localization is achieved by restricting those memberships within each window rather than rebuilding the entire rule structure repeatedly.
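As a rough illustration of the rule versus near-miss comparison, the sketch below builds a log-exposure working response and contrasts it between two membership sets inside one window. The source does not state the exact contrast statistic, so the absolute difference in means used here is an assumption, as are the helper `local.contrast` and all toy values.

```r
## Toy integrated hazard exposure values, one per pseudo-individual.
## (Hypothetical data; real values come from the fitted RHF object.)
int.haz <- c(0.00, 0.12, 0.50, 1.30, 0.05, 0.80, 2.10, 0.30)
eps     <- 1e-6

## Working response: log of exposure, with eps guarding against log(0).
y <- log(int.haz + eps)

## Hypothetical memberships for one rule and its near-miss set.
in.rule      <- c(TRUE,  TRUE,  FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE)
in.near.miss <- c(FALSE, FALSE, TRUE,  TRUE,  FALSE, TRUE,  FALSE, TRUE)

## Pseudo-individuals whose start-stop interval overlaps the current window.
in.window    <- c(TRUE,  TRUE,  TRUE,  FALSE, TRUE,  TRUE,  TRUE,  TRUE)

## Assumed contrast: absolute difference in mean working response
## between rule members and near-miss members, both restricted to the window.
local.contrast <- function(y, rule, near, window) {
  r <- rule & window
  m <- near & window
  if (!any(r) || !any(m)) return(NA_real_)
  abs(mean(y[r]) - mean(y[m]))
}

local.contrast(y, in.rule, in.near.miss, in.window)
```

Changing only `in.window` localizes the same rule and near-miss set to a different time window, which is the reuse the paragraph above describes.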

The helper `varpro.cache.rhf()` stores the minimum information needed for repeated localized importance calculations: a regression-style rule template, window metadata, the working response source, and precomputed window-local rule statistics. During cache construction, raw OOB and complementary memberships are converted into compact per-window rule summaries, so the later window sweep does not need to rescan membership vectors.
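The idea of precomputing per-window rule summaries can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure of the actual cache object: the names `overlap`, `rules`, `summarize.rule`, and `window.mean` are hypothetical, and only a count and a sum are cached per rule and window.

```r
## Hypothetical toy inputs: working response, start-stop records, and a grid.
y     <- c(-2.1, -0.7, 0.3, -1.5, 0.9, 0.1)
start <- c(0, 0, 1, 2, 2, 3)
stop  <- c(2, 3, 4, 4, 6, 6)
grid  <- c(2, 4, 6)                      # stands in for time.interest
left  <- c(0, grid[-length(grid)])       # assumed left endpoints

## Record-by-window overlap indicator, computed once.
overlap <- outer(seq_along(y), seq_along(grid),
                 function(i, k) start[i] < grid[k] & stop[i] > left[k])

## Hypothetical rule memberships (indices of records falling in each rule).
rules <- list(r1 = c(1, 2, 3), r2 = c(4, 5, 6))

## Cache: per rule and window, the count and the sum of y among
## members whose interval overlaps that window.
summarize.rule <- function(idx) {
  o <- overlap[idx, , drop = FALSE]
  rbind(n = colSums(o), sum = colSums(o * y[idx]))
}
cache <- lapply(rules, summarize.rule)

## Later sweeps read window-local means straight from the cache,
## without touching the membership vectors again.
window.mean <- function(cache.entry, k) {
  cache.entry["sum", k] / cache.entry["n", k]
}
window.mean(cache$r1, 1)
```

Storing sums and counts rather than memberships keeps the cache small and makes each window lookup a constant-time operation per rule.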

The returned importance matrix has variables in rows and selected time windows in columns. Column names correspond to the right endpoints of the selected windows. The long-format table contains the same values together with window metadata such as start, stop, midpoint, number at risk, and number of active rules.
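The relationship between the matrix and long layouts can be illustrated with a toy example in base R. The values and variable names below are made up, and the long table here carries only three columns, whereas the real `importance.long` also includes window metadata.

```r
## Toy importance matrix: variables in rows, windows in columns.
## Column names are the right endpoints of the windows (hypothetical values).
imp <- matrix(c(0.8, 0.1, 0.0,
                0.6, 0.3, 0.1,
                0.2, 0.7, 0.4),
              nrow = 3,
              dimnames = list(c("x1", "x2", "x3"), c("2", "4", "6")))

## Long format: one row per (variable, window) pair.
## as.vector() unrolls the matrix column by column, which matches the
## rep() patterns used for the variable and time columns.
imp.long <- data.frame(
  variable   = rep(rownames(imp), times = ncol(imp)),
  time       = rep(as.numeric(colnames(imp)), each = nrow(imp)),
  importance = as.vector(imp)
)

## Round trip back to the variable-by-time layout.
imp.wide <- with(imp.long, tapply(importance, list(variable, time), sum))

head(imp.long)
imp.wide
```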

Printing and plotting share a robust strategy. Summaries default to a robust over-time ranking based on the 90th percentile, and the plotting helpers apply optional quantile capping for display only. This prevents rare extreme spikes from flattening curves while preserving the original importance matrix for downstream analyses.
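A small base-R sketch of both ideas, robust ranking and display-only capping, is given below. The helpers `rank.summary` and `cap.for.display` are hypothetical, and computing the cap from the pooled values of the whole matrix is an assumption; the package may cap differently.

```r
## Toy importance matrix (variables x 20 windows); x2 has one extreme spike.
imp <- rbind(
  x1 = rep(1.0, 20),
  x2 = c(rep(0.1, 9), 50, rep(0.1, 10)),
  x3 = rep(0.5, 20)
)

## Rank variables over time by a chosen summary (illustrative helper).
rank.summary <- function(m, by = c("q90", "median", "mean", "max")) {
  by <- match.arg(by)
  s <- switch(by,
    q90    = apply(m, 1, quantile, probs = 0.90, na.rm = TRUE, names = FALSE),
    median = apply(m, 1, median, na.rm = TRUE),
    mean   = rowMeans(m, na.rm = TRUE),
    max    = apply(m, 1, max, na.rm = TRUE))
  sort(s, decreasing = TRUE)
}

## Display-only capping (assumed: cap at a pooled quantile of all values).
## The input matrix is not modified; a capped copy is returned for plotting.
cap.for.display <- function(m, cap = 0.99) {
  pmin(m, quantile(m, probs = cap, na.rm = TRUE, names = FALSE))
}

names(rank.summary(imp, "q90"))   # ordering by 90th percentile over time
names(rank.summary(imp, "max"))   # ordering by maximum over time
range(cap.for.display(imp, cap = 0.95))
```

In this toy case the single spike dominates a ranking by `"max"` but has no effect on the `"q90"` ranking, and the capped copy keeps the vertical scale readable while `imp` itself is left intact.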

Value

`varpro.cache.rhf()` returns an object containing cached rule memberships, the working response used for importance, start-stop information for pseudo-individuals, time-window metadata, and the rule extraction settings.

importance.rhf() returns a list including:

importance.matrix: matrix of localized importance values with variables in rows and selected time windows in columns.
importance.long: long-format data frame containing variable, time, window metadata, and localized importance.
window.info: data frame describing the analyzed windows, including start, stop, midpoint, n.risk, and n.rules.
y.source: source of the working response. This is "int.haz.oob", "int.haz.test", or "y.external".
trim: tuning value used in importance aggregation.

print() returns its input invisibly after displaying a short summary that includes robust over-time summaries for the leading variables.

as.data.frame() returns one of the supported data-frame views.

dotmatrix.importance.rhf() produces a base-R dot-matrix plot and returns plotting metadata invisibly.

plot() returns invisibly the result of the underlying plotting helper.

Examples


################################################################
##
## simulation model
##
################################################################

## draw simulation (can be modified)
n <- 400
p <- 10
simid <- 2
d <- hazard.simulation(type = simid, n = n, p = p, nrecords = 4)$dta

## fit an RHF model with weighted mtry (useful in high dimensions)
f <- "Surv(id, start, stop, event) ~ ."
o <- rhf(f, d, ntree = 50, nsplit = 5, xvar.wt = xvar.wt.rhf(f, d))
print(o)

## time-localized RHF importance across the full time grid
imp.t <- importance.rhf(o)
print(imp.t)

## extract the variable-by-time matrix
print(head(imp.t$importance.matrix))

oldpar <- par(mfrow=c(1,1))

## dot-matrix importance plot (default)
plot(imp.t)

## step-style importance line plot for the top variables
## (ranked by the 90th percentile over time and display-capped at q99)
plot(imp.t, type = "lines", top = 10)

## lowess-smoothed importance plot with tighter display capping
## (vars is omitted, so the top 10 variables are shown; here p = 10, i.e. all)
plot(imp.t, type = "lines", curve = "lowess", smooth.f = 0.5,
     display.cap = 0.95)

## dot-matrix plot with robust ordering and display capping
plot(imp.t, sort_by = "q90", size.cap = 0.99, color.cap = 0.99)

par(oldpar)


## reuse a cache for repeated calls on subsets of the time grid
cache <- varpro.cache.rhf(o)
imp.t.sub <- importance.rhf(
  o,
  cache = cache,
  time.index = seq(1, length(o$time.interest), by = 5),
  verbose = TRUE
)

## long-format export
print(head(as.data.frame(imp.t.sub)))

randomForestRHF documentation built on April 24, 2026, 1:07 a.m.