| importance.rhf | R Documentation |
Computes time-localized variable importance for a random hazard forest
(RHF) by using VarPro (variable priority) importance and restricting
rule and near-miss memberships to pseudo-individuals whose start-stop
intervals overlap selected windows from the master time grid
time.interest. The working response used for VarPro importance is
taken directly from the RHF object as the logarithm of the upstream
integrated hazard exposure, giving a fast localized view of how variable
importance evolves over time.
importance.rhf(o,
cache = NULL,
time.index = NULL,
trim = 0.1,
sort = TRUE,
max.rules.tree,
max.tree,
eps = 1e-6,
y.external = NULL,
verbose = FALSE,
...)
varpro.cache.rhf(o,
max.rules.tree = 150L,
max.tree = 150L,
y.external = NULL,
eps = 1e-6,
verbose = FALSE)
## S3 method for class 'importance.rhf'
print(x,
top = 10L,
rank.by = c("q90", "median", "mean", "max"),
digits = 4L,
scientific.threshold = 1e4,
...)
## S3 method for class 'importance.rhf'
as.data.frame(x,
row.names = NULL,
optional = FALSE,
format = c("long", "variable_by_time", "time_by_variable"),
...)
dotmatrix.importance.rhf(x,
vars = NULL,
top_n_union = 15L,
variable.labels = NULL,
time.labels = NULL,
sort_by = c("q90", "sum", "max", "mean", "median", "alphabetical", "cluster", "none"),
sort_abs = TRUE,
transform = c("none", "log10"),
color_by = c("value", "sign", "single", "none"),
point_color = "steelblue4",
value_colors = c("grey85", "steelblue4"),
sign_colors = c("firebrick3", "grey90", "steelblue4"),
cex.range = c(0.6, 3.2),
size.cap = 0.99,
color.cap = 0.99,
alpha = 0.9,
show.grid = TRUE,
grid.col = "grey92",
legend = TRUE,
display.note = TRUE,
xlab = "",
ylab = "",
main = "RHF time-localized VarPro importance",
axis.cex = 0.7,
var.cex = 0.7,
time.label.srt = 45,
save_plot = FALSE,
out.file = "rhf_time_varpro_dotmatrix.pdf",
width = 11,
height = NULL,
mar = NULL,
legend.width = 0.7,
...)
## S3 method for class 'importance.rhf'
plot(x,
type = c("dotmatrix", "lines"),
vars = NULL,
top = 10L,
rank.by = c("q90", "median", "mean", "max"),
curve = c("step", "line", "lowess"),
smooth.f = 2/3,
display.cap = 0.99,
display.note = TRUE,
xlab = NULL,
ylab = NULL,
lty = 1,
lwd = 2,
...)
o |
An RHF object with class |
cache |
Optional cache object returned by |
time.index |
Optional vector identifying which windows of the time
grid |
trim |
Tuning parameter passed to the underlying VarPro
importance workhorse. |
sort |
Logical. If |
max.rules.tree, max.tree |
Arguments controlling rule extraction when the cache is built. |
y.external |
Optional externally supplied working response. When
|
eps |
Nonnegative value added before taking the logarithm of the
integrated hazard exposure when |
verbose |
Logical. If |
x |
An object of class |
top, rank.by |
Arguments used by |
digits, scientific.threshold |
Formatting controls for
|
row.names, optional |
Included for compatibility with
|
format |
Output format for |
type, vars, top_n_union |
Arguments controlling which
variables are displayed and which plot is produced. |
curve, smooth.f, lty, lwd, display.cap, display.note |
Arguments for
|
variable.labels, time.labels, sort_by, sort_abs |
Arguments controlling variable labeling and ordering in the dot-matrix plot. Variable labels may be supplied as a named vector or a two-column data frame. Variables may be ordered by robust aggregate importance, alphabetically, by hierarchical clustering, or left in their existing order. |
transform, color_by, point_color, value_colors, sign_colors, cex.range, size.cap, color.cap, alpha, legend |
Arguments controlling dot size, color encoding, display-only quantile capping, transparency, and the optional right-side legend in the dot-matrix plot. |
xlab, ylab, main, axis.cex, var.cex, time.label.srt, show.grid, grid.col, mar, legend.width, width, height, save_plot, out.file |
Display,
layout, and export options for the plotting helpers. By default the
dot-matrix plot uses blank axis labels, draws light guide lines, computes
margins automatically, and can optionally be written to file with
|
... |
Additional arguments passed to internal calculations or plotting routines. |
This routine implements a fast localization strategy for RHF
VarPro importance. The master time grid is taken from
time.interest. For a window corresponding to a selected grid
index, the method keeps only those pseudo-individuals whose start-stop
interval overlaps that window, while reusing the same sampled rules and
near-miss sets already obtained from the RHF fit.
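The overlap rule described above can be sketched in base R. This is only an illustration under stated assumptions: the data frame `records`, the window bounds `w.start` and `w.stop`, and the open/closed boundary convention are all hypothetical and are not taken from the package internals.

```r
## Toy pseudo-individual records in start-stop (counting process) format.
## All object names here are illustrative, not part of the package.
records <- data.frame(
  id    = c(1, 1, 2, 2, 3),
  start = c(0.0, 1.0, 0.0, 2.0, 0.5),
  stop  = c(1.0, 3.0, 2.0, 4.0, 1.5)
)

## One hypothetical window of the time grid.
w.start <- 1.5
w.stop  <- 2.5

## Assumed convention: a record overlaps the window when it starts
## before the window ends and stops after the window begins.
overlaps <- records$start < w.stop & records$stop > w.start

## Records retained for this window.
records[overlaps, ]
```

Only the logical vector `overlaps` changes from window to window; the records themselves, like the sampled rules in the actual method, are reused.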
For RHF objects the underlying VarPro importance calculation follows a regression-style approach in which the working response is the logarithm of the integrated hazard exposure, and local rule importance is computed by comparing this working response in a rule versus its near-miss set. Time localization is achieved by restricting those memberships within each window rather than rebuilding the entire rule structure repeatedly.
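A minimal base-R sketch of this idea follows. It assumes, purely for illustration, that the rule versus near-miss comparison is an absolute difference of means; the contrast actually used by VarPro is not specified here. The vectors `int.haz`, `in.rule`, `near.miss`, and `in.window` are hypothetical.

```r
## Hypothetical integrated hazard exposure for 8 pseudo-individuals.
int.haz <- c(0, 0.2, 0.5, 1.1, 0.05, 2.0, 0.8, 0.3)

## Working response: log exposure, with eps guarding against log(0).
eps <- 1e-6
y <- log(int.haz + eps)

## Hypothetical memberships for one rule and its near-miss set.
in.rule   <- c(1, 2, 3)
near.miss <- c(4, 6, 7)

## Hypothetical indicator of which records overlap the current window.
in.window <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

## Time localization: restrict both memberships to the window.
rule.w <- in.rule[in.window[in.rule]]
miss.w <- near.miss[in.window[near.miss]]

## Assumed contrast (illustration only): absolute difference of means.
local.imp <- abs(mean(y[miss.w]) - mean(y[rule.w]))
local.imp
```

The rule and near-miss index sets are fixed; only their restriction to the window changes, which is why the rule structure does not need to be rebuilt.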
The helper varpro.cache.rhf() stores the minimum information needed
for repeated localized importance calculations: a regression-style rule
template, window metadata, the working response source, and precomputed
window-local rule statistics. During cache construction, raw OOB and
complementary memberships are converted into compact per-window rule
summaries, so the later window sweep does not need to rescan membership
vectors.
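The benefit of precomputing per-window rule summaries can be illustrated with a small base-R sketch. The layout below (a count matrix and a sum matrix indexed by rule and window) is an assumption made for illustration and is not the documented structure of the cache object.

```r
## Illustrative working response for 5 pseudo-individuals.
y <- c(-2.0, -1.0, 0.5, 1.0, 1.5)

## Hypothetical raw memberships for two rules.
member <- list(rule1 = c(1, 2, 3), rule2 = c(3, 4, 5))

## overlap[i, w] is TRUE when record i overlaps window w (hypothetical).
overlap <- cbind(w1 = c(TRUE, TRUE, FALSE, TRUE, FALSE),
                 w2 = c(FALSE, TRUE, TRUE, TRUE, TRUE))

## One pass over the memberships: per-rule, per-window count and sum of y.
rule.n <- t(sapply(member, function(idx)
  colSums(overlap[idx, , drop = FALSE])))
rule.sum <- t(sapply(member, function(idx)
  colSums(overlap[idx, , drop = FALSE] * y[idx])))

## Later window sweeps need only these compact summaries.
rule.mean <- rule.sum / rule.n
rule.mean
```

Once `rule.n` and `rule.sum` exist, any window-level mean is a table lookup rather than a rescan of the membership vectors.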
The returned importance matrix has variables in rows and selected time windows in columns. Column names correspond to the right endpoints of the selected windows. The long-format table contains the same values together with window metadata such as start, stop, midpoint, number at risk, and number of active rules.
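The relationship between the matrix and long views can be sketched with a toy matrix. The column names of `long` below are illustrative; the actual columns of importance.long may differ and include the window metadata listed above.

```r
## Toy variable-by-time matrix; column names play the role of the
## right endpoints of the selected windows.
imp <- matrix(c(0.8, 0.1, 0.5, 0.3, 0.2, 0.4), nrow = 2,
              dimnames = list(c("x1", "x2"), c("1.5", "3", "4.5")))

## Unroll column by column into one row per (variable, time) pair.
long <- data.frame(
  variable   = rep(rownames(imp), times = ncol(imp)),
  time       = rep(as.numeric(colnames(imp)), each = nrow(imp)),
  importance = as.vector(imp)
)
long
```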
Printing and plotting share a robust strategy. Summaries default to a robust over-time ranking based on the 90th percentile, and the plotting helpers apply optional quantile capping for display only. This prevents rare extreme spikes from flattening curves while preserving the original importance matrix for downstream analyses.
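A small base-R sketch shows why a 90th-percentile ranking and display-only capping help. The matrix `imp` is made up, and the capping shown (a single global quantile applied with pmin) is an assumed simplification of what the plotting helpers do.

```r
## Toy importance matrix: x1 has one extreme spike over 20 windows.
imp <- rbind(
  x1 = c(rep(0.10, 19), 5.00),
  x2 = rep(0.30, 20),
  x3 = rep(0.02, 20)
)

## Ranking by the maximum is driven by the single spike in x1.
rank.max <- names(sort(apply(imp, 1, max), decreasing = TRUE))

## Ranking by the 90th percentile over time ignores the lone spike.
q90 <- apply(imp, 1, quantile, probs = 0.9, names = FALSE)
rank.q90 <- names(sort(q90, decreasing = TRUE))

## Display-only capping at the 99th percentile (assumed global cap);
## the original matrix imp is left untouched.
cap <- quantile(imp, probs = 0.99, names = FALSE)
imp.display <- pmin(imp, cap)

rank.max
rank.q90
```

Here `rank.max` places x1 first, while `rank.q90` places x2 first, and only `imp.display` is altered by the cap.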
varpro.cache.rhf() returns an object containing cached rule
memberships, the working response used for importance, start-stop
information for pseudo-individuals, time-window metadata, and the rule
extraction settings.
importance.rhf() returns a list including:
importance.matrix: matrix of localized importance values with
variables in rows and selected time windows in columns.
importance.long: long-format data frame containing variable,
time, window metadata, and localized importance.
window.info: data frame describing the analyzed windows,
including start, stop, midpoint, n.risk, and n.rules.
y.source: source of the working response. This is
"int.haz.oob", "int.haz.test", or "y.external".
trim: tuning value used in importance aggregation.
print() returns its input invisibly after displaying a short summary
that includes robust over-time summaries for the leading variables.
as.data.frame() returns one of the supported data-frame views.
dotmatrix.importance.rhf() produces a base-R dot-matrix plot and
returns plotting metadata invisibly.
plot() invisibly returns the result of the underlying plotting helper.
rhf
################################################################
##
## simulation model
##
################################################################
## draw simulation (can be modified)
n <- 400
p <- 10
simid <- 2
d <- hazard.simulation(type = simid, n = n, p = p, nrecords = 4)$dta
## fit an RHF model with weighted mtry (useful for high-dimensional data)
f <- "Surv(id, start, stop, event) ~ ."
o <- rhf(f, d, ntree = 50, nsplit = 5, xvar.wt = xvar.wt.rhf(f, d))
print(o)
## time-localized RHF importance across the full time grid
imp.t <- importance.rhf(o)
print(imp.t)
## extract the variable-by-time matrix
print(head(imp.t$importance.matrix))
oldpar <- par(mfrow=c(1,1))
## dot-matrix importance plot (default)
plot(imp.t)
## step-style importance line plot for the top variables
## (ranked by the 90th percentile over time and display-capped at q99)
plot(imp.t, type = "lines", top = 10)
## smoothed importance plot for all variables with display capping
plot(imp.t, type = "lines", curve = "lowess", smooth.f = 0.5,
display.cap = 0.95)
## dot-matrix plot with robust ordering and display capping
plot(imp.t, sort_by = "q90", size.cap = 0.99, color.cap = 0.99)
par(oldpar)
## reuse a cache for repeated calls on subsets of the time grid
cache <- varpro.cache.rhf(o)
imp.t.sub <- importance.rhf(
o,
cache = cache,
time.index = seq(1, length(o$time.interest), by = 5),
verbose = TRUE
)
## long-format export
print(head(as.data.frame(imp.t.sub)))