Home

/

CRAN

/

rhf: Random Hazard Forests
In randomForestRHF: Random Hazard Forests

rhf	R Documentation

Random Hazard Forests

Description

Random Hazard Forest (RHF) is a tree-ensemble survival method that estimates case-specific hazards and cumulative hazard functions on a working time grid and supports time-dependent covariates using counting-process (start/stop) format.

Usage

rhf(formula,
      data,
      ntree = 500,
      nsplit = 10,
      treesize = NULL,
      nodesize = NULL,
      block.size = 10,
      bootstrap = c("by.root", "none", "by.user"),
      samptype = c("swor", "swr"),
      samp = NULL,
      case.wt = NULL,
      membership = TRUE,
      sampsize = if (samptype == "swor") function(x){x * .632} else function(x){x},
      xvar.wt = NULL,
      ntime = 50,
      min.events.per.gap = 10,
      seed = NULL,
      do.trace = FALSE,
      ...)

Arguments

`formula`	Model formula specifying the response and predictors.
`data`	Data frame containing the variables referenced in `formula`.
`ntree`	Number of trees to grow in the ensemble.
`nsplit`	Non-negative integer specifying the number of random split points to consider for each variable. When set to 0, all possible split points are evaluated (deterministic splitting), which may be slower. The default is 10.
`treesize`	Specifies the desired tree size, defined as the number of terminal nodes. Can be provided as a function of sample size `n`. If unspecified, an internal default is used.
`nodesize`	Minimum number of cases required in a terminal node. If not specified, an internal default is used.
`block.size`	Controls the granularity of cumulative error rate reporting. Setting this to an integer between 1 and `ntree` yields the cumulative error every `block.size` trees.
`bootstrap`	Bootstrap strategy for generating inbag samples. The default is `"by.root"`, which bootstraps by sampling with or without replacement (default is without; see `samptype`). If set to `"none"`, no bootstrapping is performed (OOB predictions and errors are then unavailable). The option `"by.user"` uses a user-defined bootstrap specified by `samp`.
`samptype`	Sampling type used when `bootstrap = "by.root"`. Options are `"swor"` (sampling without replacement, default) and `"swr"` (sampling with replacement).
`samp`	User-specified bootstrap when `bootstrap = "by.user"`. Should be an `n` by `ntree` array, where each entry gives the number of times a case appears in the inbag sample for a given tree.
`case.wt`	Non-negative vector of case weights (not required to sum to 1). Higher weights increase the probability of an observation being selected during bootstrapping or subsampling. Using real-valued weights is preferred over integer weights.
`membership`	Logical flag indicating whether to return terminal node membership and inbag information.
`sampsize`	Specifies the size of the bootstrap sample when `bootstrap = "by.root"`. For sampling without replacement, it represents the requested subsample size (default is 0.632 times the sample size). For sampling with replacement, it equals the sample size. Can be supplied as a numeric value or a function.
`xvar.wt`	Non-negative vector of variable selection probabilities for splitting. Values do not need to sum to 1. Defaults to uniform selection probability.
`ntime`	Controls the working time grid used for all ensemble calculations. Can be: `ntime >= 1` (integer): Request approximately `ntime` grid points chosen from the observed event times. When used with `min.events.per.gap`, grid points are selected adaptively so that each interval contains at least `min.events.per.gap` events (when possible); therefore the resulting grid may contain fewer than `ntime` points when events are sparse (especially in the tail). numeric vector: A user-supplied set of time points. Each value is aligned (snapped) to the nearest observed event time and duplicates are removed. `0` or `NULL`: Use all observed (unique) event times.
`min.events.per.gap`	Minimum number of observed events required in each time interval when `ntime` is specified as an integer. Together with ntime, this provides an automatic event-balanced grid that allocates more resolution where events are dense and avoids sparse tail intervals.
`seed`	Negative integer setting the random seed for reproducibility.
`do.trace`	Time in seconds between progress updates printed to the console.
`...`	Additional arguments (currently unused).

Details

rhf() grows an ensemble of hazard trees for survival outcomes specified in counting-process notation Surv(id, start, stop, event). Predictors may include both time-static variables and time-dependent covariates (TDCs). The fitted object contains case-specific ensemble estimates of the hazard and cumulative hazard on the working time grid time.interest.

Random Hazard Forests (RHF) extends Random Survival Forests (RSF) in two key ways: (1) it directly estimates the hazard function, and (2) it accommodates time-dependent covariates.

Tree construction uses best-first splitting (BFS) guided by empirical risk reduction. The risk is based on a smooth convex surrogate for the nonparametric log-likelihood functional, as described in Lee, Chen, and Ishwaran (2021).

Splits can occur on both static and time-dependent covariates (TDCs), but not on time itself. Time splitting is unnecessary, as terminal node hazard estimators already incorporate time appropriately. To encourage selection of time-dependent covariates, use the xvar.wt option to weight their importance.

The case-specific hazard estimator uses a "stitched" active-record rule on the time.interest grid: each record is treated as active from just after its start time up to (and including) the next record's start time, with the final record extended to the maximum time in time.interest.

Data Format: Input data must follow standard counting process format and include the following four columns (column names must match those specified in the counting process formula; see examples):

id: A unique integer identifier for each individual. Repeated for multiple rows corresponding to that individual.
start: The start time for each interval. All time values must be scaled to the interval [0, 1].
stop: The stop time for each interval. Must satisfy stop > start.
event: A binary event indicator (0 = censored, 1 = event). Only one event is allowed per individual; otherwise, the individual is treated as censored.

Use the helper function convert.counting to convert conventional survival data to counting process format. See the examples section for further illustration and guidance on data preparation.

Value

An object of class "rhf" containing the fitted forest and related results. Components include (among others) the working time grid time.interest and case-specific ensemble estimates of the hazard and cumulative hazard. When bootstrapping is used, these are typically returned as hazard.oob/chf.oob (out-of-bag) and hazard.inbag/chf.inbag (in-bag). Use predict.rhf to obtain estimates for new data.

Author(s)

Hemant Ishwaran and Udaya B. Kogalur

References

Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R, Rnews, 7(2):25-31.

Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests, Ann. App. Statist., 2:841-860.

Lee, D.K. and Chen N. and Ishwaran H (2021). Boosted nonparametric hazards with time-dependent covariates. Annals of Statistics, 49: 2101-2128.

Ishwaran H. (2025). Multivariate Statistics: Classical Foundations and Modern Machine Learning. Chapman and Hall.

Ishwaran H., Kogalur U.B., Hsich E.M. and Lee D.K. (2026). Random hazard forests.

Examples


  
## ------------------------------------------------------------
##  time-static pbc: parameters set for fast CRAN run
## ------------------------------------------------------------

## load the data
data(pbc, package = "randomForestSRC")
## convert the data to counting process
d <- convert.counting(Surv(days, status) ~ ., na.omit(pbc))
## set the formula
f <- "Surv(id, start, stop, event) ~ ."

## rhf call
print((o <- rhf(f, d, ntree = 3)))

## smooth hazard estimator for specific cases
plot(o, idx=c(1,5,10))
plot(o, idx=c(1,5,10), hazard.only=TRUE)
plot(o, idx=c(1,5,10), hazard.only=TRUE, lwd=0)
plot(o, idx=1:10, lwd=0, hazard.only=TRUE, legend.show=FALSE)



## ------------------------------------------------------------
##  time-static pbc: compare RHF to RSF
## ------------------------------------------------------------

data(pbc, package = "randomForestSRC")
d <- convert.counting(Surv(days, status) ~ ., na.omit(pbc))

## first we run rhf
o.rhf <- rhf("Surv(id, start, stop, event) ~ .", d)
h <- o.rhf$hazard.oob
time <- o.rhf$time.interest
delta <- c(0, diff(time))
H <- apply(h, 1, function(x) {cumsum(delta * x)})
S.rhf <- exp(-H)

## next we run rsf, using the same time.interest grid
o.rsf <- randomForestSRC::rfsrc(Surv(days, status) ~ ., pbc, ntime = time)
S.rsf <- t(o.rsf$survival.oob)

## graphical parameters
bass <- 3
oldpar <- par(mfrow=c(3,2))

## plot survival results 
matplot(time,S.rhf,pch=16)
matplot(time,S.rsf,pch=16)
matplot(time,S.rhf-S.rsf,pch=16)
boxplot(S.rhf-S.rsf,ylab="S.rhf-S.rsf",xaxt="n",outline=FALSE,range=1e-10)
abline(h=0,lwd=3,col=2)

## plot chf and hazard results
matplot(time,H,ylab="CHF",pch=16)
matplot(time,t(h),ylab="hazard",pch=16,type="n",ylim=c(0,quantile(c(h),.95)))
nO <- lapply(1:nrow(h), function(i) {
  lines(supsmu(time, h[i,], bass=bass),type="s",col=gray(.4))
})

par(oldpar)

## ------------------------------------------------------------
##  TDC illustration (using built in hazard simulation function)
## ------------------------------------------------------------

d1 <- hazard.simulation(1)$dta
d2 <- hazard.simulation(2)$dta
d3 <- hazard.simulation(3)$dta
f <- "Surv(id, start, stop, event) ~ ."

o1 <- rhf(f, d1)
o2 <- rhf(f, d2)
o3 <- rhf(f, d3)

plot(o1,idx=4)
plot(o2,idx=1:3)
plot(o3,idx=2)


## ------------------------------------------------------------
##  peakVO2: demonstrates time-varying auc
## ------------------------------------------------------------

data(peakVO2, package = "randomForestSRC")
d <- convert.counting(Surv(ttodead, died)~., peakVO2)
f <- "Surv(id, start, stop, event) ~ ."

## build the forest
o1 <- rhf(f, d, nodesize=1)
o2 <- rhf(f, d, nodesize=15)

## acquire auc-t
a1.chf <- auct.rhf(o1) ## same as auct.rhf(o1, marker = "chf")
a1.haz <- auct.rhf(o1, marker = "haz")
a2.chf <- auct.rhf(o2) ## same as auct.rhf(o2, marker = "chf")
a2.haz <- auct.rhf(o2, marker = "haz")

## print auc-t
print(a1.chf)
print(a2.chf)
print(a1.haz)
print(a2.haz)

## plot auc-t
oldpar <- par(mfrow=c(2,2))
plot(a1.chf, main="nodesize 1")
plot(a1.haz, main="nodesize 1")
plot(a2.chf, main="nodesize 15")
plot(a2.haz, main="nodesize 15")
par(oldpar)

## ------------------------------------------------------------
##  more detailed example of time-varying performance 
## ------------------------------------------------------------

d <- hazard.simulation(1)$dta
f <- "Surv(id, start, stop, event) ~ ."
treesize <- c(10, 30, 100)

oldpar <- par(mfrow=c(3,2))
perf <- lapply(treesize, function(tz) {
  o <- rhf(f, d, treesize=tz, nodesize=1)
  print(o)
  rO <- list(chf = auct.rhf(o, marker="chf"),
             haz = auct.rhf(o, marker="haz"))
  plot(rO$chf, main=paste("chf marker, treesize=", tz))
  plot(rO$haz, main=paste("hazard marker, treesize=", tz))
  rO
})
par(oldpar)

## ------------------------------------------------------------
##  example illustrating tuning the tree size
##  using function 'rhf.tune.treesize'
## ------------------------------------------------------------

d <- hazard.simulation(1)$dta
f <- "Surv(id, start, stop, event) ~ ."

tune <- tune.rhf(f, d)  ## same as rhf.tune.treesize(f,d)
oldpar <- par(mfrow=c(1,1))
plot(tune)
par(oldpar)

## ------------------------------------------------------------
##  example illustrating guided feature selection
##  using built in 'xvar.wt.rhf' helper
## ------------------------------------------------------------

d <- hazard.simulation(1)$dta
f <- "Surv(id, start, stop, event) ~ ."

o <- rhf(f,d)
o.gfs <- rhf(f,d, xvar.wt = xvar.wt.rhf(f, d))

print(o)
print(o.gfs)

randomForestRHF documentation built on April 24, 2026, 1:07 a.m.

randomForestRHF index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

randomForestRHF
Random Hazard Forests

rhf: Random Hazard Forests
In randomForestRHF: Random Hazard Forests

Random Hazard Forests

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to rhf in randomForestRHF...

R Package Documentation

Browse R Packages

We want your feedback!

randomForestRHF Random Hazard Forests

rhf: Random Hazard Forests In randomForestRHF: Random Hazard Forests

Random Hazard Forests

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to rhf in randomForestRHF...

R Package Documentation

Browse R Packages

We want your feedback!

randomForestRHF
Random Hazard Forests

rhf: Random Hazard Forests
In randomForestRHF: Random Hazard Forests