pi.surv: Estimate the model of Willems et al. (2025).

View source: R/BoundingCovariateEffects.R

pi.surv: R Documentation

Estimate the model of Willems et al. (2025).

Description

This function estimates bounds on the coefficients in the single-index model \Lambda(x^\top \beta(t)) for the conditional cumulative distribution function of the event time.

Usage

pi.surv(
  data,
  idx.param.of.interest,
  idxs.c,
  t,
  par.space,
  search.method = "GS",
  add.options = list(),
  verbose = 0,
  picturose = FALSE,
  parallel = FALSE
)

Arguments

data

Data frame containing the data on which to fit the model. The columns should be named as follows: 'Y' = observed times, 'Delta' = censoring indicators, 'X0' = intercept column, 'X1' - 'Xp' = covariates.
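As a minimal sketch, a data set with two covariates would be laid out as follows (the column names and their order are the only requirement; the values are illustrative):

```r
# Illustrative layout of the 'data' argument: observed times 'Y',
# censoring indicators 'Delta', an intercept column 'X0' of ones,
# and covariates 'X1', 'X2'.
data <- data.frame(
  Y     = c(5.2, 3.1, 8.7),   # observed event/censoring times
  Delta = c(1, 0, 1),         # 1 = event observed, 0 = censored
  X0    = c(1, 1, 1),         # intercept column
  X1    = c(0.3, -1.2, 0.8),  # first covariate (continuous)
  X2    = c(0, 1, 1)          # second covariate (discrete)
)
```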

idx.param.of.interest

Index of element in the covariate vector for which the identified interval should be estimated. It can also be specified as idx.param.of.interest = "all", in which case identified intervals will be computed for all elements in the parameter vector. Note that idx.param.of.interest = 1 corresponds to the intercept parameter.

idxs.c

Vector of indices of the continuous covariates. For example, if the given data contain 5 covariates, of which 'X2' and 'X5' are continuous, this argument should be specified as idxs.c = c(2, 5).

t

Time point for which to estimate the identified set of \beta(t).

par.space

Matrix containing bounds on the space of the parameters. The first column corresponds to lower bounds, the second to upper bounds. The i'th row corresponds to the bounds on the i'th element in the parameter vector.
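For a model with an intercept and two covariates (a three-dimensional parameter vector), bounding each coefficient between -10 and 10 can be written as follows (the bounds themselves are illustrative):

```r
# Parameter space for a 3-dimensional parameter vector:
# column 1 = lower bounds, column 2 = upper bounds,
# row i = bounds on the i-th element of the parameter vector.
par.space <- matrix(rep(c(-10, 10), 3), nrow = 3, byrow = TRUE)
```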

search.method

The search method to be used to find the identified interval. Default is search.method = "GS".

add.options

List of additional options to be passed to the method. Notably, it can be used to select the link function \Lambda(t) that should be considered. Currently, the link function leading to an accelerated failure time model ("AFT_ll", the default) and the link function leading to a Cox proportional hazards model ("Cox_wb") are implemented. Other options range from 'standard' hyperparameters, such as the confidence level of the test and the number of instrumental functions to be used, to technical hyperparameters regarding the search method and test implementation.

General hyperparameters:

cov.ranges:

known bounds on each of the covariates in the data set.

norm.func.name:

Name of the normalization function to be used. Can be either "normalize.covariates1" or "normalize.covariates2" (default). The former is a simple elementwise rescaling. The latter uses the PCA approach.

inst.func.family:

Family of instrumental functions to be used for all covariates. Options are "box", "spline" and "cd". The former two are only applicable for continuous covariates. The latter can also handle discrete covariates. Default is "cd".

G.c:

The class of instrumental functions used for the continuous covariates in the model, in case "cd" is selected as inst.func.family. Options are "box" and "spline". Default is "spline".

degree:

The degree of the B-spline functions, should they be used as instrumental functions for the continuous covariates. Default is 3.

link.function:

Name of the link function to be used. Options are "AFT_ll" for the AFT model with log-logistic baseline, or "Cox_wb" for the Cox PH model (originally with a Weibull baseline, but now allowing a general baseline hazard).

K.bar:

Number of refinement steps when obtaining the critical value. See Bei (2024).

B:

Number of bootstrap samples to be used when obtaining the bootstrap distribution of the test statistic.

ignore.empty.IF:

Boolean value indicating whether instrumental functions with empty support should be ignored. Default is FALSE. The feature ignore.empty.IF = TRUE is experimental, so there may be edge cases for which the implementation fails to run.

Hyperparameters specific to the EAM implementation:

min.dist/max.dist:

The minimum/maximum distance of sampled points from the current best value for the coefficient of interest.

min.eval/max.eval:

The minimum/maximum number of points evaluated in the initial feasible point search.

nbr.init.sample.points:

The total number of drawn points required in the initial drawing process.

nbr.init.unif:

The total number of uniformly drawn points in the initial set of starting values.

nbr.points.per.iter.init:

Number of points sampled per iteration in the initial drawing process.

nbr.start.vals:

Number of starting values for which to run the optimization algorithm for the expected improvement.

nbr.opt.EI:

Number of optimal theta values found by the optimization algorithm to return.

nbr.extra:

Number of extra randomly drawn points to add to the set of optimal theta values (to be supplied to the next E-step).

min.improvement:

Minimum amount by which the current best root of the violation curve should improve relative to its previous value.

min.possible.improvement:

Minimum amount by which the next iteration should be able to improve upon the current best value of the root.

EAM.min.iter:

Minimum number of EAM iterations to run.

max.iter:

Maximum number of EAM iterations to run.

Hyperparameters specific to the gridsearch implementation:

min.eval/max.eval:

Minimum and maximum number of evaluations.

next.gs.point:

Function that determines the next point in the grid search sequence.

step.size:

Step size of the grid.

bin.search.tol:

Binary search tolerance.

max.iter:

Maximum number of iterations that the algorithm can run.

Other (hidden) options can also be overwritten, though we highly discourage this. If necessary, you can consult the source code of this function to find the names of the desired parameters and add each name alongside its desired value as an entry in options (e.g. options$min.var <- 1e-4). Again, this is not recommended!
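As a sketch, some of the documented hyperparameters above could be combined into a single list as follows (the option names are taken from the list above; the values are merely illustrative):

```r
# Illustrative add.options list using documented hyperparameter names.
add.options <- list(
  link.function    = "AFT_ll",  # AFT model with log-logistic baseline
  inst.func.family = "cd",      # handles continuous and discrete covariates
  G.c              = "spline",  # spline instrumental functions for continuous covariates
  degree           = 3,         # cubic B-splines
  B                = 600        # number of bootstrap samples (illustrative value)
)
```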

verbose

Verbosity level. The higher the value, the more verbose the method will be. Default is verbose = 0.

picturose

Picturosity flag. If TRUE, a plot illustrating the workings of the algorithm will be updated during runtime. Default is picturose = FALSE.

parallel

Flag for whether or not parallel computing should be used. Default is parallel = FALSE. When parallel = TRUE, this implementation will use min(detectCores() - 1, 10) cores to construct the parallel back-end.
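The number of cores used when parallel = TRUE corresponds to the following expression (using the parallel package from base R):

```r
library(parallel)

# Number of cores the implementation would use for the parallel back-end:
# one fewer than the number of detected cores, capped at 10.
n.cores <- min(detectCores() - 1, 10)
```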

Value

Matrix containing the identified intervals of the specified coefficients, as well as corresponding convergence information of the estimation algorithm.

References

Willems, I., Beyhum, J. and Van Keilegom, I. (2025). Partial identification for a class of survival models under dependent censoring. (Submitted).

Examples



  # Clear workspace
  rm(list = ls())

  # Load the survival package
  library(survival)

  # Set random seed
  set.seed(123)

  # Load and preprocess data
  data <- survival::lung
  data[, "intercept"] <- rep(1, nrow(data))
  data[, "status"] <- data[, "status"] - 1
  data <- data[, c("time", "status", "intercept", "age", "sex")]
  colnames(data) <- c("Y", "Delta", "X0", "X1", "X2")

  # Standardize age variable
  data[, "X1"] <- scale(data[, "X1"])

  ## Example:
  ## - Link function: AFT link function (default setting)
  ## - Number of IF: 5 IF per continuous covariate (default setting)
  ## - Search method: Binary search
  ## - Type of IF: Cubic spline functions for continuous covariate, indicator
  ##   function for discrete covariate (default setting).

  # Settings for main estimation function
  idx.param.of.interest <- 2 # Interest in effect of age
  idxs.c <- 1                # X1 (age) is continuous
  t <- 200                   # Model imposed at t = 200
  search.method <- "GS"      # Use binary search
  par.space <- matrix(rep(c(-10, 10), 3), nrow = 3, byrow = TRUE)
  add.options <- list()
  picturose <- TRUE
  parallel <- FALSE

  # Estimate the identified intervals
  pi.surv(data, idx.param.of.interest, idxs.c, t, par.space,
          search.method = search.method, add.options = add.options,
          picturose = picturose, parallel = parallel)




depCensoring documentation built on Nov. 7, 2025, 1:06 a.m.