```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/benchmarking-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
```
The bigPLScox package now provides three distinct engines for extracting PLS
components in Cox regression: the original matrix-based solver (big_pls_cox),
the new Armadillo backend (big_pls_cox_fast), and the stochastic gradient
descent routine (big_pls_cox_gd). This vignette shows how to benchmark those
implementations against coxgpls() and the standard survival::coxph() fit.
We rely on the bench package to collect timing and
memory statistics. All chunks are wrapped in the LOCAL flag so that CRAN builds
can skip the heavier computations.
```r
needed <- c("bigmemory", "bench", "survival", "plsRcox", "bigPLScox")
has <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
if (!all(has)) {
  missing <- paste(needed[!has], collapse = ", ")
  knitr::opts_chunk$set(eval = FALSE)
  message("Note: skipping code execution because these packages are missing: ", missing)
}
```
```r
library(bigPLScox)
library(survival)
library(bench)
library(bigmemory)
library(plsRcox)
```
We reuse the helper dataCox() to generate survival outcomes with right
censoring. Adjust n and p to stress-test the solvers on larger problems.
```r
set.seed(2024)
sim_design <- dataCox(
  n = 2000,
  lambda = 2,
  rho = 1.5,
  x = matrix(rnorm(2000 * 50), ncol = 50),
  beta = c(1, 3, rep(0, 48)),
  cens.rate = 5
)
cox_data <- list(
  x = as.matrix(sim_design[, -(1:3)]),
  time = sim_design$time,
  status = sim_design$status
)
X_big <- bigmemory::as.big.matrix(cox_data$x)
```
The first benchmark compares the coxgpls() workflow and the dense variants of
the big-memory solvers against survival::coxph(). A second benchmark then calls
big_pls_cox_fast() and big_pls_cox_gd() directly on a big.matrix, illustrating
the benefit of the shared-memory interface. The number of iterations can be
increased for more stable measurements when running locally.
```r
bench_dense <- bench::mark(
  coxgpls = coxgpls(cox_data$x, cox_data$time, cox_data$status, ncomp = 5),
  big_pls = big_pls_cox(cox_data$x, cox_data$time, cox_data$status, ncomp = 5),
  big_pls_fast = big_pls_cox_fast(cox_data$x, cox_data$time, cox_data$status, ncomp = 5),
  big_pls_bb = big_pls_cox_gd(X_big, cox_data$time, cox_data$status, ncomp = 5),
  big_pls_gd = big_pls_cox_gd(X_big, cox_data$time, cox_data$status, ncomp = 5, method = "gd"),
  coxph = coxph(Surv(cox_data$time, cox_data$status) ~ cox_data$x, ties = "breslow"),
  iterations = 75,
  check = FALSE
)
bench_dense$expression <- names(bench_dense$expression)
bench_dense[, c("expression", "median", "itr/sec", "mem_alloc")]
```
A higher itr/sec value indicates faster throughput. The fast backend should
provide the best compromise between speed and accuracy in the dense setting.
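For reporting, the results can be reordered by median runtime with base R; a minimal sketch, assuming the bench_dense object produced above:

```r
# Rank the dense benchmark results from fastest to slowest median runtime
# and keep the columns shown in the table above.
ranked <- bench_dense[order(bench_dense$median), ]
ranked[, c("expression", "median", "itr/sec", "mem_alloc")]
```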
The next experiment reuses the big.matrix version of the predictors and adds
the SGD solver. Because coxgpls() requires an in-memory matrix, it is not
included in this round.
```r
bench_big <- bench::mark(
  big_pls_fast = big_pls_cox_fast(X_big, cox_data$time, cox_data$status, ncomp = 5),
  big_pls = big_pls_cox(cox_data$x, cox_data$time, cox_data$status, ncomp = 5),
  big_pls_gd = big_pls_cox_gd(X_big, cox_data$time, cox_data$status, ncomp = 5, max_iter = 300),
  iterations = 75,
  check = FALSE
)
bench_big$expression <- names(bench_big$expression)
bench_big[, c("expression", "median", "itr/sec", "mem_alloc")]
```
The classical solver is still run on the dense matrix for reference, but only the
fast and SGD routines operate on the big.matrix without copying. Expect the
fast backend to dominate runtime while SGD maintains scalability for
out-of-memory workflows.
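For genuinely out-of-memory data, the big.matrix can be file-backed so the full predictor matrix never has to reside in RAM. A sketch using bigmemory's backing-file arguments (the file names here are illustrative):

```r
# Create a file-backed copy of the predictors in a temporary directory;
# the solver then accesses the data through a memory-mapped file rather
# than an in-memory copy.
X_disk <- bigmemory::as.big.matrix(
  cox_data$x,
  backingfile    = "predictors.bin",
  descriptorfile = "predictors.desc",
  backingpath    = tempdir()
)
fit_gd <- big_pls_cox_gd(X_disk, cox_data$time, cox_data$status, ncomp = 5)
```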
bench includes autoplot() and plot() helpers for visual inspection of the
timing distribution.
```r
plot(bench_dense, type = "jitter")
```

```r
plot(bench_big, type = "jitter")
```
Additional geometries, such as ridge plots, are available via
autoplot(bench_dense, type = "ridge").
To integrate the benchmarks with automated pipelines, store the results under the
inst/benchmarks/ directory. Each run can record metadata such as the dataset
parameters, the number of iterations, or the git commit identifier.
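One way to record such metadata is to bind it to the timing columns before writing the results to disk; a sketch, where the column names and the git call are illustrative rather than part of the package:

```r
# Attach run metadata (problem size, date, git commit) to the benchmark table.
meta <- data.frame(
  n      = nrow(cox_data$x),
  p      = ncol(cox_data$x),
  date   = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
  commit = tryCatch(
    system("git rev-parse --short HEAD", intern = TRUE),
    warning = function(w) NA_character_,
    error   = function(e) NA_character_
  )
)
annotated <- cbind(bench_dense[, c("expression", "median", "mem_alloc")], meta)
```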
```r
if (!dir.exists("inst/benchmarks/results")) {
  dir.create("inst/benchmarks/results", recursive = TRUE)
}
write.csv(
  bench_dense[, 1:9],
  file = "inst/benchmarks/results/benchmark-dense.csv",
  row.names = FALSE
)
write.csv(
  bench_big[, 1:9],
  file = "inst/benchmarks/results/benchmark-big.csv",
  row.names = FALSE
)
```
The package also ships with standalone scripts under inst/benchmarks/ that
mirror this vignette while exposing additional configuration points. Run them
from the repository root as:
```sh
Rscript inst/benchmarks/cox-benchmark.R
Rscript inst/benchmarks/benchmark_bigPLScox.R
Rscript inst/benchmarks/cox_pls_benchmark.R
```
Each script accepts environment variables to adjust the problem size and stores
results in inst/benchmarks/results/ with time-stamped filenames.
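The variable names below are hypothetical, but they illustrate the pattern such scripts typically follow: read problem dimensions from the environment with defaults, then build a time-stamped output path:

```r
# Read problem dimensions from the environment, with sensible defaults
# when the variables are unset. BENCH_N and BENCH_P are illustrative names.
n <- as.integer(Sys.getenv("BENCH_N", unset = "2000"))
p <- as.integer(Sys.getenv("BENCH_P", unset = "50"))

# Time-stamped file name so repeated runs never overwrite each other.
stamp   <- format(Sys.time(), "%Y%m%d-%H%M%S")
outfile <- file.path(
  "inst/benchmarks/results",
  sprintf("benchmark-n%d-p%d-%s.csv", n, p, stamp)
)
```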