bigscale: Construct Scaled Design Matrices for Big Survival Models
In bigPLScox: Partial Least Squares for Cox Models with Big Matrices

Description

Prepares a large-scale feature matrix for stochastic gradient descent by applying optional normalisation, stratified sampling, and batching rules.
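For reference, standardising a feature column usually means subtracting its mean and dividing by its standard deviation. The base-R sketch below shows that transformation on an ordinary in-memory matrix. It assumes that `norm.method = "standardize"` follows this textbook definition; it is not the package's implementation, and the helper name `standardize_columns` is hypothetical.

```r
# Hypothetical helper: column-wise standardisation of an in-memory matrix.
# Assumes "standardize" means (x - mean) / sd for each column; bigscale()
# itself targets big data and may differ in details.
standardize_columns <- function(x) {
  mu <- colMeans(x)
  s <- apply(x, 2, stats::sd)
  # Subtract the column means, then divide by the column standard deviations
  z <- sweep(sweep(x, 2, mu, "-"), 2, s, "/")
  list(x = z, mean = mu, sd = s)
}

set.seed(1)
x <- cbind(age = rnorm(50, 60, 10), dose = runif(50, 0, 5))
res <- standardize_columns(x)
colMeans(res$x)               # each close to 0
apply(res$x, 2, stats::sd)    # each equal to 1 up to floating-point error
```

For a dense matrix, base R's `scale(x)` produces the same standardised values; the point of keeping `mean` and `sd` separately is that they can be reused later.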

Usage

bigscale(
  formula = survival::Surv(time = time, status = status) ~ .,
  data,
  norm.method = "standardize",
  strata.size = 20,
  batch.size = 1,
  features.mean = NULL,
  features.sd = NULL,
  parallel.flag = FALSE,
  num.cores = NULL,
  bigmemory.flag = FALSE,
  num.rows.chunk = 1e+06,
  col.names = NULL,
  type = "short"
)

Arguments

`formula`	Formula used to extract the survival outcome and the predictors to include in the scaled design matrix.
`data`	Input data source containing the variables referenced in `formula`.
`norm.method`	Normalisation strategy (for example centring or standardising columns) applied to the feature matrix.
`strata.size`	Number of observations to retain from each stratum when constructing stratified batches.
`batch.size`	Total size of each mini-batch produced by the scaling routine.
`features.mean`	Optional vector of column means that can be reused to normalise multiple data sets in a consistent manner.
`features.sd`	Optional vector of column standard deviations that pairs with `features.mean` during scaling.
`parallel.flag`	Logical flag signalling whether the scaling work should be parallelised across cores.
`num.cores`	Number of processor cores allocated when `parallel.flag` is `TRUE`.
`bigmemory.flag`	Logical flag specifying whether intermediate results should be stored in bigmemory-backed matrices.
`num.rows.chunk`	Chunk size used when streaming data from on-disk objects into memory.
`col.names`	Optional character vector assigning column names to the generated design matrix.
`type`	Type of model or preprocessing target being prepared, such as survival or regression.
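The `num.rows.chunk` argument implies that statistics are gathered one block of rows at a time rather than over the whole matrix at once. A minimal base-R sketch of that idea follows. It assumes the scaler only needs per-column sums and sums of squares; `chunked_moments` is a hypothetical name, and the approach is not taken from the package source.

```r
# Hypothetical sketch: accumulate column means and standard deviations
# over row chunks, so only `chunk_size` rows are processed at a time.
chunked_moments <- function(x, chunk_size) {
  n <- nrow(x)
  sums <- numeric(ncol(x))
  sq_sums <- numeric(ncol(x))
  for (start in seq(1, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, n)
    block <- x[rows, , drop = FALSE]
    sums <- sums + colSums(block)
    sq_sums <- sq_sums + colSums(block^2)
  }
  mu <- sums / n
  # Sample variance with the n - 1 denominator, matching stats::sd()
  v <- (sq_sums - n * mu^2) / (n - 1)
  list(mean = mu, sd = sqrt(v))
}

set.seed(2)
x <- matrix(rnorm(1000 * 3, mean = 5), ncol = 3)
m <- chunked_moments(x, chunk_size = 128)
all.equal(m$mean, colMeans(x))
all.equal(m$sd, apply(x, 2, stats::sd))
```

The sum-of-squares formula keeps the sketch short but can lose precision when column means are large relative to their spread; a Welford-style running update is the more robust choice in that case.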

Value

A scaled design matrix of class `scaler`, together with metadata describing the transformation that was applied. The metadata components are:

`time.indices`	indices of the time variable
`cens.indices`	indices of the censoring indicator variable
`features.indices`	indices of the features
`time.sd`	standard deviation of the time variable
`time.mean`	mean of the time variable
`features.sd`	standard deviations of the features
`features.mean`	means of the features
`nr`	number of rows
`nc`	number of columns
`col.names`	column names
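The stored feature means and standard deviations are what allow a second data set, such as a validation set, to be scaled with the training-set statistics, as described for the `features.mean` and `features.sd` arguments. The base-R sketch below illustrates that idea with plain vectors standing in for those components. It does not call `bigscale()`, assumes the same (x - mean) / sd convention, and the helper `apply_scaling` is hypothetical.

```r
# Hypothetical helper: scale new data with statistics estimated elsewhere,
# e.g. the feature means and sds recorded when the training data were scaled.
# Columns of `newx` must be in the same order as the statistics.
apply_scaling <- function(newx, features.mean, features.sd) {
  stopifnot(ncol(newx) == length(features.mean),
            length(features.mean) == length(features.sd))
  sweep(sweep(newx, 2, features.mean, "-"), 2, features.sd, "/")
}

train <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
valid <- matrix(c(2.5, 5, 25, 50), ncol = 2)

train_mean <- colMeans(train)
train_sd <- apply(train, 2, stats::sd)
scaled_valid <- apply_scaling(valid, train_mean, train_sd)
scaled_valid
```

The first validation row sits exactly at the training means, so it maps to zeros; the second row is scaled by the training standard deviations rather than by statistics recomputed on the validation data.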

Examples

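library(bigPLScox)

data(micro.censure, package = "bigPLScox")
# Keep the survival time, event indicator and two covariates,
# dropping rows with missing values
surv_data <- stats::na.omit(
  micro.censure[, c("survyear", "DC", "sexe", "Agediag")]
)
# Standardise the covariates and prepare mini-batches of size 16
scaled <- bigscale(
  survival::Surv(survyear, DC) ~ .,
  data = surv_data,
  norm.method = "standardize",
  batch.size = 16
)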

bigPLScox documentation built on Nov. 18, 2025, 5:06 p.m.