| bigscale | R Documentation |
Prepares a large-scale feature matrix for stochastic gradient descent by applying optional normalisation, stratified sampling, and batching rules.
bigscale(
formula = survival::Surv(time = time, status = status) ~ .,
data,
norm.method = "standardize",
strata.size = 20,
batch.size = 1,
features.mean = NULL,
features.sd = NULL,
parallel.flag = FALSE,
num.cores = NULL,
bigmemory.flag = FALSE,
num.rows.chunk = 1e+06,
col.names = NULL,
type = "short"
)
formula |
formula used to extract the outcome and predictors that should be included in the scaled design matrix. |
data |
Input data source containing the variables referenced in formula. |
norm.method |
Normalisation strategy (for example centring or standardising columns) applied to the feature matrix. |
strata.size |
Number of observations to retain from each stratum when constructing stratified batches. |
batch.size |
Total size of each mini-batch produced by the scaling routine. |
features.mean |
Optional vector of column means that can be reused to normalise multiple data sets in a consistent manner. |
features.sd |
Optional vector of column standard deviations that pairs with features.mean. |
parallel.flag |
Logical flag signalling whether the scaling work should be parallelised across cores. |
num.cores |
Number of processor cores allocated when parallel.flag is TRUE. |
bigmemory.flag |
Logical flag specifying whether intermediate results should be stored in bigmemory-backed matrices. |
num.rows.chunk |
Chunk size used when streaming data from on-disk objects into memory. |
col.names |
Optional character vector assigning column names to the generated design matrix. |
type |
Type of model or preprocessing target being prepared, such as survival or regression. |
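The idea behind num.rows.chunk, processing rows in blocks so that the full data set never has to be held in memory at once, can be illustrated with a minimal base R sketch. The helper chunked_moments() below is hypothetical and is not part of the package; it only shows one way column means and standard deviations might be accumulated chunk by chunk, and it makes no claim about how bigscale does this internally.

```r
# Hypothetical helper (not part of the package): accumulate column means and
# standard deviations over row chunks instead of over the whole matrix at once.
chunked_moments <- function(x, chunk_rows = 1e6) {
  n <- nrow(x)
  s1 <- numeric(ncol(x))  # running column sums
  s2 <- numeric(ncol(x))  # running column sums of squares
  for (start in seq(1, n, by = chunk_rows)) {
    end <- min(start + chunk_rows - 1, n)
    block <- x[start:end, , drop = FALSE]
    s1 <- s1 + colSums(block)
    s2 <- s2 + colSums(block^2)
  }
  means <- s1 / n
  # Sample standard deviation (n - 1 denominator), matching stats::sd()
  sds <- sqrt((s2 - n * means^2) / (n - 1))
  list(mean = means, sd = sds)
}

set.seed(1)
x <- matrix(rnorm(1000 * 3), ncol = 3)
m <- chunked_moments(x, chunk_rows = 128)

# Compare against the in-memory results
all.equal(m$mean, colMeans(x))
all.equal(m$sd, apply(x, 2, sd))
```

The sum-of-squares formula keeps the sketch short but can lose precision when column means are large relative to the spread; a one-pass update such as Welford's algorithm would be the more careful choice in that case.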
A scaled design matrix of the scaler class, along with metadata describing the transformation that was applied:
time.indices: indices of the time variable
cens.indices: indices of the censored variables
features.indices: indices of the features
time.sd: standard deviation of the time variable
time.mean: mean of the time variable
features.sd: standard deviation of the features
features.mean: mean of the features
nr: number of rows
nc: number of columns
col.names: column names
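Stored means and standard deviations are useful because a second data set can then be normalised with exactly the same transformation as the first. The sketch below shows that idea with base::scale() on simulated matrices; the object names (train, test, train_mean, train_sd) are illustrative assumptions, and the code does not call bigscale itself.

```r
set.seed(42)
train <- matrix(rnorm(200, mean = 5, sd = 2), ncol = 4)
test  <- matrix(rnorm(40,  mean = 5, sd = 2), ncol = 4)

# Statistics estimated once, on the training data only
train_mean <- colMeans(train)
train_sd   <- apply(train, 2, sd)

# The same centring and scaling is applied to both data sets
train_scaled <- scale(train, center = train_mean, scale = train_sd)
test_scaled  <- scale(test,  center = train_mean, scale = train_sd)

# Training columns now have mean 0 and standard deviation 1;
# test columns are only approximately so, since they reuse training statistics
colMeans(train_scaled)
apply(train_scaled, 2, sd)
```

With bigscale, the analogous step would presumably be to pass previously computed values through the features.mean and features.sd arguments.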
bigSurvSGD.na.omit() for fitting models that use the scaled
features.
data(micro.censure, package = "bigPLScox")
surv_data <- stats::na.omit(
micro.censure[, c("survyear", "DC", "sexe", "Agediag")]
)
scaled <- bigscale(
survival::Surv(survyear, DC) ~ .,
data = surv_data,
norm.method = "standardize",
batch.size = 16
)