export_risk_stratification_data-methods: Extract and export sample risk group stratification and...
In familiar: End-to-End Automated Machine Learning and Model Evaluation

export_risk_stratification_data

R Documentation

Extract and export sample risk group stratification and associated tests.

Description

Extract and export sample risk group stratification and associated tests for data in a familiarCollection.

Usage

export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)

## S4 method for signature 'familiarCollection'
export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)

## S4 method for signature 'ANY'
export_risk_stratification_data(
  object,
  dir_path = NULL,
  export_strata = TRUE,
  time_range = NULL,
  export_collection = FALSE,
  ...
)

Arguments

`object`	A `familiarCollection` object, or other objects from which a `familiarCollection` can be extracted. See details for more information.
`dir_path`	Path to folder where extracted data should be saved. `NULL` will allow export as a structured list of data.tables.
`export_strata`	Flag that determines whether the raw data or strata are exported.
`time_range`	Time range for which strata should be created. If `NULL`, the full time range is used.
`export_collection`	(optional) Exports the collection if TRUE.
`...`	Arguments passed on to `extract_risk_stratification_data`, `as_familiar_collection`:
  `data`	A `dataObject` object, `data.table` or `data.frame` that constitutes the data that are assessed.
  `is_pre_processed`	Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the `data` argument is a `data.table` or `data.frame`.
  `cl`	Cluster created using the `parallel` package. This cluster is then used to speed up computation through parallelisation.
  `ensemble_method`	Method for ensembling predictions from models for the same sample. Available methods are:
    `median` (default): Use the median of the predicted values as the ensemble value for a sample.
    `mean`: Use the mean of the predicted values as the ensemble value for a sample.
  `verbose`	Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
  `message_indent`	Number of indentation steps for messages shown during computation and extraction of various data elements.
  `detail_level`	(optional) Sets the level at which results are computed and aggregated.
    `ensemble`: Results are computed at the ensemble level, i.e. over all models in the ensemble. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the model performance of the ensemble model for each bootstrap.
    `hybrid` (default): Results are computed at the level of models in an ensemble. This means that, for example, bias-corrected estimates of model performance are directly computed using the models in the ensemble. If there are at least 20 trained models in the ensemble, performance is computed for each model, in contrast to `ensemble` where performance is computed for the ensemble of models. If there are fewer than 20 trained models in the ensemble, bootstraps are created so that at least 20 point estimates can be made.
    `model`: Results are computed at the model level. This means that, for example, bias-corrected estimates of model performance are assessed by creating (at least) 20 bootstraps and computing the performance of the model for each bootstrap.
    Note that each level of detail has a different interpretation for bootstrap confidence intervals. For `ensemble` and `model` these are the confidence intervals for the ensemble and an individual model, respectively. That is, the confidence interval describes the range where an estimate produced by a respective ensemble or model trained on a repeat of the experiment may be found with the probability of the confidence level. For `hybrid`, it represents the range where any single model trained on a repeat of the experiment may be found with the probability of the confidence level. By definition, confidence intervals obtained using `hybrid` are at least as wide as those for `ensemble`. `hybrid` offers the correct interpretation if the goal of the analysis is to assess the result of a single, unspecified, model.
    `hybrid` is generally computationally less expensive than `ensemble`, which in turn is somewhat less expensive than `model`.
    A non-default `detail_level` parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. `list("auc_data"="ensemble", "model_performance"="hybrid")`. This parameter can be set for the following data elements: `auc_data`, `decision_curve_analyis`, `model_performance`, `permutation_vimp`, `ice_data`, `prediction_data` and `confusion_matrix`.
  `confidence_level`	(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, `familiar` uses the rule of thumb `n = 20 / ci.level` to determine the number of required bootstraps. The default value is `0.95`.
  `familiar_data_names`	Names of the dataset(s). Only used if the `object` parameter is one or more `familiarData` objects.
  `collection_name`	Name of the collection.
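
The two `ensemble_method` options can be illustrated with a minimal base R sketch. This is not `familiar`'s internal code: the toy predictions, the column names and the helper `ensemble_predictions` are all hypothetical and only show what taking the median or the mean of the predicted values per sample means.

```r
# Toy predictions (made up): three hypothetical models predict for two samples.
predictions <- data.frame(
  sample_id = rep(c("s1", "s2"), each = 3),
  model_id = rep(c("m1", "m2", "m3"), times = 2),
  predicted_value = c(0.2, 0.3, 0.7, 0.5, 0.6, 0.9)
)

# Hypothetical helper: combine the predictions of all models for each sample
# using either the median (familiar's documented default) or the mean.
ensemble_predictions <- function(x, method = c("median", "mean")) {
  method <- match.arg(method)
  aggregate_fun <- switch(method, median = stats::median, mean = base::mean)
  stats::aggregate(predicted_value ~ sample_id, data = x, FUN = aggregate_fun)
}

ensemble_median <- ensemble_predictions(predictions, "median")
ensemble_mean <- ensemble_predictions(predictions, "mean")
```

For sample `s1` the median is less affected by the single high prediction of 0.7 than the mean is, which is the usual reason to prefer the median when individual models may produce outlying predictions.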

Details

Data is usually collected from a familiarCollection object. However, you can also provide one or more familiarData objects, which will be converted internally to a familiarCollection object. It is also possible to provide a familiarEnsemble or one or more familiarModel objects, together with the data from which the stratification data are computed prior to export. Paths to files containing any of these objects can also be provided.

All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.
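
A call on an existing collection might look like the sketch below. The file path is hypothetical, and the call is guarded so that the snippet does nothing if `familiar` is not installed or the file does not exist. Only the arguments `object` and `dir_path` from the usage section are used.

```r
# Hypothetical path to a familiarCollection stored by an earlier familiar run.
collection_path <- file.path("path", "to", "collection.RDS")

risk_tables <- NULL
if (requireNamespace("familiar", quietly = TRUE) && file.exists(collection_path)) {
  # dir_path = NULL returns the data as a list instead of writing files.
  risk_tables <- familiar::export_risk_stratification_data(
    object = collection_path,
    dir_path = NULL
  )

  # Inspect which tables were returned.
  print(names(risk_tables))
}
```

Setting `dir_path` to a folder instead would write the extracted data to that folder, as described for the `dir_path` argument.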

Three tables are exported in a list:

data: Contains the assigned risk group for a given sample, along with its reported survival time and censoring status.
hr_ratio: Contains the hazard ratio between different risk groups.
logrank: Contains the results from the logrank test between different risk groups.
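
To give a feel for what these tables summarise, the sketch below computes a logrank test and a hazard ratio between two risk groups with the `survival` package. It is an illustration under stated assumptions, not `familiar`'s own computation: the toy data and the column names `risk_group`, `outcome_time` and `outcome_event` are made up, and the hazard ratio is taken from a Cox proportional hazards model, which may differ from what `familiar` does internally.

```r
library(survival)

# Toy stratification data (made up): assigned risk group, survival time and
# event status (1 = event observed, 0 = censored) for each sample.
strata <- data.frame(
  risk_group = factor(rep(c("low", "high"), each = 8), levels = c("low", "high")),
  outcome_time = c(5, 12, 15, 18, 20, 24, 30, 32, 2, 4, 6, 7, 9, 11, 14, 22),
  outcome_event = c(1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1)
)

# Logrank test for a difference in survival between the risk groups.
logrank_fit <- survdiff(Surv(outcome_time, outcome_event) ~ risk_group, data = strata)
logrank_p <- stats::pchisq(
  logrank_fit$chisq,
  df = length(logrank_fit$n) - 1,
  lower.tail = FALSE
)

# Hazard ratio of the high-risk group relative to the low-risk group,
# estimated with a Cox proportional hazards model.
cox_fit <- coxph(Surv(outcome_time, outcome_event) ~ risk_group, data = strata)
hazard_ratio <- exp(coef(cox_fit))
```

A hazard ratio above 1 indicates that events occur at a higher rate in the high-risk group than in the reference (low-risk) group, and the logrank p-value tests whether the survival curves of the groups differ.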