.parse_experiment_settings | R Documentation |
Internal function for parsing settings related to the experimental setup
.parse_experiment_settings(
config = NULL,
batch_id_column = waiver(),
sample_id_column = waiver(),
series_id_column = waiver(),
development_batch_id = waiver(),
validation_batch_id = waiver(),
outcome_name = waiver(),
outcome_column = waiver(),
outcome_type = waiver(),
event_indicator = waiver(),
censoring_indicator = waiver(),
competing_risk_indicator = waiver(),
class_levels = waiver(),
signature = waiver(),
novelty_features = waiver(),
exclude_features = waiver(),
include_features = waiver(),
reference_method = waiver(),
experimental_design = waiver(),
imbalance_correction_method = waiver(),
imbalance_n_partitions = waiver(),
...
)
config |
A list of settings, e.g. from an xml file. |
batch_id_column |
(recommended) Name of the column containing batch or cohort identifiers. This parameter is required if more than one dataset is provided, or if external validation is performed. In familiar, any row of data is organised by four identifiers:
|
sample_id_column |
(recommended) Name of the column containing
sample or subject identifiers. See If unset, every row will be identified as a single sample. |
series_id_column |
(optional) Name of the column containing series
identifiers, which distinguish between measurements that are part of a
series for a single sample. See If unset, rows which share the same batch and sample identifiers but have a different outcome are assigned unique series identifiers. |
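The fallback described for an unset series_id_column can be illustrated with a minimal base R sketch. This is not familiar's internal implementation; the data frame and its column names (batch_id, sample_id, outcome) are hypothetical and chosen only for illustration.

```r
# Hypothetical data: sample "s1" appears twice in batch "b1" with different
# outcomes, whereas sample "s2" appears once.
data <- data.frame(
  batch_id = c("b1", "b1", "b1"),
  sample_id = c("s1", "s1", "s2"),
  outcome = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Group rows by their batch and sample identifiers.
group <- interaction(data$batch_id, data$sample_id, drop = TRUE)

# Within each group, number the distinct outcome values in order of
# appearance, so rows with different outcomes get different series ids.
data$series_id <- ave(
  data$outcome,
  group,
  FUN = function(x) match(x, unique(x))
)

print(data$series_id)
# 1 2 1
```

Rows that share batch, sample and outcome receive the same series identifier in this sketch; whether familiar treats such exact duplicates the same way is not stated here.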
development_batch_id |
(optional) One or more batch or cohort
identifiers to constitute data sets for development. Defaults to all, or
all minus the identifiers in |
validation_batch_id |
(optional) One or more batch or cohort
identifiers to constitute data sets for external validation. Defaults to
all data sets except those in |
outcome_name |
(optional) Name of the modelled outcome. This name will
be used in figures created by If not set, the column name in |
outcome_column |
(recommended) Name of the column containing the
outcome of interest. May be identified from a formula, if a formula is
provided as an argument. Otherwise an error is raised. Note that |
outcome_type |
(recommended) Type of outcome found in the outcome column. The outcome type determines many aspects of the overall process, e.g. the available feature selection methods and learners, but also the type of assessments that can be conducted to evaluate the resulting models. Implemented outcome types are:
If not provided, the algorithm will attempt to infer outcome_type from the contents of the outcome column. This may lead to unexpected results, so we advise providing this information manually. Note that |
event_indicator |
(recommended) Indicator for events in |
censoring_indicator |
(recommended) Indicator for right-censoring in
|
competing_risk_indicator |
(recommended) Indicator for competing
risks in |
class_levels |
(optional) Class levels for |
signature |
(optional) One or more names of feature columns that are considered part of a specific signature. Features specified here will always be used for modelling. Ranking from feature selection has no effect for these features. |
novelty_features |
(optional) One or more names of feature columns that should be included for the purpose of novelty detection. |
exclude_features |
(optional) Feature columns that will be removed
from the data set. Cannot overlap with features in |
include_features |
(optional) Feature columns that are specifically
included in the data set. By default all features are included. Cannot
overlap with |
reference_method |
(optional) Method used to set reference levels for categorical features. There are several options:
|
experimental_design |
(required) Defines what the experiment looks
like, e.g.
The different components are linked using Different subsampling methods can be used in conjunction with the basic workflow components:
As shown in the example above, sampling algorithms can be nested. The simplest valid experimental design is Alternatively, the |
imbalance_correction_method |
(optional) Type of method used to address class imbalances. Available options are:
This parameter is only used in combination with imbalance partitioning in
the experimental design, and |
imbalance_n_partitions |
(optional) Number of times random undersampling should be repeated. 10 undersampled subsets with balanced classes are formed by default. |
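As a rough illustration of repeated random undersampling, the base R sketch below forms 10 subsets in which every class is reduced to the size of the smallest class. It is an assumption-laden sketch, not familiar's own code: the outcome vector, the variable names and the sampling details are hypothetical.

```r
set.seed(1)

# Hypothetical imbalanced outcome: 20 instances of class "a", 5 of class "b".
outcome <- c(rep("a", 20), rep("b", 5))

# Number of undersampled subsets to form (10 is the documented default).
n_partitions <- 10

# Every class is reduced to the size of the smallest class.
minority_size <- min(table(outcome))

# For each partition, draw `minority_size` row indices per class, without
# replacement. sample.int() is applied to positions to avoid the
# length-one pitfall of sample().
partitions <- lapply(seq_len(n_partitions), function(i) {
  unlist(
    lapply(split(seq_along(outcome), outcome), function(idx) {
      idx[sample.int(length(idx), minority_size)]
    }),
    use.names = FALSE
  )
})

# Each element of `partitions` holds the row indices of one balanced subset.
table(outcome[partitions[[1]]])
```

Each subset keeps all minority-class rows and a different random draw of majority-class rows, which is why repeating the procedure several times uses more of the majority-class data than a single undersample would.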
... |
Unused arguments. |
List of parameters related to data parsing and the experiment.