.finish_data_preparation | R Documentation |
Internal function for finalising generic data processing
.finish_data_preparation(
  data,
  sample_id_column,
  batch_id_column,
  series_id_column,
  outcome_column,
  outcome_type,
  include_features,
  class_levels,
  censoring_indicator,
  event_indicator,
  competing_risk_indicator,
  check_stringency = "strict",
  reference_method = "auto"
)
data |
data.table with feature data |
sample_id_column |
(recommended) Name of the column containing
sample or subject identifiers. See If unset, every row will be identified as a single sample. |
batch_id_column |
(recommended) Name of the column containing batch or cohort identifiers. This parameter is required if more than one dataset is provided, or if external validation is performed. In familiar, any row of data is organised by four identifiers:
|
series_id_column |
(optional) Name of the column containing series
identifiers, which distinguish between measurements that are part of a
series for a single sample. See If unset, rows which share the same batch and sample identifiers but have a different outcome are assigned unique series identifiers. |
outcome_column |
(recommended) Name of the column containing the
outcome of interest. May be identified from a formula, if a formula is
provided as an argument. Otherwise an error is raised. Note that |
outcome_type |
(recommended) Type of outcome found in the outcome column. The outcome type determines many aspects of the overall process, e.g. the available feature selection methods and learners, but also the type of assessments that can be conducted to evaluate the resulting models. Implemented outcome types are:
If not provided, the algorithm will attempt to infer outcome_type from the contents of the outcome column. This may lead to unexpected results, and we therefore advise providing this information manually. Note that |
include_features |
(optional) Feature columns that are specifically
included in the data set. By default all features are included. Cannot
overlap with |
class_levels |
(optional) Class levels for |
censoring_indicator |
(recommended) Indicator for right-censoring in
|
event_indicator |
(recommended) Indicator for events in |
competing_risk_indicator |
(recommended) Indicator for competing
risks in |
check_stringency |
Specifies stringency of various checks. This is mostly:
|
reference_method |
(optional) Method used to set reference levels for categorical features. There are several options:
|
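The defaults described for sample_id_column and series_id_column can be illustrated with a minimal sketch. This is not familiar's implementation: fill_default_identifiers is a hypothetical helper, the generated column names (sample_id, batch_id, series_id) and the single placeholder batch are assumptions, and a base R data.frame stands in for the data.table that familiar uses.

```r
# Hypothetical helper, not part of familiar: fills in identifier columns
# following the defaults described in the argument list.
fill_default_identifiers <- function(data, outcome_column,
                                     sample_id_column = NULL,
                                     batch_id_column = NULL,
                                     series_id_column = NULL) {
  # Without a sample identifier, every row is treated as its own sample.
  if (is.null(sample_id_column)) {
    data$sample_id <- seq_len(nrow(data))
    sample_id_column <- "sample_id"
  }

  # Assumption: without a batch identifier, all rows share one placeholder batch.
  if (is.null(batch_id_column)) {
    data$batch_id <- "placeholder"
    batch_id_column <- "batch_id"
  }

  # Without a series identifier, rows that share batch and sample identifiers
  # but differ in outcome receive distinct series identifiers.
  if (is.null(series_id_column)) {
    group <- interaction(
      data[[batch_id_column]],
      data[[sample_id_column]],
      drop = TRUE
    )
    data$series_id <- ave(
      as.integer(factor(data[[outcome_column]])),
      group,
      FUN = function(x) match(x, unique(x))
    )
  }

  data
}

example <- data.frame(
  subject = c("a", "a", "b"),
  cohort = c("x", "x", "x"),
  outcome = c(0, 1, 1)
)

result <- fill_default_identifiers(
  example,
  outcome_column = "outcome",
  sample_id_column = "subject",
  batch_id_column = "cohort"
)
```

In this sketch, subject "a" has two rows with different outcomes, so those rows are placed in separate series, while subject "b" has a single series.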
This function updates the data.table produced by loading the data. When part of the main familiar workflow, it is called after .parse_initial_settings -> .load_data -> .update_initial_settings.
When used to parse external data (e.g. in conjunction with familiarModel), it follows .load_data. Hence the function contains several checks that are otherwise part of .update_initial_settings.
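As an illustration of the kind of check that may be needed when external data bypass .update_initial_settings, the sketch below tests whether the named columns are present in the data. The specific checks performed by familiar are not listed here, so this is an assumption: find_missing_columns is a hypothetical helper, and a base R data.frame stands in for a data.table.

```r
# Hypothetical helper, not part of familiar: reports which of the requested
# columns are absent from the data.
find_missing_columns <- function(data, ...) {
  requested <- unlist(list(...), use.names = FALSE)
  setdiff(requested, colnames(data))
}

external_data <- data.frame(subject = c("a", "b"), outcome = c(0, 1))

missing_columns <- find_missing_columns(
  external_data,
  sample_id_column = "subject",
  batch_id_column = "cohort",
  outcome_column = "outcome"
)

# Report the problem instead of continuing with incomplete data.
if (length(missing_columns) > 0) {
  message("Columns not found in data: ", paste(missing_columns, collapse = ", "))
}
```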
data.table with expected column names.