In familiar: End-to-End Automated Machine Learning and Model Evaluation

.check_input_identifier_column

R Documentation

Internal function for checking consistency of the identifier columns

Description

This function checks whether an identifier column is consistent, i.e. it exists, there is only one, and there is no overlap with any user-provided feature columns, other identifier columns, or outcome columns.
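
The checks described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `check_id_column_sketch`, its `feature_columns` argument, and the example data frame are hypothetical names introduced only for illustration, and a plain `data.frame` is assumed in place of the object returned by `.load_data`.

```r
# Hypothetical helper, not part of familiar: mirrors the three checks
# named in the description (exists, only one, no overlap).
check_id_column_sketch <- function(id_column,
                                   data,
                                   feature_columns = NULL,
                                   other_id_column = NULL,
                                   outcome_column = NULL) {
  # There must be exactly one identifier column name.
  if (length(id_column) != 1L) {
    stop("Exactly one identifier column should be provided.")
  }

  # The column must exist in the data.
  if (!id_column %in% colnames(data)) {
    stop(sprintf("Identifier column '%s' was not found in the data.", id_column))
  }

  # The column may not double as a feature, another identifier, or an outcome.
  overlap <- intersect(
    id_column,
    c(feature_columns, other_id_column, outcome_column)
  )
  if (length(overlap) > 0L) {
    stop(sprintf("Identifier column '%s' is also used in another role.", id_column))
  }

  invisible(TRUE)
}

# Assumed example data: one sample identifier, one batch identifier,
# one feature, and one outcome column.
example_data <- data.frame(
  sample_id = 1:3,
  batch_id = c("a", "a", "b"),
  age = c(50, 61, 47),
  outcome = c(0, 1, 0)
)

check_id_column_sketch(
  id_column = "sample_id",
  data = example_data,
  feature_columns = "age",
  other_id_column = "batch_id",
  outcome_column = "outcome"
)
```

In this sketch, passing `"age"` as `id_column` while it is also listed in `feature_columns` raises an error, as does an identifier column that is absent from the data.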

Usage

.check_input_identifier_column(
  id_column,
  data,
  signature = NULL,
  exclude_features = NULL,
  include_features = NULL,
  other_id_column = NULL,
  outcome_column = NULL,
  col_type,
  check_stringency = "strict"
)

Arguments

`id_column`	Character string indicating the currently inspected identifier column.
`data`	Data set as loaded using the `.load_data` function.
`signature`	(optional) One or more names of feature columns that are considered part of a specific signature. Features specified here will always be used for modelling. Ranking from feature selection has no effect for these features.
`exclude_features`	(optional) Feature columns that will be removed from the data set. Cannot overlap with features in `signature`, `novelty_features` or `include_features`.
`include_features`	(optional) Feature columns that are specifically included in the data set. By default all features are included. Cannot overlap with `exclude_features`, but may overlap `signature`. Features in `signature` and `novelty_features` are always included. If both `exclude_features` and `include_features` are provided, `include_features` takes precedence, provided that there is no overlap between the two.
`other_id_column`	Character string indicating another identifier column.
`outcome_column`	Character string indicating the outcome column(s).
`col_type`	Character string indicating the type of column, i.e. `sample` or `batch`.
`check_stringency`	Specifies the stringency of various checks. One of: `strict`: default value, used for `summon_familiar`. Thoroughly checks input data. Used internally for checking development data. `external_warn`: value used for `extract_data` and related methods. Less stringent checks, but will warn about possible issues. Used internally for checking data for evaluation and explanation. `external`: value used for external methods such as `predict`. Less stringent checks, particularly for identifier and outcome columns, which may be completely absent.
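
As a rough illustration of how the three `check_stringency` levels might differ, the sketch below handles a missing identifier column by stopping, warning, or passing silently. The function `handle_missing_id_column_sketch` is hypothetical, and the exact reaction at each level is an assumption based on the descriptions above, not the package's actual behaviour.

```r
# Hypothetical helper, not part of familiar. Assumed behaviour:
# "strict" stops, "external_warn" warns, "external" tolerates absence.
handle_missing_id_column_sketch <- function(
    id_column,
    data,
    check_stringency = c("strict", "external_warn", "external")) {
  # Resolve the stringency level; the first value is the default.
  check_stringency <- match.arg(check_stringency)

  # Nothing to do if the column is present.
  if (id_column %in% colnames(data)) {
    return(TRUE)
  }

  message_text <- sprintf(
    "Identifier column '%s' was not found in the data.", id_column
  )

  # React according to the stringency level.
  switch(
    check_stringency,
    strict = stop(message_text),
    external_warn = warning(message_text),
    external = NULL
  )

  FALSE
}

# Assumed example data without any identifier column.
new_data <- data.frame(age = c(50, 61, 47))

# Tolerated silently under the least stringent level.
handle_missing_id_column_sketch("sample_id", new_data, check_stringency = "external")
```

The return value signals whether the column is present, so a caller could decide to skip the remaining identifier checks when it is absent.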