as_prediction_table: Convert to prediction table object
In familiar: End-to-End Automated Machine Learning and Model Evaluation


Description

Creates a prediction table object from input data.

Usage

as_prediction_table(
  x,
  type,
  y = waiver(),
  batch_id = waiver(),
  sample_id = waiver(),
  series_id = waiver(),
  repetition_id = waiver(),
  time = waiver(),
  class_levels = waiver(),
  value_range = waiver(),
  event_indicator = waiver(),
  censoring_indicator = waiver(),
  learner = waiver(),
  vimp_method = waiver(),
  model_object = NULL,
  data = NULL
)
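
As a minimal sketch of the signature above, the following passes a numeric vector of predictions and the matching observed outcomes for a regression problem. The values are made up for illustration. The call is guarded because it assumes an installed version of familiar that exports `as_prediction_table`; the structure of the returned object is not shown here.

```r
# Hypothetical predicted and observed values for five samples.
predicted <- c(2.1, 3.4, 1.8, 4.0, 2.9)
observed <- c(2.0, 3.0, 2.2, 4.5, 3.1)

# x and y must describe the same samples, in the same order.
stopifnot(length(predicted) == length(observed))

# Assumption: the installed familiar version exports as_prediction_table.
familiar_available <- requireNamespace("familiar", quietly = TRUE) &&
  "as_prediction_table" %in% getNamespaceExports("familiar")

if (familiar_available) {
  # All other arguments keep their defaults (waiver() or NULL).
  prediction_table <- familiar::as_prediction_table(
    x = predicted,
    type = "regression",
    y = observed
  )
}
```

Only `x` and `type` have no default in the signature, so they are the only arguments that must always be supplied.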

Arguments

`x`	Values predicted using a learner. For all problems except `classification`, predicted values should be a single vector of values, in any format that `data.table::as.data.table` converts to a single-column `data.table`. For `classification` problems, predicted values are probabilities for each class. Make sure that each set of probabilities can be mapped to its class, e.g. by using a named list.
`type`	The type of prediction table that should be created. The following types are available: `regression`: The predicted values are values for a regression. `classification`: The predicted values are probabilities for specific classes. `hazard_ratio`: The predicted values are hazard ratios. `cumulative_hazard`: The predicted values are cumulative hazards at time `time`. `expected_survival_time`: The predicted values are expected survival times. `survival_probability`: The predicted values are survival probabilities at time `time`.
`y`	Known outcome value corresponding to each entry in `x`. For survival-related outcomes, two sets of values are expected, corresponding to the observed time and event status, respectively. Alternatively, a `survival::Surv` object can be provided.
`batch_id`	(optional) Array of batch or cohort identifiers. In familiar, every row of data is organised by four identifiers. (1) The batch identifier `batch_id` denotes the group to which a set of samples belongs, e.g. patients from a single study or samples measured in a batch. It is used for batch normalisation, as well as for selecting development and validation datasets. (2) The sample identifier `sample_id` denotes the sample level, e.g. data from a single individual. Subsets of data, e.g. bootstraps or cross-validation folds, are created at this level. (3) The series identifier `series_id` indicates measurements on a single sample that may not share the same outcome value, e.g. a time series, or the number of cells in a view. (4) The repetition identifier `repetition_id` indicates repeated measurements in a single series where feature values may differ, but the outcome does not. Repetition identifiers are always set implicitly when multiple entries are encountered that belong to the same series of the same sample in the same batch and share the same outcome.
`sample_id`	(optional) Array of sample or subject identifiers. See `batch_id` above for more details. If unset, every row will be identified as a single sample.
`series_id`	(optional) Array of series identifiers, which distinguish between measurements that are part of a series for a single sample. See `batch_id` above for more details.
`repetition_id`	(optional) Array of repetition identifiers, which distinguish between repeated measurements within a single series. See `batch_id` above for more details.
`time`	Time point at which the predicted values are generated, e.g. the cumulative risks generated by a random forest. This parameter is only relevant for `survival` outcomes.
`class_levels`	(optional) Class levels for `binomial` or `multinomial` outcomes. This argument can be used to specify the ordering of levels for categorical outcomes. These class levels must exactly match the levels present in the outcome column.
`value_range`	Range of observed, not predicted, values. This parameter is only relevant for `continuous` outcomes.
`event_indicator`	(recommended) Indicator for events in `survival` and `competing_risk` analyses. `familiar` will automatically recognise `1`, `true`, `t`, `y` and `yes` as event indicators, including different capitalisations. If this parameter is set, it replaces the default values.
`censoring_indicator`	(recommended) Indicator for right-censoring in `survival` and `competing_risk` analyses. `familiar` will automatically recognise `0`, `false`, `f`, `n` and `no` as censoring indicators, including different capitalisations. If this parameter is set, it replaces the default values.
`learner`	The type of learner that generated the predictions.
`vimp_method`	The type of variable importance method for identifying the features included by the learner that generated the predictions.
`model_object`	A `familiarModel` or `familiarEnsemble` object that can be used (and is used internally) for setting several of the other arguments of this function.
`data`	A familiar `dataObject` that can be used (and is used internally) for setting many of the other arguments of this function.
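
The argument descriptions above mention two input shapes that are easy to get wrong: class probabilities supplied as a named list, and survival outcomes supplied as a `survival::Surv` object. The sketch below prepares both with made-up data. The class names, sample values and the time point of 24 are hypothetical, and the guarded calls assume an installed version of familiar that exports `as_prediction_table`; how the function validates these inputs is not verified here.

```r
# Hypothetical class probabilities for four samples, as a named list so that
# each probability vector can be mapped to its class.
class_levels <- c("benign", "malignant")
probabilities <- list(
  benign = c(0.9, 0.2, 0.6, 0.1),
  malignant = c(0.1, 0.8, 0.4, 0.9)
)
observed_class <- factor(
  c("benign", "malignant", "benign", "malignant"),
  levels = class_levels
)

# Sanity checks: list names match the class levels, and the probabilities
# of each sample sum to one.
stopifnot(identical(names(probabilities), class_levels))
stopifnot(all(abs(Reduce(`+`, probabilities) - 1) < 1e-8))

# Hypothetical survival data: observed time and event status per sample,
# with 1 marking an event and 0 marking right-censoring.
observed_time <- c(12, 30, 7, 24)
event_status <- c(1, 0, 1, 0)
survival_probability <- c(0.35, 0.80, 0.20, 0.65)

# Assumption: the installed familiar version exports as_prediction_table.
familiar_available <- requireNamespace("familiar", quietly = TRUE) &&
  "as_prediction_table" %in% getNamespaceExports("familiar")

if (familiar_available) {
  classification_table <- familiar::as_prediction_table(
    x = probabilities,
    type = "classification",
    y = observed_class,
    class_levels = class_levels
  )

  if (requireNamespace("survival", quietly = TRUE)) {
    # time is the time point at which the survival probabilities apply.
    survival_table <- familiar::as_prediction_table(
      x = survival_probability,
      type = "survival_probability",
      y = survival::Surv(observed_time, event_status),
      time = 24
    )
  }
}
```

Coding the event status as `1` and `0` matches the default event and censoring indicators listed above, so `event_indicator` and `censoring_indicator` are left unset in this sketch.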