extract_feature_expression: Internal function to extract feature expressions.
In familiar: End-to-End Automated Machine Learning and Model Evaluation

extract_feature_expression

R Documentation


Description

Computes and extracts feature expressions for features used in a `familiarEnsemble` object.

Usage

extract_feature_expression(
  object,
  data,
  feature_similarity,
  sample_similarity,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_similarity_metric = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  sample_similarity_metric = waiver(),
  evaluation_times = waiver(),
  message_indent = 0L,
  verbose = FALSE,
  ...
)

Arguments

`object`	A `familiarEnsemble` object, which is an ensemble of one or more `familiarModel` objects.
`data`	A `dataObject` object, `data.table` or `data.frame` that constitutes the data that are assessed.
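`feature_similarity`	Table containing pairwise distance between features. This is used to determine cluster information, and to indicate which features are similar. The table is created by the `extract_feature_similarity` method.
`sample_similarity`	Table containing pairwise distance between samples. This is used to determine cluster information, and to indicate which samples are similar. The table is created by the `extract_sample_similarity` method.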
`feature_cluster_method`	The method used to perform clustering. These are the same methods as for the `cluster_method` configuration parameter: `none`, `hclust`, `agnes`, `diana` and `pam`. `none` cannot be used when extracting data regarding mutual correlation or feature expressions. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_linkage_method`	The method used for agglomerative clustering in `hclust` and `agnes`. These are the same methods as for the `cluster_linkage_method` configuration parameter: `average`, `single`, `complete`, `weighted`, and `ward`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_similarity_metric`	Metric to determine pairwise similarity between features. Similarity is computed in the same manner as for clustering, and `feature_similarity_metric` therefore has the same options as `cluster_similarity_metric`: `mcfadden_r2`, `cox_snell_r2`, `nagelkerke_r2`, `spearman`, `kendall` and `pearson`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`sample_cluster_method`	The method used to perform clustering based on distance between samples. These are the same methods as for the `cluster_method` configuration parameter: `hclust`, `agnes`, `diana` and `pam`. `none` cannot be used when extracting data for feature expressions. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`sample_linkage_method`	The method used for agglomerative clustering in `hclust` and `agnes`. These are the same methods as for the `cluster_linkage_method` configuration parameter: `average`, `single`, `complete`, `weighted`, and `ward`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`sample_similarity_metric`	Metric to determine pairwise similarity between samples. Similarity is computed in the same manner as for clustering, but `sample_similarity_metric` has different options that are better suited to computing distance between samples instead of between features: `gower` and `euclidean`. The underlying feature data is scaled to the `[0, 1]` range (for numerical features) using the feature values across the samples. The normalisation parameters required can optionally be computed from feature data with the outer 5% (on both sides) of feature values trimmed or winsorised. To do so, append `_trim` (trimming) or `_winsor` (winsorising) to the metric name. This reduces the effect of outliers somewhat. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`evaluation_times`	One or more time points that are used in the analysis of survival problems when data have to be assessed at a set time, e.g. calibration. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects. Only used for `survival` outcomes.
`message_indent`	Number of indentation steps for messages shown during computation and extraction of various data elements.
`verbose`	Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
`...`	Unused arguments.
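The `[0, 1]` scaling and the optional trimming or winsorising described for `sample_similarity_metric` can be sketched in base R. This is a minimal, hypothetical sketch and not familiar's internal implementation: the helper names `normalisation_range`, `scale_features` and `sample_distance` are invented for illustration, the 5th and 95th percentiles are assumed to mark the outer 5% on each side, values falling outside the normalisation range are assumed to be clipped to `[0, 1]`, and only numerical features are handled. Gower distance is computed here as the mean absolute difference of the scaled features.

```r
# Hypothetical helper (not part of familiar): min-max normalisation range for
# one numeric feature, optionally computed after trimming or winsorising the
# outer 5% of values on each side.
normalisation_range <- function(x, mode = c("none", "trim", "winsor")) {
  mode <- match.arg(mode)
  x <- x[is.finite(x)]
  if (mode != "none") {
    bounds <- stats::quantile(x, probs = c(0.05, 0.95), names = FALSE)
    if (mode == "trim") {
      # Trimming: drop values outside the 5th-95th percentile range.
      x <- x[x >= bounds[1] & x <= bounds[2]]
    } else {
      # Winsorising: clip values to the 5th-95th percentile range.
      x <- pmin(pmax(x, bounds[1]), bounds[2])
    }
  }
  range(x)
}

# Hypothetical helper: scale every numeric column of a data.frame to [0, 1].
scale_features <- function(data, mode = "none") {
  as.data.frame(lapply(data, function(x) {
    r <- normalisation_range(x, mode)
    # A constant feature carries no distance information.
    if (r[2] == r[1]) return(rep(0, length(x)))
    # Assumption: values beyond the trimmed/winsorised range are clipped.
    pmin(pmax((x - r[1]) / (r[2] - r[1]), 0), 1)
  }))
}

# Hypothetical helper: pairwise distance between samples (rows) for a metric
# name such as "gower", "euclidean", "gower_trim" or "euclidean_winsor".
sample_distance <- function(data, metric = "gower") {
  parts <- strsplit(metric, "_", fixed = TRUE)[[1]]
  mode <- if (length(parts) > 1) parts[2] else "none"
  scaled <- as.matrix(scale_features(data, mode))
  if (parts[1] == "euclidean") {
    stats::dist(scaled, method = "euclidean")
  } else {
    # Gower distance for numeric features: mean absolute difference.
    stats::dist(scaled, method = "manhattan") / ncol(scaled)
  }
}

# Usage: one feature contains a strong outlier in the last sample.
set.seed(1)
features <- data.frame(a = c(rnorm(19), 25), b = runif(20))
d <- sample_distance(features, metric = "gower_winsor")
```

With plain `gower`, the outlier in feature `a` would compress the remaining 19 values of that feature into a small part of `[0, 1]`; computing the normalisation range from winsorised values keeps those values spread out, which is the outlier-dampening effect the argument description refers to.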