extract_feature_similarity: Internal function to extract the feature distance table.
In familiar: End-to-End Automated Machine Learning and Model Evaluation

extract_feature_similarity

R Documentation

Description

Computes and extracts the feature distance table for features used in a familiarEnsemble object. This table can be used to cluster features, and is exported directly by export_feature_similarity.

Usage

extract_feature_similarity(
  object,
  data,
  cl = NULL,
  estimation_type = waiver(),
  aggregate_results = waiver(),
  confidence_level = waiver(),
  bootstrap_ci_method = waiver(),
  is_pre_processed = FALSE,
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  feature_similarity_metric = waiver(),
  verbose = FALSE,
  message_indent = 0L,
  ...
)

Arguments

`object`	A `familiarEnsemble` object, which is an ensemble of one or more `familiarModel` objects.
`data`	A `dataObject` object, `data.table` or `data.frame` that constitutes the data that are assessed.
`cl`	Cluster created using the `parallel` package. This cluster is then used to speed up computation through parallelisation.
`estimation_type`	(optional) Sets the type of estimation that should be possible. This has the following options: `point`: Point estimates. `bias_correction` or `bc`: Bias-corrected estimates. A bias-corrected estimate is computed from (at least) 20 point estimates, and `familiar` may bootstrap the data to create them. `bootstrap_confidence_interval` or `bci` (default): Bias-corrected estimates with bootstrap confidence intervals (Efron and Hastie, 2016). The number of point estimates required depends on the `confidence_level` parameter, and `familiar` may bootstrap the data to create them. As with `detail_level`, a non-default `estimation_type` parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. `list("auc_data"="bci", "model_performance"="point")`. This parameter can be set for the following data elements: `auc_data`, `decision_curve_analyis`, `model_performance`, `permutation_vimp`, `ice_data`, and `prediction_data`.
`aggregate_results`	(optional) Flag that signifies whether results should be aggregated during evaluation. If `estimation_type` is `bias_correction` or `bc`, aggregation leads to a single bias-corrected estimate. If `estimation_type` is `bootstrap_confidence_interval` or `bci`, aggregation leads to a single bias-corrected estimate with lower and upper boundaries of the confidence interval. This has no effect if `estimation_type` is `point`. The default value is `TRUE`, except when assessing model performance metrics, as the default violin plot requires the underlying data. As with `detail_level` and `estimation_type`, a non-default `aggregate_results` parameter can be specified for separate evaluation steps by providing a parameter value in a named list with data elements, e.g. `list("auc_data"=TRUE, "model_performance"=FALSE)`. This parameter exists for the same elements as `estimation_type`.
`confidence_level`	(optional) Numeric value for the level at which confidence intervals are determined. If bootstraps are used to determine the confidence intervals, `familiar` uses the rule of thumb `n = 20 / ci.level` to determine the number of required bootstraps. The default value is `0.95`.
`bootstrap_ci_method`	(optional) Method used to determine bootstrap confidence intervals (Efron and Hastie, 2016). The following methods are implemented: `percentile` (default): Confidence intervals obtained using the percentile method. `bc`: Bias-corrected confidence intervals. Note that the standard method is not implemented because this method is often not suitable due to non-normal distributions. The bias-corrected and accelerated (BCa) method is not implemented yet.
`is_pre_processed`	Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the `data` argument is a `data.table` or `data.frame`.
`feature_cluster_method`	The method used to perform clustering. These are the same methods as for the `cluster_method` configuration parameter: `none`, `hclust`, `agnes`, `diana` and `pam`. `none` cannot be used when extracting data regarding mutual correlation or feature expressions. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_linkage_method`	The method used for agglomerative clustering in `hclust` and `agnes`. These are the same methods as for the `cluster_linkage_method` configuration parameter: `average`, `single`, `complete`, `weighted`, and `ward`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_cluster_cut_method`	The method used to divide features into separate clusters. The available methods are the same as for the `cluster_cut_method` configuration parameter: `silhouette`, `fixed_cut` and `dynamic_cut`. `silhouette` is available for all cluster methods, but `fixed_cut` only applies to methods that create hierarchical trees (`hclust`, `agnes` and `diana`). `dynamic_cut` requires the `dynamicTreeCut` package and can only be used with `agnes` and `hclust`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_similarity_threshold`	The threshold level for pair-wise similarity that is required to form feature clusters with the `fixed_cut` method. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_similarity_metric`	Metric to determine pairwise similarity between features. Similarity is computed in the same manner as for clustering, and `feature_similarity_metric` therefore has the same options as `cluster_similarity_metric`: `mcfadden_r2`, `cox_snell_r2`, `nagelkerke_r2`, `spearman`, `kendall` and `pearson`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`verbose`	Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
`message_indent`	Number of indentation steps for messages shown during computation and extraction of various data elements.
`...`	Unused arguments.
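
The clustering arguments above interact in sequence: a similarity metric yields pairwise similarities, these are converted to distances, a tree is built using a linkage method, and the tree is cut into clusters. The following base R sketch illustrates that sequence outside of `familiar`. The toy data, the conversion `1 - |similarity|`, and the threshold value are assumptions made for illustration and may not match what `familiar` does internally.

```r
# Illustrative data: three features, two of which are strongly related.
set.seed(1)
n <- 50
x1 <- rnorm(n)
features <- data.frame(
  feature_a = x1,
  feature_b = x1 + rnorm(n, sd = 0.1),  # nearly a copy of feature_a
  feature_c = rnorm(n)                  # generated independently
)

# Pairwise similarity, analogous to feature_similarity_metric = "spearman".
similarity <- cor(features, method = "spearman")

# Assumption: distance is taken as 1 - |similarity|; familiar may differ.
distance <- as.dist(1 - abs(similarity))

# Agglomerative clustering, analogous to feature_cluster_method = "hclust"
# with feature_linkage_method = "average".
tree <- hclust(distance, method = "average")

# Analogous to feature_cluster_cut_method = "fixed_cut": cut the tree at the
# height that corresponds to the (hypothetical) similarity threshold.
similarity_threshold <- 0.8
clusters <- cutree(tree, h = 1 - similarity_threshold)
print(clusters)
```

In this sketch `feature_a` and `feature_b` end up in the same cluster, whereas `feature_c` forms its own cluster.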

Value

A data.table containing pairwise distances between features. The table contains only the upper triangle of the complete distance matrix (i.e. the sparse unitriangular representation). Diagonal elements are always 0.0, and the lower triangle mirrors the upper triangle.
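
Because only the upper triangle is returned, the full symmetric matrix has to be rebuilt before it can be passed to functions such as `hclust`. The following base R sketch shows one way to do this. The column names `feature_name_1`, `feature_name_2` and `value`, as well as the distances themselves, are hypothetical and should be checked against the actual table.

```r
# Hypothetical upper-triangular table; the column names are assumptions.
pairs <- data.frame(
  feature_name_1 = c("a", "a", "b"),
  feature_name_2 = c("b", "c", "c"),
  value = c(0.1, 0.8, 0.7),
  stringsAsFactors = FALSE
)

feature_names <- unique(c(pairs$feature_name_1, pairs$feature_name_2))

# Start from a zero matrix, so that the diagonal is 0.0.
distance_matrix <- matrix(
  0.0,
  nrow = length(feature_names),
  ncol = length(feature_names),
  dimnames = list(feature_names, feature_names)
)

# Fill the upper triangle, then mirror it into the lower triangle.
idx <- cbind(pairs$feature_name_1, pairs$feature_name_2)
distance_matrix[idx] <- pairs$value
distance_matrix[idx[, c(2, 1), drop = FALSE]] <- pairs$value

# Convert to a dist object, e.g. for use with hclust().
feature_dist <- as.dist(distance_matrix)
```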

familiar documentation built on Sept. 30, 2024, 9:18 a.m.