export_feature_similarity-methods: Extract and export mutual correlation between features.
In familiar: End-to-End Automated Machine Learning and Model Evaluation

export_feature_similarity

R Documentation

Extract and export mutual correlation between features.

Description

Extract and export mutual correlation between features in a familiarCollection.

Usage

export_feature_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  features = waiver(),
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  export_dendrogram = FALSE,
  export_ordered_data = FALSE,
  export_clustering = FALSE,
  export_collection = FALSE,
  ...
)

## S4 method for signature 'familiarCollection'
export_feature_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  features = waiver(),
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  export_dendrogram = FALSE,
  export_ordered_data = FALSE,
  export_clustering = FALSE,
  export_collection = FALSE,
  ...
)

## S4 method for signature 'ANY'
export_feature_similarity(
  object,
  dir_path = NULL,
  aggregate_results = TRUE,
  features = waiver(),
  feature_cluster_method = waiver(),
  feature_linkage_method = waiver(),
  feature_cluster_cut_method = waiver(),
  feature_similarity_threshold = waiver(),
  export_dendrogram = FALSE,
  export_ordered_data = FALSE,
  export_clustering = FALSE,
  export_collection = FALSE,
  ...
)

Arguments

`object`	A `familiarCollection` object, or other objects from which a `familiarCollection` can be extracted. See Details for more information.
`dir_path`	Path to the folder where extracted data should be saved. If `NULL`, the data are returned as a structured list of `data.table` objects instead of being written to disk.
`aggregate_results`	Flag that signifies whether results should be aggregated for export.
`features`	Features from which information should be extracted. This is typically used in external workflows, e.g. for plotting. Internally, i.e. when called from `summon_familiar`, this parameter is not used.
`feature_cluster_method`	The method used to perform clustering. These are the same methods as for the `cluster_method` configuration parameter: `none`, `hclust`, `agnes`, `diana` and `pam`. `none` cannot be used when extracting data regarding mutual correlation or feature expressions. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_linkage_method`	The method used for agglomerative clustering in `hclust` and `agnes`. These are the same methods as for the `cluster_linkage_method` configuration parameter: `average`, `single`, `complete`, `weighted`, and `ward`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_cluster_cut_method`	The method used to divide features into separate clusters. The available methods are the same as for the `cluster_cut_method` configuration parameter: `silhouette`, `fixed_cut` and `dynamic_cut`. `silhouette` is available for all cluster methods, but `fixed_cut` only applies to methods that create hierarchical trees (`hclust`, `agnes` and `diana`). `dynamic_cut` requires the `dynamicTreeCut` package and can only be used with `agnes` and `hclust`. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`feature_similarity_threshold`	The threshold level for pair-wise similarity that is required to form feature clusters with the `fixed_cut` method. If not provided explicitly, this parameter is read from settings used at creation of the underlying `familiarModel` objects.
`export_dendrogram`	Add the dendrogram to the data element objects.
`export_ordered_data`	Add feature label ordering to data in the data element objects.
`export_clustering`	Add clustering information to data.
`export_collection`	(optional) Exports the collection if TRUE.
`...`	Arguments passed on to `as_familiar_collection` and `as_data_object`:
    `familiar_data_names`	Names of the dataset(s). Only used if the `object` parameter is one or more `familiarData` objects.
    `collection_name`	Name of the collection.
    `data`	A `data.frame` or `data.table`, a path to such tables on a local or network drive, or a path to tabular data that may be converted to these formats.
    `check_stringency`	Specifies the stringency of various checks. This is mostly:
        `strict`: default value used for `summon_familiar`. Thoroughly checks input data. Used internally for checking development data.
        `external_warn`: value used for `extract_data` and related methods. Less stringent checks, but will warn for possible issues. Used internally for checking data for evaluation and explanation.
        `external`: value used for external methods such as `predict`. Less stringent checks, particularly for identifier and outcome columns, which may be completely absent. Used internally for `predict`.
    `.no_features_required`	Internal flag to signify that data without features is allowed. Default: `FALSE` (most processing steps require features).
    `batch_id_column`	(recommended) Name of the column containing batch or cohort identifiers. This parameter is required if more than one dataset is provided, or if external validation is performed. In familiar, any row of data is organised by four identifiers:
        The batch identifier `batch_id_column`: This denotes the group to which a set of samples belongs, e.g. patients from a single study, samples measured in a batch, etc. The batch identifier is used for batch normalisation, as well as selection of development and validation datasets.
        The sample identifier `sample_id_column`: This denotes the sample level, e.g. data from a single individual. Subsets of data, e.g. bootstraps or cross-validation folds, are created at this level.
        The series identifier `series_id_column`: Indicates measurements on a single sample that may not share the same outcome value, e.g. a time series, or the number of cells in a view.
        The repetition identifier: Indicates repeated measurements in a single series where any feature values may differ, but the outcome does not. Repetition identifiers are always implicitly set when multiple entries for the same series of the same sample in the same batch that share the same outcome are encountered.
    `sample_id_column`	(recommended) Name of the column containing sample or subject identifiers. See `batch_id_column` above for more details. If unset, every row will be identified as a single sample.
    `series_id_column`	(optional) Name of the column containing series identifiers, which distinguish between measurements that are part of a series for a single sample. See `batch_id_column` above for more details. If unset, rows which share the same batch and sample identifiers but have a different outcome are assigned unique series identifiers.
    `development_batch_id`	(optional) One or more batch or cohort identifiers to constitute data sets for development. Defaults to all, or all minus the identifiers in `validation_batch_id` for external validation. Required if external validation is performed and `validation_batch_id` is not provided.
    `validation_batch_id`	(optional) One or more batch or cohort identifiers to constitute data sets for external validation. Defaults to all data sets except those in `development_batch_id` for external validation, or none if not. Required if `development_batch_id` is not provided.
    `outcome_name`	(optional) Name of the modelled outcome. This name will be used in figures created by `familiar`. If not set, the column name in `outcome_column` will be used for `binomial`, `multinomial`, and `continuous` outcomes. For other outcomes (`survival` and `competing_risk`) no default is used.
    `outcome_column`	(recommended) Name of the column containing the outcome of interest. May be identified from a formula, if a formula is provided as an argument. Otherwise an error is raised. Note that outcomes of type `survival` and `competing_risk` require two columns: one indicating the time-to-event or the time of last follow-up, and one indicating the event status.
    `outcome_type`	(recommended) Type of outcome found in the outcome column. The outcome type determines many aspects of the overall process, e.g. the available variable importance methods and learners, but also the type of assessments that can be conducted to evaluate the resulting models. Implemented outcome types are:
        `binomial`: categorical outcome with 2 levels.
        `multinomial`: categorical outcome with 2 or more levels.
        `continuous`: general continuous numeric outcomes.
        `survival`: survival outcome for time-to-event data.
    If not provided, the algorithm will attempt to obtain `outcome_type` from the contents of the outcome column. This may lead to unexpected results, and we therefore advise providing this information manually. Note that `competing_risk` survival analysis is not fully supported, and is currently not a valid choice for `outcome_type`. The `count` outcome type was deprecated in version 2.0.0, and superseded by `continuous`.
    `class_levels`	(optional) Class levels for `binomial` or `multinomial` outcomes. This argument can be used to specify the ordering of levels for categorical outcomes. These class levels must exactly match the levels present in the outcome column.
    `event_indicator`	(recommended) Indicator for events in `survival` and `competing_risk` analyses. `familiar` will automatically recognise `1`, `true`, `t`, `y` and `yes` as event indicators, including different capitalisations. If this parameter is set, it replaces the default values.
    `censoring_indicator`	(recommended) Indicator for right-censoring in `survival` and `competing_risk` analyses. `familiar` will automatically recognise `0`, `false`, `f`, `n`, `no` as censoring indicators, including different capitalisations. If this parameter is set, it replaces the default values.
    `competing_risk_indicator`	(recommended) Indicator for competing risks in `competing_risk` analyses. There are no default values, and if unset, all values other than those specified by the `event_indicator` and `censoring_indicator` parameters are considered to indicate competing risks.
    `exclude_features`	(optional) Feature columns that will be removed from the data set. Cannot overlap with features in `signature`, `novelty_features` or `include_features`.
    `include_features`	(optional) Feature columns that are specifically included in the data set. By default all features are included. Cannot overlap with `exclude_features`, but may overlap with `signature`. Features in `signature` and `novelty_features` are always included. If both `exclude_features` and `include_features` are provided, `include_features` takes precedence, provided that there is no overlap between the two.
    `reference_method`	(optional) Method used to set reference levels for categorical features. There are several options:
        `auto` (default): Categorical features that are not explicitly set by the user, i.e. columns containing boolean values or characters, use the most frequent level as reference. Categorical features that are explicitly set, i.e. as factors, are used as is.
        `always`: Both automatically detected and user-specified categorical features have the reference level set to the most frequent level. Ordinal features are not altered, but are used as is.
        `never`: User-specified categorical features are used as is. Automatically detected categorical features are simply sorted, and the first level is then used as the reference level. This was the behaviour prior to familiar version 1.3.0.
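The clustering parameters above (`feature_cluster_method`, `feature_linkage_method`, `feature_cluster_cut_method` and `feature_similarity_threshold`) can be illustrated with a small base R sketch. It is not familiar's internal implementation and does not call `export_feature_similarity`. It assumes that pairwise similarity is the absolute Pearson correlation and that distance is `1 - similarity`, and it uses simulated, hypothetical features `x1` to `x5`. It mimics `hclust` with `average` linkage, then applies a fixed cut at a similarity threshold (as with `fixed_cut`) and a silhouette-based choice of the number of clusters (as with `silhouette`), using `silhouette()` from the recommended `cluster` package.

```r
# Sketch only: NOT familiar's own code. Assumes similarity = |Pearson r|
# and distance = 1 - similarity. Feature names x1..x5 are hypothetical.
library(cluster)  # provides silhouette(); shipped with standard R installations

set.seed(1)
n <- 100
x1 <- rnorm(n)
x3 <- rnorm(n)
features <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # nearly collinear with x1
  x3 = x3,
  x4 = x3 + rnorm(n, sd = 0.1),  # nearly collinear with x3
  x5 = rnorm(n)                  # unrelated feature
)

# Pairwise similarity between features, converted to a distance object.
similarity <- abs(cor(features))
distance <- as.dist(1 - similarity)

# Agglomerative clustering with average linkage
# (comparable in spirit to hclust + average linkage in familiar).
tree <- hclust(distance, method = "average")

# Fixed cut: features that merge at a similarity of at least the threshold
# share a cluster, i.e. cut the tree at height 1 - threshold.
similarity_threshold <- 0.9
fixed_clusters <- cutree(tree, h = 1 - similarity_threshold)
print(fixed_clusters)

# Silhouette cut: choose the number of clusters with the highest
# mean silhouette width (k must lie between 2 and n_features - 1).
candidate_k <- 2:(ncol(features) - 1)
mean_width <- sapply(candidate_k, function(k) {
  mean(silhouette(cutree(tree, k = k), distance)[, "sil_width"])
})
best_k <- candidate_k[which.max(mean_width)]
silhouette_clusters <- cutree(tree, k = best_k)
print(silhouette_clusters)
```

With these simulated data, the fixed cut should group `x1` with `x2` and `x3` with `x4`, leaving `x5` on its own, because only those pairs are nearly collinear. Converting similarity to a distance is what lets a similarity threshold be expressed as a cut height on the tree.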

Details

All parameters aside from object and dir_path are only used if object is not a familiarCollection object, or a path to one.

Feature similarity data can be created from `dataObject` or `data.table` objects. For `data.table` objects, see `as_data_object` for additional arguments.

Value

A list containing a `data.table` if `dir_path` is not provided. Otherwise nothing is returned, as all data are exported to CSV files.

familiar documentation built on June 2, 2026, 1:08 a.m.