batch_snf_subsamples: Run SNF clustering pipeline on a list of subsampled data...
In metasnf: Meta Clustering with Similarity Network Fusion

batch_snf_subsamples

R Documentation

Run SNF clustering pipeline on a list of subsampled data lists

Description

Run SNF clustering pipeline on a list of subsampled data lists

Usage

batch_snf_subsamples(
  dl_subsamples,
  sc,
  processes = 1,
  return_sim_mats = FALSE,
  sim_mats_dir = NULL
)

Arguments

`dl_subsamples`	A list of subsampled data lists. This object is generated by the function `subsample_dl()`.
`sc`	An `snf_config` class object which stores all sets of hyperparameters used to transform the data in a data list into cluster solutions. See `?settings_df` or https://branchlab.github.io/metasnf/articles/settings_df.html for more details.
`processes`	Specify the number of processes used to complete SNF iterations. `1` (default): Sequential processing. The function will iterate through the `settings_df` one row at a time with a for loop. This option will not make use of multiple CPU cores, but will show a progress bar. `2` or higher: Parallel processing. The function will use `future.apply::future_apply` to distribute the SNF iterations across the specified number of CPU cores. If this is higher than the number of available cores, a warning will be raised and the maximum number of cores will be used. `max`: All available cores will be used.
`return_sim_mats`	If TRUE, function will return a list where the first element is the solutions data frame and the second element is a list of similarity matrices for each row in the sol_df. Default FALSE.
`sim_mats_dir`	If specified, this directory will be used to save all generated similarity matrices.
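
The rules described for `processes` can be sketched in base R as follows. `resolve_processes()` is a hypothetical helper written only for illustration. It is not part of metasnf and is not claimed to match the package's internal logic. It simply maps a `processes`-style value to a worker count under the behaviour stated above.

```r
# Hypothetical helper (not part of metasnf): resolve a `processes`-style
# argument to a concrete number of workers.
resolve_processes <- function(processes = 1,
                              available = parallel::detectCores()) {
    # detectCores() may return NA on some platforms; fall back to one core
    if (is.na(available)) {
        available <- 1L
    }
    # "max" means use every available core
    if (identical(processes, "max")) {
        return(as.integer(available))
    }
    if (!is.numeric(processes) || length(processes) != 1 || processes < 1) {
        stop("`processes` must be a positive number or \"max\".")
    }
    # Requests above the core count are capped, with a warning
    if (processes > available) {
        warning("More processes requested than cores available; using ",
                available, ".")
        return(as.integer(available))
    }
    as.integer(processes)
}

# The core count is passed explicitly here so the calls are reproducible
n_sequential <- resolve_processes(1, available = 4)
n_all_cores <- resolve_processes("max", available = 8)
```

Passing `available` explicitly keeps the sketch deterministic. In practice the default `parallel::detectCores()` would supply the machine's core count.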

Value

By default, returns a one-element list: `cluster_solutions`, which is itself a list of cluster solution data frames, one for each of the provided data list subsamples. Setting `return_sim_mats` to TRUE extends the returned list with the final fused similarity matrices of those cluster solutions, should you require these objects for your own stability calculations.
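
As one illustration of a custom stability calculation, the sketch below computes a Rand index between two subsample solutions over the observations they share. The toy data frames stand in for two entries of `cluster_solutions`. Their column names (`uid`, `s1`) and the helper `rand_index_shared()` are assumptions made for this example, not part of the documented return structure.

```r
# Toy stand-ins for two cluster solution data frames from different
# subsamples. Column names `uid` and `s1` are assumed for illustration.
sub_a <- data.frame(
    uid = c("p1", "p2", "p3", "p4", "p5"),
    s1 = c(1, 1, 2, 2, 3)
)
sub_b <- data.frame(
    uid = c("p2", "p3", "p4", "p5", "p6"),
    s1 = c(2, 1, 1, 3, 3)
)

# Hypothetical helper: fraction of observation pairs on which the two
# solutions agree (both together or both apart), using shared observations.
rand_index_shared <- function(a, b, solution = "s1") {
    shared <- intersect(a$uid, b$uid)
    if (length(shared) < 2) {
        return(NA_real_)
    }
    labels_a <- a[match(shared, a$uid), solution]
    labels_b <- b[match(shared, b$uid), solution]
    # TRUE where a pair of observations falls in the same cluster
    same_a <- outer(labels_a, labels_a, "==")
    same_b <- outer(labels_b, labels_b, "==")
    # Count each unordered pair once
    pairs <- upper.tri(same_a)
    mean(same_a[pairs] == same_b[pairs])
}

stability <- rand_index_shared(sub_a, sub_b)
```

Because the index compares pair memberships rather than raw labels, it is unaffected by cluster labels being permuted between subsamples.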

Examples


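library(metasnf)

# Build a data list from mock data frames (subc_v, income, pubertal)
my_dl <- data_list(
    list(subc_v, "subcortical_volume", "neuroimaging", "continuous"),
    list(income, "household_income", "demographics", "continuous"),
    list(pubertal, "pubertal_status", "demographics", "continuous"),
    uid = "unique_id"
)

# Define the hyperparameter sets used to generate 5 cluster solutions
sc <- snf_config(my_dl, n_solutions = 5, max_k = 40)

# Draw 20 subsamples, each containing 85% of the observations
my_dl_subsamples <- subsample_dl(
    my_dl,
    n_subsamples = 20,
    subsample_fraction = 0.85
)

# Run the SNF clustering pipeline on every subsample
batch_subsample_results <- batch_snf_subsamples(
    my_dl_subsamples,
    sc
)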

metasnf documentation built on June 8, 2025, 12:47 p.m.