snf_config | R Documentation |
snf_config() constructs an SNF config object which inherits from classes
snf_config and list. This object is used to store all settings required to
transform data stored in a data_list class object into a space of cluster
solutions by SNF. The SNF config object contains the following components:
1. A settings data frame (inherits from settings_df and data.frame).
Data frame that stores SNF-specific hyperparameters and information
about feature selection and weighting, SNF schemes, clustering
algorithms, and distance metrics. Each row of the settings data frame
corresponds to a distinct cluster solution.
2. A clustering algorithms list (inherits from clust_fns_list and list),
which stores all clustering algorithms that the settings data frame can
point to.
3. A distance metrics list (inherits from dist_metrics_list and list),
which stores all distance metrics that the settings data frame can
point to.
4. A weights matrix (inherits from weights_matrix, matrix, and array),
which stores the feature weights to use prior to distance calculations.
Each column of the weights matrix corresponds to a different feature in
the data list and each row corresponds to a different row in the
settings data frame.
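The row and column layout of the weights matrix can be pictured with a small base R sketch. This is not the package's internal code: build_weights() and the feature names are hypothetical, and the use of runif() and rexp() with their default parameters is an assumption made only to mirror the three fill modes described under weights_fill.

```r
# Illustrative sketch only; build_weights() is a hypothetical helper.
# One row per settings data frame row, one column per feature.
build_weights <- function(n_rows, features,
                          fill = c("ones", "uniform", "exponential")) {
    fill <- match.arg(fill)
    n_cells <- n_rows * length(features)
    values <- switch(
        fill,
        ones = rep(1, n_cells),
        uniform = runif(n_cells),      # assumed default range of 0 to 1
        exponential = rexp(n_cells)    # assumed default rate of 1
    )
    matrix(
        values,
        nrow = n_rows,
        ncol = length(features),
        dimnames = list(NULL, features)
    )
}

set.seed(42)
wm_sketch <- build_weights(
    n_rows = 5,
    features = c("feature_a", "feature_b", "feature_c"),
    fill = "uniform"
)
dim(wm_sketch)
```

Here dim(wm_sketch) reports 5 rows (one per cluster solution) and 3 columns (one per feature).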
snf_config(
    dl = NULL,
    sdf = NULL,
    dfl = NULL,
    cfl = NULL,
    wm = NULL,
    n_solutions = 0,
    min_removed_inputs = 0,
    max_removed_inputs = length(dl) - 1,
    dropout_dist = "exponential",
    min_alpha = NULL,
    max_alpha = NULL,
    min_k = NULL,
    max_k = NULL,
    min_t = NULL,
    max_t = NULL,
    alpha_values = NULL,
    k_values = NULL,
    t_values = NULL,
    possible_snf_schemes = c(1, 2, 3),
    clustering_algorithms = NULL,
    continuous_distances = NULL,
    discrete_distances = NULL,
    ordinal_distances = NULL,
    categorical_distances = NULL,
    mixed_distances = NULL,
    snf_input_weights = NULL,
    snf_domain_weights = NULL,
    retry_limit = 10,
    cnt_dist_fns = NULL,
    dsc_dist_fns = NULL,
    ord_dist_fns = NULL,
    cat_dist_fns = NULL,
    mix_dist_fns = NULL,
    automatic_standard_normalize = FALSE,
    use_default_dist_fns = FALSE,
    clust_fns = NULL,
    use_default_clust_fns = FALSE,
    weights_fill = "ones"
)
dl |
A nested list of input data from |
sdf |
A |
dfl |
A |
cfl |
A |
wm |
A |
n_solutions |
Number of rows to generate for the settings data frame. |
min_removed_inputs |
The smallest number of input data frames that may be randomly removed. By default, 0. |
max_removed_inputs |
The largest number of input data frames that may be randomly removed. By default, this is 1 less than all the provided input data frames in the data list. |
dropout_dist |
Parameter controlling how the random removal of input data frames should occur. Can be "none" (no input data frames are randomly removed), "uniform" (uniformly sample between min_removed_inputs and max_removed_inputs to determine number of input data frames to remove), or "exponential" (pick number of input data frames to remove by sampling from min_removed_inputs to max_removed_inputs with an exponential distribution; the default). |
min_alpha |
The minimum value that the alpha hyperparameter can have.
The randomly assigned value of alpha for each row will be obtained by uniformly
sampling numbers between |
max_alpha |
The maximum value that the alpha hyperparameter can have.
See |
min_k |
The minimum value that the k hyperparameter can have.
The randomly assigned value of k for each row will be obtained by uniformly
sampling numbers between |
max_k |
The maximum value that the k hyperparameter can have.
See |
min_t |
The minimum value that the t hyperparameter can have.
The randomly assigned value of t for each row will be obtained by uniformly
sampling numbers between |
max_t |
The maximum value that the t hyperparameter can have.
See |
alpha_values |
A number or numeric vector of possible values
that alpha can take. Values are obtained by uniformly sampling the
vector. Cannot be used in conjunction with the |
k_values |
A number or numeric vector of possible values
that k can take. Values are obtained by uniformly sampling the
vector. Cannot be used in conjunction with the |
t_values |
A number or numeric vector of possible values
that t can take. Values are obtained by uniformly sampling the
vector. Cannot be used in conjunction with the |
possible_snf_schemes |
A vector containing the possible snf_schemes to uniformly randomly select from. By default, the vector contains all 3 possible schemes: c(1, 2, 3). 1 corresponds to the "individual" scheme, 2 corresponds to the "domain" scheme, and 3 corresponds to the "twostep" scheme. |
clustering_algorithms |
A list of clustering algorithms to uniformly randomly pick from when clustering. When not specified, randomly select between spectral clustering using the eigen-gap heuristic and spectral clustering using the rotation cost heuristic. See ?clust_fns_list for more details on running custom clustering algorithms. |
continuous_distances |
A vector of continuous distance metrics to use when a custom dist_fns_list is provided. |
discrete_distances |
A vector of discrete distance metrics to use when a custom dist_fns_list is provided. |
ordinal_distances |
A vector of ordinal distance metrics to use when a custom dist_fns_list is provided. |
categorical_distances |
A vector of categorical distance metrics to use when a custom dist_fns_list is provided. |
mixed_distances |
A vector of mixed distance metrics to use when a custom dist_fns_list is provided. |
snf_input_weights |
Nested list containing weights for when SNF is used to merge individual input measures (see ?generate_snf_weights). |
snf_domain_weights |
Nested list containing weights for when SNF is used to merge domains (see ?generate_snf_weights). |
retry_limit |
The maximum number of attempts to generate a novel row.
This function does not return matrices with identical rows. As the range of
requested possible settings tightens and the number of requested rows
increases, the risk of randomly generating a row that already exists
increases. If a new random row has matched an existing row |
cnt_dist_fns |
A named list of continuous distance metric functions. |
dsc_dist_fns |
A named list of discrete distance metric functions. |
ord_dist_fns |
A named list of ordinal distance metric functions. |
cat_dist_fns |
A named list of categorical distance metric functions. |
mix_dist_fns |
A named list of mixed distance metric functions. |
automatic_standard_normalize |
If TRUE, will automatically use standard normalization prior to calculation of any numeric distances. This parameter overrides all other distance functions list-related parameters. |
use_default_dist_fns |
If TRUE, prepend the base distance metrics (Euclidean distance for continuous, discrete, and ordinal data and Gower distance for categorical and mixed data) to the resulting distance metrics list. |
clust_fns |
A named list of clustering functions. |
use_default_clust_fns |
If TRUE, prepend the base clustering algorithms (spectral_eigen and spectral_rot, which apply spectral clustering and use the eigen-gap and rotation cost heuristics respectively for determining the number of clusters in the graph) to clust_fns. |
weights_fill |
String indicating what to populate generated rows with. Can be "ones" (default; fill matrix with 1), "uniform" (fill matrix with uniformly distributed random values), or "exponential" (fill matrix with exponentially distributed random values). |
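The two ways of constraining a hyperparameter described above (a minimum and maximum, or an explicit vector of values) can be sketched in base R. This is a minimal illustration, not the package's implementation: sample_k() is a hypothetical helper, k is assumed to be integer-valued, and it is assumed that the range arguments and the values argument are the pair that cannot be combined.

```r
# Illustrative sketch only; sample_k() is a hypothetical helper.
sample_k <- function(n, min_k = NULL, max_k = NULL, k_values = NULL) {
    range_given <- !is.null(min_k) || !is.null(max_k)
    # Assumption: a range and an explicit set of values are mutually exclusive
    if (range_given && !is.null(k_values)) {
        stop("Supply either min_k/max_k or k_values, not both.")
    }
    if (!is.null(k_values)) {
        candidates <- k_values
    } else {
        if (is.null(min_k) || is.null(max_k)) {
            stop("This sketch needs both min_k and max_k.")
        }
        # Assumption: k takes whole-number values within the range
        candidates <- seq(min_k, max_k)
    }
    # Index with sample.int() so a single candidate is handled correctly
    candidates[sample.int(length(candidates), n, replace = TRUE)]
}

set.seed(1)
k_from_range <- sample_k(5, min_k = 20, max_k = 40)
k_from_values <- sample_k(5, k_values = c(20, 30, 40))
```

Indexing through sample.int() avoids the base R behaviour in which sample(x, ...) treats a single number x as the range 1:x, which matters when only one candidate value is supplied.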
An snf_config class object.
# Simple random config for 5 cluster solutions
input_dl <- data_list(
    list(anxiety, "anxiety", "behaviour", "ordinal"),
    list(depress, "depressed", "behaviour", "ordinal"),
    uid = "unique_id"
)
my_sc <- snf_config(
    dl = input_dl,
    n_solutions = 5
)
# Specifying possible k range
my_sc <- snf_config(
    dl = input_dl,
    n_solutions = 5,
    min_k = 20,
    max_k = 40
)
# Random feature weights drawn from a uniform distribution
my_sc <- snf_config(
    dl = input_dl,
    n_solutions = 5,
    min_k = 20,
    max_k = 40,
    weights_fill = "uniform"
)
# Specifying custom pre-built clustering and distance functions
# - Random alternation between 2-cluster and 5-cluster solutions
# - When continuous or discrete data frames are being processed,
#   randomly alternate between standardized/normalized Euclidean
#   distance vs. regular Euclidean distance
my_sc <- snf_config(
    dl = input_dl,
    n_solutions = 5,
    min_k = 20,
    max_k = 40,
    weights_fill = "uniform",
    clust_fns = list(
        "two_cluster_spectral" = spectral_two,
        "five_cluster_spectral" = spectral_five
    ),
    cnt_dist_fns = list(
        "euclidean" = euclidean_distance,
        "std_nrm_euc" = sn_euclidean_distance
    ),
    dsc_dist_fns = list(
        "euclidean" = euclidean_distance,
        "std_nrm_euc" = sn_euclidean_distance
    )
)
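The retry_limit argument guards against duplicate rows when the space of possible settings is small. The base R sketch below illustrates the general idea under stated assumptions. It is not the package's implementation: draw_unique_rows() is a hypothetical helper, only two hyperparameters are drawn, and both stopping with an error at the limit and resetting the counter after each accepted row are assumptions.

```r
# Illustrative sketch only; draw_unique_rows() is a hypothetical helper.
draw_unique_rows <- function(n_rows, k_values, alpha_values, retry_limit = 10) {
    rows <- data.frame(k = numeric(0), alpha = numeric(0))
    attempts <- 0
    while (nrow(rows) < n_rows) {
        candidate <- data.frame(
            k = k_values[sample.int(length(k_values), 1)],
            alpha = alpha_values[sample.int(length(alpha_values), 1)]
        )
        is_duplicate <- any(
            rows$k == candidate$k & rows$alpha == candidate$alpha
        )
        if (is_duplicate) {
            attempts <- attempts + 1
            # Assumption: give up once the limit of consecutive retries is hit
            if (attempts >= retry_limit) {
                stop("Retry limit reached before enough unique rows were found.")
            }
        } else {
            rows <- rbind(rows, candidate)
            attempts <- 0  # Assumption: the counter resets after each new row
        }
    }
    rows
}

set.seed(1)
unique_rows <- draw_unique_rows(
    n_rows = 4,
    k_values = c(20, 30),
    alpha_values = c(0.5, 0.8),
    retry_limit = 1000
)
```

With two values of k and two of alpha there are only four distinct rows, so requesting a fifth would exhaust any retry limit in this sketch. This is the situation the retry_limit description warns about as the range of settings tightens and the number of requested rows grows.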