View source: R/nTARP_complete_with_contextual_variable.R
| nTARP_complete_solution_with_contextual_variable | R Documentation |
#' @keywords internal
nTARP_complete_solution_with_contextual_variable(
data,
number_of_projections,
withinss_threshold,
ids,
contextual_variable,
minimum_cluster_size_percent
)
| data | Numeric matrix: dataset to be clustered using 'nTARP' |
| number_of_projections | Numeric: number of random projections for 'nTARP' to try for each run (usually 1000 to start) |
| withinss_threshold | Numeric: maximum value defining what a "quality cluster" is, based on the solution's normalized within-cluster sum of squares (typically 0.36) |
| ids | Numeric or character vector: identifying labels for individuals in the clusters |
| contextual_variable | Integer or character vector: variable to use as the basis for comparing clusters |
| minimum_cluster_size_percent | Numeric: minimum size allowable for a cluster to be further bisected (as a percentage) |
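To make the roles of number_of_projections and withinss_threshold concrete, the following is a minimal sketch of scoring candidate random projections. It is not the package's code: the helper score_one_projection is hypothetical, and it assumes that "normalized within-cluster sum of squares" means the 2-cluster within-cluster sum of squares divided by the total sum of squares of the projected values.

```r
# Hypothetical helper (not part of the package): project the data onto one
# random direction, split the projected values into two groups with k-means,
# and report the within-cluster sum of squares as a fraction of the total.
score_one_projection <- function(data) {
  direction <- rnorm(ncol(data))               # random direction in feature space
  projected <- as.vector(data %*% direction)   # one value per row
  km <- kmeans(projected, centers = 2, nstart = 5)
  list(
    cluster = km$cluster,
    normalized_withinss = km$tot.withinss / km$totss  # assumed normalization
  )
}

set.seed(1)
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
)

# Try 50 projections (the documentation suggests about 1000 in practice).
scores <- vapply(
  1:50,
  function(i) score_one_projection(toy)$normalized_withinss,
  numeric(1)
)

# Projections at or below the threshold would count as "quality" candidates.
quality <- scores <= 0.36
```

Under this assumed normalization, lower scores indicate tighter, better separated groups along the projected direction.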
Applies 'nTARP' repeatedly to bisect a dataset until a minimum cluster size threshold is reached, using a contextual variable to select the best split at each step.
At each step, the algorithm evaluates candidate splits based on improvements in class purity of the contextual variable (e.g., Gini reduction). The split that maximizes purity gain is retained.
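As an illustration of a purity gain, the sketch below computes the reduction in Gini impurity of a contextual variable produced by a two-way split. The function names gini_impurity and gini_gain are hypothetical and are not taken from the package; the package may compute its gain differently.

```r
# Gini impurity of a label vector: 1 minus the sum of squared class proportions.
gini_impurity <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Gain of a split: parent impurity minus the size-weighted impurity of the parts.
gini_gain <- function(labels, cluster) {
  n <- length(labels)
  parts <- split(labels, cluster)
  weighted <- sum(vapply(
    parts,
    function(part) length(part) / n * gini_impurity(part),
    numeric(1)
  ))
  gini_impurity(labels) - weighted
}

labels <- c("a", "a", "a", "b", "b", "b")

# A split that separates the two classes perfectly recovers all of the
# parent impurity (0.5 for two balanced classes).
perfect_gain <- gini_gain(labels, c(1, 1, 1, 2, 2, 2))

# A split that mixes the classes in both halves yields a much smaller gain.
mixed_gain <- gini_gain(labels, c(1, 2, 1, 2, 1, 2))
```

Among several candidate splits, the one with the largest gain would be the one retained under this criterion.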
The process continues recursively, bisecting the largest eligible cluster, until no remaining cluster is large enough to be bisected further, that is, until every cluster is smaller than the user-defined minimum size threshold.
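The outer loop can be sketched as follows. This is an assumption-laden illustration, not the package's implementation: bisect_until_small and kmeans_split are hypothetical names, a plain 2-means split stands in for the 'nTARP' step, the selection by contextual variable is omitted, and the percentage is assumed to be relative to the number of rows in the full dataset.

```r
# Hypothetical sketch: repeatedly bisect the largest cluster that is still
# large enough to split, and stop when no cluster is eligible.
bisect_until_small <- function(data, minimum_cluster_size_percent, split_fun) {
  # Require at least 3 rows so that a two-way split is always well defined.
  min_size <- max(3, ceiling(nrow(data) * minimum_cluster_size_percent / 100))
  clusters <- list(seq_len(nrow(data)))   # start with one cluster of all rows

  repeat {
    sizes <- lengths(clusters)
    eligible <- which(sizes >= min_size)
    if (length(eligible) == 0) break      # nothing left that may be bisected

    target <- eligible[which.max(sizes[eligible])]   # largest eligible cluster
    rows <- clusters[[target]]
    assignment <- split_fun(data[rows, , drop = FALSE])
    halves <- unname(split(rows, assignment))
    if (length(halves) < 2) break         # the splitter failed to bisect

    clusters <- c(clusters[-target], halves)
  }
  clusters
}

# Stand-in splitter: 2-means on the raw data instead of 'nTARP'.
kmeans_split <- function(x) kmeans(x, centers = 2, nstart = 5)$cluster

set.seed(42)
toy <- matrix(rnorm(200), ncol = 2)
result <- bisect_until_small(
  toy,
  minimum_cluster_size_percent = 30,
  split_fun = kmeans_split
)
cluster_sizes <- lengths(result)
```

Each element of result holds the row indices of one final cluster; every row appears in exactly one cluster, and every cluster is smaller than the minimum size.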
A list containing: (1) Complete solutions (i.e., outputs from the 'nTARP' function), (2) Clusters with the best gains identified using the 'pull_best_solution_and_gain' function, (3) Within-cluster sum of squares for each solution, (4) Gains for each solution.