nTARP_complete_solution_with_contextual_variable: Run nTARP repeatedly in a bisecting fashion (using contextual...

View source: R/nTARP_complete_with_contextual_variable.R

nTARP_complete_solution_with_contextual_variable    R Documentation

Run nTARP repeatedly in a bisecting fashion (using contextual variable)

Description

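Runs 'nTARP' repeatedly in a bisecting fashion, using a contextual variable to choose among candidate splits. This function is marked as internal.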

Usage

nTARP_complete_solution_with_contextual_variable(
  data,
  number_of_projections,
  withinss_threshold,
  ids,
  contextual_variable,
  minimum_cluster_size_percent
)

Arguments

data

Numeric matrix — dataset to be clustered using 'nTARP'

number_of_projections

Numeric — number of random projections for 'nTARP' to try for each run (usually 1000 to start)

withinss_threshold

Numeric — maximum value defining what a "quality cluster" is, based on the solution's normalized within-cluster sum of squares (typically 0.36)

ids

Numeric or character vector — identifying labels for individuals in the clusters

contextual_variable

Vector of integers or characters — variable to use as the basis for comparing clusters

minimum_cluster_size_percent

Numeric — minimum size allowable for a cluster to be further bisected (as a percentage)

Details

Repeatedly applies 'nTARP' to iteratively bisect a dataset until a minimum cluster size threshold is reached, using a contextual variable to select optimal splits.

At each step, the algorithm evaluates candidate splits based on improvements in class purity of the contextual variable (e.g., Gini reduction). The split that maximizes purity gain is retained.
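
As an illustration of the purity criterion described above, the following sketch computes a Gini-based gain for a two-way split. The helper names (gini_impurity, gini_gain) and the toy labels are hypothetical, and the formula is the standard Gini impurity decrease; the exact gain used inside the package is not shown here and may differ.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
gini_impurity <- function(labels) {
  if (length(labels) == 0) return(0)
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Purity gain of a two-way split: parent impurity minus the
# size-weighted impurity of the two child clusters (coded 1 and 2).
gini_gain <- function(labels, cluster) {
  stopifnot(length(labels) == length(cluster))
  left  <- labels[cluster == 1]
  right <- labels[cluster == 2]
  n <- length(labels)
  gini_impurity(labels) -
    (length(left) / n) * gini_impurity(left) -
    (length(right) / n) * gini_impurity(right)
}

# Hypothetical contextual variable and two candidate bisections.
contextual  <- c("a", "a", "a", "b", "b", "b")
split_pure  <- c(1, 1, 1, 2, 2, 2)  # separates "a" from "b" exactly
split_mixed <- c(1, 2, 1, 2, 1, 2)  # leaves both classes in each cluster

gains <- c(pure  = gini_gain(contextual, split_pure),
           mixed = gini_gain(contextual, split_mixed))

# Retain the candidate with the largest gain.
best <- names(which.max(gains))
```

Under this criterion the split that separates the two classes completely has the larger gain, so it is the one that would be retained.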

The process continues recursively, always bisecting the largest eligible cluster, until no remaining cluster is large enough to meet the user-defined minimum size for further bisection.
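
The control flow of this bisecting loop can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: bisect_largest_first and random_projection_split are hypothetical names, the splitter is a simple stand-in for one 'nTARP' run (random one-dimensional projections followed by 2-means) that ignores the contextual variable, the within-cluster sum of squares is normalized as tot.withinss / totss, and the minimum size is assumed to be a percentage of the total number of rows.

```r
# Stand-in for one 'nTARP' run (an assumption, not the package's code):
# try several random 1-D projections, run 2-means on each, and return the
# cluster labels (1 or 2) of the best projection whose normalized
# within-cluster sum of squares is at or below the threshold, else NULL.
random_projection_split <- function(x, n_proj = 50, threshold = 0.36) {
  best <- NULL
  best_w <- threshold
  for (i in seq_len(n_proj)) {
    projected <- as.vector(x %*% rnorm(ncol(x)))
    km <- kmeans(projected, centers = 2)
    w <- km$tot.withinss / km$totss
    if (w <= best_w) {
      best_w <- w
      best <- km$cluster
    }
  }
  best
}

# Bisect the largest eligible cluster until none is large enough.
# Assumes the minimum size is large enough (at least 3 rows) for kmeans.
bisect_largest_first <- function(x, split_fun, min_size_percent) {
  min_size <- nrow(x) * min_size_percent / 100
  membership <- rep(1L, nrow(x))
  closed <- integer(0)  # clusters for which no acceptable split was found
  repeat {
    sizes <- table(membership)
    eligible <- sizes[sizes >= min_size & !(names(sizes) %in% closed)]
    if (length(eligible) == 0) break
    target <- as.integer(names(eligible)[which.max(eligible)])
    rows <- which(membership == target)
    halves <- split_fun(x[rows, , drop = FALSE])
    if (is.null(halves)) {
      closed <- c(closed, target)
    } else {
      # Rows in the second half receive a new cluster label.
      membership[rows[halves == 2]] <- max(membership) + 1L
    }
  }
  membership
}

# Toy data: two well-separated groups of 100 rows each.
set.seed(1)
x <- rbind(matrix(rnorm(100 * 5, mean = 0), ncol = 5),
           matrix(rnorm(100 * 5, mean = 6), ncol = 5))
membership <- bisect_largest_first(x, random_projection_split,
                                   min_size_percent = 25)
table(membership)
```

In the contextual-variable version, the splitter would additionally rank acceptable candidate splits by purity gain rather than by within-cluster sum of squares alone.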

Value

A list containing: (1) Complete solutions (i.e., outputs from the 'nTARP' function), (2) Clusters with the best gains identified using the 'pull_best_solution_and_gain' function, (3) Within-cluster sum of squares for each solution, (4) Gains for each solution.
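
A call might look like the sketch below, which uses only the arguments listed under Usage. The simulated data, the id format, and the argument values are illustrative assumptions; in particular, minimum_cluster_size_percent is assumed to be on a 0 to 100 scale. Because the function is marked internal, it may not be exported, so the sketch retrieves it with utils::getFromNamespace() and runs only if the package is installed.

```r
set.seed(42)
n <- 200

# Simulated numeric matrix with two shifted groups (illustrative only).
data <- rbind(matrix(rnorm((n / 2) * 10, mean = 0), ncol = 10),
              matrix(rnorm((n / 2) * 10, mean = 4), ncol = 10))
ids <- sprintf("id_%03d", seq_len(n))
contextual_variable <- rep(c("group_a", "group_b"), each = n / 2)

if (requireNamespace("nTARP", quietly = TRUE)) {
  # Works whether or not the function is exported from the namespace.
  run_ntarp <- utils::getFromNamespace(
    "nTARP_complete_solution_with_contextual_variable", "nTARP"
  )
  result <- run_ntarp(
    data = data,
    number_of_projections = 1000,       # suggested starting value
    withinss_threshold = 0.36,          # typical value per the argument notes
    ids = ids,
    contextual_variable = contextual_variable,
    minimum_cluster_size_percent = 20   # assumed 0 to 100 scale
  )
  # Inspect the top-level structure of the returned list.
  str(result, max.level = 1)
}
```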


nTARP documentation built on March 20, 2026, 5:09 p.m.