In topolow: Force-Directed Euclidean Embedding of Dissimilarity Data

initial_parameter_optimization

R Documentation

Parameter Space Sampling and Optimization Functions for topolow

Description

Performs parameter optimization using Latin Hypercube Sampling (LHS) combined with k-fold cross-validation. Parameters are sampled from the specified ranges using a maximin LHS design to ensure good coverage of the parameter space. Each parameter set is evaluated using k-fold cross-validation to assess prediction accuracy. To calculate one negative log-likelihood (NLL) per parameter set, the function uses a pooled-errors approach: it combines the validation errors from all folds into one set and then calculates a single NLL. This approach has two main advantages:

1. It treats all validation errors equally, respecting the underlying error distribution assumption.
2. It properly accounts for the total number of validation points.
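
The pooled-errors idea can be illustrated with a minimal sketch. The page does not state which error distribution topolow assumes, so the zero-mean Gaussian likelihood, the helper name gaussian_nll, and the error values below are all assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical helper: Gaussian negative log-likelihood of a vector of errors,
# using the maximum-likelihood estimate of the standard deviation.
# The zero-mean Gaussian form is an assumption made for this illustration.
gaussian_nll <- function(errors) {
  n <- length(errors)
  sigma <- sqrt(mean(errors^2))  # MLE of the sd for zero-mean errors
  n * log(sigma) + n / 2 * log(2 * pi) + sum(errors^2) / (2 * sigma^2)
}

# Made-up validation errors from three folds of unequal size
fold_errors <- list(
  c(0.5, -0.2, 0.1),
  c(1.1, -0.9),
  c(0.0, 0.3, -0.4, 0.2)
)

# Pooled approach: combine all errors first, then compute one NLL and one MAE
pooled_errors <- unlist(fold_errors)
pooled_nll <- gaussian_nll(pooled_errors)
pooled_mae <- mean(abs(pooled_errors))

# For contrast: a per-fold calculation fits a separate error scale to each fold
per_fold_nll <- vapply(fold_errors, gaussian_nll, numeric(1))
```

Because the pooled calculation uses one error scale for all nine points, every validation point contributes on the same footing regardless of which fold it came from.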

Note: As of version 2.0.0, this function returns log-transformed parameters directly, eliminating the need to call log_transform_parameters() separately.

Usage

initial_parameter_optimization(
  dissimilarity_matrix,
  mapping_max_iter = 1000,
  relative_epsilon,
  convergence_counter,
  scenario_name,
  N_min,
  N_max,
  k0_min,
  k0_max,
  c_repulsion_min,
  c_repulsion_max,
  cooling_rate_min,
  cooling_rate_max,
  num_samples = 20,
  max_cores = NULL,
  folds = 20,
  verbose = FALSE,
  write_files = FALSE,
  output_dir
)

Arguments

`dissimilarity_matrix`	Matrix. Input dissimilarity matrix. Must be square and symmetric.
`mapping_max_iter`	Integer. Maximum number of optimization iterations for each map.
`relative_epsilon`	Numeric. Convergence threshold for relative change in error.
`convergence_counter`	Integer. Number of iterations below threshold before declaring convergence.
`scenario_name`	Character. Name for output files and job identification.
`N_min`, `N_max`	Integer. Range for the number of dimensions parameter.
`k0_min`, `k0_max`	Numeric. Range for the initial spring constant parameter.
`c_repulsion_min`, `c_repulsion_max`	Numeric. Range for the repulsion constant parameter.
`cooling_rate_min`, `cooling_rate_max`	Numeric. Range for the cooling rate parameter.
`num_samples`	Integer. Number of LHS samples to generate. Default: 20.
`max_cores`	Integer. Maximum number of cores for parallel processing. Default: NULL (uses all but one).
`folds`	Integer. Number of cross-validation folds. Default: 20.
`verbose`	Logical. Whether to print progress messages. Default: FALSE.
`write_files`	Logical. Whether to save results to a CSV file. Default: FALSE.
`output_dir`	Character. Directory for output files. Required if `write_files` is TRUE.

Details

Initial Parameter Optimization using Latin Hypercube Sampling

The function performs these steps:

1. Generates LHS samples in the parameter space (original scale for sampling).
2. Creates k-fold splits of the input data.
3. For each parameter set, it trains the model on each fold's training set and evaluates on the validation set, calculating a pooled MAE and NLL across all folds.
4. Computations are run locally in parallel.
5. NEW: Automatically log-transforms the final results for direct use with adaptive sampling.
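
The sampling and fold-assignment steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the function's actual implementation: it uses lhs::maximinLHS when the lhs package is installed and otherwise falls back to a plain (non-maximin) Latin hypercube in base R. The rounding of N, the fold count of 5, and the count of 30 observed dissimilarities are hypothetical choices.

```r
set.seed(42)

# Parameter ranges taken from the Examples section of this page
ranges <- data.frame(
  param = c("N", "k0", "c_repulsion", "cooling_rate"),
  min   = c(2, 1, 0.001, 0.001),
  max   = c(5, 10, 0.05, 0.02)
)
num_samples <- 4

# Unit-cube LHS design: maximin if the 'lhs' package is available,
# otherwise a plain Latin hypercube (one point per stratum in each dimension)
if (requireNamespace("lhs", quietly = TRUE)) {
  unit_design <- lhs::maximinLHS(num_samples, nrow(ranges))
} else {
  unit_design <- vapply(seq_len(nrow(ranges)), function(j) {
    (sample(num_samples) - runif(num_samples)) / num_samples
  }, numeric(num_samples))
}

# Rescale each column from [0, 1] to its parameter range (original scale)
samples <- sweep(unit_design, 2, ranges$max - ranges$min, FUN = "*")
samples <- sweep(samples, 2, ranges$min, FUN = "+")
colnames(samples) <- ranges$param
samples <- as.data.frame(samples)
samples$N <- round(samples$N)  # assumed: dimensions are rounded to whole numbers

# Assign each observed dissimilarity to one of the validation folds
folds <- 5    # hypothetical fold count for this sketch
n_obs <- 30   # hypothetical number of observed dissimilarities
fold_id <- sample(rep(seq_len(folds), length.out = n_obs))
```

Each row of samples is one candidate parameter set; each value of fold_id marks the fold in which the corresponding observation would be held out for validation.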

Value

A data.frame containing the log-transformed parameter sets and their performance metrics. Columns include: log_N, log_k0, log_cooling_rate, log_c_repulsion, Holdout_MAE, and NLL.
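
As a rough illustration of working with the returned data frame, the sketch below ranks parameter sets by NLL and back-transforms the best one. The data frame is a mock with invented values that only mirrors the documented column names, and the use of exp() assumes the columns hold natural logarithms, which this page does not state explicitly.

```r
# Mock result with the documented columns; all values are invented
results <- data.frame(
  log_N            = log(c(2, 3, 4, 5)),
  log_k0           = log(c(1.5, 4, 7, 9)),
  log_cooling_rate = log(c(0.002, 0.005, 0.01, 0.018)),
  log_c_repulsion  = log(c(0.004, 0.01, 0.02, 0.04)),
  Holdout_MAE      = c(1.20, 0.85, 0.90, 1.10),
  NLL              = c(310.5, 262.1, 270.8, 298.3)
)

# Rank parameter sets by NLL (lower is better)
ranked <- results[order(results$NLL), ]

# Back-transform the best set to the original scale (assumes natural logs)
log_cols <- c("log_N", "log_k0", "log_cooling_rate", "log_c_repulsion")
best <- exp(unlist(ranked[1, log_cols]))
names(best) <- sub("^log_", "", names(best))
best
```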

Note

Breaking Change in v2.0.0: This function now returns log-transformed parameters directly. The returned data frame has columns log_N, log_k0, log_cooling_rate, log_c_repulsion instead of the original scale parameters. This eliminates the need to call log_transform_parameters() separately before using run_adaptive_sampling().

Breaking Change in v2.0.0: The parameter distance_matrix has been renamed to dissimilarity_matrix. Please update your code accordingly.
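
For scripts that build their argument lists programmatically, the rename might be handled along these lines. The old_args list and its contents are hypothetical and serve only to show the renaming step.

```r
# Hypothetical pre-2.0.0 argument list stored by a user's script
old_args <- list(
  distance_matrix = matrix(c(0, 1, 1, 0), nrow = 2),
  scenario_name   = "legacy_run"
)

# Rename the argument to match the v2.0.0 signature
new_args <- old_args
names(new_args)[names(new_args) == "distance_matrix"] <- "dissimilarity_matrix"

# new_args can then be completed with the remaining required arguments
# and passed on via do.call(initial_parameter_optimization, new_args)
names(new_args)
```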

Examples


# This example can exceed 5 seconds on some systems.
library(topolow)

# 1. Create a simple synthetic dataset for the example
set.seed(123)  # make the random coordinates reproducible
synth_coords <- matrix(rnorm(60), nrow = 20, ncol = 3)
dist_mat <- coordinates_to_matrix(synth_coords)

# 2. Run the optimization on the synthetic data
results <- initial_parameter_optimization(
  dissimilarity_matrix = dist_mat,
  mapping_max_iter = 100,
  relative_epsilon = 1e-3,
  convergence_counter = 2,
  scenario_name = "test_opt_synthetic",
  N_min = 2, N_max = 5,
  k0_min = 1, k0_max = 10,
  c_repulsion_min = 0.001, c_repulsion_max = 0.05,
  cooling_rate_min = 0.001, cooling_rate_max = 0.02,
  num_samples = 4,
  max_cores = 1,  # Avoid parallel processing in check environment
  verbose = FALSE
)

topolow documentation built on Aug. 31, 2025, 1:07 a.m.