initial_parameter_optimization: Parameter Space Sampling and Optimization Functions for...

View source: R/adaptive_sampling.R

initial_parameter_optimization    R Documentation

Parameter Space Sampling and Optimization Functions for topolow

Description

Performs parameter optimization using Latin Hypercube Sampling (LHS) combined with k-fold cross-validation. Parameters are sampled from the specified ranges using a maximin LHS design to ensure good coverage of the parameter space. Each parameter set is evaluated with k-fold cross-validation to assess prediction accuracy. To obtain a single negative log-likelihood (NLL) per parameter set, the function uses a pooled-errors approach: it combines the validation errors from all folds into one set and then calculates one NLL from that set. This approach has two main advantages:

  1. It treats all validation errors equally, respecting the underlying error distribution assumption.

  2. It properly accounts for the total number of validation points.
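The pooled-errors idea can be illustrated with a minimal base R sketch. The fold errors below are made-up numbers, and the Laplace error model (with its scale fixed at the MAE) is an assumption made for illustration only; this page does not state which error distribution the package uses.

```r
# Hypothetical validation errors (predicted minus observed) from three folds.
fold_errors <- list(
  c(0.5, -1.0, 0.2),
  c(1.5, -0.3),
  c(-0.8, 0.4, 0.1, -0.6)
)

# ASSUMPTION: Laplace error model. With the scale set to its
# maximum-likelihood value (the MAE), the negative log-likelihood
# of n errors is n * (1 + log(2 * MAE)).
laplace_nll <- function(errors) {
  mae <- mean(abs(errors))
  length(errors) * (1 + log(2 * mae))
}

# Pooled approach: combine all validation errors, then compute one NLL.
pooled_errors <- unlist(fold_errors)
pooled_mae <- mean(abs(pooled_errors))
pooled_nll <- laplace_nll(pooled_errors)

# For contrast: one NLL per fold, each with its own scale, then summed.
per_fold_nll_sum <- sum(vapply(fold_errors, laplace_nll, numeric(1)))

print(c(pooled = pooled_nll, per_fold_sum = per_fold_nll_sum))
```

In the pooled version every validation error contributes under one shared scale, whereas the per-fold version fits a separate scale to each fold, so the two numbers generally differ.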

Note: As of version 2.0.0, this function returns log-transformed parameters directly, eliminating the need to call log_transform_parameters() separately.

Usage

initial_parameter_optimization(
  dissimilarity_matrix,
  mapping_max_iter = 1000,
  relative_epsilon,
  convergence_counter,
  scenario_name,
  N_min,
  N_max,
  k0_min,
  k0_max,
  c_repulsion_min,
  c_repulsion_max,
  cooling_rate_min,
  cooling_rate_max,
  num_samples = 20,
  max_cores = NULL,
  folds = 20,
  verbose = FALSE,
  write_files = FALSE,
  output_dir
)

Arguments

dissimilarity_matrix

Matrix. Input dissimilarity matrix. Must be square and symmetric.

mapping_max_iter

Integer. Maximum number of optimization iterations for each map.

relative_epsilon

Numeric. Convergence threshold for relative change in error.

convergence_counter

Integer. Number of iterations below threshold before declaring convergence.

scenario_name

Character. Name for output files and job identification.

N_min, N_max

Integer. Range for the number of dimensions parameter.

k0_min, k0_max

Numeric. Range for the initial spring constant parameter.

c_repulsion_min, c_repulsion_max

Numeric. Range for the repulsion constant parameter.

cooling_rate_min, cooling_rate_max

Numeric. Range for the cooling rate parameter.

num_samples

Integer. Number of LHS samples to generate. Default: 20.

max_cores

Integer. Maximum number of cores for parallel processing. Default: NULL (uses all but one).

folds

Integer. Number of cross-validation folds. Default: 20.

verbose

Logical. Whether to print progress messages. Default: FALSE.

write_files

Logical. Whether to save results to a CSV file. Default: FALSE.

output_dir

Character. Directory for output files. Required if write_files is TRUE.
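Because dissimilarity_matrix must be square and symmetric, it may help to check an input before starting a long run. The following is a minimal base R sketch; the toy coordinates are hypothetical, and the check is an illustration rather than the validation the function itself performs.

```r
# Hypothetical toy input: pairwise Euclidean distances between three 2-D points.
pts <- matrix(c(0, 0,
                3, 4,
                6, 8), ncol = 2, byrow = TRUE)
m <- as.matrix(dist(pts))

# Square: same number of rows and columns.
is_square <- is.matrix(m) && nrow(m) == ncol(m)

# Symmetric: m[i, j] equals m[j, i] for all i, j (dimnames ignored).
is_symmetric <- is_square && isSymmetric(unname(m))

if (!is_symmetric) {
  stop("dissimilarity_matrix must be square and symmetric")
}
```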

Details

Initial Parameter Optimization using Latin Hypercube Sampling

The function performs these steps:

  1. Generates LHS samples in the parameter space (original scale for sampling).

  2. Creates k-fold splits of the input data.

  3. For each parameter set, trains the model on each fold's training set, evaluates it on the corresponding validation set, and calculates a pooled MAE and NLL across all folds.

  4. Runs the computations locally in parallel (see max_cores).

  5. Log-transforms the final results automatically (new in v2.0.0) for direct use with run_adaptive_sampling().
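Steps 1 and 2 above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: simple_lhs() is a hypothetical helper that draws a plain random Latin hypercube (not the maximin variant the function uses), N is rounded to an integer as one possible way to discretize it, and the folds are assumed to partition the unique off-diagonal entries of the matrix.

```r
set.seed(1)

# Plain Latin hypercube in [0, 1]^d (NOT maximin): in every column,
# exactly one point falls in each of the n equal-width strata.
simple_lhs <- function(n, d) {
  sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
}

# Parameter ranges, taken from the example values on this page.
ranges <- data.frame(
  param = c("N", "k0", "c_repulsion", "cooling_rate"),
  min   = c(2, 1, 0.001, 0.001),
  max   = c(5, 10, 0.05, 0.02)
)

num_samples <- 4
unit <- simple_lhs(num_samples, nrow(ranges))

# Step 1: rescale each unit-interval column to its [min, max] range.
samples <- as.data.frame(
  sweep(sweep(unit, 2, ranges$max - ranges$min, "*"), 2, ranges$min, "+")
)
names(samples) <- ranges$param
samples$N <- round(samples$N)  # ASSUMPTION: N is discretized by rounding

# Step 2: assign the unique off-diagonal entries of a 6 x 6 matrix to 3 folds.
# ASSUMPTION: folds are formed over matrix entries (pairs of points).
n_points <- 6
folds <- 3
pairs <- which(upper.tri(matrix(0, n_points, n_points)), arr.ind = TRUE)
fold_id <- sample(rep(seq_len(folds), length.out = nrow(pairs)))

print(samples)
print(table(fold_id))
```

Each row of samples is one candidate parameter set; in the real function, every such set is then evaluated across all folds as described in step 3.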

Value

A data.frame containing the log-transformed parameter sets and their performance metrics. Columns include: log_N, log_k0, log_cooling_rate, log_c_repulsion, Holdout_MAE, and NLL.
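A short sketch of how the returned data frame might be inspected follows. The results object here is a hand-made stand-in with the documented column names and invented values, and the back-transformation with exp() assumes the log_ columns hold natural logarithms, which this page does not state explicitly.

```r
# Hypothetical stand-in for the returned data frame; all values are made up.
results <- data.frame(
  log_N            = log(c(2, 3, 4, 5)),
  log_k0           = log(c(1.5, 4, 7, 9)),
  log_cooling_rate = log(c(0.002, 0.01, 0.015, 0.005)),
  log_c_repulsion  = log(c(0.004, 0.02, 0.03, 0.045)),
  Holdout_MAE      = c(1.20, 0.85, 0.95, 1.10),
  NLL              = c(310.5, 262.1, 280.4, 301.7)
)

# Pick the parameter set with the lowest NLL.
best <- results[which.min(results$NLL), ]

# ASSUMPTION: the log_ columns are natural logs, so exp() recovers
# the original-scale parameter values.
log_cols <- grep("^log_", names(results), value = TRUE)
best_original <- exp(unlist(best[log_cols]))
names(best_original) <- sub("^log_", "", log_cols)

print(best_original)
```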

Note

Breaking Change in v2.0.0: This function now returns log-transformed parameters directly. The returned data frame has columns log_N, log_k0, log_cooling_rate, log_c_repulsion instead of the original scale parameters. This eliminates the need to call log_transform_parameters() separately before using run_adaptive_sampling().

Breaking Change in v2.0.0: The parameter distance_matrix has been renamed to dissimilarity_matrix. Please update your code accordingly.

See Also

euclidean_embedding for the core optimization algorithm.

Examples


# This example can exceed 5 seconds on some systems.
library(topolow)

# 1. Create a simple synthetic dataset for the example
set.seed(123)  # make the random coordinates reproducible
synth_coords <- matrix(rnorm(60), nrow = 20, ncol = 3)
dist_mat <- coordinates_to_matrix(synth_coords)

# 2. Run the optimization on the synthetic data
results <- initial_parameter_optimization(
  dissimilarity_matrix = dist_mat,
  mapping_max_iter = 100,
  relative_epsilon = 1e-3,
  convergence_counter = 2,
  scenario_name = "test_opt_synthetic",
  N_min = 2, N_max = 5,
  k0_min = 1, k0_max = 10,
  c_repulsion_min = 0.001, c_repulsion_max = 0.05,
  cooling_rate_min = 0.001, cooling_rate_max = 0.02,
  num_samples = 4,
  max_cores = 1,  # Avoid parallel processing in check environment
  verbose = FALSE
)



topolow documentation built on Aug. 31, 2025, 1:07 a.m.