Euclidify: Automatic Euclidean Embedding with Parameter Optimization

View source: R/core.R

Euclidify    R Documentation

Description

A user-friendly wrapper function that automatically optimizes parameters and performs Euclidean embedding on a dissimilarity matrix. This function handles the entire workflow from parameter optimization to final embedding.

Usage

Euclidify(
  dissimilarity_matrix,
  output_dir,
  ndim_range = c(2, 10),
  k0_range = c(0.1, 20),
  cooling_rate_range = c(1e-04, 0.1),
  c_repulsion_range = c(1e-04, 1),
  n_initial_samples = 50,
  n_adaptive_samples = 150,
  max_cores = NULL,
  folds = 20,
  mapping_max_iter = 500,
  clean_intermediate = TRUE,
  verbose = "standard",
  fallback_to_defaults = FALSE,
  save_results = FALSE
)

Arguments

dissimilarity_matrix

Square, symmetric dissimilarity matrix. Entries may be NA for missing measurements, and may carry threshold indicators (values prefixed with < or >, for example ">2").

output_dir

Character. Directory for saving optimization files and results. Required; no default.

ndim_range

Integer vector of length 2. Range for number of dimensions (minimum, maximum). Default: c(2, 10)

k0_range

Numeric vector of length 2. Range for initial spring constant (minimum, maximum). Default: c(0.1, 20)

cooling_rate_range

Numeric vector of length 2. Range for cooling rate (minimum, maximum). Default: c(1e-04, 0.1)

c_repulsion_range

Numeric vector of length 2. Range for repulsion constant (minimum, maximum). Default: c(1e-04, 1)

n_initial_samples

Integer. Number of samples for initial parameter optimization. Default: 50

n_adaptive_samples

Integer. Number of samples for adaptive refinement. Default: 150

max_cores

Integer. Maximum number of cores to use. Default: NULL (auto-detect)

folds

Integer. Number of cross-validation folds. Default: 20

mapping_max_iter

Integer. Maximum iterations for final embedding. Half this value is used for parameter search. Default: 500

clean_intermediate

Logical. Whether to remove intermediate files. Default: TRUE

verbose

Character. Verbosity level: "off" (no output), "standard" (progress updates), or "full" (detailed output including from internal functions). Default: "standard"

fallback_to_defaults

Logical. Whether to use default parameters if optimization fails. Default: FALSE

save_results

Logical. Whether to save the final positions as CSV. Default: FALSE
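
The requirements on dissimilarity_matrix (square, symmetric, optionally containing NA and threshold indicators) can be checked before starting a long optimization run. The following is a minimal base-R sketch, not part of topolow: the helper name check_dissimilarity_input is hypothetical, and it assumes threshold indicators are stored as character strings with a leading < or >, as in the examples on this page.

```r
# Hypothetical helper (not a topolow function): basic sanity checks on a
# matrix before it is passed to Euclidify().
check_dissimilarity_input <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))

  # Work on character values so numeric and threshold-coded matrices are
  # handled the same way.
  vals <- matrix(as.character(m), nrow = nrow(m))

  # Assumption: a threshold indicator is a value with a leading "<" or ">".
  is_threshold <- !is.na(vals) & grepl("^[<>]", vals)

  # Symmetry: each entry must match its mirror image, with NA matching NA.
  tvals <- t(vals)
  same <- (is.na(vals) & is.na(tvals)) |
    (!is.na(vals) & !is.na(tvals) & vals == tvals)

  list(
    n_points     = nrow(m),
    n_missing    = sum(is.na(vals)),
    n_threshold  = sum(is_threshold),
    is_symmetric = all(same)
  )
}

# Small made-up matrix with one missing pair and one thresholded pair.
m <- matrix(c("0",  ">2", NA,
              ">2", "0",  "3",
              NA,   "3",  "0"), nrow = 3, byrow = TRUE)
info <- check_dissimilarity_input(m)
```

A check like this only reports problems; how Euclidify itself treats asymmetric input or threshold values is not specified here.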

Value

A list containing:

positions

Matrix of optimized coordinates

est_distances

Matrix of estimated distances

mae

Mean absolute error

optimal_params

List of optimal parameters found, including cross-validation MAE during optimization

optimization_summary

Summary of the optimization process

data_characteristics

Summary of input data characteristics

runtime

Total runtime in seconds
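
How positions, est_distances, and mae might relate can be illustrated with base R alone. The sketch below rests on assumptions rather than on the package's code: it takes est_distances to be the pairwise Euclidean distances between rows of positions, and mae to be the mean absolute difference from the non-missing numeric input entries. The coordinates and observed values are made up for illustration.

```r
# Made-up coordinates standing in for result$positions (3 points, 2 dimensions).
positions <- matrix(c(0, 0,
                      3, 4,
                      6, 8), ncol = 2, byrow = TRUE,
                    dimnames = list(c("A", "B", "C"), c("dim1", "dim2")))

# Pairwise Euclidean distances between rows, as a full symmetric matrix.
est_distances <- as.matrix(dist(positions))

# Made-up observed dissimilarities with one missing pair (A, C).
observed <- matrix(c(0,  6, NA,
                     6,  0,  4,
                     NA, 4,  0), nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Compare each pair once (upper triangle) and skip missing entries.
pairs <- upper.tri(observed) & !is.na(observed)
mae <- mean(abs(observed[pairs] - est_distances[pairs]))
```

With these made-up values the two compared pairs each differ by 1, so mae is 1. Threshold entries are left out of the sketch because their treatment in the error calculation is not described in this page.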

Examples

# Example 1: Basic usage with small matrix
test_data <- data.frame(
  object = rep(paste0("Obj", 1:4), each = 4),
  reference = rep(paste0("Ref", 1:4), 4),
  score = sample(c(1, 2, 4, 8, 16, 32, 64, "<1", ">12"), 16, replace = TRUE)
)
dist_mat <- list_to_matrix(
  data = test_data,  # Pass the data frame, not file path
  object_col = "object",
  reference_col = "reference",
  value_col = "score",
  is_similarity = TRUE
)
## Not run: 
# Note: output_dir is required for actual use
result <- Euclidify(
  dissimilarity_matrix = dist_mat,
  output_dir = tempdir()  # Use temp directory for example
)
coordinates <- result$positions

## End(Not run)

# Example 2: Using reduced sample counts for a faster search
## Not run: 
result <- Euclidify(
  dissimilarity_matrix = dist_mat,
  output_dir = tempdir(),
  n_initial_samples = 10,
  n_adaptive_samples = 7,
  verbose = "off"
)

## End(Not run)

# Example 3: Handling missing data
dist_mat_missing <- dist_mat
dist_mat_missing[1, 3] <- dist_mat_missing[3, 1] <- NA
## Not run: 
result <- Euclidify(
  dissimilarity_matrix = dist_mat_missing,
  output_dir = tempdir(),
  n_initial_samples = 10,
  n_adaptive_samples = 7,
  verbose = "off"
)

## End(Not run)

# Example 4: Using threshold indicators
dist_mat_threshold <- dist_mat
dist_mat_threshold[1, 2] <- ">2"
dist_mat_threshold[2, 1] <- ">2"
## Not run: 
result <- Euclidify(
  dissimilarity_matrix = dist_mat_threshold,
  output_dir = tempdir(),
  n_initial_samples = 10,
  n_adaptive_samples = 7,
  verbose = "off"
)

## End(Not run)

# Example 5: Parallel processing with custom cores
## Not run: 
result <- Euclidify(
  dissimilarity_matrix = dist_mat,
  output_dir = tempdir(),
  max_cores = 4,
  n_adaptive_samples = 100,
  save_results = TRUE  # Save positions to CSV
)

## End(Not run)


topolow documentation built on Aug. 31, 2025, 1:07 a.m.