gesearch_control: Control parameters for gesearch

View source: R/gesearch_control.R

gesearch_controlR Documentation

Control parameters for gesearch

Description

Creates a control object specifying algorithm parameters for gesearch.

Usage

gesearch_control(
  retain_by = c("probability", "proportion"),
  percentile_type = 7L,
  tune = FALSE,
  number = 10L,
  p = 0.75,
  stagnation_limit = 5L,
  allow_parallel = TRUE,
  blas_threads = 1L
)

Arguments

retain_by

A character string specifying how training observations are selected at each iteration:

"probability"

(default) Retains observations with errors below a percentile estimated from a given probability. More robust to outliers.

"proportion"

Retains a fixed proportion of observations with lowest errors.

percentile_type

An integer between 1 and 9 specifying the quantile algorithm when retain_by = "probability". Passed to the type argument of quantile. Default is 7. Ignored when retain_by = "proportion".

tune

A logical indicating whether to tune regression parameters (e.g., number of PLS components) via cross-validation at each iteration. Increases computation time substantially. Default is FALSE.

number

An integer specifying the number of groups for leave-group-out cross-validation. Default is 10. This is used for validating the final models built the samples found. When tune = TRUE, this value is also used in the internal CV for tuning regression parameters at each iteration.

p

A numeric value in (0, 1) specifying the proportion of observations per group in leave-group-out cross-validation. Default is 0.75. When tune = TRUE, this value is also used in the internal CV for tuning regression parameters at each iteration.

stagnation_limit

An integer specifying the maximum number of consecutive iterations with no change in gene pool size before early termination. Prevents infinite loops when target size cannot be reached. Default is 5.

allow_parallel

A logical indicating whether to enable parallel processing for internal resampling and calibration. The parallel backend must be registered by the user. Default is TRUE.

blas_threads

An integer specifying the number of BLAS threads to use during computation. Default is 1, which avoids multi-threaded OpenBLAS overhead on Linux. Requires RhpcBLASctl. See Details.

Details

Retention strategies

When retain_by = "probability" (default), observations with errors below a percentile threshold are retained. The percentile is computed using quantile with probs set to the retain value from gesearch. This approach is more robust when outlier observations have extreme error values.

When retain_by = "proportion", a fixed fraction of observations (specified by the retain argument in gesearch) with the lowest associated errors are kept at each iteration.

Cross-validation for tuning

When tune = TRUE, leave-group-out cross-validation is used to select optimal regression parameters at each iteration. The number argument controls how many CV groups are formed, and p controls the proportion of observations in each group.

BLAS threading

On Linux systems with multi-threaded OpenBLAS, the default thread count can cause significant overhead for algorithms that perform many small matrix operations (like the iterative PLS fits in gesearch). Setting blas_threads = 1 (the default) eliminates this overhead.

This setting requires the RhpcBLASctl package. If not installed, the parameter is ignored and a message is displayed. The original thread count is restored when gesearch completes.

Value

A list of class "gesearch_control" containing the specified parameters.

Author(s)

Leonardo Ramirez-Lopez

References

Lobsey, C.R., Viscarra Rossel, R.A., Roudier, P., Hedley, C.B. 2017. rs-local data-mines information from spectral libraries to improve local calibrations. European Journal of Soil Science 68:840-852.

See Also

gesearch, quantile

Examples

# Default parameters (probability-based retention)
gesearch_control()

# Proportion-based retention
gesearch_control(retain_by = "proportion")

# Enable parameter tuning with custom CV settings
gesearch_control(tune = TRUE, number = 5, p = 0.8)


resemble documentation built on April 21, 2026, 1:07 a.m.