liblex: Build a precomputed library of localised experts using...
In resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics

liblex

R Documentation

Build a precomputed library of localised experts using memory-based learning

Description

Constructs a library of local predictive models based on memory-based learning (MBL). For each anchor observation, a local regression model is fitted using its nearest neighbors from the reference set. This implementation is based on the methods proposed in Ramirez-Lopez et al. (2026b).

Usage

liblex(Xr, Yr, neighbors,
       diss_method = diss_pca(ncomp = ncomp_by_opc()),
       fit_method = fit_wapls(min_ncomp = 3, max_ncomp = 15),
       anchor_indices = NULL, gh = TRUE, group = NULL,
       control = liblex_control(), verbose = TRUE, ...)

## S3 method for class 'liblex'
predict(object, newdata, diss_method = NULL,
        weighting = c("gaussian", "tricube", "triweight", "triangular",
                      "quartic", "parabolic", "cauchy", "none"),
        adaptive_bandwidth = TRUE, reliability_weighting = TRUE,
        range_prediction_limits = FALSE, residual_cutoff = NULL,
        enforce_indices = NULL, probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
        verbose = TRUE, allow_parallel = TRUE, blas_threads = 1L, ...)
        
## S3 method for class 'liblex'
plot(x, ...)

Arguments

`Xr`	A numeric matrix of predictor variables with dimensions `⁠n × p⁠` (observations in rows, variables in columns). Column names are required.
`Yr`	A numeric vector or single-column matrix of length `n` containing the response variable values corresponding to `Xr`. Missing values are allowed (see Details).
`neighbors`	A neighbor selection object specifying how to select neighbors. Use `neighbors_k()` for fixed k-nearest neighbors or `neighbors_diss()` for dissimilarity threshold-based selection.
`diss_method`	For `liblex`: either a `⁠diss_⁠` object specifying the dissimilarity method, or a precomputed numeric dissimilarity matrix. If a `⁠diss_⁠` object (e.g., `diss_pca()`, `diss_pls()`, `diss_correlation()`, `diss_euclidean()`), dissimilarities are computed internally. Additional parameters (centering, scaling, number of components) are controlled within the object. If a matrix is provided: When `anchor_indices = NULL`: must be square (`⁠n × n⁠`) with zeros on the diagonal. When `anchor_indices` is specified (length `m`): must have dimensions `⁠n × m⁠`, with `diss_method[anchor_indices, ]` having zeros on the diagonal. Default is `diss_pca(ncomp = ncomp_by_opc())`. For `predict.liblex`: a dissimilarity method object created by `diss_*()` constructors. If not provided, uses the method stored in `object`. Required if `object` was built with a precomputed dissimilarity matrix.
`fit_method`	A `local_fit` object specifying the local regression method. Currently supported: `fit_pls()` and `fit_wapls()`. `fit_gpr()` is not yet supported. Default is `fit_wapls(min_ncomp = 3, max_ncomp = 15)`.
`anchor_indices`	An optional integer vector specifying row indices of `Xr` around which to build local models. If `NULL` (default), models are built for all observations. See Details.
`gh`	Logical indicating whether to compute the GH distance (Mahalanobis distance in PLS score space) for each anchor observation. Default is `TRUE`.
`group`	An optional factor assigning group labels to observations in `Xr`. Used for leave-group-out validation to avoid pseudo-replication. When an observation is selected for validation, all observations from the same group are excluded from model fitting. Default is `NULL` (each observation is its own group).
`control`	A list of control parameters created by `liblex_control()`. Default is `liblex_control()`.
`verbose`	Logical indicating whether to display progress messages. Default is `TRUE`.
`...`	Additional arguments (currently unused).
`object`	A fitted object of class `"liblex"`, created by `liblex` (for `predict`).
`newdata`	A numeric matrix or data frame containing new predictor values. Must include all predictors used in `object`.
`weighting`	Character string specifying the kernel weighting function applied to neighbours when combining predictions. Options are: `"gaussian"` (default), `"tricube"`, `"triweight"`, `"triangular"`, `"quartic"`, `"parabolic"`, `"cauchy"`, or `"none"` (equal weights). See Details for kernel definitions.
`adaptive_bandwidth`	Logical indicating whether to use adaptive bandwidth for kernel weighting. When `TRUE` (default), the bandwidth is set to the maximum dissimilarity within the neighborhood of each observation, so weights adapt to local density. When `FALSE`, a fixed global bandwidth is used across all predictions, which may result in uniform weights in sparse regions or overly concentrated weights in dense regions.
`reliability_weighting`	Logical indicating whether to weight expert predictions by their estimated reliability. When `TRUE` (default), the contribution of each experrt is additionally weighted by the inverse of its cross-validation residual variance, giving more influence to models that performed well during fitting. When `FALSE`, only dissimilarity-based kernel weights are used.
`probs`	A numeric vector of probabilities in `[0, 1]` for computing weighted quantiles of expert predictions. Default is `c(0.05, 0.25, 0.5, 0.75, 0.95)`.
`range_prediction_limits`	Logical. If `TRUE`, predictions falling outside the 5th–95th percentile range of neighbour response values are clipped to those limits. Default is `FALSE`.
`residual_cutoff`	Numeric threshold for excluding models. Models with absolute residuals exceeding this value are penalized during neighbour selection. Default is `NULL` (no exclusion).
`enforce_indices`	Optional integer vector specifying model indices that must always be included in each prediction neighborhood. These models are assigned the minimum dissimilarity of the neighborhood to ensure selection. Default is `NULL` (no enforced models).
`allow_parallel`	Logical indicating whether parallel computation is permitted if a backend is registered. Default is `TRUE`.
`blas_threads`	Integer specifying the number of BLAS threads to use. Default is `1L`. Requires the RhpcBLASctl package for thread control.
`x`	An object of class `"liblex"` as returned by `liblex()`.

Details

By default, local models are constructed for all n observations in the reference set. Alternatively, specify a subset of m observations (m < n) via anchor_indices to reduce computation.

Each local model uses neighbors selected from the full reference set, but models are only built for anchor observations. This is useful for large datasets where building models for all observations is computationally prohibitive.

When dissimilarity methods depend on Yr (e.g., PLS-based distances), the response values of anchor observations are excluded during dissimilarity computation for efficiency. However, anchor response values are always used when fitting local models.

The number of anchors must not exceed 90% of nrow(Xr); to build models for all observations, use anchor_indices = NULL.

Relationship between anchors and neighborhood size

The neighbors argument controls the neighborhood size (k) used both for fitting local models and for retrieving experts during prediction. When anchor_indices is specified, the number of available experts equals the number of anchors. If max(k) exceeds the number of anchors and tuning selects a large optimal k, prediction will retrieve fewer experts than specified. For reliable predictions, ensure the number of anchors is at least as large as the maximum k value being evaluated.

Missing values in Yr

Missing values in Yr are permitted. Observations with missing response values can still serve as neighbors but are excluded from model fitting as target observations.

GH distance

The GH distance is computed independently from diss_method using a PLS projection with optimized component selection. This provides a measure of how far each observation lies from the center of the reference set in the PLS score space.

Validation and tuning

When control$mode = "validate" or control$tune = TRUE, nearest-neighbor cross-validation is performed. For each anchor observation, its nearest neighbor is excluded, a model is fitted on remaining neighbors, and the excluded neighbor's response is predicted. This provides validation statistics for parameter selection.

Prediction

For each observation in newdata, the predict method:

Computes dissimilarities to anchor observations (or their neighbourhood centres) stored in object.
Selects the k nearest neighbours based on the optimal k determined during model fitting.
Applies kernel weighting based on dissimilarity.
Combines expert predictions using weighted averaging.

Kernel weighting functions

The weighting functions follow Cleveland and Devlin (1988). Let d be the normalised dissimilarity (scaled to [0, 1] within the neighbourhood when adaptive_bandwidth = TRUE). The available kernels are:

"gaussian": w = \exp(-d^2)
"tricube": w = (1 - d^3)^3
"triweight": w = (1 - d^2)^3
"triangular": w = 1 - d
"quartic": w = (1 - d^2)^2
"parabolic": w = 1 - d^2
"cauchy": w = 1 / (1 + d^2)
"none": w = 1 (equal weights)

Value

For liblex: A list of class "liblex" (when control$mode = "build") or "liblex_validation" (when control$mode = "validate") containing:

dissimilarity: List containing the dissimilarity method and matrix.
fit_method: Fit constructor from fit_method.
gh: If gh = TRUE, a list with GH distances and the PLS projection.
results: Data frame of validation statistics for each parameter combination (if validation was performed).
best: The optimal parameter combination based on control$metric.
optimal_params: List with optimal k and ncomp values.
residuals: Residuals from predictions using optimal parameters.
coefficients: (Build mode only) List of regression coefficients: B0 (intercepts), B (slopes).
vips: (Build mode only) Variable importance in projection scores.
selectivity_ratios: (Build mode only) Selectivity ratios for each predictor.
scaling: (Build mode only) Centering and scaling vectors for prediction.
neighborhood_stats: Statistics (response quantiles) for each neighborhood size.
anchor_indices: The anchor indices used.
neighbors: The object passed to neighbors.

For predict.liblex: A list with the following components:

predictions: A data frame containing:
- pred: Weighted mean predictions.
- pred_sd: Weighted standard deviation of expert predictions.
- q*: Weighted quantiles at probabilities specified by probs.
- gh: Global Mahalanobis distance (if computed during fitting).
- min_yr: Minimum response value (5th percentile) across neighbours.
- max_yr: Maximum response value (95th percentile) across neighbours.
- below_min: Logical indicating prediction below min_yr.
- above_max: Logical indicating prediction above max_yr.
neighbors: A list with:
- indices: Matrix of neighbour indices (models) for each observation.
- dissimilarities: Matrix of corresponding dissimilarity scores.
expert_predictions: A list with:
- weights: Matrix of kernel weights applied to each expert.
- predictions: Matrix of raw predictions from each expert.
- weighted: Matrix of weighted predictions from each expert.

Author(s)

Leonardo Ramirez-Lopez

References

Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403), 596–610.

Naes, T., Isaksson, T., & Kowalski, B. (1990). Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry, 62(7), 664–673.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. (2013). The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma, 195-196, 268-279.

Ramirez-Lopez, L., Metz, M., Lesnoff, M., Orellano, C., Perez-Fernandez, E., Plans, M., Breure, T., Behrens, T., Viscarra Rossel, R., & Peng, Y. (2026b). Rethinking local spectral modelling: From per-query refitting to model libraries. Analytica Chimica Acta, under review.

Rajalahti, T., Arneberg, R., Berven, F.S., Myhr, K.M., Ulvik, R.J., Kvalheim, O.M. (2009). Biomarker discovery in mass spectral profiles by means of selectivity ratio plot. Chemometrics and Intelligent Laboratory Systems, 95(1), 35-48.

Examples

## Not run: 
library(prospectr)
data(NIRsoil)

# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = as.numeric(colnames(NIRsoil$spc))),
  m = 1, p = 1, w = 7
)

# Missing values in the response are allowed
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_y <- NIRsoil$Ciso[NIRsoil$train == 1]
test_x  <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_y  <- NIRsoil$Ciso[NIRsoil$train == 0]

# Build library
model_library <- liblex(
  Xr = train_x,
  Yr = train_y,
  neighbors = neighbors_k(c(30, 40)),
  diss_method = diss_correlation(ws = 27, scale = TRUE),
  fit_method = fit_wapls(
    min_ncomp = 4,
    max_ncomp = 17,
    scale = FALSE,
    method = "mpls"
  ),
  control = liblex_control(tune = TRUE)
)

# Visualise neighborhood centroids and samples to predict
matplot(
  as.numeric(colnames(model_library$scaling$local_x_center)),
  t(test_x),
  col = rgb(1, 0, 0, 0.3),
  lty = 1,
  type = "l",
  xlab = "Wavelength (nm)",
  ylab = "First derivative detrended absorbance"
)
matlines(
  as.numeric(colnames(model_library$scaling$local_x_center)),
  t(model_library$scaling$local_x_center),
  col = rgb(0, 0, 1, 0.3),
  lty = 1,
  type = "l"
)
grid(lty = 1)
legend(
  "topright",
  legend = c("Samples to predict", "Neighborhood centroids"),
  col = c(rgb(1, 0, 0, 0.8), rgb(0, 0, 1, 0.8)),
  lty = 1,
  lwd = 2,
  bty = "n"
)

# Predict new observations
y_hat_liblex <- predict(model_library, test_x)

# Predicted versus observed values
lims <- range(y_hat_liblex$predictions$pred, test_y, na.rm = TRUE)
plot(
  y_hat_liblex$predictions$pred,
  test_y,
  pch = 16,
  col = rgb(0, 0, 0, 0.5),
  xlab = "Predicted",
  ylab = "Observed",
  xlim = lims,
  ylim = lims
)
abline(a = 0, b = 1, col = "red")
grid(lty = 1)

## run liblex in parallel (requires a parallel backend, e.g., doParallel)
library(doParallel)
n_cores <- min(2, parallel::detectCores() - 1)
clust <- makeCluster(n_cores)
registerDoParallel(clust)

model_library2 <- liblex(
  Xr = train_x,
  Yr = train_y,
  neighbors = neighbors_k(c(30, 40)),
  fit_method = fit_wapls(min_ncomp = 4, max_ncomp = 17, method = "simpls")
)

y_hat_liblex2 <- predict(model_library2, test_x)
registerDoSEQ()
try(stopCluster(clust))

## End(Not run)

resemble documentation built on April 21, 2026, 1:07 a.m.

resemble index

Package overview README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

resemble
Similarity Retrieval and Local Learning for Spectral Chemometrics

liblex: Build a precomputed library of localised experts using...
In resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics

Build a precomputed library of localised experts using memory-based learning

Description

Usage

Arguments

Details

Relationship between anchors and neighborhood size

Missing values in Yr

GH distance

Validation and tuning

Prediction

Kernel weighting functions

Value

Author(s)

References

See Also

Examples

Related to liblex in resemble...

R Package Documentation

Browse R Packages

We want your feedback!

resemble Similarity Retrieval and Local Learning for Spectral Chemometrics

liblex: Build a precomputed library of localised experts using... In resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics

Build a precomputed library of localised experts using memory-based learning

Description

Usage

Arguments

Details

Relationship between anchors and neighborhood size

Missing values in Yr

GH distance

Validation and tuning

Prediction

Kernel weighting functions

Value

Author(s)

References

See Also

Examples

Related to liblex in resemble...

R Package Documentation

Browse R Packages

We want your feedback!

resemble
Similarity Retrieval and Local Learning for Spectral Chemometrics

liblex: Build a precomputed library of localised experts using...
In resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics