| liblex | R Documentation |
Constructs a library of local predictive models based on memory-based learning (MBL). For each anchor observation, a local regression model is fitted using its nearest neighbors from the reference set. This implementation is based on the methods proposed in Ramirez-Lopez et al. (2026b).
liblex(Xr, Yr, neighbors,
diss_method = diss_pca(ncomp = ncomp_by_opc()),
fit_method = fit_wapls(min_ncomp = 3, max_ncomp = 15),
anchor_indices = NULL, gh = TRUE, group = NULL,
control = liblex_control(), verbose = TRUE, ...)
## S3 method for class 'liblex'
predict(object, newdata, diss_method = NULL,
weighting = c("gaussian", "tricube", "triweight", "triangular",
"quartic", "parabolic", "cauchy", "none"),
adaptive_bandwidth = TRUE, reliability_weighting = TRUE,
range_prediction_limits = FALSE, residual_cutoff = NULL,
enforce_indices = NULL, probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
verbose = TRUE, allow_parallel = TRUE, blas_threads = 1L, ...)
## S3 method for class 'liblex'
plot(x, ...)
Xr |
A numeric matrix of predictor variables with dimensions |
Yr |
A numeric vector or single-column matrix of length |
neighbors |
A neighbor selection object specifying how to select
neighbors. Use |
diss_method |
For
Default is For |
fit_method |
A |
anchor_indices |
An optional integer vector specifying row indices
of |
gh |
Logical indicating whether to compute the GH distance
(Mahalanobis distance in PLS score space) for each anchor observation.
Default is TRUE. |
group |
An optional factor assigning group labels to observations in
|
control |
A list of control parameters created by liblex_control(). |
verbose |
Logical indicating whether to display progress messages.
Default is TRUE. |
... |
Additional arguments (currently unused). |
object |
A fitted object of class |
newdata |
A numeric matrix or data frame containing new predictor
values. Must include all predictors used in |
weighting |
Character string specifying the kernel weighting function
applied to neighbours when combining predictions. Options are:
"gaussian", "tricube", "triweight", "triangular", "quartic", "parabolic", "cauchy", and "none". |
adaptive_bandwidth |
Logical indicating whether to use adaptive
bandwidth for kernel weighting. When |
reliability_weighting |
Logical indicating whether to weight expert
predictions by their estimated reliability. When |
probs |
A numeric vector of probabilities in |
range_prediction_limits |
Logical. If |
residual_cutoff |
Numeric threshold for excluding models. Models with
absolute residuals exceeding this value are penalized during neighbour
selection. Default is NULL. |
enforce_indices |
Optional integer vector specifying model indices that
must always be included in each prediction neighborhood. These models are
assigned the minimum dissimilarity of the neighborhood to ensure selection.
Default is NULL. |
allow_parallel |
Logical indicating whether parallel computation is
permitted if a backend is registered. Default is TRUE. |
blas_threads |
Integer specifying the number of BLAS threads to use.
Default is 1L. |
x |
An object of class |
By default, local models are constructed for all n observations in the
reference set. Alternatively, specify a subset of m observations
(m < n) via anchor_indices to reduce computation.
Each local model uses neighbors selected from the full reference set, but models are only built for anchor observations. This is useful for large datasets where building models for all observations is computationally prohibitive.
When dissimilarity methods depend on Yr (e.g., PLS-based distances), the
response values of anchor observations are excluded during dissimilarity
computation for efficiency. However, anchor response values are always
used when fitting local models.
The number of anchors must not exceed 90% of nrow(Xr); to build models
for all observations, use anchor_indices = NULL.
The neighbors argument controls the neighborhood size (k) used both
for fitting local models and for retrieving experts during prediction.
When anchor_indices is specified, the number of available experts equals
the number of anchors. If max(k) exceeds the number of anchors and
tuning selects a large optimal k, prediction will retrieve fewer experts
than specified. For reliable predictions, ensure the number of anchors is
at least as large as the maximum k value being evaluated.
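The two constraints above (anchors capped at 90% of the reference set, and at least max(k) anchors for reliable retrieval) can be checked before calling liblex(). The following is a minimal sketch in base R; choose_anchors() is a hypothetical helper written for illustration, not a function of the package, and the sizes are made up.

```r
# Hypothetical helper (not part of the package): draw m anchor indices at
# random and check the two constraints described above.
choose_anchors <- function(n_ref, m, k_values) {
  # The number of anchors must not exceed 90% of the reference set
  if (m > 0.9 * n_ref) {
    stop("number of anchors exceeds 90% of the reference observations")
  }
  # With fewer anchors than max(k), prediction may retrieve fewer experts
  if (m < max(k_values)) {
    warning("fewer anchors than max(k); prediction may retrieve fewer experts")
  }
  sort(sample.int(n_ref, m))
}

set.seed(1)
anchors <- choose_anchors(n_ref = 200, m = 60, k_values = c(30, 40))
length(anchors)
```

The resulting integer vector could then be supplied as anchor_indices = anchors.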
Missing values in Yr are permitted. Observations with missing response
values can still serve as neighbors but are excluded from model fitting
as target observations.
The GH distance is computed independently from diss_method using a PLS
projection with optimized component selection. This provides a measure of
how far each observation lies from the center of the reference set in the
PLS score space.
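As a rough illustration of a Mahalanobis distance computed in a score space, the sketch below uses base R only. It swaps in PCA scores (prcomp) for the PLS projection used by the package, fixes the number of components at 3 instead of optimizing it, and runs on simulated data. Whether the package reports the distance on this scale or a scaled variant is not stated here, so treat gh_like as an assumption.

```r
# Toy data: 50 observations, 10 predictors (simulated, for illustration only)
set.seed(42)
X <- matrix(rnorm(50 * 10), nrow = 50)

# Stand-in projection: PCA scores on 3 components. The package uses a PLS
# projection with optimized component selection instead.
scores <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:3]

# stats::mahalanobis() returns the squared distance of each observation
# to the center of the score space
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Distance on the original (unsquared) scale
gh_like <- sqrt(d2)
head(round(gh_like, 2))
```

Observations with large values lie far from the center of the reference set in the score space.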
When control$mode = "validate" or control$tune = TRUE, nearest-neighbor
cross-validation is performed. For each anchor observation, its nearest
neighbor is excluded, a model is fitted on remaining neighbors, and the
excluded neighbor's response is predicted. This provides validation
statistics for parameter selection.
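The nearest-neighbor validation step can be sketched for a single anchor as follows. This is a simplified illustration on simulated data: Euclidean distance and lm() stand in for the package's dissimilarity and PLS-type fitting methods, and the anchor itself is left out of its own neighbor list, which is an assumption of this sketch.

```r
set.seed(7)
n <- 120
X <- matrix(rnorm(n * 2), nrow = n, dimnames = list(NULL, c("x1", "x2")))
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(n, sd = 0.1)
dat <- data.frame(y = y, X)

anchor <- 1   # index of the anchor observation
k <- 30       # neighborhood size

# Euclidean distances from the anchor to all observations
d <- sqrt(colSums((t(X) - X[anchor, ])^2))
d[anchor] <- Inf                 # assumption: the anchor is not its own neighbor
nn <- order(d)[seq_len(k)]       # k nearest neighbors, closest first

held_out <- nn[1]                # the nearest neighbor is excluded
train <- nn[-1]                  # the model is fitted on the remaining ones

fit <- lm(y ~ x1 + x2, data = dat[train, ])
pred <- predict(fit, newdata = dat[held_out, , drop = FALSE])

# Validation residual contributed by this anchor
unname(dat$y[held_out] - pred)
```

Repeating this over all anchors and candidate k values yields the kind of validation statistics used for parameter selection.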
For each observation in newdata, the predict method:
1. Computes dissimilarities to anchor observations (or their neighbourhood centres) stored in object.
2. Selects the k nearest neighbours based on the optimal k determined during model fitting.
3. Applies kernel weighting based on dissimilarity.
4. Combines expert predictions using weighted averaging.
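The prediction steps can be mimicked on a toy one-predictor problem. In this sketch the "library" is a set of hypothetical linear experts (tangent lines of sin(x) at random anchor locations), the dissimilarity is an absolute difference, and the scaling to [0, 1] divides by the largest dissimilarity in the neighbourhood. All of these are assumptions made for illustration and do not reproduce the package internals.

```r
set.seed(3)
n_experts <- 25

# Hypothetical library: each expert has an anchor location plus its own
# intercept (b0) and slope (b), here the tangent line of sin(x) at the anchor
anchors_x <- sort(runif(n_experts, 0, 10))
b0 <- sin(anchors_x) - cos(anchors_x) * anchors_x
b  <- cos(anchors_x)

x_new <- 4.2   # new observation
k <- 5         # number of experts to retrieve

# 1. Dissimilarity between the new observation and every anchor
d <- abs(anchors_x - x_new)

# 2. Keep the k nearest experts
idx <- order(d)[seq_len(k)]

# 3. Kernel weights (gaussian) on dissimilarities scaled to [0, 1]
d_scaled <- d[idx] / max(d[idx])
w <- exp(-d_scaled^2)

# 4. Weighted mean and weighted standard deviation of expert predictions
expert_pred <- b0[idx] + b[idx] * x_new
pred <- sum(w * expert_pred) / sum(w)
pred_sd <- sqrt(sum(w * (expert_pred - pred)^2) / sum(w))

c(pred = pred, pred_sd = pred_sd)
```

The spread of the expert predictions (pred_sd) gives a rough indication of how much the retrieved experts disagree for this observation.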
The weighting functions follow Cleveland and Devlin (1988). Let d be
the normalised dissimilarity (scaled to [0, 1] within the neighbourhood
when adaptive_bandwidth = TRUE). The available kernels are:
"gaussian": w = exp(-d^2)
"tricube": w = (1 - d^3)^3
"triweight": w = (1 - d^2)^3
"triangular": w = 1 - d
"quartic": w = (1 - d^2)^2
"parabolic": w = 1 - d^2
"cauchy": w = 1 / (1 + d^2)
"none": w = 1 (equal weights)
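To compare the kernels side by side, the formulas listed above can be written as plain R functions and evaluated on a few dissimilarity values. This is a minimal sketch that assumes d already lies in [0, 1]; the kernels list is defined here for illustration and is not an object exported by the package.

```r
# Kernel functions as listed above; d is assumed to lie in [0, 1]
kernels <- list(
  gaussian   = function(d) exp(-d^2),
  tricube    = function(d) (1 - d^3)^3,
  triweight  = function(d) (1 - d^2)^3,
  triangular = function(d) 1 - d,
  quartic    = function(d) (1 - d^2)^2,
  parabolic  = function(d) 1 - d^2,
  cauchy     = function(d) 1 / (1 + d^2),
  none       = function(d) rep(1, length(d))
)

# Evaluate every kernel on a small grid of dissimilarities
d <- c(0, 0.25, 0.5, 0.75, 1)
weights <- sapply(kernels, function(f) f(d))
rownames(weights) <- paste0("d=", d)
round(weights, 3)
```

At d = 1 the tricube, triweight, triangular, quartic, and parabolic kernels give a weight of zero, so the farthest expert in the neighbourhood does not contribute, whereas the gaussian and cauchy kernels still give it a positive weight (exp(-1) and 0.5, respectively).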
For liblex: A list of class "liblex" (when control$mode = "build") or "liblex_validation" (when control$mode = "validate") containing:
- dissimilarity: List containing the dissimilarity method and matrix.
- fit_method: Fit constructor from fit_method.
- gh: If gh = TRUE, a list with GH distances and the PLS projection.
- results: Data frame of validation statistics for each parameter combination (if validation was performed).
- best: The optimal parameter combination based on control$metric.
- optimal_params: List with optimal k and ncomp values.
- residuals: Residuals from predictions using optimal parameters.
- coefficients: (Build mode only) List of regression coefficients: B0 (intercepts), B (slopes).
- vips: (Build mode only) Variable importance in projection scores.
- selectivity_ratios: (Build mode only) Selectivity ratios for each predictor.
- scaling: (Build mode only) Centering and scaling vectors for prediction.
- neighborhood_stats: Statistics (response quantiles) for each neighborhood size.
- anchor_indices: The anchor indices used.
- neighbors: The object passed to neighbors.
For predict.liblex: A list with the following components:
- predictions: A data frame containing:
  - pred: Weighted mean predictions.
  - pred_sd: Weighted standard deviation of expert predictions.
  - q*: Weighted quantiles at probabilities specified by probs.
  - gh: Global Mahalanobis distance (if computed during fitting).
  - min_yr: Minimum response value (5th percentile) across neighbours.
  - max_yr: Maximum response value (95th percentile) across neighbours.
  - below_min: Logical indicating prediction below min_yr.
  - above_max: Logical indicating prediction above max_yr.
- neighbors: A list with:
  - indices: Matrix of neighbour indices (models) for each observation.
  - dissimilarities: Matrix of corresponding dissimilarity scores.
- expert_predictions: A list with:
  - weights: Matrix of kernel weights applied to each expert.
  - predictions: Matrix of raw predictions from each expert.
  - weighted: Matrix of weighted predictions from each expert.
Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83(403), 596–610.
Naes, T., Isaksson, T., & Kowalski, B. (1990). Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry, 62(7), 664–673.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J. A. M., & Scholten, T. (2013). The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex datasets. Geoderma, 195–196, 268–279.
Ramirez-Lopez, L., Metz, M., Lesnoff, M., Orellano, C., Perez-Fernandez, E., Plans, M., Breure, T., Behrens, T., Viscarra Rossel, R., & Peng, Y. (2026b). Rethinking local spectral modelling: From per-query refitting to model libraries. Analytica Chimica Acta, under review.
Rajalahti, T., Arneberg, R., Berven, F. S., Myhr, K. M., Ulvik, R. J., & Kvalheim, O. M. (2009). Biomarker discovery in mass spectral profiles by means of selectivity ratio plot. Chemometrics and Intelligent Laboratory Systems, 95(1), 35–48.
liblex_control() for control parameters, neighbors_k() for neighborhood
specification, diss_pca(), diss_pls(), diss_correlation() for
dissimilarity methods, fit_pls(), fit_wapls() for fitting methods.
## Not run:
library(prospectr)
data(NIRsoil)
# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
detrend(NIRsoil$spc, wav = as.numeric(colnames(NIRsoil$spc))),
m = 1, p = 1, w = 7
)
# Missing values in the response are allowed
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_y <- NIRsoil$Ciso[NIRsoil$train == 1]
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_y <- NIRsoil$Ciso[NIRsoil$train == 0]
# Build library
model_library <- liblex(
Xr = train_x,
Yr = train_y,
neighbors = neighbors_k(c(30, 40)),
diss_method = diss_correlation(ws = 27, scale = TRUE),
fit_method = fit_wapls(
min_ncomp = 4,
max_ncomp = 17,
scale = FALSE,
method = "mpls"
),
control = liblex_control(tune = TRUE)
)
# Visualise neighborhood centroids and samples to predict
matplot(
as.numeric(colnames(model_library$scaling$local_x_center)),
t(test_x),
col = rgb(1, 0, 0, 0.3),
lty = 1,
type = "l",
xlab = "Wavelength (nm)",
ylab = "First derivative detrended absorbance"
)
matlines(
as.numeric(colnames(model_library$scaling$local_x_center)),
t(model_library$scaling$local_x_center),
col = rgb(0, 0, 1, 0.3),
lty = 1,
type = "l"
)
grid(lty = 1)
legend(
"topright",
legend = c("Samples to predict", "Neighborhood centroids"),
col = c(rgb(1, 0, 0, 0.8), rgb(0, 0, 1, 0.8)),
lty = 1,
lwd = 2,
bty = "n"
)
# Predict new observations
y_hat_liblex <- predict(model_library, test_x)
# Predicted versus observed values
lims <- range(y_hat_liblex$predictions$pred, test_y, na.rm = TRUE)
plot(
y_hat_liblex$predictions$pred,
test_y,
pch = 16,
col = rgb(0, 0, 0, 0.5),
xlab = "Predicted",
ylab = "Observed",
xlim = lims,
ylim = lims
)
abline(a = 0, b = 1, col = "red")
grid(lty = 1)
## run liblex in parallel (requires a parallel backend, e.g., doParallel)
library(doParallel)
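# use at most 2 workers, and never fewer than 1 (single-core machines, or
# when detectCores() returns NA)
n_cores <- max(1, min(2, parallel::detectCores() - 1), na.rm = TRUE)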
clust <- makeCluster(n_cores)
registerDoParallel(clust)
model_library2 <- liblex(
Xr = train_x,
Yr = train_y,
neighbors = neighbors_k(c(30, 40)),
fit_method = fit_wapls(min_ncomp = 4, max_ncomp = 17, method = "simpls")
)
y_hat_liblex2 <- predict(model_library2, test_x)
registerDoSEQ()
try(stopCluster(clust))
## End(Not run)