Last update: 21.10.2020
Think Globally, Fit Locally (Saul and Roweis, 2003)
The resemble package provides high-performing functionality for
data-driven modeling (including local modeling), nearest neighbor search and
orthogonal projections in spectral data.
Check the package vignette(s)!
The core functionality of the package can be summarized into the following functions:
mbl: implements memory-based learning (MBL) for modeling and predicting
continuous response variables. For example, it can be used to reproduce the
famous LOCAL algorithm proposed by Shenk et al. (1997). In general, this
function allows you to easily customize your own MBL regression-prediction method.
dissimilarity: Computes dissimilarity matrices based on various methods
(e.g. Euclidean, Mahalanobis, cosine, correlation, moving correlation,
spectral information divergence, principal components dissimilarity and partial
least squares dissimilarity).
ortho_projection: A function for dimensionality reduction using either
principal component analysis or partial least squares (a.k.a. projection to latent structures).
search_neighbors: A function to efficiently retrieve from a reference set
the k-nearest neighbors of another given data set.
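As a rough illustration of the ideas behind `dissimilarity`, `ortho_projection` and `search_neighbors`, here is a minimal base R sketch. It is not the package's implementation: it assumes toy random data, a plain PCA computed with `prcomp`, and Euclidean distance on the first two score vectors. The helper `euclid_diss` and all object names are hypothetical and exist only for this example.

```r
# Toy data (assumption): 6 reference and 2 "unknown" spectra, 5 variables each
set.seed(1)
Xr <- matrix(rnorm(30), nrow = 6, ncol = 5)
Xu <- matrix(rnorm(10), nrow = 2, ncol = 5)

# Orthogonal projection: PCA fitted on Xr, then Xu projected onto the same axes
pca <- prcomp(Xr, center = TRUE, scale. = FALSE)
scores_r <- pca$x[, 1:2, drop = FALSE]
scores_u <- predict(pca, newdata = Xu)[, 1:2, drop = FALSE]

# Hypothetical helper: Euclidean dissimilarity between the rows of a and b.
# Result has one row per observation in a and one column per observation in b.
euclid_diss <- function(a, b) {
  apply(b, 1, function(u) sqrt(rowSums(sweep(a, 2, u)^2)))
}
d <- euclid_diss(scores_r, scores_u)

# Neighbor search: indices of the k most similar reference observations
# for each observation in Xu (one column per Xu observation)
k <- 3
neighbors <- apply(d, 2, function(col) order(col)[seq_len(k)])
neighbors
```

The package functions offer many more dissimilarity methods and optimized routines; the sketch only shows how projection, dissimilarity and neighbor retrieval fit together.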
During the recent lockdown we invested some of our free time to come up
with a new version of our package. This new
resemble 2.0 comes with MAJOR
improvements and new functions! For these improvements major changes were
required. The most evident changes are in the function and argument names.
These have now been adapted to properly follow the
tidyverse style guide. A number of changes have
been implemented for the sake of computational efficiency. These changes are
New interesting functions and functionality are also available. For example, the
mbl() function now allows sample spiking, where a
set of reference observations can be forced to be included in the neighborhoods
of each sample to be predicted. The
search_neighbors() function efficiently
retrieves from a reference set the k-nearest neighbors of another given data set. The
dissimilarity() function computes dissimilarity matrices based on various
If you want to install the package and try its functionality, it is very simple,
just type the following line in your
If you do not have the following packages installed, it might be good to update/install them first
install.packages('Rcpp')
install.packages('RcppArmadillo')
install.packages('foreach')
install.packages('iterators')
Note: Apart from these packages we strongly recommend to download and install
This is important for obtaining the proper C++ toolchain that might be needed
You can also install the development version of
resemble directly from GitHub
resemble you should also be able to run the following lines:
```r
library(resemble)
library(tidyr)
library(prospectr)

data(NIRsoil)

# Preprocess the data
NIRsoil <- NIRsoil[NIRsoil$CEC %>% complete.cases(), ]
wavs <- as.numeric(colnames(NIRsoil$spc))
NIRsoil$spc_p <- NIRsoil$spc %>%
  standardNormalVariate() %>%
  resample(wavs, seq(min(wavs), max(wavs), by = 11)) %>%
  savitzkyGolay(p = 1, w = 5, m = 1)

# Split into calibration/training and test sets
train_x <- NIRsoil$spc_p[as.logical(NIRsoil$train), ]
train_y <- NIRsoil$CEC[as.logical(NIRsoil$train)]
test_x <- NIRsoil$spc_p[!as.logical(NIRsoil$train), ]
test_y <- NIRsoil$CEC[!as.logical(NIRsoil$train)]

# Use MBL as in Ramirez-Lopez et al. (2013)
sbl <- mbl(
  Xr = train_x, Yr = train_y, Xu = test_x,
  k = seq(50, 130, by = 20),
  method = local_fit_gpr(),
  control = mbl_control(validation_type = "NNv")
)
sbl
plot(sbl)
get_predictions(sbl)
```

<p align="center">
<img src="./man/figures/mbl.png" width="80%">
</p>

Figure 1. Standard plot of the results of the __`mbl`__ function.

[`resemble`](http://l-ramirez-lopez.github.io/resemble/) implements functions dedicated to non-linear modelling of complex visible and infrared spectral data based on memory-based learning (MBL, _a.k.a_ instance-based learning or local modelling in the chemometrics literature). The package also includes functions for computing and evaluating spectral dissimilarity matrices, projecting the spectra onto low dimensional orthogonal variables, spectral neighbor search, etc.

## Memory-based learning (MBL)

To expand on the explanation of the `mbl` function, let's first define the basic input data:

* __Reference (training) set__: Data set with *n* reference samples (e.g. spectral library) to be used in the calibration of spectral models. Xr represents the matrix of samples (containing the spectral predictor variables) and Yr represents a response variable corresponding to Xr.

* __Prediction set__: Data set with _m_ samples where the response variable (Yu) is unknown. However, it can be predicted by applying a spectral model (calibrated by using Xr and Yr) on the spectra of these samples (Xu).

To predict each value in Yu, the `mbl` function takes each sample in Xu and searches in Xr for its _k_-nearest neighbours (most spectrally similar samples). Then a (local) model is calibrated with these (reference) neighbours and it immediately predicts the corresponding value in Yu from Xu. In the function, the _k_-nearest neighbour search is performed by computing spectral dissimilarity matrices between observations. The `mbl` function offers the following regression options for calibrating the (local) models:

* __`'gpr'`__: Gaussian process with linear kernel.

* __`'pls'`__: Partial least squares.

* __`'wapls'`__: Weighted average partial least squares (Shenk et al., 1997).

Figure 2 illustrates the basic steps in MBL for a set of five observations.

<p align="center">
<img src="./vignettes/MBL.gif" width="50%">
</p>

Figure 2. Example of the main steps in memory-based learning for predicting a response variable in five different observations based on a set of p-dimensional variables.

## Citing the package

Simply type the following and you will get the info you need:

```r
citation(package = "resemble")
```
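Returning to the memory-based learning steps described in the MBL section, the loop (dissimilarity, neighbour search, local fit, prediction) can be sketched in base R. This is a minimal sketch under stated assumptions, not the `mbl` implementation: it uses toy data with three predictors, Euclidean distance, and ordinary least squares (`lm`) as a stand-in for the local `'gpr'`, `'pls'` or `'wapls'` models. The function `mbl_sketch` and all object names are hypothetical.

```r
set.seed(42)

# Toy reference set (Xr, Yr) and prediction set (Xu); assumption: 3 predictors
n <- 100
p <- 3
Xr <- matrix(runif(n * p), n, p)
Yr <- sin(2 * pi * Xr[, 1]) + Xr[, 2]^2 + rnorm(n, sd = 0.05)
Xu <- matrix(runif(5 * p), 5, p)

# Hypothetical helper: one local linear model per observation in Xu
mbl_sketch <- function(Xr, Yr, Xu, k) {
  vapply(seq_len(nrow(Xu)), function(i) {
    # 1. Dissimilarity between the i-th Xu observation and every row of Xr
    d <- sqrt(rowSums(sweep(Xr, 2, Xu[i, ])^2))
    # 2. Indices of the k nearest neighbours in the reference set
    nn <- order(d)[seq_len(k)]
    # 3. Local model calibrated with the neighbours only
    #    (ordinary least squares here, swapped in for gpr/pls/wapls)
    local_data <- data.frame(y = Yr[nn], Xr[nn, , drop = FALSE])
    fit <- lm(y ~ ., data = local_data)
    # 4. Prediction for the i-th Xu observation
    new_obs <- data.frame(Xu[i, , drop = FALSE])
    unname(predict(fit, newdata = new_obs))
  }, numeric(1))
}

preds <- mbl_sketch(Xr, Yr, Xu, k = 20)
preds
```

With real spectra the number of variables usually exceeds `k`, which is one reason local models such as PLS or Gaussian process regression are preferred over plain least squares.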
2020-04: Tsakiridis et al. (2020)
used the optimal principal components dissimilarity method implemented in
resemble in combination with convolutional neural networks for simultaneous prediction of soil properties from vis-NIR spectra.
2019-04: Tziolas et al. (2019) used
resemble to investigate improved MBL methods for quantitative predictions
of soil properties using NIR spectroscopy and geographical information.
2019-03, 2019-08: Tsakiridis et al. (2019a) and Tsakiridis et al. (2019b)
compared several machine learning methods for predictive soil spectroscopy and
showed that MBL in
resemble offers highly competitive results.
2020-01: Sanderman et al. (2020) used
resemble for the prediction of soil health indicators in the United States.
2019-03: Another paper using
resemble... I published a scientific paper where we used
memory-based learning (MBL) for digital soil mapping. Here we use MBL to remove
local calibration outliers rather than using this approach to overcome the typical
complexity of large spectral datasets. (Ramirez‐Lopez, L., Wadoux, A. C.,
Franceschini, M. H. D., Terra, F. S., Marques, K. P. P., Sayão, V. M., &
Demattê, J. A. M. (2019). Robust soil mapping at the farm scale with vis–NIR
spectroscopy. European Journal of Soil Science. 70, 378–393).
2019-01: In this scientific paper,
resemble was used to model MIR spectra from a continental soil spectral library
in the United States. (Dangal, S.R., Sanderman, J., Wills, S. and Ramirez-Lopez,
L., 2019. Accurate and Precise Prediction of Soil Properties from a Large
Mid-Infrared Spectral Library. Soil Systems, 3(1), p.11).
2019-03: Jaconi et al. (2019) implemented a memory-based learning algorithm (using
resemble) to conduct
accurate NIR predictions of soil texture at national scale in Germany.
(Jaconi, A., Vos, C. and Don, A., 2019. Near infrared spectroscopy as an easy
and precise method to estimate soil texture. Geoderma, 337, pp.906-913).
2018-12: Hong et al. (2019)
implemented a memory-based learning algorithm (using
resemble) to improve the
accuracy of NIR predictions of soil organic matter in China.
(Hong, Y., Chen, S., Liu, Y., Zhang, Y., Yu, L., Chen, Y., Liu, Y., Cheng, H.
and Liu, Y. 2019. Combination of fractional order derivative and memory-based learning
algorithm to improve the estimation accuracy of soil organic matter by visible
and near-infrared spectroscopy. Catena, 174, pp.104-116).
2018-11: In this recent scientific paper the authors used
resemble to predict the chemical composition of common beans in
Spain. (Rivera, A., Plans, M., Sabaté, J., Casañas, F., Casals, J., Rull, A., &
Simó, J. (2018). The Spanish core collection of common beans (Phaseolus
vulgaris L.): an important source of variability for breeding chemical
composition. Frontiers in Plant Science, 9).
2018-07: Another use-case of
resemble is presented by Gholizadeh et al. (2018) for a soil science
application in the Czech Republic. (Gholizadeh, A., Saberioon, M., Carmon, N.,
Boruvka, L. and Ben-Dor, E., 2018. Examining the Performance of PARACUDA-II
Data-Mining Engine versus Selected Techniques to Model Soil Carbon from
Reflectance Spectra. Remote Sensing, 10(8), p.1172).
2018-01: Dotto et al. (2018) implemented memory-based learning with
resemble to accurately
predict soil organic carbon in a region of Brazil. (Dotto, A. C., Dalmolin,
R. S. D., ten Caten, A., & Grunwald, S. (2018). A systematic study on the
application of scatter-corrective and spectral-derivative preprocessing for
multivariate prediction of soil organic carbon by Vis-NIR spectra. Geoderma,
2017-11: Here the authors predicted brix values in different food products using memory-based learning implemented with resemble.
(Kopf, M., Gruna, R., Längle, T. and Beyerer, J., 2017, March. Evaluation and
comparison of different approaches to multi-product brix calibration in
near-infrared spectroscopy. In OCM 2017-Optical Characterization of Materials-conference proceedings (p. 129). KIT Scientific Publishing).
2016-05: In this scientific paper the authors successfully used
resemble to predict soil organic carbon content at
national scale in France. (Clairotte, M., Grinand, C., Kouakoua, E., Thébault, A.,
Saby, N. P., Bernoux, M., & Barthès, B. G. (2016). National calibration of soil
organic carbon concentration using diffuse infrared reflectance spectroscopy.
Geoderma, 276, 41-52).
2016-04: This paper shows some interesting results on applying memory-based learning to predict soil properties.
2016-04: In some recent entries of this blog,
the author shows some examples on the use
2016-02: As promised,
resemble 1.2 (alma-de-coco) is now available on CRAN.
2016-01: The version 1.2 (alma-de-coco) has been submitted to CRAN and is available from the github repository!
2015-11: A pre-release of the version 1.2.0 (126.96.36.19900 alma-de-coco) is
resemble is now faster! Some critical functions (e.g. pls and
gaussian process regressions) were rewritten in C++ using
Rcpp. This time the
new version will be available at CRAN very soon!
2015-11: Well, version 1.1.3 was never released on CRAN since we decided to carry out major improvements in terms of computational performance.
2014-10: A pre-release of the version 1.1.3 of the package is already available at this website. We hope it will be available at CRAN very soon!
2014-06: Check this video where a renowned NIR scientist talks about local calibrations.
2014-04: A short note on the resemble and prospectr packages was published in this newsletter. There we provide some examples on representative subset selection and on how to reproduce the LOCAL and spectrum-based learner algorithms. In those examples the dataset of the Chemometric challenge of 'Chimiométrie 2006' (included in the prospectr package) is used.
2014-03: The package was released on CRAN!
Lobsey, C. R., Viscarra Rossel, R. A., Roudier, P., & Hedley, C. B. 2017. rs-local data-mines information from spectral libraries to improve local calibrations. European Journal of Soil Science, 68(6), 840-852.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Saul, L. K., & Roweis, S. T. 2003. Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of machine learning research, 4(Jun), 119-155.
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.