R/serrsBayes.R

#' Bayesian modelling and quantification of Raman spectroscopy
#'
#' This R package implements sequential Monte Carlo (SMC) algorithms for fitting a
#' generalised additive mixed model (GAMM) to Raman spectra. These multivariate observations are highly collinear
#' and lend themselves to a reduced-rank representation.  The GAMM separates the
#' hyperspectral signal into three components: a sequence of Lorentzian or Gaussian peaks; a
#' smoothly-varying baseline; and zero-mean, additive white noise. The parameters of each component of
#' the model are estimated iteratively using SMC. The posterior distributions of the parameters
#' given the observed spectra are represented as a population of weighted particles.
#' 
#' Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser.
#' The pattern of peaks in a Raman spectrum corresponds to the vibrational modes of the molecule. The shift in
#' wavenumber of the photons is proportional to the change in energy state, which is reflected in the locations
#' of the peaks. Surface-enhanced Raman scattering (SERS) is a technique that amplifies the Raman
#' signal using metallic substrates, such as nanoparticles. The laser can also be tuned to the resonant frequency
#' of the molecule, which is known as surface-enhanced resonance Raman scattering (SERRS). Under controlled experimental
#' conditions, the amplitudes of the peaks are linearly related to the concentration of the molecule, from the limit
#' of detection (LOD) up to monolayer coverage of the nanoparticle surface.
#'
#' The GAMM represents the peaks and baseline as continuous functions. The background fluorescence is modelled
#' using a penalised cubic spline, while the peaks are an additive mixture of squared exponential (Gaussian) or Lorentzian
#' (Cauchy) kernels:
#' \deqn{Y = \sum_{m=1}^M \alpha_{i,m}B_m(\nu_j) + \sum_{p=1}^P s(\nu_j | l_p, A_p, \phi_p) + \epsilon_{i,j}}
#' where \eqn{Y} is a matrix of hyperspectral observations \eqn{y_{i,j}} that have been discretised at wavenumbers \eqn{\nu_j};
#' \eqn{B_m} are the \eqn{M} spline basis functions with coefficients \eqn{\alpha_{i,m}}; \eqn{s(\nu_j | l_p, A_p, \phi_p)} 
#' are the radial basis functions for each peak, with location \eqn{l_p}, amplitude \eqn{A_p}, and scale \eqn{\phi_p} parameters.
#' \eqn{\epsilon_{i,j}} is assumed to be zero mean, additive white noise with constant variance \eqn{\sigma^2}.
#' 
#' This model-based approach accounts for differences in resolution and experimental conditions,
#' enabling comparison and alignment of heterogeneous spectra. The relationship between concentration
#' and peak intensity can be quantified by fitting a Bayesian functional regression:
#' \deqn{A_p = c_i \beta_p}
#' where \eqn{c_i} is the nanomolar (nM) concentration of the molecule in the \eqn{i}th spectrum,
#' \eqn{c_{LOD} < c_i <= c_{MLC}}. The regression model produces highest posterior density (HPD) intervals for the
#' limit of detection of each peak. A consistent, unbiased estimate of the model evidence (also known as the marginal likelihood)
#' is also computed. This can be used to evaluate whether Gaussian or Lorentzian peaks are a better fit to the data.
#' 
#' @author
#' M. T. Moores, J. Carson & M. Girolami
#' 
#' Maintainer: Matt Moores <mmoores@gmail.com>
#' 
#' @references
#' Moores, Gracie, Carson, Faulds, Graham & Girolami "Bayesian modelling and quantification of Raman spectroscopy," \href{https://arxiv.org/abs/1604.07299}{arXiv preprint}
#' @examples
#' # simulate some data with known parameter values
#' wavenumbers <- seq(700,1400,by=2)
#' spectra <- matrix(nrow=1, ncol=length(wavenumbers))
#' peakLocations <- c(840,  960, 1140, 1220, 1290)
#' peakAmplitude <- c(11500, 2500, 4000, 3000, 2500)
#' peakScale <- c(10, 15, 20, 10, 12)
#' signature <- weightedLorentzian(peakLocations, peakScale, peakAmplitude, wavenumbers)
#' baseline <- 1000*cos(wavenumbers/200) + 2*wavenumbers
#' spectra[1,] <- signature + baseline + rnorm(length(wavenumbers),0,200)
#' plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity")
#' lines(wavenumbers, baseline, col=2, lty=4)
#' lines(wavenumbers, baseline + signature, col=4, lty=2)
#' 
#' # fit the model using SMC
#' lPriors <- list(scale.mu=log(11.6) - (0.4^2)/2, scale.sd=0.4, bl.smooth=10^11, bl.knots=50,
#'                 beta.mu=5000, beta.sd=5000, noise.sd=200, noise.nu=4)
#'
#' \dontrun{
#' ## takes approx. 1 minute for 100 SMC iterations with 10,000 particles
#' result <- fitSpectraSMC(wavenumbers, spectra, peakLocations, lPriors)
#' plot.ts(result$ess, xlab="SMC iterations", ylab="ESS")
#' 
#' # sample 200 particles from the posterior distribution
#' samp.idx <- sample.int(length(result$weights), 200, prob=result$weights)
#' plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity")
#' for (pt in samp.idx) {
#'   bl.est <- result$basis %*% result$alpha[,1,pt]
#'   lines(wavenumbers, bl.est, col="#C3000009")
#'   lines(wavenumbers, bl.est + result$expFn[pt,], col="#0000C309")
#' }
#' }
#' @useDynLib serrsBayes
#' @import RcppEigen
#' @importFrom Rcpp evalCpp
#' @exportPattern "^[[:alpha:]]+"
#' @docType package
#' @name serrsBayes
NULL

.onUnload <- function (libpath) {
  library.dynam.unload("serrsBayes", libpath)
}
mooresm/serrsBayes documentation built on July 2, 2021, 7:36 a.m.