Spectral Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  fig.pos = 'H',
  comment = "#>"
)
library(webshot)
library(plotly) 

Package description

Infrared, near-infrared and Raman spectroscopic data measured during chemical reactions, provide structural fingerprints by which molecules can be identified and quantified. The application of these spectroscopic techniques as inline process analytical tools (PAT), provides the (pharma-)chemical industry with novel tools, allowing to monitor their chemical processes, resulting in a better process understanding through insight in reaction rates, mechanistics, stability, etc. Data can be read into R via the generic spc-format, which is generally supported by spectrometer vendor software. Versatile pre-processing functions are available to perform baseline correction by linking to the 'baseline' package; noise reduction via the 'signal' package; as well as time alignment, normalization, differentiation, integration and interpolation. Implementation based on the S4 object system allows storing a pre-processing pipeline as part of a spectral data object, and easily transferring it to other datasets. Interactive plotting tools are provided based on the 'plotly' package. Non-negative matrix factorization (NMF) has been implemented to perform multivariate analyses on individual spectral datasets or on multiple datasets at once. NMF provides a parts-based representation of the spectral data in terms of spectral signatures of the chemical compounds and their relative proportions. The functionality to read in spc-files was adapted from the 'hyperSpec' package.

In this vignette we review the package functionality from loading data to NMF-analysis.

Basic functionality

SpectraInTime-class

Spectral data are represented as en S3-class which containt a data matrix together with meta data such as wavelengths, time points and preprocessing steps executed on the data. An artificial example is added in the package by which we will illustrate the basic functionality.

library( spectralAnalysis )
spectralEx                 <-  getSpectraInTimeExample( )
str( spectralEx)

You can extract slots via specific methods:

dim( spectralEx)
getExperimentName(spectralEx)
getExtraInfo( spectralEx )
getStartTime( spectralEx )
getTimePoints( spectralEx )
getTimePoints( spectralEx , timePointsAlt = TRUE , timeUnit = "seconds" ) 
getTimePoints( spectralEx , timeUnit = "minutes" )
getTimePoints( spectralEx , timeUnit = "hours" )
getUnits( spectralEx )
getTimePoints( spectralEx , timePointsAlt = TRUE ) 
spectra       <-  getSpectra( spectralEx )
dim( spectra )

One can modify slot using specific methods:

setTimePointsAlt( spectralEx  )  <-  getTimePoints( spectralEx )  -  200 

Using these methods, will avoid you to make some errors since validity testing is included.

setTimePointsAlt( spectralEx  )  <-  getTimePoints( spectralEx ) * 5 

to get more information on the SpectraIntime class and its methods:

?SpectraInTime

Reading in data

The function loadAllSPCFiles() allows to read in all spectra data into a list:

allSPCFiles     <- loadAllSPCFiles(directoryFiles)

Subsetting

Different subsetting methods have been defined:

print( r( 2.5 , 10.8) )
print( r( c(1 , 3 , 2 , 6 , 9 , 3 ) ) )

print( e( 1, 3 ,5 ,6 ,7 ,8 ) )

Note that r() and e() work similar to the usual c() function:

# range subsetting 
spectralEx                      <-   getSpectraInTimeExample()
spectraSubset                   <-  spectralEx[ r( 1000 , 30000 ) , r(130 , 135 ) ]
getTimePoints( spectraSubset )
getSpectralAxis(  spectraSubset )
spectraTimeSubset               <-  spectralEx[ r( 1000 , 30000 ) ,  ]
spectraWavelengthSubset         <-  spectralEx[  ,  r(130 , 135 ) ]

# other types of subsetting 
# logical
spectraSubsetLogical   <-  spectralEx[ getTimePoints( spectralEx ) > 300  ,
    getSpectralAxis( spectralEx ) <= 500 ]
subsetLogTimeAlt       <-  spectralEx[ 
    getTimePoints( spectralEx , timePointsAlt = TRUE ) > 0 ,
    getSpectralAxis( spectralEx ) <= 500 ]

# integer

subsetInteger                   <-  spectralEx[ c( 1, 5, 10)  , c( 4 , 4 , 4 , 8 , 16) ] 

# closest element matching 

spectraSubsetElem               <-  spectralEx[ e( 1.234 , 3.579 ) ,
    e( 200.001 , 466.96  ) , timeUnit = "hours" ]
getTimePoints( spectraSubsetElem , timeUnit = "hours" )
getSpectralAxis( spectraSubsetElem  )

Summary

summarySpec       <-  summary( spectralEx )
str( summarySpec )

Graphics

One spectraInTime

4 different visualization options are implemented:

data = getSpectraInTimeExample()
library( plotly )
plot( x =  data[ e( 1 , 2 , 3) , , timeUnit = "hours" ] , type = "time" , timeUnit = "hours" , timePointsAlt = FALSE ) 
plot( x =  data[ , e( seq( 200 , 400 , 50 ) ) ] , type = "spectralAxis" , timeUnit = "minutes" , timePointsAlt = TRUE ) 
plot( x = data , type = "contour" , nColors = 200 , colors = "C" , timeUnit = "seconds", timePointsAlt = TRUE ) 

Note: 3D-plot not run to save space:

plot( x =  data , type = "3D" , timeUnit = "hours" , timePointsAlt = FALSE ) 

note that subsetting with element matching is useful for the time and wavelength views.

List of spectra in time

We have also 2 options to plot a list of SpectraInTime-objects directly.

Line plot:

listOfSpectra     <-  getListOfSpectraExample()
plot( listOfSpectra , times = 1:3 , timeUnit = "hours" , colors = "A" ) 

and contour plot:

plot( listOfSpectra , timeUnit = "hours" , colors = "C" , type = "contour" )

Time alignment

Time alignment of SpectraInTime refers to translating and possibly subsetting the secondary time axis. Such that different experiments have a comparable time origin.

Information to align SpectraInTime is contained in processTimes object for one experiment:

processTimes        <-  getProcessTimesExample() 
processTimes

and processTimesFrame for multiple experiments:

processTimesFrame   <-  getProcessTimesFrameExample()
processTimesFrame

There are 3 methods to align a SpectraInTime:

spectra             <-  getSpectraInTimeExample()
listOfSpectra       <-  getListOfSpectraExample()
pathProcessTimes    <-  getPathProcessTimesExample()

ex1    <-  timeAlign( x = spectra , y = processTimes ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex2    <-  timeAlign( x = listOfSpectra , y = processTimesFrame ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE )
ex3    <-  timeAlign( x = listOfSpectra , y = pathProcessTimes ,
    cutCooling = TRUE , cutBeforeMinTemp = TRUE  , timeFormat =  "%Y-%m-%d %H:%M:%OS" )

Real IR example

We used 2 example spectra from a chemical synthesis monitored by an IR-probe to illustrate preprocessing and NMF-analyis.

exampleData1       <-  readSpectra( system.file( "exampleData/exampleExperiment1.txt" ,
        package = "spectralAnalysis") ) 
plot( exampleData1, type = "contour" )

Pre-processing

Pre-processing is applied to correct for any physical influences such as light scattering. It allows for better interpretation and more precise modeling. Usually several pre-processing steps are applied after each other and the ideal pre-processing procedure depends on the technology and the measured system.

Selection of time and wavelength range

Scientist can often indicate a relevant wavelength range. In this manner we can reduce the number of covariates.

dim( exampleData1 )
wavelengthRange         <-  r ( 800 ,  1625 )
spectralDataSelect      <-  exampleData1[  r( 0 , 5 ) , wavelengthRange , timeUnit = "hours" ]
plot( spectralDataSelect , type =  "contour" ) 

Baseline correction

Baseline correction implies fitting a global function for each measured spectra and subtracting it from the data.

timesToShow           <-  e( 0.5 , 5 )
plot( spectralDataSelect[ timesToShow , , timeUnit = "hours"] , type = "time" ) 
spectralDataBaseline        <-  baselineCorrect( spectralDataSelect ,
    method = 'modpolyfit', degree = 4 )
plot( spectralDataBaseline[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

smoothing

Smoothing or filtering in the wavelength domain can reduce measurement noise, the default smoothing is the Savitky-golay filter, which uses a local polynomial approximation, and which can be also be used to calculate derivatives in the wavelength domain. This can be useful to better detect peaks, especially in NIR-spectra where peaks are not clearly distinguishable otherwise.

spectralDataSmooth       <-  smooth( spectralDataBaseline , window = 5 )
plot( spectralDataSmooth[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

smoothing has little influence here, because the measurements contain little noise.

spectralDataDerivative   <-  smooth(  spectralDataBaseline , derivative = 1  )
plot( spectralDataDerivative[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Normalization

Normalization standardizes the intensities by:

spectralDataNormalized   <-  normalize( spectralDataSmooth , method = "integration"   )
plot( spectralDataNormalized[ timesToShow , , timeUnit = "hours"] , type = "time" ) 

Scatter correction

Multiplicative scatter correction (MSC) removes scatter effects by regressing a spectra in the wavelength domain against a reference spectra and substracting the found intercept and dividing by the slope for each spectra

?scatterCorrrect

Transfer preprocessing steps

preprocessing steps can be replicated on new data:

allSpectraDataProcessed    <-  lapply( listOfSpectra , preprocess , with = spectralDataSmooth  )

NMF-analysis

We will perform NMF-ananalysis on the smoothed and baseline corrected spectra for the time range: 0 - 5 hours assuning 3 chemical components.

spectralExSelect  <-  spectralDataSmooth[ r( 0 , 5 ) , , timeUnit = "hours" ] 
nmfResult    <-  spectralNMF( spectralExSelect , rank = 3 , subsamplingFactor = 5 )
nmfObject    <-  getDimensionReduction( nmfResult , type = "NMF")$NMF
nmfTrends    <-  t( NMF::coef( nmfObject ) )
matplot( nmfTrends , type = "l" , x = getTimePoints( spectralExSelect , timeUnit = "hours"  ) , xlab = "time in hours"  )

NMF-analysis is a useful exploratory tool to get insight in the chemical process.

Including pure-component spectra can increase the quality of NMF-analysis.

see

?spectralNMF


Try the spectralAnalysis package in your browser

Any scripts or data that you put into this service are public.

spectralAnalysis documentation built on Jan. 11, 2023, 5:15 p.m.