# Introduction to eemR In eemR: Tools for Pre-Processing Emission-Excitation-Matrix (EEM) Fluorescence Data

library(ggplot2)
library(dplyr)
library(tidyr)
library(eemR)
library(plot3D)

theme_set(theme_bw(base_size = 8))


## Introduction

Dissolved organic matter (DOM) plays a central role in the functioning of aquatic ecosystems. For example, characteristics of the DOM pool (quantity and quality) determine underwater light climate [@Kirk1994], the composition of aquatic microbial communities [@Foreman2003;@Kritzberg2006a] and the carbon cycling on local to global scales [@Cole2007]. Chemically, the DOM pool is complex (> 1500 compounds) and analytical methods used to characterize it are relatively complex, time-consuming and costly [@Benner2002;@Seitzinger2005;@Fellman2010]. This situation called for the development of rapid and cost effective characterization techniques. Because optical properties of DOM can be related to its chemical properties, optical techniques such as fluorescence spectroscopy have been developed and rapidly adopted by the community to characterize the DOM pool in aquatic ecosystems [@Coble1990;@Coble1996;@McKnight2001].

The seminal paper of [@Stedmon2003a] put at the forefront the use of parallel factor analysis (PARAFAC) to aid the characterization of fluorescent DOM. Briefly, this three-way technique allows the decomposition of complex DOM fluorescence signals contained in the excitation-emission matrix (EEM, Fig. 1) into a set of individual chemical components and provides estimations of their relative contribution to the total fluorescence [@Bro1997;@Fellman2010;@Stedmon2003a].

file <- system.file("extdata/cary/scans_day_1", "sample3.csv", package = "eemR")

eem <- eem_read(file, import_function = "cary")

persp3D(
x = eem[[1]]$ex, y = eem[[1]]$em,
z = t(eem[[1]]$x), theta = -37.5, phi = 20, facets = TRUE, xlab = "Excitation (nm)", ylab = "Emisison (nm)", zlab = "Fluorescence (A.U.)", ticktype = "detailed", box = TRUE, expand = 0.5 )  The PARAFAC model is described as [@Bro1997;@Harshman1970]: (@parafac) $$x_{ijk} = \sum_{f=1}^{F} a_{ij}b_{jf}c_{kf} + e_{ijk}$$ where$i = 1, ..., I$;$j = 1, ..., J$;$k = 1, ..., K$,$x_{ijk}$is the intensity of fluorescence of the the$i^{th}$sample at the$j^{th}$emission wavelength at the$k^{th}$excitation wavelength.$a_{ij}$is directly proportional to the concentration of the$f^{th}$component in the sample$i$. Although PARAFAC gained a lot of attention in environmental sciences, it is also widely used in other research fields such as medical, pharmaceutical, food, social and information sciences [@Murphy2013]. Until today, more than 1850 published scientific papers relying on PARAFAC have been identified on Web of Science. Although PARAFAC was made easier using the \pkg{drEEM} MATLAB toolbox [@Murphy2013], preprocessing of EEMs prior to the analysis is still not straightforward. EEM preprocessing is an important part of PARAFAC since it aims to correct any systematic bias in the measurements and to remove signal unrelated to DOM fluorescence [@Murphy2013]. Biased models can be produced if these steps are not conducted carefully (see @Hiriart-Baer2008 where scattering fluorescence signals have been modeled and wrongly interpreted). Such data processing is cumbersome as it involves many steps [@Stedmon2008;@Murphy2013] which are usually executed by hand or within in-house scripting and therefore prone to introduce errors. Another important drawback limiting effective preprocessing of EEMs arise from the wide variety of file formats provided by the different manufacturers of spectrofluorometers that makes data importation difficult to generalize. Possibly reflecting these difficulties, it was recently pointed out that characterization of DOM using fluorescence spectroscopy is still not routinely included in ecological studies [@Fellman2010]. Given the increasing interest for fluorescence spectroscopy in ecology, tools are needed to unify the main preprocessing steps needed for further analyzes such as PARAFAC or metric calculations. The purpose of the \pkg{eemR} \proglang{R} package is to provide a rapid and an elegant interface to perform preprocessing of EEMs as well as to extract common fluorescence-based metrics proposed in the literature to obtain quantitative information about the DOM pool. This paper presents theoretical and mathematical background of the main PARAFAC preprocessing steps and metric calculations with concrete code examples. ## Fluorescence of DOM: theoretical and mathematical background Let us define$X$, an EEM of fluorescence intensities measured along a vector of excitation wavelengths ($ex$) at emission wavelengths ($em$). Usually,$ex$and$em$vary, respectively, between 200-500 nm and 220-600 nm (Fig. 1).$X_{ex, em}$denotes the fluorescence intensity measured at excitation$ex$and emission$em$(ex.:$X_{250, 400}$). The following sections present the main correction steps for fluorescence data aiming to correct any systematic bias in the measurements and remove signal unrelated to fluorescence prior to any analysis. | Correction | Description | |------------------|:--------------------------------:| | Blank subtraction | Subtract a pure water sample blank from the fluorescence data to help the removal of Raman and Rayleigh scattering peaks. | | Scattering removal | Remove the the so-called scattering bands caused by first and second order of Raman and Rayleigh scattering. | | Inner-filter effect correction | Correct for reabsorption of light occurring at both the excitation and emission wavelengths during measurement. | | Raman normalization | Remove the dependency of fluorescence intensities from the measuring equipments thus allowing cross-study comparisons. | ### Scattering correction Rayleigh and Raman scattering are optical processes by which some of the incident energy can be absorbed and converted into vibrational and rotational energy [@Lakowicz2006]. The resulting scattered energy produce the so-called scattering bands which are visually easily identifiable (Figs. 1 and 2). Given that both types of scattering are repeated across EEMs, it is important to remove such artifacts prior to analysis [@Bahram2006;@Zepp2004]. file <- system.file("extdata/cary/scans_day_1", "sample3.csv", package = "eemR") x_raw <- eem_read(file, recursive = TRUE, import_function = "cary") x_cor <- eem_remove_scattering(x_raw, "rayleigh", 1, 12) x_cor <- eem_remove_scattering(x_cor, "raman", 1, 5) ex <- 350 em <- x_raw[[1]]$em

em_raw <- x_raw[[1]]$x[, which(x_raw[[1]]$ex == 350)]
em_cor <- x_cor[[1]]$x[, which(x_cor[[1]]$ex == 350)]

df <- data.frame(em, em_raw, em_cor)
df$em_raman <- df$em_raw
df$em_raman[df$em <= 375] <- NA
df$em_rayleigh <- df$em_raw
df$em_rayleigh[df$em > 375] <- NA

ggplot(df, aes(x = em)) +
geom_line(aes(y = em_rayleigh, color = "Rayleigh scattering"),
size = 0.75,
na.rm = TRUE
) +
geom_line(aes(y = em_raman, color = "Raman scattering"),
size = 0.75,
na.rm = TRUE
) +
geom_line(aes(y = em_cor, color = "Fluorescence signal"),
size = 0.75,
na.rm = TRUE
) +
labs(color = "") +
xlab("Emission (nm)") +
ylab("Fluorescence (A.U)") +
scale_color_manual(values = c("black", "#D55E00", "#0072B2")) +
theme(legend.key = element_blank()) +
theme(
legend.justification = c(1, 1),
legend.position = c(0.9, 0.9)
)


First order of Rayleigh scattering is defined as the region where emission is equal to excitation ($em = ex$) causing a diagonal band in the EEM (Fig. 1) whereas the second order of Rayleigh scattering occurs at two times the emission wavelength of the primary peak ($em = 2ex$). For water, Raman scattering occurs at a wavenumber 3 600 $cm^{-1}$ (or $3.6 \times 10^{10} nm^{-1}$) lower than the incident excitation wavenumber [@Lakowicz2006]. Mathematically, first order Raman scattering is defined as follow:

(@raman1) $$\text{Raman}_{\text{1st}} = -\frac{ex}{0.00036 ex - 1}$$

where $ex$ is the incident excitation wavelength (nm). Second order Raman scattering is then simply defined as:

(@raman2) $$\text{Raman}_{\text{2nd}} = -\frac{2ex}{0.00036 ex - 1}$$

Different interpolation techniques have been proposed to eliminate scattering [@Zepp2004;@Bahram2006]. However, it is a common practice to simply remove the scattering-bands by inserting missing values (Fig. 3) at the corresponding positions [@Murphy2013;@Stedmon2008].

### Inner-filter effect correction

The inner-filter effect (IFE) is an optical phenomenon of reabsorption of emitted light and occurs particularly in highly concentrated samples (Fig. 4). IFE is known to cause underestimation of fluorescence intensities especially at shorter wavelengths and even to alter the shape and the positioning of fluorescence spectra by shifting peak positions toward lower wavelengths (Fig. 4) with increasing concentration [@Mobed1996;@Kothawala2013]. However, it was shown that the loss of fluorescence due to IFE could be estimated from absorbance spectra measured on the same sample using Equation (@ife) [@Ohno2002;@Parker1957]:

(@ife) $$X_0 = \frac{X}{10^{-b(A_{ex} + A_{em})}}$$

where $X_0$ is the fluorescence in the absence of IFE, $X$ is the measured fluorescence intensity, $b$ is half the cuvette pathlength (usually 0.5 cm) for excitation and emission absorbance, $A_{ex}$ is the absorbance at the excitation wavelength $ex$ and $A_{em}$ the absorbance at the emission wavelength $em$ (Fig. 4B).

file <- system.file("extdata/cary/scans_day_1/", "sample3.csv",
package = "eemR"
)

eem <- eem_read(file, recursive = FALSE, import_function = "cary")

eem <- eem_remove_scattering(eem, "rayleigh", 1, 10)
eem <- eem_remove_scattering(eem, "raman", 1, 10)

data("absorbance")

eem_scatter_removed <- eem_inner_filter_effect(eem, absorbance, 1)

jet.colors <- colorRampPalette(c(
"#00007F",
"blue",
"#007FFF",
"cyan",
"#7FFF7F",
"yellow",
"#FF7F00",
"red",
"#7F0000"
))

par(mfrow = c(2, 2), mar = c(4, 4, 1, 2) + 0.1, oma = c(0, 0, 0, 2))

plot3D::image2D(
y = eem[[1]]$em, x = eem[[1]]$ex,
z = t(eem[[1]]$x), xlab = "Excitation (nm.)", ylab = "Emission (nm.)", legend.lab = "IFE correction factors", col = jet.colors(255) ) legend("topleft", expression(bold("A")), text.font = 1, cex = 1.5, bty = "n", adj = c(1.5, 0.25) ) plot(absorbance$wavelength,
absorbance$sample3, type = "l", lwd = 2, col = "red", xlab = "Wavelength (nm)", ylab = "Absorbance" ) legend("topleft", expression(bold("B")), text.font = 1, cex = 1.5, bty = "n", adj = c(1.5, 0.25) ) plot3D::image2D( y = eem[[1]]$em,
x = eem[[1]]$ex, z = t(ife), xlab = "Excitation (nm.)", ylab = "Emission (nm.)", legend.lab = "IFE correction factors", col = rev(jet.colors(255)) ) legend("topleft", expression(bold("C")), text.font = 1, cex = 1.5, bty = "n", adj = c(1.5, 0.25) ) plot3D::image2D( y = eem_scatter_removed[[1]]$em,
x = eem_scatter_removed[[1]]$ex, z = t(eem_scatter_removed[[1]]$x),
xlab = "Excitation (nm.)",
ylab = "Emission (nm.)",
legend.lab = "IFE correction factors",
col = jet.colors(255)
)

legend("topleft",
expression(bold("D")),
text.font = 1,
cex = 1.5,
bty = "n",
)


It was recently shown that IFE corrected algebraically was not appropriate when total absorbance, defined as $A_{\text{total}} = A_{\text{ex}} + A_{\text{em}}$ (see Equation (@ife)), is greater than 1.5 [@Kothawala2013]. Under this circumstance, a two-fold dilution of the sample has been recommended. If this happen, a warning message will be displayed by the package during the correction process.

### Raman calibration

The same DOM sample measured on different spectrofluorometers (or even the same but with different settings) can give important differences in fluorescence intensities [@Lawaetz2009;@Coble1993]. The purpose of the Raman calibration is to remove the dependency of fluorescence intensities on the measuring equipment, thus allowing cross-study comparisons. Given that the Raman peak position of a water sample is located at a fixed position, [@Lawaetz2009] proposed to use the Raman integral of a blank-water sample measured the same day as the EEM to perform calibration. Moreover, the area of the Raman peak ($A_{\text{rp}}$, Fig. 5) is defined as the area of the emission profile between 371 and 428 nm at a fixed excitation of 350 nm [@Lawaetz2009].

load("blank.rda")

raman <- blank[[1]]$x[, which(blank[[1]]$ex == 350)]

df <- data.frame(em = blank[[1]]$em, raman) shade <- rbind(c(371, 0), subset(df, em >= 371 & em <= 428), c(df[nrow(df), "X"], 0)) ggplot(df, aes(x = em, y = raman)) + geom_line(size = 0.2, na.rm = TRUE) + xlab("Emission (nm)") + ylab("Fluorescence (A.U)") + geom_polygon(data = shade, aes(em, raman)) + xlim(300, 450) + annotate("text", x = 400, y = 1, label = "A[rp]", size = 3, parse = TRUE) + annotate("text", x = 390, y = 7.5, label = "First order Rayleigh scattering", size = 3 )  Mathematically, the value of$A_{\text{rp}}$is calculated using the following integral (Equation(@arp)): (@arp) $$A_{\text{rp}} = \int\limits_{\lambda_{\text{em}371}}^{\lambda_{\text{em}428}} W_{350, \lambda} d\lambda$$ where$W_{350, \lambda}$is the fluorescence intensity of a pure water sample (preferably deionized and ultraviolet exposed, @Lawaetz2009) at excitation$ex = 350$nm and at emission$em = \lambda$nm. Each values of the EEM$X$are then normalized using the scalar value of$A_{\text{rp}}$accordingly to Equation (@raman_normalisation): (@raman_normalisation) $$X_0 = \frac{X}{A_{\text{rp}}}$$ where$X_0$is the normalized EEM with fluorescence intensities now expressed as Raman Units (R.U.),$X$are the unnormalized measured fluorescence intensities and$A_{\text{rp}}$is the Raman peak area. ### Metrics A wide range of different metrics obtained from EEMs have been proposed to characterize the DOM pool in aquatic ecosystems. These metrics extract quantitative information in specific regions (wavelengths) in EEMs. The following sections present an overview of the principal metrics supported by the package. #### Coble's peaks The following table presents the five major fluorescent components identified by [@Coble1996] in marine EEMs. Peaks B and T represent protein-like compounds (tyrosine and tryptophane), peaks A and C are indicators of humic-like components whereas peak M was associated to marine humic-like fluorescence. | Peak | Ex (nm) | Em (nm) | |------ |--------- |--------- | | B | 275 | 310 | | T | 275 | 340 | | A | 260 | 380-460 | | M | 312 | 380-420 | | C | 350 | 420-480 | #### Fluorescence, humification and biological indices Three main indices have been proposed to trace the diagnostic state of the DOM pool in aquatic ecosystems. The fluorescence index (FI) was shown to be a good indicator of the general source and aromaticity of DOM in lakes, streams and rivers [@McKnight2001]. This index is calculated as the ratio of fluorescence at emission 450 nm and 500 nm, at fixed excitation of 370 nm (Equation (@fi)). (@fi) $$\text{FI} = \frac{X_{370, 450}}{X_{370, 500}}$$ The humification index (HIX) is a measure of the complexity and the aromatic nature of DOM [@Ohno2002]. HIX calculated as the ratio of the sum of the fluorescence between 435 and 480 nm and between 300 and 345 nm at a fixed excitation of 254 nm (Equation (@hix)). (@hix) $$\text{HIX} = \frac{\sum\limits_{em = 435}^{480} X_{254, em}}{\sum\limits_{em = 300}^{345} X_{254, em}}$$ The biological index (BIX) is a measure to characterize biological production of DOM [@Huguet2009]. BIX is calculated at excitation 310 nm, by dividing the fluorescence intensity emitted at emission 380 nm and at 430 nm (Equation (@bix)). (@bix) $$\text{BIX} = \frac{X_{310, 380}}{X_{310, 430}}$$ ## R code and study case Main preprocessing steps using the eemR package are illustrated using a subset of three EEMs from [@Massicotte2011EA]. Briefly, these EEMs (see Fig. 1 for an example) have been sampled in the St. Lawrence River, one of the largest rivers in North America. Fluorescence matrices of DOM were measured on a Cary Eclipse spectrofluorometer (Varian, Mississauga, Ontario, Canada) over excitation wavelengths between 220 and 450 nm (5-nm increment) and emission wavelengths between 230 and 600 nm (2-nm increment). All functions from the package start with the prefix 'eem_'. library(eemR) ls("package:eemR")  ### Data importation and plotting Importation of EEMs into \proglang{R} is done using the eem_read() function. Given that fluorescence files are dependent on the spectrofluorometer used, \pkg{eemR} will determine automatically from which manufacturer the files are from and load them accordingly. file <- system.file("extdata/cary/scans_day_1", package = "eemR") eems <- eem_read(file, import_function = "cary")  The generic summary() function displays useful information such as: (1) the wavelength ranges used in both emission and excitation modes, (2) the manufacturer from which the file was read and (3) the state of the EEM which indicate which corrections have been applied. summary(eems)  A surface plot of EEMs is made using the plot(x, which = 1) function where which is the index of the EEM to be plotted (see Fig. 3). plot(eems, which = 3)  Interactive plots using a simple shiny app can be lunched to interactively browse EEMs. plot(eems, interactive = TRUE)  ### Blank subtraction Subtraction of a water blank from the measured samples may help to reduce scattering [@Murphy2013;@Stedmon2008]. In \pkg{eemR}, this is done using the eem_remove_blank(eem, blank) function where eem is a list of EEMs and blank is a water blank. file <- system.file("extdata/cary/scans_day_1", "nano.csv", package = "eemR") blank <- eem_read(file, import_function = "cary") eems <- eem_remove_blank(eems, blank)  ### Raman and Rayleigh scattering removal Scattering removal (Equation (@raman1) and Equation (@raman2)) is performed using the eem_remove_scattering(eem, type, order, width) function where eem is a list of EEMs, type is the scattering type (raman or rayleigh), order is the order of the scattering (1 or 2) and width the width in nanometers of the slit windows to be removed. In the following example, only first order and Raman and Rayleigh scattering are removed using a bandwidth of 10 nm (Fig. 3). eems <- eem_remove_scattering(eems, "rayleigh", 1, 10) %>% eem_remove_scattering("raman", 1, 10) plot(eems, which = 3)  ### Inner-filter effect correction IFE correction requires the use of absorbance data (Equation (@ife)). For each EEM, an absorbance spectra must be supplied. The easiest way to provide absorbance is to use a data frame with column names matching EEMs names. In the following data frame, the first column represents the wavelengths at which absorbance have been measured whereas the remaining columns are absorbance spectra for sample1, sample2 and sample3. data("absorbance") head(absorbance)  Note that EEM names can be obtained using the eem_sample_names() function. eem_names(eems)  IFE correction is performed using the eem_inner_filter_effect(eem, absorbance, pathlength) function where eem is a list of EEMs, absorbance is a data frame containing absorbance spectra and pathlength is the absorbance cuvette pathlength expressed in$cm$(Fig. 4B). For each EEM contained in eem, the ranges spanned by the IFE correction factors and total absorbance$A_{\text{total}}$(Equation (@ife)) are displayed to the user. This can serve as diagnostic tool to determine if the mathematical correction was the appropriate method to use to handle IFE. eems <- eem_inner_filter_effect( eem = eems, absorbance = absorbance, pathlength = 1 ) plot(eems, which = 3)  Fig. 4 presents intermediate results obtained for the correction of sample3. Note the nonlinearity of the correction with higher effect at lower wavelengths (bottom-left corner in panel C). The corrected EEM is presented in Fig. 4D which is the result of the operation of dividing matrix in 4A by 4C. ### Raman normalization The last step of the correction process consist to calibrate fluorescence intensities using the Raman scatter peak of water [@Lawaetz2009]. This is performed using the eem_raman_normalisation(eem, blank) function where eem is a list of EEMs and blank is a water blank measured the same day. Here, the same water-blank is used for the three EEMs. Note that the value of the Raman area ($A_{\text{rp}}\$, Equation(@arp)) is printed.

eems <- eem_raman_normalisation(eems, blank)

plot(eems, which = 3)


At this stage, all corrections have been performed and EEMs are ready to be exported into MATLAB for PARAFAC analysis. The state of the EEMs can be verified using the summary() function.

summary(eems)


### Exporting to MATLAB

The drEEM MATLAB toolbox [@Murphy2013] used to perform PARAFAC analysis requires data in a specific format (structure). The eem_export_matlab(file, ...) function can be used to export corrected EEMs into a PARAFAC ready format. The first file argument is the mat file where to export the structure and the second argument ... is one or more eem object.

eem_export_matlab("myfile.mat", eems)


Once exported, one can simply import the generated mat file in MATLAB using load('myfile.mat');.

### Metric extraction

Coble's peaks can be extracted using the eem_coble_peaks(eem) function. Note that for peaks A, M, C, the maximum fluorescence intensity in the range of emission region is returned.

file <- system.file("extdata/cary/scans_day_1", package = "eemR")
eems <- eem_read(file, import_function = "cary")

eem_coble_peaks(eems, verbose = FALSE)


Fluorescence (FI), humification (HIX) and biological (BIX) indices can be extracted as follow.

eem_fluorescence_index(eems, verbose = FALSE)

eem_humification_index(eems, verbose = FALSE)

eem_biological_index(eems, verbose = FALSE)


It should be noted that different excitation and emission wavelengths are often used to measure EEMs. Hence, it is possible to have mismatch between measured wavelengths and wavelengths used to calculate specific metrics. In these circumstances, EEMs are interpolated using the the \pkg{pracma} package [@Borchers2015]. A message warning the user will be displayed if data interpolation is performed. This behavior can be controlled using the verbose = TRUE/FALSE parameter.

## Using R pipeline

Note that it is also possible to use the magrittr pipe line with most functions from eemR.

library(magrittr)

file <- system.file("extdata/cary/scans_day_1/", package = "eemR")
file %>%
eem_read(recursive = TRUE, import_function = "cary") %>%
eem_raman_normalisation() %>%
eem_remove_scattering(type = "raman", order = 1, width = 10) %>%
eem_remove_scattering(type = "rayleigh", order = 1, width = 10) %>%
plot(2)


## Conclusion

\pkg{eemR} provides a flexible interface for manipulating and preprocessing fluorescence matrices based on theoretical and mathematical foundations of fluorescence spectroscopy [@Lakowicz2006]. Furthermore, this \proglang{R} package removes the drawbacks associated with EEM manipulation (dependent on spectrofluorometer manufacturer) and unifies the most important steps involved in EEM preparation in order to correct and remove systematic bias in fluorescence measurements. This will likely contribute to promote the use of fluorescence spectroscopy in various fields.

## Try the eemR package in your browser

Any scripts or data that you put into this service are public.

eemR documentation built on June 27, 2019, 5:08 p.m.