knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ftirsr) library(dplyr) library(ggplot2) library(tidyr) library(pls)
The ftisr
package provides easy access and user-friendly methods for researchers analyzing Biogenic Silica (BSi). (BSi) is used as a proxy for past temperatures in High Arctic settings. Typically, greater amounts of BSi in sediment cores indicate warmer temperatures. Recently paleoclimatologists have begun to use Fourier Transform Infrared (FTIR) spectroscopy to collect information on BSi. However, the offloaded relative absorbency data makes comparison with other proxies difficult. This project is intended to develop an R package to facilitate analysis of multivariate-length samples using a model and observations in collaboration with the Smith College PLSR SDS Capstone group.
We draw motivation from this previous project. Although packages that contain methods for PLSR already exist in the R universe, none provide the context and specific methods necessary for the implementation of such methods on a calibrated model specifically based on predicting BSi percentages.
# If you haven't installed the remotes package yet, do so: # install.packages("remotes") remotes::install_github("sds270-s22/ftirsr")
# Load package library(ftirsr)
In order to install the ftirsr
package, you will need the following R packages:
dplyr
magrittr
readr
purrr
janitor
fs
tidyr
tibble
pls
knitr
ggplot2
Included in this package are two datasets. Both were sourced from the referenced Capstone Group.
alaska.csv
: A dataset containing absorbance spectra for 103 Alaska lake sediment core samples across 1882 wavenumbers.greenland.csv
: A dataset containing absorbance spectra for 28 Greenland lake sediment core samples across 1882 wavenumbers.The end-user format is tidy (long), with 4 variables:
sample_id
: Core Sample ID referring to location and content of samplebsi
: Percentage of Biogenic Silica in Sample, determined by a wet chemical processwavenumber
: Wavenumber value (units = reciprocal centimeters)absorbance
: Absorbance levelsAlthough these datasets are presented in tidy-format, we leave the option available for a non-tidy format (wide) data set. Both of these datasets may be referred to as Wet Chemistry data.
Anyone who is interested in exploring geology via spectroscopy and BSi. The methods documented within this package are intended to ease the technical aspects of this field.
ftirs
: The ftirs
class attribute allows users to designate objects within their code as ftirs
objects. An ftirs
object is a modified dataframe that is properly formatted such that the dataframe may be used in a PLSR model.read_ftirs
: A method that generates a tidy data frame binding multiple FTIRS samples together. This function takes in a directory file path, reads in each file within the directory, and returns an ftirs
object.
read_wet_chem
: A method that reads and attaches Wet Chemistry data to the FTIRS object. This function is called in read_ftirs()
via the optional wet_chem_path
argument. This method takes in an optional filepath to singular Wet Chemistry Data file to be included in the FTIRS dataframe and the corresponding FTIRS dataframe to have the Wet Chemistry Data attached to. It outputs the modified dataframe.
pivot_longer.ftirs
: A method that pivots a wide, non-tidy FTIRS dataframe to a long, tidy format. This method takes in an ftirs
object that is a wide, non-tidy FTIRS dataframe.
pivot_wider.ftirs
: A method that pivots the FTIRS dataframe to wide, non-tidy format, necessary for input into a PLSR model. This method takes a long-format ftirs
object and returns a wide, non-tidy formatted ftirs
object.
is.ftirs
: A method that checks if an object is an ftirs
object.
as.ftirs
: A method that will attach the ftirs
attribute to an object.
predict.ftirs
: A method that will predict BSi content based on the contained ftirs
model with inputted data.
interpolate_ftirs
: A method to interpolate an fitrs
dataset. This method takes in a series of vectors, wavenumber
, absorbance
, and interpolate_vec
and returns a bound dataframe of these interpolated vectors.
arctic_mod
: A function that returns the PLSR model used by predict.ftirs()
to predict BSi percentages from testing data.
read_ftirs_file
: A function that generates a tibble from a single FTIRS sample. This function takes in a single file path as an input and returns an ftirs
object. This is a function meant to primarily be used only by developers.
read_wet_chem
: A function that reads and attaches Wet Chemistry data to the FTIRS object.
How can we predict the amount of Biogenic Silica within organic samples given a series of data obtained from FTIR spectroscopy?
In this example, we are interpolating the samples onto the vector of rounded wavenumbers used in our model.
my_data <- read_ftirs(dir_path = "samples") head(my_data)
But, we don't have to interpolate! The default is interpolate = TRUE
, but we can set it to interpolate = FALSE
.
# This shows how to read a directory without interpolation # Note the difference in wavenumbers my_data_no_interp <- read_ftirs(dir_path = "samples", interpolate = FALSE) head(my_data_no_interp)
We can include Wet Chemistry data in our ftirs
dataframe. This is necessary to use this data to train a PLS model. (Note: to predict, we don't want to include BSi in our ftirs
dataframe).
# All we have to do is add a path to the file containing the Wet Chemistry data my_data_wet_chem <- read_ftirs(dir_path = "samples", wet_chem_path = "wet-chem-data.csv")
It is also possible to read a single sample file using read_ftirs_file()
.
one_sample <- read_ftirs_file(single_filepath = "samples/FISK-10.0.csv") head(one_sample)
# Data must be in the ftirs wide format in order to use predict method my_data_wide <- my_data %>% pivot_wider() head(my_data_wide[1:8])
# We could get this same result by calling format = "wide" while reading in the samples my_data_wide <- read_ftirs(dir_path = "samples", format = "wide") head(my_data_wide[1:8])
# Data must have the ftirs class to call the correct predict.ftirs() method is.ftirs(my_data_wide)
# Call predict preds <- predict(my_data_wide)
preds
# We can specify the number of components we want to see, as this is inherited from the predict.mvr method # Here we are predicting the same values as above, but choosing only 4 components preds <- predict(my_data_wide, ncomp = 4) preds
# If we want to see the details of the training model, we can call arctic_mod() mod <- arctic_mod() summary(mod)
ggplot(preds[1,], aes(x = FISK.10, y = bsi))
The functionality demonstrated above are just a few common examples to help a user get started using our package.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.