This tutorial introduces the isorunN2O package and provides examples of how it works and how it can be used. Please install isorunN2O following the instructions on GitHub. The newest version of this tutorial can always be loaded as a vignette directly in R by calling vignette("N2O_data_reduction_tutorial") in the command line.
The package includes an example data set (test_run) to work with for testing and demonstration purposes. Because the original data files would be too big to include, it is stored as the cached, compacted data set that the iso_save() command from the isoreader package creates from the raw data files. When you run this on your own data sets, simply change root_folder to point to where you keep all your data (absolute path or relative to your current working directory), e.g. root_folder <- file.path("MAT", "results"), and run_folders to all the run folders you want to read in (can be one or multiple). The first time you load your own data using isoreader it may take a few minutes (speed it up using the parallel = TRUE parameter), but afterwards it will be fast because the data is already cached. The following data was read using the isoreader version returned by packageVersion("isoreader").
library(isoreader)
root_folder <- system.file("extdata", package = "isorunN2O")
run_folders <- "test_run" # could be multiple, e.g. run_folders <- c("run1", "run2")
iso_files <-
  iso_read_continuous_flow(run_folders, root = root_folder) %>%
  # filter out files that have reading errors
  iso_filter_files_with_problems()
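For your own data, the two settings might be assembled as in the following sketch. The root folder is the example path from the text, while the run folder names run1 and run2 are hypothetical placeholders. The existence check is only a precaution added here for illustration; it is not part of isoreader.

```r
# root folder from the example in the text; adjust to your own setup
root_folder <- file.path("MAT", "results")
# hypothetical run folder names, replace with your own
run_folders <- c("run1", "run2")

# full paths of the run folders that would be read
run_paths <- file.path(root_folder, run_folders)

# precaution: report folders that do not exist before trying to read them
missing_paths <- run_paths[!dir.exists(run_paths)]
if (length(missing_paths) > 0) {
  message("folders not found: ", paste(missing_paths, collapse = ", "))
}
```

Checking the paths first gives a clearer message than a failed read when a folder name is misspelled.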
The iso_files variable now holds all your raw data. You can look at the names of all the loaded files by running the following (only the first 5 are shown here for brevity; also note that we're using the %>% pipe operator to pass output from one function to the next, which might look a little strange at first but makes the code more readable further on):
names(iso_files) %>% head(n=5)
# install isoprocessor from GitHub if it is not installed yet
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("isoprocessor", quietly = TRUE)) devtools::install_github("isoverse/isoprocessor")
You can use the file names to take a look at specific chromatograms using the functionality provided in the isoprocessor package (the version returned by packageVersion("isoprocessor")).
library(isoprocessor)
iso_files$`MAT25392080_P02E_run02_Conditioner-0000.dxf` %>%
  iso_plot_continuous_flow_data(color = data, panel = NULL)
If you'd like to explore the chromatograms (or really any of the data extracted from the raw data files) further, you can visually explore some of the core functionality of isoreader and isoprocessor using the isoviewer graphical user interface (the version returned by packageVersion("isoviewer")) with the command below (use the Close button in the GUI to return to the interactive R session):
# install isoviewer from GitHub if it is not installed yet
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("isoviewer", quietly = TRUE)) devtools::install_github("isoverse/isoviewer")
isoviewer::iso_start_viewer()
Data processing makes use of generic data table and isotope functionality provided by the dplyr, isoreader and isoprocessor packages as well as N2O specific tools implemented in the isorunN2O package.
library(dplyr)
library(isoreader)
library(isoprocessor)
library(isorunN2O)
In the first step, parse the file info from the sequence (take a look at what's available with iso_get_file_info(iso_files)), pull out the peak table from the iso files (with the file info), then focus only on the N2O peak (no need for the references) and on the main columns we are interested in. Everything is chained together with the pipe operator %>% for better readability.
df.raw <- iso_files %>%
  # extract all sample information (former parse_file_names)
  iso_mutate_file_info(
    folder = basename(dirname(file_path)),
    date = file_datetime,
    analysis = Analysis,
    run_number = parse_integer(Row),
    # category is the first part of the `Identifier 1`
    category = extract_word(`Identifier 1`, include_underscore = TRUE, include_dash = TRUE),
    # name is the full `Identifier 1` value
    name = `Identifier 1`,
    # volume information is stored in `Identifier 2`
    volume = parse_number(`Identifier 2`)
  ) %>%
  # aggregate peak table
  iso_get_vendor_data_table(include_file_info = everything()) %>%
  # select N2O peak
  select_N2O_peak(c(360, 370)) %>%
  # select all relevant columns
  select_columns(folder:volume, area = `Intensity All`, d45 = `d 45N2O/44N2O`, d46 = `d 46N2O/44N2O`)
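To make the difference between the category and name columns more concrete, the following base R sketch mimics the idea on two sample names that appear later in this tutorial. The regular expression is an assumption chosen for illustration; it is not the implementation of extract_word().

```r
# two `Identifier 1` values that occur in the example run
identifier_1 <- c("IAEA-NO3 37 uM ctrl", "USGS-34 37 uM ctrl")

# keep the leading run of letters, digits, underscores and dashes
# (illustrative regular expression, NOT the one used by extract_word)
category <- sub("^([A-Za-z0-9_-]+).*$", "\\1", identifier_1)

# name stays the full identifier, category is only its first word
data.frame(name = identifier_1, category = category)
```

Allowing dashes in the first word matters here: without them, a name such as IAEA-NO3 would be cut off at the dash.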
Now, to get a sense of what the data looks like, let's look at the first couple of rows. To look at the complete data frame, you can always call View(df.raw) or double-click on the name in the Environment tab on the upper right.
df.raw %>% head(n=5)
To check the category makeup of your run, make use of some handy dplyr functionality:
df.raw %>% group_by(category) %>% tally()
Additionally, isoprocessor provides convenience functions for inspecting the data, including iso_summarize_data_table(). Formatting options for data tables are provided by the kable function from the knitr package, which we'll use here to get column-style output.
df.raw %>%
  group_by(category) %>%
  # summarize area, d45 and d46 for each category
  iso_summarize_data_table(area, d45, d46) %>%
  # format for easier display
  knitr::kable()
To home in further on different data groups, simply modify the group_by:
df.raw %>%
  group_by(category, name) %>%
  iso_summarize_data_table(area, d45, d46, cutoff = 3) %>%
  knitr::kable()
For a visual first look at the data, you can use the versatile iso_plot_data function (see ?iso_plot_data), which generates a ggplot:
library(ggplot2)
df.raw %>%
  iso_plot_data(
    x = run_number, y = d45, points = TRUE,
    # shape = 21 with fill makes data points with black borders
    shape = 21, fill = category,
    panel = category ~ .
  )
Or, a little more elaborately, specify in more detail how to color/fill and panel the overview plot:
df.raw %>%
  iso_plot_data(
    x = run_number, y = d45, points = TRUE,
    size = area, shape = 21,
    fill = c(type = ifelse(category %in% c("IAEA-NO3", "USGS-34"), name, category)),
    panel = factor(category, levels = c("N2O", "IAEA-NO3", "USGS-34")) ~ .
  )
Or as an interactive plot (mouse-over information and zooming), which is a little easier for data exploration (ggplotly() makes the last plot interactive by default):
library(plotly)
ggplotly(dynamicTicks = TRUE)
From the first look it is clear that there are a couple of things we need to consider. There is one sample that was marked as questionable during injection (#68), which we'd like to exclude for now. There were also a couple of samples that were controls rather than standards and should go into their own category. Lastly, it appears there is some drift, so we will want to evaluate that.
df.cat <- df.raw %>%
  change_category(run_number == 68, "excluded") %>%
  change_category(name %in% c("IAEA-NO3 37 uM ctrl", "USGS-34 37 uM ctrl"), "control")
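The effect of this recategorization can be illustrated with a small base R sketch on a synthetic table. The rows below are made up for illustration (only run number 68 and the two control names come from the text), and the plain assignments only mimic the outcome; they are not how change_category is implemented.

```r
# synthetic stand-in for the peak table (hypothetical rows)
df <- data.frame(
  run_number = c(67, 68, 69, 70),
  name = c("USGS-34 20uM", "P02E sample", "IAEA-NO3 37 uM ctrl", "IAEA-NO3 20uM"),
  category = c("USGS-34", "P02E", "IAEA-NO3", "IAEA-NO3"),
  stringsAsFactors = FALSE
)

# the questionable injection gets its own category
df$category[df$run_number == 68] <- "excluded"
# controls are separated from the standards they were prepared from
is_control <- df$name %in% c("IAEA-NO3 37 uM ctrl", "USGS-34 37 uM ctrl")
df$category[is_control] <- "control"

# category makeup after the changes
table(df$category)
```

Because later steps select standards by category, moving the controls out of the standard categories keeps them from influencing drift correction and calibration.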
The evaluate_drift function provides a number of different strategies for evaluating drift using different correction methods. Here we're trying a local polynomial fit (method = "loess") and are correcting with the standards as well as N2O. We also want to see a summary plot of the drift using plot = TRUE (the default), which plots the drift polynomials on top of the original data (normalized to average isotope values in each group) as well as the residuals after applying the correction. For details, see the ?evaluate_drift help. The drift correction stores the drift-corrected values in d45.drift and d46.drift.
df.drift <- df.cat %>%
  evaluate_drift(
    d45, d46, correct = TRUE, plot = TRUE,
    correct_with = category %in% c("USGS-34", "IAEA-NO3", "N2O"),
    method = "loess"
  )
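The general idea behind a loess drift correction can be sketched in base R on synthetic data. This is a minimal illustration under stated assumptions (two hypothetical standards std_A and std_B, a made-up linear drift, and made-up noise); it is not the implementation of evaluate_drift, whose details are documented in its help page.

```r
set.seed(42)

# synthetic run: 60 injections alternating between two hypothetical standards
run <- data.frame(
  run_number = 1:60,
  category = rep(c("std_A", "std_B"), times = 30),
  stringsAsFactors = FALSE
)
true_d45 <- c(std_A = -1.0, std_B = 3.0)   # made-up true values
drift <- 0.02 * run$run_number             # assumed instrument drift
run$d45 <- unname(true_d45[run$category]) + drift + rnorm(60, sd = 0.05)

# normalize each group to its own mean so all groups share one drift curve
run$d45_norm <- run$d45 - ave(run$d45, run$category)

# fit a local polynomial regression of the normalized values against run order
fit <- loess(d45_norm ~ run_number, data = run)

# subtract the fitted drift from the measured values
run$d45.drift <- run$d45 - predict(fit, newdata = run)

# within-group spread before and after the correction
sd_before <- tapply(run$d45, run$category, sd)
sd_after <- tapply(run$d45.drift, run$category, sd)
```

Comparing sd_before and sd_after for each group gives a quick numerical check of whether the correction actually tightened the standards.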
Let's take a quick look at how we're doing after drift correction:
df.drift %>%
  iso_plot_data(
    # plot the drift-corrected values stored by evaluate_drift
    x = run_number, y = d45.drift, points = TRUE,
    shape = 21, fill = category,
    panel = factor(category, levels = c("N2O", "IAEA-NO3", "USGS-34")) ~ .
  )
Now that we're drift corrected, time to switch to $\delta^{15}N$ and $\delta^{18}O$ space and calibrate against our standards.
We're doing the O17 correction here (instead of before the drift correction), but it is a matter of discussion whether drift correction or O17 correction should be applied first. The O17 correction introduces the new columns d15.raw and d18.raw.
df.O17 <- df.drift %>%
  correct_N2O_for_17O(d45.drift, d46.drift) %>%
  # no longer need these columns now that we're in d15 and d18 space
  select_columns(-d45, -d45.drift, -d46, -d46.drift)
The last steps are calculating the background (see ?calculate_background), calculating concentrations (see ?calculate_concentrations), and then calibrating $\delta^{15}N$ and $\delta^{18}O$ (see ?calibrate_d15 and ?calibrate_d18). Note that the background calculation is not currently used for calibration since only multi-point calibration is implemented, but it's a good check to see its value.
df.cal <- df.O17 %>%
  calculate_background(area) %>%
  calculate_concentrations(
    area, volume,
    conc_pattern = "(\\d+)uM",
    standards = category %in% c("USGS-34", "IAEA-NO3")
  ) %>%
  calibrate_d15(d15.raw, standards = c(`USGS-34` = -1.8, `IAEA-NO3` = 4.7)) %>%
  calibrate_d18(d18.raw, cell_volume = 1.5, standards = c(`USGS-34` = -27.93, `IAEA-NO3` = 25.61))
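As a rough illustration of what a multi-point calibration does, the base R sketch below regresses the accepted $\delta^{15}N$ values of the two standards on their measured values and applies the fitted line to all rows. The accepted values are the ones passed to calibrate_d15 above; the measured values and the sample row are made up, and the plain linear regression is an assumption for illustration, not necessarily what calibrate_d15 does internally.

```r
# accepted values of the standards, as passed to calibrate_d15 above
standards <- c(`USGS-34` = -1.8, `IAEA-NO3` = 4.7)

# made-up measurements: two injections per standard plus one sample
df <- data.frame(
  category = c("USGS-34", "USGS-34", "IAEA-NO3", "IAEA-NO3", "sample"),
  d15.raw = c(-3.1, -2.9, 2.9, 3.1, 0.5),
  stringsAsFactors = FALSE
)

# attach the accepted value to every standard row
is_std <- df$category %in% names(standards)
df$d15.true <- NA_real_
df$d15.true[is_std] <- unname(standards[df$category[is_std]])

# regress accepted on measured values using the standards only
fit <- lm(d15.true ~ d15.raw, data = df, subset = is_std)

# apply the calibration line to standards and samples alike
df$d15.cal <- predict(fit, newdata = df)
df
```

After calibration, the mean of each standard should sit close to its accepted value, which is a useful sanity check on real data as well.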
At the end of the data processing, there are a couple of ways to summarize the data, including iso_summarize_data_table() introduced earlier (here used to compare the raw vs. calibrated values with different groupings), but also generate_parameter_table, which summarizes all the parameters recorded from the data processing calls:
df.cal %>%
  group_by(category) %>%
  iso_summarize_data_table(cutoff = 3, d15.raw, d15.cal, d18.raw, d18.cal) %>%
  arrange(desc(n)) %>%
  knitr::kable()

df.cal %>%
  generate_parameter_table() %>%
  knitr::kable()
And of course visually, e.g. in an interactive plot with additional mouse-over information using the label parameter and the iso_format function:
df.cal %>%
  iso_plot_data(
    x = run_number, y = d15.cal, points = TRUE,
    size = amount, shape = 21,
    fill = c(type = ifelse(category %in% c("IAEA-NO3", "USGS-34"), name, category)),
    panel = factor(category, levels = c("N2O", "IAEA-NO3", "USGS-34")) ~ .,
    label = c(info = iso_format(
      NULL = name,
      d15 = round(d15.cal, 2),
      d18 = round(d18.cal, 2),
      amount = round(amount, 3)
    ))
  ) %>%
  ggplotly(dynamicTicks = TRUE, tooltip = c("fill", "label"))
And some simpler single data plots (using filter from the dplyr package) to look specifically at the samples and the DPR control.
df.cal %>%
  filter(category %in% c("P02E", "DPR")) %>%
  iso_plot_data(
    x = run_number, y = c(d15.cal, d18.cal), points = TRUE,
    shape = 21, fill = c(info = paste(category, panel))
  ) %>%
  ggplotly(dynamicTicks = TRUE)
Additional customization is also possible using ggplot functionality. For example, you can use the shape parameter for symbol differentiation and, to visualize all the standards' key values in separate panels, make use of facet_wrap:
plot <- df.cal %>%
  filter(category %in% c("IAEA-NO3", "USGS-34")) %>%
  iso_plot_data(
    x = run_number, y = c(amount, d15.cal, d18.cal), points = TRUE,
    # use multiple shapes, define with scale_shape_manual
    shape = name, fill = name,
    label = c(info = iso_format(
      `#` = run_number,
      d15 = round(d15.cal, 2),
      d18 = round(d18.cal, 2)
    ))
  ) +
  facet_wrap(panel ~ category, scales = "free", ncol = 2) +
  scale_shape_manual(values = 21:25)

ggplotly(p = plot, dynamicTicks = TRUE, tooltip = c("fill", "label"))
At any point during the process, if you would like to export data to Excel, this is easy with iso_export_data_to_excel (here using a couple of filter options in select to skip parameters and raw values, and using arrange to sort the data):
df.cal %>%
  select(-starts_with("p."), -ends_with(".raw"), -ends_with(".drift"), d15 = d15.cal, d18 = d18.cal) %>%
  arrange(category, name) %>%
  iso_export_data_to_excel(filepath = "export.xlsx")