The purpose of this document is to demonstrate a simple workflow for analyzing FTICR MS data from two dissolved organic matter (DOM) samples.
# the global options don't seem to apply to all code chunks: https://community.rstudio.com/t/how-to-eliminate-library-warnings-and-messages-using-knitr/42071/4 knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE) options(digits = 8)
library(MSanalyzeNOM) library(tidyverse)
The get_sample_data()
function reads the Excel workbooks exported by PetroOrg, and computes various useful values like the nominal (i.e., average) carbon oxidation state (NOSC). It returns a list for each sample containing the sample name, assigned formulas, "no hits" and other useful information.
# if no filepath is specified, opens file explorer DOM_sample_1 <- get_sample_data() DOM_sample_1$sample_name <- "DOM 1"
The View()
command can be used to display any "spreadsheet" contained in the list, and the resulting table is sortable.
# use View command to display list components View(DOM_sample_1$assigned_formulas)
DOM_sample_4 <- get_sample_data() DOM_sample_4$sample_name <- "DOM 4"
The following section combines both samples into one "spreadsheet" for use in exploring or plotting them together. Because it is not contained in a list, it can be displayed as a sortable table with a click.
combined_DOM_formulas <- bind_rows("DOM 1" = DOM_sample_1$assigned_formulas, "DOM 4" = DOM_sample_4$assigned_formulas, .id = "sample_name") %>% reorder_class_hetero() # ensures consistent order for bar plots containing multiple samples
The plot_mass_spectrum()
command plots a mass spectrum with centroided values for all detected m/z, including assigned formulas and "no hits", based on their relative abundances (max = 100).
Type "?plot_mass_spectrum" for more details (e.g., how to specify an mz range).
plot_mass_spectrum(DOM_sample_1$all_detected_ions, plot_title = DOM_sample_1$sample_name)
plot_mass_spectrum(DOM_sample_4$all_detected_ions, plot_title = DOM_sample_4$sample_name)
The three really abundant peaks can be examined with a sortable table, as described above, or displayed with a few simple tidy
commands. Note that the top 3 formulas are SO3 containing formulas from the same CH2 homologous series (with identical Kendrick mass defects and z values).
#compare to DOM_sample_1 DOM_sample_4$assigned_formulas %>% select(class_hetero, chem_formula, mz, rel_abund, HtoC, ModAI, KMD_CH2, z_CH2) %>% arrange(desc(rel_abund)) %>% head()
The following is an image of C18H28SO3, a common linear alkylbenzene sulfonate (LAS) surfactant found in detergents. They are not extensively branched, but dicarboxylic acids often form during biodegradation, suggesting at least one branch.
Is it aromatic or aliphatic? Technically, both, but there are twice as many aliphatic C (12) as aromatic C (6). As a result, it's H/C ratio is > 1.5.
Using a van Krevelen diagram, which compares the H/C and O/C ratios of all assigned formulas, it's easy to see where the LAS plot appear relative to other assigned formulas.
plot_VK_25perc_groups(DOM_sample_4$assigned_formulas, plot_title = DOM_sample_4$sample_name)
The diagonal line in the van Krevelen diagram is derived from the modified aromaticity index, which essentially determines the density of double bonds in each assigned formula, and estimates the degree of aromaticity after assuming that 50% of the O atoms will form double bonds (e.g. carbonyl and carboxyl groups). According to NMR, these groups are common in DOM.
DOM_sample_4$assigned_formulas %>% dplyr::filter(ModAI == 0.5) %>% plot_VK_flat(plot_title = DOM_sample_4$sample_name)
The same plot command can be used to separate the assigned formulas by elemental class using the panel
and var_panel
arguments.
plot_VK_25perc_groups(DOM_sample_1$assigned_formulas, plot_title = DOM_sample_1$sample_name, panel = TRUE, var_panel = "class_element")
In addition, the same plot command can be used with the combined "spreadsheet" to plot both samples side-by-side. Note that the LAS compounds are also prominent in the DOM 1 sample, even though they didn't clearly stand out.
plot_VK_25perc_groups(combined_DOM_formulas, plot_title = "", panel = TRUE, var_panel = "sample_name")
A simple command will remove common contaminants (e.g., LAS) from the analysis. The result can be saved as a new spreadsheet in the sample list to preserve the original data.
# review common contaminants DOM_sample_4$new_assigned_formulas <- DOM_sample_4$assigned_formulas %>% remove_common_contaminants()
Now, the LAS are gone from each sample, and all relative and percent abundances have been recomputed.
DOM_sample_4$new_assigned_formulas %>% dplyr::select(class_hetero, chem_formula, mz, ppm_error, rel_abund, KMD_CH2, z_CH2) %>% dplyr::arrange(dplyr::desc(rel_abund)) %>% head()
After the common contaminants are removed from both samples, they can be replotted side-by-side.
DOM_sample_1$new_assigned_formulas <- DOM_sample_1$assigned_formulas %>% remove_common_contaminants() combined_DOM_formulas_new <- bind_rows("DOM 1" = DOM_sample_1$new_assigned_formulas, "DOM 4" = DOM_sample_4$new_assigned_formulas, .id = "sample_name") %>% reorder_class_hetero()
plot_VK_25perc_groups(combined_DOM_formulas_new, plot_title = "", panel = TRUE, var_panel = "sample_name")
Similar plotting commands are also available for Kroll diagrams, which plot NOSC as a function of C number. NOSC values are highly correlated with the O/C ratios in van Krevelen diagrams, but Kroll diagrams also offer a size dimension in the form of the C number.
plot_Kroll_25perc_groups(combined_DOM_formulas_new, plot_title = "", panel = TRUE, var_panel = "sample_name")
A few simple tidy
commands can be used to display summary details about the different elemental classes. The CHNO class is definitely more abundant in sample 4.
DOM_sample_1$new_assigned_formulas %>% group_by(class_element) %>% summarize(`num formulas` = n(), `% abund` = sum(perc_abund))
DOM_sample_4$new_assigned_formulas %>% group_by(class_element) %>% summarize(`num formulas` = n(), `% abund` = sum(perc_abund))
The CHNO classes in the respective samples can be visualized by separating the elemental classes when plotting each sample. The increased abundances in the DOM 4 sample are clearly evident in the van Krevelen diagram.
plot_VK_25perc_groups(DOM_sample_1$new_assigned_formulas, plot_title = DOM_sample_1$sample_name, panel = TRUE, var_panel = "class_element")
plot_VK_25perc_groups(DOM_sample_4$new_assigned_formulas, plot_title = DOM_sample_1$sample_name, panel = TRUE, var_panel = "class_element")
Using a few more tidy
commands, it is also possible plot the CHNO components of the two samples side-by-side, while recomputing the CHNO formulas' relative abundances to increase resolution. Here, it is clear that the CHNO compounds in the "Mostly Aromatic" region have expanded.
combined_DOM_formulas_new %>% filter(N > 0, S == 0) %>% compute_perc_abund() %>% # recomputes perc_abund for filtered data only get_25perc_groups() %>% # creates new 25perc_groups for filtered data only plot_VK_25perc_groups(plot_title = "CHNO", panel = TRUE, var_panel = "sample_name")
The same commands can be used with the Kroll plotting functions. Here, it is also clear that the lower molecular weight (lower C number) CHNO formulas are increasing.
combined_DOM_formulas_new %>% filter(N > 0, S == 0) %>% compute_perc_abund() %>% # recomputes perc_abund for filtered data only get_25perc_groups() %>% # creates new 25perc_groups for filtered data only plot_Kroll_25perc_groups(plot_title = "CHNO", panel = TRUE, var_panel = "sample_name")
Bar charts can also be used to examine shifts in CHNO content. There is a plot command for single samples. Because there are so many heteroatom classes, it is beneficial to focus on a subset of the classes, which can be accomplished with a few commands.
DOM_sample_1$new_assigned_formulas %>% filter(N >= 0, N < 4, S == 0) %>% compute_perc_abund() %>% # recomputes perc_abund for filtered data only group_by(class_hetero) %>% summarize(perc_abund = sum(perc_abund)) %>% plot_hetero_class_single()
In addition, there is a separate command for comparing multiple samples side-by_side. Comparing the CHNO formulas from DOM 1 and DOM 4, it becomes apparent that there is a shift to smaller O numbers and larger N numbers, together with the shift to smaller C numbers.
# easier to see by exporting and resizing the image combined_DOM_formulas_new %>% filter(N > 0, N < 4, S == 0) %>% group_by(sample_name) %>% compute_perc_abund() %>% # recomputes perc_abund for filtered data by sample group_by(sample_name, class_hetero) %>% summarize(perc_abund = sum(perc_abund)) %>% # summary data by hetero_class and sample plot_hetero_class_multiple()
Using similar tidy
commands, it is also possible to see the percentage change in N > 1 formulas from DOM 1 to DOM 4.
combined_DOM_formulas_new %>% filter(N > 0, N < 4, S == 0) %>% group_by(sample_name) %>% compute_perc_abund() %>% # recomputes perc_abund for filtered data by sample group_by(sample_name, N) %>% summarize(perc_abund = sum(perc_abund)) %>% # summary data by N number and sample arrange(N)
When it's desirable to contrast two samples from different experimental treatments or environmental conditions, a differential analysis can be performed to determine which formulas have increased in abundance, and which formulas have decreased in abundance, from one sample to the other.
The compute_2sample_peak_differences()
function performs this analysis by subtracting the "sample 1" from "sample 2", such that positive values reflect greater abundances in sample 2, and negative values reflect greater abundances in sample 1. The function takes a dataframe containing the combined samples, with their names specified in a sample_name
column. If the combined dataframe contains more than two samples, integer values can be used to specify the locations of the respective samples in a list of the sample names. The function also prints a message to confirm which samples are being compared.
Finally, the compute_2sample_peak_differences()
function returns a new dataframe containing the joined formulas so that the differences can be visualized or examined in greater detail.
# the first sample is subtracted from the second # values > 0 are greater rel abund in sample 2 # values < 0 are greater rel abund in sample 1 comparison_DOM_formulas <- compute_2sample_peak_differences(combined_DOM_formulas_new)
After the two-sample comparison has been perfomred, differential van Krevelen and Kroll diagrams can be created.
plot_VK_differential(comparison_DOM_formulas, plot_title = "DOM 4 minus DOM 1")
Using a few more tidy
commands, it is also possible plot only the CHNO components of the two samples. Here, it is clear that the CHNO compounds in the "Mostly Aromatic" region have increased in abundance.
combined_DOM_formulas_new %>% filter(N > 0, N < 4, S == 0) %>% group_by(sample_name) %>% compute_100norm_rel_abund() %>% # recomputes rel_abund for filtered data by sample compute_2sample_peak_differences %>% plot_VK_differential(plot_title = "CHNO - DOM 4 minus DOM 1")
The same approach can be used to plot a differential Kroll diagram, where it is clear that smaller, more oxidized CHNO formulas (e.g. more aromatic) are more abundant.
combined_DOM_formulas_new %>% filter(N > 0, N < 4, S == 0) %>% group_by(sample_name) %>% compute_100norm_rel_abund() %>% # recomputes rel_abund for filtered data by sample compute_2sample_peak_differences %>% plot_Kroll_differential(plot_title = "CHNO - DOM 4 minus DOM 1")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.