In Smith-Group/fitnmr: FitNMR multidimensional spectrum analysis

knitr::opts_knit$set(
  global.par = TRUE
)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = FALSE,
  fig.retina = 2
)
#knitr::opts_chunk$set(optipng = "-o7 -strip all")
#knitr::knit_hooks$set(optipng = knitr::hook_optipng)
options(digits=5, width=90)

FitNMR enables fully automated peak fitting of 2D NMR spectra. This document shows how the 2D fitting can be done with very little manual code entry using three convenience scripts, fit_peaks_2d.R, refit_peaks_2d.R, and assign_peaks_2d.R, that are included with FitNMR. For a description of how a similar workflow can be accomplished using R code and more details about algorithms, see the "Automated 2D Peak Fitting Code" document. The directory containing the three demo scripts that will be used here can be found by running the following from within R:

system.file("demo", package="fitnmr")

For each spectrum, first use NMRPipe to produce an *.ft2 file. The data must only be apodized with the SP function using an exponent of 1 or 2. In addition, use the EXT function to extract the smallest region that contains all the relevant peaks to be fit. Eliminating noise and other extraneous peaks speeds up the processing and simplifies the output.

pdf <- function(file, width=7, height=7, pointsize=12) {

    if (file == "fit_iterations.pdf") {
        file <- "fit_iterations_%02i.pdf"
        width <- 3.4
        height <- 3.4
        pointsize <- 10
    }
    file <- sub(".pdf$", ".png", file)
    width <- min(width, 7)
    height <- min(height, 7)

    grDevices::png(file, width, height, "in", pointsize, res=192)
}

update_cex <- function(file, cex) {

    script_lines <- readLines(file)
    script_lines <- sub("^cex <- 0.2", paste("cex <-", cex), script_lines)
    writeLines(script_lines, file)
}

Fitting Initial Set of Peaks

The fit_peaks_2d.R script handles the initial fitting of peaks. You would typically use this script to find an initial set of peaks on the highest signal-to-noise spectrum in your dataset. The script can be customized by creating a copy then editing the parameters listed at the top prior to running it in R.

In a typical workflow, you would create a directory to perform the initial fitting in, then copy the reference spectrum into that directory. That spectrum must have .ft2 as the extension, otherwise the script won't be able to find it. Ordinarily you would do this manually, but here we will do so in code. The following creates a fit directory and copies the first T~1~ spectrum into it.

t1_dir <- system.file("extdata", "t1", package="fitnmr")
t1_ft2_filenames <- list.files(t1_dir, pattern=".ft2")

dir.create("fit", showWarnings=FALSE)
file.copy(file.path(t1_dir, t1_ft2_filenames[1]), "fit")

Next we need to make a copy of the fit_peaks_2d.R script, which may be easiest to do within R. (Note: if you want to copy the script into the current working directory, change "fit" to "." in the line below.)

file.copy(system.file("demo", "fit_peaks_2d.R", package="fitnmr"), "fit")

update_cex(file.path("fit", "fit_peaks_2d.R"), 0.6)

You can then customize any of the options at the top of fit_peaks_2d.R:

fit_peaks_2d_lines <- readLines(file.path("fit", "fit_peaks_2d.R"))
library_idx <- grep("^library", fit_peaks_2d_lines)
cat(fit_peaks_2d_lines[seq(1, library_idx-3)], sep="\n")

In this case, we will use the script unmodified. The fit_peaks_2d.R script must be run from within the fit directory and the code to do this is shown below. (Note: usually you should have already set the working directory to be fit. If so, omit the setwd() calls below.) As the script runs, a summary of the fitting progress will be automatically printed to the screen.

setwd("fit")
source("fit_peaks_2d.R")
setwd("..")

Output Files

Four output files are created by fit_peaks_2d.R:

noise_histograms.pdf
fit_iterations.pdf
fit_spectra.pdf
fit_volume.csv

list.files("fit")

noise_histograms.pdf gives a histogram of the intensity values for each of the input spectra. The standard deviation of a Gaussian fit to that histogram is used to determine the noise level.

knitr::include_graphics(file.path("fit", "noise_histograms.png"))

fit_iterations.pdf shows each iteration of the fitting, with the data shown in black and the fit contours shown in red. Negative contours are shown with lighter colors. The starting peak position is indicated with a green circle at the highest intensity in the spectrum after previous fits have been subtracted. The blue points show the positions of the peaks in the modeled doublet.

The title text in parentheses gives the reason the next peak was rejected: either the p-value for the peak was greater than f_alpha or the peak had 0 volume. The first line of text under each peak gives the peak number and fraction of the total volume in that iteration. The second line gives the F-test p-value for that peak.

All peaks in a given iteration are fit with the same peak shape parameters, including the peak position (omega0) and doublet scalar coupling (sc). Those peaks are all assigned to the same fitting group (fit) and continue to share the same peak shape parameters in subsequent fit optimizations. This parameter sharing can be important for peaks that are highly overlapped both for accuracy and stability of the fitting algorithm. If desired, you can manually modify the peak grouping in the peak list comma separated value (CSV) file to change which peaks share parameters.

The first four pages of fit_iterations.pdf are shown here:

knitr::include_graphics(file.path("fit", sprintf("fit_iterations_%02i.png", 1:4)))

fit_spectra.pdf gives the fit spectra. The doublet peak positions are shown with semi-transparent blue dots that are scaled in size such that the area is proportional to the peak volume. The first line of text under each peak gives the peak number and the fit group number separated by a colon. The second line gives the F-test p-value for that peak. By default, the lowest contour of this spectrum is set to four times the noise level, but that can be changed using the plot_noise_cutoff script parameter in 'fit_peaks_2d.R'. Furthermore, the size of the labels can be controlled using the cex (i.e. character expansion) script parameter.

knitr::include_graphics(file.path("fit", "fit_spectra.png"))

The fit_volume.csv file contains all the identified peaks, the fitting group for each peak (which in this case is the iteration it was found in), the peak position/shape parameters, and a column for the volume found in each spectrum. Only one spectrum has been fit here so only one column of volumes, '1.ft2', is shown.

csv_table <- read.csv(file.path("fit", "fit_volume.csv"), check.names = FALSE)
csv_table[,"f_pvalue"] <- sprintf("%0.2e", csv_table[,"f_pvalue"])
knitr::kable(csv_table)

Refining the Initial Fit

The initial fit is done iteratively so it is usually a good idea to refine this by editing the fitting groups, deleting extraneous peaks, and then doing a simultaneous refit of all peaks together with the refit_peaks_2d.R script. We will do this in another directory called refine, to which the first spectrum and the refit_peaks_2d.R script should be copied:

dir.create("refine", showWarnings=FALSE)
file.copy(file.path(t1_dir, t1_ft2_filenames[1]), "refine")
file.copy(system.file("demo", "refit_peaks_2d.R", package="fitnmr"), "refine", overwrite=TRUE)

update_cex(file.path("refine", "refit_peaks_2d.R"), 0.6)

Any of the options at the top of refit_peaks_2d.R can then be customized:

refit_peaks_2d_lines <- readLines(file.path("refine", "refit_peaks_2d.R"))
library_idx <- grep("^library", refit_peaks_2d_lines)
cat(refit_peaks_2d_lines[seq(1, library_idx-3)], sep="\n")

However, in this case the script will be used unmodified. The next step is to copy the peak list from the fit directory to a file called start_volume.csv in the new refine directory:

file.copy(file.path("fit", "fit_volume.csv"), file.path("refine", "start_volume.csv"), overwrite=TRUE)

Once that file has been copied, you can then manually change the peak groups and delete rows for any peaks you want to discard. In this case, that will be done with the following R code, which puts peaks 5 and 7 into a new fitting group, and removes peaks 4, 14, and 16:

input_table <- read.csv(file.path("refine", "start_volume.csv"), check.names=FALSE)
input_table[c(5,7),"fit"] <- max(input_table[,"fit"])+1
input_table <- input_table[-c(4,14,16),]
write.csv(input_table, file.path("refine", "start_volume.csv"), row.names=FALSE)

The final step is to call the refit_peaks_2d.R script from within the refine directory:

setwd("refine")
source("refit_peaks_2d.R")
setwd("..")

The refit_peaks_2d.R script first optimizes the volumes, keeping all other parameters fixed, then optimizes the volumes along with whatever other variables the user specifies in the script. Also, unlike the refinement done in "Automated 2D Peak Fitting Code", refit_peaks_2d.R only fits a single spectrum at a time. This avoids two problems:

For many experiments in which a series of spectra are acquired, different amounts of applied power can result in subtle shifts of the temperature/peak positions. Especially for overlappling peaks, this can cause systematic variations in the volumes. Allowing each spectrum to have slightly different peak positions within a small region around the starting values helps avoid this problem.
When many spectra are fit simultaneously, the memory usage for FitNMR can grow very large. This will hopefully be fixed in a future update.

Output Files

The refit_peaks_2d.R script creates two files for each spectrum:

*_fit.pdf
*_volume.csv

list.files("refine")

In practice, each '*' will be replaced by the spectrum filename.

*_fit.pdf shows the overall spectrum as described above. In addition, it also shows the constraints on the omega0 parameters using gray rectangles, the size of which is determined by the omega0_r2_factor parameter. The central peak position, shown as a small blue dot, is constrained to be within that gray rectangle.

knitr::include_graphics(file.path("refine", "1_fit.png"))

*_volume.csv contains all the identified peaks, along with a column for the volume found in that spectrum.

csv_table <- read.csv(file.path("refine", "1_volume.csv"), check.names = FALSE)
knitr::kable(csv_table)

Extending the Fit to Other Spectra

To extend the refined fit to a series of other spectra, we will create another directory called extend. We will then copy both T~1~ spectra into it. In addition, the fit parameter CSV file output from the refinement step will be copied to the file start_volume.csv for use as input.

dir.create("extend", FALSE)
file.copy(file.path(t1_dir, t1_ft2_filenames), "extend")
file.copy(file.path("refine", "1_volume.csv"), file.path("extend", "start_volume.csv"), overwrite=TRUE)

We will also need to copy the refit_peaks_2d.R script into the extend directory:

file.copy(system.file("demo", "refit_peaks_2d.R", package="fitnmr"), "extend", overwrite=TRUE)

update_cex(file.path("extend", "refit_peaks_2d.R"), 0.6)
script_lines <- readLines(file.path("extend", "refit_peaks_2d.R"))
script_lines <- sub("mc_cores <- parallel::detectCores\\(\\)", "mc_cores <- 1", script_lines)
writeLines(script_lines, file.path("extend", "refit_peaks_2d.R"))

However, when extending the fit to other spectra, the parameters determining the peak shape (sc and r2) will be fixed, allowing only the volume and peak position to vary. This is done by changing fit_sc and fit_r2 to FALSE at the top of the refit_peaks_2d.R script, which can be done using a text editor. For the purposes of this demonstration, the two parameters are changed with the following code:

script_lines <- readLines(file.path("extend", "refit_peaks_2d.R"))
script_lines <- sub("fit_sc <- TRUE", "fit_sc <- FALSE", script_lines)
script_lines <- sub("fit_r2 <- TRUE", "fit_r2 <- FALSE", script_lines)
writeLines(script_lines, file.path("extend", "refit_peaks_2d.R"))

The part of the script controlling the parameters to be fit should now read:

extend_peaks_2d_lines <- readLines(file.path("extend", "refit_peaks_2d.R"))
fit_omega0_idx <- grep("^fit_omega0", extend_peaks_2d_lines)
cat(extend_peaks_2d_lines[seq(fit_omega0_idx-1, fit_omega0_idx+7)], sep="\n")

Next, run refit_peaks_2d.R from within the extend directory. It will do the refitting for each spectrum on a different CPU core, reducing the total time taken to fit a large number of spectra.

setwd("extend")
source("refit_peaks_2d.R")
setwd("..")

Output Files

As in the refinement step, the script creates two files for each spectrum, whose names are derived from the spectrum filenames. Here the output associated with 2.ft2 is shown, as the output for 1.ft2 is very similar to the previous section.

list.files("extend")

*_fit.pdf shows the overall spectrum. Pay particular attention to where the small blue circle is located in the gray rectangle. You may want to adjust omega0_r2_factor to increase or decrease the stringency of the omega0 constraint, or possibly choose a different spectrum to use for start_volume.csv.

knitr::include_graphics(file.path("extend", "2_fit.png"))

*_volume.csv contains all the identified peaks, along with a column for the volume found in that spectrum.

csv_table <- read.csv(file.path("extend", "2_volume.csv"), check.names = FALSE)
knitr::kable(csv_table)

Transferring Assignments

FitNMR can also transfer previously determined peak assignments onto the peak lists. This will be done in the extend directory used above. For this tutorial, assignments for the subregion of the spectrum are also provided, which should be copied into that directory and given the name assignments.csv using:

file.copy(system.file("extdata", "t1", "assignments.csv", package="fitnmr"), "extend", overwrite=TRUE)

The assignments.csv file should contain a header and three columns. The text in the header is ignored, but the order of the columns must be:

Peak identifier
PPM in dimension 1
PPM in dimension 2

The assignments.csv file used here contains eight peaks:

csv_table <- read.csv(file.path("extend", "assignments.csv"), check.names = FALSE)
knitr::kable(csv_table)

The script that transfers the assignments is called assign_peaks_2d.R and should also be copied into the extend directory:

file.copy(system.file("demo", "assign_peaks_2d.R", package="fitnmr"), "extend", overwrite=TRUE)

There are a number of parameters at the top of that script that can be customized:

assign_peaks_2d_lines <- readLines(file.path("extend", "assign_peaks_2d.R"))
library_idx <- grep("^library", assign_peaks_2d_lines)
cat(assign_peaks_2d_lines[seq(1, library_idx-3)], sep="\n")

update_cex(file.path("extend", "assign_peaks_2d.R"), 0.6)

The assignment transfer uses a FitNMR function called height_assign() that goes through each peak the spectrum in order of decreasing height/volume, and assigns it to the closest position in the assignment list that is within some threshold distance. Internally, the chemical shifts in each dimension are normalized by the range of values in that dimension. The threshold is given in those normalized units, such that a value of 0.1 would allow a difference of no more than 10% of the spectral width in each dimension, or less for diagonally adjacent peaks. The algorithm does not assign multiple peaks to the same identifier.

To help automatically handle differences in referencing between the spectra and input assignments, the algorithm uses height_assign() in two different stages. The first stage uses a larger threshold to handle cases where the referencing is quite different. After that initial assignment, the median offsets between the spectra and input assignments (in each dimension) are used to shift the input assignments. As long as there are enough isolated peaks that can be correctly assigned in the first stage, this is often sufficient to automatically correct the referencing for a more stringent second stage. The thresholds for both stages (thresh_1 and thresh_2) can be tailored individually. In addition, the assigned chemical shifts can be manually adjusted for each stage (using cs_adj_2 and cs_adj_2) to help in cases where the referencing cannot be automatically corrected.

Because this is a subspectrum, the default values for the thresholds do not work as well. Furthermore, while the whole spectrum has enough peaks to automatically determine the correct offset after stage 1, the subspectrum does not, so the proton assignments are manually shifted 0.02 ppm to the left to compensate. For this tutorial, that is done with the following code:

script_lines <- readLines(file.path("extend", "assign_peaks_2d.R"))
script_lines <- sub("thresh_1 <- 0.1", "thresh_1 <- 0.2", script_lines)
script_lines <- sub("cs_adj_2 <- c\\(0, 0\\)", "cs_adj_2 <- c(-0.02, 0)", script_lines)
script_lines <- sub("thresh_2 <- 0.025", "thresh_2 <- 0.075", script_lines)
writeLines(script_lines, file.path("extend", "assign_peaks_2d.R"))

The modified part of the script should now read:

extend_peaks_2d_lines <- readLines(file.path("extend", "refit_peaks_2d.R"))
thresh_1_idx <- grep("^thresh_1", script_lines)
cat(script_lines[seq(thresh_1_idx-1, thresh_1_idx+6)], sep="\n")

The assign_peaks_2d.R script should then be run from within the extend directory:

setwd("extend")
source("assign_peaks_2d.R")
setwd("..")

Output Files

Three output files are generated by the 'assign_peaks_2d.R' script:

assign_stages.pdf
assign_omegas.pdf
assign_volume.csv

list.files("extend")

assign_stages.pdf shows the two stages used for the assignment transfer. The gray points show the chemical shifts of the input assignments after cs_adj_1 has been applied. The peaks matched to those assignments in stage 1 are connected by gray lines. A larger gray oval around each input assignment shows the size of thresh_1. The adjusted input assignment chemical shifts used for stage 2 are shown in green, with smaller ovals indicating a more stringent thresh_2. Any input assignment not matched to a peak has its label drawn in purple. Likewise, any peak not matched to an input assignment is labeled in purple.

knitr::include_graphics(file.path("extend", "assign_stages.png"))

assign_omegas.pdf shows the final results of the assignment transfer, with the identifier of an assigned peak shown above the peak, and unassigned peaks labeled in purple. In addition, the peak positions from all of the fit spectra are shown as very small points, which are difficult to see here and are best viewed by zooming into the PDF generated by this step. An attempt is made to color each peak in a given group differently.

knitr::include_graphics(file.path("extend", "assign_omegas.png"))

assign_volume.csv contains all the peaks, with the assigned identifier in the first column. The peak positions, scalar couplings, and relaxation rates in this table give the median values from all the individual fits. There is also a column for the volume found in each spectrum.

csv_table <- read.csv(file.path("extend", "assign_volume.csv"), check.names = FALSE)
csv_table[,"assignment"] <- as.character(csv_table[,"assignment"])
csv_table[is.na(csv_table[,"assignment"]),"assignment"] <- ""
knitr::kable(csv_table)