In benjbuch/summerrmass: Measure and Analyze MALDI Peaks

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)

options(summerr.debug = FALSE)

Introduction

This vignette will outline a specfic use-case of the the summerrmass package:

We extract the peak intensity in MALDI spectra at three selected ions (given by specific m/z ratios) from an array of MALDI measurements.
Since some of these intensities decline in the presence of an inhibitor, which was added in various concentrations, we want to calculate the IC50 in this assay.

The challenges are the following:

The MALDI spectra reside within folders that only identify the "wells", i.e., specific positions on the plate that was placed into the MALDI instrument for measuring. Thus, the paths to each spectrum are "./some_measurement/0_E7/1/1Ref/Analysis.mzXML" or ".../0_E8/..." etc. We don't want to import all the spectra by hand, nor to rename the file paths.

A snapshot of the directory in which the collected data resides. {width=50%}

We have a nice Excel sheet that specifies which compound and concentration was used in each well. We just need to merge this information somehow.

A sample plate layout in an Excel sheet. The top row indicates the concentration of compound that was added. (This vignette illustrates data for "Compound 10" only.) {width=50%}

Let's start:

library(summerr)
library(summerrmass)

Extracting selected peak intensities from MALDI spectra {#sec-extract}

Here, we will use the pre-built pipeline that ships with the package. If you want to learn how to built up a smaller analysis step-by-step, check out the examples in the documentation (or the vignette on the package's basic functions).

To adapt the pipeline to your needs, some arguments need probably to change. maldi_batch is written in such a way that you can not only modify the individual parameters, but even exchange most of the functions that are used for in each step.[^1]

[^1]: Provided that the modified functions return objects of similar type and structure to the original ones.

Here is how the full call to maldi_batch looks like. I have included most of the defaults so that you get an idea of what can be changed. (*) marks arguments which you are most likely to adapt to your own needs.

data_maldi <- maldi_batch(
  path  = NULL,  # will prompt for the direcotry to analyze
  layout_file = NULL,  # will prompt for each sample group
  pivot = "[0-9]_[A-Z]+[0-9]+",  # regex to match the "well" folders (*)
  # pre-processing of spectra
  FUN_spect = maldi_average_by_well,
  MoreArgs_spect = list(pivot = "[0-9]_[A-Z]+[0-9]+", # (*)
                        final_trim_range = c(2420, 2445),  # (*)
                        method_baseline = "SNIP",
                        method_average  = "mean"), 
  # detecting peaks
  FUN_peaks = maldi_find_peaks_by_well,
  MoreArgs_peaks = list(pivot = "[0-9]_[A-Z]+[0-9]+",  # (*)
                        mass_list = c(mC = 2425, hmC = 2441, fC = 2439),  # (*)
                        tolerance_assignment = 0.5,
                        SNR = 3,
                        method = "MAD"),
  # drawing spectra
  MoreArgs_draw = list(ncol = 2, nrow = 6),
  # associating metadata
  FUN_import_layout = import_layout_from_excel,
  MoreArgs_layout = list(sheet = 1,
                         index_row = "2",
                         index_col = "A",
                         data_upper_left = "B3",
                         plate_nrow = 16,
                         plate_ncol = 24,
                         meta_row = list(
                           # metadata stored in rows
                           concentration = "1"),
                         meta_col = list(
                           # metadata stored in columns
                         ))
)

The most important argument the pipeline is the pivot, which is a regular expression to identify a folder in a path that (1) groups the measurements from the same sample, i.e., the same well, and (2) that is used to associate metadata, e.g., treatment conditions, with the sample. You may want to check out ?import_layout_from_paths for the technical details.

Importing the spectra

Spectra are always imported using maldi_import_spectra. This cannot be modified in the pre-built pipeline. There might neither be a need to do so as it simply reads-in the files.

Pre-processing of spectra

We use a pre-built pre-processing pipeline implemented in maldi_average_by_well by default. This mini-pipeline involves:

Removing the baseline from the spectra
Aligning the spectra of ech sample group
Trimming them to the final mass range needed for the analysis[^2]

[^2]: Although it is possible to work with the full spectra, trimming to an appropriate range will considerably speed up the subsequent steps.

As mentioned in the beginning, you may consider to write your own function for pre-processing if you need more control than what is provided by changing some arguments.

Detecting peaks and drawing intermediate spectra

The most valuable piece of information that must be provided is the (named) vector of m/z ions to measure. In this example, we would like to quantify the intensity of three ions: c(mC = 2425, hmC = 2441, fC = 2439).

By default, the workflow will try to identify the correct peaks in a semi-automatic manner. However, there are different options to control this behavior (not shown in the example above); see ?maldi_batch for details.

In the default process, you may be prompted to click on the peaks where there is ambiguity:

A mass spectrum in which no peak is present. {width=300px} A mass spectrum after the user has assigned the correct peak. {width=300px}

It is important that you remember to hit the 'Esc' key (or the 'Finish' button in RStudio) when you have successfully clicked on the correct peak. Only after that, the open circle will fill and the process continue. In the mass spectrum on the right side, there is no peak, so you hit 'Esc' right away. In the left one, there were two peaks within the acceptable range (tolerance_assignment), so you identfiy the correct one and hit 'Esc'.

If you mistakenly clicked a wrong peak, don't worry. By default, you are asked to review all assignments in a temporarily created PDF document using maldi_draw_peaks_by_well and to enter the well identifier (pivot) of those spectra you wish to re-assign manually.

Backing up spectra and peak assignments

Upon successful completion of each step, two .rds files are deposited in the folders you just analyzed, which allows you to easily import a previous analysis when you run maldi_batch on this folder (or a folder enclosing the folder).

Association with a plate layout

In the final step of the pipeline, you will be asked to associate metadata with the spectra. If you want to use always the same file, or a file with the same name for each group of samples (MALDI plates), consider providing the full or relative file name as layout_file (see ?maldi_batch).

A convenient way is to provide the layout as Excel sheet. Please refer to ?import_layout_from_excel for the details and which arguments you should provide to the function via MoreArgs_layout. In case you want to use a different specification, feel free to feed another function to FUN_import_layout = ....

Returned object

Once maldi_batch(...) has finished, data_maldi contains two lists, one with the spectra and one with the peaks and associated layout.

# purge personal information
data_maldi$peaks[[1]]$path_to_files <- stringr::str_replace(data_maldi$peaks[[1]]$path_to_files, attr(data_maldi$spectra[[1]], "dir"), ".")
data_maldi$peaks[[1]]$path_to_group <- stringr::str_replace(data_maldi$peaks[[1]]$path_to_group, attr(data_maldi$spectra[[1]], "dir"), ".")

# import function from data-raw/BobCAT.R
data_maldi$spectra[[1]] <- lapply(data_maldi$spectra[[1]], purge_personal_information, old_path = attr(data_maldi$spectra[[1]], "dir"))

attr(data_maldi$spectra[[1]], "dir") <- "."
attr(data_maldi$peaks[[1]], "dir") <- "."

saveRDS(data_maldi, "data_maldi.rds")

data_maldi <- readRDS("data_maldi.rds")

lengths(data_maldi)  # each list contains the result of 1 group

lapply(data_maldi, lengths)  # 12 spectra 

head(data_maldi$peaks[[1]])

Creating a resume file

With the data at hand, we want to arrange the spectra per compound on a single page in a PDF document. Currently, this needs the user to specify how many spectra there are to arrange and assumes that there is a reasonable order along the wells. In a future release, the accessory function could be more self-reliant/flexible of course.

We go with two for loops here instead of the cherried lapply since it makes the code a bit easier to understand.[^3]

[^3]: Also, in case of something goes wrong in one of the groups, we have the data processed for the other cases and can investigate the source of error interactively as the loop iterators group and content are available in the global environment.

for (group in seq_along(data_maldi$peaks)) {

  pdf(file = file.path(attr(data_maldi$spectra[[group]], "dir"), 
                       "overview_by_compound.pdf"),
      width = 21.5 / 2.54, height = 30.5 / 2.54, paper = "a4")

  for (content in unique(data_maldi$peaks[[group]]$content)) {

    # fill the page, i.e., we know that we want to place exactly 12 spectra

    maldi_draw_peaks_by_well(data_maldi$spectra[[group]], 
                             data_maldi$peaks[[group]][which(
                               data_maldi$peaks[[group]]$content == content), ],
                             highlight_missing_peaks = FALSE, title = content, 
                             ncol = 2, nrow = 6)

  }

  dev.off()  ## close the file

}

Fitting IC50 values {#sec-fitting}

The maldi_find_peaks_by_well function that we have used for peak detection in maldi_batch measures not only the peak intensity as a column .$intensity, but also calculates the fraction a particular peak makes within all quantified peaks in a single (averaged) spectrum as .$percent.

Having added the metadata from the palte layout that specifies the inhibitor concentration, which we find as .$concentration, we can fit a dose-response model (see ?fit_IC50) using the formula percent ~ concentration. We fit this model for each combination of .$content and .$ion.

head(data_maldi$peaks[[1]][, c("content", "ion", "mass", "intensity", "percent")])

(As we did for plotting, we take the same group-wise approach and use simply a for loop.)

library(dplyr, quietly = TRUE)
library(tidyr, quietly = TRUE)

data_ic50 <- list()

for (group in seq_along(data_maldi$peaks)) {

  log_task("fitting IC50 for", sQuote(names(data_maldi$peaks)[[group]]))

  data_ic50[[group]] <- data_maldi$peaks[[group]] %>%
    # ensure variables are numeric
    mutate(concentration = as.numeric(concentration)) %>% 
    # filter out missing values for proper augmenting
    filter(!is.na(percent), !is.na(concentration)) %>%
    # 
    group_by(content, ion) %>% 
    model_cleanly_groupwise(fit_IC50, formula = percent ~ concentration, 
                            newdata = data.frame(concentration = 10^seq(-3, 3, length.out = 100)))

  log_done()

}

Internally, a call is made to broom::tidy and broom::augment to extract all statistical parameters and calculate the fitted curves.

This is how the fit parameters retrieved by broom::tidy look like:

data_ic50[[group]] %>% 
  select(content, ion, tidy) %>% 
  unnest(tidy) %>% 
  head()

And we can do some plotting:

library(ggplot2)

ggplot(data = data_ic50[[group]], aes(x = concentration, color = ion)) +
  # original data points
  geom_point(data = . %>% select(content, ion, data) %>% unnest(data), 
             aes(y = percent)) +
  # fitted data points
  geom_path(data = . %>% select(content, ion, augment_new) %>% unnest(augment_new),
            aes(y = .fitted)) +
  # add IC50 values
  ggrepel::geom_text_repel(
    data = . %>% select(content, ion, tidy) %>% unnest(tidy) %>% filter(term == "IC50"),
    aes(y = 50, x = estimate, label = paste(
      round(estimate, -floor(log10(std.error / 10))), "\U00B1", 
      round(std.error, -floor(log10(std.error / 10)))))
  ) +
  #
  labs(x = "concentration") +
  guides(text = FALSE) +
  #
  scale_color_manual(values = c(mC = "#e31a1c", hmC = "#1f78b4", fC = "#33a02c")) +
  scale_x_log10(limits = c(1, 1e3)) +
  facet_wrap(vars(content)) +
  theme_linedraw(base_size = 12) + theme(
      panel.grid = element_blank()
    )