knitr::opts_chunk$set( collapse = TRUE, comment = "#" ) options(summerr.debug = FALSE)
This vignette will outline a specfic use-case of the the summerrmass
package:
We extract the peak intensity in MALDI spectra at three selected ions (given by specific m/z ratios) from an array of MALDI measurements.
Since some of these intensities decline in the presence of an inhibitor, which was added in various concentrations, we want to calculate the IC50 in this assay.
The challenges are the following:
{width=50%}
{width=50%}
Let's start:
library(summerr) library(summerrmass)
Here, we will use the pre-built pipeline that ships with the package. If you want to learn how to built up a smaller analysis step-by-step, check out the examples in the documentation (or the vignette on the package's basic functions).
To adapt the pipeline to your needs, some arguments need probably to change.
maldi_batch
is written in such a way that you can not only modify the
individual parameters, but even exchange most of the functions that are used for
in each step.[^1]
[^1]: Provided that the modified functions return objects of similar type and structure to the original ones.
Here is how the full call to maldi_batch
looks like. I have included
most of the defaults so that you get an idea of what can be changed.
(*)
marks arguments which you are most likely to adapt to your own needs.
data_maldi <- maldi_batch( path = NULL, # will prompt for the direcotry to analyze layout_file = NULL, # will prompt for each sample group pivot = "[0-9]_[A-Z]+[0-9]+", # regex to match the "well" folders (*) # pre-processing of spectra FUN_spect = maldi_average_by_well, MoreArgs_spect = list(pivot = "[0-9]_[A-Z]+[0-9]+", # (*) final_trim_range = c(2420, 2445), # (*) method_baseline = "SNIP", method_average = "mean"), # detecting peaks FUN_peaks = maldi_find_peaks_by_well, MoreArgs_peaks = list(pivot = "[0-9]_[A-Z]+[0-9]+", # (*) mass_list = c(mC = 2425, hmC = 2441, fC = 2439), # (*) tolerance_assignment = 0.5, SNR = 3, method = "MAD"), # drawing spectra MoreArgs_draw = list(ncol = 2, nrow = 6), # associating metadata FUN_import_layout = import_layout_from_excel, MoreArgs_layout = list(sheet = 1, index_row = "2", index_col = "A", data_upper_left = "B3", plate_nrow = 16, plate_ncol = 24, meta_row = list( # metadata stored in rows concentration = "1"), meta_col = list( # metadata stored in columns )) )
The most important argument the pipeline is the pivot, which is a regular
expression to identify a folder in a path that (1)
groups the measurements from the same sample, i.e., the same well, and
(2) that is used to associate metadata, e.g., treatment
conditions, with the sample. You may want to check out ?import_layout_from_paths
for the technical details.
Spectra are always imported using maldi_import_spectra
. This cannot be modified
in the pre-built pipeline. There might neither be a need to do so as it simply
reads-in the files.
We use a pre-built pre-processing pipeline implemented in maldi_average_by_well
by default. This mini-pipeline involves:
Removing the baseline from the spectra
Aligning the spectra of ech sample group
Trimming them to the final mass range needed for the analysis[^2]
[^2]: Although it is possible to work with the full spectra, trimming to an appropriate range will considerably speed up the subsequent steps.
As mentioned in the beginning, you may consider to write your own function for pre-processing if you need more control than what is provided by changing some arguments.
The most valuable piece of information that must be provided is the (named) vector
of m/z ions to measure. In this example, we would like to quantify the intensity
of three ions: c(mC = 2425, hmC = 2441, fC = 2439)
.
By default, the workflow will try to identify the correct peaks in a semi-automatic
manner. However, there are different options to control this behavior (not shown
in the example above); see ?maldi_batch
for details.
In the default process, you may be prompted to click on the peaks where there is ambiguity:
{width=300px}
{width=300px}
It is important that you remember to hit the 'Esc' key (or the 'Finish' button in
RStudio) when you have successfully clicked on the correct peak. Only after that,
the open circle will fill and the process continue.
In the mass spectrum on the right side, there is no peak, so you hit 'Esc' right
away.
In the left one, there were two peaks within the acceptable range
(tolerance_assignment
), so you identfiy the correct one and hit 'Esc'.
If you mistakenly clicked a wrong peak, don't worry. By default, you are asked
to review all assignments in a temporarily created PDF document using
maldi_draw_peaks_by_well
and to enter the well identifier (pivot) of those
spectra you wish to re-assign manually.
Upon successful completion of each step, two .rds
files are deposited in the
folders you just analyzed, which allows you to easily import a previous analysis
when you run maldi_batch
on this folder (or a folder enclosing the folder).
In the final step of the pipeline, you will be asked to associate metadata with
the spectra. If you want to use always the same file, or a file with the same
name for each group of samples (MALDI plates), consider providing the full or
relative file name as layout_file
(see ?maldi_batch
).
A convenient way is to provide the layout as Excel sheet. Please refer to
?import_layout_from_excel
for the details and which arguments you should
provide to the function via MoreArgs_layout
. In case you want to use a different
specification, feel free to feed another function to FUN_import_layout = ...
.
Once maldi_batch(...)
has finished, data_maldi
contains two lists, one with
the spectra and one with the peaks and associated layout.
# purge personal information data_maldi$peaks[[1]]$path_to_files <- stringr::str_replace(data_maldi$peaks[[1]]$path_to_files, attr(data_maldi$spectra[[1]], "dir"), ".") data_maldi$peaks[[1]]$path_to_group <- stringr::str_replace(data_maldi$peaks[[1]]$path_to_group, attr(data_maldi$spectra[[1]], "dir"), ".") # import function from data-raw/BobCAT.R data_maldi$spectra[[1]] <- lapply(data_maldi$spectra[[1]], purge_personal_information, old_path = attr(data_maldi$spectra[[1]], "dir")) attr(data_maldi$spectra[[1]], "dir") <- "." attr(data_maldi$peaks[[1]], "dir") <- "." saveRDS(data_maldi, "data_maldi.rds")
data_maldi <- readRDS("data_maldi.rds")
lengths(data_maldi) # each list contains the result of 1 group lapply(data_maldi, lengths) # 12 spectra head(data_maldi$peaks[[1]])
With the data at hand, we want to arrange the spectra per compound on a single page in a PDF document. Currently, this needs the user to specify how many spectra there are to arrange and assumes that there is a reasonable order along the wells. In a future release, the accessory function could be more self-reliant/flexible of course.
We go with two for
loops here instead of the cherried lapply
since it makes
the code a bit easier to understand.[^3]
[^3]: Also, in case of something goes wrong in one of the groups, we have the data
processed for the other cases and can investigate the source of error interactively
as the loop iterators group
and content
are available in the global environment.
for (group in seq_along(data_maldi$peaks)) { pdf(file = file.path(attr(data_maldi$spectra[[group]], "dir"), "overview_by_compound.pdf"), width = 21.5 / 2.54, height = 30.5 / 2.54, paper = "a4") for (content in unique(data_maldi$peaks[[group]]$content)) { # fill the page, i.e., we know that we want to place exactly 12 spectra maldi_draw_peaks_by_well(data_maldi$spectra[[group]], data_maldi$peaks[[group]][which( data_maldi$peaks[[group]]$content == content), ], highlight_missing_peaks = FALSE, title = content, ncol = 2, nrow = 6) } dev.off() ## close the file }
The maldi_find_peaks_by_well
function that we have used for peak detection in
maldi_batch
measures not only the peak intensity as a column .$intensity
, but
also calculates the fraction a particular peak makes within all quantified peaks
in a single (averaged) spectrum as .$percent
.
Having added the metadata from the palte layout that specifies the inhibitor
concentration, which we find as .$concentration
, we can fit a dose-response
model (see ?fit_IC50
) using the formula percent ~ concentration
.
We fit this model for each combination of .$content
and .$ion
.
head(data_maldi$peaks[[1]][, c("content", "ion", "mass", "intensity", "percent")])
(As we did for plotting, we take the same group-wise approach and use simply a for
loop.)
library(dplyr, quietly = TRUE) library(tidyr, quietly = TRUE) data_ic50 <- list() for (group in seq_along(data_maldi$peaks)) { log_task("fitting IC50 for", sQuote(names(data_maldi$peaks)[[group]])) data_ic50[[group]] <- data_maldi$peaks[[group]] %>% # ensure variables are numeric mutate(concentration = as.numeric(concentration)) %>% # filter out missing values for proper augmenting filter(!is.na(percent), !is.na(concentration)) %>% # group_by(content, ion) %>% model_cleanly_groupwise(fit_IC50, formula = percent ~ concentration, newdata = data.frame(concentration = 10^seq(-3, 3, length.out = 100))) log_done() }
Internally, a call is made to broom::tidy
and broom::augment
to extract all
statistical parameters and calculate the fitted curves.
This is how the fit parameters retrieved by broom::tidy
look like:
data_ic50[[group]] %>% select(content, ion, tidy) %>% unnest(tidy) %>% head()
And we can do some plotting:
library(ggplot2) ggplot(data = data_ic50[[group]], aes(x = concentration, color = ion)) + # original data points geom_point(data = . %>% select(content, ion, data) %>% unnest(data), aes(y = percent)) + # fitted data points geom_path(data = . %>% select(content, ion, augment_new) %>% unnest(augment_new), aes(y = .fitted)) + # add IC50 values ggrepel::geom_text_repel( data = . %>% select(content, ion, tidy) %>% unnest(tidy) %>% filter(term == "IC50"), aes(y = 50, x = estimate, label = paste( round(estimate, -floor(log10(std.error / 10))), "\U00B1", round(std.error, -floor(log10(std.error / 10))))) ) + # labs(x = "concentration") + guides(text = FALSE) + # scale_color_manual(values = c(mC = "#e31a1c", hmC = "#1f78b4", fC = "#33a02c")) + scale_x_log10(limits = c(1, 1e3)) + facet_wrap(vars(content)) + theme_linedraw(base_size = 12) + theme( panel.grid = element_blank() )
Quite neat.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.