MS spectral data using sequence data (mainly for collagen)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(bacollite)

Introduction

A lot of ZOOMS work involves repeated analysis of large batches of data, usually presented in the form of datasets for individual MALDI plates. In order to repeatedly apply bacollite analysis to such datasets, we have developed a function called procplate that automates this analysis. The idea is simple: each plate has a corresponding .csv file that describes the file naming convention, the sample ID, and the position of the three replicates, the procplate function takes this data in as an argument, but it also takes the name of the analysis function as another argument, and thus applies the function to the data in each row of the csv.

This vignette covers how to set up an analysis to use procplate. We commence with a simple example, showing how to generate average spectra, and then move on to show how to use procplate to carry out the analysis in the companion vignette "SI Aligning samples to peptides".

generating an average spectrum for each sample on a MALDI plate

This can now be done with three lines of code:

require(bacollite)
pd <- read.csv("platemap.csv",stringsAsFactors = F)
avg <- pplate(pd,averageMSD,path="rawdata")

Going through this line by line:

require(bacollite) loads the bacollite package
pd <- read.csv("platemap.csv",stringsAsFactors = F) loads the job from a file called "platemap.csv" that describes the arrangement of data on the platemap (see below for details). The stringsAsFactors=F statement tells R not to try and do anything clever with non-numeric values in the file.
avg <- pplate(pd,averageMSD,path="rawdata") does the processing of the plate data. There are three things inside the brackets: pd, the data object; averageMSD, the function which does the averaging, and path=rawdata, indicating that the platemap data is in the sub-folder called "rawdata".

The platemap file - called "platemap.csv" in the example above - is best illustrated by example:

froot,sampleID,spot1,spot2,spot3,manualID
20210617_Camilla_,L4HF.40.F1,A1,A4,A7
20210617_Camilla_,A419,A2,A5,A8
20210617_Camilla_,A420,A3,A6,A9

The first line of the file describes the names of each column of data. These are:

froot: this is the part of the filename for each spot that is common to the three spots. You can see that in this example, this is the same for all samples on the plate, i.e. 20210617_Camilla.
sampleID

NOTE: There are many other options for generating the pd data object

Worked example using bacollite data

Averaged spectra are useful for a "first pass" visual examination of a sample, and bacollite now provides this functionality. We are going to use this as an example task for processing a whole plates worth of data. However, since such datasets are large, it is unfeasible for bacollite to come bundled with data for a whole plate. So we'll have to fudge an example, imaining that a

Loading a reference data set

The bacollite package comes with some data files that we can use during this tutorial, but we need some R code to find where they are on our system. Let's get R to tell us the folder that the data is in:

fpath <- system.file("extdata",package="bacollite")

We'll be using the fpath variable in what follows. With your own data, you would know where the samples are on your file system and you'd do something like fpath <- "c:\mydata\"

The MALDI data files that we are using are called:

20150326_SF_YG65_94_I11.txt  
20150326_SF_YG65_94_I14.txt  
20150326_SF_YG65_94_I17.txt

You can see that the last part of the file name before the ".txt" is the spot reference, and there are three spots: I11, I14 and I17. We need this information to use the load.sample function to load the three samples into R, like this:

library(bacollite)
froot = sprintf("%s/20150326_SF_YG65_94_",fpath)
sample <- load.sample(froot = froot,spots = c("I11","I14","I17"),name="folio 42")

Species ID for a whole plate

Explainer: Passing functions into functions as arguments

The ability of bacollite to process a plate with a range of different outputs is possible because R allows one function to be passed into another function as an argument. This is fairly advanced topic, and if you just want to process your data then you don't need to worry about this, but for the curious, here's a very simple example of how this is done.

Let's start with a function that adds two numbers together:

myadd <- function(a,b){
  return(a + b)
}
myadd(7,8)

Then let's suppose we wanted a function that multiplied the output of any function by another number. here, we want to apply this to a range of different function outputs. How might we code this? The answer is to pass a function in as an argument, just as you might do a variable or a data.frame.

cfun <- function(c,FUN,...){

  r <- FUN(...)
  return(r*c)
}

cfun(5,myadd,1,2)

Some points:

cfun multiplies the output of the function FUN by c
the function myadd is passed into cfun - but any other function could also be passed in

bioarch-sjh/bacollite documentation built on Oct. 7, 2022, 3:34 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bioarch-sjh/bacollite
Analysis of MALDI and LC-MS/MS spectral data using sequence data (mainly for collagen)

In bioarch-sjh/bacollite: Analysis of MALDI and LC-MS/MS spectral data using sequence data (mainly for collagen)

Introduction

generating an average spectrum for each sample on a MALDI plate

Worked example using bacollite data

Loading a reference data set

Species ID for a whole plate

Explainer: Passing functions into functions as arguments

R Package Documentation

Browse R Packages

We want your feedback!

bioarch-sjh/bacollite Analysis of MALDI and LC-MS/MS spectral data using sequence data (mainly for collagen)

In bioarch-sjh/bacollite: Analysis of MALDI and LC-MS/MS spectral data using sequence data (mainly for collagen)

Introduction

generating an average spectrum for each sample on a MALDI plate

Worked example using bacollite data

Loading a reference data set

Species ID for a whole plate

Explainer: Passing functions into functions as arguments

R Package Documentation

Browse R Packages

We want your feedback!

bioarch-sjh/bacollite
Analysis of MALDI and LC-MS/MS spectral data using sequence data (mainly for collagen)