library(magrittr)

Introduction

The metabolyseR package provides a suite of methods that encompass three elements of metabolomics data analysis: data pre-treatment, modelling and correlations.

The package also distinguishes between the simplicity and flexibility required for exploratory analyses and the convenience needed for more complex routine analyses. This distinction is reflected in the underlying S4 object-oriented implementations and associated methods defined within the package. Understanding the principles of using metabolyseR for exploratory analyses is also useful for extracting and wrangling the results generated from routine analyses.

This vignette introduces the basic usage of the package, including how to create and use the base classes that are the foundation of metabolyseR, with a focus on their application to both exploratory and routine analyses. For more detailed information on the individual analysis elements, see their associated vignettes using:

browseVignettes('metabolyseR')

An example quick start analysis vignette is also provided:

vignette('quick_start','metabolyseR')

Any issues, bugs or errors encountered while using the package should be reported on the metabolyseR GitHub repository (jasenfinch/metabolyseR).

The examples shown here will use the abr1 data set from the metaboData package (?metaboData::abr1). This is a nominal mass flow-injection mass spectrometry (FI-MS) fingerprinting data set from a plant-pathogen infection time course experiment. The examples will also include use of the pipe %>% from the magrittr package.

Firstly load the necessary packages:

library(metabolyseR)
library(metaboData)

Parallel processing

The package supports parallel processing using the future package.

By default, processing by metabolyseR will be done sequentially. However, parallel processing can be activated, prior to analysis, by specifying a parallel back-end using plan(). The following example specifies using the multisession implementation (multiple background R sessions) with two worker processes.

future::plan(future::multisession,workers = 2)

See the future package documentation for more information on the types of parallel implementations that are available.
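Because plan() sets a session-wide back-end, you may want to restore the previous back-end once the parallel work is finished. The following minimal sketch uses only the future package. It relies on the future idiom of storing the value returned by plan() and passing it back to plan() later. The worker count of two is an arbitrary choice.

```r
library(future)

# Switch to two background R sessions, keeping a record of the previous plan
previous_plan <- plan(multisession, workers = 2)

# ... run metabolyseR analyses here ...

# Restore whichever plan was active beforehand (sequential in a fresh session)
plan(previous_plan)
```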

Exploratory analyses

For exploratory analyses, simple questions of the data need to be answered quickly, requiring few steps. Key requirements for any tool used by investigators are that it should be both simple and flexible.

In metabolyseR, the AnalysisData class is the base S4 class that provides these requirements. The following sections will give an overview of the basics in constructing and using these objects as the base for analysis.

Analysis data

An AnalysisData object is constructed from two data tables. The first is the metabolomic data table of abundance values, where the columns are the metabolome features and the rows are the sample observations. The second is the sample meta-information table, whose row order should match that of the metabolomic data table. Using the example data, this can be constructed and assigned to the variable d by:

d <- analysisData(data = abr1$neg,
                  info = abr1$fact)

Here, abr1$neg is the negative ionisation mode data and abr1$fact is the corresponding sample information. Printing d shows some basic information about the data.

print(d)

We can also return the numbers of samples and numbers of features respectively using the following:

nSamples(d)
nFeatures(d)
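The following sketch makes the expected table shapes concrete without loading abr1. It builds an AnalysisData object from two small hypothetical tables. The feature names N1 to N3, the abundance values and the class labels are all invented. The tables are built with tibble::tibble(). The row counts are checked before construction, because each row of the data table must describe the same sample as the corresponding row of the sample information.

```r
library(metabolyseR)

# Hypothetical abundance table: 4 samples (rows) x 3 features (columns)
toy_data <- tibble::tibble(
  N1 = c(1.2, 3.4, 2.2, 0.5),
  N2 = c(0.1, 0.3, 0.2, 0.4),
  N3 = c(5.0, 4.1, 6.3, 5.5)
)

# Hypothetical sample information: one row per sample, in the same order as toy_data
toy_info <- tibble::tibble(
  injorder = 1:4,
  class = c('A', 'A', 'B', 'B')
)

# Both tables must describe the same samples, row for row
stopifnot(nrow(toy_data) == nrow(toy_info))

toy <- analysisData(data = toy_data, info = toy_info)
nSamples(toy)
nFeatures(toy)
```

Here nSamples() and nFeatures() report the 4 rows and 3 columns of the hypothetical data table.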

The data table can be extracted using the dat method:

dat(d)

Alternatively, dat() can be used to assign a new data table:

dat(d) <- abr1$pos
d

The sample information table can be extracted using the sinfo method:

sinfo(d)

It can similarly be used to assign a new sample information table:

sinfo(d) <- abr1$fact[,1:2]
d
# Restore the original negative ionisation mode data and full sample information
d <- analysisData(abr1$neg,abr1$fact)

Sample information

There are a number of methods that provide utility for querying and altering the sample information within an AnalysisData object. These methods are all named with the prefix cls and include:

getNamespaceExports('metabolyseR') %>% 
  {.[stringr::str_detect(.,'cls')]} %>% 
  {.[!stringr::str_detect(.,':')]} %>% 
  sort() %>% 
  stringr::str_c('* `',.,'`') %>% 
  stringr::str_c(collapse = '\n') %>%
  cat()

The names of the available sample information columns can be shown using clsAvailable().

clsAvailable(d)

A given column can be extracted using clsExtract(). Here, the day column is extracted.

clsExtract(d,cls = 'day')

Sample class frequencies can then be computed.

clsExtract(d,cls = 'day') %>%
  table()

It can be seen that there are 20 samples available in each class.

Another example is the addition of a new sample information column. In the following, a column called new_class will be added with all samples labelled 1.

d <- clsAdd(d,cls = 'new_class',value = rep(1,nSamples(d)))
clsAvailable(d)
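clsAdd() is also useful for deriving a new grouping from existing sample information. In this sketch, a hypothetical batch column is derived from the injection order of a small invented AnalysisData object, with one value supplied per sample.

```r
library(metabolyseR)

# Hypothetical toy object with 6 samples
toy <- analysisData(
  data = tibble::tibble(N1 = c(2, 4, 6, 8, 10, 12), N2 = c(1, 1, 2, 2, 3, 3)),
  info = tibble::tibble(injorder = 1:6)
)

# Derive a hypothetical batch label from the injection order
toy_injorder <- clsExtract(toy, cls = 'injorder')
toy_batch <- ifelse(toy_injorder <= 3, 'batch1', 'batch2')

# Add the new column, supplying one value per sample
toy <- clsAdd(toy, cls = 'batch', value = toy_batch)
clsAvailable(toy)
table(clsExtract(toy, cls = 'batch'))
```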

Keeping / removing samples or features

Samples or features can be kept or removed from an AnalysisData object, whichever is more convenient.

The following extracts the first 6 sample indexes in the injorder column of the sample information:

samples <- d %>%
  clsExtract(cls = 'injorder') %>%
  head()

print(samples)

Only these samples could be kept using:

d %>%
  keepSamples(idx = 'injorder',samples = samples)

Or removed using:

d %>%
  removeSamples(idx = 'injorder',samples = samples)
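keepSamples() and removeSamples() select on the same idx column. Applying both with the same samples therefore splits an object into two complementary parts. The sketch below shows this on a small hypothetical AnalysisData object whose injorder values are unique.

```r
library(metabolyseR)

# Hypothetical toy object with 6 samples in injection order
toy <- analysisData(
  data = tibble::tibble(N1 = c(5, 3, 8, 1, 9, 2), N2 = c(2, 7, 4, 6, 1, 3)),
  info = tibble::tibble(injorder = 1:6, class = rep(c('A', 'B', 'C'), each = 2))
)

selected <- c(1, 2, 3)

kept <- keepSamples(toy, idx = 'injorder', samples = selected)
removed <- removeSamples(toy, idx = 'injorder', samples = selected)

# Together, the kept and removed samples account for every original sample
nSamples(kept)
nSamples(removed)
```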

The process is very similar for keeping or removing specific metabolome features from the data table. The following extracts the first 6 feature names in the data table:

feat <- d %>%
  features() %>%
  head()

print(feat)

Only these features can be kept using:

d %>%
  keepFeatures(features = feat)

Or to remove these features:

d %>%
  removeFeatures(features = feat)
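The same complementarity applies to features. In this sketch, which uses an invented three-feature AnalysisData object, keeping the first two features leaves only those two. Removing them leaves only the third.

```r
library(metabolyseR)

# Hypothetical toy object with 2 samples and 3 features
toy <- analysisData(
  data = tibble::tibble(N1 = c(1, 2), N2 = c(3, 4), N3 = c(5, 6)),
  info = tibble::tibble(injorder = 1:2)
)

# Select the first two feature names ('N1' and 'N2')
toy_feat <- features(toy)[1:2]

features(keepFeatures(toy, features = toy_feat))
features(removeFeatures(toy, features = toy_feat))
```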

Routine analyses

Routine analyses are often made up of numerous steps whose parameters have likely already been established. The emphasis here is on convenience, with as little code as possible. In these analyses, the necessary analysis elements, their order and their parameters are prepared first, and the analysis routine is then performed in a single step. This section introduces how this type of analysis can be performed using metabolyseR and covers four main topics: analysis parameters, performing an analysis, performing a re-analysis and extracting analysis results.

Analysis parameters

Parameter selection is the fundamental aspect for performing routine analyses using metabolyseR and will be the step requiring the most input from the user. The parameters for an analysis are stored in an S4 object of class AnalysisParameters containing the relevant parameters of the selected analysis elements.

The parameters have been named consistently, so that a given parameter name denotes the same functionality across all analysis element methods. Discussion of the specific parameters can be found within the vignettes of the relevant analysis elements. These can be accessed using:

browseVignettes('metabolyseR')

There are two ways to specify the parameters to use for analysis: programmatically, or through the use of the YAML format.

Programmatic specification

The available analysis elements can be shown using:

analysisElements()

The analysisParameters() function can be used to create an AnalysisParameters object containing the default parameters. For example, the code below will return default parameters for all the metabolyseR analysis elements.

p <- analysisParameters()
p

To retrieve parameters for a subset of analysis elements the following can be run, returning parameters for only the pre-treatment and modelling elements.

p <- analysisParameters(c('pre-treatment','modelling'))
p

The changeParameter() function can be used to uniformly change these parameters across all of the selected methods. The example below changes the defaults of all the parameters named cls from the default class to day.

p <- analysisParameters()
changeParameter(p,'cls') <- 'day'
p

Alternatively, the parameters of a specific analysis element can be targeted using the elements argument. The following will alter the cls parameter back to class for the pre-treatment element parameters only:

changeParameter(p,'cls',elements = 'pre-treatment') <- 'class'
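One way to confirm which cls values were changed is to flatten the nested parameter lists with base R's unlist(). This sketch uses its own object name (p_demo) so that p above is left untouched. It assumes that the default modelling parameters contain a cls parameter, as the changeParameter() example above implies.

```r
library(metabolyseR)

p_demo <- analysisParameters(c('pre-treatment', 'modelling'))

# Set every cls parameter to 'day', then restore 'class' for pre-treatment only
changeParameter(p_demo, 'cls') <- 'day'
changeParameter(p_demo, 'cls', elements = 'pre-treatment') <- 'class'

# Flatten each element's nested parameter list to check for the value 'day'
'day' %in% unlist(parameters(p_demo, 'modelling'))
'day' %in% unlist(parameters(p_demo, 'pre-treatment'))
```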

Parameters can be extracted from the AnalysisParameters class using the parameters() function for a specified element.

parameters(p,'correlations')

Each analysis element has a function for returning default parameters for specific methods: preTreatmentParameters(), modellingParameters() and correlationParameters(). Each returns a list of the default parameters for the specified methods, as shown in the example for modellingParameters() below.

modellingParameters('anova')

Refer to the documentation (?) of each function for specific usage details.

The parameters returned by these functions can be assigned to an AnalysisParameters object, again using parameters():

parameters(p,'pre-treatment') <- preTreatmentParameters(
  list(
    occupancyFilter = 'maximum',
    transform = 'TICnorm'
      )
  )
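The list returned by preTreatmentParameters() is nested. This sketch assumes that the top level is named by pre-treatment method (occupancyFilter, transform) and the second level by the selected method type (maximum, TICnorm). Printing the list with str() lets you check that structure directly. The object name pp_demo is hypothetical.

```r
library(metabolyseR)

# Request maximum occupancy filtering followed by TIC normalisation
pp_demo <- preTreatmentParameters(
  list(
    occupancyFilter = 'maximum',
    transform = 'TICnorm'
  )
)

# Show the first two levels of nesting (assumed: method, then method type)
str(pp_demo, max.level = 2)
```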

YAML specification

Due to the relatively complex structure of the parameters needed for analyses containing many components, it is also possible to specify analysis parameters using the YAML file format. YAML parameter files (.yaml) can be parsed using the parseParameters() function. The example below shows the YAML specification for the defaults returned by analysisParameters().

paramFile <- system.file('defaultParameters.yaml',package = 'metabolyseR')

stringr::str_c("
```yaml
",
yaml::read_yaml(paramFile) %>%
  yaml::as.yaml(),
"```") %>%
  cat()

This can be passed directly into an AnalysisParameters object using the following:

paramFile <- system.file('defaultParameters.yaml',package = 'metabolyseR')
p <- parseParameters(paramFile)

More complex pre-treatment routines can also be specified, such as the following:

exampleParamFile <- system.file('exampleParameters.yaml',package = 'metabolyseR')

stringr::str_c("
```yaml
",
  yaml::read_yaml(exampleParamFile) %>%
    yaml::as.yaml(),
  "```") %>%
  cat()

Where multiple steps of the same method are needed (here, remove), these are numbered sequentially. Where multiple values need to be provided to a particular argument (e.g. classes = c('H','1')), these should be supplied as a hyphenated list.
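The hyphenated list convention is standard YAML sequence syntax. The sketch below calls the yaml package's yaml.load() on a hypothetical fragment, not on the full metabolyseR parameter file, to show that such a list is read into R as a character vector. The '1' is quoted so that YAML treats it as a string. Unquoted, it would be parsed as an integer.

```r
library(yaml)

# A hypothetical YAML fragment where 'classes' is given as a hyphenated list
yaml_fragment <- "
classes:
  - H
  - '1'
"

parsed <- yaml.load(yaml_fragment)

# The hyphenated list becomes a character vector in R
parsed$classes
```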

Existing AnalysisParameters objects can also be exported to YAML format as shown below:

p <- analysisParameters()
exportParameters(p,file = 'analysis_parameters.yaml')
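Exported parameters can be read back in with parseParameters(). The sketch below writes to a temporary file from tempfile() rather than to the working directory. It uses separately named objects (p_export, p_import) so that p is left unchanged.

```r
library(metabolyseR)

p_export <- analysisParameters(c('pre-treatment', 'modelling'))
changeParameter(p_export, 'cls') <- 'day'

# Write the parameters to a temporary YAML file
export_file <- tempfile(fileext = '.yaml')
exportParameters(p_export, file = export_file)

# Parse the file back into an AnalysisParameters object
p_import <- parseParameters(export_file)
p_import
```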

Performing an analysis

The analysis is performed in a single step using the metabolyse() function. This accepts the metabolomic data, the sample information and the analysis parameters.

The metabolomic data should be a table of abundance values where the columns are the metabolome features and the rows are the sample observations. Similarly, the sample meta-information table should consist of the observations as rows and the meta-information as columns. The order of the rows in the sample information table should match the order of the rows in the metabolomic data table.
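If the two tables share a sample identifier, base R's match() can put the sample information into the same row order as the data before metabolyse() is called. The identifiers and classes below are hypothetical. The sketch assumes the sample information holds the identifiers in a column called sample.

```r
# Hypothetical sample identifiers, in the row order of the metabolomic data table
data_ids <- c('S3', 'S1', 'S2')

# Hypothetical sample information table stored in a different order
toy_info <- data.frame(
  sample = c('S1', 'S2', 'S3'),
  class = c('A', 'B', 'A'),
  stringsAsFactors = FALSE
)

# Reorder the sample information rows to follow the data table rows
info_aligned <- toy_info[match(data_ids, toy_info$sample), ]
rownames(info_aligned) <- NULL
info_aligned
```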

We can run an example analysis using the abr1 data set by first generating the default parameters for pre-treatment and modelling (random forest) analysis elements.

p <- analysisParameters(c('pre-treatment','modelling'))

Custom pre-treatment parameters can then be specified to include only occupancy filtering and total ion count normalisation.

parameters(p,'pre-treatment') <- preTreatmentParameters(
  list(
  occupancyFilter = 'maximum',
  transform = 'TICnorm')
)

Next the cls parameters can be changed to use the day sample information column throughout the analysis.

changeParameter(p,'cls') <- 'day'

Finally, the analysis can be run in a single step. Here, only the first 200 features of the negative ionisation mode data are used, to reduce the analysis time needed for this example.

analysis <- metabolyse(abr1$neg[,1:200],abr1$fact,p) 

Note: If a data pre-treatment step is not performed prior to modelling or correlation analysis, the raw data will automatically be used.

The analysis object containing the analysis results can be printed to provide some basic information about the results of the analysis.

print(analysis)

Performing a re-analysis

There are likely to be occasions where an analysis needs to be re-run using a new set of parameters. This can be achieved using the reAnalyse() function.

In the example below we will run a correlation analysis in addition to the pre-treatment and modelling elements already performed.

Firstly, we can specify the correlation parameters:

correlation_parameters <- analysisParameters('correlations')

Then perform the re-analysis on the previously analysed Analysis object, specifying the additional parameters.

analysis <- reAnalyse(analysis,correlation_parameters)

An overview of the results of the analysis (now including correlations) can then be printed.

print(analysis)

Extracting analysis results

An analysis performed by metabolyse() returns an S4 object of class Analysis. There are a number of ways of extracting analysis results from this object.

Similarly to the AnalysisData class, the dat() and sinfo() functions can be used to extract the metabolomics data or sample information tables directly for either the raw or pre-treated data.

For example, to extract the pre-treated metabolomics data from our object analysis:

dat(analysis,type = 'pre-treated')

Or to extract the raw sample information:

sinfo(analysis,type = 'raw')

Alternatively, the raw() and preTreated() functions can be used to extract the AnalysisData objects containing both the metabolomics data and sample information for the raw and pre-treated data respectively.

raw(analysis)
preTreated(analysis)

Lastly, the analysisResults() function can be used to extract the results of any of the analysis elements. The following will extract the modelling results:

analysisResults(analysis,element = 'modelling')


jasenfinch/metabolyseR documentation built on Sept. 18, 2023, 1:25 a.m.