knitr::opts_chunk$set(fig.width = 8,
                      fig.height = 8,
                      fig.path = 'figures/temp/')

Introduction

Context

Analysis of a phenomics (e.g. metabolomics) data set (i.e. a samples × variables table of peak or bucket intensities generated by preprocessing tools such as XCMS) aims at mining the data (e.g. detecting trends and outliers) and at detecting features of predictive value (biomarker discovery). It comprises multiple steps including:

The phenomis package addresses the first two steps and can be combined with other packages for multivariate modeling, such as the ropls and biosigner Bioconductor packages, as described below.

Methods

Methods from the **phenomis**, **ropls**, and **biosigner** packages for the analysis of metabolomics datasets; the specific parameter values used for the **sacurine** dataset described in the 'Hands-on' part below are provided as examples.

Formats

Three-table file format used for import/export

Input (i.e. preprocessed) data consist of a 'samples × variables' matrix of intensities (datMatrix numeric matrix), together with sample and variable metadata (sampleMetadata and variableMetadata data frames). These three tables can be conveniently imported into and exported from R as tabular files:

Three-table format used as input/output of the data analysis workflow.
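As an illustration of why the three tables must stay aligned, the sketch below builds a tiny in-memory example in base R, writes the intensity matrix to a tab-separated file and reads it back, then checks that sample and variable identifiers agree across the three tables. The helper `check_three_tables()` and all object names are hypothetical; this is not the phenomis reading/writing code, only a sketch assuming the 'samples × variables' layout described above.

```r
# Tiny example tables (hypothetical values), intensity matrix as samples x variables
dat_mat <- matrix(c(1.2, 3.4, 5.6,
                    2.1, 4.3, 6.5),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("s1", "s2"), c("v1", "v2", "v3")))
sample_meta <- data.frame(sampleType = c("sample", "pool"),
                          row.names = c("s1", "s2"))
variable_meta <- data.frame(mass_to_charge = c(120.1, 180.2, 250.3),
                            row.names = c("v1", "v2", "v3"))

# Hypothetical helper: the matrix rows must match the sample metadata rows,
# and the matrix columns must match the variable metadata rows
check_three_tables <- function(dat, smeta, vmeta) {
  stopifnot(is.numeric(dat),
            identical(rownames(dat), rownames(smeta)),
            identical(colnames(dat), rownames(vmeta)))
  invisible(TRUE)
}

# Round trip through a tab-separated file, as with the tabular files
tmp <- tempfile(fileext = ".tsv")
write.table(dat_mat, tmp, sep = "\t", quote = FALSE, col.names = NA)
dat_back <- as.matrix(read.table(tmp, header = TRUE, sep = "\t", row.names = 1))
check_three_tables(dat_back, sample_meta, variable_meta)
```

Writing with `col.names = NA` keeps an empty header cell above the row names, so that `row.names = 1` restores the sample identifiers on reading.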

Text and graphical outputs

Text and graphics can be handled with the phenomis methods by setting the two arguments:

Availability

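The phenomis package can be installed from GitHub with:

# install devtools first if it is not available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("odisce/phenomis")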

Hands-on

The sacurine cohort study

As an example, we will use the phenomis package to study the sacurine human cohort. The study aims at characterizing the physiological variations of the human urine metabolome with age, body mass index (BMI), and gender [@thevenot_analysis_2015]. Urine samples from 184 volunteers were analyzed by reversed-phase (C18) ultra-high-performance liquid chromatography (UPLC) coupled to high-resolution mass spectrometry (LTQ-Orbitrap). Raw data are publicly available on the MetaboLights repository (MTBLS404).

This vignette describes the statistical analysis of the data set from the negative ionization mode (113 identified metabolites at MSI levels 1 or 2):

A Galaxy version of this analysis is available as W4M00001 ('Sacurine-statistics') on the Workflow4metabolomics.org online infrastructure [@guitton_create_2017].

reading: Reading the data

The reading function reads the data sets and builds the ExpressionSet object. For additional information about the ExpressionSet class, see the "An introduction to Biobase and ExpressionSets" documentation from the Biobase package.

library(phenomis)
library(SummarizedExperiment)  # assay(), colData() and rowData() are used below
sacurine.se <- reading(system.file("extdata/W4M00001_Sacurine-statistics",
                                   package = "phenomis"))

inspecting: Looking at the data

sacurine.se <- inspecting(sacurine.se)

Post-processing

correcting: Correcting signal drift and batch effect

sacurine.se <- correcting(sacurine.se,
                          reference.vc = "pool",
                          col_batch.c = "batch",
                          col_injectionOrder.c = "injectionOrder",
                          col_sampleType.c = "sampleType")
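To make the idea of pool-based drift correction concrete, here is a minimal base R sketch of one common approach: fit a loess curve of the pool (QC) intensities against injection order, then divide every injection by the fitted trend. It is an assumption that this resembles what `correcting` does with `reference.vc = "pool"`; the helper `correct_drift()` and the simulated data are hypothetical, and batches are ignored here (the same fit could be applied within each batch).

```r
set.seed(1)
# Simulated run: a pool every fourth injection and one at the very end,
# so that every study sample lies within the range of pool injections
sample_type <- c(rep(c("pool", "sample", "sample", "sample"), 10), "pool")
injection_order <- seq_along(sample_type)
# One variable whose intensity drifts upwards by about 1% per injection
intensity <- 1000 * (1 + 0.01 * injection_order) *
  exp(rnorm(length(sample_type), sd = 0.02))

# Hypothetical helper: loess trend estimated on the reference (pool)
# injections only, then used to rescale all injections
correct_drift <- function(y, order, is_ref, span = 0.75) {
  ref <- data.frame(y = y[is_ref], order = order[is_ref])
  fit <- loess(y ~ order, data = ref, span = span)
  trend <- predict(fit, newdata = data.frame(order = order))
  y / trend * mean(y[is_ref])
}

corrected <- correct_drift(intensity, injection_order, sample_type == "pool")
is_sample <- sample_type == "sample"
cv <- function(v) sd(v) / mean(v)
c(raw = cv(intensity[is_sample]), corrected = cv(corrected[is_sample]))
```

Placing pools at both ends of the run matters: by default, `predict()` on a loess fit returns `NA` outside the range of the fitted injection orders.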

Variable filtering

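# compute the quality metrics used below (e.g. pool_CV) on the corrected data
sacurine.se <- inspecting(sacurine.se)
# keep variables with a coefficient of variation across pools of at most 30%
sacurine.se <- sacurine.se[rowData(sacurine.se)[, "pool_CV"] <= 0.3, ]
# the pool (QC) samples are no longer needed
sacurine.se <- sacurine.se[, colData(sacurine.se)[, "sampleType"] != "pool"]
print(sacurine.se)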
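Assuming that `pool_CV` is the coefficient of variation (standard deviation divided by mean) of each variable across the pool samples, as its name suggests, the filter above can be reproduced on a toy variables × samples matrix in base R. Sample names and values are hypothetical.

```r
# Toy variables x samples matrix: p1-p3 are pools, s1-s2 study samples
intens <- rbind(v1 = c(100, 102, 98, 150, 80),
                v2 = c(10, 50, 90, 40, 60))
colnames(intens) <- c("p1", "p2", "p3", "s1", "s2")
is_pool <- startsWith(colnames(intens), "p")

# Coefficient of variation of each variable across the pools
pool_cv <- apply(intens[, is_pool, drop = FALSE], 1,
                 function(x) sd(x) / mean(x))
pool_cv  # v1 = 0.02, v2 = 0.8

# Keep reproducible variables (CV <= 30%), then drop the pool columns
kept <- intens[pool_cv <= 0.3, !is_pool, drop = FALSE]
```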

Normalizing

# divide each sample (column of the variables x samples assay) by its osmolality
assay(sacurine.se) <- sweep(assay(sacurine.se),
                            2,
                            colData(sacurine.se)[, "osmolality"],
                            "/")
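The effect of `sweep(..., 2, ..., "/")` is easiest to see on a toy matrix: with `MARGIN = 2`, the i-th value of the osmolality vector divides the whole i-th column, i.e. every variable of sample i. The values below are hypothetical.

```r
# Toy variables x samples matrix (filled column by column)
intens <- matrix(c(10, 20, 30, 40), nrow = 2,
                 dimnames = list(c("v1", "v2"), c("s1", "s2")))
osmolality <- c(s1 = 2, s2 = 4)

# Column-wise division: each sample is scaled by its own osmolality
normed <- sweep(intens, 2, osmolality, "/")
normed  # s1: 5, 10; s2: 7.5, 10
```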

transforming: Transforming the data intensities

sacurine.se <- transforming(sacurine.se, method.c = "log10")
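One practical point about a log10 transformation is that zero intensities map to `-Inf`. How `transforming` handles them is not described here; the base R sketch below simply shows the issue and one possible option (recoding non-finite values as missing), which is an assumption rather than the phenomis behaviour.

```r
intens <- c(v1 = 1000, v2 = 10, v3 = 0)
log_intens <- log10(intens)   # v3 becomes -Inf

# One option: treat non-finite values as missing after the transform
log_intens[!is.finite(log_intens)] <- NA
log_intens  # v1 = 3, v2 = 1, v3 = NA
```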

Sample filtering

sacurine.se <- inspecting(sacurine.se)
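# discard samples with any of the outlier p-values computed by inspecting
# (Hotelling's T2, missing values, deciles) below 0.001
sacurine.se <- sacurine.se[, colData(sacurine.se)[, "hotel_pval"] >= 0.001 &
                             colData(sacurine.se)[, "miss_pval"] >= 0.001 &
                             colData(sacurine.se)[, "deci_pval"] >= 0.001]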
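To illustrate what a Hotelling's T² outlier p-value can look like, the sketch below computes T² from the first two PCA scores of each sample and converts it to a p-value with one common F approximation, T² (n − A) / (A (n − 1)) ~ F(A, n − A), where n is the number of samples and A the number of components. The exact formula behind `hotel_pval` in phenomis is not given here, so `hotelling_pval()` is a hypothetical helper, not the package implementation.

```r
# Hypothetical helper: Hotelling's T2 p-values from the first n_comp PCA scores
hotelling_pval <- function(x, n_comp = 2) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)
  scores <- pca$x[, seq_len(n_comp), drop = FALSE]
  n <- nrow(x)
  # T2: sum over components of squared scores divided by score variances
  t2 <- rowSums(sweep(scores^2, 2, apply(scores, 2, var), "/"))
  # F approximation: T2 * (n - A) / (A * (n - 1)) ~ F(A, n - A)
  f_stat <- t2 * (n - n_comp) / (n_comp * (n - 1))
  pf(f_stat, n_comp, n - n_comp, lower.tail = FALSE)
}

set.seed(3)
x <- matrix(rnorm(50 * 10), nrow = 50)  # 50 samples x 10 variables
x[1, ] <- x[1, ] + 8                     # one clearly outlying sample
p <- hotelling_pval(x)
keep <- p >= 0.001                       # same threshold as above
```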

Final visual check of the data before performing the statistics

phenomis::inspecting(sacurine.se)

hypotesting: Univariate hypothesis testing

sacurine.se <- hypotesting(sacurine.se,
                           test.c = "ttest",
                           factor_names.vc = "gender",
                           adjust.c = "BH",
                           adjust_thresh.n = 0.05)
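The logic of this step (one test per variable, then Benjamini-Hochberg correction across variables and a 0.05 threshold on the adjusted p-values) can be sketched in base R on simulated data. Note that `t.test()` defaults to Welch's test; whether phenomis uses Student's or Welch's version is not stated here.

```r
set.seed(4)
n <- 20
gender <- factor(rep(c("F", "M"), each = n / 2))
# Simulated variables x samples matrix; only v1 differs between groups
x <- matrix(rnorm(5 * n), nrow = 5,
            dimnames = list(paste0("v", 1:5), NULL))
x["v1", gender == "M"] <- x["v1", gender == "M"] + 3

# One two-sample t-test per variable (row)
pvals <- apply(x, 1, function(v)
  t.test(v[gender == "F"], v[gender == "M"])$p.value)
# Benjamini-Hochberg adjustment across variables
padj <- p.adjust(pvals, method = "BH")
signif_vars <- rownames(x)[padj < 0.05]
```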

Unsupervised analysis

Principal component analysis

With the ropls Bioconductor package (already loaded as a dependency of phenomis) [@thevenot_analysis_2015]:

sacPca <- ropls::opls(sacurine.se, info.txt = NA)
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "gender"],
            typeVc = "x-score")
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "age"],
            typeVc = "x-score")
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "bmi"],
            typeVc = "x-score")
sacurine.se <- ropls::getEset(sacPca)

clustering: hierarchical clustering

sacurine.se <- clustering(sacurine.se, correl.c = "spearman",
                          clusters.vi = c(5, 3))
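The sketch below shows correlation-based hierarchical clustering of variables in base R: the distance is 1 minus the Spearman correlation, the dendrogram is built with `hclust()`, and `cutree()` cuts it into a fixed number of clusters. The average linkage and the reading of `clusters.vi` as numbers of clusters to cut are assumptions; phenomis defaults may differ.

```r
set.seed(5)
n_samples <- 30
shared <- rnorm(n_samples)
# v1-v3 follow a common pattern; v4-v6 are independent noise
x <- rbind(t(replicate(3, shared + rnorm(n_samples, sd = 0.1))),
           matrix(rnorm(3 * n_samples), nrow = 3))
rownames(x) <- paste0("v", 1:6)

# Distance between variables: 1 - Spearman correlation
d <- as.dist(1 - cor(t(x), method = "spearman"))
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 3)  # cluster label for each variable
```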

Supervised modeling

(O)PLS(-DA) modeling

With the ropls Bioconductor package [@thevenot_analysis_2015]:

sacPlsda <- ropls::opls(sacurine.se, "gender")
sacurine.se <- ropls::getEset(sacPlsda)
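For intuition about what PLS-DA extracts, the first component for a single (dummy-coded) response can be computed directly: after centering and unit-variance scaling, the weight vector is proportional to X'y and the scores are t = Xw. This base R sketch only reproduces that first component; ropls additionally handles further components, cross-validation, permutation tests and orthogonal (OPLS) variants.

```r
set.seed(6)
y <- factor(rep(c("F", "M"), each = 15))
X <- matrix(rnorm(30 * 5), nrow = 30,
            dimnames = list(NULL, paste0("v", 1:5)))
X[y == "M", "v1"] <- X[y == "M", "v1"] + 2  # only v1 is informative

Xc <- scale(X)                                 # centering + unit variance
yc <- as.numeric(y == "M") - mean(y == "M")    # centered dummy response
w <- drop(crossprod(Xc, yc))                   # weights proportional to X'y
w <- w / sqrt(sum(w^2))                        # unit-norm weight vector
t1 <- drop(Xc %*% w)                           # first predictive scores
```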

Feature selection

With the biosigner Bioconductor package [@rinaudo_biosigner:_2016]:

sacurine.biosign <- biosigner::biosign(sacurine.se, "gender", seedI = 123)
sacurine.se <- biosigner::getEset(sacurine.biosign)

annotating: MS annotation

This method is based on the biodb and biodbChebi R packages available on GitHub.

Viewing the parameters of the annotating method and their default values:

phenomis::annotating_parameters()

Chemical annotation with ChEBI

By m/z values

sacurine.se <- annotating(sacurine.se,
                          database.c = "chebi",
                          param.ls = list(query.type = "mz",
                                          query.col = "mass_to_charge",
                                          ms.mode = "neg",
                                          mz.tol = 10,
                                          mz.tol.unit = "ppm",
                                          max.results = 3,
                                          prefix = "chebiMZ."),
                          report.c = "none")
knitr::kable(head(rowData(sacurine.se)[, grep("chebiMZ", colnames(rowData(sacurine.se)))]))
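The `mz.tol = 10` and `mz.tol.unit = "ppm"` parameters express a relative mass tolerance. The base R sketch below shows the underlying arithmetic, assuming that in negative mode the observed ion is [M−H]⁻, i.e. the neutral monoisotopic mass minus one proton (1.007276 Da). The compound names and masses are hypothetical, and whether the ppm error is computed relative to the theoretical or the observed m/z varies between tools.

```r
proton <- 1.007276  # proton mass (Da)
# Hypothetical database of neutral monoisotopic masses
db <- data.frame(name = c("cpd_A", "cpd_B"),
                 monoisotopic_mass = c(179.0582, 113.0589))

# Hypothetical helper: indices of database entries matching an observed m/z
match_mz <- function(mz, db_mass, tol_ppm = 10, ion_shift = -proton) {
  theo <- db_mass + ion_shift              # theoretical [M-H]- m/z
  err_ppm <- (mz - theo) / theo * 1e6      # relative error in ppm
  which(abs(err_ppm) <= tol_ppm)
}

match_mz(178.0510, db$monoisotopic_mass)  # about 0.4 ppm from cpd_A: 1
match_mz(178.0530, db$monoisotopic_mass)  # about 11.7 ppm: no match
```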

By ChEBI identifiers

sacurine.se <- annotating(sacurine.se, database.c = "chebi",
                          param.ls = list(query.type = "chebi.id",
                                          query.col = "database_identifier",
                                          prefix = "chebiID."))
knitr::kable(head(rowData(sacurine.se)[, grep("chebiID", colnames(rowData(sacurine.se)))]))

Chemical annotation with a local database

Loading a local (example) MS database:

msdbDF <- read.table(system.file("extdata/local_ms_db.tsv", package = "phenomis"),
                     header = TRUE,
                     sep = "\t",
                     stringsAsFactors = FALSE)

Querying the local database

sacurine.se <- annotating(sacurine.se,
                          database.c = "local.ms",
                          param.ls = list(query.type = "mz",
                                          query.col = "mass_to_charge",
                                          ms.mode = "neg",
                                          mz.tol = 5,
                                          mz.tol.unit = "ppm",
                                          local.ms.db = msdbDF,
                                          prefix = "localMS."),
                          report.c = "none")
# display the variables matched in the local database (localMS.* columns only)
localMS.df <- rowData(sacurine.se)
knitr::kable(localMS.df[!is.na(localMS.df[, "localMS.accession"]),
                        grep("localMS.", colnames(localMS.df), fixed = TRUE)])

writing: Exporting the results

phenomis::writing(sacurine.se, dir.c = getwd())
phenomis::writing(sacurine.se, dir.c = 'figures/temp', overwrite.l = TRUE)

References



SciDoPhenIA/phenomis documentation built on June 9, 2022, 11:54 p.m.