In SciDoPhenIA/phenomis: Analysis of single and multiple phenomic data sets

knitr::opts_chunk$set(fig.width = 8,
                      fig.height = 8,
                      fig.path = 'figures/temp/')

Introduction

Context

Analysis of a phenomics (e.g. metabolomics) data set (i.e. samples times variables table of peak or bucket intensities generated by preprocessing tools such as XCMS) is aimed at mining the data (e.g. trends and outliers) and detecting features of predictive value (biomarker discovery). It comprises multiple steps including:

Post-processing (quality control, normalization and/or transformation of intensities)
Statistical analysis (univariate hypothesis testing, multivariate modeling, feature selection)
Annotation (physico-chemical properties, biological pathways)

The phenomis package addresses the two first steps, and can be combined with other packages for multivariate modeling, such as the ropls and biosigner Bioconductor packages, as described below.

Methods

Methods from the **phenomis**, **ropls**, and **biosigner** packages
for the analysis of metabolomics datasets; specific parameter values
used for the **sacurine** dataset described in the 'Hands-on' part below
are provided as examples.

Formats

3 tabular file format used for import/export

Input (i.e. preprocessed) data consists of a 'samples times variables' matrix of intensities (datMatrix numeric matrix), in addition to sample and variable metadata (sampleMetadata and variableMetadata data frames). Theses 3 tables can be conveniently imported to/exported from R as tabular files:

3 table format used as input/output from the data analysis
workflow.

Text and graphical outputs

Text and graphics can be handled with the phenomis methods by setting the two arguments:

report.c: if set to 'interactive' [default], messages are displayed interactively; if set to a character corresponding the a filename (with the '.txt' extension), messages are diverted to this file by using the sink function internally (the same file can be appended by successive methods); if set to 'none', messages are suppressed
figure.c: if set to 'interactive' [default], graphics are displayed interactively; if set to a character corresponding the a filename (with the '.pdf' extension), a pdf file with the plot is generated instead; if set to 'none', graphics are suppressed

Availability

The phenomis package can be installed from github with:

devtools::install_github("odisce/phenomis")

Hands-on

The sacurine cohort study

As an example, we will use the phenomis package to study the sacurine human cohort. The study is aimed at characterizing the physiological variations of the human urine metabolome with age, body mass index (BMI), and gender [\@thevenot_analysis_2015]. Urine samples from 184 volunteers were analyzed by reversed-phase (C18) ultrahigh performance liquid chromatography (UPLC) coupled to high-resolution mass spectrometry (LTQ-Orbitrap). Raw data are publicly available on the MetaboLights repository (MTBLS404).

This vignette describes the statistical analysis of the data set from the negative ionization mode (113 identified metabolites at MSI levels 1 or 2):

correcting: Correction of the signal drift by local regression (loess) modeling of the intensity trend in pool samples [\@dunn_procedures_2011]; Adjustment of oﬀset diﬀerences between the two analytical batches by using the average of the pool intensities in each batch [\@van_der_kloet_analytical_2009]
Variable quality control by discarding features with a coefficient of variation above 30% in pool samples
Normalization by the sample osmolality
transforming: log10 transformation
inspecting: Computing metrics to filter out outlier samples according to the Weighted Hotellings'T2 distance [\@tenenhaus_approche_1999], the Z-score of one of the intensity distribution deciles [\@alonso_astream:_2011], and the Z-score of the number of missing values [\@alonso_astream:_2011]. A 0.001 threshold for all p-values results in the HU_096 sample being discarded
hypotesting: Univariate hypothesis testing of significant variations with age, BMI, or between genders (Student's T test with Benjamini Hochberg correction)
PCA exploration of the data set; ropls Bioconductor package [\@thevenot_analysis_2015]
clustering: Hierarchical clustering
OPLS(-DA) modeling of age, BMI and gender; ropls Bioconductor package [\@thevenot_analysis_2015]
Selection of the features which significantly contributes to the discrimination between gender with PLS-DA, Random Forest, or Support Vector Machines classifiers; biosigner Bioconductor package [\@rinaudo_biosigner:_2016]

A Galaxy version of this analysis is available W4M00001 'Sacurine-statistics' on the Workflow4metabolomics.org online infrastructure [\@guitton_create_2017]

reading: Reading the data

The reading function reads the data sets and builds the ExpressionSet object. For additional information about ExpressionSet class, see the "An introduction to Biobase and ExpressionSets" documentation from the Biobase package.

sacurine.se <- reading(system.file("extdata/W4M00001_Sacurine-statistics", package = "phenomis"))

inspecting: Looking at the data

sacurine.se <- inspecting(sacurine.se)

Post-processing

correcting: Correcting signal drift and batch effect

sacurine.se <- correcting(sacurine.se,
                          reference.vc = "pool",
                          col_batch.c = "batch",
                          col_injectionOrder.c = "injectionOrder",
                          col_sampleType.c = "sampleType")

Variable filtering

using inspecting to compute the 'pool_CV' metric

sacurine.se <- inspecting(sacurine.se)

discarding features with 'pool_CV' > 0.3

sacurine.se <- sacurine.se[rowData(sacurine.se)[, "pool_CV"] <= 0.3, ]

discarding the 'pool' observations

sacurine.se <- sacurine.se[, colData(sacurine.se)[, "sampleType"] != "pool"]
print(sacurine.se)

Normalizing

assay(sacurine.se) <- sweep(assay(sacurine.se),
                            2,
                            colData(sacurine.se)[, "osmolality"],
                            "/")

transforming: Transforming the data intensities

sacurine.se <- transforming(sacurine.se, method.c = "log10")

Sample filtering

sacurine.se <- inspecting(sacurine.se)
sacurine.se <- sacurine.se[, colData(sacurine.se)[, "hotel_pval"] >= 0.001 &
                             colData(sacurine.se)[, "miss_pval"] >= 0.001 &
                             colData(sacurine.se)[, "deci_pval"] >= 0.001]

Final visual check of the data before performing the statistics

phenomis::inspecting(sacurine.se)

hypotesting: Univariate hypothesis testing

sacurine.se <- hypotesting(sacurine.se,
                           test.c = "ttest",
                           factor_names.vc = "gender",
                           adjust.c = "BH",
                           adjust_thresh.n = 0.05)

Unsupervised analysis

Principal component analysis

ropls Bioconductor package (already loaded as a dependance from phenomis) [\@thevenot_analysis_2015]

sacPca <- ropls::opls(sacurine.se, info.txt = NA)
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "gender"],
            typeVc = "x-score")
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "age"],
            typeVc = "x-score")
ropls::plot(sacPca,
            parAsColFcVn = colData(sacurine.se)[, "bmi"],
            typeVc = "x-score")

sacurine.se <- ropls::getEset(sacPca)

clustering: hierarchical clustering

sacurine.se <- clustering(sacurine.se, correl.c = "spearman",
                          clusters.vi = c(5, 3))

Supervised modeling

(O)PLS(-DA) modeling

With the ropls Bioconductor package [\@thevenot_analysis_2015]:

sacPlsda <- ropls::opls(sacurine.se, "gender")

sacurine.se <- ropls::getEset(sacPlsda)

Feature selection

With the biosigner Bioconductor package [\@rinaudo_biosigner:_2016]:

sacurine.biosign <- biosigner::biosign(sacurine.se, "gender", seedI = 123)
sacurine.se <- biosigner::getEset(sacurine.biosign)

annotating: MS annotation

This method is based on the biodb and biodbChebi R packages available on github.

Viewing the parameters from the annotating method and their default values:

phenomis::annotating_parameters()

Chemical annotation with ChEBI

by mz values

sacurine.se <- annotating(sacurine.se,
                          database.c = "chebi",
                          param.ls = list(query.type = "mz",
                                          query.col = "mass_to_charge",
                                          ms.mode = "neg",
                                          mz.tol = 10,
                                          mz.tol.unit = "ppm",
                                          max.results = 3,
                                          prefix = "chebiMZ."),
                          report.c = "none")
knitr::kable(head(rowData(sacurine.se)[, grep("chebiMZ", colnames(rowData(sacurine.se)))]))

by ChEBI identifiers

sacurine.se <- annotating(sacurine.se, database.c = "chebi",
                          param.ls = list(query.type = "chebi.id",
                                          query.col = "database_identifier",
                                          prefix = "chebiID."))
knitr::kable(head(rowData(sacurine.se)[, grep("chebiID", colnames(rowData(sacurine.se)))]))

Chemical annotation with a local database

Loading a local (example) MS database:

msdbDF <- read.table(system.file("extdata/local_ms_db.tsv", package = "phenomis"),
                     header = TRUE,
                     sep = "\t",
                     stringsAsFactors = FALSE)

Querying the local database

sacurine.se <- annotating(sacurine.se,
                          database.c = "local.ms",
                          param.ls = list(query.type = "mz",
                                          query.col = "mass_to_charge",
                                          ms.mode = "neg",
                                          mz.tol = 5,
                                          mz.tol.unit = "ppm",
                                          local.ms.db = msdbDF,
                                          prefix = "localMS."),
                          report.c = "none")
knitr::kable(rowData(sacurine.se)[!is.na(rowData(sacurine.se)[, "localMS.accession"]), grep("localMS.", colnames(rowData(sacurine.se)), fixed = TRUE)])

writing: Exporting the results

phenomis::writing(sacurine.se, dir.c = getwd())

phenomis::writing(sacurine.se, dir.c = 'figures/temp', overwrite.l = TRUE)

References

SciDoPhenIA/phenomis documentation built on June 9, 2022, 11:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SciDoPhenIA/phenomis
Analysis of single and multiple phenomic data sets

In SciDoPhenIA/phenomis: Analysis of single and multiple phenomic data sets

Introduction

Context

Methods

Formats

3 tabular file format used for import/export

Text and graphical outputs

Availability

Hands-on

The sacurine cohort study

reading: Reading the data

inspecting: Looking at the data

Post-processing

correcting: Correcting signal drift and batch effect

Variable filtering

Normalizing

transforming: Transforming the data intensities

Sample filtering

hypotesting: Univariate hypothesis testing

Unsupervised analysis

Principal component analysis

clustering: hierarchical clustering

Supervised modeling

(O)PLS(-DA) modeling

Feature selection

annotating: MS annotation

Chemical annotation with ChEBI

by mz values

by ChEBI identifiers

Chemical annotation with a local database

writing: Exporting the results

References

R Package Documentation

Browse R Packages

We want your feedback!

SciDoPhenIA/phenomis Analysis of single and multiple phenomic data sets

In SciDoPhenIA/phenomis: Analysis of single and multiple phenomic data sets

Introduction

Context

Methods

Formats

3 tabular file format used for import/export

Text and graphical outputs

Availability

Hands-on

The sacurine cohort study

reading: Reading the data

inspecting: Looking at the data

Post-processing

correcting: Correcting signal drift and batch effect

Variable filtering

Normalizing

transforming: Transforming the data intensities

Sample filtering

hypotesting: Univariate hypothesis testing

Unsupervised analysis

Principal component analysis

clustering: hierarchical clustering

Supervised modeling

(O)PLS(-DA) modeling

Feature selection

annotating: MS annotation

Chemical annotation with ChEBI

by mz values

by ChEBI identifiers

Chemical annotation with a local database

writing: Exporting the results

References

R Package Documentation

Browse R Packages

We want your feedback!

SciDoPhenIA/phenomis
Analysis of single and multiple phenomic data sets