knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=7, fig.height=6
)
library(NOREVA)

## Introduction

NOREVA is built to provide online services for (1) normalizing time-course/multi-class metabolomic data using 168 methods/strategies, (2) evaluating normalization performance from multiple perspectives, and (3) enabling systematic comparison among all methods/strategies based on a comprehensive performance ranking. In particular, five well-established criteria, each with a distinct underlying theory, are integrated to give a more comprehensive evaluation than any single criterion. Besides offering the largest and most diverse set of normalization methods/strategies among available tools, NOREVA provides the unique feature of allowing quality control-based correction to be followed sequentially by data normalization.

For function descriptions and analyses of sample datasets, you can also run the `??NOREVA` command in R.

## Installation

# Download the source package NOREVA-0.1.0.tar.gz and install it from the local file
install.packages("NOREVA-0.1.0.tar.gz", repos = NULL, type = "source")

# Or install the development version from GitHub:
install.packages("devtools")
devtools::install_github("idrblab/NOREVA")

# NOREVA depends on several packages, which can be installed with the commands below:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("Biobase", "pcaMethods", "multtest", "limma", "impute",
                       "statTarget", "ProteoMM", "timecourse", "ropls", "vsn", "affy"))
devtools::install_github("metabolomicstats/NormalizeMets")
devtools::install_github("fawda123/ggord")
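
Before installing, it can help to see which of the dependencies named above are already present. The following is a minimal sketch using base R only; the helper name `is_installed` is hypothetical and not part of NOREVA.

```r
# Hypothetical helper (not part of NOREVA): TRUE for each package that can be loaded.
# Note that requireNamespace() loads the namespace of every package it finds.
is_installed <- function(pkgs) {
  vapply(pkgs, function(p) requireNamespace(p, quietly = TRUE), logical(1))
}

# Packages named in the installation commands above
deps <- c("Biobase", "pcaMethods", "multtest", "limma", "impute", "statTarget",
          "ProteoMM", "timecourse", "ropls", "vsn", "affy", "NormalizeMets", "ggord")

# Names of the dependencies that still need to be installed
missing_deps <- deps[!is_installed(deps)]
print(missing_deps)
```

An empty result suggests that all listed dependencies are available in the current library paths.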

## Usage

library(NOREVA)

1. Conduct normalization and assess performance of multi-class (N>1) metabolomic study with dataset without QCSs and ISs.

```r
allranks_non <- normulticlassnoall(fileName, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileName`
The dataset to process: a multi-class (N>1) metabolomic study without QCSs and ISs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

2. Conduct normalization and assess performance of multi-class (N>1) metabolomic study with dataset with quality control samples (QCSs).

```r
allranks_qcs <- normulticlassqcall(fileNameQ, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileNameQ`
The dataset to process: a multi-class (N>1) metabolomic study with quality control samples (QCSs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

3. Conduct normalization and assess performance of multi-class (N>1) metabolomic study with dataset with internal standards (ISs).

```r
allranks_is <- normulticlassisall(fileNameI, IS, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileNameI`
The dataset to process: a multi-class (N>1) metabolomic study with internal standards (ISs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
Input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.
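
The `IS` argument is given as a comma-separated list of column numbers. As an illustration of what such a value denotes, the sketch below turns it into integer column indices in base R. The function name `parse_is_columns` is hypothetical, and this is not necessarily how NOREVA handles the argument internally.

```r
# Hypothetical helper (not part of NOREVA): convert "3,4,5" into c(3L, 4L, 5L).
parse_is_columns <- function(IS) {
  parts <- trimws(strsplit(IS, ",", fixed = TRUE)[[1]])
  cols <- suppressWarnings(as.integer(parts))
  if (length(cols) == 0 || anyNA(cols) || any(cols < 1L)) {
    stop("IS must be a comma-separated list of positive column numbers")
  }
  cols
}

is_cols <- parse_is_columns("3,4,5")
print(is_cols)
```

A single internal standard is written without commas, for example `parse_is_columns("4")`.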

4. Conduct normalization and assess performance of time-course metabolomic study with dataset without QCSs and ISs.

```r
allranks_tnon <- nortimecoursenoall(fileNameC, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileNameC`
The dataset to process: a time-course metabolomic study without QCSs and ISs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

5. Conduct normalization and assess performance of time-course metabolomic study with dataset with QCSs.

```r
allranks_tqcs <- nortimecourseqcall(fileNameS, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileNameS`
The dataset to process: a time-course metabolomic study with QCSs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

6. Conduct normalization and assess performance of time-course metabolomic study with dataset with internal standards (ISs).

```r
allranks_tis <- nortimecourseisall(fileNameT, IS, assum_a="Y", assum_b="Y", assum_c="Y")
```

`fileNameT`
The dataset to process: a time-course metabolomic study with internal standards (ISs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
Input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`assum_a`
Input a letter "Y" or "N".<br>
Assumption: all metabolites are equally important. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_b`
Input a letter "Y" or "N".<br>
Assumption: the level of metabolite abundance is constant among all samples. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

`assum_c`
Input a letter "Y" or "N".<br>
Assumption: the intensities of the majority of the metabolites are not changed under the studied conditions. Input "Y" if this assumption holds for the studied dataset, or "N" if it does not.

7. Select normalization and assess performance of multi-class (N>1) metabolomic study with dataset without QCSs and ISs.

```r
seleranks_non <- normulticlassnopart(fileName, selectFile)
```

`fileName`
The dataset to process: a multi-class (N>1) metabolomic study without QCSs and ISs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

8. Select normalization and assess performance of multi-class (N>1) metabolomic study with dataset with Quality Control Samples (QCSs).

```r
seleranks_qcs <- normulticlassqcpart(fileNameQ, selectFile)
```

`fileNameQ`
The dataset to process: a multi-class (N>1) metabolomic study with Quality Control Samples (QCSs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

9. Select normalization and assess performance of multi-class (N>1) metabolomic study with dataset with internal standards (ISs).

```r
seleallranks_is <- normulticlassispart(fileNameI, IS, selectFile)
```

`fileNameI`
The dataset to process: a multi-class (N>1) metabolomic study with internal standards (ISs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
Input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

10. Select normalization and assess performance of time-course metabolomic study with dataset without QCSs and ISs.

```r
seleranks_non <- nortimecoursenopart(fileNameC, selectFile)
```

`fileNameC`
The dataset to process: a time-course metabolomic study without QCSs and ISs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

11. Select normalization and assess performance of time-course metabolomic study with dataset with QCSs.

```r
seleranks_qcs <- nortimecourseqcpart(fileNameS, selectFile)
```

`fileNameS`
The dataset to process: a time-course metabolomic study with QCSs. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

12. Select normalization and assess performance of time-course metabolomic study with dataset with internal standards (ISs).

```r
seleallranks_tis <- nortimecourseispart(fileNameT, IS, selectFile)
```

`fileNameT`
The dataset to process: a time-course metabolomic study with internal standards (ISs). Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
Input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`selectFile`
Input the names of your preferred strategies. A sample of this input is available on GitHub at “idrblab/NOREVA/data/selectdata.rda”.

13. Multi-class (N>1) metabolomic study with dataset after normalization.

```r
nordata <- normulticlassmatrix(datatype, fileName, IS, impt=NULL, trsf=NULL, nmal=NULL, nmal2=NULL, nmals=NULL)
```

`datatype`
Input the number code of the data type.<br>
If set 1, the dataset is from a multi-class (N>1) metabolomic study without QCSs and ISs.<br>
If set 2, the dataset is from a multi-class (N>1) metabolomic study with quality control samples (QCSs).<br>
If set 3, the dataset is from a multi-class (N>1) metabolomic study with internal standards (ISs).<br>

`fileName`
The dataset to process: a multi-class (N>1) metabolomic study of the chosen 'datatype'. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
If you select 'datatype = 3', input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`impt`
Input the number code of the imputation method.<br>
If set 1, column mean imputation.<br>
If set 2, column median imputation.<br>
If set 3, half of the minimum positive value.<br>
If set 4, KNN imputation.

`trsf`
Input the number code of the transformation method.<br>
If set 1, cube root transformation.<br>
If set 2, log transformation.<br>
If set 3, no transformation.
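
To make the `impt` and `trsf` options concrete, here is a minimal base-R sketch of the three simple imputation rules and the two transformations on a made-up matrix. The function name `impute_columns`, the toy values, and the use of the natural logarithm are assumptions for illustration; NOREVA's own implementation may differ, and KNN imputation is omitted.

```r
# Toy intensity matrix (made-up values): samples in rows, metabolites in columns.
x <- matrix(c(10, NA, 30, 4, 8, NA), nrow = 3,
            dimnames = list(paste0("S", 1:3), c("M1", "M2")))

# Hypothetical helper: fill missing values column by column.
impute_columns <- function(x, method = c("mean", "median", "halfmin")) {
  method <- match.arg(method)
  apply(x, 2, function(col) {
    fill <- switch(method,
      mean    = mean(col, na.rm = TRUE),               # column mean
      median  = median(col, na.rm = TRUE),             # column median
      halfmin = min(col[!is.na(col) & col > 0]) / 2    # half of the minimum positive value
    )
    col[is.na(col)] <- fill
    col
  })
}

x_mean    <- impute_columns(x, "mean")
x_halfmin <- impute_columns(x, "halfmin")

# Transformations applied element-wise to the imputed matrix
x_cube <- x_mean^(1/3)   # cube root transformation
x_log  <- log(x_mean)    # log transformation (natural log assumed here)
```

With these toy values, the missing entry of `M1` becomes 20 under mean imputation and 5 under the half-minimum rule.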

`nmal`
Input the name of the normalization method.<br>
To select the proper normalization method, please refer to the detailed file on GitHub at “idrblab/NOREVA/data/allmethods.rda”.

`nmal2`
Input the name of the normalization method.<br>
To select the proper normalization method, please refer to the detailed file on GitHub at “idrblab/NOREVA/data/allmethods.rda”.

`nmals`
Input the number code of the IS-based normalization method.<br>
If set 1, Single Internal Standard (SIS).<br>
Normalizes by subtracting the log metabolite abundance of a single IS from the log abundances of the metabolites in each sample<sup>1</sup>.<br>
If set 2, Normalization using Optimal Selection of Multiple ISs (NOMIS).<br>
Finds the optimal normalization factor for removing systematic variation using variability data from multiple IS compounds<sup>2</sup>.<br>
If set 3, Cross-contribution Compensating Multi-IS Normalization (CCMN).<br>
Monitors systematic errors from randomized and designed experiments using multiple internal standards<sup>3</sup>.<br>
If set 4, Remove Unwanted Variation-Random (RUVrand).<br>
Based on a linear mixed effects model that uses IS metabolites to obtain normalized data in metabolomics experiments<sup>4</sup>.
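
The SIS description above (subtracting the log abundance of one IS from the log abundances of the other metabolites in each sample) can be sketched in a few lines of base R. The function name `sis_normalize` and the toy values are hypothetical, and this is an illustration of the idea rather than NOREVA's implementation.

```r
# Toy intensity matrix (made-up values): two samples, two metabolites, one internal standard.
x <- matrix(c(100, 200, 50, 80, 10, 20), nrow = 2,
            dimnames = list(c("S1", "S2"), c("M1", "M2", "IS1")))

# Hypothetical helper: per sample, subtract log(IS) from the log of every other metabolite.
sis_normalize <- function(x, is_col) {
  lx <- log(x)
  sweep(lx[, -is_col, drop = FALSE], 1, lx[, is_col], "-")
}

x_sis <- sis_normalize(x, is_col = 3)
print(x_sis)
```

On the log scale this is equivalent to dividing each metabolite by the IS intensity of the same sample, so `M1` in sample `S1` becomes `log(100 / 10)`.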

14. Time-course metabolomic study with dataset after normalization.

```r
nordata <- nortimecoursematrix(datatype, fileNameC, IS, impt=NULL, trsf=NULL, nmal=NULL, nmal2=NULL, nmals=NULL)
```

`datatype`
Input the number code of the data type.<br>
If set 1, the dataset is from a time-course metabolomic study without QCSs and ISs.<br>
If set 2, the dataset is from a time-course metabolomic study with quality control samples (QCSs).<br>
If set 3, the dataset is from a time-course metabolomic study with internal standards (ISs).

`fileNameC`
The dataset to process: a time-course metabolomic study of the chosen 'datatype'. Sample data of this type are available on GitHub under “idrblab/NOREVA/data”.

`IS`
If you select 'datatype = 3', input the column numbers of the internal standards. For example, setting IS to 2,6,9,n indicates that the metabolites in the 2nd, 6th, 9th, and nth columns of your input dataset (Input-Dataset.csv) should be treated as the ISs or quality control metabolites. If there is only one IS, list its column number. If there are multiple ISs, list the column numbers of all ISs, separated by commas (,).

`impt`
Input the number code of the imputation method.<br>
If set 1, column mean imputation.<br>
If set 2, column median imputation.<br>
If set 3, half of the minimum positive value.<br>
If set 4, KNN imputation.

`trsf`
Input the number code of the transformation method.<br>
If set 1, cube root transformation.<br>
If set 2, log transformation.<br>
If set 3, no transformation.

`nmal`
Input the name of the normalization method.<br>
To select the proper normalization method, please refer to the detailed file on GitHub at “idrblab/NOREVA/data/allmethods.rda”.

`nmal2`
Input the name of the normalization method.<br>
To select the proper normalization method, please refer to the detailed file on GitHub at “idrblab/NOREVA/data/allmethods.rda”.

`nmals`
Input the number code of the IS-based normalization method.<br>
If set 1, Single Internal Standard (SIS).<br>
Normalizes by subtracting the log metabolite abundance of a single IS from the log abundances of the metabolites in each sample<sup>1</sup>.<br>
If set 2, Normalization using Optimal Selection of Multiple ISs (NOMIS).<br>
Finds the optimal normalization factor for removing systematic variation using variability data from multiple IS compounds<sup>2</sup>.<br>
If set 3, Cross-contribution Compensating Multi-IS Normalization (CCMN).<br>
Monitors systematic errors from randomized and designed experiments using multiple internal standards<sup>3</sup>.<br>
If set 4, Remove Unwanted Variation-Random (RUVrand).<br>
Based on a linear mixed effects model that uses IS metabolites to obtain normalized data in metabolomics experiments<sup>4</sup>.

15. Draw circular barplot.

```r
norvisualization(data, outputfile="NOREVA-Ranking-Top.%d.workflows.%s", cutoff="100", outputtype="pdf", maxValue="40", colorSet = c("#EA4335", "#4285F4", "#FBBC05", "#800080"), totalAngle = "340", bgColor = "#FFFFFF", fontColor="#000000")
```

`data`
The output file of one of the functions above, such as "normulticlassnoall", "normulticlassqcall", or "nortimecoursenoall".

`outputfile`
Format string that is combined with the cutoff value and the output type to generate the output file name.

`cutoff`
Integer used to filter the results; the default is 100.

`outputtype`
String indicating the output type; pdf and eps are supported, and the default is pdf.

`maxValue`
Double-precision floating-point number giving the value represented by the maximum length of a rectangle; the default is 40.

`colorSet`
Group of hexadecimal color strings setting the colors of the four layers of the graphic from the inside to the outside; the defaults are red (#EA4335), blue (#4285F4), yellow (#FBBC05), and purple (#800080).

`totalAngle`
Double-precision floating-point number giving the total angle of rotation of the drawing, in degrees; the default is 340.

`bgColor`
Hexadecimal color string giving the background color of the graphic; the default is white (#FFFFFF).

`fontColor`
Hexadecimal color string giving the font color; the default is black (#000000).
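
The default `outputfile` value contains two `sprintf`-style placeholders, `%d` and `%s`. Assuming they are filled with the cutoff value and the output type, in that order (an assumption based on the parameter description, not on the package source), the resulting file name can be previewed with base R:

```r
# Default values taken from the function signature above
outputfile <- "NOREVA-Ranking-Top.%d.workflows.%s"
cutoff     <- "100"
outputtype <- "pdf"

# %d requires a number, so the character cutoff is converted first
file_name <- sprintf(outputfile, as.integer(cutoff), outputtype)
print(file_name)
```

Under that assumption, the default settings would produce a file named `NOREVA-Ranking-Top.100.workflows.pdf`.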

## Examples

```r
# Step 1: Assess all normalization methods for time-course/multi-class metabolomic data.

# Note: the input file should be in Comma-Separated Values (CSV) format and provide the
# intensity data of the metabolites. All columns should be numeric except the first columns,
# which contain the sample names and the class label (control or case) or time-course label
# of the studied samples (the file formats for the different data types are illustrated by the
# files on GitHub under “idrblab/NOREVA/data”). The intensity data should be arranged with
# samples in rows and metabolites in columns. Missing values (NA) of metabolite intensity are allowed.

multi_non_data <- read.csv(file = "XXX.csv", header = TRUE, stringsAsFactors = FALSE)

# 1.1 Multi-class (N>1) metabolomic study with dataset without QCSs and ISs
allranks_non <- normulticlassnoall(fileName = multi_non_data, assum_a="Y", assum_b="Y", assum_c="Y")

# 1.2 Multi-class (N>1) metabolomic study with dataset with Quality Control Samples (QCSs)
allranks_qcs <- normulticlassqcall(fileNameQ = multi_qcs_data, assum_a="Y", assum_b="Y", assum_c="Y")

# 1.3 Multi-class (N>1) metabolomic study with dataset with Internal Standards (ISs)
allranks_is <- normulticlassisall(fileNameI = multi_is_data, IS = "3,4,5", assum_a="Y", assum_b="Y", assum_c="Y")

# 1.4 Time-course metabolomic study with dataset without QCSs and ISs
allranks_tnon <- nortimecoursenoall(fileNameC = timec_non_data, assum_a="Y", assum_b="Y", assum_c="Y")

# 1.5 Time-course metabolomic study with dataset with QCSs
allranks_tqcs <- nortimecourseqcall(fileNameS = timec_qcs_data, assum_a="Y", assum_b="Y", assum_c="Y")

# 1.6 Time-course metabolomic study with dataset with Internal Standards (ISs)
allranks_tis <- nortimecourseisall(fileNameT = timec_is_data, IS = "4,5", assum_a="Y", assum_b="Y", assum_c="Y")
```
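
The input layout described in the Step 1 note (samples in rows, metabolites in columns, leading columns for sample name and label, NA allowed) can be sketched with a small made-up table. The column names `sample` and `label`, the metabolite names, and all values are hypothetical; the sample files under “idrblab/NOREVA/data” remain the authoritative reference for the exact format.

```r
# Made-up toy dataset: 4 samples, 2 metabolites, one missing intensity
toy <- data.frame(
  sample = paste0("S", 1:4),
  label  = c("control", "control", "case", "case"),
  M1     = c(10.2, 11.5, NA, 20.1),
  M2     = c(5.0, 4.8, 9.7, 10.3),
  stringsAsFactors = FALSE
)

# Write it as CSV and read it back the same way as in Step 1
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
toy_data <- read.csv(file = csv_path, header = TRUE, stringsAsFactors = FALSE)

str(toy_data)
```

Reading the file back shows the intensity columns as numeric and the missing value preserved as `NA`.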

```r
# Step 2: Draw a circular barplot illustrating the performance ranking of all strategies

norvisualization(data = "data.csv", cutoff = "100")
```

```r
# Step 3: Normalize datasets using the normalization strategies chosen from the assessment results

# 3.1 Normalization with datasets of multi-class (N>1) metabolomic study
nordata <- normulticlassmatrix(datatype = 1, fileName = multi_non_data, impt="1", trsf="1", nmal="1", nmal2="1")

# 3.2 Normalization with datasets of time-course metabolomic study
nordata <- nortimecoursematrix(datatype = "1", fileNameC = timec_non_data, impt="1", trsf="1", nmal="1", nmal2="1")

# Note: please select the appropriate number codes for the imputation, transformation, and normalization methods (see the details above).
```

```r
# Step 4: Users can also use NOREVA to assess only the normalization methods/strategies they prefer

# 4.1 Multi-class (N>1) metabolomic study with dataset without QCSs and ISs
seleranks_non <- normulticlassnopart(fileName = multi_non_data, selectFile = selectdata)

# 4.2 Multi-class (N>1) metabolomic study with dataset with Quality Control Samples (QCSs)
seleranks_qcs <- normulticlassqcpart(fileNameQ = multi_qcs_data, selectFile = selectdata)

# 4.3 Multi-class (N>1) metabolomic study with dataset with Internal Standards (ISs)
seleallranks_is <- normulticlassispart(fileNameI = multi_is_data, IS = "3,4,5", selectFile = selectdataS)

# 4.4 Time-course metabolomic study with dataset without QCSs and ISs
seleranks_non <- nortimecoursenopart(fileNameC = timec_non_data, selectFile = selectdata)

# 4.5 Time-course metabolomic study with dataset with QCSs
seleranks_qcs <- nortimecourseqcpart(fileNameS = timec_qcs_data, selectFile = selectdata)

# 4.6 Time-course metabolomic study with dataset with Internal Standards (ISs)
seleallranks_tis <- nortimecourseispart(fileNameT = timec_is_data, IS = "4,5", selectFile = selectdataS)

# Note: please select the appropriate number codes for the imputation, transformation, and normalization methods (see the details above).
```

```r
load("../data/selectdata.rda")
head(selectdata)
```
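
When inspecting an `.rda` file such as `selectdata.rda` or `allmethods.rda`, it can be convenient to load it into a separate environment so that objects already in the workspace are not overwritten. The sketch below demonstrates the pattern on a temporary file holding a made-up object; `selectdata_demo` and its contents are hypothetical and do not reflect the real structure of `selectdata`.

```r
# Stand-in for an .rda file: a temporary file holding a made-up object
rda_path <- tempfile(fileext = ".rda")
selectdata_demo <- data.frame(strategy = c("A", "B"), stringsAsFactors = FALSE)
save(selectdata_demo, file = rda_path)

# Load into a separate environment; load() returns the names of the restored objects
env <- new.env()
loaded <- load(rda_path, envir = env)
print(loaded)

# Inspect the structure of the first restored object
str(env[[loaded[1]]])
```

For the real file, replace `rda_path` with the path to the downloaded `.rda` file.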

Should you have any questions, please contact Jianbo Fu at fujianbo@zju.edu.cn

## Reference

  1. Gullberg, J., Jonsson, P., et al. Design of experiments: an efficient strategy to identify factors influencing extraction and derivatization of Arabidopsis thaliana samples in metabolomic studies with gas chromatography/mass spectrometry. Anal Biochem 331, 283-295 (2004).

  2. Sysi-Aho, M., Katajamaa, M., et al. Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics 8, 93 (2007).

  3. Redestig, H., Fukushima, A., et al. Compensation for systematic cross-contribution improves normalization of mass spectrometry based metabolomics data. Anal Chem 81, 7974-7980 (2009).

  4. De Livera, A.M., Dias, D.A., et al. Normalizing and integrating metabolomics data. Anal Chem 84, 10768-10776 (2012).



idrblab/NOREVA2020 documentation built on Sept. 14, 2020, 12:04 a.m.