In JianboFu0406/EVALFQ111: R Package for Evaluating Label-Free Proteome Quantification

knitr::opts_chunk$set(echo = TRUE)
library('EVALFQ')
set.seed(135)

## Introduction

EVALFQ provides an open-access online service that enables (1) label-free proteome quantification (LFQ) based on three quantification measurements (SWATH-MS, peak intensity and spectral counting), (2) evaluation of LFQ performance from multiple perspectives and (3) identification of the optimal LFQ workflows based on a comprehensive performance ranking. EVALFQ mainly includes two functions, lfq_access and lfq_spiked, which automatically detect the diverse data formats generated by quantification software and provide the most complete set of processing methods among available tools, including methods for transformation, pretreatment (centering, scaling and normalization) and missing value imputation.

This tutorial walks readers through an example analysis (see 'Examples' below).

## Installation

# Download the source package EVALFQ_0.1.0.tar.gz and install it from the local file.
install.packages("EVALFQ_0.1.0.tar.gz", repos = NULL, type = "source")

# Alternatively EVALFQ can be installed from GitHub:
# install.packages("devtools")
devtools::install_github("idrblab/EVALFQ")
library(EVALFQ)

# Before installing "EVALFQ", please make sure the following dependencies have been installed:
library(Biobase)##BiocManager::install("Biobase")
library(BiocGenerics)##BiocManager::install("BiocGenerics")
library(ROTS)##BiocManager::install("ROTS")
library(limma)##BiocManager::install("limma")
library(ProteoMM)##BiocManager::install("ProteoMM")
library(impute)##BiocManager::install("impute")
library(pcaMethods)##BiocManager::install("pcaMethods")
library(vsn)##BiocManager::install("vsn")
library(affy)##BiocManager::install("affy")
library(metabolomics)##install_github("cran/metabolomics")
library(devtools)
install_github("idrblab/EVALFQ1")
library(EVALFQ)

## Usage

library(EVALFQ)

#### 1. Conduct LFQ and assess performance of all possible LFQ workflows.

```r
allranks <- lfqevalueall(data_q,
                         assum_a = "Y",
                         assum_b = "Y",
                         assum_c = "Y",
                         Ca = "1",
                         Cb = "1",
                         Cc = "1",
                         Cd = "1")
```

`data_q` 
The input data should be numeric, except for the first and second columns, which contain the names and the labels (control or case) of the studied samples, respectively. Intensities should be arranged with samples in rows and proteins/peptides in columns. Missing values (NA) of protein intensity are allowed.

`assum_a` 
All proteins are assumed to be equally important.<br>
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

`assum_b` 
The level of protein abundance is assumed to be constant among all samples.<br>
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

`assum_c` 
The intensities of the vast majority of the proteins are assumed to be unchanged under the studied conditions.<br>
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

`Ca` 
Criterion (a): precision of LFQ based on the proteomes among replicates<sup>1</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (a).<br>
If set 0, the user excludes Criterion (a) from performance assessment.<br>
The default setting of this value is “1”.

`Cb` 
Criterion (b): classification ability of LFQ between distinct sample groups<sup>2</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (b).<br>
If set 0, the user excludes Criterion (b) from performance assessment.<br>
The default setting of this value is “1”.

`Cc` 
Criterion (c): differential expression analysis by reproducibility-optimization<sup>3</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (c).<br>
If set 0, the user excludes Criterion (c) from performance assessment.<br>
The default setting of this value is “1”.

`Cd` 
Criterion (d): reproducibility of the identified protein markers among different datasets<sup>4</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (d).<br>
If set 0, the user excludes Criterion (d) from performance assessment.<br>
The default setting of this value is “1”.
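
The `data_q` layout described above can be sketched with a small data frame. This is only an illustration: the sample names, protein IDs and intensities are made up, and `check_lfq_input` is a hypothetical helper that is not part of EVALFQ. It simply tests the three documented properties (sample names first, control/case labels second, numeric intensities afterwards).

```r
# Toy input in the layout described for `data_q` (all values are made up).
data_q <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  label  = c("control", "control", "case", "case"),
  P001   = c(1520.4, 1498.1, 2210.7, NA),   # NA marks a missing intensity
  P002   = c(880.2, NA, 910.5, 905.3),
  P003   = c(12.1, 11.8, 35.6, 33.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of EVALFQ): checks the documented layout.
check_lfq_input <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  intensities <- df[, -(1:2), drop = FALSE]
  c(
    unique_sample_names = !anyDuplicated(df[[1]]),
    two_groups          = length(unique(df[[2]])) == 2,
    numeric_intensities = all(vapply(intensities, is.numeric, logical(1)))
  )
}

checks <- check_lfq_input(data_q)
print(checks)
```

Running a check like this before calling the package functions may help catch a character column hidden among the intensities, which is a common result of importing spreadsheets.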

#### 2. Conduct LFQ and assess performance by collectively considering the spiked proteins.

```r
allranks <- lfqspikedall(data_s,
                         spiked,
                         assum_a = "Y",
                         assum_b = "Y",
                         assum_c = "Y",
                         Ca = "1",
                         Cb = "1",
                         Cc = "1",
                         Cd = "1",
                         Ce = "1")
```

data_s The input data should be numeric, except for the first and second columns, which contain the names and the labels (control or case) of the studied samples, respectively. Intensities should be arranged with samples in rows and proteins/peptides in columns. Missing values (NA) of protein intensity are allowed.

spiked This file provides the concentrations of known proteins (such as spiked proteins). It is required if the user wants to conduct the assessment using criterion (e). It should contain the class of each sample and the Sample ID. Sample IDs should be unique and can be defined freely by EVALFQ users; the class of samples refers to the group each Sample ID belongs to. The IDs of the spiked proteins should be consistent between “data_s” and “spiked”. Detailed information is given in the online “Example”.

assum_a All proteins are assumed to be equally important.
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

assum_b The level of protein abundance is assumed to be constant among all samples.
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

assum_c The intensities of the vast majority of the proteins are assumed to be unchanged under the studied conditions.
Enter “Y” if this assumption holds for the studied dataset, or “N” if it does not.

Ca Criterion (a): precision of LFQ based on the proteomes among replicates¹.
If set 1, the user chooses to assess LFQ workflows using Criterion (a).
If set 0, the user excludes Criterion (a) from performance assessment.
The default setting of this value is “1”.

Cb Criterion (b): classification ability of LFQ between distinct sample groups².
If set 1, the user chooses to assess LFQ workflows using Criterion (b).
If set 0, the user excludes Criterion (b) from performance assessment.
The default setting of this value is “1”.

Cc Criterion (c): differential expression analysis by reproducibility-optimization³.
If set 1, the user chooses to assess LFQ workflows using Criterion (c).
If set 0, the user excludes Criterion (c) from performance assessment.
The default setting of this value is “1”.

Cd Criterion (d): reproducibility of the identified protein markers among different datasets⁴.
If set 1, the user chooses to assess LFQ workflows using Criterion (d).
If set 0, the user excludes Criterion (d) from performance assessment.
The default setting of this value is “1”.

Ce Criterion (e): accuracy of LFQ based on spiked and background proteins⁵.
If set 1, the user chooses to assess LFQ workflows using Criterion (e).
If set 0, the user excludes Criterion (e) from performance assessment.
The default setting of this value is “1”.
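
The requirement that spiked protein IDs be consistent between `data_s` and `spiked` can be checked before running the assessment. The sketch below rests on assumptions: the exact layout of the `spiked` file is not specified here, so the spiked IDs are passed as a plain character vector, the protein IDs in `data_s` are taken to be the column names from the third column on, and `missing_spiked_ids` is a hypothetical helper, not an EVALFQ function.

```r
# Toy intensity table: columns 3 and later are protein IDs (assumed layout).
data_s <- data.frame(
  sample = c("S1", "S2"),
  label  = c("control", "case"),
  P001   = c(10.2, 20.1),
  P002   = c(5.5, 5.7),
  UPS1   = c(1.0, 4.1),
  stringsAsFactors = FALSE
)

# Made-up IDs standing in for the protein IDs listed in the spiked file.
spiked_ids <- c("UPS1", "UPS2")

# Hypothetical helper: spiked IDs that have no matching column in `data_s`.
missing_spiked_ids <- function(data_s, spiked_ids) {
  setdiff(spiked_ids, colnames(data_s)[-(1:2)])
}

missing_spiked_ids(data_s, spiked_ids)
```

An empty result would mean every spiked ID has a matching intensity column; here the made-up ID `UPS2` is reported because the toy table lacks it.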

#### 3. Draw heatmap and save as EVALFQ-OUTPUT.Figure-Top.XXX.workflows.pdf.

```r
lfqvisualize(object, top = 100)
```

`object` 
The object returned by `lfq_access` or `lfq_spiked`.

`top`
The number of top-ranked workflows to show in the heatmap.<br>
The default value is 100.<br>

#### 4. Conduct LFQ and assess performance of one specific LFQ workflow.

```r
res <- lfqevalupart(data_q,
                    tra = NULL,
                    cen = NULL,
                    sca = NULL,
                    nor = NULL,
                    imp = NULL,
                    Ca = "1",
                    Cb = "1",
                    Cc = "1",
                    Cd = "1")
```

data_q Same as described for 'lfq_access' above.

tra All transformation methods are as follows:
"1": Box–Cox Transformation (BOX).
Making asymmetric data fulfill the normality assumption in a regression model by converting the protein abundances into a more symmetric distribution⁶.
"2": Log Transformation (LOG).
Converting the distribution of ratios of abundance values of proteins into a more symmetric (almost normal distribution) and minimizing the effect of proteins with extreme abundance⁷.
"3": No transformation method applied (NON).

cen All centering methods are as follows:
"1": Mean-Centering (MEC).
Converting all the intensities to fluctuations around zero instead of around the mean of the protein intensities; assuming all proteins are equally important ⁸.
"2": Median-Centering (MDC).
Converting all the intensities to fluctuations around zero instead of around the median of the protein intensities; assuming all proteins are equally important ⁸.
"3": No centering method applied (NON).

sca All scaling methods are as follows:
"1": No scaling method applied (NON).
"2": Auto Scaling (ATO).
Adjusting each protein abundance for systematic variance using the standard deviation of each protein of all samples as scaling factor ⁹; assuming all proteins are equally important ⁸.
"3": Pareto Scaling (PAR).
Scaling each protein abundance for systematic variance using the square root of the standard deviation of each protein of all samples as scaling factor ¹⁰; assuming all proteins are equally important ⁸.
"4": Vast Scaling (VAS).
Adjusting each protein abundance for systematic variance using the coefficient of variation of each protein of all samples as scaling factor ¹¹; assuming all proteins are equally important ⁸.
"5": Range Scaling (RAN).
Scaling each protein abundance for systematic variance using the abundance range of each protein of all samples as scaling factor ¹²; assuming all proteins are equally important ⁸.
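
The transformation, centering and scaling options listed above can be illustrated in base R. This is a minimal sketch under stated assumptions, not EVALFQ's internal code: the intensities are made up, `box_cox` and `pretreat` are hypothetical helpers, the Box–Cox lambda is fixed by hand, the log base is chosen arbitrarily as 2, and the scaling divisors follow the definitions in van den Berg et al. (reference 8).

```r
# Rows are samples, columns are proteins (made-up, strictly positive values).
x <- matrix(c(100, 400, 1600,
              10,  20,  90),
            nrow = 3,
            dimnames = list(c("S1", "S2", "S3"), c("P001", "P002")))

# Transformation: Box-Cox for a fixed lambda; lambda = 0 is the natural log.
box_cox <- function(m, lambda) {
  stopifnot(all(m > 0, na.rm = TRUE))
  if (lambda == 0) log(m) else (m^lambda - 1) / lambda
}

# Centering and scaling per protein (column). Location and spread are
# computed on the values passed in, before anything is subtracted.
pretreat <- function(m,
                     centering = c("mean", "median", "none"),
                     scaling = c("none", "auto", "pareto", "vast", "range")) {
  centering <- match.arg(centering)
  scaling   <- match.arg(scaling)
  mu <- colMeans(m, na.rm = TRUE)
  s  <- apply(m, 2, sd, na.rm = TRUE)
  location <- switch(centering,
    mean   = mu,
    median = apply(m, 2, median, na.rm = TRUE),
    none   = rep(0, ncol(m))
  )
  divisor <- switch(scaling,
    none   = rep(1, ncol(m)),
    auto   = s,                # standard deviation
    pareto = sqrt(s),          # square root of the standard deviation
    vast   = s * (s / mu),     # standard deviation times CV; needs mu != 0
    range  = apply(m, 2, function(v) diff(range(v, na.rm = TRUE)))
  )
  sweep(sweep(m, 2, location, FUN = "-"), 2, divisor, FUN = "/")
}

x_log  <- log2(x)                                   # log transformation
x_box  <- box_cox(x, lambda = 0.5)                  # Box-Cox, fixed lambda
x_auto <- pretreat(x_log, centering = "mean", scaling = "auto")
round(x_auto, 3)
```

After mean-centering and auto scaling, each protein column has mean 0 and standard deviation 1, which is what puts all proteins on an equal footing.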

nor All normalization methods are as follows:
"1": Cyclic Loess (CYC).
Assuming that the intensities of the vast majority of the proteins are not changed in control and case groups ^{13, 14} and the systematic bias is nonlinearly dependent on the protein abundances ¹⁵.
"2": EigenMS (EIG).
Overcoming the problems caused by the heterogeneity in the protein intensities of studied samples ^{16, 17}; does not require any assumption about the relative strength of signals due to each source of variation ¹⁶.
"3": Locally Weighted Scatterplot Smoothing (LOW).
Assuming that the abundances of the majority of the proteins are unchanged under the studied circumstances ^{18, 14} and the systematic bias is nonlinearly dependent on the protein intensities ¹⁸.
"4": Median Absolute Deviation (MAD).
Ensuring the comparability of protein intensities among all samples ¹⁹; assuming the median level of the protein abundance and the spread of abundances are the same in all samples ²⁰.
"5": Mean Normalization (MEA).
Ensuring the protein abundance values from all studied samples directly comparable with each other ¹⁹; assuming the mean level of the protein abundance is constant for all samples ¹⁵.
"6": Median Normalization (MED).
Making the protein intensities from all individual samples directly comparable with each other ^{15, 19}; assuming the median level of the protein abundance is constant for all samples ²¹.
"7": No normalization method applied (NON).
"8": Probabilistic Quotient Normalization (PQN).
Ensuring the comparability of protein intensities among all samples ¹⁹; assuming that the majority of the protein intensities does not vary for the studied classes ²².
"9": Quantile Normalization (QUA).
Making the protein intensities from all samples directly comparable with each other ¹⁹; assuming that the majority of protein intensity signals are unchanged among samples ¹⁸.
"10": Robust Linear Regression (RLR).
Assuming that the intensities of the majority of the proteins are not changed in control and case groups ^{14, 23} and the systematic bias is linearly dependent on the magnitude of protein abundances ¹⁵.
"11": Total Ion Current (TIC).
Making the protein intensities from all samples directly comparable with each other ¹⁹; assuming the total area under the protein abundance curve is constant among samples ²⁴.
"12": Trimmed Mean of M Values (TMM).
Ensuring the protein abundance values from all studied samples directly comparable with each other ¹⁹; assuming the majority of proteins are not differentially expressed between control and case groups ²⁵.
"13": Variance Stabilization Normalization (VSN).
Having a built-in transformation ²⁶ and making individual observations more directly comparable ²⁷; assuming that most of the proteins across different samples are not differentially expressed ²⁷.
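
As one concrete case from the list above, Median Normalization assumes that the median protein abundance is constant across samples. The sketch below shows one common way to enforce that in base R; it is an assumption about the general technique rather than EVALFQ's own implementation, `median_normalize` is a hypothetical helper, and the intensities are made up.

```r
# Rows are samples, columns are proteins (made-up intensities).
x <- rbind(S1 = c(100, 200, 300),
           S2 = c(200, 400, 600),   # same profile as S1 at twice the loading
           S3 = c(50, 100, NA))

# Divide each sample by its median, then rescale by the average median so
# that values stay on the original intensity scale.
median_normalize <- function(m) {
  med <- apply(m, 1, median, na.rm = TRUE)
  sweep(m, 1, med / mean(med), FUN = "/")
}

x_med <- median_normalize(x)
apply(x_med, 1, median, na.rm = TRUE)   # the sample medians now coincide
```

In this toy matrix S2 is S1 measured at twice the loading, so the two rows become identical after normalization. Replacing `median` with `mean` gives the analogous Mean Normalization.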

imp All imputation methods are as follows:
"1": No Imputation method applied (NON).
"2": Background Imputation (BAK).
Assuming that the protein values are missing because of having small concentrations in the sample and thus cannot be detected during the MS run ¹⁵.
"3": Bayesian Principal Component Imputation (BPC).
Imputing based on the variational Bayesian framework that does not force orthogonality between the principal components ²⁸.
"4": Censored Imputation (CEN).
Imputing with the lowest intensity value in the dataset, assuming that protein values are missing because they fall below the detection capacity ¹⁵.
"5": K-nearest Neighbor Imputation (KNN).
Finding k most similar proteins (k-nearest neighbors) and using a weighted average over these k proteins to estimate the missing protein values ^{15, 29}.
"6": Singular Value Decomposition (SVD).
Applying this imputation method to the data to obtain sets of mutually orthogonal expression patterns of all proteins in the data ²⁹.
"7": Zero Imputation (ZER).
Imputing the missing intensities of the studied proteins by directly replacing them with zeros ¹⁵.
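
The two simplest imputation strategies in this list can be sketched in a few lines of base R. These helpers are hypothetical illustrations with made-up data, not EVALFQ functions: `impute_zero` mirrors Zero Imputation, and `impute_min` follows the idea behind Censored Imputation by filling gaps with the smallest observed intensity.

```r
# Rows are samples, columns are proteins; NA marks a missing intensity.
x <- rbind(S1 = c(P001 = 120, P002 = NA,  P003 = 15),
           S2 = c(P001 = 130, P002 = 900, P003 = NA))

# Zero imputation: every missing intensity becomes 0.
impute_zero <- function(m) {
  m[is.na(m)] <- 0
  m
}

# Minimum-value imputation: every missing intensity becomes the smallest
# observed value in the data set (a stand-in for "below detection").
impute_min <- function(m) {
  m[is.na(m)] <- min(m, na.rm = TRUE)
  m
}

impute_zero(x)
impute_min(x)
```

Zero imputation may interact badly with a later log transformation, since the logarithm of zero is not finite; this is one reason the order and combination of steps matter when workflows are compared.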

#### 5. Conduct LFQ and assess performance by collectively considering the spiked proteins.

```r
res <- lfqspikepart(data_s,
                    spiked,
                    tra = NULL,
                    cen = NULL,
                    sca = NULL,
                    nor = NULL,
                    imp = NULL,
                    Ca = "1",
                    Cb = "1",
                    Cc = "1",
                    Cd = "1")
```

`data_s` 
Same as the description of the 'lfq_spiked' above.

`spiked` 
Same as the description of the 'lfq_spiked' above.

`tra` 
All transformation methods are as follows:<br>
"1": Box–Cox Transformation (BOX).<br>
Making asymmetric data fulfill the normality assumption in a regression model by converting the protein abundances into a more symmetric distribution<sup>6</sup>.<br>
"2": Log Transformation (LOG).<br>
Converting the distribution of ratios of abundance values of proteins into a more symmetric (almost normal distribution) and minimizing the effect of proteins with extreme abundance<sup>7</sup>.<br>
"3": No transformation method applied (NON).<br>

`cen` 
All centering methods are as follows:<br>
"1": Mean-Centering (MEC).<br>
Converting all the intensities to fluctuations around zero instead of around the mean of the protein intensities; assuming all proteins are equally important <sup>8</sup>.<br>
"2": Median-Centering (MDC).<br>
Converting all the intensities to fluctuations around zero instead of around the median of the protein intensities; assuming all proteins are equally important <sup>8</sup>.<br>
"3": No centering method applied (NON).<br>

`sca` 
All scaling methods are as follows:<br>
"1": No scaling method applied (NON).<br>
"2": Auto Scaling (ATO).<br>
Adjusting each protein abundance for systematic variance using the standard deviation of each protein of all samples as scaling factor <sup>9</sup>; assuming all proteins are equally important <sup>8</sup>.<br>
"3": Pareto Scaling (PAR).<br>
Scaling each protein abundance for systematic variance using the square root of the standard deviation of each protein of all samples as scaling factor <sup>10</sup>; assuming all proteins are equally important <sup>8</sup>.<br>
"4": Vast Scaling (VAS).<br>
Adjusting each protein abundance for systematic variance using the coefficient of variation of each protein of all samples as scaling factor <sup>11</sup>; assuming all proteins are equally important <sup>8</sup>.<br>
"5": Range Scaling (RAN).<br>
Scaling each protein abundance for systematic variance using the abundance range of each protein of all samples as scaling factor <sup>12</sup>; assuming all proteins are equally important <sup>8</sup>.<br>

`nor` 
All normalization methods are as follows:<br>
"1": Cyclic Loess (CYC).<br>
Assuming that the intensities of the vast majority of the proteins are not changed in control and case groups <sup>13, 14</sup> and the systematic bias is nonlinearly dependent on the protein abundances <sup>15</sup>.<br>
"2": EigenMS (EIG).<br>
Overcoming the problems caused by the heterogeneity in the protein intensities of studied samples <sup>16, 17</sup>; does not require any assumption about the relative strength of signals due to each source of variation <sup>16</sup>.<br>
"3": Locally Weighted Scatterplot Smoothing (LOW).<br>
Assuming that the abundances of the majority of the proteins are unchanged under the studied circumstances <sup>18, 14</sup> and the systematic bias is nonlinearly dependent on the protein intensities <sup>18</sup>.<br>
"4": Median Absolute Deviation (MAD).<br>
Ensuring the comparability of protein intensities among all samples <sup>19</sup>; assuming the median level of the protein abundance and the spread of abundances are the same in all samples <sup>20</sup>.<br>
"5": Mean Normalization (MEA).<br>
Ensuring the protein abundance values from all studied samples directly comparable with each other <sup>19</sup>; assuming the mean level of the protein abundance is constant for all samples <sup>15</sup>.<br>
"6": Median Normalization (MED).<br>
Making the protein intensities from all individual samples directly comparable with each other <sup>15, 19</sup>; assuming the median level of the protein abundance is constant for all samples <sup>21</sup>.<br>
"7": No normalization method applied (NON).<br>
"8": Probabilistic Quotient Normalization (PQN).<br>
Ensuring the comparability of protein intensities among all samples <sup>19</sup>; assuming that the majority of the protein intensities does not vary for the studied classes <sup>22</sup>.<br>
"9": Quantile Normalization (QUA).<br>
Making the protein intensities from all samples directly comparable with each other <sup>19</sup>; assuming that the majority of protein intensity signals are unchanged among samples <sup>18</sup>.<br>
"10": Robust Linear Regression (RLR).<br>
Assuming that the intensities of the majority of the proteins are not changed in control and case groups <sup>14, 23</sup> and the systematic bias is linearly dependent on the magnitude of protein abundances <sup>15</sup>.<br>
"11": Total Ion Current (TIC).<br>
Making the protein intensities from all samples directly comparable with each other <sup>19</sup>; assuming the total area under the protein abundance curve is constant among samples <sup>24</sup>.<br>
"12": Trimmed Mean of M Values (TMM).<br>
Ensuring the protein abundance values from all studied samples directly comparable with each other <sup>19</sup>; assuming the majority of proteins are not differentially expressed between control and case groups <sup>25</sup>.<br>
"13": Variance Stabilization Normalization (VSN).<br>
Having a built-in transformation <sup>26</sup> and making individual observations more directly comparable <sup>27</sup>; assuming that most of the proteins across different samples are not differentially expressed <sup>27</sup>.<br>

`imp` 
All imputation methods are as follows:<br>
"1": No Imputation method applied (NON).<br>
"2": Background Imputation (BAK).<br>
Assuming that the protein values are missing because of having small concentrations in the sample and thus cannot be detected during the MS run <sup>15</sup>.<br>
"3": Bayesian Principal Component Imputation (BPC).<br>
Imputing based on the variational Bayesian framework that does not force orthogonality between the principal components <sup>28</sup>.<br>
"4": Censored Imputation (CEN).<br>
Imputing with the lowest intensity value in the dataset, assuming that protein values are missing because they fall below the detection capacity <sup>15</sup>.<br>
"5": K-nearest Neighbor Imputation (KNN).<br>
Finding <i>k</i> most similar proteins (<i>k</i>-nearest neighbors) and using a weighted average over these <i>k</i> proteins to estimate the missing protein values <sup>15, 29</sup>.<br>
"6": Singular Value Decomposition (SVD).<br>
Applying this imputation method to the data to obtain sets of mutually orthogonal expression patterns of all proteins in the data <sup>29</sup>.<br>
"7": Zero Imputation (ZER).<br>
Imputing the missing intensities of the studied proteins by directly replacing them with zeros <sup>15</sup>.<br>

`Ca` 
Criterion (a): precision of LFQ based on the proteomes among replicates<sup>1</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (a).<br>
If set 0, the user excludes Criterion (a) from performance assessment.<br>
The default setting of this value is “1”.

`Cb` 
Criterion (b): classification ability of LFQ between distinct sample groups<sup>2</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (b).<br>
If set 0, the user excludes Criterion (b) from performance assessment.<br>
The default setting of this value is “1”.

`Cc` 
Criterion (c): differential expression analysis by reproducibility-optimization<sup>3</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (c).<br>
If set 0, the user excludes Criterion (c) from performance assessment.<br>
The default setting of this value is “1”.

`Cd` 
Criterion (d): reproducibility of the identified protein markers among different datasets<sup>4</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (d).<br>
If set 0, the user excludes Criterion (d) from performance assessment.<br>
The default setting of this value is “1”.

`Ce` 
Criterion (e): accuracy of LFQ based on spiked and background proteins<sup>5</sup>.<br>
If set 1, the user chooses to assess LFQ workflows using Criterion (e).<br>
If set 0, the user excludes Criterion (e) from performance assessment.<br>
The default setting of this value is “1”.
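
The method codes listed above span a grid of candidate workflows. The sketch below only enumerates that grid with `expand.grid`, assuming every transformation, centering, scaling, normalization and imputation code may be combined freely. Whether EVALFQ evaluates every combination, or filters some out based on the `assum_*` answers, is not stated here.

```r
# One row per combination of the documented method codes.
workflows <- expand.grid(
  tra = as.character(1:3),    # 3 transformation codes
  cen = as.character(1:3),    # 3 centering codes
  sca = as.character(1:5),    # 5 scaling codes
  nor = as.character(1:13),   # 13 normalization codes
  imp = as.character(1:7),    # 7 imputation codes
  stringsAsFactors = FALSE
)

nrow(workflows)   # 3 * 3 * 5 * 13 * 7 = 4095
head(workflows, 3)
```

A grid of this size suggests why the all-workflow functions can take considerably longer to run than the single-workflow ones.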

## Examples

```r
load("../data/my_spiked.rda")
dim(my_spiked)
head(my_spiked[1:4])
load("../data/spiked_data.rda")
dim(spiked_data)
head(spiked_data[1:5])
```

Step 1: conduct LFQ and assess performance of all possible LFQ workflows or assess performance by collectively considering the spiked proteins.

Note: the file should be in Comma-Separated Values (CSV) format and provide the intensity data of proteins/peptides. The input should be numeric, except for the first and second columns, which contain the names and the labels (control or case) of the studied samples, respectively. Intensities should be arranged with samples in rows and proteins/peptides in columns. Missing values (NA) of protein intensity are allowed.
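
Reading such a CSV file into R might look like the sketch below. The file content is made up and written to a temporary path so that the example is self-contained. It is an assumption, not something this document confirms, that the EVALFQ functions accept a data frame read this way.

```r
# Write a small made-up CSV in the documented layout to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "sample,label,P001,P002",
  "S1,control,1520.4,880.2",
  "S2,control,1498.1,NA",
  "S3,case,2210.7,910.5",
  "S4,case,,905.3"
), csv_path)

# Treat both "NA" and empty fields as missing; keep protein IDs unchanged.
my_df <- read.csv(csv_path,
                  stringsAsFactors = FALSE,
                  check.names = FALSE,
                  na.strings = c("NA", ""))
str(my_df)
```

Setting `check.names = FALSE` keeps protein identifiers exactly as they appear in the file, which matters when the same IDs must also match a spiked-protein table.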

allranks <- lfqevalueall(data_q = my_df, assum_a = "Y", assum_b = "Y", assum_c = "Y", Ca = "1", Cb = "1", Cc = "1", Cd = "1")

Note: the file should be in Comma-Separated Values (CSV) format and provide the concentrations of known proteins (such as spiked proteins). This file is required if the user wants to conduct the assessment using criterion (e). It should contain the class of each sample and the Sample ID. Sample IDs should be unique and can be defined freely by EVALFQ users; the class of samples refers to the group each Sample ID belongs to. The IDs of the spiked proteins should be consistent between "my_spiked" and "spiked_data".

allranks <- lfqspikedall(data_s = my_spiked, spiked = spiked_data, assum_a = "Y", assum_b = "Y", assum_c = "Y", Ca = "1", Cb = "1", Cc = "1", Cd = "1", Ce = "1")

Note: 'allranks' contains all information on the performance assessment, the criteria selected and the ranking.

```r
load("../data/allranks.rda")
head(allranks)
```

Step 2: a heatmap illustrating the performance ranking of all LFQ workflows based on the criteria selected by user.

lfqvisualize(object = allranks, top = 100)

Note: the file 'EVALFQ-OUTPUT.Figure-Top.XXX.workflows.pdf' is saved in the current working directory. Use 'getwd()' to find this path.
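
To confirm that the heatmap file was written, the working directory can be searched for files following the naming pattern above. `find_evalfq_pdfs` is a hypothetical convenience helper built on base R's `list.files`; it is not part of EVALFQ.

```r
# Hypothetical helper: list EVALFQ heatmap PDFs in a directory.
find_evalfq_pdfs <- function(dir = getwd()) {
  list.files(dir, pattern = "^EVALFQ-OUTPUT.*\\.pdf$", full.names = TRUE)
}

getwd()              # where lfqvisualize() is expected to save its output
find_evalfq_pdfs()   # character(0) if no heatmap has been written yet
```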

```r
# Users can also run EVALFQ on one specific LFQ workflow as follows:

res <- lfqevalupart(data_q = my_df,
                    tra = "1",
                    cen = "1",
                    sca = "1",
                    nor = "1",
                    imp = "1",
                    Ca = "1",
                    Cb = "1",
                    Cc = "1",
                    Cd = "1")

# OR

res <- lfqspikepart(data_s = my_spiked,
                    spiked = spiked_data,
                    tra = "1",
                    cen = "1",
                    sca = "1",
                    nor = "1",
                    imp = "1",
                    Ca = "1",
                    Cb = "1",
                    Cc = "1",
                    Cd = "1")
```

Note: please select the number codes that represent the desired transformation, centering, scaling, normalization and imputation methods (see the details above).
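
Because the same number means different things for each step, a small lookup table can make a chosen workflow easier to read. The codes and three-letter abbreviations below are copied from the parameter descriptions above; `method_codes` and `describe_workflow` are hypothetical names introduced for illustration and are not part of EVALFQ.

```r
# Number codes and abbreviations as listed in the parameter descriptions.
method_codes <- list(
  tra = c("1" = "BOX", "2" = "LOG", "3" = "NON"),
  cen = c("1" = "MEC", "2" = "MDC", "3" = "NON"),
  sca = c("1" = "NON", "2" = "ATO", "3" = "PAR", "4" = "VAS", "5" = "RAN"),
  nor = c("1" = "CYC", "2" = "EIG", "3" = "LOW", "4" = "MAD", "5" = "MEA",
          "6" = "MED", "7" = "NON", "8" = "PQN", "9" = "QUA", "10" = "RLR",
          "11" = "TIC", "12" = "TMM", "13" = "VSN"),
  imp = c("1" = "NON", "2" = "BAK", "3" = "BPC", "4" = "CEN", "5" = "KNN",
          "6" = "SVD", "7" = "ZER")
)

# Hypothetical helper: translate one set of codes into a readable label.
describe_workflow <- function(tra, cen, sca, nor, imp) {
  chosen <- c(tra = tra, cen = cen, sca = sca, nor = nor, imp = imp)
  labels <- mapply(function(step, code) {
    label <- method_codes[[step]][code]
    if (is.na(label)) stop("unknown code '", code, "' for ", step)
    label
  }, names(chosen), chosen)
  paste(labels, collapse = "-")
}

describe_workflow(tra = "1", cen = "1", sca = "1", nor = "1", imp = "1")
```

For the all-"1" setting used in the example calls, the label reads BOX-MEC-NON-CYC-NON: code "1" selects Box–Cox and mean-centering, but means no scaling and no imputation.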

Should you have any questions, please contact Jianbo Fu at fujianbo@zju.edu.cn

## References

1. Kuharev J, Navarro P, Distler U, et al. In-depth evaluation of software tools for data-independent acquisition based label-free quantification. Proteomics 2015;15:3140–3151.
2. Griffin NM, Yu J, Long F, et al. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat Biotechnol 2010;28:83–89.
3. Risso D, Ngai J, Speed TP, et al. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol 2014;32:896–902.
4. Wang X, Gardiner EJ, Cairns MJ. Optimal consistency in microRNA expression analysis using reference-gene-based normalization. Mol Biosyst 2015;11:1235–1240.
5. Navarro P, Kuharev J, Gillet LC, et al. A multicenter study benchmarks software tools for label-free proteome quantification. Nat Biotechnol 2016;34:1130–1136.
6. Lo, K., and Gottardo, R. (2012) Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: An alternative to the skew-t distribution. Stat. Comput. 22, 33–52.
7. Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W. J., Webb-Robertson, B. J., Smith, R. D., and Lipton, M. S. (2006) Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 5, 277–286.
8. van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., and van der Werf, M. J. (2006) Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genomics 7, 142.
9. Wang, S. Y., Kuo, C. H., and Tseng, Y. J. (2013) Batch normalizer: A fast total abundance regression calibration method to simultaneously adjust batch and injection order effects in liquid chromatography/time-of-flight mass spectrometry-based metabolomics data and comparison with current calibration methods. Anal. Chem. 85, 1037–1046.
10. Wang, X., Zhang, A., Han, Y., Wang, P., Sun, H., Song, G., Dong, T., Yuan, Y., Yuan, X., Zhang, M., Xie, N., Zhang, H., Dong, H., and Dong, W. (2012) Urine metabolomics analysis for biomarker discovery and detection of jaundice syndrome in patients with liver disease. Mol. Cell. Proteomics 11, 370–380.
11. Di Guida, R., Engel, J., Allwood, J. W., Weber, R. J., Jones, M. R., Sommer, U., Viant, M. R., and Dunn, W. B. (2016) Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling. Metabolomics 12, 93.
12. Smilde, A. K., van der Werf, M. J., Bijlsma, S., van der Werff-van der Vat, B. J., and Jellema, R. H. (2005) Fusion of mass spectrometry-based metabolomics data. Anal. Chem. 77, 6729–6736.
13. Cox, J., Hein, M. Y., Luber, C. A., Paron, I., Nagaraj, N., and Mann, M. (2014) Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526.
14. Ballman, K. V., Grill, D. E., Oberg, A. L., and Therneau, T. M. (2004) Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778–2786.
15. Välikangas, T., Suomi, T., and Elo, L. L. (2018) A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief Bioinform. 19, 1–11.
16. Leek, J. T., and Storey, J. D. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735.
17. Karpievitch, Y. V., Taverner, T., Adkins, J. N., Callister, S. J., Anderson, G. A., Smith, R. D., and Dabney, A. R. (2009) Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition. Bioinformatics 25, 2573–2580.
18. Adriaens, M. E., Jaillard, M., Eijssen, L. M., Mayer, C. D., and Evelo, C. T. (2012) An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies. BMC Genomics 13, 42.
19. Craig, A., Cloarec, O., Holmes, E., Nicholson, J. K., and Lindon, J. C. (2006) Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal. Chem. 78, 2262–2267.
20. Fundel, K., Haag, J., Gebhard, P. M., Zimmer, R., and Aigner, T. (2008) Normalization strategies for mRNA expression data in cartilage research. Osteoarthritis Cartilage 16, 947–955.
21. De Livera, A. M., Dias, D. A., De Souza, D., Rupasinghe, T., Pyke, J., Tull, D., Roessner, U., McConville, M., and Speed, T. P. (2012) Normalizing and integrating metabolomics data. Anal. Chem. 84, 10768–10776.
22. Tobin, J., Walach, J., de Beer, D., Williams, P. J., Filzmoser, P., and Walczak, B. (2017) Untargeted analysis of chromatographic data for green and fermented rooibos: Problem with size effect removal. J. Chromatogr. A 1525, 109–115.
23. Wang, B., Wang, X. F., and Xi, Y. (2011) Normalizing bead-based microRNA expression data: A measurement error model-based approach. Bioinformatics 27, 1506–1512.
24. Smolinska, A., Hauschild, A. C., Fijten, R. R., Dallinga, J. W., Baumbach, J., and van Schooten, F. J. (2014) Current breathomics—A review on data pre-processing techniques and machine learning in metabolomics breath analysis. J. Breath Res. 8, 027105.
25. Branson, O. E., and Freitas, M. A. (2016) A multi-model statistical approach for proteomic spectral count quantitation. J. Proteomics 144, 23–32.
26. Rausch, T. K., Schillert, A., Ziegler, A., Lüking, A., Zucht, H. D., and Schulz-Knappe, P. (2016) Comparison of pre-processing methods for multiplex bead-based immunoassays. BMC Genomics 17, 601.
27. Lin, S. M., Du, P., Huber, W., and Kibbe, W. A. (2008) Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Res. 36, e11.
28. Stacklies, W., Redestig, H., Scholz, M., Walther, D., and Selbig, J. (2007) pcaMethods—A bioconductor package providing PCA methods for incomplete data. Bioinformatics 23, 1164–1167.
29. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525.

JianboFu0406/EVALFQ111 documentation built on Aug. 10, 2020, 1:49 p.m.
