```r
knitr::opts_chunk$set(echo = TRUE)
library('EVALFQ')
set.seed(135)
```
EVALFQ provides an open-access online service that enables (1) label-free proteome quantification (LFQ) based on three quantification measurements (SWATH-MS, peak intensity and spectral counting), (2) evaluation of LFQ performance from multiple perspectives, and (3) identification of the optimal LFQ workflows based on a comprehensive performance ranking. EVALFQ mainly includes two functions, `lfq_access` and `lfq_spiked`. They not only automatically detect the diverse data formats generated by all quantification software, but also provide the most complete set of processing methods among available tools, including transformation, pretreatment (centering, scaling and normalization) and missing-value imputation.
This tutorial walks readers through an example analysis (see 'Examples' below).
```r
# Before installing EVALFQ, make sure the following dependencies are installed.
# Bioconductor packages (run install.packages("BiocManager") first if needed):
BiocManager::install(c("Biobase", "BiocGenerics", "ROTS", "limma", "ProteoMM",
                       "impute", "pcaMethods", "vsn", "affy"))
# The metabolomics package is installed from GitHub:
# install.packages("devtools")
devtools::install_github("cran/metabolomics")

# Install EVALFQ from the downloaded source package EVALFQ_0.1.0.tar.gz ...
install.packages("EVALFQ_0.1.0.tar.gz", repos = NULL, type = "source")
# ... or, alternatively, from GitHub:
devtools::install_github("idrblab/EVALFQ")

library(EVALFQ)
```
```r
library(EVALFQ)
```
```r
allranks <- lfqevalueall(data_q, assum_a = "Y", assum_b = "Y", assum_c = "Y",
                         Ca = "1", Cb = "1", Cc = "1", Cd = "1")
```
data_q
All columns of this input must be numeric except the first two, which contain the sample names and the sample labels (control or case), respectively. Intensities are arranged with samples in rows and proteins/peptides in columns. Missing values (NA) in protein intensity are allowed.
assum_a
All proteins are assumed to be equally important.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
assum_b
The level of protein abundance is assumed to be constant among all samples.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
assum_c
The intensities of the vast majority of proteins are assumed to be unchanged under the studied conditions.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
Ca
Criterion (a): precision of LFQ based on the proteomes among replicates<sup>1</sup>.
If set to "1", LFQ workflows are assessed using Criterion (a).
If set to "0", Criterion (a) is excluded from the performance assessment.
The default value is "1".
Cb
Criterion (b): classification ability of LFQ between distinct sample groups<sup>2</sup>.
If set to "1", LFQ workflows are assessed using Criterion (b).
If set to "0", Criterion (b) is excluded from the performance assessment.
The default value is "1".
Cc
Criterion (c): differential expression analysis by reproducibility optimization<sup>3</sup>.
If set to "1", LFQ workflows are assessed using Criterion (c).
If set to "0", Criterion (c) is excluded from the performance assessment.
The default value is "1".
Cd
Criterion (d): reproducibility of the identified protein markers among different datasets<sup>4</sup>.
If set to "1", LFQ workflows are assessed using Criterion (d).
If set to "0", Criterion (d) is excluded from the performance assessment.
The default value is "1".

#### 2. Conduct LFQ and assess performance by collectively considering the spiked proteins.

```r
allranks <- lfqspikedall(data_s, spiked, assum_a = "Y", assum_b = "Y", assum_c = "Y",
                         Ca = "1", Cb = "1", Cc = "1", Cd = "1", Ce = "1")
```
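The `data_q` input described above and the `data_s` input described below use the same layout: sample names, sample labels, then one numeric column per protein. The sketch below builds a small toy data frame in that layout and summarizes it with a helper, `check_lfq_layout()`. Both the helper and the toy values (sample names, protein names, intensities) are hypothetical illustrations, not part of EVALFQ.

```r
# Hypothetical toy input in the documented layout:
# column 1 = sample names, column 2 = labels (control or case),
# remaining columns = protein intensities (samples in rows, proteins in columns).
toy_data_q <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  label  = c("control", "control", "case", "case"),
  ProtA  = c(1200, 1350, 2100, NA),   # NA marks a missing intensity
  ProtB  = c(800, 760, 790, 820),
  ProtC  = c(NA, 45, 60, 58),
  stringsAsFactors = FALSE
)

# Illustrative checker (not an EVALFQ function): summarizes how a data frame
# maps onto the documented layout.
check_lfq_layout <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  intensity <- df[, -(1:2), drop = FALSE]
  list(
    n_samples   = nrow(df),
    n_proteins  = ncol(intensity),
    groups      = table(df[[2]]),
    all_numeric = all(vapply(intensity, is.numeric, logical(1))),
    n_missing   = sum(is.na(intensity))
  )
}

str(check_lfq_layout(toy_data_q))
```

A check like this is only a convenience for catching non-numeric intensity columns or mislabeled groups before a long evaluation run.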
data_s
All columns of this input must be numeric except the first two, which contain the sample names and the sample labels (control or case), respectively. Intensities are arranged with samples in rows and proteins/peptides in columns. Missing values (NA) in protein intensity are allowed.
spiked
This file provides the concentrations of known proteins (such as spiked proteins) and is required if the user wants to conduct the assessment using Criterion (e). It should contain the class of samples and the Sample ID. The Sample ID should be unique and can be defined as the user prefers; the class of samples refers to the group each Sample ID belongs to. The IDs of the spiked proteins should be consistent between "data_s" and "spiked". Details are described in the online "Example".
assum_a
All proteins are assumed to be equally important.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
assum_b
The level of protein abundance is assumed to be constant among all samples.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
assum_c
The intensities of the vast majority of proteins are assumed to be unchanged under the studied conditions.
Users are asked to input "Y" if this assumption holds for the studied dataset and "N" if it does not.
Ca
Criterion (a): precision of LFQ based on the proteomes among replicates<sup>1</sup>.
If set to "1", LFQ workflows are assessed using Criterion (a).
If set to "0", Criterion (a) is excluded from the performance assessment.
The default value is "1".
Cb
Criterion (b): classification ability of LFQ between distinct sample groups<sup>2</sup>.
If set to "1", LFQ workflows are assessed using Criterion (b).
If set to "0", Criterion (b) is excluded from the performance assessment.
The default value is "1".
Cc
Criterion (c): differential expression analysis by reproducibility optimization<sup>3</sup>.
If set to "1", LFQ workflows are assessed using Criterion (c).
If set to "0", Criterion (c) is excluded from the performance assessment.
The default value is "1".
Cd
Criterion (d): reproducibility of the identified protein markers among different datasets<sup>4</sup>.
If set to "1", LFQ workflows are assessed using Criterion (d).
If set to "0", Criterion (d) is excluded from the performance assessment.
The default value is "1".
Ce
Criterion (e): accuracy of LFQ based on spiked and background proteins<sup>5</sup>.
If set to "1", LFQ workflows are assessed using Criterion (e).
If set to "0", Criterion (e) is excluded from the performance assessment.
The default value is "1".
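Criterion (e) relies on proteins whose true case/control ratios are known. One common way to express accuracy, as in benchmark studies with known mixing ratios (e.g. reference 5), is to compare the observed log2 fold change of each spiked protein with the log2 ratio expected from its known concentrations, and to check that background proteins stay near 0. The sketch below computes such a summary. It illustrates the idea only and is not EVALFQ's implementation; `accuracy_summary()`, the protein names and all values are hypothetical.

```r
# Hypothetical Criterion (e)-style accuracy check.
# intensity: samples x proteins matrix; groups: "control"/"case" per sample;
# expected_log2fc: named vector of expected case/control log2 ratios
# (known ratios for spiked proteins, 0 for background proteins).
accuracy_summary <- function(intensity, groups, expected_log2fc) {
  is_case <- groups == "case"
  observed <- log2(colMeans(intensity[is_case, , drop = FALSE], na.rm = TRUE) /
                   colMeans(intensity[!is_case, , drop = FALSE], na.rm = TRUE))
  err <- observed[names(expected_log2fc)] - expected_log2fc
  c(mean_abs_error = mean(abs(err)), bias = mean(err))
}

groups <- c("control", "control", "case", "case")
intensity <- rbind(
  c(Spike1 = 100, Spike2 = 50, Bg1 = 400, Bg2 = 80),
  c(Spike1 = 110, Spike2 = 55, Bg1 = 420, Bg2 = 78),
  c(Spike1 = 210, Spike2 = 200, Bg1 = 410, Bg2 = 158),
  c(Spike1 = 210, Spike2 = 220, Bg1 = 410, Bg2 = 158)
)
# Spike1 and Spike2 were added at 2-fold and 4-fold; Bg1 and Bg2 should not change.
expected <- c(Spike1 = 1, Spike2 = 2, Bg1 = 0, Bg2 = 0)

accuracy_summary(intensity, groups, expected)
```

In this toy example the spiked proteins are recovered exactly, but the background protein Bg2 appears 2-fold up, so both the mean absolute error and the bias come out at 0.25.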
```r
lfqvisualize(object, top = 100)
```
object
The ranking result returned by `lfqevalueall` or `lfqspikedall` (for example, `allranks` in the Examples below).
top
The number of top-ranked LFQ workflows displayed in the heatmap. The default value is 100.

#### 4. Conduct LFQ and assess performance of one specific LFQ workflow.

```r
res <- lfqevalupart(data_q, tra = NULL, cen = NULL, sca = NULL, nor = NULL, imp = NULL,
                    Ca = "1", Cb = "1", Cc = "1", Cd = "1")
```
data_q
Same as the description of `lfqevalueall` above.
tra
All transformation methods are as follows:
"1": Box–Cox Transformation (BOX).
Making asymmetric data fulfill the normality assumption of a regression model by converting the protein abundances into a more symmetric distribution<sup>6</sup>.
"2": Log Transformation (LOG).
Converting the distribution of ratios of protein abundance values into a more symmetric (almost normal) distribution and minimizing the effect of proteins with extreme abundance<sup>7</sup>.
"3": No transformation method applied (NON).
cen
All centering methods are as follows:
"1": Mean-Centering (MEC).
Converting all intensities to fluctuations around zero instead of around the mean of the protein intensities; assuming all proteins are equally important<sup>8</sup>.
"2": Median-Centering (MDC).
Converting all intensities to fluctuations around zero instead of around the median of the protein intensities; assuming all proteins are equally important<sup>8</sup>.
"3": No centering method applied (NON).
sca
All scaling methods are as follows:
"1": No scaling method applied (NON).
"2": Auto Scaling (ATO).
Adjusting each protein abundance for systematic variance using the standard deviation of each protein across all samples as the scaling factor<sup>9</sup>; assuming all proteins are equally important<sup>8</sup>.
"3": Pareto Scaling (PAR).
Scaling each protein abundance for systematic variance using the square root of the standard deviation of each protein across all samples as the scaling factor<sup>10</sup>; assuming all proteins are equally important<sup>8</sup>.
"4": Vast Scaling (VAS).
Adjusting each protein abundance for systematic variance using the coefficient of variation of each protein across all samples as the scaling factor<sup>11</sup>; assuming all proteins are equally important<sup>8</sup>.
"5": Range Scaling (RAN).
Scaling each protein abundance for systematic variance using the abundance range of each protein across all samples as the scaling factor<sup>12</sup>; assuming all proteins are equally important<sup>8</sup>.
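The transformation, centering and scaling options above follow standard definitions (see reference 8). The base-R sketch below writes each one out so the differences are easy to see. It is an illustration under those textbook definitions, not EVALFQ's own code, and every function name in it is hypothetical. Intensities are arranged with samples in rows and proteins in columns, as in `data_q`. The Box–Cox version takes a fixed `lambda` instead of estimating it, and each scaling method mean-centers before dividing by its scaling factor.

```r
# --- Transformation (tra) ---
# Box-Cox with a fixed lambda: (x^lambda - 1) / lambda, or log(x) if lambda = 0.
box_cox <- function(x, lambda) {
  if (any(x <= 0, na.rm = TRUE)) stop("Box-Cox requires positive intensities")
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}
log_transform <- function(x, base = 2) log(x, base = base)

# --- Centering (cen): shift each protein (column) to fluctuate around 0 ---
mean_center   <- function(m) sweep(m, 2, colMeans(m, na.rm = TRUE), "-")
median_center <- function(m) sweep(m, 2, apply(m, 2, median, na.rm = TRUE), "-")

# --- Scaling (sca): mean-center, then divide by a per-protein factor ---
#   auto: s    pareto: sqrt(s)    range: max - min
#   vast: s * (s / mean), i.e. auto-scaled data divided by the coefficient of variation
scale_proteins <- function(m, method = c("auto", "pareto", "vast", "range")) {
  method <- match.arg(method)
  mu <- colMeans(m, na.rm = TRUE)
  s  <- apply(m, 2, sd, na.rm = TRUE)
  scaling_factor <- switch(method,
    auto   = s,
    pareto = sqrt(s),
    vast   = s * (s / mu),
    range  = apply(m, 2, function(v) diff(range(v, na.rm = TRUE)))
  )
  sweep(sweep(m, 2, mu, "-"), 2, scaling_factor, "/")
}

# Toy intensities: 3 samples (rows) x 2 proteins (columns).
m <- matrix(c(100, 400, 1600, 250, 1000, 4000), nrow = 3,
            dimnames = list(paste0("S", 1:3), c("ProtA", "ProtB")))

logged <- log_transform(m)            # LOG
box_cox(m, lambda = 0.5)              # BOX with lambda fixed at 0.5
mean_center(logged)                   # MEC
median_center(logged)                 # MDC
scale_proteins(logged, "pareto")      # PAR
log_transform(c(2, 0.5))              # a 2-fold rise and a 2-fold drop become 1 and -1
```

The last line shows why log transformation makes ratios symmetric: equal-sized up- and down-regulation end up at the same distance from 0.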
nor
All normalization methods are as follows:
"1": Cyclic Loess (CYC).
Assuming that the intensities of the vast majority of proteins are unchanged between control and case groups<sup>13, 14</sup> and that the systematic bias depends nonlinearly on protein abundance<sup>15</sup>.
"2": EigenMS (EIG).
Overcoming the problems caused by heterogeneity in the protein intensities of the studied samples<sup>16, 17</sup>; requiring no assumption about the relative strength of the signals due to each source of variation<sup>16</sup>.
"3": Locally Weighted Scatterplot Smoothing (LOW).
Assuming that the abundances of the majority of proteins are unchanged under the studied circumstances<sup>18, 14</sup> and that the systematic bias depends nonlinearly on protein intensity<sup>18</sup>.
"4": Median Absolute Deviation (MAD).
Ensuring the comparability of protein intensities among all samples<sup>19</sup>; assuming that the median level of protein abundance and the spread of abundances are the same in all samples<sup>20</sup>.
"5": Mean Normalization (MEA).
Making the protein abundance values from all studied samples directly comparable with each other<sup>19</sup>; assuming that the mean level of protein abundance is constant for all samples<sup>15</sup>.
"6": Median Normalization (MED).
Making the protein intensities from all individual samples directly comparable with each other<sup>15, 19</sup>; assuming that the median level of protein abundance is constant for all samples<sup>21</sup>.
"7": No normalization method applied (NON).
"8": Probabilistic Quotient Normalization (PQN).
Ensuring the comparability of protein intensities among all samples<sup>19</sup>; assuming that the majority of protein intensities do not vary between the studied classes<sup>22</sup>.
"9": Quantile Normalization (QUA).
Making the protein intensities from all samples directly comparable with each other<sup>19</sup>; assuming that the majority of protein intensity signals are unchanged among samples<sup>18</sup>.
"10": Robust Linear Regression (RLR).
Assuming that the intensities of the majority of proteins are unchanged between control and case groups<sup>14, 23</sup> and that the systematic bias depends linearly on the magnitude of protein abundance<sup>15</sup>.
"11": Total Ion Current (TIC).
Making the protein intensities from all samples directly comparable with each other<sup>19</sup>; assuming that the total area under the protein abundance curve is constant among samples<sup>24</sup>.
"12": Trimmed Mean of M Values (TMM).
Making the protein abundance values from all studied samples directly comparable with each other<sup>19</sup>; assuming that the majority of proteins are not differentially expressed between control and case groups<sup>25</sup>.
"13": Variance Stabilization Normalization (VSN).
Having a built-in transformation<sup>26</sup> and making individual observations more directly comparable<sup>27</sup>; assuming that most proteins are not differentially expressed across samples<sup>27</sup>.
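Several of the simpler sample-wise normalizations above fit in a few lines of base R. The sketch below covers median (MED), total ion current (TIC), quantile (QUA) and probabilistic quotient (PQN) normalization in their common textbook forms on positive, untransformed intensities (samples in rows, proteins in columns). EVALFQ's own implementations may differ, for example by operating on log-transformed data. All function names are hypothetical, and the quantile version assumes there are no missing values.

```r
# Median normalization (MED): rescale each sample (row) so that all sample
# medians equal the average sample median.
median_normalize <- function(m) {
  med <- apply(m, 1, median, na.rm = TRUE)
  sweep(m, 1, med / mean(med), "/")
}

# Total ion current (TIC): rescale each sample so that all row totals are equal.
tic_normalize <- function(m) {
  total <- rowSums(m, na.rm = TRUE)
  sweep(m, 1, total / mean(total), "/")
}

# Quantile normalization (QUA): give every sample the same intensity
# distribution, namely the mean of the sorted samples. Assumes no NA.
quantile_normalize <- function(m) {
  sorted <- t(apply(m, 1, sort))                          # each sample sorted ascending
  target <- colMeans(sorted)                              # mean k-th smallest value
  ranks  <- t(apply(m, 1, rank, ties.method = "first"))   # position of each value
  matrix(target[ranks], nrow = nrow(m), dimnames = dimnames(m))
}

# Probabilistic quotient normalization (PQN): divide each sample by the
# median of its ratios to a reference sample (here the protein-wise median).
pqn_normalize <- function(m) {
  reference <- apply(m, 2, median, na.rm = TRUE)
  quotients <- sweep(m, 2, reference, "/")
  dilution  <- apply(quotients, 1, median, na.rm = TRUE)
  sweep(m, 1, dilution, "/")
}

# Toy data: S2 and S3 are S1 with every intensity multiplied by 2 and 4.
base_profile <- c(ProtA = 10, ProtB = 40, ProtC = 25, ProtD = 5)
m <- rbind(S1 = base_profile, S2 = 2 * base_profile, S3 = 4 * base_profile)

pqn_normalize(m)      # all three rows become identical
median_normalize(m)   # likewise, because the difference is a uniform factor
tic_normalize(m)
quantile_normalize(m)
```

Because the three toy samples differ only by a global factor, every method here removes the difference. The methods diverge once some proteins genuinely change, which is where their different assumptions (constant median, constant total, unchanged majority) start to matter.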
imp
All imputation methods are as follows:
"1": No imputation method applied (NON).
"2": Background Imputation (BAK).
Assuming that protein values are missing because their concentrations in the sample are too low to be detected during the MS run<sup>15</sup>.
"3": Bayesian Principal Component Imputation (BPC).
Imputing based on a variational Bayesian framework that does not force orthogonality between the principal components<sup>28</sup>.
"4": Censored Imputation (CEN).
Imputing missing values with the lowest intensity value in the dataset, assuming that protein values are missing because they fall below the detection capacity<sup>15</sup>.
"5": K-nearest Neighbor Imputation (KNN).
Finding the k most similar proteins (k-nearest neighbors) and using a weighted average over these k proteins to estimate the missing protein values<sup>15, 29</sup>.
"6": Singular Value Decomposition (SVD).
Obtaining sets of mutually orthogonal expression patterns of all proteins in the data and using them to estimate the missing values<sup>29</sup>.
"7": Zero Imputation (ZER).
Imputing the missing intensities of the studied proteins by directly replacing them with zeros<sup>15</sup>.
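The sketch below illustrates three of the imputation strategies listed above (ZER, CEN and a simple KNN) in base R. It is written from the descriptions above, not from EVALFQ's code. The KNN version measures the distance between two proteins as the root-mean-square difference over the samples where both are observed and weights neighbors by inverse distance, which is one of several reasonable choices. All function names are hypothetical, and KNN is usually applied to log-transformed intensities.

```r
# Zero imputation (ZER): replace every missing intensity with 0.
zero_impute <- function(m) {
  m[is.na(m)] <- 0
  m
}

# Censored imputation (CEN): replace missing values with the lowest
# intensity observed anywhere in the dataset.
censored_impute <- function(m) {
  m[is.na(m)] <- min(m, na.rm = TRUE)
  m
}

# Simple KNN imputation: for each missing value of protein j in sample i,
# find the k proteins closest to j among those observed in sample i,
# then take an inverse-distance weighted average of their values in sample i.
knn_impute <- function(m, k = 5) {
  out <- m
  for (j in seq_len(ncol(m))) {
    for (i in which(is.na(m[, j]))) {
      cand <- setdiff(which(!is.na(m[i, ])), j)
      d <- vapply(cand, function(c2) {
        both <- !is.na(m[, j]) & !is.na(m[, c2])
        if (!any(both)) return(Inf)
        sqrt(mean((m[both, j] - m[both, c2])^2))
      }, numeric(1))
      keep <- is.finite(d)
      if (!any(keep)) next              # no usable neighbor: leave NA
      cand <- cand[keep]
      d <- d[keep]
      nn <- order(d)[seq_len(min(k, length(d)))]
      w <- 1 / (d[nn] + 1e-8)           # small offset avoids division by 0
      out[i, j] <- sum(w * m[i, cand[nn]]) / sum(w)
    }
  }
  out
}

# Toy matrix: samples in rows, proteins in columns; ProtA is missing in sample 3.
m <- cbind(ProtA = c(10, 12, NA, 11),
           ProtB = c(10, 12, 14, 11),
           ProtC = c(50, 60, 70, 55))

zero_impute(m)
censored_impute(m)
knn_impute(m, k = 1)   # ProtA in sample 3 is taken from its nearest protein, ProtB
```

ZER and CEN treat every gap as a below-detection value, whereas KNN borrows information from proteins with a similar profile; the choice matters most when values are missing for reasons other than low abundance.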
Ca
Criterion (a): precision of LFQ based on the proteomes among replicates<sup>1</sup>.
If set to "1", LFQ workflows are assessed using Criterion (a).
If set to "0", Criterion (a) is excluded from the performance assessment.
The default value is "1".
Cb
Criterion (b): classification ability of LFQ between distinct sample groups<sup>2</sup>.
If set to "1", LFQ workflows are assessed using Criterion (b).
If set to "0", Criterion (b) is excluded from the performance assessment.
The default value is "1".
Cc
Criterion (c): differential expression analysis by reproducibility optimization<sup>3</sup>.
If set to "1", LFQ workflows are assessed using Criterion (c).
If set to "0", Criterion (c) is excluded from the performance assessment.
The default value is "1".
Cd
Criterion (d): reproducibility of the identified protein markers among different datasets<sup>4</sup>.
If set to "1", LFQ workflows are assessed using Criterion (d).
If set to "0", Criterion (d) is excluded from the performance assessment.
The default value is "1".
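Criterion (a) asks how consistent the quantification is among replicates. One common way to summarize this is the pooled median absolute deviation (PMAD): the MAD of each protein within each sample group, averaged over groups. The sketch below computes a PMAD-style summary in base R, so that two processed versions of the same data can be compared (lower means tighter replicates). It illustrates the idea only and is not necessarily the exact metric EVALFQ uses; `pmad()` and the toy values are hypothetical.

```r
# PMAD-style precision summary: for each protein, compute the median absolute
# deviation (MAD) within each sample group, average the MADs over groups,
# then take the median over proteins. Lower values mean tighter replicates.
pmad <- function(m, groups) {
  groups <- as.factor(groups)
  mads <- vapply(levels(groups), function(g) {
    apply(m[groups == g, , drop = FALSE], 2, mad, na.rm = TRUE)
  }, numeric(ncol(m)))
  mads <- matrix(mads, nrow = ncol(m))   # proteins x groups, even for one protein
  median(rowMeans(mads, na.rm = TRUE), na.rm = TRUE)
}

# Toy data: samples in rows, proteins in columns.
groups <- c("control", "control", "control", "case", "case", "case")
m <- cbind(ProtA = c(1, 2, 3, 10, 10, 10),
           ProtB = c(5, 5, 5, 1, 2, 3))

pmad(m, groups)
```

Note that `mad()` applies R's default consistency constant (1.4826), so the result is on the scale of a standard deviation for normally distributed data.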
```r
res <- lfqspikepart(data_s, spiked, tra = NULL, cen = NULL, sca = NULL, nor = NULL, imp = NULL,
                    Ca = "1", Cb = "1", Cc = "1", Cd = "1")
```
data_s
Same as the description of `lfqspikedall` above.
spiked
Same as the description of `lfqspikedall` above.
tra, cen, sca, nor, imp
Same as the descriptions of `lfqevalupart` above.
Ca, Cb, Cc, Cd
Same as the descriptions of `lfqevalupart` above.
Ce
Same as the description of `lfqspikedall` above.

## Examples

```r
load("../data/my_spiked.rda")
dim(my_spiked)
head(my_spiked[1:4])
load("../data/spiked_data.rda")
dim(spiked_data)
head(spiked_data[1:5])
```
```r
allranks <- lfqevalueall(data_q = my_df, assum_a = "Y", assum_b = "Y", assum_c = "Y",
                         Ca = "1", Cb = "1", Cc = "1", Cd = "1")
# OR
allranks <- lfqspikedall(data_s = my_spiked, spiked = spiked_data,
                         assum_a = "Y", assum_b = "Y", assum_c = "Y",
                         Ca = "1", Cb = "1", Cc = "1", Cd = "1", Ce = "1")
```
```r
load("../data/allranks.rda")
head(allranks)
```
```r
lfqvisualize(object = allranks, top = 100)
```
```r
# Users can also use EVALFQ by selecting one specific LFQ workflow as follows:
res <- lfqevalupart(data_q = my_df, tra = "1", cen = "1", sca = "1", nor = "1", imp = "1",
                    Ca = "1", Cb = "1", Cc = "1", Cd = "1")
# OR
res <- lfqspikepart(data_s = my_spiked, spiked = spiked_data,
                    tra = "1", cen = "1", sca = "1", nor = "1", imp = "1",
                    Ca = "1", Cb = "1", Cc = "1", Cd = "1")
```

Note: please select the number codes that represent the desired transformation, centering, scaling, normalization and imputation methods (see the details above).
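The number codes accepted by `tra`, `cen`, `sca`, `nor` and `imp` are listed in the parameter descriptions above. The sketch below transcribes those tables into an R list and adds a hypothetical helper, `describe_workflow()`, that turns a set of codes into the three-letter abbreviations joined in the order tra-cen-sca-nor-imp (an ordering chosen here for illustration). This can help when checking a call before running it. Neither `lfq_codes` nor `describe_workflow()` is part of EVALFQ.

```r
# Code tables transcribed from the parameter descriptions above.
lfq_codes <- list(
  tra = c("1" = "BOX", "2" = "LOG", "3" = "NON"),
  cen = c("1" = "MEC", "2" = "MDC", "3" = "NON"),
  sca = c("1" = "NON", "2" = "ATO", "3" = "PAR", "4" = "VAS", "5" = "RAN"),
  nor = c("1" = "CYC", "2" = "EIG", "3" = "LOW", "4" = "MAD", "5" = "MEA",
          "6" = "MED", "7" = "NON", "8" = "PQN", "9" = "QUA", "10" = "RLR",
          "11" = "TIC", "12" = "TMM", "13" = "VSN"),
  imp = c("1" = "NON", "2" = "BAK", "3" = "BPC", "4" = "CEN", "5" = "KNN",
          "6" = "SVD", "7" = "ZER")
)

# Hypothetical helper (not part of EVALFQ): decode one workflow into labels,
# stopping with an error if a code does not exist for that step.
describe_workflow <- function(tra, cen, sca, nor, imp) {
  codes <- c(tra = tra, cen = cen, sca = sca, nor = nor, imp = imp)
  labels <- vapply(names(codes), function(step) {
    lab <- lfq_codes[[step]][codes[[step]]]
    if (is.na(lab)) stop("unknown code '", codes[[step]], "' for ", step)
    unname(lab)
  }, character(1))
  paste(labels, collapse = "-")
}

# The workflow used in the example calls above:
describe_workflow(tra = "1", cen = "1", sca = "1", nor = "1", imp = "1")
# "BOX-MEC-NON-CYC-NON"
```

Keeping the codes as character strings mirrors the example calls, which pass `"1"` rather than `1`.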
Should you have any questions, please contact Jianbo Fu at fujianbo@zju.edu.cn
1. Kuharev, J., Navarro, P., Distler, U., et al. (2015) In-depth evaluation of software tools for data-independent acquisition based label-free quantification. Proteomics 15, 3140–3151.
2. Griffin, N. M., Yu, J., Long, F., et al. (2010) Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat. Biotechnol. 28, 83–89.
3. Risso, D., Ngai, J., Speed, T. P., et al. (2014) Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902.
4. Wang, X., Gardiner, E. J., and Cairns, M. J. (2015) Optimal consistency in microRNA expression analysis using reference-gene-based normalization. Mol. Biosyst. 11, 1235–1240.
5. Navarro, P., Kuharev, J., Gillet, L. C., et al. (2016) A multicenter study benchmarks software tools for label-free proteome quantification. Nat. Biotechnol. 34, 1130–1136.
6. Lo, K., and Gottardo, R. (2012) Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: An alternative to the skew-t distribution. Stat. Comput. 22, 33–52.
7. Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W. J., Webb-Robertson, B. J., Smith, R. D., and Lipton, M. S. (2006) Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 5, 277–286.
8. van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., and van der Werf, M. J. (2006) Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genomics 7, 142.
9. Wang, S. Y., Kuo, C. H., and Tseng, Y. J. (2013) Batch normalizer: A fast total abundance regression calibration method to simultaneously adjust batch and injection order effects in liquid chromatography/time-of-flight mass spectrometry-based metabolomics data and comparison with current calibration methods. Anal. Chem. 85, 1037–1046.
10. Wang, X., Zhang, A., Han, Y., Wang, P., Sun, H., Song, G., Dong, T., Yuan, Y., Yuan, X., Zhang, M., Xie, N., Zhang, H., Dong, H., and Dong, W. (2012) Urine metabolomics analysis for biomarker discovery and detection of jaundice syndrome in patients with liver disease. Mol. Cell. Proteomics 11, 370–380.
11. Di Guida, R., Engel, J., Allwood, J. W., Weber, R. J., Jones, M. R., Sommer, U., Viant, M. R., and Dunn, W. B. (2016) Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling. Metabolomics 12, 93.
12. Smilde, A. K., van der Werf, M. J., Bijlsma, S., van der Werff-van der Vat, B. J., and Jellema, R. H. (2005) Fusion of mass spectrometry-based metabolomics data. Anal. Chem. 77, 6729–6736.
13. Cox, J., Hein, M. Y., Luber, C. A., Paron, I., Nagaraj, N., and Mann, M. (2014) Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526.
14. Ballman, K. V., Grill, D. E., Oberg, A. L., and Therneau, T. M. (2004) Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778–2786.
15. Välikangas, T., Suomi, T., and Elo, L. L. (2018) A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief. Bioinform. 19, 1–11.
16. Leek, J. T., and Storey, J. D. (2007) Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735.
17. Karpievitch, Y. V., Taverner, T., Adkins, J. N., Callister, S. J., Anderson, G. A., Smith, R. D., and Dabney, A. R. (2009) Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition. Bioinformatics 25, 2573–2580.
18. Adriaens, M. E., Jaillard, M., Eijssen, L. M., Mayer, C. D., and Evelo, C. T. (2012) An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies. BMC Genomics 13, 42.
19. Craig, A., Cloarec, O., Holmes, E., Nicholson, J. K., and Lindon, J. C. (2006) Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal. Chem. 78, 2262–2267.
20. Fundel, K., Haag, J., Gebhard, P. M., Zimmer, R., and Aigner, T. (2008) Normalization strategies for mRNA expression data in cartilage research. Osteoarthritis Cartilage 16, 947–955.
21. De Livera, A. M., Dias, D. A., De Souza, D., Rupasinghe, T., Pyke, J., Tull, D., Roessner, U., McConville, M., and Speed, T. P. (2012) Normalizing and integrating metabolomics data. Anal. Chem. 84, 10768–10776.
22. Tobin, J., Walach, J., de Beer, D., Williams, P. J., Filzmoser, P., and Walczak, B. (2017) Untargeted analysis of chromatographic data for green and fermented rooibos: Problem with size effect removal. J. Chromatogr. A 1525, 109–115.
23. Wang, B., Wang, X. F., and Xi, Y. (2011) Normalizing bead-based microRNA expression data: A measurement error model-based approach. Bioinformatics 27, 1506–1512.
24. Smolinska, A., Hauschild, A. C., Fijten, R. R., Dallinga, J. W., Baumbach, J., and van Schooten, F. J. (2014) Current breathomics – A review on data pre-processing techniques and machine learning in metabolomics breath analysis. J. Breath Res. 8, 027105.
25. Branson, O. E., and Freitas, M. A. (2016) A multi-model statistical approach for proteomic spectral count quantitation. J. Proteomics 144, 23–32.
26. Rausch, T. K., Schillert, A., Ziegler, A., Lüking, A., Zucht, H. D., and Schulz-Knappe, P. (2016) Comparison of pre-processing methods for multiplex bead-based immunoassays. BMC Genomics 17, 601.
27. Lin, S. M., Du, P., Huber, W., and Kibbe, W. A. (2008) Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Res. 36, e11.
28. Stacklies, W., Redestig, H., Scholz, M., Walther, D., and Selbig, J. (2007) pcaMethods – A bioconductor package providing PCA methods for incomplete data. Bioinformatics 23, 1164–1167.
29. Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525.