knitr::opts_chunk$set(tidy = FALSE,message = FALSE)
library("BiocStyle")
BiocStyle::markdown()
suppressPackageStartupMessages(library("OmicsEV"))
suppressPackageStartupMessages(library("R.utils"))
suppressPackageStartupMessages(library("dplyr"))
suppressPackageStartupMessages(library("kableExtra"))
suppressPackageStartupMessages(library("formattable"))
High-throughput technologies such as RNA-Seq and mass spectrometry-based
proteomics are increasingly being applied to large sample cohorts, creating
vast amounts of quantitative data for genes and proteins. Many algorithms,
software, and pipelines have been developed to analyze these data. However,
how to select optimal algorithms, software, and parameters for analyzing a
specific omics dataset remains a significant challenge. To address this
challenge, we have developed an R package named OmicsEV, which is dedicated to
comparing and evaluating different data matrices generated from the same omics
dataset using different tools, algorithms, or parameter settings. OmicsEV
implements more than 15 evaluation metrics, and all the evaluation
results are included in an HTML report for intuitive browsing. OmicsEV is easy
to install and use. Only one function is needed to perform the whole evaluation
process. A GUI based on R Shiny is also implemented.
A few examples can be downloaded at https://github.com/bzhanglab/OmicsEV. One of the examples contains 6 data matrices generated from the same RNA dataset using different normalization methods. In addition, a proteomics data matrix and a sample list are also included. How to run this example is shown below.
The two major input files are the omics data tables and a sample annotation file. More details can be found below.
In OmicsEV, only one function (run_omics_evaluation) is needed to
perform the whole evaluation process. An example is shown below:
library(OmicsEV)
run_omics_evaluation(data_dir = "datasets/",
                     sample_list = "sample_list.tsv",
                     x2 = "protein.tsv",
                     cpu = 6,
                     data_type = "gene",
                     class_for_ml = "sample_ml.tsv")
In general, only a few parameters have to be set:
example_data <- read.delim(system.file("extdata/example_input_datasets.tsv",
                                       package = "OmicsEV"),
                           stringsAsFactors = FALSE)
kable(example_data, digits = 3, caption = "An example of input dataset") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
example_data <- read.delim(system.file("extdata/example_sample_list.tsv",
                                       package = "OmicsEV"),
                           stringsAsFactors = FALSE)
kable(example_data, digits = 3, caption = "An example of sample list") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
All other parameters are optional. When the input data tables for parameter
data_dir are protein expression data and users also have gene expression
data for the same samples, users can set parameter x2 to a file containing
the gene expression data in tsv format, and vice versa. If parameter x2 is
not NULL, sample-wise and gene-wise correlation analysis will be performed.
See ?run_omics_evaluation for a more in-depth description of all its arguments.
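To make the two correlation views concrete, here is a minimal base R sketch on toy data. It is not OmicsEV's implementation: the matrices `x1` and `x2`, their values, and the choice of Spearman correlation are all assumptions made for illustration.

```r
# Toy data (hypothetical values): rows are genes, columns are samples.
set.seed(1)
genes   <- paste0("gene", 1:5)
samples <- paste0("S", 1:6)
x1 <- matrix(rnorm(30), nrow = 5, dimnames = list(genes, samples))
# x2 mimics a second omics layer measured on the same samples.
x2 <- x1 + matrix(rnorm(30, sd = 0.5), nrow = 5,
                  dimnames = list(genes, samples))

# Restrict both tables to the genes and samples they share.
common_genes   <- intersect(rownames(x1), rownames(x2))
common_samples <- intersect(colnames(x1), colnames(x2))
a <- x1[common_genes, common_samples, drop = FALSE]
b <- x2[common_genes, common_samples, drop = FALSE]

# Gene-wise: correlate each gene's profile across samples.
gene_cor <- vapply(common_genes,
                   function(g) cor(a[g, ], b[g, ], method = "spearman"),
                   numeric(1))

# Sample-wise: correlate each sample's profile across genes.
sample_cor <- vapply(common_samples,
                     function(s) cor(a[, s], b[, s], method = "spearman"),
                     numeric(1))
```

`gene_cor` holds one coefficient per shared gene and `sample_cor` one per shared sample.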
The parameter class_for_ml is also set in the above example. This parameter specifies the class information for class prediction. Either a sample list file or a character vector such as class_for_ml=c("T","C") is supported. A sample list file must have the same format as the file for parameter "sample_list". This is useful when the class users want to predict is different from the one in the file for parameter "sample_list". OmicsEV uses an R S3 data class object to store the data table and sample annotation data, so the file also needs batch and order information as a format requirement, although order and batch are not used in class prediction. This file can be created from the file for parameter "sample_list" by updating only the class to what users want for class prediction. If users want to predict the class present in the file for parameter "sample_list", then only a character vector specifying the class names is needed, such as class_for_ml=c("T","C"). If sample class prediction is not needed, leave the parameter class_for_ml unset.
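As a rough illustration of deriving a class_for_ml file from a sample list, the sketch below swaps only the class column and writes the result as a tab-separated file. The column names (`sample`, `class`, `batch`, `order`), the sample IDs, and the `male`/`female` labels are hypothetical; check the example sample list shipped with the package for the exact layout.

```r
# Hypothetical sample list with sample, class, batch and order columns
# (column names and values are assumptions for this sketch).
sample_list <- data.frame(
  sample = paste0("S", 1:6),
  class  = c("T", "T", "T", "C", "C", "C"),
  batch  = c(1, 1, 2, 2, 3, 3),
  order  = 1:6,
  stringsAsFactors = FALSE
)

# Hypothetical alternative labels to predict instead of T/C.
new_class <- c(S1 = "male", S2 = "female", S3 = "male",
               S4 = "female", S5 = "male", S6 = "female")

# Copy the sample list and update only the class column;
# batch and order are kept because the file format requires them.
sample_ml <- sample_list
sample_ml$class <- unname(new_class[sample_ml$sample])

# Write the modified list as a tab-separated file.
out_file <- file.path(tempdir(), "sample_ml.tsv")
write.table(sample_ml, out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The resulting file path can then be passed as class_for_ml, in the same way "sample_ml.tsv" is passed in the example call above.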
When the function is finished successfully, an HTML-based report that contains different evaluation metrics will be generated. Example reports are available at https://github.com/bzhanglab/OmicsEV.
So far, more than 15 evaluation metrics have been implemented in OmicsEV, and
the evaluation results are organized in the following structure:
A few example evaluation reports are available at https://github.com/bzhanglab/OmicsEV.
All software and respective versions used to produce this document are listed below.
sessionInfo()