Installing the package in a new environment may take a long time. If the installation fails, please post a new issue here.
To install rMAUPS, you must first install conda by following the conda installation document.
You should also install r and r-recommended. If R was already installed, make sure that libgfortran, libnetcdf and libxml2 are also installed in your conda environment. In addition, it appears to be important to install r-data.table and r-rcpparmadillo through conda before BiocManager installs the dependencies (such as DESeq2).
$ conda install -c r r r-markdown r-recommended
$ conda install -c anaconda libgfortran libnetcdf libxml2
$ conda install -c conda-forge pandoc r-data.table r-rcpparmadillo
install.packages(c("devtools", "BiocManager"), repos = "https://cloud.r-project.org")
# Install dependencies
BiocManager::install(c("ggpubr", "metap", "ggrepel", "GSVA", "DESeq2", "limma", "impute", "biomaRt", "msigdbr", "BiocStyle", "msmsTests"))
# Install rMAUPS from github
devtools::install_github("WubingZhang/rMAUPS")
The environment should be OK if you can load the required packages successfully.
library(ggplot2)
library(rMAUPS)
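If loading fails, it can help to see which packages are missing before reinstalling anything. The following is a minimal sketch using only base R; the list of package names is an illustrative subset chosen here, not an official list of rMAUPS requirements.

```r
# Illustrative subset of packages to check (an assumption, not an official list).
required <- c("ggplot2", "rMAUPS", "DESeq2", "limma")

# requireNamespace() returns FALSE instead of raising an error
# when a package cannot be loaded.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(is_installed)[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All checked packages are available.")
}
```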
The rMAUPS package includes two real LC-MS/MS data files ending with "export_proteins.txt", which are exported from the Proteome Discoverer software. Here, we will take the two datasets as an example to describe how to analyze the data using the rMAUPS pipeline. Before running the pipeline, the data can be preprocessed into a tidy format using the function normalizeProteomeDiscoverer.
datapath = system.file("extdata", package = "rMAUPS")
list.files(datapath, pattern = "export_proteins")
normalizeProteomeDiscoverer(datapath, output = "./", log2 = TRUE)
normdata = normalizeProteomeDiscoverer(file.path(datapath, "experiment1_export_proteins.txt"), log2 = TRUE, return = TRUE)
head(normdata)
After preprocessing the datasets, you can run the rMAUPS pipeline quickly. The pipeline requires a metadata file, which specifies the path to the datasets, the list of samples and their experimental conditions, and the design matrix of the comparisons.
rMAUPS includes an example metadata file. You can read the file metadata.csv and check its format.
metadata = read.csv(file.path(datapath, "metadata.csv"))
head(metadata)
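To make the shape of such a table concrete, here is a hypothetical sketch of a metadata data frame built by hand. Only the Experiment column name appears in this vignette; the Sample and Condition names, the sample labels, and the omission of the comparison columns are assumptions for illustration, so check the bundled metadata.csv for the real layout.

```r
# Hypothetical metadata layout. Only "Experiment" is a column name confirmed
# by this vignette; "Sample" and "Condition" are assumed names.
metadata_sketch <- data.frame(
  Experiment = rep("experiment1_normdata.csv", 4),
  Sample     = c("ctrl_1", "ctrl_2", "treat_1", "treat_2"),
  Condition  = c("Control", "Control", "Treatment", "Treatment"),
  stringsAsFactors = FALSE
)

# Sanity checks: sample names are unique and each condition has replicates.
stopifnot(!anyDuplicated(metadata_sketch$Sample))
replicates <- table(metadata_sketch$Condition)
replicates
```

The comparison (design) columns are left out on purpose, because their encoding is not described in the text above.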
After configuring the metadata, you can run the pipeline with a one-line command.
MAUPSr(metadata, outdir = "analysis/")
## Or
MAUPSr(system.file("extdata", "metadata.csv", package = "rMAUPS"), outdir = "analysis/")
To better display all the results, we developed a mini Shiny app that shows all the rMAUPS results in a webpage. You can open it using the function view.
view(outdir = "analysis/")
Enter the path to the rMAUPS results, e.g. "analysis/" here, and click submit; all the figures will then be loaded on the webpage. Loading all the figures takes a few seconds, so please be patient after clicking submit.
Besides the quick run of the rMAUPS pipeline, you can also perform a step-by-step analysis using the functions in rMAUPS. You can perform quality control using the function ProteomicsQC, normalize the proteomics data using normalizeProteomics, impute the missing values using imputeNA, perform differential analysis using DEAnalyze, and test the differential abundance of protein complexes or pathways using DeComplex.
To illustrate quality control and imputation, we randomly set 10% of the values in the data to NA.
data = as.matrix(normdata[,-1])
meta = metadata[metadata$Experiment=="experiment1_normdata.csv", -1]
rownames(meta) = meta[,1]
simulated = data
idx = sample(1:length(simulated), round(0.1*length(simulated)))
simulated[idx] = NA
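Because sample() draws different positions on every run, the masked cells change between sessions. The sketch below repeats the same masking idea on a simulated matrix with a fixed seed, so it runs without rMAUPS; the matrix dimensions, values, and seed are arbitrary choices made for illustration.

```r
set.seed(42)  # fix the seed so the same cells are masked on every run

# Simulated stand-in for a protein-by-sample matrix (not real data).
toy <- matrix(rnorm(200 * 6, mean = 20, sd = 2), nrow = 200,
              dimnames = list(paste0("P", 1:200), paste0("S", 1:6)))

# Mask 10% of all cells, sampled without replacement.
masked <- toy
idx_toy <- sample(length(masked), round(0.1 * length(masked)))
masked[idx_toy] <- NA

frac_missing <- mean(is.na(masked))       # overall fraction of missing values
per_sample   <- colMeans(is.na(masked))   # missingness per sample
frac_missing
per_sample
```

Checking the per-sample missingness is a quick way to confirm that the masking is spread across samples rather than concentrated in one column.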
qc = ProteomicsQC(simulated, condition = meta[colnames(data), 2], proj.name = "TestQC")
qc$p1
qc$p2
qc$p4
qc$p5
qc$p6
qc$p7
rMAUPS provides a function, normalizeProteomics, to normalize proteomics data. You can easily normalize the data using one of several methods, such as median normalization, median ratio normalization, z-score normalization, quantile normalization and loess normalization.
normalized = normalizeProteomics(simulated, norm = "median", log2 = FALSE)
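For intuition about what median normalization does, here is a minimal base R sketch on simulated log2 intensities. It uses one common definition (shift every sample so that all samples share the same median), which may differ in detail from what normalizeProteomics does with norm = "median".

```r
set.seed(1)

# Simulated log2 intensities with a different offset per sample (not real data).
offsets <- c(0, 0.5, -0.3, 1)
toy_int <- matrix(rnorm(100 * 4, mean = 20), nrow = 100) + rep(offsets, each = 100)
colnames(toy_int) <- paste0("S", 1:4)

# Shift each sample so that all samples share a common median
# (here, the mean of the original sample medians).
sample_medians <- apply(toy_int, 2, median, na.rm = TRUE)
median_norm <- sweep(toy_int, 2, sample_medians - mean(sample_medians), "-")

medians_after <- apply(median_norm, 2, median, na.rm = TRUE)
medians_after  # the medians now agree across samples
```

On log-scale data a subtraction like this corresponds to dividing by a per-sample scaling factor on the raw scale.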
imputed = imputeNA(normalized)
plot(imputed[idx], data[idx])
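The scatter plot compares imputed values with the values that were masked. The same comparison can be summarized numerically, as in the sketch below. It uses simulated data and a simple row-mean imputation as a stand-in, so it says nothing about the method imputeNA actually uses; all object names here are illustrative.

```r
set.seed(7)

# Simulated protein-by-sample matrix with a protein-specific baseline.
truth <- matrix(rnorm(300 * 5, sd = 0.3), nrow = 300) + rnorm(300, mean = 20, sd = 2)
holdout <- sample(length(truth), round(0.1 * length(truth)))
observed <- truth
observed[holdout] <- NA

# Stand-in imputation: replace each NA with the mean of the observed
# values of the same protein (falling back to the global mean).
row_means <- rowMeans(observed, na.rm = TRUE)
row_means[is.nan(row_means)] <- mean(observed, na.rm = TRUE)
filled <- observed
na_pos <- which(is.na(filled), arr.ind = TRUE)
filled[na_pos] <- row_means[na_pos[, "row"]]

# Agreement between imputed values and the held-out true values.
rmse <- sqrt(mean((filled[holdout] - truth[holdout])^2))
pearson <- cor(filled[holdout], truth[holdout])
c(rmse = rmse, pearson = pearson)
```

A lower RMSE and a correlation closer to 1 indicate that the imputed values track the masked values more closely.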
After normalization and imputation, you can perform the quality control analysis again.
rMAUPS provides an integrated function, DEAnalyze, to perform differential expression analysis for both RNA-seq data and proteomics data. For RNA-seq data, type = "RNAseq", method = "DESeq2" is recommended; for label-free proteomics, type = "msms", method = "msms.edgeR" is suggested; for isobaric labeling-based relative quantification of proteomics, type = "msms", method = "limma" is preferred.
deres = DEAnalyze(data, meta[,-1], type = "msms", method = "limma")
## Visualize the results
deres$logFDR = log10(deres$padj)
ScatterView(deres, x = "log2FC", y = "logFDR", x_cut = c(-0.5,0.5), y_cut = -2, groups = c("bottomleft", "bottomright"), top = 5)
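The cutoffs passed to ScatterView can also be applied directly to the result table to count the proteins in each highlighted group. The sketch below does this on a simulated table that reuses the log2FC and padj column names from the code above; the values are random, and it assumes that "bottomleft" and "bottomright" mean log10(FDR) below -2 combined with log2FC below -0.5 or above 0.5.

```r
set.seed(3)

# Simulated differential-analysis table (random values, not real results).
toy_de <- data.frame(
  log2FC = rnorm(500, sd = 0.6),
  padj   = runif(500)
)
toy_de$padj[1:20] <- 1e-4              # force a few clearly significant rows
toy_de$logFDR <- log10(toy_de$padj)

# Assumed meaning of the two highlighted groups.
down <- toy_de$log2FC < -0.5 & toy_de$logFDR < -2   # "bottomleft"
up   <- toy_de$log2FC >  0.5 & toy_de$logFDR < -2   # "bottomright"
c(down = sum(down), up = sum(up))

# Most significant decreased proteins, strongest first.
head(toy_de[down, ][order(toy_de$logFDR[down]), ], 5)
```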
For proteomics data, analysis at the protein complex level is informative. We therefore designed the function DeComplex to combine the differential abundance of proteins into differential levels of protein complexes or biological pathways.
res = DeComplex(deres)
head(res$deComplex)
res$gobp.p
res$reactome.p
res$gocc.p
res$corum.p
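As a rough illustration of the general idea of combining protein-level statistics into a complex-level score, the sketch below applies Stouffer's method to simulated z-scores. This is a generic technique chosen for illustration, not a description of how DeComplex works; the protein names and the protein-to-complex mapping are hypothetical.

```r
set.seed(11)

# Simulated protein-level z-scores and a hypothetical protein-to-complex mapping.
protein_z <- setNames(rnorm(12), paste0("P", 1:12))
protein_z[1:4] <- protein_z[1:4] + 3   # shift the members of complex A upward
complexes <- list(A = paste0("P", 1:4),
                  B = paste0("P", 5:8),
                  C = paste0("P", 9:12))

# Stouffer's method: sum of member z-scores divided by sqrt(number of members).
complex_z <- vapply(complexes, function(members) {
  z <- protein_z[members]
  sum(z) / sqrt(length(z))
}, numeric(1))

complex_p <- 2 * pnorm(-abs(complex_z))   # two-sided p-value
data.frame(complex = names(complexes), z = complex_z, p = complex_p)
```

Stouffer's method assumes independent z-scores; members of the same complex are often correlated, so p-values from a sketch like this tend to be too small.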
sessionInfo()