PEIMAN2-vignette
In PEIMAN2: Post-Translational Modification Enrichment, Integration, and Matching Analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1 Introduction

The annotation enrichment analysis increases the chance of identifying relevant biological features in a list of genes or proteins. The post translational enrichment, integration, and matching analysis (PEIMAN v1) software was introduced to provide a systematic framework to identify more probable and enriched post-translational modification (PTM) terms in a list of proteins obtained from high-throughput technologies [@nickchi2015]. PEIMAN maps a large list of proteins to PTM keywords and test for their statistical significance, using a hypergeometric test. PEIMAN uses the most traditional way of enrichment analysis, by getting a list of proteins selected by user, and search for enriched PTM terms one by one. This strategy is called Singular Enrichment Analysis or SEA. Although this is a very promising approach for identifying biological features, the quality of selected list by researcher can potentially affect results at the end of the analysis.

To avoid this problem, we extend our enrichment framework to a wider class of enrichment analysis called Gene Set Enrichment Analysis or GSEA [@Subramanian2005]. The underlying idea of GSEA is very similar to SEA. Instead of applying a cutoff on input genes obtained from micro array experiments (either p-value or fold-change in gene expression), a 'no-cutoff' strategy is considered. The immediate benefits of this approach is to reduce the bias of gene selection and include genes with a low change in their expression level to participate in final analysis. The maximum value of the running score profile for ranked genes in each enrichment category is then calculated and compared with random scores obtained from permutation. More details on [@Subramanian2005]. This framework can be expanded to enrichment analysis in proteins. Inspired by GSEA idea, we here introduce a package in R for Protein Set Enrichment Analysis (PSEA).

The database in PEIMAN package updates monthly according to changes in UniProt. The package can be used to perform singular enrichment analysis (SEA) and visualize the results. PEIMAN can also be used to match and integrate results of two SEA analysis (for the same species) by visualizing their common PTM terms. To correct for biases in SEA, we implement protein set enrichment analysis (PSEA) as a new tool for computational community. Researchers can use this package to run PSEA and visualize the results.

Figure1: Our suggested workflow for a PTM-centric proteomics using PEIMAN software v2.0 {width=75%}

2 Example data

We consider two example datasets to demonstrate the features of our package.

1) exmplData1: We use the first example data for single enrichment analysis. This dataset contains two list of human proteins randomly selected from UniProt. The first list contains 45 proteins and the second list contains of 97 randomly selected proteins. Both lists belongs to Homo Sapiens (Human). Note: Only the first six proteins in each list are shown below.

knitr::kable( head(PEIMAN2::exmplData1$pl1), col.names = '', caption = '')
knitr::kable( head(PEIMAN2::exmplData1$pl2), col.names = '', caption = '')

2) exmplData2: We will use the second dataset to perform protein set enrichment analysis or PSEA. The dataset is described in [@gholizadeh2021].

knitr::kable(PEIMAN2::exmplData2[1:6,], caption='beatAML dataset samples')

3 Singular Enrichment analysis (SEA)

In this section, we introduce the functions related to singular enrichment analysis or SEA in PEIMAN2 package. The functions in this section are divided into two parts, functions for enrichment and functions for plotting. We use exmplData1 in this part.

3.1 Enrichment

runEnrichment() function can be used to run singular enrichment analysis for one list of protein. This function takes the following inputs:

protein which is a character vector with protein UniProt accession codes.
os.name which is a character vector of length one with exact taxonomy name of species.
p.adj.method which is pvalue adjustment methos and optional. By default the value is set to 'BH'. To see a possible list of values, type p.adjust.methods in R console.

As it was mentioned, the taxonomy name of species must be provided, e.g for a list of proteins belongs to human we pass os.name as 'Homo sapiens (Human)'. The list is available at UniProt website. We also included a helper function named getTaxonomyName to help getting the exact taxonomy name. More on this function later.

The following lines of code illustrate the steps to run SEA on exmplData1. In runEnrichment function, we pass pl1 (a character vector of UniProt accession code) to perform SEA as follows and save the results in enrich1.

# Load PEIMAN2 package
library(PEIMAN2)

# Extract dataset and assign a variable name to it
pl1 <- exmplData1$pl1

# Run SEA on the list
enrich1 <- runEnrichment(protein = pl1, os.name = 'Homo sapiens (Human)')

The function returns a dataframe with the following columns:

PTM: Post-translational modification (PTM).
Freq in Population: The total number of proteins with this PTM in UniProt (or background list).
Freq in Sample: The total number of proteins with this PTM in the list.
Sample: Number of proteins in the given list.
Population: Total number of proteins in the current version of PEIMAN database (or background list).
pvalue: The p-value obtained from hypergeometric test (enrichment analysis).
corrected pvalue: Adjusted p-value to correct for multiple testing.
AC: Uniprot accession code (AC) of proteins with each PTM.

knitr::kable( head(enrich1), format = 'html')

Note: As it was mentioned, the os.name is the exact taxonomy name of species that you are working with. The name should be exactly the same as UniProt definition. To facilitate searching for this name, you can pass your protein list with UniProt accession ID to getTaxonomyName function as follows. The result is the exact taxonomy name of protein list that you need to pass to runEnrichment. In the following example, the exact taxonomy name is printed:

getTaxonomyName(x = exmplData1$pl1)

Similarly, we can run SEA for the second list of proteins:

# Extract dataset and assign a variable name to it
pl2 <- exmplData1$pl2

# Run SEA on the list
enrich2 <- runEnrichment(protein = pl2, os.name = 'Homo sapiens (Human)')

knitr::kable( head(enrich2), format = 'html')

3.2 Plotting SEA results

The plotEnrichment function can be used to visualize singular enrichment analysis for one set of proteins or match, analyse, and integrate results for two sets of proteins. To read more about this match and integration, please read details at [@nickchi2015]. We start by plotting the results for the firs list.

plotEnrichment(x = enrich1, sig.level = 0.05)

The results is a Lollipop plot which presents "Relative frequency" of each "PTM keywords" along with their corrected p-value measured in log scale. Note that only significant PTMs are shown. The default value for significance level is 5 percent. One can also visualize and match the results of two enrichment. For example, we can see the integrated results of enrich1 and enrich2 by the following line of code:

plotEnrichment(x = enrich1, y = enrich2, sig.level = 0.05)

The plot presents the 'Relative frequency' of common PTM terms among two enriched list (x and y). The coloring is the corrected p-value measured in log scale. By default a significance level of 5 percent is set to filter results. This can be modified by sig.level parameter.

4 Protein set enrichment analysis (PSEA)

In this section, we introduce the functions for protein set enrichment analysis (PSEA). The functions in this section are divided into two parts, functions for PSEA and functions for plotting the results. We use exmplData2 in this part.

4.1 PSEA

In order to run protein set enrichment analysis (PSEA), you can use runPSEA function. This function takes the following inputs:

protein: A character vector with protein UniProt accession.
os.name: A character of length one for the exact name of organism name.
pexponent: Enrichment weighting exponent, p. The default value is 1. For values of p < 1, one can detect incoherent patterns in a set of protein. If one expects a small number of proteins to be coherent in a large set, then p > 1 is a good choice.
nperm: Number of permutation to adjust for multiple testing in different PTM terms. Default is 1000.
p.adj.method: The adjustment method to correct for multiple testing. Run p.adjust.methods to get a list of possible methods.
sig.level: The significance level to filter PTM terms (applies to adjusted p-value), 0.05 is the default value.
minSize: PTM terms with a lower number of proteins than minSize are excluded. The default value is one.

psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 1000)

The result is a list with 6 elements. The first element of this list is important: A dataframe with protein set enrichment analysis (PSEA) results. Every row corresponds to a post-translational modification (PTM) term with the following columns:

pval: p-value for singular enrichment analysis.
pvaladj: Adjusted p-value
ES: Enrichment score
NES: Enrichmnt score normalized to mean enrichment of random samples of the same size.
nMoreExtreme: Number of times the permuted sample resulted in profile with ES larger than abs(ES original)
size: Number of proteins with a particular PTM term.
Enrichment: Whether the proteins with a certain PTM term have been enriched in the list.
leadingEdge: UniProt accession code of leading edge proteins that drive the enrichment.

knitr::kable(psea_res[[1]], format = 'html')

4.2 Plotting

We now introduce the plotting features for protein set enrichment analysis. Two functions are included to visualize PSEA results returned from runPSEA function. The first plot is generated by plotPSEA function and shows Normalized Enrichment Score (NES) for each PTM term. User can restrict the number of PTM terms to draw based by adjusting sig.level parameter (default value is 0.05). The coloring of the plot indicates if the PTM term is enriched or not.

plotPSEA(x = psea_res)

The second plot is generated by plotRunningScore function. A running enrichment score plot for each PTM can be plotted.

plotRunningScore(x = psea_res)

5. Translate PEIMAN results for Mass spectrometry searching tools

In addition to the introduced features and extensions from previous version, the results from PEIMAN can also be utilized in Mass spectrometry searching tools. The enriched PTM terms in list of proteins generated by runPSEA function in the previous step can be searched in subset of protein modifications database. psea2mass function takes PSEA results and a significant level (default value is 0.05) and returns protein modification of statistically significant PTM terms for later searches in mass spectrometry tools. Note that p-values obtained from permutation is used to identify significant PTM terms. As an example on how the function works, continuing from exmplData2 for PSEA, we call psea2mass function as follows:

MS <- psea2mass(x = psea_res, sig.level = 0.05)
MS

Note that list of proteins generated by runEnrichment function can be passed to sea2mass function too.

References

Any scripts or data that you put into this service are public.

PEIMAN2 documentation built on April 11, 2025, 6:12 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

PEIMAN2
Post-Translational Modification Enrichment, Integration, and Matching Analysis

PEIMAN2-vignette
In PEIMAN2: Post-Translational Modification Enrichment, Integration, and Matching Analysis

1 Introduction

2 Example data

3 Singular Enrichment analysis (SEA)

3.1 Enrichment

3.2 Plotting SEA results

4 Protein set enrichment analysis (PSEA)

4.1 PSEA

4.2 Plotting

5. Translate PEIMAN results for Mass spectrometry searching tools

References

Try the PEIMAN2 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PEIMAN2 Post-Translational Modification Enrichment, Integration, and Matching Analysis

PEIMAN2-vignette In PEIMAN2: Post-Translational Modification Enrichment, Integration, and Matching Analysis

1 Introduction

2 Example data

3 Singular Enrichment analysis (SEA)

3.1 Enrichment

3.2 Plotting SEA results

4 Protein set enrichment analysis (PSEA)

4.1 PSEA

4.2 Plotting

5. Translate PEIMAN results for Mass spectrometry searching tools

References

Try the PEIMAN2 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PEIMAN2
Post-Translational Modification Enrichment, Integration, and Matching Analysis

PEIMAN2-vignette
In PEIMAN2: Post-Translational Modification Enrichment, Integration, and Matching Analysis