metaprotr: R package for post-processing metaproteomics data

License: GPL v3


Set of tools for descriptive analysis of metaproteomics data generated from high-throughput mass spectrometry instruments. These tools allow to cluster peptides and proteins abundance, expressed as spectral counts, and to manipulate them in groups of metaproteins.

This information can be represented using multiple visualization functions to portray the global metaproteome landscape and to differentiate samples or conditions, in terms of abundance of metaproteins, taxonomic levels and/or functional annotation.

The provided tools allow to implement flexible analytical pipelines that can be easily applied to studies interested in metaproteomics analysis.

Application case

Pipeline to analyse the metaproteomes of gut microbiota

A curated R script is available with the detailed instructions to analyse intestinal microbiota.

Data inputs

The required files to use the package are :

  1. Peptide abundances expressed as spectral counts. This file is generated from X!Tandempipeline using an adapted iterative approach described by Bassignani, 2019. Contact PAPPSO for more details. This file should have the first seven columns named:
  2. Group: protein group number, proteins are grouped together if they share at least one peptide
  3. Peptide: a unique reference of the identified peptide
  4. Sequence: peptide sequence
  5. Modifs: textual informations of peptide modifications
  6. MhTheo: theoretical MH+ of the peptide
  7. Charge: list of all possible peptides charges
  8. Subgroup: protein subgroup number, proteins inside a group sharing exactly the same set of peptides (indistinguishable)
  9. The next columns should contain the peptide abundances as spectral counts. The name of the columns should be identical to the content of the column msrunfile from the metadata information.

  10. List of protein names associated to the identified peptides. This file should have eight columns named:

  11. Group: protein group number, proteins are grouped together if they share at least one peptide
  12. Subgroup: protein subgroup number, proteins inside a group sharing exactly the same set of peptides (indistinguishable)
  13. Protein: protein number, a single reference to the protein inside the subgroup
  14. Description: protein information obtained from the fasta database at the stage of identification
  15. Total: total number of spectra per protein
  16. Specific: total number of spectra that are specific to a subgroup of proteins. It is only available if there are more than one subgroup within a group
  17. Specific Unique: number of unique peptide sequence specific to this subgroup of proteins. It is only available if there are more than one subgroup within a group.
  18. SubGroup count: number of subgroups (also known as metaproteins) per group

  19. Metadata information. At least three columns must be present and named as:

  20. SC_name: sample names assigned by the user
  21. msrunfile: name of samples as indicated in the corresponding columns of peptide abundances
  22. SampleID: the content should indicate the experimental groups
  23. Additional columns containing complementary information can be added by the user (ex. replicates, order of injection, etc.). The separation between columns should be indicated by tabulation

  24. Catalog of genes with taxonomic annotations with the following format:

  25. The first column named gene must contain the same identifiers of those present in the column Description from the list of proteins
  26. Another column named organism containing the name of the strain assigned to a given protein
  27. A column named The taxonomic classification can be obtained from a tool of sequences aligment and must be ordered by species, genus, family, order, class, phylum and superkingdom. The characters inside must be concatenated by a comma (ex."Streptococcus anginosus,Streptococcus,Streptococcaceae,Lactobacillales,Bacilli,Firmicutes,Bacteria"). For the application case you can download the Integrated non-redundant Gene Catalog (IGC) 9.9 database. DOI

  28. Functional annotations of genes (optional). The functional annotations from the Kyoto Encyclopedia of Genes and Genomes (KEGG) were added to the IGC 9.9 database. DOI. This file should include two columns named:

  29. gene_name: indicating the same protein names to those present in the gene column from the file with taxonomic annotations
  30. ko: indicating the KEGG Orthology code assigned to a given protein

Alt text


Checkout the documentation and the cheatsheet that displays the available functions on metaprotr.

Contribute to the project

Everybody is welcome to contribute to the metaprotr.

Indicate errors :warning: :bangbang:

If you found an error please describe it in the issues section and address it to the package mantainer.

Please provide the following information: * Summarize the bug encountered concisely. * What is the current bug behavior? * What is the expected correct behavior? * Describe the steps to reproduce it. * Paste logs and/or screenshots. * Add possible fixes.

Add modifications :star: :thumbsup:

To improve, modify or add a new feature/function to the project please follow this procedure:

  1. Create a new branch from "stable" and name it with the feature/function that you will work on.

  2. Make changes and commits to this branch while developing.

When making commits it is recommended to use the following graphical identifiers:

| Identifier | Code | Description | |-----------------------|------------------------:|----------------:| | :lollipop: | : lollipop : | Minor change (ex. comment, renaming) | | :pencil2: | : pencil2 : | New code | | :wrench: | : wrench : | Code refactoring | | :checkered_flag: | : checkered_flag : | code test, check or verification | | :bug: | : bug : | bug detected |


  git commit -m ':pencil2: writing core logic of an awesome function'
  1. Make a pull request to the branch "stable".

Try the metaprotr package in your browser

Any scripts or data that you put into this service are public.

metaprotr documentation built on Feb. 5, 2021, 9:06 a.m.