add_kegg: add_kegg

View source: R/metaprotr_add_kegg.R

Description

Integrates a database containing the functional annotation of the identified metaproteins into a list defined as "spectral_count_object". The proteins in the "spectral_count_object" must contain taxonomic information. The functional annotation comes from the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology database, which represents molecular functions in terms of functional orthologs (KO terms). Check KEGG for more details.

Usage

add_kegg(
  spectral_count_object,
  annotation_db,
  taxonomic_db,
  metaproteome_origin,
  protein_file,
  peptide_file,
  text_to_filter = "HUMAN",
  taxonomic_levels_allowed = 1
)

Arguments

spectral_count_object

List defined as "spectral_count_object" containing the abundance of the elements (groups, subgroups or peptides) expressed as spectral counts and organized by taxonomic levels. The format of this object is similar to that generated from the function "crumble_taxonomy".

annotation_db

Dataframe containing the functional annotation of the proteins. This dataframe must contain two variables: i) "gene_name": containing the same protein names as those present in the variable "Accession" of "peptides_proteins", the third dataframe in the list defined as "spectral_count_object"; and ii) "ko": containing the KEGG Orthology code assigned to a given protein. An example can be found in this repository.
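As a minimal sketch of the layout described above, the toy dataframe below carries the two required variables. The accessions and KO codes are made-up placeholders, and the merge at the end only illustrates how "gene_name" is expected to line up with "Accession"; it is an assumption for illustration, not the package's internal code.

```r
# Toy annotation table; accession names and KO codes are illustrative only
annotation_db <- data.frame(
  gene_name = c("prot_A", "prot_B", "prot_C"),
  ko = c("K00001", "K00002", "K00003"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the "peptides_proteins" dataframe (hypothetical content)
peptides_proteins <- data.frame(
  Accession = c("prot_A", "prot_C", "prot_D"),
  stringsAsFactors = FALSE
)

# Left join on Accession == gene_name; proteins without annotation get NA
annotated <- merge(
  peptides_proteins, annotation_db,
  by.x = "Accession", by.y = "gene_name",
  all.x = TRUE
)
print(annotated)
```

Proteins missing from "annotation_db" (here "prot_D") keep their row and receive NA in "ko", which is one simple way to spot accessions whose names do not match between the two tables.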

taxonomic_db

Dataframe containing the taxonomic information for each protein. The first column must contain the same identifiers as those present in the column "Accession" of the dataframe "peptides_proteins" of the "metaproteome_object". Two additional columns must be present: i) one named "organism", containing the name of the strain assigned to a given protein; and ii) another named "species.genus.family.order.class.phylum.superkingdom". The taxonomic classification can be obtained from a sequence alignment tool and must be ordered as follows: species, genus, family, order, class, phylum and superkingdom. The ranks must be concatenated with commas and no spaces between them (e.g. "Streptococcus anginosus,Streptococcus,Streptococcaceae,Lactobacillales,Bacilli,Firmicutes,Bacteria"). An example can be found in this repository.
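The required layout can be sketched with a one-row toy table. The accession, the strain name and the name of the first column are hypothetical; only the "organism" column, the long taxonomy column name and the example lineage come from the description above. Splitting the string back into its ranks is a quick way to check that all seven levels are present and in the expected order.

```r
# Column name required by the description above
taxo_col <- "species.genus.family.order.class.phylum.superkingdom"

# Toy taxonomic table; the accession and strain name are hypothetical
taxonomic_db <- data.frame(
  accession = "prot_A",
  organism = "Streptococcus anginosus strain X",
  stringsAsFactors = FALSE
)
taxonomic_db[[taxo_col]] <- paste(
  "Streptococcus anginosus", "Streptococcus", "Streptococcaceae",
  "Lactobacillales", "Bacilli", "Firmicutes", "Bacteria",
  sep = ","
)

# Split the string into its seven ranks to check order and completeness
ranks <- strsplit(taxo_col, ".", fixed = TRUE)[[1]]
lineage <- strsplit(taxonomic_db[[taxo_col]][1], ",", fixed = TRUE)[[1]]
names(lineage) <- ranks
stopifnot(length(lineage) == 7)
print(lineage)
```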

metaproteome_origin

List defined as "metaproteome_object" generated from the function 'load_protspeps'.

protein_file

Character indicating the location of a txt file containing the list of proteins generated in X!TandemPipeline using an adapted iterative approach described by Bassignani, 2019. Separation between columns should be indicated by tabulation. For more details regarding data input check format examples.

peptide_file

Character indicating the location of a txt file containing peptide abundances expressed as spectral counts. This file is generated from X!TandemPipeline using an adapted iterative approach described by Bassignani, 2019. Separation between columns should be indicated by tabulation. For more details regarding data input check format examples.

text_to_filter

Character containing a text fragment to be searched for in the "Description" column of the protein file. All elements whose description contains this text will be removed. The default value is "HUMAN".
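The effect of this argument can be sketched on a toy protein table. The accessions and descriptions are invented, and the literal, case-sensitive match used here is an assumption: the documentation does not state how the package performs the search internally.

```r
# Toy protein table; only the "Description" column matters here
proteins <- data.frame(
  Accession = c("prot_A", "prot_B", "prot_C"),
  Description = c(
    "hypothetical protein 1",
    "sp|P00000|EXAMPLE_HUMAN example entry",
    "hypothetical protein 2"
  ),
  stringsAsFactors = FALSE
)

text_to_filter <- "HUMAN"

# Drop every row whose description contains the text (literal match)
kept <- proteins[!grepl(text_to_filter, proteins$Description, fixed = TRUE), ]
print(kept$Accession)
```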

taxonomic_levels_allowed

Numeric value indicating the maximum number of taxonomic levels allowed per spectral group or subgroup (depending on the type of spectral data). The default value is 1.

Value

A list defined as "spectral_count_object" with the functional annotation added to the identified proteins. A new column is added to the dataframe "peptides_proteins". Two quality control plots are also generated: one with the number of taxonomic entities per spectral level and another with the number of KO terms per spectral level.

Examples

## Not run: 

# Download the functional and taxonomic annotation databases: https://zenodo.org/record/3997093#.X0UYI6Zb_mE
meta99_full_taxo <- read.csv2("full_taxonomy_MetaHIT99.tsv", header = TRUE, sep = "\t")
kegg_db <- read.csv2("hs_9_9_igc_vs_kegg89.table", header = TRUE, sep = "\t")

# Files with spectral abundances and the protein list from X!TandemPipeline
protein_file <- "your/specific/location/protein_list.txt"
peptide_file <- "your/specific/location/peptide_counting.txt"
metadata_file <- "your/location/metadata.csv"

metaproteome_origin <- load_protspeps(protein_file, peptide_file, metadata_file)

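# SC_subgroups is not created in this example; it is assumed to be a
# "spectral_count_object" with taxonomic information prepared beforehand
SCsgp_species <- crumble_taxonomy(SC_subgroups, "species")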

SCsgp_species_annot <- add_kegg(
  SCsgp_species, 
  kegg_db, 
  meta99_full_taxo, 
  metaproteome_origin, 
  protein_file, 
  peptide_file, 
  text_to_filter = "HUMAN"
)


## End(Not run)

metaprotr documentation built on Feb. 5, 2021, 9:06 a.m.