merge_proteingroups_by_gene    R Documentation

Merge protein identifiers that map to the same unique set of gene symbols

Description

Suppose a dataset contains 3 proteingroups that map to the following sets of unique gene symbols:

  1. GRIA2

  2. GRIA2

  3. GRIA2;GRIA1

The third proteingroup is ambiguous: its peptides were matched to 2 genes by the upstream software that generated the dataset (which you imported with e.g. import_dataset_diann()). This function merges all peptides that belong to proteingroups 1 and 2 into one new proteingroup that maps uniquely to GRIA2. Proteingroup 3 remains a distinct proteingroup, because only proteingroups with exactly the same set of gene symbols are merged (technical detail: matching uses the definitions in dataset$proteins$gene_symbols_or_id).

This function will fully update the dataset's protein table and the protein_id column in the peptide table.
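The merge logic described above can be sketched on toy data in base R. This is only an illustration, not the package's implementation: the tables are reduced to the two columns named in this page, the peptide_id column and the scheme of joining merged identifiers with ";" are assumptions, and gene symbol sets are compared as plain strings.

```r
# Toy protein table with the three proteingroups from the description.
# Real msdap tables contain more columns (assumption for illustration).
proteins <- data.frame(
  protein_id = c("P1", "P2", "P3"),
  gene_symbols_or_id = c("GRIA2", "GRIA2", "GRIA2;GRIA1"),
  stringsAsFactors = FALSE
)
# Toy peptide table; peptide_id is a hypothetical column name.
peptides <- data.frame(
  peptide_id = c("pepA", "pepB", "pepC", "pepD"),
  protein_id = c("P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Collect the protein IDs that share an identical gene symbol string.
groups <- split(proteins$protein_id, proteins$gene_symbols_or_id)
# One merged identifier per gene symbol set (hypothetical naming scheme).
new_ids <- vapply(groups, paste, character(1), collapse = ";")
proteins$new_protein_id <- unname(new_ids[proteins$gene_symbols_or_id])

# Point every peptide at the merged proteingroup it now belongs to.
peptides$protein_id <- proteins$new_protein_id[
  match(peptides$protein_id, proteins$protein_id)
]

# Collapse the protein table to one row per merged proteingroup.
merged_proteins <- unique(data.frame(
  protein_id = proteins$new_protein_id,
  gene_symbols_or_id = proteins$gene_symbols_or_id,
  stringsAsFactors = FALSE
))
```

In this sketch, proteingroups P1 and P2 collapse into a single row for GRIA2, while P3 (GRIA2;GRIA1) is left untouched because its gene symbol string differs.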

In most use cases, call this function immediately after import_fasta(), as shown in the Examples section.

Usage

merge_proteingroups_by_gene(dataset)

Arguments

dataset

A valid dataset. Prior to calling this function, you must import protein metadata from FASTA using the import_fasta() function.

Examples

## Not run: 
# Example usage; update the file paths to match your own data
library(msdap)
# import a DIA-NN report
dataset = import_dataset_diann("C:/data/diann_report.tsv")
# import protein metadata from FASTA (required before merging)
dataset = import_fasta(dataset, files = "C:/data/uniprot_human_2020-01.fasta")
# merge proteingroups that map to the same set of gene symbols
dataset = merge_proteingroups_by_gene(dataset)

## End(Not run)
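A quick way to see whether a protein table still contains proteingroups that would be merged is to look for repeated gene symbol sets. The helper below is a hypothetical sketch (shared_gene_sets is not part of msdap) and is demonstrated on a toy table. It relies only on the gene_symbols_or_id column mentioned in the Description.

```r
# Hypothetical helper: returns the gene symbol sets that are shared by
# more than one proteingroup in a protein table.
shared_gene_sets <- function(proteins) {
  counts <- table(proteins$gene_symbols_or_id)
  names(counts)[as.vector(counts) > 1]
}

# Toy protein table before merging (assumption for illustration).
toy_before <- data.frame(
  protein_id = c("P1", "P2", "P3"),
  gene_symbols_or_id = c("GRIA2", "GRIA2", "GRIA2;GRIA1"),
  stringsAsFactors = FALSE
)
# Toy protein table after merging P1 and P2.
toy_after <- data.frame(
  protein_id = c("P1;P2", "P3"),
  gene_symbols_or_id = c("GRIA2", "GRIA2;GRIA1"),
  stringsAsFactors = FALSE
)

before <- shared_gene_sets(toy_before)
after <- shared_gene_sets(toy_after)
```

On a real dataset, one would presumably call shared_gene_sets(dataset$proteins) before and after merge_proteingroups_by_gene() and expect the second result to be empty.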

ftwkoopmans/msdap documentation built on March 5, 2025, 12:15 a.m.