aggregate_pep: Aggregate peptide abundances to protein abundances
In m-jahn/R-tools: Utility and wrapper functions for bioinformatics work

View source: R/aggregate_pep.R

aggregate_pep

R Documentation

Description

Similar to the OpenMS module ProteinQuantifier, this function provides several methods to aggregate peptide intensities to their parent proteins. It is mainly intended for use with (raw) Diffacto results: a table of peptide intensities and covariation scores (weights) that can be used to filter peptides before aggregating them to protein abundances.
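
The filter-then-aggregate idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the toy data frame is hypothetical, and keeping peptides whose weight is at least the threshold (`>=`) is an assumption about how the cutoff is applied.

```r
# Hypothetical toy input: four peptides mapping to two proteins
pep <- data.frame(
  protein = c("A", "A", "B", "B"),
  peptide = c("p1", "p2", "p3", "p4"),
  weight  = c(0.9, 0.3, 0.7, 0.6),
  ab1     = c(10, 20, 30, 40),
  ab2     = c(1, 2, 3, 4)
)

# Assumption: keep peptides whose weight is at least the threshold
weight_threshold <- 0.5
kept <- pep[pep$weight >= weight_threshold, ]

# Sum the remaining peptide intensities per protein and sample column
prot <- aggregate(cbind(ab1, ab2) ~ protein, data = kept, FUN = sum)
print(prot)
```

Peptide `p2` falls below the cutoff here, so protein A is quantified from `p1` alone, while protein B is the sum of `p3` and `p4`.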

Usage

aggregate_pep(
  data,
  sample_cols,
  protein_col,
  peptide_col,
  n_protein_col = NULL,
  split_ambiguous = FALSE,
  split_char = NULL,
  weight_col = NULL,
  weight_threshold = 0.5,
  method = "sum"
)

Arguments

`data`	the input data frame
`sample_cols`	(character) columns to be used for peptide aggregation
`protein_col`	(character) column containing unique protein IDs/names
`peptide_col`	(character) column containing unique peptide IDs/sequences
`n_protein_col`	(character) column containing the number of proteins annotated for this peptide. This column indicates ambiguous peptides whose abundance is shared among n proteins.
`split_ambiguous`	(logical) whether protein groups of ambiguous peptides should be split into individual proteins
`split_char`	(character) character by which to split protein groups
`weight_col`	(character) the column containing weights or covariation scores
`weight_threshold`	(numeric) covariation score (weight) cutoff; Diffacto's default is 0.5
`method`	(character) aggregation method, one of 'sum', 'weightedsum', 'mean', 'weightedmean', 'wgeomean'. The default is 'sum'.
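
To give a feel for how the `method` options differ, the sketch below applies textbook definitions of each to the peptides of a single hypothetical protein. The formulas are assumptions chosen for illustration (for example, 'wgeomean' is taken to be a weighted geometric mean); the package's own definitions may differ, so consult the source for the exact behavior.

```r
# Hypothetical intensities and weights for three peptides of one protein
x <- c(100, 200, 400)
w <- c(1.0, 0.5, 0.5)

# Textbook definitions, assumed for illustration; not taken from the package
agg <- list(
  sum          = function(x, w) sum(x),
  weightedsum  = function(x, w) sum(x * w),
  mean         = function(x, w) mean(x),
  weightedmean = function(x, w) weighted.mean(x, w),
  wgeomean     = function(x, w) exp(sum(w * log(x)) / sum(w))
)

# Apply every method to the same peptides and compare the results
result <- sapply(agg, function(f) f(x, w))
print(result)
```

With these numbers, the weighted variants down-weight the two peptides with weight 0.5, so 'weightedsum' is 400 rather than 700 and 'weightedmean' is 200 rather than about 233.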

Value

a data frame with aggregated protein intensities, one protein per row

Examples

# load additional dependencies
library(dplyr)
library(tidyr)

# generate an example data frame (seed fixed so the random abundances are reproducible)
set.seed(1)
df <- data.frame(
  protein = c("A", "B", "C", "C/D", "C/D/E", "E", "F", "G"),
  n_protein = c(1,1,1,2,3,1,1,1),
  weight = rep(1,8),
  peptide = letters[1:8],
  ab1 = sample(1:100, 8),
  ab2 = sample(1:100, 8),
  ab3 = sample(1:100, 8)
)

aggregate_pep(
  data = df,
  sample_cols = c("ab1", "ab2", "ab3"),
  protein_col = "protein",
  peptide_col = "peptide",
  n_protein_col = "n_protein",
  split_ambiguous = TRUE,
  split_char = "/",
  method = "sum"
)
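
The call above splits protein groups such as "C/D" on `split_char`. A rough base R sketch of what such a split could look like is given below. It is not the package's code: the toy data are hypothetical, and dividing a shared peptide's abundance equally among its n proteins is an assumption made for this illustration.

```r
# Hypothetical toy input with one peptide shared by two proteins
pep <- data.frame(
  protein   = c("C", "C/D"),
  n_protein = c(1, 2),
  ab1       = c(10, 30),
  stringsAsFactors = FALSE
)

# Split each protein group on the separator into individual IDs
ids <- strsplit(pep$protein, "/", fixed = TRUE)

# Repeat each peptide row once per protein in its group
long <- pep[rep(seq_len(nrow(pep)), lengths(ids)), ]
long$protein <- unlist(ids)

# Assumption: a shared peptide's abundance is divided equally among its n proteins
long$ab1 <- long$ab1 / long$n_protein

# Sum the abundances per individual protein
prot <- aggregate(ab1 ~ protein, data = long, FUN = sum)
print(prot)
```

Under this assumption, protein C receives its unique peptide (10) plus half of the shared one (15), and protein D receives the other half.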

m-jahn/R-tools documentation built on Feb. 5, 2023, 1:05 p.m.