testDifferentialAbundance: Test Differential Protein Expression in MS proteomics data

View source: R/testDifferentialAbundance.R

testDifferentialAbundance    R Documentation

Test Differential Protein Expression in MS proteomics data

Description

Tests differential protein expression in MS proteomics data, starting small: from the precursor level.

Usage

testDifferentialAbundance(
  input_dt = "path/to/DIANN_matrix.tsv",
  protein_group_annotation = NULL,
  study_design = "path/to/Study_design_filled.tsv",
  normalize_data = TRUE,
  normalization_function = limma::normalizeQuantiles,
  condition_1 = unique(fread(study_design)$condition)[2],
  condition_2 = unique(fread(study_design)$condition)[1],
  min_n_obs = 4,
  imp_percentile = 0.001,
  imp_sd = 0.2,
  plot_pdf = TRUE,
  write_tsv_tables = TRUE,
  target_protein = "O08760"
)

Arguments

input_dt

Input data table, either as a path to a tsv/txt file or as an R object (data.table or data.frame). The table should have the following columns:

  • Protein.Group: Semicolon-separated Uniprot IDs (or similar, as long as it matches)

  • Precursor.Id: Unique Precursor Id for which the quantitative values are contained

  • "filename": One column of quantitative values per MS raw file, named after the file; the names must be identical to the entries in study_design$filename

Note: The data will be log2-transformed internally.
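As an illustration of the expected layout, the sketch below builds a small wide-format table in base R and checks it for the required columns. The run names, the first protein group, the precursor identifiers and the intensities are invented for the example, and the checks are not part of the package.

```r
# Hypothetical run names; in practice these come from study_design$filename
runs <- sprintf("run_%02d.raw", 1:6)

# Made-up raw (non-log) intensities: 2 precursors x 6 runs, one value missing
set.seed(1)
quant <- matrix(round(2^rnorm(12, mean = 20, sd = 1)), nrow = 2,
                dimnames = list(NULL, runs))
quant[2, 4] <- NA

input_dt <- data.frame(
  Protein.Group = c("P12345;Q67890", "O08760"),
  Precursor.Id  = c("PEPTIDEK2", "SAMPLER3"),
  quant,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The identifier columns must be present
missing_cols <- setdiff(c("Protein.Group", "Precursor.Id"), names(input_dt))
stopifnot(length(missing_cols) == 0)

# Every run must have a quantitative column with an identical header
missing_runs <- setdiff(runs, names(input_dt))
stopifnot(length(missing_runs) == 0)
```

Because the data are log2-transformed internally, the quantitative columns in this sketch hold raw intensities.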

protein_group_annotation

Protein annotation table with columns Protein.Group and Protein.Names (and others if desired) that will be used to annotate the results. By default it is assumed to be a subset of the input_dt, and an attempt will be made to extract it from there.

study_design

Study design as a tab-separated text file with the following mandatory columns:

  • filename: Must match quantitative data-containing column headers in the input_dt

  • condition: String, biological condition (e.g. "treated" and "untreated")

  • replicate: Replicate number (integer). At least 3 replicates per condition are needed for this type of analysis.
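A study design with these three columns could be assembled in base R as sketched below. The file names and the output location are hypothetical; only the column names and the replicate requirement come from the description above.

```r
# Hypothetical design: two conditions with three replicates each.
# The file names are made up and must equal the quantitative column
# headers of input_dt in a real analysis.
study_design <- data.frame(
  filename  = sprintf("run_%02d.raw", 1:6),
  condition = rep(c("untreated", "treated"), each = 3),
  replicate = rep(1:3, times = 2),
  stringsAsFactors = FALSE
)

# Confirm that every condition has at least 3 replicates
replicates_per_condition <- table(study_design$condition)
stopifnot(all(replicates_per_condition >= 3))

# Write the table as tab-separated text (here into a temporary directory)
design_path <- file.path(tempdir(), "Study_design_filled.tsv")
write.table(study_design, design_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The resulting path can then be passed as the study_design argument.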

normalize_data

Whether or not the data are scaled/normalized before differential testing. In some cases it may be preferable not to scale the datasets, e.g. when comparing pulldowns vs. input samples. Defaults to TRUE.

normalization_function

Normalization function that transforms a matrix of quantities in which columns are samples and rows are analytes. Defaults to limma::normalizeQuantiles, but can be replaced with any such function. You may want to try limma::normalizeVSN or limma::normalizeMedianValues.
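Any function that takes such a matrix and returns a matrix of the same shape should fit this argument. The sketch below shows a hypothetical median-centering function, normalizeMedianCenter, which is not part of igseqr or limma. It assumes log-scale values; whether the package applies the normalization before or after the log2 transformation is not stated here.

```r
# Hypothetical normalization function: shift every sample (column) so that
# its median equals the global median. Assumes log-scale input.
normalizeMedianCenter <- function(x) {
  x <- as.matrix(x)
  col_medians   <- apply(x, 2, median, na.rm = TRUE)
  global_median <- median(x, na.rm = TRUE)
  # Subtract each column's offset from the global median
  sweep(x, 2, col_medians - global_median, FUN = "-")
}

# Toy matrix: rows are analytes, columns are samples; NA marks a missing value
m <- cbind(sample_a = c(10, 11, 12, NA),
           sample_b = c(12, 13, 14, 15))
m_norm <- normalizeMedianCenter(m)

# After centering, both columns share the same median
apply(m_norm, 2, median, na.rm = TRUE)
```

Such a function would be supplied as normalization_function = normalizeMedianCenter.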

condition_1

Manual override for condition 1 of the differential comparison. By default it is guessed as the second entry of unique(study_design$condition).

condition_2

Manual override for condition 2 of the differential comparison. By default it is guessed as the first entry of unique(study_design$condition).
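The defaults shown under Usage take the second and first unique condition, so the guess depends on the order in which conditions first appear in the study design. A small base R sketch with made-up condition labels:

```r
# Made-up condition column, in the order the rows appear in the study design
condition <- c("untreated", "untreated", "untreated",
               "treated", "treated", "treated")

# unique() keeps values in order of first appearance
conditions <- unique(condition)

# Same indexing as the defaults in Usage
condition_1 <- conditions[2]
condition_2 <- conditions[1]
```

With this row order, condition_1 is "treated" and condition_2 is "untreated". Setting both arguments explicitly avoids depending on the row order.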

min_n_obs

Minimum number of observations per precursor (number of runs it was identified in) required to keep it in the analysis
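A filter of this kind can be sketched in base R as follows. The helper filter_min_obs is hypothetical, and it assumes that missing identifications are encoded as NA and that the threshold is inclusive; the package's own implementation may differ.

```r
# Hypothetical helper: keep precursors (rows) observed in at least
# min_n_obs runs (columns). Missing identifications are assumed to be NA.
filter_min_obs <- function(mat, min_n_obs = 4) {
  n_obs <- rowSums(!is.na(mat))
  mat[n_obs >= min_n_obs, , drop = FALSE]
}

# Toy matrix: prec_a is observed in 5 runs, prec_b in only 2
mat <- rbind(
  prec_a = c(10, 11, NA, 12, 13, 12),
  prec_b = c(NA, NA,  9, NA, NA, 10)
)

kept <- filter_min_obs(mat, min_n_obs = 4)
rownames(kept)
```

Only prec_a remains after filtering with min_n_obs = 4.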

imp_percentile

Percentile of the total distribution of values on which the random distribution for sampling will be centered

imp_sd

Standard deviation of the normal distribution from which values are sampled to impute missing values
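The two imputation parameters can be read together as describing a normal distribution placed at the low end of the observed values. The sketch below is one possible reading, not the package's code: the helper impute_low_normal is hypothetical, it assumes log2-scale input, and it treats imp_percentile as a probability between 0 and 1 passed to quantile().

```r
# Hypothetical helper: replace NA values with draws from a normal
# distribution centered on a low quantile of all observed values.
impute_low_normal <- function(x, imp_percentile = 0.001, imp_sd = 0.2) {
  # Assumption: imp_percentile is a probability in [0, 1]
  center  <- quantile(x, probs = imp_percentile, na.rm = TRUE, names = FALSE)
  missing <- is.na(x)
  x[missing] <- rnorm(sum(missing), mean = center, sd = imp_sd)
  x
}

# Toy log2 matrix with two missing values
set.seed(42)
log2_mat <- matrix(rnorm(20, mean = 20, sd = 2), nrow = 5)
log2_mat[c(3, 12)] <- NA

imputed <- impute_low_normal(log2_mat)
```

Observed values are left untouched; only the NA entries receive sampled values near the lower tail of the distribution.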

plot_pdf

Whether to document the processing steps in a series of PDF plots

write_tsv_tables

Whether to write out the final quantification table with the differential expression testing results as tsv

target_protein

Optional string with protein identifier to highlight in volcano plots

Value

A diffExpr object (list) containing the following elements (access via x$name or x[["name"]]):

  • data_source: input_dt path or input R object name

  • data_long: Data in long format

  • data_matrix_log2: Data, filtered and log2 transformed, in wide format matrix

  • data_matrix_log2_imp: Data, filtered, log2 transformed and with missing values imputed, in wide format matrix

  • study_design: study design table

  • annotation_col: column annotation

  • diffExpr_result_dt: Result table with intensities and differential expression testing results

  • candidates_condition1: Proteins that appear more abundant in condition 1

  • candidates_condition2: Proteins that appear more abundant in condition 2

Author(s)

Moritz Heusel


heuselm/igseqr documentation built on March 19, 2022, 7:28 p.m.