analysis_quickstart: Quickstart for analyses in this pipeline
In ftwkoopmans/msdap: Mass Spectrometry Downstream Analysis Pipeline

analysis_quickstart

R Documentation

Quickstart for analyses in this pipeline

Description

All-in-one function that covers the vast majority of use-cases for analyzing a dataset imported into MS-DAP. It assumes you have already loaded peptide data, sample metadata and fasta files using the MS-DAP import functions.

Usage

analysis_quickstart(
  dataset,
  filter_min_detect = 0,
  filter_fraction_detect = 0,
  filter_min_quant = 0,
  filter_fraction_quant = 0,
  filter_min_peptide_per_prot = 1,
  filter_topn_peptides = 0,
  filter_by_contrast = FALSE,
  norm_algorithm = c("vsn", "modebetween_protein"),
  rollup_algorithm = "maxlfq",
  dea_algorithm = c("deqms", "msqrob", "msempire"),
  dea_qvalue_threshold = 0.01,
  dea_log2foldchange_threshold = 0,
  diffdetect_min_peptides_observed = 2,
  diffdetect_min_samples_observed = 3,
  diffdetect_min_fraction_observed = 0.5,
  pca_sample_labels = "auto",
  var_explained_sample_metadata = NULL,
  multiprocessing_maxcores = NA,
  output_abundance_tables = TRUE,
  output_qc_report = TRUE,
  output_dir,
  output_within_timestamped_subdirectory = TRUE,
  dump_all_data = FALSE
)
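
As a rough sketch of how a call might be assembled, the arguments can first be collected in a named list. The values below follow the DDA settings recommended in the Filtering section, and the output directory is a hypothetical placeholder. The actual call is left commented out because it requires the msdap package and a `dataset` object produced by one of the import functions.

```r
# Sketch only: collect arguments for analysis_quickstart() in a named list.
# The output directory is a hypothetical placeholder path.
quickstart_args <- list(
  filter_min_detect = 1,
  filter_fraction_detect = 0.25,
  filter_min_quant = 3,
  filter_fraction_quant = 0.75,
  filter_by_contrast = FALSE,
  norm_algorithm = c("vsn", "modebetween_protein"),
  dea_algorithm = c("deqms", "msqrob", "msempire"),
  dea_qvalue_threshold = 0.01,
  dea_log2foldchange_threshold = 0,
  output_qc_report = TRUE,
  output_dir = file.path(tempdir(), "msdap_results")
)

# Requires the msdap package and an imported `dataset`:
# do.call(msdap::analysis_quickstart, c(list(dataset = dataset), quickstart_args))
str(quickstart_args)
```

Arguments that are left out of the list keep the defaults shown in the Usage block above.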

Arguments

`dataset`	a valid dataset object generated upstream by an MS-DAP import function. For instance, import_dataset_skyline() or import_dataset_maxquant_evidencetxt()
`filter_min_detect`	in order for a peptide to 'pass' in a sample group, in how many replicates must it be detected?
`filter_fraction_detect`	in order for a peptide to 'pass' in a sample group, what fraction of replicates must it be detected?
`filter_min_quant`	in order for a peptide to 'pass' in a sample group, in how many replicates must it be quantified?
`filter_fraction_quant`	in order for a peptide to 'pass' in a sample group, what fraction of replicates must it be quantified?
`filter_min_peptide_per_prot`	in order for a peptide to 'pass' in a sample group, how many peptides should be available for its protein after detect filters? 1 is default, but 2 can be a good choice situationally (eg; to not rely on proteins with just 1 quantified peptide)
`filter_topn_peptides`	maximum number of peptides to maintain for each protein (from the subset that passes above filters, peptides are ranked by the number of samples where detected and their variation between replicates).
`filter_by_contrast`	should the above filters be applied to all sample groups, or only those tested within each contrast? Enabling this optimizes available data in each contrast, but increases the complexity somewhat as different subsets of peptides are used in each contrast and normalization is applied separately.
`norm_algorithm`	normalization algorithm(s), or provide an empty string to skip normalization. Refer to `normalization_algorithms()` function documentation for available options and a brief description of each. Provide an array of options to run each algorithm consecutively, for instance; c("vsn", "modebetween_protein") to first apply vsn normalization and then correct between-group ratios such that the protein-level log2-foldchange mode is zero
`rollup_algorithm`	strategy for combining peptides to proteins, as used in DEA algorithms that first combine peptides to proteins and then apply statistics, like eBayes and DEqMS. Options: maxlfq, tukey_median, sum. See further documentation for function `rollup_pep2prot()`
`dea_algorithm`	algorithm for differential expression analysis (provide an array of strings to run multiple, in parallel). Refer to `dea_algorithms()` function documentation for available options and a brief description of each. To use a custom DEA function, provide the respective R function name as a string (see GitHub documentation on custom DEA functions for more details)
`dea_qvalue_threshold`	threshold for significance of adjusted p-values in figures and output tables. Output tables will also include all q-values as-is
`dea_log2foldchange_threshold`	threshold for significance of log2 foldchanges. Set to zero to disregard or a positive value to apply a cutoff to absolute log2 foldchanges. MS-DAP can also perform a bootstrap analysis to infer a reasonable threshold by setting this parameter to NA
`diffdetect_min_peptides_observed`	for differential detection only; minimum number of peptides that a protein must be detected with in either group (within at least `diffdetect_min_samples_observed`) in order to be included in the differential detection z-score results. Set to NA to disable differential detection
`diffdetect_min_samples_observed`	for differential detection only; minimum number of samples where a protein should be observed at least once by any of its peptides (in either group) when comparing a contrast of group A vs B. Set to NA to disable differential detection
`diffdetect_min_fraction_observed`	for differential detection only; analogous to `diffdetect_min_samples_observed`, but here you can specify the fraction of samples where a protein needs to be detected in either group (within the respective contrast). default; 0.5 (50% of samples)
`pca_sample_labels`	whether to use sample names or a numeric ID as labels in the PCA plot. options: "auto" (let code decide, default), "shortname" (use sample shortnames), "index" (auto-generated numeric ID), "index_asis" (same as index option and specifically disable label overlap reduction)
`var_explained_sample_metadata`	optionally, enable variance-explained analysis. This is slow, even for small datasets, and even more so as the amount of experiment metadata grows (so to save time in routine analyses, this is disabled by default). Set to NULL to disable (default), NA to automatically infer column names from `dataset@samples` to be used, or provide an array of column names from `dataset@samples` to be used (e.g. `c("group","batch","sex")`)
`multiprocessing_maxcores`	optionally, integer parameter to set the maximum number of cores to use when running MSqRob/MSqRobSum DEA algorithms. If other DEA methods are used, this setting doesn't do anything. Set to NA (default) to automatically select all available CPU cores minus 1. For systems with many CPU cores that run into errors related to "socketConnection" or "PSOCK", try limiting this to a lower number (e.g. 8)
`output_abundance_tables`	whether to write peptide- and protein-level data matrices to file. options: FALSE, TRUE
`output_qc_report`	whether to create the Quality Control report. options: FALSE, TRUE. Highly recommended to set to TRUE (default). Set to FALSE to skip the report PDF (eg; to only do differential expression analysis and skip the time-consuming report creation)
`output_dir`	output directory where all output files should be stored. If the provided file path is not an existing directory, it will be created. Optionally, disable the creation of any output files (QC report, DEA table, etc.) by setting this parameter to NA (also overrides the 'dump_all_data' parameter)
`output_within_timestamped_subdirectory`	optionally, automatically create a subdirectory (within output_dir) that has the current date&time as name and store results there. options: FALSE, TRUE
`dump_all_data`	if you're interested in performing custom bioinformatic analyses and want to use any of the data generated by this tool, you can dump all intermediate files to disk. Has performance impact so don't enable by default. options: FALSE, TRUE

Filtering

Peptide filter criteria are applied to replicate samples within a sample group. params; filter_min_detect, filter_fraction_detect, filter_min_quant, filter_fraction_quant. You only have to provide active filters (but specify at least 1); filters/settings you do not specify don't do anything by default.

Recommended settings. For DDA: at least 1~2 detect (MS/MS ID) and quantified in at least ~75% of replicates. For DIA: detect (confidence score < threshold) in at least ~75% of replicates (because for DIA, you typically have an abundance value in each sample regardless of the identifier confidence score). If there are only 3 replicates, we recommend filtering such that there are at least 3 datapoints to work with in differential expression analysis.

Taken together, recommended settings for a DDA dataset with 3~8 replicates in each sample group look like this;

filter_min_detect = 1 (or zero to fully rely on MBR), filter_fraction_detect = 0.25 (or zero to fully rely on MBR), filter_min_quant = 3, filter_fraction_quant = 0.75

Analogous for DIA;

filter_min_detect = 3, filter_fraction_detect = 0.75
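
To make the per-group rule concrete, here is a small base-R sketch. It assumes that the count and the fraction criterion must both be met, so the stricter of the two applies. This combination rule and the helper name `passes_in_group()` are illustrative assumptions, not MS-DAP's actual implementation.

```r
# Hypothetical helper: does a peptide pass the filter within one sample group?
# `detected` and `quantified` are logical vectors with one entry per replicate.
passes_in_group <- function(detected, quantified,
                            min_detect = 0, fraction_detect = 0,
                            min_quant = 0, fraction_quant = 0) {
  n <- length(detected)
  stopifnot(length(quantified) == n)
  # Assumption: the count and the fraction criterion must both be met.
  need_detect <- max(min_detect, ceiling(fraction_detect * n))
  need_quant  <- max(min_quant,  ceiling(fraction_quant * n))
  sum(detected) >= need_detect && sum(quantified) >= need_quant
}

# DDA example, 4 replicates: 1 MS/MS identification, quantified in 3 of 4.
dda_detected   <- c(TRUE, FALSE, FALSE, FALSE)
dda_quantified <- c(TRUE, TRUE, TRUE, FALSE)
dda_pass <- passes_in_group(dda_detected, dda_quantified,
                            min_detect = 1, fraction_detect = 0.25,
                            min_quant = 3, fraction_quant = 0.75)

# DIA example, 4 replicates: confidently detected in only 2 of 4.
dia_pass <- passes_in_group(c(TRUE, TRUE, FALSE, FALSE), rep(TRUE, 4),
                            min_detect = 3, fraction_detect = 0.75)

print(c(dda = dda_pass, dia = dia_pass))
```

With these toy inputs the DDA peptide passes and the DIA peptide is rejected, because the latter is detected in only 2 of the 3 required replicates.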

Filter within contrast vs using all groups

Two distinct approaches to selecting peptides can be used for differential expression analysis: 1) 'within contrast' and 2) 'apply filter to all sample groups'.

1) Determine within each contrast (eg; group A vs group B) which peptides can be used by applying the above peptide filter criteria, and then apply normalization to this data subset. This is advantageous in datasets with many groups, as it maximizes the number of peptides used in each contrast (eg; let peptide p be observed in groups A and B but not in C; we'd want to use it in A vs B, not in A vs C). As a disadvantage, this complicates interpretation since the exact data used is different in each contrast (slightly different peptides and normalization in each contrast). Set filter_by_contrast = TRUE for this option.

2) Apply the above filter criteria to each sample group (eg; a peptide must pass these filter rules in every sample group) and then apply normalization. This data matrix is then used for all downstream statistics.

Advantage; simple and robust

Disadvantage; potentially miss out on (group-specific) peptides/data-points that may fail filter criteria in just 1 group, particularly in large datasets with 4+ groups

Set filter_by_contrast = FALSE for this option

Note; if there are just 2 sample groups (eg; WT vs KO), this point is moot as both approaches are the same
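
The difference between the two approaches can be illustrated with a toy example in base R. The pass/fail matrix below is made up for illustration, and `peptides_for_contrast()` is a hypothetical helper, not an MS-DAP function.

```r
# Toy input: which peptides pass the filter criteria in which sample group.
pass <- matrix(
  c(TRUE,  TRUE,  TRUE,    # p1 passes in A, B and C
    TRUE,  TRUE,  FALSE,   # p2 passes in A and B only
    FALSE, TRUE,  TRUE),   # p3 passes in B and C only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("A", "B", "C"))
)

# Filter applied to all groups: a peptide must pass in every sample group.
all_groups <- rownames(pass)[apply(pass, 1, all)]

# Filter within contrast: a peptide must pass only in the groups being compared.
peptides_for_contrast <- function(pass, groups) {
  rownames(pass)[apply(pass[, groups, drop = FALSE], 1, all)]
}
a_vs_b <- peptides_for_contrast(pass, c("A", "B"))
a_vs_c <- peptides_for_contrast(pass, c("A", "C"))

print(list(all_groups = all_groups, A_vs_B = a_vs_b, A_vs_C = a_vs_c))
```

Here peptide p2 is available for the A vs B contrast when filtering within contrast, but is dropped entirely when the filter is applied to all groups.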

Normalization

Normalization algorithms are applied to the peptide-level data matrix. options: "" (empty string disables normalization), "vsn", "loess", "rlr", "msempire", "vwmb", "modebetween", "modebetween_protein" (this balances foldchanges between sample groups. Highly recommended, see MS-DAP manuscript). Refer to normalization_algorithms() function documentation for available options and a brief description of each.

You can combine normalizations by providing an array of options, which are then applied consecutively.

For instance, norm_algorithm = c("vsn", "modebetween_protein") applies the vsn algorithm (quite strong normalization reducing variation) and then balances between-group protein-level foldchanges with modebetween_protein normalization.

Benchmarks have shown that c("vwmb", "modebetween_protein") and c("vsn", "modebetween_protein") are the optimal strategies, see MS-DAP manuscript.
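
The idea behind balancing between-group foldchanges can be sketched in base R: estimate the mode of the protein-level log2 foldchanges and shift one group so that this mode becomes zero. This is a simplified illustration on simulated data with a hypothetical helper `density_mode()`; it is not the algorithm implemented in MS-DAP.

```r
set.seed(1)

# Hypothetical helper: mode of a numeric vector via a kernel density estimate.
density_mode <- function(x) {
  d <- stats::density(x)
  d$x[which.max(d$y)]
}

# Simulated protein-level log2 abundances (mean per group) for 500 proteins.
# Group B carries a global offset of +0.4 that normalization should remove.
group_a <- rnorm(500, mean = 20, sd = 2)
group_b <- group_a + rnorm(500, mean = 0.4, sd = 0.2)

log2fc_before <- group_b - group_a
shift <- density_mode(log2fc_before)

# Shift group B so that the mode of the log2 foldchanges is zero.
group_b_norm <- group_b - shift
log2fc_after <- group_b_norm - group_a

print(c(mode_before = shift, mode_after = density_mode(log2fc_after)))
```

Using the mode rather than the mean keeps the correction anchored on the bulk of unchanged proteins, so a minority of strongly regulated proteins has little influence on the shift.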

Differential Expression Analysis

Statistical models for differential expression analysis

MSqRob is recommended for most cases; a peptide-level model that is highly sensitive and quite robust. Reference: https://github.com/statOmics/MSqRob

MS-EmpiRe is a peptide-level model that works especially well for DDA data. Reference: https://github.com/zimmerlab/MS-EmpiRe

eBayes is robust but conservative, using the limma package to apply moderated t-tests on protein-level abundances. Reference: https://doi.org/doi:10.18129/B9.bioc.limma

options: ebayes, deqms, msempire, msqrob, msqrobsum. Refer to dea_algorithms() function documentation for available options and a brief description of each.

You can simply apply multiple DEA models in parallel by supplying an array of options. The output of each model will be visualized in the PDF report and data included in the output Excel report. e.g.; dea_algorithm = c("ebayes", "deqms", "msempire", "msqrob")
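
How the dea_qvalue_threshold and dea_log2foldchange_threshold parameters interact can be sketched in base R. The example assumes that q-values are Benjamini-Hochberg adjusted p-values and that a protein is flagged when it meets both the q-value cutoff and the absolute log2 foldchange cutoff. The table, its column names and the helper `flag_significant()` are made up for illustration and do not reflect MS-DAP's output format.

```r
# Made-up DEA results for five proteins.
dea <- data.frame(
  protein = c("P1", "P2", "P3", "P4", "P5"),
  pvalue  = c(0.0001, 0.001, 0.004, 0.03, 0.6),
  log2fc  = c(1.8, -0.2, -1.1, 0.9, 0.05)
)

# Assumption: q-values are Benjamini-Hochberg adjusted p-values.
dea$qvalue <- p.adjust(dea$pvalue, method = "BH")

# Hypothetical helper combining both cutoffs; a log2 foldchange
# threshold of zero means the foldchange is disregarded.
flag_significant <- function(qvalue, log2fc, q_threshold = 0.01, fc_threshold = 0) {
  qvalue <= q_threshold & abs(log2fc) >= fc_threshold
}

dea$signif_q_only   <- flag_significant(dea$qvalue, dea$log2fc)
dea$signif_q_and_fc <- flag_significant(dea$qvalue, dea$log2fc, fc_threshold = 1)
print(dea)
```

In this toy table P2 has a small q-value but a log2 foldchange close to zero, so it is flagged only when the foldchange threshold is left at zero.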
