analysis_quickstart | R Documentation |
All-in-one function that covers the vast majority of use cases for analyzing a dataset imported into MS-DAP (assuming you already loaded peptide data, sample metadata and fasta files using MS-DAP import functions).
analysis_quickstart(
dataset,
filter_min_detect = 0,
filter_fraction_detect = 0,
filter_min_quant = 0,
filter_fraction_quant = 0,
filter_min_peptide_per_prot = 1,
filter_topn_peptides = 0,
filter_by_contrast = FALSE,
norm_algorithm = c("vsn", "modebetween_protein"),
rollup_algorithm = "maxlfq",
dea_algorithm = c("deqms", "msqrob", "msempire"),
dea_qvalue_threshold = 0.01,
dea_log2foldchange_threshold = 0,
diffdetect_min_peptides_observed = 2,
diffdetect_min_samples_observed = 3,
diffdetect_min_fraction_observed = 0.5,
pca_sample_labels = "auto",
var_explained_sample_metadata = NULL,
multiprocessing_maxcores = NA,
output_abundance_tables = TRUE,
output_qc_report = TRUE,
output_dir,
output_within_timestamped_subdirectory = TRUE,
dump_all_data = FALSE
)
dataset |
a valid dataset object generated upstream by an MS-DAP import function. For instance, import_dataset_skyline() or import_dataset_maxquant_evidencetxt() |
filter_min_detect |
in order for a peptide to 'pass' in a sample group, in how many replicates must it be detected? |
filter_fraction_detect |
in order for a peptide to 'pass' in a sample group, what fraction of replicates must it be detected? |
filter_min_quant |
in order for a peptide to 'pass' in a sample group, in how many replicates must it be quantified? |
filter_fraction_quant |
in order for a peptide to 'pass' in a sample group, what fraction of replicates must it be quantified? |
filter_min_peptide_per_prot |
in order for a peptide to 'pass' in a sample group, how many peptides should be available after detect filters? 1 is default, but 2 can be a good choice situationally (e.g. to avoid relying on proteins with just 1 quantified peptide) |
filter_topn_peptides |
maximum number of peptides to maintain for each protein (from the subset that passes above filters, peptides are ranked by the number of samples where detected and their variation between replicates). |
filter_by_contrast |
should the above filters be applied to all sample groups, or only those tested within each contrast? Enabling this optimizes available data in each contrast, but increases the complexity somewhat as different subsets of peptides are used in each contrast and normalization is applied separately. |
norm_algorithm |
normalization algorithm(s), or provide an empty string to skip normalization. Refer to |
rollup_algorithm |
rollup_algorithm strategy for combining peptides to proteins as used in DEA algorithms that first combine peptides to proteins and then apply statistics, like eBayes and DEqMS. Options: maxlfq, tukey_median, sum. See further documentation for function |
dea_algorithm |
algorithm for differential expression analysis (provide an array of strings to run multiple, in parallel). Refer to |
dea_qvalue_threshold |
threshold for significance of adjusted p-values in figures and output tables. Output tables will also include all q-values as-is |
dea_log2foldchange_threshold |
threshold for significance of log2 foldchanges. Set to zero to disregard or a positive value to apply a cutoff to absolute log2 foldchanges. MS-DAP can also perform a bootstrap analysis to infer a reasonable threshold by setting this parameter to NA |
diffdetect_min_peptides_observed |
for differential detection only; minimum number of peptides that a protein must be detected with in either group (within at least |
diffdetect_min_samples_observed |
for differential detection only; minimum number of samples where a protein should be observed at least once by any of its peptides (in either group) when comparing a contrast of group A vs B. Set to NA to disable differential detection |
diffdetect_min_fraction_observed |
for differential detection only; analogous to |
pca_sample_labels |
whether to use sample names or a numeric ID as labels in the PCA plot. options: "auto" (let code decide, default), "shortname" (use sample shortnames), "index" (auto-generated numeric ID), "index_asis" (same as index option and specifically disable label overlap reduction) |
var_explained_sample_metadata |
optionally, enable variance-explained analysis. This is slow, even for small datasets, and even more so as the amount of experiment metadata grows (so to save time in routine analyses, this is disabled by default). Set to NULL to disable (default), NA to automatically infer column names from
multiprocessing_maxcores |
optionally, integer parameter to set the maximum number of cores to use when running MSqRob/MSqRobSum DEA algorithms. If other DEA methods are used, this setting doesn't do anything. Set to NA (default) to automatically select all available CPU cores minus 1. For systems with many CPU cores that run into errors related to "socketConnection" or "PSOCK", try limiting this to a lower number (e.g. 8) |
output_abundance_tables |
whether to write peptide- and protein-level data matrices to file. options: FALSE, TRUE |
output_qc_report |
whether to create the Quality Control report. options: FALSE, TRUE. Highly recommended to set to TRUE (default). Set to FALSE to skip the report PDF (e.g. to only do differential expression analysis and skip the time-consuming report creation) |
output_dir |
output directory where all output files should be stored. If the provided file path is not an existing directory, it will be created. Optionally, disable the creation of any output files (QC report, DEA table, etc.) by setting this parameter to NA (also overrides the 'dump_all_data' parameter) |
output_within_timestamped_subdirectory |
optionally, automatically create a subdirectory (within output_dir) that has the current date and time as name and store results there. options: FALSE, TRUE |
dump_all_data |
if you're interested in performing custom bioinformatic analyses and want to use any of the data generated by this tool, you can dump all intermediate files to disk. Has performance impact so don't enable by default. options: FALSE, TRUE |
Peptide filter criteria applied to replicate samples within a sample group. params; filter_min_detect, filter_fraction_detect, filter_min_quant, filter_fraction_quant. You only have to provide active filters (but specify at least 1), filters/settings you do not specify don't do anything by default.
Settings: for DDA, detected (MS/MS ID) in at least 1~2 replicates and quantified in at least ~75% of replicates. For DIA, detected (confidence score < threshold) in at least ~75% of replicates (because for DIA, you typically have an abundance value in each sample regardless of the identifier confidence score). If there are only 3 replicates, we recommend filtering such that there are at least 3 datapoints to work with in differential expression analysis.
Taken together, recommended settings for a DDA dataset with 3~8 replicates in each sample group look like this;
filter_min_detect = 1 (or zero to fully rely on MBR), filter_fraction_detect = 0.25 (or zero to fully rely on MBR), filter_min_quant = 3, filter_fraction_quant = 0.75
Analogous for DIA;
filter_min_detect = 3, filter_fraction_detect = 0.75
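As a rough illustration of the recommendations above, the sketch below stores the DDA and DIA filter settings as plain R lists and shows how a count threshold and a fraction threshold could translate into a required number of replicates for a given group size. The helper required_replicates() and the wrapper run_quickstart() are hypothetical names introduced here, and the rule that both thresholds must hold at once (with the fraction rounded up) is an assumption for illustration, not something this page confirms.

```r
# Recommended filter settings from the text above, stored as plain lists.
dda_filters <- list(
  filter_min_detect = 1,          # or 0 to fully rely on MBR
  filter_fraction_detect = 0.25,  # or 0 to fully rely on MBR
  filter_min_quant = 3,
  filter_fraction_quant = 0.75
)
dia_filters <- list(
  filter_min_detect = 3,
  filter_fraction_detect = 0.75
)

# ASSUMPTION (hypothetical helper): a peptide must satisfy both the absolute
# count and the fraction, so the effective requirement is the larger of the
# two, with the fraction rounded up to whole replicates.
required_replicates <- function(n_replicates, min_count, fraction) {
  pmax(min_count, ceiling(fraction * n_replicates))
}

# Quantification requirement for DDA sample groups of 3 to 8 replicates.
dda_quant_needed <- required_replicates(
  3:8,
  dda_filters$filter_min_quant,
  dda_filters$filter_fraction_quant
)
print(setNames(dda_quant_needed, paste0("n=", 3:8)))

# Hypothetical wrapper: passes one of the filter lists to analysis_quickstart().
# 'dataset' must come from an MS-DAP import function and 'output_dir' is a
# directory of your choice; nothing runs until this function is called.
run_quickstart <- function(dataset, filters, output_dir) {
  args <- c(list(dataset = dataset, output_dir = output_dir), filters)
  do.call(msdap::analysis_quickstart, args)
}
```

Under this assumed rule, a group of 3 replicates requires quantification in all 3 samples, which matches the advice to keep at least 3 datapoints per group.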
Two distinct approaches to selecting peptides can be used for differential expression analysis: 1) 'within contrast' and 2) 'apply filter to all sample groups'.
1) Within contrast: determine within each contrast (e.g. group A vs group B) which peptides can be used by applying the above peptide filter criteria, then apply normalization to this data subset. This is advantageous in datasets with many groups, as it maximizes the number of peptides used in each contrast (e.g. let peptide p be observed in groups A and B but not in C; we would want to use it in A vs B, not in A vs C). As a disadvantage, this complicates interpretation since the exact data used differs in each contrast (slightly different peptides and normalization in each contrast).
2) Apply filter to all sample groups: apply the above filter criteria to each sample group (e.g. a peptide must pass these filter rules in every sample group) and then apply normalization
This data matrix is then used for all downstream statistics
Advantage; simple and robust
Disadvantage; potentially miss out on (group-specific) peptides/data-points that may fail filter criteria in just 1 group, particularly in large datasets with 4+ groups
Set filter_by_contrast = FALSE for this option
Note; if there are just 2 sample groups (e.g. WT vs KO), this point is moot as both approaches are the same
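The difference between the two approaches can be sketched with a toy example in base R. The pass/fail matrix, peptide names and group labels below are invented for illustration, and the selection rule (a peptide is used in a contrast when it passes the filters in both groups of that contrast) is an assumption based on the peptide p example above, not MS-DAP's actual implementation.

```r
# Toy data: TRUE means the peptide passed the filter criteria in that group.
pass <- matrix(
  c(TRUE,  TRUE, TRUE,    # pep1 passes in A, B and C
    TRUE,  TRUE, FALSE,   # pep2 (peptide p in the text) passes in A and B only
    FALSE, TRUE, TRUE),   # pep3 passes in B and C only
  nrow = 3, byrow = TRUE,
  dimnames = list(c("pep1", "pep2", "pep3"), c("A", "B", "C"))
)

contrasts <- list(c("A", "B"), c("A", "C"), c("B", "C"))
names(contrasts) <- vapply(contrasts, paste, character(1), collapse = "_vs_")

# Approach 1 (filter_by_contrast = TRUE): per contrast, keep peptides that
# pass in both groups being compared.
within_contrast <- lapply(contrasts, function(groups) {
  rownames(pass)[rowSums(pass[, groups, drop = FALSE]) == length(groups)]
})

# Approach 2 (filter_by_contrast = FALSE): keep peptides that pass in every
# sample group; this single subset is used for all contrasts.
all_groups <- rownames(pass)[rowSums(pass) == ncol(pass)]

print(within_contrast)
print(all_groups)
```

In this toy case pep2 is available for A vs B under the first approach but is dropped everywhere under the second, which is the trade-off described above.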
normalization algorithms are applied to the peptide-level data matrix.
options: "" (empty string disables normalization), "vsn", "loess", "rlr", "msempire", "vwmb", "modebetween", "modebetween_protein" (this balances foldchanges between sample groups. Highly recommended, see MS-DAP manuscript)
Refer to normalization_algorithms() function documentation for available options and a brief description of each.
You can combine normalizations by providing an array of options, which are then applied in sequence.
For instance, norm_algorithm = c("vsn", "modebetween_protein") applies the vsn algorithm (quite strong normalization reducing variation) and then balances between-group protein-level foldchanges with modebetween normalization.
Benchmarks have shown that c("vwmb", "modebetween_protein") and c("vsn", "modebetween_protein") are the optimal strategies, see MS-DAP manuscript.
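To make the idea of sequential normalization concrete, here is a minimal base R sketch that applies a vector of named steps to a log2 intensity matrix in order and treats an empty string as "skip". The two steps (center_median, scale_mad) are toy stand-ins invented for this example and are not the vsn, vwmb or modebetween_protein algorithms; apply_normalizations() is likewise a hypothetical helper, not an MS-DAP function.

```r
# Toy stand-in normalizations operating on columns (samples) of a matrix.
# These are NOT the algorithms implemented in MS-DAP.
toy_norm <- list(
  center_median = function(m) sweep(m, 2, apply(m, 2, median, na.rm = TRUE), "-"),
  scale_mad     = function(m) sweep(m, 2, apply(m, 2, mad, na.rm = TRUE), "/")
)

# Hypothetical helper: apply each named step in the order given.
apply_normalizations <- function(m, algorithms) {
  for (alg in algorithms) {
    if (!nzchar(alg)) next              # "" means: skip normalization
    stopifnot(alg %in% names(toy_norm)) # reject unknown step names
    m <- toy_norm[[alg]](m)
  }
  m
}

# Simulated peptide-by-sample matrix of log2 intensities.
set.seed(1)
log2_intensity <- matrix(
  rnorm(40, mean = 20, sd = 2), nrow = 10,
  dimnames = list(paste0("pep", 1:10), paste0("s", 1:4))
)

unchanged  <- apply_normalizations(log2_intensity, "")
normalized <- apply_normalizations(log2_intensity, c("center_median", "scale_mad"))
```

The order matters: each step receives the output of the previous one, which is why the recommended combinations list the variance-reducing step first and the between-group balancing step last.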
Statistical models for differential expression analysis
MSqRob is recommended for most cases; a peptide-level model that is highly sensitive and quite robust. Reference: https://github.com/statOmics/MSqRob
MS-EmpiRe is a peptide-level model that works especially well for DDA data. Reference: https://github.com/zimmerlab/MS-EmpiRe
eBayes is robust but conservative, using the limma package to apply moderated t-tests on protein-level abundances. Reference: https://doi.org/doi:10.18129/B9.bioc.limma
options: ebayes, deqms, msempire, msqrob, msqrobsum. Refer to dea_algorithms() function documentation for available options and a brief description of each.
You can simply apply multiple DEA models in parallel by supplying an array of options. The output of each model will be visualized in the PDF report and data included in the output Excel report.
e.g.; dea_algorithm = c("ebayes", "deqms", "msempire", "msqrob")
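A small sketch of how such a selection might be assembled and sanity-checked before the analysis is started. The helper check_dea_algorithm() is hypothetical and only validates against the option names listed on this page; the dea_algorithms() function may report a different or longer list in your installed version.

```r
# Option names as listed on this page (an assumption about your installed
# version; consult dea_algorithms() for the authoritative list).
dea_options <- c("ebayes", "deqms", "msempire", "msqrob", "msqrobsum")

# Hypothetical helper: stop early on misspelled algorithm names.
check_dea_algorithm <- function(x) {
  unknown <- setdiff(x, dea_options)
  if (length(unknown) > 0) {
    stop("unknown DEA algorithm(s): ", paste(unknown, collapse = ", "))
  }
  unique(x)
}

# DEA-related arguments for analysis_quickstart(); the thresholds shown are
# the defaults from the usage section.
dea_settings <- list(
  dea_algorithm = check_dea_algorithm(c("ebayes", "deqms", "msempire", "msqrob")),
  dea_qvalue_threshold = 0.01,
  dea_log2foldchange_threshold = 0
)

# A misspelled name is caught before any time-consuming analysis starts.
typo_result <- tryCatch(
  check_dea_algorithm("msqrobb"),
  error = function(e) conditionMessage(e)
)
print(typo_result)
```

The entries of dea_settings can then be supplied to analysis_quickstart() together with the dataset and output_dir arguments.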
dea_algorithms() and normalization_algorithms() for available algorithms and documentation.