OVERVIEW

artMS is a Bioconductor package that provides a set of tools for the analysis and integration of large-scale proteomics (mass-spectrometry-based) datasets obtained using the popular proteomics software MaxQuant.

The functions available in artMS can be grouped into the following categories:

Details about all the functions available in artMS can be found in the package reference manual.

For a graphical overview, check the slides presented at the 2021 online workshop of the Association of Biomolecular Resource Facilities (ABRF).

What's new?

Check the NEWS file in the repository to stay up to date with new features, improvements, and bug fixes affecting the package.

How to install

Bioconductor

artMS version >= 1.10.1 introduced many changes to adapt to changes in MSstats. This version requires the following packages:

# From bioconductor:
BiocManager::install(c("ComplexHeatmap", "org.Mm.eg.db"))

# From CRAN:
install.packages(c("factoextra", "FactoMineR", "gProfileR", "PerformanceAnalytics"))

Extra: Why Bioconductor? Here you can find a nice summary of many good reasons.

Development version from GitHub (unstable)

Assuming that R (>= 4.1) is installed on your system, follow these steps:

install.packages("devtools")
library(devtools)
install_github("biodavidjm/artMS")

Once installed, the package can be loaded and attached to your current workspace as follows:

library(artMS)

Input files

artMS takes the following files as input for its different analyses:

Check below to find out more about generating the input files.

Configuration file

artmsQuantification() requires a large number of arguments, especially those related to the statistical package MSstats. To facilitate the task of providing all those arguments, artmsQuantification() takes a configuration file (in yaml format) to customize the parameters for quantification (using MSstats) and other operations, including QC analyses, charts, and annotations.

A configuration file template can be generated by running artmsWriteConfigYamlFile().

Check below to learn the details of the configuration file.

Basic workflows

Proteomics

Metabolomics (unstable)

artMS also enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that the metabolites do not need to have an ID, as the m/z and retention time will be used as identifiers. Typical workflow:

Please keep in mind that most of the functions available in artMS do not work with metabolomics data due to annotation issues (protein/gene IDs are the primary identifiers for most of the functions). Check the metabolomics section to find out more.

REQUIRED INPUT FILES

IMPORTANT

Before you begin, please set the working directory. artMS will work from that working directory. For example:

setwd("/path/to/my/working_directory/project_proteomics/")

Most artMS functions assume that this working directory has been set: they create sub-folders in it for the output files and look for all the required input files there.

Input files

Three basic (tab-delimited) files are required to perform the full set of operations:

evidence.txt

The output of the quantitative proteomics software package MaxQuant. It combines all the information about the identified peptides.

keys.txt

Tab-delimited file generated by the user. It summarizes the experimental design of the evidence file. artMS merges keys.txt and evidence.txt by the "RawFile" column. Each RawFile corresponds to a unique individual technical replicate / biological replicate / Condition / Run.

For any basic label-free proteomics experiment, the keys file must contain the following columns and rules:

RawFile|IsotopeLabelType|Condition|BioReplicate|Run
-----|-----|-----|-----|-----
qx006145|L|Cal_33|Cal_33-1|1
qx006146|L|Cal_33|Cal_33-2|2
qx006151|L|HSC6|HSC6-1|3
qx006152|L|HSC6|HSC6-2|4
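The same keys table can also be built and saved directly from R. The sketch below uses only base R; the small `evidence` data.frame (with MaxQuant's `Raw file` column) is hypothetical toy data, included only to show how the two tables line up on the raw file name. artMS performs its own merge internally, so this is not its implementation.

```r
# Keys table for the label-free example above
keys <- data.frame(
  RawFile          = c("qx006145", "qx006146", "qx006151", "qx006152"),
  IsotopeLabelType = "L",
  Condition        = c("Cal_33", "Cal_33", "HSC6", "HSC6"),
  BioReplicate     = c("Cal_33-1", "Cal_33-2", "HSC6-1", "HSC6-2"),
  Run              = 1:4,
  stringsAsFactors = FALSE
)

# Basic sanity checks: required columns present, one row per raw file
required <- c("RawFile", "IsotopeLabelType", "Condition", "BioReplicate", "Run")
stopifnot(all(required %in% names(keys)), !anyDuplicated(keys$RawFile))

# Save as tab-delimited text (no quotes, no row names)
keys_path <- file.path(tempdir(), "keys.txt")
write.table(keys, keys_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical toy evidence rows, only to illustrate the merge by raw file
evidence <- data.frame(
  `Raw file` = c("qx006145", "qx006151"),
  Proteins   = c("P12345", "Q67890"),
  Intensity  = c(1.2e6, 3.4e5),
  check.names = FALSE, stringsAsFactors = FALSE
)
merged <- merge(evidence, keys, by.x = "Raw file", by.y = "RawFile")
merged[, c("Raw file", "Proteins", "Condition", "BioReplicate")]
```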

For more examples, check the artMS data object artms_data_ph_keys.

Tip: we recommend using Microsoft Excel (OpenOffice Calc or similar) to generate the keys file. When saving the file (Save As), do not forget to choose the Tab Delimited Text (.txt) format.

contrast.txt

The comparisons between conditions that the user wants to quantify.

HSC6-Cal_33
WT_DRUG_A-WT_A549
WT_DRUG_B-WT_A549
WT_DRUG_A-WT_DRUG_B

Requirements:

As a result of the quantification, a positive log2FC means that the protein is more abundant in the condition on the left (numerator), and a negative log2FC means that it is more abundant in the condition on the right (denominator).
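As a rough numeric illustration of the sign convention, assume made-up average intensities for one protein in each condition. MSstats estimates log2FC with its own statistical model, so this sketch only conveys the intuition, not the actual computation:

```r
# Hypothetical average intensities of one protein in each condition
intensity <- c(HSC6 = 4000, Cal_33 = 1000)

# Contrast "HSC6-Cal_33": left term = numerator, right term = denominator
terms <- strsplit("HSC6-Cal_33", "-", fixed = TRUE)[[1]]
log2fc <- log2(intensity[[terms[1]]] / intensity[[terms[2]]])
log2fc    # 2: positive, the protein is more abundant in HSC6 (left)

# Reversing the contrast ("Cal_33-HSC6") only flips the sign
reversed <- log2(intensity[["Cal_33"]] / intensity[["HSC6"]])
reversed  # -2
```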

Example of wrong comparisons

Only condition names are allowed. Individual BioReplicates cannot be compared. For example, this is wrong:

# WRONG:
HSC6-Cal_33-1
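A simple pre-flight check can catch this kind of mistake before running the quantification. This is a minimal sketch: `check_contrast` is a hypothetical helper (not an artMS function), and it assumes that condition names themselves contain no hyphen, so that a valid contrast splits into exactly two known conditions.

```r
# Conditions defined in the keys file (Condition column)
conditions <- c("Cal_33", "HSC6", "WT_A549", "WT_DRUG_A", "WT_DRUG_B")

# Hypothetical helper: TRUE if a contrast has the form "ConditionA-ConditionB"
# (assumes condition names contain no "-")
check_contrast <- function(contrast, conditions) {
  terms <- strsplit(contrast, "-", fixed = TRUE)[[1]]
  length(terms) == 2 && all(terms %in% conditions) && terms[1] != terms[2]
}

contrasts <- c("HSC6-Cal_33", "WT_DRUG_A-WT_A549", "HSC6-Cal_33-1")
valid <- vapply(contrasts, check_contrast, logical(1), conditions = conditions)
valid  # the last one fails: it splits into three terms ("Cal_33-1" is a BioReplicate)
```

In practice, the contrasts could be read with `readLines("contrast.txt")` and the conditions taken from `unique(keys$Condition)`.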

The artMS configuration file


The configuration file (in yaml format) contains a variety of options available for the QC, quantification, and annotations performed by artMS.

To generate a sample configuration file, go to the project folder (setwd("/path/to/my/working_directory/project_proteomics/")) and execute:

library(artMS)
artmsWriteConfigYamlFile(config_file_name = "my_config.yaml")

Open the my_config.yaml file with your favorite editor (RStudio for example).

Note: Although the configuration file might look complex, the default options work very well.

The configuration (yaml) file contains the following sections:

Section: files

Assuming that your working directory (e.g. setwd("/path/to/my/working_directory/project_proteomics/")) has the following structure:

`-- data
    |-- projectx-contrasts.txt
    |-- projectx-evidence.txt
    `-- projectx-keys.txt

The files section of the configuration file should look like this:

files :
  evidence : data/projectx-evidence.txt
  keys : data/projectx-keys.txt
  contrasts : data/projectx-contrasts.txt
  summary: data/projectx-summary.txt # Optional
  output : results_folder/projectx-results.txt # this will be created

Notice that in this example all the input files are located in the data/ folder, while the output path adds an extra folder (results_folder). artMS creates that folder structure (there is no need to create it beforehand) and saves the results file, along with all the extra outputs, in it. This means that you can specify a completely different output folder (e.g. "results_folder2/other-name-results.txt") and artMS will create it for you.
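If you want to check the paths before a long run, the behaviour described above can be mimicked in base R. This is only a sketch: the `files` list mirrors the YAML example (in practice it could come from `yaml::read_yaml("my_config.yaml")$files`, which requires the yaml package), and `tempdir()` stands in for the project folder.

```r
# Stand-in for the project working directory (use your own path in practice)
project_dir <- tempdir()

# Hypothetical 'files' section, mirroring the YAML example above
files <- list(
  evidence  = "data/projectx-evidence.txt",
  keys      = "data/projectx-keys.txt",
  contrasts = "data/projectx-contrasts.txt",
  output    = "results_folder/projectx-results.txt"
)

# Report input files that cannot be found relative to the project folder
inputs  <- file.path(project_dir, unlist(files[c("evidence", "keys", "contrasts")]))
missing <- inputs[!file.exists(inputs)]
if (length(missing) > 0) {
  message("Missing input files:\n", paste(missing, collapse = "\n"))
}

# Create the output folder (including parent folders) if it does not exist yet
out_dir <- dirname(file.path(project_dir, files$output))
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
dir.exists(out_dir)
```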


Section: qc

qc:
  basic: 1 # 1 = yes; 0 = no
  extended: 1 # 1 = yes; 0 = no
  extendedSummary: 0 # 1 = yes; 0 = no

Enable the 'basic' and/or 'extended' quality control analyses (based on the evidence.txt file) and/or the 'extendedSummary' analysis (based on the summary.txt file). Read below to find out more about the details of each type of analysis.

Section: data

data:
  enabled : 1 # 1 = yes; 0 = no
  silac: 
    enabled : 0 # 1 for SILAC experiments
  filters: 
    enabled : 1
    contaminants : 1
    protein_groups : remove # remove, keep
    modifications : AB # PH, UB, AC, AB, APMS
  sample_plots : 1 # correlation plots

Let's break down the data section:


Section: msstats

Section updated in artMS version > 1.10.1. It allows the user to customize all the arguments of the MSstats dataProcess function used for the quantification. This new version of the msstats section is fully compatible with the previous version. However, we recommend using the latest version of the configuration file.

msstats :
  enabled: 1 
  msstats_input:
  profilePlots: none 
  normalization_method: equalizeMedians  
  normalization_reference:
  summaryMethod: TMP 
  MBimpute: 1 
  feature_subset: all 
  n_top_feature: 3 
  logTrans: 2 
  remove_uninformative_feature_outlier: FALSE 
  min_feature_count: 2
  equalFeatureVar: TRUE
  censoredInt: NA
  remove50missing: FALSE
  fix_missing: NULL
  maxQuantileforCensored: 0.999
  use_log_file: TRUE
  append: FALSE
  log_file_path: NULL

Let's break down the most important arguments:

For all the other parameters, please, check the documentation for the dataProcess function of MSstats.


Section: output_extras

output_extras :
  enabled : 1 # if 0, won't process anything in this section
  annotate :  
    enabled: 1 
    species : HUMAN
  plots:
    volcano: 1
    heatmap: 1
    LFC : -0.58 0.58 # Range of minimal log2fc
    FDR : 0.05 # adjusted p-value, false discovery rate
    heatmap_cluster_cols : 0
    heatmap_display : log2FC # log2FC or pvalue

Extra actions to perform based on the MSstats results, including annotations and plots (heatmaps and volcano plots). Let's break it down:
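One plausible reading of the LFC and FDR settings, shown here as an assumption rather than artMS's exact rule, is a filter that keeps proteins whose log2FC falls outside the LFC range and whose adjusted p-value is below FDR. Note that 0.58 is roughly log2(1.5), i.e. a 1.5-fold change. The toy table mimics the `log2FC` and `adj.pvalue` columns reported by MSstats; the values are made up.

```r
# Toy MSstats-like results (hypothetical values)
results <- data.frame(
  Protein    = c("P1", "P2", "P3", "P4"),
  Label      = "HSC6-Cal_33",
  log2FC     = c(1.20, -0.90, 0.30, 2.50),
  adj.pvalue = c(0.010, 0.040, 0.001, 0.200),
  stringsAsFactors = FALSE
)

lfc <- c(-0.58, 0.58)  # LFC range from the config
fdr <- 0.05            # threshold on the adjusted p-value

# Assumed rule: outside the LFC range AND below the FDR threshold
significant <- subset(results, (log2FC <= lfc[1] | log2FC >= lfc[2]) & adj.pvalue < fdr)
significant$Protein
```

P3 is dropped because its fold change is too small, and P4 because its adjusted p-value is too large.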

Special case: Protein fractionation

To handle protein fractionation experiments, one additional column "Fraction" must be added to the keys.txt file with the information about fractions. For example:

Raw.file|IsotopeLabelType|Condition|BioReplicate|Run|Fraction
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
S9524_Fx1|L|AB|AB-1|1|1
S9524_Fx2|L|AB|AB-1|1|2
S9524_Fx3|L|AB|AB-1|1|3
S9524_Fx4|L|AB|AB-1|1|4
S9524_Fx5|L|AB|AB-1|1|5
S9524_Fx6|L|AB|AB-1|1|6
S9524_Fx7|L|AB|AB-1|1|7
S9524_Fx8|L|AB|AB-1|1|8
S9524_Fx9|L|AB|AB-1|1|9
S9524_Fx10|L|AB|AB-1|1|10
S9525_Fx1|L|AB|AB-2|2|1
S9525_Fx2|L|AB|AB-2|2|2
S9525_Fx3|L|AB|AB-2|2|3
S9525_Fx4|L|AB|AB-2|2|4
S9525_Fx5|L|AB|AB-2|2|5
S9525_Fx6|L|AB|AB-2|2|6
S9525_Fx7|L|AB|AB-2|2|7
S9525_Fx8|L|AB|AB-2|2|8
S9525_Fx9|L|AB|AB-2|2|9
S9525_Fx10|L|AB|AB-2|2|10
S9526_Fx1|L|AB|AB-3|3|1
S9526_Fx2|L|AB|AB-3|3|2
S9526_Fx3|L|AB|AB-3|3|3
S9526_Fx4|L|AB|AB-3|3|4
S9526_Fx5|L|AB|AB-3|3|5
S9526_Fx6|L|AB|AB-3|3|6
S9526_Fx7|L|AB|AB-3|3|7
S9526_Fx8|L|AB|AB-3|3|8
S9526_Fx9|L|AB|AB-3|3|9
S9526_Fx10|L|AB|AB-3|3|10
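Typing 30 rows by hand is error-prone. A minimal base R sketch that generates the same table, assuming the raw files follow the `<sample>_Fx<n>` naming pattern of the example (adapt the prefixes and fraction count to your own experiment):

```r
# One row per biological replicate (sample prefix, BioReplicate, Run)
samples <- data.frame(
  prefix       = c("S9524", "S9525", "S9526"),
  BioReplicate = c("AB-1", "AB-2", "AB-3"),
  Run          = 1:3,
  stringsAsFactors = FALSE
)
fractions <- 1:10

# Expand every replicate into one row per fraction
keys_fx <- do.call(rbind, lapply(seq_len(nrow(samples)), function(i) {
  data.frame(
    Raw.file         = paste0(samples$prefix[i], "_Fx", fractions),
    IsotopeLabelType = "L",
    Condition        = "AB",
    BioReplicate     = samples$BioReplicate[i],
    Run              = samples$Run[i],
    Fraction         = fractions,
    stringsAsFactors = FALSE
  )
}))

write.table(keys_fx, file.path(tempdir(), "keys-fractions.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
head(keys_fx, 3)
```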

Deprecated: In previous versions of artMS (v <= 1.9), the config.yaml file contained an additional fractions section that had to be activated as follows:

fractions: 
  enabled : 1 # 1 for protein fractions, 0 otherwise

This option is no longer required, as artMS uses the "Fraction" column of the keys file to detect that multiple fractions are available.

Special case: SILAC

One of the most widely used techniques that enable relative protein quantification is stable isotope labeling by amino acids in cell culture (SILAC). The keys.txt file can capture the typical SILAC experiment. The following example shows a SILAC experiment with two conditions, two biological replicates, and two technical replicates:

RawFile|IsotopeLabelType|Condition|BioReplicate|Run
:-----:|:-----:|:-----:|:-----:|:-----:
QE20140321-01|H|iso|iso-1|1
QE20140321-02|H|iso|iso-1|2
QE20140321-04|L|iso|iso-2|3
QE20140321-05|L|iso|iso-2|4
QE20140321-01|L|iso_M|iso_M-1|1
QE20140321-02|L|iso_M|iso_M-1|2
QE20140321-04|H|iso_M|iso_M-2|3
QE20140321-05|H|iso_M|iso_M-2|4
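In this design each raw file appears twice, once per isotope label, and the labels are swapped between the two conditions. A quick base R check of that structure is sketched below; the rule "exactly one H row and one L row per raw file" is an assumption for this two-channel example, not a rule quoted from the artMS documentation.

```r
keys_silac <- data.frame(
  RawFile = rep(c("QE20140321-01", "QE20140321-02",
                  "QE20140321-04", "QE20140321-05"), 2),
  IsotopeLabelType = c("H", "H", "L", "L", "L", "L", "H", "H"),
  Condition = rep(c("iso", "iso_M"), each = 4),
  BioReplicate = c("iso-1", "iso-1", "iso-2", "iso-2",
                   "iso_M-1", "iso_M-1", "iso_M-2", "iso_M-2"),
  Run = rep(1:4, 2),
  stringsAsFactors = FALSE
)

# Assumed two-channel rule: each raw file has exactly one H and one L row
labels_per_file <- tapply(keys_silac$IsotopeLabelType, keys_silac$RawFile,
                          function(x) paste(sort(x), collapse = ""))
stopifnot(all(labels_per_file == "HL"))

# Both channels of a raw file share the same Run number
table(keys_silac$RawFile, keys_silac$Run)
```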

It is also required to activate the silac option in the yaml file as follows:

silac: 
  enabled : 1 # 1 for SILAC experiments

QUALITY CONTROL

artMS provides 3 functions to perform QC analyses.

Basic QC (evidence.txt-based)

The basic quality control analysis takes as input both the evidence.txt and keys.txt files and generates several QC plots exploring different aspects of the MS data. Run it as follows:

artmsQualityControlEvidenceBasic(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys,
  prot_exp = "PH")

The following pdf can be generated:

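Check ?artmsQualityControlEvidenceBasic to find out more options. Remember: by default, all the plots are printed to a pdf file. In the following example, only the plotPTMSTATS plots are generated and no pdf file is printed (printPDF = FALSE):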

artmsQualityControlEvidenceBasic( 
    evidence_file = artms_data_ph_evidence,
    keys_file = artms_data_ph_keys,
    prot_exp = "PH", 
    plotPTMSTATS = TRUE,
    plotINTDIST = FALSE, plotREPRO = FALSE,
    plotCORMAT = FALSE, plotINTMISC = FALSE,
    printPDF = FALSE, verbose = FALSE)

Extended QC (evidence.txt-based)

It takes as input the evidence.txt and keys.txt files as follows:

artmsQualityControlEvidenceExtended(
   evidence_file = artms_data_ph_evidence,
   keys_file = artms_data_ph_keys)

and generates the following QC plots:

Example: generating only the plotPCA, plotTYPE, and plotPEPTIDES plots:

artmsQualityControlEvidenceExtended(
  evidence_file = artms_data_ph_evidence,
  keys_file = artms_data_ph_keys,
  plotPCA = TRUE,
  plotTYPE = TRUE,
  plotPEPTIDES = TRUE,
  plotPSM = FALSE,
  plotIONS = FALSE,
  plotPEPTOVERLAP = FALSE,
  plotPROTEINS = FALSE,
  plotPROTOVERLAP = FALSE,
  plotPIO = FALSE,
  plotCS = FALSE,
  plotME = FALSE,
  plotMOCD = FALSE,
  plotPEPICV = FALSE,
  plotPEPDETECT = FALSE,
  plotPROTICV = FALSE,
  plotPROTDETECT = FALSE,
  plotIDoverlap = FALSE,
  plotSP = FALSE,
  printPDF = FALSE,
  verbose = FALSE)

Extended QC (summary.txt based)

It requires two files, the MaxQuant summary.txt file and the keys file:

artmsQualityControlSummaryExtended(summary_file = "summary.txt",
                                    keys_file = artms_data_ph_keys)

It generates the following pdf plots:

RELATIVE QUANTIFICATION

The relative quantification is a fundamental step in the analysis of MS data. artMS facilitates and simplifies this analysis using MSstats, a fantastic statistical package for the relative quantification of mass-spectrometry-based proteomics data.

All the options and parameters required to run a relative quantification analysis using MSstats (in addition to other options) are summarized in artMS through a configuration file in .yaml format. Check the input-files section to find out more about each of the options.

Different types of proteomics experiments can be quantified, including changes in global protein abundance (AB), affinity purification mass spectrometry (APMS), and different types of post-translational modifications, including phosphorylation (PH), ubiquitination (UB), and acetylation (AC).

artMS also enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that artMS does not require an ID for the metabolites, as the m/z and retention time will be combined and used as identifiers.

Quantification of Changes in Global Protein Abundance

The quantification of changes in protein abundance between different conditions requires filling in the following sections of the config file:

files:
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptm_global/results.txt
  .
  .
  .
data:
  .
  .
  .
  filters:
    modifications : AB 

The remaining options can be left unmodified (default parameters). Then run the following artMS function:

artmsQuantification(
  yaml_config_file = '/path/to/config/file/artms_ab_config.yaml')

Quantification of Changes in Global Phosphorylation, Ubiquitination, Acetylation (or any PTM)

Warning: This quantification is only possible for experiments that have used methods to enrich for the modified peptides (e.g. phosphorylation) prior to the mass spectrometry analysis.

The global PTM quantification analysis calculates changes in the PTM at the protein level. This means that all the modified peptides of each protein are used to quantify changes in protein phosphorylation, ubiquitination, or acetylation between different conditions. The site-specific analysis (explained next) quantifies changes at the site level, i.e., each modified peptide for every PTM site is quantified independently between the different conditions (one or more different peptides can be detected for the same PTM).

Only two sections of the default configuration file need to be filled in:

files:
  evidence : /path/to/the/evidence.txt
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptm_global/results.txt
  .
  .
  .
data:
  .
  .
  .
  filters:
    modifications : PH

The remaining options can be left unmodified.

Once the configuration yaml file is ready, run the following command:

artmsQuantification(
  yaml_config_file = '/path/to/config/file/artms_phglobal_config.yaml')

PTM-Site/Peptide-specific Quantification of Changes (PH, UB, AC)

Warning: This quantification is only possible for experiments that have used methods to enrich phosphopeptides or ubiquitinated peptides prior to the mass spectrometry analysis.

Abbreviations:

  • PH = Protein phosphorylation
  • UB = Protein Ubiquitination
  • AC = Protein Acetylation
  • PTM:XXX:yy : User-defined PTM (any PTM supported by MaxQuant). Replace XXX with one or more 1-letter amino acid codes on which to find modifications (all uppercase). Replace yy with the modification name used within the evidence file (lowercase characters required). Example: PTM:STY:ph will find modifications on aa S,T,Y with this example format _AAGGAPS(ph)PPPPVR_
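To make the PTM:XXX:yy syntax concrete, the sketch below parses such a specification and reports which residues of a MaxQuant-style modified sequence carry the label. Both helpers are hypothetical (not artMS functions); they accept the label with or without parentheses, since both PTM:STY:ph and PTM:STY:(ph) appear in this document, and they assume short lowercase labels such as (ph) or (gl).

```r
# Hypothetical helper: split a "PTM:XXX:yy" spec into residues and label
parse_ptm_spec <- function(spec) {
  parts <- strsplit(spec, ":", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3, parts[1] == "PTM")
  list(residues = strsplit(parts[2], "")[[1]],
       label    = gsub("[()]", "", parts[3]))  # accept "ph" or "(ph)"
}

# Hypothetical helper: modified residues (one letter each) matching the spec
modified_residues <- function(modified_sequence, spec) {
  ptm <- parse_ptm_spec(spec)
  # A residue letter immediately followed by "(label)"
  pattern <- paste0("([A-Z])\\(", ptm$label, "\\)")
  hits <- regmatches(modified_sequence, gregexpr(pattern, modified_sequence))[[1]]
  aa <- substr(hits, 1, 1)
  aa[aa %in% ptm$residues]  # keep only residues listed in XXX
}

modified_residues("_AAGGAPS(ph)PPPPVR_", "PTM:STY:ph")  # "S"
modified_residues("_AAASK(gl)LGEFAK_", "PTM:K:(gl)")    # "K"
```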

The site-specific analysis quantifies changes at the modified peptide level. This means that changes in every modified (PH, UB, AC, or PTM) peptide of a given protein are quantified individually. The caveat is that the proportion of missing values generally increases relative to a typical non-PTM global analysis (protein global abundance quantification). Site-specific and global PTM analyses are usually highly correlated, because typically only one or two peptides drive the overall PTM changes of each protein.

To run a site/peptide specific analysis follow these steps:

  1. Important pre-processing step on the evidence file to enable the ptm-site/peptide-specific analysis. This step takes one of the protein ID columns selected by the user (Leading razor protein, Leading protein, or Proteins) and re-annotates it to incorporate the ptm-site/peptide-specific information. By default, this function converts the Leading razor protein column. This step is computationally expensive and might take several minutes to finish (depending on the size of the fasta database and the evidence file, computer power, etc.).

It also requires the same reference proteome (fasta sequence database) used for the MaxQuant search.

For phosphorylation:

artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", 
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/output/ph-sites-evidence.txt", 
  mod_type = "PH")

As a result, the IDs in the "Leading razor protein" column will contain site/peptide-specific notation. For example:

Before: P12345
After: P12345_S23_S45
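The idea behind this notation can be sketched as follows: locate the unmodified peptide in the protein sequence, then convert the positions of the modified residues within the peptide into positions within the protein. `site_notation` is a hypothetical helper applied to a toy protein sequence, not the artMS implementation (which works from the evidence file and the fasta database); it assumes short lowercase modification labels such as (ph) and uses only the first occurrence of the peptide in the protein.

```r
# Hypothetical helper: build "<protein>_<residue><position>" site notation
site_notation <- function(protein_id, protein_sequence, modified_sequence, label) {
  s <- gsub("_", "", modified_sequence)
  # Mark residues carrying the target label as lowercase, e.g. "S(ph)" -> "s"
  s <- gsub(paste0("([A-Z])\\(", label, "\\)"), "\\L\\1", s, perl = TRUE)
  # Drop any other modification tags, e.g. "(ox)"
  s <- gsub("\\([a-z]+\\)", "", s)
  residues <- strsplit(s, "")[[1]]
  mod_idx <- which(residues %in% letters)  # 1-based positions within the peptide
  if (length(mod_idx) == 0) return(protein_id)
  # Locate the unmodified peptide in the protein (first occurrence only)
  start <- as.integer(regexpr(toupper(s), protein_sequence, fixed = TRUE))
  if (start < 0) stop("peptide not found in the protein sequence")
  sites <- paste0(toupper(residues[mod_idx]), start + mod_idx - 1L)
  paste(c(protein_id, sites), collapse = "_")
}

# Toy protein: the peptide AAGGAPSPPPPVR starts at position 11
protein <- "MKTAYIAKQRAAGGAPSPPPPVRGT"
site_notation("P12345", protein, "_AAGGAPS(ph)PPPPVR_", "ph")        # "P12345_S17"
site_notation("P12345", protein, "_AAGGAPS(ph)PPPPVRGT(ph)_", "ph")  # "P12345_S17_T25"
```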

For ubiquitination:

artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", 
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/output/ub-sites-evidence.txt", 
  mod_type = "UB")

For acetylation:

artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", 
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/output/ac-sites-evidence.txt", 
  mod_type = "AC")

For all PTMs supported by MaxQuant (in addition to PH, AC, UB):

Example with PH and UB:

# Phosphopeptide in evidence file: `_AAGGAPS(ph)PPPPVR_`
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", 
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/output/ph-sites-evidence.txt", 
  mod_type = "PTM:STY:(ph)")

# Ubiquitinated peptide: `_AAASK(gl)LGEFAK_`
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", 
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/output/ub-sites-evidence.txt", 
  mod_type = "PTM:K:(gl)")

Tip: How to re-annotate all the Protein columns in the same file.

By default, artmsProtein2SiteConversion does not allow the evidence.txt file to be overwritten, for safety reasons (you don't want to lose the evidence file if something goes wrong). To overwrite the evidence file, the argument overwrite_evidence must be turned on (overwrite_evidence = TRUE).

If the column_name argument is not used, artmsProtein2SiteConversion converts the Leading razor protein column, which is used in the quantification step when protein_groups : remove is selected (default). However, if protein_groups : keep is used, artMS will use the Proteins column. To convert the Proteins column to the site/peptide-specific notation, add the argument column_name = "Proteins".

To annotate both columns of the same file, first generate the "site-evidence.txt" file, and then use this same output file as the evidence_file with overwrite_evidence = TRUE.

In summary, to annotate both the "Leading razor protein" and Proteins columns follow these steps:

# Convert 'Leading razor protein' evidence's file column
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/evidence.txt", # ORIGINAL
  column_name = "Leading razor protein",
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/phsites-evidence.txt", # SITES VERSION
  mod_type = "PH")

# Convert 'Proteins' evidence's file column
artmsProtein2SiteConversion(
  evidence_file = "/path/to/the/phsites-evidence.txt", # <- USE SITES VERSION
  column_name = "Proteins",
  overwrite_evidence = TRUE, # <--- TURN ON
  ref_proteome_file = "/path/to/the/reference_proteome.fasta", 
  output_file = "/path/to/the/phsites-evidence.txt", # <- SITES VERSION
  mod_type = "PH")

  2. Generate a new configuration file (phsites_config.yaml or ubsites_config.yaml) as explained above, but using the "new" sites-evidence.txt file instead of the original evidence.txt file:

files:
  evidence : /path/to/the/phsites-evidence.txt # <- the sites version
  keys : /path/to/the/keys.txt
  contrasts : /path/to/the/contrast.txt
  output : /path/to/the/output/results_ptmSITES/sites-results.txt # <- this one
  .
  .
  .
data:
  .
  .
  .
  filters:
    modifications : PH # <- Don't forget this one.

Once the new yaml file has been created, execute:

artmsQuantification(
  yaml_config_file = '/path/to/config/file/phsites_config.yaml')

Output files

The files generated after successfully running artmsQuantification are (based on MSstats documentation):

TXT (tab delimited) files

Plots (pdf)

ANALYSIS OF QUANTIFICATIONS

Before running this function, the following packages must be installed on your system:

  • From bioconductor:
BiocManager::install(c("ComplexHeatmap", "org.Mm.eg.db"))
  • From CRAN:
install.packages(c("factoextra", "FactoMineR", "gProfileR", "PerformanceAnalytics"))

artmsAnalysisQuantifications() performs a comprehensive analysis of the quantification outputs obtained from the function artmsQuantification(). It includes:

Inputs

It takes as input two files generated in the previous quantification step (artmsQuantification()): the results file with the log2FC values and the ModelQC file.

To run this analysis:

  1. Set the working directory to the folder with the results obtained from artmsQuantification():
setwd('~/path/to/the/results_quantification/')

Then run the following function (e.g., for a protein abundance "AB" experiment):

artmsAnalysisQuantifications(log2fc_file = "ab-results.txt",
                              modelqc_file = "ab-results_ModelQC.txt",
                              species = "human",
                              output_dir = "AnalysisQuantifications")

A few comments on the available options for artmsAnalysisQuantifications:

Outputs

Summary file (summary.xlsx)

Reminder: for any given relative quantification, as for example WT-Mutant:

The summary excel file (results-summary.xlsx) gathers several tabs:

Text files

Gene enrichment analysis: enrichment analysis is only supported for human and mouse. Check the gProfileR documentation to find out more about the details:

Protein Complex Enrichment analysis (based on CORUM)

Clustering

Correlations

Miscellaneous

PCA

Based on relative abundance

Based on significant changes

MISCELLANEOUS FUNCTIONS

artMS also provides a number of very handy functions.

Annotate data.frame with Gene Symbol, Name, ENTREZ based on Uniprot IDs

Takes the given columnid (of UniProt IDs) from the input data.frame and maps the gene symbol, name, and Entrez ID (source: Bioconductor annotation packages).

# This example adds annotations to the evidence file available in
# artMS, based on the column 'Proteins'.

evidence_anno <- artmsAnnotationUniprot(x = artms_data_ph_evidence,
                                        columnid = 'Proteins',
                                        species = 'human')

Average Intensity, RT, CR

Taking the evidence file as input, it summarizes and returns the average intensity, average retention time, and average calibrated retention time for each protein. If a list of proteins is provided, only those proteins are summarized and returned. Check ?artmsAvgIntensityRT to find out more options.

artmsAvgIntensityRT(evidence_file = '/path/to/the/evidence.txt')
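For intuition, the per-protein averaging can be approximated in base R. This sketch assumes the standard MaxQuant column names (`Proteins`, `Intensity`, `Retention time`, `Calibrated retention time`) and uses made-up values; it is not artMS's implementation, which may, for example, treat protein groups or missing values differently.

```r
# Toy evidence rows (hypothetical values) with MaxQuant-style column names
evidence <- data.frame(
  Proteins                    = c("P1", "P1", "P2", "P2", "P2"),
  Intensity                   = c(1e6, 3e6, 2e5, 4e5, NA),
  `Retention time`            = c(30.1, 30.5, 52.0, 52.4, 52.2),
  `Calibrated retention time` = c(30.0, 30.4, 51.9, 52.3, 52.1),
  check.names = FALSE, stringsAsFactors = FALSE
)

cols <- c("Intensity", "Retention time", "Calibrated retention time")

# Per-protein mean of each column, ignoring missing values
avg <- sapply(cols, function(col) {
  tapply(evidence[[col]], evidence$Proteins, mean, na.rm = TRUE)
})
avg <- data.frame(Proteins = rownames(avg), avg, check.names = FALSE)
avg
```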

Change column name

Changes a given column name in the input data.frame

artms_data_ph_evidence <- artmsChangeColumnName(
                               dataset = artms_data_ph_evidence,
                               oldname = "Phospho..STY.",
                               newname = "PH_STY")

Individual abundance dot plots

Protein abundance dot plots for each unique UniProt ID. It can take a long time.

artmsDataPlots(input_file = "results/ab-results-mss-normalized.txt",
               output_file = "results/ab-results-mss-normalized.pdf")

Enrichment analysis function

Enrichment analysis based on a data.frame with Gene and Comparison/Label columns (i.e., typical MSstats results).

# The data must be annotated (Protein and Gene columns)
data_annotated <- artmsAnnotationUniprot(
                      x = artms_data_ph_msstats_results,
                      columnid = "Protein",
                      species = "human")
# And then the enrichment
enrich_set <- artmsEnrichLog2fc(
                   dataset = data_annotated,
                   species = "human",
                   background = unique(data_annotated$Gene), 
                   verbose = FALSE)

Enrichment analysis using gProfileR

Function that simplifies enrichment analysis using gProfileR

# annotate the MSstats results to get the Gene name
data_annotated <- artmsAnnotationUniprot(
                                     x = artms_data_ph_msstats_results,
                                     columnid = "Protein",
                                     species = "human")

# Filter the list of genes with a log2fc > 2
filtered_data <- 
unique(data_annotated$Gene[which(data_annotated$log2FC > 2)])

# And perform enrichment analysis
data_annotated_enrich <- artmsEnrichProfiler(
                                   x = filtered_data,
                                   categorySource = c('KEGG'),
                                   species = "hsapiens",
                                   background = unique(data_annotated$Gene))

MaxQuant evidence file to SAINTexpress format

Converts the MaxQuant evidence file into the 3 files required by SAINTexpress. Choose one of the following quantitative MS metrics:

artmsEvidenceToSaintExpress(evidence_file = "/path/to/evidence.txt", 
                            keys_file = "/path/to/keys.txt", 
                            ref_proteome_file = "/path/to/org.proteome.fasta")

MaxQuant evidence file to SAINTq format

Converts the MaxQuant evidence file into the files required by SAINTq. The user can filter based on either peptides with spectral counts (use msspc) or all the peptides (use all) for the analysis. The quantitative metric can also be chosen (either MS intensity or spectral counts).

artmsEvidenceToSAINTq(evidence_file = "/path/to/evidence.txt", 
                      keys_file = "/path/to/keys.txt", 
                      output_dir = "saintq_input_files")

Generate Phosfate input file

It generates the Phosfate input file from the imputedL2fcExtended.txt file resulting from running artmsAnalysisQuantifications() on a ph-site quantification (see above). Notice that the only supported species is human.

artmsPhosfateOutput(inputFile = "your-imputedL2fcExtended.txt")

Generate Photon input file

It generates the Photon input file from the imputedL2fcExtended.txt file resulting from running artmsAnalysisQuantifications() on a ph-site quantification (see above). Please notice that the only species supported by PHOTON is human.

artmsPhotonOutput(inputFile = "your-imputedL2fcExtended.txt")

Remove contaminants and empty proteins from the MaxQuant evidence file

Removes contaminants and the 'reverse' (decoy) sequences erroneously identified by MaxQuant, in addition to empty protein IDs.

evidencefiltered <- artmsFilterEvidenceContaminants(x = artms_data_ph_evidence)
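A minimal sketch of the same idea in base R, assuming MaxQuant's usual `CON__` (contaminant) and `REV__` (reverse decoy) prefixes in the `Proteins` column. The toy data are hypothetical, and dropping a whole protein group when any member is a contaminant is a design choice of this sketch, not necessarily what artmsFilterEvidenceContaminants does.

```r
# Toy evidence rows (hypothetical values)
evidence <- data.frame(
  Proteins  = c("P1", "CON__P2", "REV__P3", "", "P4;CON__P5"),
  Intensity = c(1e6, 5e6, 2e4, 3e4, 8e5),
  stringsAsFactors = FALSE
)

# Flag contaminants, reverse hits, and empty protein IDs
is_contaminant <- grepl("CON__", evidence$Proteins, fixed = TRUE)
is_reverse     <- grepl("REV__", evidence$Proteins, fixed = TRUE)
is_empty       <- is.na(evidence$Proteins) | trimws(evidence$Proteins) == ""

evidence_filtered <- evidence[!(is_contaminant | is_reverse | is_empty), , drop = FALSE]
evidence_filtered$Proteins  # "P1"
```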

Generate ph-site specific evidence file

Generates an extended, detailed ph-site file in which every line is a ph site instead of a peptide. Therefore, a peptide with multiple ph sites is broken down into multiple lines, one for each site.

artmsGeneratePhSiteExtended(df = dfobject, 
                            species = "mouse", 
                            ptmType = "ptmsites",
                            output_name = log2fc_file)
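The row expansion can be sketched in base R, assuming site IDs in the `P12345_S23_S45` notation produced by the site conversion step and UniProt accessions without underscores. `expand_sites` and the toy data are hypothetical, not part of artMS, and rows without any site are dropped by this sketch.

```r
# Toy site-level results (hypothetical values): one row per peptide-level ID
df <- data.frame(
  Protein = c("P12345_S23_S45", "Q67890_T7"),
  log2FC  = c(1.5, -0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per modified site
expand_sites <- function(df) {
  parts   <- strsplit(df$Protein, "_", fixed = TRUE)
  n_sites <- lengths(parts) - 1L                      # sites after the accession
  out <- df[rep(seq_len(nrow(df)), n_sites), , drop = FALSE]
  out$Uniprot <- rep(vapply(parts, `[`, character(1), 1), n_sites)
  out$Site    <- unlist(lapply(parts, `[`, -1))
  rownames(out) <- NULL
  out
}

expand_sites(df)
```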

METABOLOMICS

artMS enables the relative quantification of untargeted polar metabolites using the alignment table generated by MarkerView. This means that the metabolites do not need to have an id in order to perform the quantification, as the m/z and retention time will be used as identifiers.
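As an illustration of combining m/z and retention time into an identifier, the sketch below builds one ID per feature. The `mz_rt` format and the rounding (4 decimals for m/z, 2 for retention time) are assumptions for this example, not the exact format used by artMS. Two features with the same m/z but different retention times (e.g., isomers) still get distinct IDs.

```r
# Toy MarkerView-like features (hypothetical values)
features <- data.frame(
  mz = c(146.0459, 146.0459, 175.1190),  # mass-to-charge ratio
  rt = c(5.12, 7.80, 9.45)               # retention time
)

# Combine m/z and retention time into one identifier per feature
features$ID <- sprintf("%.4f_%.2f", features$mz, features$rt)
features$ID

# Identifiers must be unique to be usable as feature IDs
stopifnot(!anyDuplicated(features$ID))
```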

MarkerView is an ABSciex software that supports the files generated by Analyst software (.wiff) used to run our specific mass spectrometer (ABSciex Triple TOF 5600+). It also supports .t2d files generated by the Applied Biosystems 4700/4800 MALDI-TOF.

MarkerView is used to align mass spectrometry data from several samples for comparison. Using the import feature in the software, .wiff files (also .t2d MALDI-TOF files and tab-delimited .txt mass spectra data in mass-intensity format) are loaded for retention time alignment. Once the data files are selected, a series of windows will appear wherein peak finding, alignment, and filtering options can be entered and selected. These options include minimum spectral peak width, minimum retention time peak width, retention time and mass tolerance, and the ability to filter out peaks that do not appear in more than a user-selected number of samples.

The alignment file is further processed and formatted to perform QC and relative quantification using the following artMS functions:

Convert Metabolomics

Pre-process the MarkerView .txt file to generate an "evidence-like" file by running:

artmsConvertMetabolomics(input_file = "markview-output.txt", 
                         out_file = "metabolomics-evidence.txt")

QC Metabolomics

Perform quality control analysis on the metabolomics data by running:

artmsQualityControlMetabolomics(evidence_file = "metabolomics-evidence.txt",
                                keys_file = "metabolomics-keys.txt")

It generates the following plots:

Relative Quantification:

The relative quantification is performed using MSstats. It requires a configuration file (yaml format, please check above). A template can be generated by running: artmsWriteConfigYamlFile(config_file_name = "metab_config.yaml"). The relative quantification is performed by running:

artmsQuantification(yaml_config_file = "metab_config.yaml")

TESTING FILES

The artMS package provides the following test datasets:

Phosphoproteomics dataset: example dataset consisting of two head and neck cancer cell lines (conditions "Cal33" and "HSC6", 2 biological replicates each). The number of peptides was reduced to 1/8 due to Bioconductor limitations on data size.

The full data set (2 conditions, 4 biological replicates) can be found at the following urls:

Protein Complexes dataset: downloaded (2017-08-01) from the CORUM database and further enriched with annotations of mouse mitochondrial complexes not available in CORUM. Used for complex enrichment calculations.

Pathogens Uniprot IDs:

Check the individual help pages (e.g., ?artms_data_ph_evidence) to find out more about them.

HELP

Errors or warnings? Try updating (reinstalling) the package first, in case a newer version that fixes the issue is already available.

Does the issue persist after reinstallation? Then please submit your error as a new issue in the official GitHub repository.

Any other inquiries: artms.help@gmail.com



biodavidjm/artMS documentation built on July 7, 2023, 12:24 p.m.