R/leduc2022_plexDIA.R

##' Leduc et al. 2022 - plexDIA (bioRxiv): melanoma cells
##'
##' Single cell proteomics data acquired by the Slavov Lab. This is
##' the dataset associated with the fourth version of the preprint (and
##' the Genome Biology publication). It contains quantitative
##' information on melanoma cells at the precursor, peptide and protein
##' levels.
##' This version of the data was acquired using the plexDIA MS
##' acquisition protocol.
##'
##' @format A [QFeatures] object with 48 assays, each assay being a
##' [SingleCellExperiment] object:
##'
##' - Assay 1-45: precursor data acquired with an mTRAQ-3 protocol,
##'   hence those assays contain 3 columns. Columns hold quantitative
##'   information from single cells or negative control samples.
##' - `Ms1Extracted`: the DIA-NN MS1 extracted signal, which combines
##'   the information from assays 1-45.
##' - `peptides`: peptide data containing quantitative data for 3,608
##'   peptides and 104 single cells. The data were filtered to 1%
##'   protein FDR.
##' - `proteins`: protein data containing quantitative data for 508
##'   proteins and 105 single cells. Note that the peptide and protein
##'   data provided by the authors differ by 3 samples. The precursor
##'   data were aggregated to protein intensity using maxLFQ. The
##'   protein data were further median normalized by column and by row,
##'   log2 transformed, imputed using KNN (k = 3), again median
##'   normalized by column and by row, batch corrected using ComBat,
##'   and median normalized by column and by row once more.
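##'
##' As an illustration of the first normalization and transformation
##' steps listed for the protein data, the following is a minimal base
##' R sketch on a toy matrix. It assumes that median normalization
##' means dividing each column, then each row, by its median on the
##' linear scale. The matrix `x` and its values are hypothetical, the
##' imputation and batch correction steps are not shown, and this is
##' not the authors' code.
##'
##' ```r
##' ## Toy intensity matrix: 4 proteins x 3 cells (hypothetical values)
##' x <- matrix(c(10, 25, 40, 80,
##'               22, 40, 90, 150,
##'               5, 12, 18, 45),
##'             nrow = 4,
##'             dimnames = list(paste0("prot", 1:4), paste0("cell", 1:3)))
##'
##' ## Median normalization by column (assumed: divide by the cell median)
##' x <- sweep(x, 2, apply(x, 2, median, na.rm = TRUE), "/")
##' ## Median normalization by row (assumed: divide by the protein median)
##' x <- sweep(x, 1, apply(x, 1, median, na.rm = TRUE), "/")
##' ## Log2 transformation
##' x <- log2(x)
##' x
##' ```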
##'
##' The `colData(leduc2022_plexDIA())` contains cell type annotation and
##' batch annotation that are common to all assays. The description of
##' the `rowData` fields for the precursor data can be found in the
##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
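##'
##' The content described above can be explored with the usual
##' `QFeatures` accessors. The following is a minimal sketch, assuming
##' that the dataset can be retrieved in the current session and that
##' the assay names match those listed above:
##'
##' ```r
##' library(scpdata)
##' library(QFeatures)
##'
##' leduc <- leduc2022_plexDIA()
##' ## Names of the assays stored in the object
##' names(leduc)
##' ## Extract the protein assay as a SingleCellExperiment object
##' leduc[["proteins"]]
##' ## Cell type and batch annotations shared by all assays
##' head(colData(leduc))
##' ## DIA-NN feature annotations of the combined precursor data
##' head(rowData(leduc[["Ms1Extracted"]]))
##' ```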
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: CellenONE cell sorting.
##' - **Sample preparation**: improved SCoPE2 protocol performed with
##'   the CellenONE liquid handling system. nPOP cell lysis (DMSO) +
##'   trypsin digestion + mTRAQ-3 labeling and pooling.
##' - **Separation**: online nLC (Dionex UltiMate 3000 UHPLC with a
##'   25 cm x 75 um IonOpticks Aurora Series UHPLC column; 200 nL/min).
##' - **Ionization**: ESI (1,800 V).
##' - **Mass spectrometry**: Thermo Scientific Q-Exactive. The duty
##'   cycle = 1 MS1 + 4 DIA MS2 windows (120 Th, 120 Th, 200 Th and
##'   580 Th, spanning 378-1,402 m/z). Each MS1 and MS2 scan was
##'   conducted at 70,000 resolving power, 3e6 AGC and 300 ms
##'   maximum injection time.
##' - **Data analysis**: DIA-NN.
##'
##' @section Data collection:
##'
##' The data were collected from a shared Google Drive folder that
##' is accessible from the SlavovLab website (see `Source` section).
##' The folder contains the following files of interest:
##'
##' - `annotation_plexDIA.csv`: sample annotation
##' - `report_plexDIA_mel_nPOP.tsv`: the DIA-NN output file
##'   with the precursor data
##' - `report.pr_matrix_channels_ms1_extracted.tsv`: the DIA-NN
##'   output file with the combined precursor data
##' - `plexDIA_peptide.csv`: the processed data table containing the
##'   `peptide` data
##' - `plexDIA_protein_imputed.csv`: the processed data table
##'   containing the `protein` data
##'
##' We removed the failed runs as identified by the authors. We also
##' formatted the annotation and precursor quantification tables to
##' facilitate matching between corresponding columns. Both annotation
##' and quantification tables were then combined in a single [QFeatures]
##' object using `scp::readSCPfromDIANN()`.
##'
##' The `plexDIA_peptide.csv` and `plexDIA_protein_imputed.csv` files
##' were loaded and formatted as [SingleCellExperiment] objects. The
##' column names were adapted to match those in the `QFeatures`
##' object. The `SingleCellExperiment` objects were then added to the
##' [QFeatures] object, and the rows of the peptide data were linked to
##' the rows of the precursor data based on the peptide sequence or
##' the protein name through an `AssayLink` object.
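##'
##' The way a processed table is added to a `QFeatures` object and
##' linked to its parent assay can be sketched on toy data as follows.
##' This is an illustration only, not the script used to build the
##' dataset: the objects `prec`, `pep` and `qf`, their values, and the
##' use of a `Stripped.Sequence` column as the matching variable are
##' assumptions made for the example.
##'
##' ```r
##' library(QFeatures)
##' library(SingleCellExperiment)
##'
##' ## Toy precursor assay: 3 precursors x 2 cells (hypothetical values)
##' prec <- SingleCellExperiment(
##'     assays = list(intensity = matrix(
##'         c(1, 2, 3, 4, 5, 6), nrow = 3,
##'         dimnames = list(paste0("prec", 1:3), c("cell1", "cell2"))
##'     )),
##'     rowData = DataFrame(Stripped.Sequence = c("PEPA", "PEPA", "PEPB"))
##' )
##' ## Toy peptide assay with column names matching the precursor assay
##' pep <- SingleCellExperiment(
##'     assays = list(intensity = matrix(
##'         c(3, 3, 9, 6), nrow = 2,
##'         dimnames = list(c("PEPA", "PEPB"), c("cell1", "cell2"))
##'     )),
##'     rowData = DataFrame(Stripped.Sequence = c("PEPA", "PEPB"))
##' )
##'
##' ## Build the QFeatures object and add the peptide assay
##' qf <- QFeatures(list(precursors = prec))
##' qf <- addAssay(qf, pep, name = "peptides")
##' ## Link peptide rows to precursor rows through the shared sequence
##' qf <- addAssayLink(qf, from = "precursors", to = "peptides",
##'                    varFrom = "Stripped.Sequence",
##'                    varTo = "Stripped.Sequence")
##' assayLink(qf, "peptides")
##' ```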
##'
##' @source
##' The links to the data were found on the
##' [Slavov Lab website](https://scp.slavovlab.net/Leduc_et_al_2022).
##' The data were downloaded from the
##' [Google drive folder 1](https://drive.google.com/drive/folders/117ZUG5aFIJt0vrqIxpKXQJorNtekO-BV) and
##' [Google drive folder 2](https://drive.google.com/drive/folders/12-H2a1mfSHZUGf8O50Cr0pPZ4zIDjTac).
##' The raw data and the quantification data can also be found in the
##' MassIVE repository `MSV000089159`:
##' ftp://massive.ucsd.edu/MSV000089159.
##'
##' @references
##' Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet
##' Sample Preparation for Single-Cell Proteomics Applied to the Cell
##' Cycle.” bioRxiv. [Link to article](https://doi.org/10.1101/2021.04.24.441211)
##'
##' Andrew Leduc, Gray Huffman, Joshua Cantlon, Saad Khan, and Nikolai
##' Slavov. 2022. “Exploring Functional Protein Covariation across
##' Single Cells Using nPOP.” Genome Biology 23 (1): 261.
##' [Link to article](http://dx.doi.org/10.1186/s13059-022-02817-5)
##'
##' Jason Derks, Andrew Leduc, Georg Wallmann, Gray Huffman, Matthew
##' Willetts, Saad Khan, Harrison Specht, Markus Ralser, Vadim
##' Demichev, and Nikolai Slavov. 2023. “Increasing the Throughput of
##' Sensitive Proteomics by plexDIA.” Nature Biotechnology 41 (1):
##' 50–59. [Link to article](http://dx.doi.org/10.1038/s41587-022-01389-w)
##'
##' @seealso
##' [leduc2022_pSCoPE]
##'
##' @examples
##' \donttest{
##' leduc2022_plexDIA()
##' }
##'
##' @keywords datasets
##'
"leduc2022_plexDIA"
UCLouvain-CBIO/scpdata documentation built on Oct. 29, 2024, 4:22 p.m.