R/dou2019_mouse.R

##' Dou et al. 2019 (Anal. Chem.): murine cell lines
##'
##' @description
##'
##' Single-cell proteomics using nanoPOTS combined with TMT isobaric
##' labeling. It contains quantitative information at PSM and protein
##' level. The cell types are either "Raw" (macrophage cells), "C10"
##' (epihelial cells), or "SVEC" (endothelial cells). Out of the 132
##' wells, 72 contain single cells, corresponding to 24 C10 cells, 24
##' RAW cells, and 24 SVEC. The other wells are either boosting
##' channels (12), empty channels (36) or reference channels (12).
##' Boosting and reference channels are balanced (1:1:1) mixes of C10,
##' SVEC, and RAW samples at 5 ng and 0.2 ng, respectively. The
##' different cell types where evenly distributed across 4 nanoPOTS
##' chips. Samples were 11-plexed with TMT labeling.
##'
##' @format A [QFeatures] object with 13 assays, each assay being a
##' [SingleCellExperiment] object:
##'
##' - `Single_Cell_Chip_X_Y`: PSM data with 11 columns corresponding
##'   to the TMT channels (see `Notes`). The `X` indicates the chip
##'   number (from 1 to 4) and `Y` indicates the row name on the chip
##'   (from A to C).
##' - `peptides`: peptide data containing quantitative data for 15,492
##'   peptides in 132 samples (run 1 and run 2 combined).
##' - `proteins`: protein data containing quantitative data for 2331
##'   proteins in 132 samples (all runs combined).
##'
##' Sample annotation is stored in `colData(dou2019_mouse())`.
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: single-cells from the three murine cell
##'   lines were isolated using FACS (BD Influx II cell sorter ).
##' - **Sample preparation** performed using the nanoPOTs device.
##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
##'   50cm x 30um LC columns; 50nL/min)
##' - **Ionization**: ESI (2,000V)
##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
##'   60,000; MS2 AGC = 1E5)
##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
##'   (custom R package)
##'
##' @section Data collection:
##'
##' The PSM data were collected from the MassIVE repository
##' MSV000084110 (see `Source` section). The downloaded files are:
##'
##'
##' - `Single_Cell_Chip_*_*_msgfplus.mzid`: the MS-GF+ identification
##'   result files.
##' - `Single_Cell_Chip_*_*_ReporterIons.txt`: the MASIC
##'   quantification result files.
##'
##' For each batch, the quantification and identification data were
##' combined based on the scan number (common to both data sets). The
##' combined datasets for the different runs were then concatenated
##' feature-wise. To avoid data duplication due to ambiguous matching
##' of spectra to peptides or ambiguous mapping of peptides to proteins,
##' we combined ambiguous peptides to peptides groups and proteins to
##' protein groups. Feature annotations that are not common within a
##' peptide or protein group are are separated by a `;`. The sample
##' annotation table was manually created based on the available
##' information provided in the article. The data were then converted
##' to a [QFeatures] object using the [scp::readSCP()] function.
##'
##' We generated the peptide data. First, we removed PSM matched to
##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
##' the PSM to peptides based on the peptide (or peptide group)
##' sequence(s) using the median PSM instenity. The peptide data for
##' the different runs were then joined in a single assay (see
##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
##' We then removed the peptide groups. Links between the peptide and
##' the PSM data were created using [QFeatures::addAssayLink]. Note
##' that links between PSM and peptide groups are not stored.
##'
##' The protein data were downloaded from `Supporting information`
##' section from the publisher's website (see `Sources`). The data is
##' supplied as an Excel file `ac9b03349_si_005.xlsx`. The file
##' contains 7 sheets from which we only took the 2nd (named
##' `01 - Raw sc protein data`) with the combined protein data for the
##' 12 runs. We converted the data to a [SingleCellExperiment] object
##' and added the object as a new assay in the [QFeatures] dataset
##' (containing the PSM data). Links between the proteins and the
##' corresponding PSM were created. Note that links to protein groups
##' are not stored.
##'
##' @note Although a TMT-10plex labeling is reported in the article,
##' the PSM data contained 11 channels for each run. Those 11th
##' channel contain mostly missing data and are hence assumed to be
##' empty channels.
##'
##' @source
##' The PSM data can be downloaded from the massIVE repository
##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
##'
##' The protein data can be downloaded from the
##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
##' website (Supporting information section).
##'
##' @references
##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
##' September
##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
##'
##' @seealso
##' [dou2019_lysates], [dou2019_boosting]
##'
##' @examples
##' \donttest{
##' dou2019_mouse()
##' }
##'
##' @keywords datasets
##'
"dou2019_mouse"
UCLouvain-CBIO/scpdata documentation built on Oct. 29, 2024, 4:22 p.m.