dou2019_mouse: Dou et al. 2019 (Anal. Chem.): murine cell lines
In UCLouvain-CBIO/scpdata: Single-Cell Proteomics Data Package

Description

Single-cell proteomics using nanoPOTS combined with TMT isobaric labeling. It contains quantitative information at PSM, peptide, and protein level. The cell types are either "Raw" (macrophage cells), "C10" (epithelial cells), or "SVEC" (endothelial cells). Out of the 132 wells, 72 contain single cells, corresponding to 24 C10 cells, 24 RAW cells, and 24 SVEC cells. The other wells are either boosting channels (12), empty channels (36), or reference channels (12). Boosting and reference channels are balanced (1:1:1) mixes of C10, SVEC, and RAW samples at 5 ng and 0.2 ng, respectively. The different cell types were evenly distributed across 4 nanoPOTS chips. Samples were 11-plexed with TMT labeling.

Usage

dou2019_mouse

Format

A QFeatures object with 13 assays, each assay being a SingleCellExperiment object:

Single_Cell_Chip_X_Y: PSM data with 11 columns corresponding to the TMT channels (see Notes). The X indicates the chip number (from 1 to 4) and Y indicates the row name on the chip (from A to C).
peptides: peptide data containing quantitative data for 15,492 peptides in 132 samples (run 1 and run 2 combined).
proteins: protein data containing quantitative data for 2,331 proteins in 132 samples (all runs combined).

Sample annotation is stored in colData(dou2019_mouse()).
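
As a minimal sketch of how the object described above might be inspected, the following assumes that the scpdata and QFeatures packages are installed and that ExperimentHub can download and cache the data; only the dataset function and the assay names listed above are taken from this page.

```r
# Assumes scpdata and QFeatures are installed (Bioconductor) and that
# ExperimentHub is reachable to download the dataset on first use.
library(scpdata)
library(QFeatures)

# Retrieve the dataset as a QFeatures object
dou <- dou2019_mouse()

# List the assay names (the Single_Cell_Chip_X_Y assays, peptides, proteins)
names(dou)

# Sample annotation table, one row per TMT channel
colData(dou)

# Extract a single assay, here the protein-level SingleCellExperiment
dou[["proteins"]]
```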

Acquisition protocol

The data were acquired using the following setup. More information can be found in the source article (see References).

Cell isolation: single cells from the three murine cell lines were isolated using FACS (BD Influx II cell sorter).
Sample preparation: performed using the nanoPOTS device. Protein extraction (DDM + TCEP) + alkylation (IAA) + Lys-C digestion + trypsin digestion + TMT-10plex labeling and pooling.
Separation: nanoLC (Dionex UltiMate with an in-house packed 50 cm x 30 µm LC column; 50 nL/min)
Ionization: ESI (2,000 V)
Mass spectrometry: Thermo Fisher Orbitrap Fusion Lumos Tribrid (MS1 accumulation time = 50 ms; MS1 resolution = 120,000; MS1 AGC = 1E6; MS2 accumulation time = 246 ms; MS2 resolution = 60,000; MS2 AGC = 1E5)
Data analysis: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor (custom R package)

Data collection

The PSM data were collected from the MassIVE repository MSV000084110 (see Source section). The downloaded files are:

Single_Cell_Chip_*_*_msgfplus.mzid: the MS-GF+ identification result files.
Single_Cell_Chip_*_*_ReporterIons.txt: the MASIC quantification result files.

For each batch, the quantification and identification data were combined based on the scan number (common to both data sets). The combined datasets for the different runs were then concatenated feature-wise. To avoid data duplication due to ambiguous matching of spectra to peptides or ambiguous mapping of peptides to proteins, we combined ambiguous peptides into peptide groups and ambiguous proteins into protein groups. Feature annotations that are not common within a peptide or protein group are separated by a ;. The sample annotation table was manually created based on the information provided in the article. The data were then converted to a QFeatures object using the scp::readSCP() function.
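
The combination step described above can be sketched in base R as follows. This is an illustration only, not the script used to build the dataset: the two toy tables and their column names (ScanNumber, Peptide, Protein, Ion_126, Ion_127N) are hypothetical stand-ins for the parsed MS-GF+ and MASIC outputs.

```r
# Toy identification table (hypothetical stand-in for parsed MS-GF+ results).
# Scan 102 matches two peptides, i.e. an ambiguous spectrum-to-peptide match.
ident <- data.frame(
  ScanNumber = c(101L, 102L, 102L, 104L),
  Peptide    = c("AAK", "LLR", "ILR", "GGK"),
  Protein    = c("P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Toy quantification table (hypothetical stand-in for MASIC reporter ions)
quant <- data.frame(
  ScanNumber = c(101L, 102L, 103L, 104L),
  Ion_126    = c(1000, 2000, 1500, 500),
  Ion_127N   = c(1100, 1900, 1400, 450)
)

# Collapse ambiguous matches into one row per scan; annotations that differ
# within a group are joined with ";", shared annotations are kept once
collapse <- function(x) paste(unique(x), collapse = ";")
ident_grouped <- aggregate(ident[c("Peptide", "Protein")],
                           by = ident["ScanNumber"],
                           FUN = collapse)

# Combine identification and quantification on the shared scan number;
# scans without an identification (here 103) are dropped
psm <- merge(ident_grouped, quant, by = "ScanNumber")
psm
```

Collapsing before merging keeps one row per spectrum, so the quantitative values of an ambiguous scan are not duplicated across its candidate peptides.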

We generated the peptide data as follows. First, we removed PSMs matched to contaminants or decoy peptides and ensured a 1% FDR. We aggregated the PSMs to peptides based on the peptide (or peptide group) sequence(s) using the median PSM intensity. The peptide data for the different runs were then joined in a single assay (see QFeatures::joinAssays), again based on the peptide sequence(s). We then removed the peptide groups. Links between the peptide and the PSM data were created using QFeatures::addAssayLink. Note that links between PSM and peptide groups are not stored.
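
The aggregation and joining logic can be illustrated with a small base R sketch. It is a stand-in under stated assumptions rather than the actual pipeline, which operated on QFeatures objects: the toy PSM table, the run labels, and the helper rename_run() are hypothetical.

```r
# Toy PSM table covering two runs (hypothetical values and column names)
psm <- data.frame(
  Run      = c("run1", "run1", "run1", "run2", "run2"),
  Peptide  = c("AAK", "AAK", "LLR", "AAK", "GGK"),
  Ion_126  = c(10, 30, 5, 8, 7),
  Ion_127N = c(12, 20, 6, 9, 3),
  stringsAsFactors = FALSE
)

# Aggregate PSMs to peptides within each run using the median intensity
pep <- aggregate(cbind(Ion_126, Ion_127N) ~ Run + Peptide,
                 data = psm, FUN = median)

# Hypothetical helper: keep one run and prefix its channel columns with the
# run name so that channels from different runs stay distinct after joining
rename_run <- function(df, run) {
  df <- df[df$Run == run, c("Peptide", "Ion_126", "Ion_127N")]
  names(df)[-1] <- paste(run, names(df)[-1], sep = "_")
  df
}

# Join the runs on the peptide sequence; peptides missing from a run are
# filled with NA in that run's channels
joined <- merge(rename_run(pep, "run1"), rename_run(pep, "run2"),
                by = "Peptide", all = TRUE)
joined
```

The full outer join keeps every peptide seen in at least one run, which is why missing values appear for peptides that were not identified in all runs.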

The protein data were downloaded from the Supporting information section of the publisher's website (see Source). The data are supplied as an Excel file, ac9b03349_si_005.xlsx. The file contains 7 sheets, from which we took only the 2nd (named "01 - Raw sc protein data"), which holds the combined protein data for the 12 runs. We converted the data to a SingleCellExperiment object and added the object as a new assay in the QFeatures dataset (containing the PSM data). Links between the proteins and the corresponding PSMs were created. Note that links to protein groups are not stored.

Note

Although TMT-10plex labeling is reported in the article, the PSM data contain 11 channels for each run. These 11th channels contain mostly missing data and are hence assumed to be empty channels.

Source

The PSM data can be downloaded from the MassIVE repository MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/

The protein data can be downloaded from the ACS Publications website (Supporting information section).

References

Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B. Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a Nanodroplet Sample Preparation Platform.” Analytical Chemistry, September.