thpLOPIT2021: Protein and PSM-level hyperLOPIT datasets from THP-1 human...

thpLOPIT_lps_mulvey2021    R Documentation

Protein and PSM-level hyperLOPIT datasets from THP-1 human leukaemia cells

Description

These are spatial proteomics datasets from a hyperLOPIT experimental design on the THP-1 human leukaemia cell line.

Usage

data(thpLOPIT_lps_mulvey2021)
data(thpLOPIT_unstimulated_mulvey2021)
data(thpLOPIT_unstimulated_rep1_mulvey2021)
data(thpLOPIT_unstimulated_rep2_mulvey2021)
data(thpLOPIT_unstimulated_rep3_mulvey2021)
data(thpLOPIT_lps_rep1_mulvey2021)
data(thpLOPIT_lps_rep2_mulvey2021)
data(thpLOPIT_lps_rep3_mulvey2021)
data(psms_thpLOPIT_lps_rep1_set1)
data(psms_thpLOPIT_lps_rep1_set2)
data(psms_thpLOPIT_lps_rep2_set1)
data(psms_thpLOPIT_lps_rep2_set2)
data(psms_thpLOPIT_lps_rep3_set1)
data(psms_thpLOPIT_lps_rep3_set2)
data(psms_thpLOPIT_unstim_rep1_set1)
data(psms_thpLOPIT_unstim_rep1_set2)
data(psms_thpLOPIT_unstim_rep2_set1)
data(psms_thpLOPIT_unstim_rep2_set2)
data(psms_thpLOPIT_unstim_rep3_set1)
data(psms_thpLOPIT_unstim_rep3_set2)

Format

These data are instances of class MSnSet from package MSnbase.

Details

These datasets are from a spatiotemporal proteomic profiling experiment (Mulvey et al 2021) in which the dynamic pro-inflammatory response to lipopolysaccharide (LPS) in the THP-1 human leukaemia cell line is mapped.

Triplicate hyperLOPIT experiments using subcellular fractionation were conducted and fractions were digested at either 0h-LPS or 12h-LPS post-stimulation. For each replicate 20 fractions were selected and labelled with 2 x TMT 10plex then acquired with SPS-MS3 acquisition on the Orbitrap Fusion Lumos Tribrid instrument (Thermo). The resulting datasets were processed by Proteome Discoverer software (v2.1, Thermo) and analysed using the Bioconductor packages MSnbase and pRoloc.

PSM-level csv files are available in the extdata directory and have been imported as MSnSet instances using readMSnSet2. In total there are 12 PSM-level datasets: 6 for each condition, with 2 sets per replicate (2 x TMT-10plex). A maximum of 2 missing values per PSM was allowed; missing values are designated as NA.

There are 8 protein-level datasets: 3 replicates for each condition, e.g. thpLOPIT_lps_rep1_mulvey2021, thpLOPIT_lps_rep2_mulvey2021, etc., then a final 2 datasets, thpLOPIT_lps_mulvey2021 and thpLOPIT_unstimulated_mulvey2021, in which the 3 replicates per condition have been concatenated. These last 2 datasets form part of the main analysis in the manuscript from Mulvey et al 2021.

The protein-level data was generated from the PSM-level data in the following steps. First, any PSMs with missing values were assessed by examining whether there was any trend in the missing values. Barplots of the data suggest that the few missing values that appear accumulate in the first few fractions, so we deduce that they are not missing at random and, on the whole, reflect the gradient distributions. A left-censored method was therefore used for imputation, using MinDet in MSnbase. PSMs were then normalised by sum across the fractions and combined to protein level by taking the median PSM per protein group.
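The normalisation and aggregation steps can be illustrated on a toy matrix in base R. This is only a sketch of the arithmetic, not the processing script itself: the matrix psm, the accessions in protein and the helper sum_normalise are invented for illustration, and the MinDet imputation step is left out.

```r
## Toy PSM-level intensities: 4 PSMs (rows) across 3 fractions (columns).
## All values and accessions here are hypothetical.
psm <- matrix(c(2, 4, 4,
                1, 1, 2,
                5, 3, 2,
                1, 2, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("psm", 1:4), paste0("frac", 1:3)))
protein <- c("P1", "P1", "P2", "P2")

## Normalise each PSM so that its intensities sum to 1 across fractions
sum_normalise <- function(x) x / rowSums(x)
psm_norm <- sum_normalise(psm)

## Combine PSMs to protein level: per-fraction median within each protein group
prot <- t(sapply(split(seq_len(nrow(psm_norm)), protein),
                 function(i) apply(psm_norm[i, , drop = FALSE], 2, median)))
prot
```

On the real data the same two steps are carried out with normalise and combineFeatures from MSnbase, as shown in the Examples section.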

Marker proteins were annotated based on the combined protein-level data in thpLOPIT_lps_mulvey2021 and thpLOPIT_unstimulated_mulvey2021 and can be found in the fData slot. A list of well-annotated, unambiguous resident organelle marker proteins from 11 subcellular niches (mitochondria, ER, Golgi apparatus, lysosome, peroxisome, PM, nucleus, nucleolus, chromatin, ribosome and cytosol) was curated from the UniProt database (The UniProt Consortium, 2017), Gene Ontology (Ashburner et al., 2000) and from mining the literature. Only proteins known to localise to a single location were included as markers. The processing script is scripts/thpLOPIT2021.R.

The fData slot of the two combined datasets also contains the results from the data analysis as described in Mulvey et al (2021).

The data (expression and feature variable) contain:

  • UniProt Accession: found in the featureNames, e.g. featureNames(thpLOPIT_lps_mulvey2021). The Protein Group (no isoform information): unique UniProt accession for the quantified protein group reported by Proteome Discoverer (1% FDR).

  • Expression data slot: Normalized TMT 10-plex reporter distributions representing the normalised abundance of each protein across the fractionation scheme for each experiment. Protein-level reporter ion values were calculated by taking the median of all quantifiable PSMs for the protein group, then normalized so that the sum of all 10 channels was equal to 1 before being concatenated across replicates.

  • GN: Gene name for protein accession.

  • Description: UniProt description for protein accession.

  • Confidence_x: The confidence level of protein identification FDR determined in the hyperLOPIT experiment. Only proteins with Medium (Q ≤ 5 %) and High (Q ≤ 1 %) FDR confidence levels were retained; Percolator v. 2.05 was used to determine FDR; for details, see L. Käll et al., Nat. Methods 2007, 4, 923-925, L. Käll et al., J. Proteome Res. 2008, 7, 29-34, and L. Käll et al., Bioinformatics 2008, 24, i42-i48.

  • Coverage: Percentage of protein sequence covered by identified peptides for each experiment.

  • Quantified Peptides: Number of quantified peptides for each experiment. Only peptides that were unique to a single protein group were used for quantification.

  • Quantified PSMs: Number of quantified peptide-spectrum matches for each experiment.

  • tagm.xxx.xx: TAGM allocation results from the Bayesian T-augmented Gaussian Mixture modelling approach as described in Crook et al. (2018). See ??tagmMcmcTrain.

  • L2_distance: the natural L2 distance between the TAGM joint posterior probabilities

  • non_movers: proteins predicted to not change location

  • type1_translocation: proteins predicted to move from one organelle class in the unstimulated dataset to a different organelle class in the LPS-stimulated dataset, i.e. organelle to organelle

  • type2_translocation: proteins predicted to move from an unknown localisation in the unstimulated dataset to a predicted organelle class in the LPS-stimulated dataset, i.e. unknown to organelle

  • type3_translocation: proteins predicted to move from an organelle localisation in the unstimulated dataset to an unknown location in the LPS-stimulated dataset, i.e. organelle to unknown

  • type4_translocation: a translocation event within the unknown class i.e. a protein that exhibits a large change between posterior probabilities in both conditions and is classified to an unknown location. For more information please see Mulvey et al 2021.

  • markers: annotated protein location based on the combined protein-level data used for training the TAGM MCMC classifier.

  • localisation.prob: assignment score, calculated as the product tagm.mcmc.probability * (1 - tagm.mcmc.outlier).

  • localisation.pred: predicted localisation as filtered by the localisation.prob. Proteins were assigned the localisation predicted from tagm.mcmc.allocation if their localisation.prob was greater than 0.99 (1% FDR).
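The localisation.prob score and the L2_distance described above can be illustrated with a few lines of base R. This is a sketch under assumptions, not the code used in the manuscript: all numbers are invented, l2_distance is a hypothetical helper, and the L2 distance is assumed here to be the ordinary Euclidean distance between two vectors of per-class posterior probabilities.

```r
## Hypothetical per-protein TAGM outputs (three proteins)
tagm_probability <- c(0.999, 0.80, 0.995)
tagm_outlier     <- c(0.001, 0.10, 0.500)

## Assignment score: allocation probability multiplied by the
## probability of not being an outlier
localisation_prob <- tagm_probability * (1 - tagm_outlier)
localisation_prob

## Euclidean (L2) distance between two posterior probability vectors
l2_distance <- function(p, q) sqrt(sum((p - q)^2))

## Hypothetical posteriors for one protein over 3 classes in each condition
unstim <- c(0.9, 0.1, 0.0)
lps    <- c(0.1, 0.9, 0.0)
l2_distance(unstim, lps)
```

A protein whose posterior distribution is identical in both conditions has a distance of 0, while a large distance flags a candidate change in localisation.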

Source

The data was generated by C. Mulvey and L. Breckels in the Cambridge Centre for Proteomics. http://www.bio.cam.ac.uk/proteomics/.

References

Using hyperLOPIT to perform high-resolution mapping of the spatial proteome. Mulvey, Claire M and Breckels, Lisa M and Geladaki, Aikaterini and Britovšek, Nina Kočevar and Nightingale, Daniel J H and Christoforou, Andy and Elzek, Mohamed and Deery, Michael J and Gatto, Laurent and Lilley, Kathryn S. Nature Protocols 12, 1110–1135 (2017). https://doi.org/10.1038/nprot.2017.026

A Bayesian Mixture Modelling Approach For Spatial Proteomics. Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto. bioRxiv 282269; doi: https://doi.org/10.1101/282269

Examples

## load a THP-1 dataset (unstimulated, combined triplicate)
data(thpLOPIT_unstimulated_mulvey2021)
thpLOPIT_unstimulated_mulvey2021
pData(thpLOPIT_unstimulated_mulvey2021)
head(exprs(thpLOPIT_unstimulated_mulvey2021))

## simple protocol for combining PSMs to protein level
## load LPS-stimulated PSM data for replicate 1 (both TMT sets)

library("MSnbase")
data(psms_thpLOPIT_lps_rep1_set1)
data(psms_thpLOPIT_lps_rep1_set2)

## impute missing values with "MinDet"
set1 <- impute(psms_thpLOPIT_lps_rep1_set1, method = "MinDet")
set2 <- impute(psms_thpLOPIT_lps_rep1_set2, method = "MinDet")

## normalise to sum
set1 <- normalise(set1, "sum")
set2 <- normalise(set2, "sum")

## combine to protein
set1 <- combineFeatures(set1,
                        groupBy = fData(set1)$Master.Protein.Accessions,
                        method = median)
set2 <- combineFeatures(set2,
                        groupBy = fData(set2)$Master.Protein.Accessions,
                        method = median)

## update fvarLabels for set 2 to differentiate them from set 1
set2 <- updateFvarLabels(set2)

## combine sets to form one replicate
xx <- combine(set1, set2)

## keep only proteins common to both sets
xx <- filterNA(xx)

## visualise the combined unstimulated protein-level data
library("pRoloc")
plot2D(thpLOPIT_unstimulated_mulvey2021, main = "Protein-level unstimulated data")

lgatto/pRolocdata documentation built on Oct. 17, 2024, 10:16 p.m.