In MRCToxBioinformatics/Proteomics_data_analysis: Basic proteomics data analysis tutorials

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

With Label-Free Quantification, it's usually appropriate to apply a naive normalisation such as 'center-median' normalisation, (diff.median in MSnbase::normalise) or Trimmed Mean of the M-values (TMM) normalisation. This is on the basis that the quantification should be relative to the total amount of protein in each sample.

There are occasions when you will want to normalisation to a set of reference proteins, for example endogenous 'house-keeping' proteins, or exogenous spiked in proteins.

Here, we consider yet another normalisation approach where we have a strong prior expectation about the abundance ratio of a subset of proteins between two conditions. Our data is from the LFQ Data processing and QC notebook. Please see the previous notebook for an explanation of the experiment. The important point here is that OOPS is known to enrich glycoproteins at the interface. We can therefore use these proteins as an internal set of reference proteins which we expect to have no difference in abundance between RNase +/-.

Load dependencies

Load the required libraries.

library(ggplot2)
library(MSnbase)
library(biobroom)
library(camprotR)
library(Proteomics.analysis.data)
library(dplyr)
library(tidyr)
library(uniprotREST)

Load data

We start by reading in the protein-level quantification we created in the LFQ Data processing and QC notebook

prot_robust <- readRDS('results/lfq_prot_robust.rds')

First, we need to calculate the RNase +/- ratios, since this is the value we want to normalise in this case.

ratios <- exprs(prot_robust[,pData(prot_robust)$Condition == 'RNase_neg']) - 
  exprs(prot_robust[,pData(prot_robust)$Condition == 'RNase_pos'])

prot_ratios <- MSnSet(exprs = ratios,
                      fData = fData(prot_robust),
                      pData = (pData(prot_robust) %>% filter(Condition == 'RNase_neg')))

Get annotations for glycoproteins and GO-'RNA binding'

Next, we need annotations regarding which proteins are glycoproteins and GO-annotated RBPs.

Here, we query UniProt programmatically to obtain the GO terms and features for our proteins, which we will further parse to determine the GO-RBPs and glycoproteins.

glyco_res <-  uniprot_map(
  ids = rownames(prot_robust),
  from = "UniProtKB_AC-ID",
  to = "UniProtKB",
  fields = "ft_carbohyd",
) %>% rename(c('UNIPROTKB'='From'))

go_res <-  uniprot_map(
  ids = rownames(prot_robust),
  from = "UniProtKB_AC-ID",
  to = "UniProtKB",
  fields = "go",
) %>% rename(c('UNIPROTKB'='From'))

Get the GO-RBPs. Note that we are also identifying proteins annotated with offsprings of the RNA-binding GO term, since some proteins will only be annnotated with the child term.

# GO term for RNA-binding
go_rbp <- "GO:0003723"

# All offspring (since some proteins will only be annotated with an offspring term)
go_rbp_offspring <- AnnotationDbi::get("GO:0003723", GO.db::GOMFOFFSPRING)

# Identify all GO-RBPs
rbps <- go_res %>%
  separate_rows(Gene.Ontology..GO., sep='; ') %>%
  mutate(go=gsub('.*\\[(\\S+)\\]', '\\1', Gene.Ontology..GO.)) %>%
  filter(go %in% c(go_rbp, go_rbp_offspring)) %>% 
  pull(UNIPROTKB) %>%
  unique()

Get the glycoproteins. Note that we are only keeping proteins with at least 3 glycosylated sites to ensure we have confidence in their glycosylation

glycoproteins <- glyco_res %>% 
  # remove proteins with empty glycoprotein annotation
  filter(Glycosylation!='') %>%
  # separate the glycoprotein annotation into constituent parts
  separate_rows(Glycosylation, sep = "; ") %>%
  # keep the annotations relating to glycosylated position
  filter(grepl("CARBOHYD", Glycosylation)) %>%
  # count the number of glycosylated sites per protein
  group_by(UNIPROTKB) %>%  tally() %>%
  # keep proteins with 3 or more glycosylations
  filter(n>=3) %>%
  pull(UNIPROTKB)

Add feature columns describing the glycoprotein and GO-RBP status of the proteins.

fData(prot_ratios) <- fData(prot_ratios) %>%
  mutate(Glycoprotein = rownames(prot_ratios) %in% glycoproteins) %>%
  mutate(GO.RBP = rownames(prot_ratios) %in% rbps) %>%
  mutate(Glyco.RBP = interaction(Glycoprotein, GO.RBP)) %>%
  mutate(Glyco.RBP = factor(recode(
    Glyco.RBP,
    'TRUE.TRUE'='GO:RBGP',
    'FALSE.TRUE'='GO:RBP',
    'TRUE.FALSE'='Glycoprotein',
    'FALSE.FALSE'='Other'),
    levels = c('GO:RBP', 'GO:RBGP', 'Other', 'Glycoprotein'))
  )

Finally, we define a function to plot the ratios for each functional sub-type of proteins.

plot_ratios <- function(obj) {
  to_plot <- merge(
    exprs(obj),
    fData(obj)[,'Glyco.RBP',drop = FALSE],
    by = 'row.names'
  ) %>%
    pivot_longer(cols = -c(Row.names, Glyco.RBP), names_to = 'sample', values_to = 'ratio') %>%
    merge(pData(obj), by.x = 'sample', by.y = 'row.names') %>%
    filter(is.finite(ratio))

  p <- to_plot %>% 
    ggplot(aes(x = Replicate, y = ratio, 
               group = interaction(Glyco.RBP, Replicate), 
               colour = factor(Glyco.RBP))) +
    geom_boxplot(position = position_dodge()) +
    theme_camprot(border = FALSE, base_family = 'sans', base_size = 15) +
    scale_colour_manual(values = c(get_cat_palette(3), 'black'), name = '') +
    geom_hline(yintercept = 0, linetype = 2, colour = 'grey') +
    labs(
      x = "Replicate",
      y = "RNase -/+ ratio"
    )

  print(p)

  invisible(to_plot)
}

Normalising against the glycoproteins

Now we have everything in place, we can look at how the protein ratios look pre-normalisation...

plot_ratios(prot_ratios)

OK, so the glycoproteins are not centered at zero and there are GO-annotated RBPs with negative log RNase -/+ ratios (as much as ~25% in replicate 2)

Below, we perform the center-median normalisation with respect to the reference proteins (here, the glycoproteins).

glycoprotein_medians <- prot_ratios[fData(prot_ratios)$Glyco.RBP == 'Glycoprotein',] %>% 
  camprotR::get_medians()

prot_ratios_norm <- camprotR::center_normalise_to_ref(
  prot_ratios,
  glycoprotein_medians,
  center_to_zero = TRUE, # We want to center the glycoproteins around zero
  on_log_scale = TRUE # The quantifications are on a log scale (log2 ratios)
)

And plot the protein ratios post-normalisation.

plot_ratios(prot_ratios_norm)

saveRDS(prot_ratios_norm, './results/lfq_prot_robust_glyco_norm.rds')

Now, the median log2 RNase -/+ ratio for glycoproteins is zero for all replicates and we have far fewer GO-annotated RBPs with negative log RNase -/+ ratios.

Session info {.unnumbered}

sessionInfo()

References {.unnumbered}

MRCToxBioinformatics/Proteomics_data_analysis documentation built on July 7, 2024, 6:44 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MRCToxBioinformatics/Proteomics_data_analysis
Basic proteomics data analysis tutorials

In MRCToxBioinformatics/Proteomics_data_analysis: Basic proteomics data analysis tutorials

Load dependencies

Load data

Get annotations for glycoproteins and GO-'RNA binding'

Normalising against the glycoproteins

Session info {.unnumbered}

References {.unnumbered}

R Package Documentation

Browse R Packages

We want your feedback!

MRCToxBioinformatics/Proteomics_data_analysis Basic proteomics data analysis tutorials

In MRCToxBioinformatics/Proteomics_data_analysis: Basic proteomics data analysis tutorials

Load dependencies

Load data

Get annotations for glycoproteins and GO-'RNA binding'

Normalising against the glycoproteins

Session info {.unnumbered}

References {.unnumbered}

R Package Documentation

Browse R Packages

We want your feedback!

MRCToxBioinformatics/Proteomics_data_analysis
Basic proteomics data analysis tutorials