knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
With Label-Free Quantification, it's usually appropriate to apply a naive normalisation such as 'center-median' normalisation, (diff.median
in MSnbase::normalise
) or Trimmed Mean of the M-values (TMM) normalisation. This is on the basis that the quantification should be relative to the total amount of protein in each sample.
There are occasions when you will want to normalisation to a set of reference proteins, for example endogenous 'house-keeping' proteins, or exogenous spiked in proteins.
Here, we consider yet another normalisation approach where we have a strong prior expectation about the abundance ratio of a subset of proteins between two conditions. Our data is from the LFQ Data processing and QC notebook. Please see the previous notebook for an explanation of the experiment. The important point here is that OOPS is known to enrich glycoproteins at the interface. We can therefore use these proteins as an internal set of reference proteins which we expect to have no difference in abundance between RNase +/-.
Load the required libraries.
library(ggplot2) library(MSnbase) library(biobroom) library(camprotR) library(Proteomics.analysis.data) library(dplyr) library(tidyr) library(uniprotREST)
We start by reading in the protein-level quantification we created in the LFQ Data processing and QC notebook
prot_robust <- readRDS('results/lfq_prot_robust.rds')
First, we need to calculate the RNase +/- ratios, since this is the value we want to normalise in this case.
ratios <- exprs(prot_robust[,pData(prot_robust)$Condition == 'RNase_neg']) - exprs(prot_robust[,pData(prot_robust)$Condition == 'RNase_pos']) prot_ratios <- MSnSet(exprs = ratios, fData = fData(prot_robust), pData = (pData(prot_robust) %>% filter(Condition == 'RNase_neg')))
Next, we need annotations regarding which proteins are glycoproteins and GO-annotated RBPs.
Here, we query UniProt programmatically to obtain the GO terms and features for our proteins, which we will further parse to determine the GO-RBPs and glycoproteins.
glyco_res <- uniprot_map( ids = rownames(prot_robust), from = "UniProtKB_AC-ID", to = "UniProtKB", fields = "ft_carbohyd", ) %>% rename(c('UNIPROTKB'='From')) go_res <- uniprot_map( ids = rownames(prot_robust), from = "UniProtKB_AC-ID", to = "UniProtKB", fields = "go", ) %>% rename(c('UNIPROTKB'='From'))
Get the GO-RBPs. Note that we are also identifying proteins annotated with offsprings of the RNA-binding GO term, since some proteins will only be annnotated with the child term.
# GO term for RNA-binding go_rbp <- "GO:0003723" # All offspring (since some proteins will only be annotated with an offspring term) go_rbp_offspring <- AnnotationDbi::get("GO:0003723", GO.db::GOMFOFFSPRING) # Identify all GO-RBPs rbps <- go_res %>% separate_rows(Gene.Ontology..GO., sep='; ') %>% mutate(go=gsub('.*\\[(\\S+)\\]', '\\1', Gene.Ontology..GO.)) %>% filter(go %in% c(go_rbp, go_rbp_offspring)) %>% pull(UNIPROTKB) %>% unique()
Get the glycoproteins. Note that we are only keeping proteins with at least 3 glycosylated sites to ensure we have confidence in their glycosylation
glycoproteins <- glyco_res %>% # remove proteins with empty glycoprotein annotation filter(Glycosylation!='') %>% # separate the glycoprotein annotation into constituent parts separate_rows(Glycosylation, sep = "; ") %>% # keep the annotations relating to glycosylated position filter(grepl("CARBOHYD", Glycosylation)) %>% # count the number of glycosylated sites per protein group_by(UNIPROTKB) %>% tally() %>% # keep proteins with 3 or more glycosylations filter(n>=3) %>% pull(UNIPROTKB)
Add feature columns describing the glycoprotein and GO-RBP status of the proteins.
fData(prot_ratios) <- fData(prot_ratios) %>% mutate(Glycoprotein = rownames(prot_ratios) %in% glycoproteins) %>% mutate(GO.RBP = rownames(prot_ratios) %in% rbps) %>% mutate(Glyco.RBP = interaction(Glycoprotein, GO.RBP)) %>% mutate(Glyco.RBP = factor(recode( Glyco.RBP, 'TRUE.TRUE'='GO:RBGP', 'FALSE.TRUE'='GO:RBP', 'TRUE.FALSE'='Glycoprotein', 'FALSE.FALSE'='Other'), levels = c('GO:RBP', 'GO:RBGP', 'Other', 'Glycoprotein')) )
Finally, we define a function to plot the ratios for each functional sub-type of proteins.
plot_ratios <- function(obj) { to_plot <- merge( exprs(obj), fData(obj)[,'Glyco.RBP',drop = FALSE], by = 'row.names' ) %>% pivot_longer(cols = -c(Row.names, Glyco.RBP), names_to = 'sample', values_to = 'ratio') %>% merge(pData(obj), by.x = 'sample', by.y = 'row.names') %>% filter(is.finite(ratio)) p <- to_plot %>% ggplot(aes(x = Replicate, y = ratio, group = interaction(Glyco.RBP, Replicate), colour = factor(Glyco.RBP))) + geom_boxplot(position = position_dodge()) + theme_camprot(border = FALSE, base_family = 'sans', base_size = 15) + scale_colour_manual(values = c(get_cat_palette(3), 'black'), name = '') + geom_hline(yintercept = 0, linetype = 2, colour = 'grey') + labs( x = "Replicate", y = "RNase -/+ ratio" ) print(p) invisible(to_plot) }
Now we have everything in place, we can look at how the protein ratios look pre-normalisation...
plot_ratios(prot_ratios)
OK, so the glycoproteins are not centered at zero and there are GO-annotated RBPs with negative log RNase -/+ ratios (as much as ~25% in replicate 2)
Below, we perform the center-median normalisation with respect to the reference proteins (here, the glycoproteins).
glycoprotein_medians <- prot_ratios[fData(prot_ratios)$Glyco.RBP == 'Glycoprotein',] %>% camprotR::get_medians() prot_ratios_norm <- camprotR::center_normalise_to_ref( prot_ratios, glycoprotein_medians, center_to_zero = TRUE, # We want to center the glycoproteins around zero on_log_scale = TRUE # The quantifications are on a log scale (log2 ratios) )
And plot the protein ratios post-normalisation.
plot_ratios(prot_ratios_norm)
saveRDS(prot_ratios_norm, './results/lfq_prot_robust_glyco_norm.rds')
Now, the median log2 RNase -/+ ratio for glycoproteins is zero for all replicates and we have far fewer GO-annotated RBPs with negative log RNase -/+ ratios.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.