suppressPackageStartupMessages({
  library("BiocStyle")
  library("RIC")
  library(magrittr)
  library(tidyverse)

})

\pagebreak

Abstract

RIC (RNA interaction capture) is a package that provides an analytical workflow ofmass spectrometry proteomics SILAC quantitative data from comparative RNA interaction capture (cRIC) experiments [@Garcia-Moreno:2019]. In this type of experiments, oligo DT capture and total cell lysate as used as input normalization.

RIC requires tabular input e.g. peptides.txt files output of quantitative analysis software like MaxQuant. Functions are provided for preparation and generating
QFeatures objects [@Gatto2020], filtering, calculating cRIC ratios as well as statistical testing of deferentially RBP due to a given biological condition/treatment. It also includes tools to check intermediate steps in the workflow, such as batch correction Finally, visualization tools are provided to explore the results, including barplots, scatter and volcano plots of RIC, WCL and cRIC quantitative and semi-quantitative data.

Installation and loading package

Start R and install RICdata package from github and RIC from Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
install_github("demar01/RICdata")

    install.packages("BiocManager")
BiocManager::install("RIC")
library("RIC")

Once you have the package installed, load RIC and dplyr for data transformation into R.

library(RIC)
library(RICdata)
library(stringr)
library(dplyr)
library(stringr)
library(magrittr)
library(QFeatures)
library(Biostrings)

Getting the data

We analyze the dataset from [@Garcia-Moreno:2019]. These data contain three biological replicates of labeled cells infected with SINV and irradiated with UV light at 4 and 18 h post-infection (hpi), using uninfected cells as a control.

The raw mass spectrometry data were first analyzed using MaxQuant [@Cox:2014] and the resulting “peptides.txt” file is used as input for the downstream analysis. These data are contained in the RICdata package. Thanks to the Qfeatures [@Gatto:2020] readQFeatures function a text file can be read straight into a QFeatures object, a standardize structure to efficiently handle quantitative mass spectrometry data. Since we need to provide with the indexes of the of the columns to be used as expression values, we provided the tabular data with this package.

Loading data as Qfeature objects

# Path to tabular data
WCLpeptidesfilepath<- system.file("extdata","WCL_peptides.txt", package = "RICdata")
RICpeptidesfilepath<- system.file( "extdata", "RIC_peptides.txt", package = "RICdata")

# Tabular data 
data("WCLpeptides.raw")
data("RICpeptides.raw")
RICpeptides <- RICpeptides.raw
WCLpeptides <- WCLpeptides.raw

# Indices of the columns to be used as expression values
j <- str_which(colnames(WCLpeptides),str_c(c("Intensity.((\\D)).18_M_4",
                                             "Intensity.((\\D)).4_18_M",
                                             "Intensity.((\\D)).M_4_18"),
                                              collapse="|"))
i <- str_which(colnames(RICpeptides),str_c("Intensity.[H|M|L].", collapse="|"))

#Converting tabular data into a QFeatures object
QWCLpeptides <- readQFeatures(WCLpeptidesfilepath, ecol = j, sep = "\t", name = "peptides", fnames = "Sequence")
QRICpeptides <- readQFeatures(RICpeptidesfilepath, ecol = i, sep = "\t", name = "peptides", fnames = "Sequence")

readQFeatures can take either file path where yourpeptides.txtlives or the tabular data already read. We need to detect the position where __Intensity__ columns are. Note that the authors of RIC experiment gave different names to the oligo(dT) capture (RIC) and WCL experiments, so we need to have one set of indexes forWCLexperiment (j) and one set of indexes forRIC` experiment (i).

We need additional information to process the RIC pipeline: - SV_proteins.txt (provided with this package) contains additional Sindbis virus (SV) proteins to be added to the mapping. This file should be read in FASTA format. - Additional data and functions obtained from the RBDmap and RBDmapHeLa packages

Loading viral annotation

# Viral protein annotation
SV_seqpath<- system.file( "extdata", "SV_proteins.txt", package = "RIC")
SV_seq <- readAAStringSet(SV_seqpath)

# From RBDmap and RBDmapHeLa 
mapPeptidespath<- system.file( "scripts", "mapPeptides.R", package = "RIC")
source(mapPeptidespath) #part of RBPmap package
mapPeptidespath<- system.file( "scripts", "mapPeptides.R", package = "RIC")
source(mapPeptidespath) #part of RBPmap package

data("miniProtFeatures") #these data are >5 MB and are included in RICdata
summary(miniProtFeatures)
ProtFeatures<-miniProtFeatures
data("Index") #these data are >5 MB and are included in RICdata
data("enigmRBP") 

Processing QFeatures

QFeature annotation

We can annotate with metadata our QFeatures objects. This is important as it defines the order and sample names of experiments.

sample_names=c('hour18','hour4','mock')
QWCLpeptides$group <- paste(sample_names,rep(1:3,each=3),sep='_')
QWCLpeptides$sample <- rep(1:3, each=3)
colData(QWCLpeptides)

QRICpeptides$group <-  paste(sample_names,rep(1:3,each=3),sep='_')
QRICpeptides$sample <- rep(1:3, each=3)
colData(QRICpeptides)

Qfeatures_list<-list(QRICpeptides,QWCLpeptides)

QFeature filtering

We filter for contaminant proteins and decoy database hits which are indicated by "+" in the columns "Potential.contaminants" and "Reverse" respectively using QFeatures-filtering functions.

QWCLpeptidesfiltered <- QWCLpeptides %>% 
    filterFeatures(~ Reverse == "") %>%
    filterFeatures(~ Potential.contaminant == "")

QRICpeptidesfiltered <- QRICpeptides %>% 
    filterFeatures(~ Reverse == "") %>%
    filterFeatures(~ Potential.contaminant == "")

#This could be done in one step on Qfeatures_list

Removing non-needed features

We can retain only rowDatanames of interest. To do this we can use the QFeatures::selectRowData function.

rowDataNames(QWCLpeptidesfiltered)[["peptides"]] %>% length() #139
rowDataNames(QRICpeptidesfiltered)[["peptides"]] %>% length() #142

rowvars <- c("Sequence", "Proteins", "Leading.razor.protein")
QWCLpeptidesfiltered_clean <- selectRowData(QWCLpeptidesfiltered, rowvars)
QRICpeptidesfiltered_clean <- selectRowData(QRICpeptidesfiltered, rowvars)

rowDataNames(QWCLpeptidesfiltered_clean)[["peptides"]] %>% length() #3
rowDataNames(QRICpeptidesfiltered_clean)[["peptides"]] %>% length() #3

# QWCLpeptidesfiltered_clean & QRICpeptidesfiltered_clean could be saved at this point

Peptide aggregation

Visualizatoin of single-matched peptides

We can consider only peptides from each experimental condition than map to a single gene [@Perez-Perri:2020]. We can visualize how many genes each peptides matches to using the plot_singlepeptides.

RIC::plot_singlepeptides(QWCLpeptidesfiltered_clean,SV_seq,ProtFeatures)
RIC::plot_singlepeptides(QRICpeptidesfiltered_clean,SV_seq,ProtFeatures)

Most peptides uniquely match to one gene. Peptides that match to more than one gene will be excluded in the downstream analysis.

Aggregation of mean intensity values for each ENSEMBL gene ID

We now calculate protein intensities from the mean intensity values of peptides mapped to the same gene using the agregate_singlepeptides function that takes the following arguments: - the name QFeatures : QWCLpeptidesfiltered_clean or QRICpeptidesfiltered_clean in this case - whichorder: the correct order of files and not the order given in MaxQuant output. It is critical at this point that we ensure the correct order. - names_samples: the name of the experimental conditions. Note that we have defined this previously when adding QFeatures metadata and it is defined in colData(QWCLpeptides)$group

#checking the sample names order as run in  maxquant for WCL 
 c("sequence",rownames(colData(QWCLpeptidesfiltered_clean)))[c(1,2,4,3,6,5,7,10,9,8)]
whichorder <-c(1,2,4,3,6,5,7,10,9,8)
aggregatedWCL<-RIC::agregate_singlepeptides(QWCLpeptidesfiltered_clean,SV_seq, 
                                            ProtFeatures,whichorder = c(1,2,4,3,6,5,7,10,9,8), 
                                            names_samples= colData(QWCLpeptides)$group )


#checking the sample names order as run in  maxquant for RIC 
 c("sequence",rownames(colData(QRICpeptidesfiltered_clean)))[c(1,4,3,2,5,7,6,9,8,10)]
whichorder <-c(1,4,3,2,5,7,6,9,8,10)
aggregatedRIC<-RIC::agregate_singlepeptides(QRICpeptidesfiltered_clean,SV_seq,
                                            ProtFeatures,whichorder = c(1,4,3,2,5,7,6,9,8,10),
                                            names_samples=colData(QWCLpeptides)$group )

Batch effect

Plotting batch effect

We can get a high-level overview of the data using the plot_batcheffect function, which can be very useful to observe batch effects, such as obvious differences between replicates.

RIC::plot_batcheffect(aggregatedWCL)
RIC::plot_batcheffect(aggregatedRIC)

Removing batch effect

If needed we can use the remove_batcheffect function, that uses limma's removeBatchEffect [@Ritchie:2015].

batch2 <- c("A","A","A","B","B","B","C","C","C")
aggregatedWCL_batch<-remove_batcheffect(aggregatedWCL,batch2)

plot_batcheffect(aggregatedWCL_batch)

Intensity patterns across replicates

We can get an overview of protein intensities across replicates using the function plot_scatterreplicates. In the example below we choose to visualise protein intensities for two input replicates and highligth the intensities of viral proteins "SV_wt_nsP2" and "SV_wt_E2".

plot_scatterreplicates(
  aggregatedWCL_batch,
  protein_1 = "SV_wt_nsP2",
  protein_2 = "SV_wt_E2",
  xlimits = c(20, 34),
  ylimits = c(20, 34),
  repx = "hour18_1",
  repy = "hour18_2"
)

We can also plot the intensity of proteins between two RIC experiment replicates.

plot_scatterreplicates(
  aggregatedRIC,
  protein_1 = "SV_wt_nsP2",
  protein_2 = "SV_wt_E2",
  xlimits = c(16, 29),
  ylimits = c(16, 29),
  repx = "hour18_1",
  repy = "hour18_2"
)

RNA binding activity

We can estimate the magnitude of RNA binding activity calculating the log2 (RIC/WCL) changes using the function calculate_cRIC.

cRIC <- calculate_cRIC(aggregatedWCL_batch, aggregatedRIC)

Differential enrichment testing

test_moderateRIC calculates a moderated t-test for set enrichment as implemented in limma [@Ritchie:2015] and it silently returns a list with two components:

  1. The first component test_moderateRIC output is a list of same length as sample_names r length(sample_names). Each component of this list is a dataframe with t.test output and has r names(test_moderateRIC(aggregatedWCL_batch)[[1]][[1]]) names.

  2. The second component test_moderateRIC output is a list of same length as sample_names r length(sample_names). Each component of this list is a matrix with intensity values (median corrected).

test_moderateRIC(aggregatedWCL_batch)
test_moderateRIC(aggregatedRIC)

Plotting log2 fold changes

We can access the output of test_moderateRIC and visualize with a scatterplot the log2 fold changes across different replicates,highlighting proteins with significant changes using plot_scatterRIC function.

test_moderateRIC(aggregatedWCL_batch)[[1]]$diff_hour18_hour4 -> tabletoplotWCL
test_moderateRIC(aggregatedWCL_batch)[[2]]$diff_hour18_hour4 ->  intensitiestoploWCL
plot_scatterRIC(tabletoplotWCL,intensitiestoploWCL)

test_moderateRIC(aggregatedRIC)[[1]]$diff_hour18_mock -> tabletoplotRIC
test_moderateRIC(aggregatedRIC)[[2]]$diff_hour18_mock -> intensitiestoplotRIC
plot_scatterRIC(tabletoplotRIC,intensitiestoplotRIC)

Plotting volcano cRIC

Similarly, we can access the output of test_moderateRIC and visualize with a
volcanoplot for WCL and RIC experiment using plot_volcanoRIC. Note that we only need the first output component of test_moderateRIC to represent a volcano.

plot_volcanoRIC(tabletoplotRIC)

Semi-quantitative analysis

Semi-quantitative analysis enables the analysis of proteins with ‘zero intensity’ values in one of the conditions. Semi-quantitative analysis can be particularly useful for the identification of RBPs that which from active (present on RIC but not present on WCL [on/off]) to non-active(present on WCL but not present on RIC [off/on]). These missing values would be otherwise be missed in the previous statistical analysis due to missing values. semiq_cRIC function returns a tabular data with one additional summary column per each sample name sample_names that contains the number of detectable intensity values on each condition upon filtering for a minimum threshold (updown) of log2(condition/relative).

 output_semiqcRIC<-semiq_cRIC(aggregatedRIC,condition = "hour4",relative = "mock" ,updown=2 )
output_semiqcRIC %>%
  head(5)

Saving results

To ease future analysis and/or visualization of RIC data, processed data can be saved. For example, we could save the cleaned QWCLpeptidesfiltered_clean and QRICpeptidesfiltered_clean Qfeature objects. This allows us to easily change parameters in future analysis.

``` {r save-load-Qfeatures, eval = FALSE}

Save analyzed data

save(QWCLpeptidesfiltered_clean, QRICpeptidesfiltered_clean, file = "RICprocessed.RData")

These data could be loaded in future R sessions by

load("RICprocessed.RData")

# Session Info

```r
sessionInfo()


demar01/RIC documentation built on Feb. 10, 2021, 5:25 p.m.