In demar01/RICdata: Data accompanying the RIC package

suppressPackageStartupMessages({
    library("BiocStyle")
    library("RICdata")
    library(QFeatures)
    library(magrittr)
    library(tidyverse)
})

Introduction

RICdata is a data package containing the data needed to analyse RNA interactome capture (RIC) experiments from the manuscript "Global analysis of RNA-binding protein dynamics by comparative and enhanced RNA interactome capture", published in Nature Protocols 16, 27–60 (2021).

Proteomics data overview

Peptide information was extracted from the PRIDE dataset PXD009789. A total of 9 oligo(dT) capture and total cell lysate samples originated from quantitative SILAC mass spectrometry proteomics experiments [@Garcia-Moreno:2019].

# Paths to the tabular data shipped with the package
WCLpeptidesfilepath <- system.file("extdata", "WCL_peptides.txt", package = "RICdata")
RICpeptidesfilepath <- system.file("extdata", "RIC_peptides.txt", package = "RICdata")

data("WCLpeptides.raw")
dim(WCLpeptides.raw)
data("RICpeptides.raw")
dim(RICpeptides.raw)

The indices of the columns to be used as expression values are as follows:

j <- str_which(colnames(WCLpeptides.raw),str_c(c("Intensity.((\\D)).18_M_4",
                                             "Intensity.((\\D)).4_18_M",
                                             "Intensity.((\\D)).M_4_18"),
                                              collapse="|"))
colnames(WCLpeptides.raw)[j]

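# Match the heavy (H), medium (M) and light (L) intensity columns.
# Inside a character class "|" is a literal character, so it is not needed.
i <- str_which(colnames(RICpeptides.raw), "Intensity.[HML].")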
colnames(RICpeptides.raw)[i]
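
When selecting columns by pattern, it helps to remember that every character inside a character class is literal, so `[H|M|L]` also matches `|`, and that an unescaped `.` matches any character. The following is a minimal base R sketch of the difference; the column names are invented for illustration and are not taken from the package data.

```r
# Hypothetical column names, for illustration only
cols <- c("Sequence", "Intensity", "Intensity.H.18_M_4",
          "Intensity.M.18_M_4", "Intensity.L.18_M_4", "Intensity.|.odd")

# Loose pattern: "." matches any character and "|" is part of the class
loose <- grepl("Intensity.[H|M|L].", cols)

# Stricter pattern: anchored, literal dots, labels H, M or L only
strict <- grepl("^Intensity\\.[HML]\\.", cols)

cols[loose]   # includes "Intensity.|.odd"
cols[strict]  # only the three H/M/L columns
```

Both patterns select the same columns when the names are well behaved; the stricter form only guards against unexpected names.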

We can convert tabular data into a QFeatures object:

QWCLpeptides <- readQFeatures(WCLpeptidesfilepath, ecol = j, sep = "\t",
                              name = "peptides", fnames = "Sequence")
QRICpeptides <- readQFeatures(RICpeptidesfilepath, ecol = i, sep = "\t", 
                              name = "peptides", fnames = "Sequence")

Processing QFeatures

QFeature annotation

We can annotate our QFeatures objects with metadata. This is important because it defines the order and names of the samples in each experiment.

sample_names <- c("hour18", "hour4", "mock")
QWCLpeptides$group <- paste(sample_names, rep(1:3, each = 3), sep = "_")
QWCLpeptides$sample <- rep(1:3, each = 3)
colData(QWCLpeptides)

QRICpeptides$group <- paste(sample_names, rep(1:3, each = 3), sep = "_")
QRICpeptides$sample <- rep(1:3, each=3)
colData(QRICpeptides)
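
Because these labels are assigned by position, it is worth checking what the `paste()` call produces before attaching it to the column data. The base R sketch below reproduces only the label construction; the variable `sample_id` is a name introduced here for illustration.

```r
sample_names <- c("hour18", "hour4", "mock")
sample_id <- rep(1:3, each = 3)   # 1 1 1 2 2 2 3 3 3

# paste() recycles the three sample names across the nine sample ids
group <- paste(sample_names, sample_id, sep = "_")
group
# "hour18_1" "hour4_1" "mock_1" "hour18_2" "hour4_2" "mock_2"
# "hour18_3" "hour4_3" "mock_3"
```

The labels are only meaningful if the quantitative columns appear in this same order, so comparing them against the selected column names is a sensible check.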

QFeature filtering

We filter out contaminant proteins and decoy database hits, which are flagged with "+" in the columns "Potential.contaminant" and "Reverse" respectively, using the QFeatures filtering functions.

QWCLpeptidesfiltered <- QWCLpeptides %>% 
    filterFeatures(~ Reverse == "") %>%
    filterFeatures(~ Potential.contaminant == "")

QRICpeptidesfiltered <- QRICpeptides %>% 
    filterFeatures(~ Reverse == "") %>%
    filterFeatures(~ Potential.contaminant == "")
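
To make the filter logic concrete, here is a minimal base R sketch of the same rule applied to a toy feature table. The table and its values are invented for illustration; this is not how QFeatures implements the filter internally.

```r
# Toy feature table; sequences and flags are invented
rd <- data.frame(
    Sequence = c("AAK", "CDEK", "FGHR", "LMNK"),
    Reverse = c("", "+", "", ""),
    Potential.contaminant = c("", "", "+", ""),
    stringsAsFactors = FALSE
)

# Keep features that are flagged neither as decoy nor as contaminant
keep <- rd$Reverse == "" & rd$Potential.contaminant == ""
rd[keep, "Sequence"]
# "AAK" "LMNK"
```

In this base R form, a flag column read as `NA` rather than as an empty string would yield `NA` comparisons, so it can be useful to inspect the flag columns before filtering.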

Removing unneeded feature variables

We can retain only the rowData variables of interest using the QFeatures::selectRowData function.

rowDataNames(QWCLpeptidesfiltered)[["peptides"]] %>% length() 
rowDataNames(QRICpeptidesfiltered)[["peptides"]] %>% length() 

rowvars <- c("Sequence", "Proteins", "Leading.razor.protein")
QWCLpeptidesfiltered_clean <- selectRowData(QWCLpeptidesfiltered, rowvars)
QRICpeptidesfiltered_clean <- selectRowData(QRICpeptidesfiltered, rowvars)

rowDataNames(QWCLpeptidesfiltered_clean)[["peptides"]] %>% length() 
rowDataNames(QRICpeptidesfiltered_clean)[["peptides"]] %>% length()

Annotation data overview

The RICdata package also contains a reduced version of the data in ProtFeatures [@Castello:2016]. This object is called miniProtFeatures and contains protein sequence information. miniProtFeatures is a list with the following objects:

ProtSeq: an AAStringSet object of length 69025 with protein sequences.
GeneName: a character vector with ENSEMBL gene IDs.
Symbol: a named character vector with ENSEMBL gene symbols.

data(miniProtFeatures)
head(miniProtFeatures$ProtSeq)
head(miniProtFeatures$GeneName)
head(miniProtFeatures$Symbol)

The GO annotation provided in mRNAinteractome is also included, as the object ENSG2category.

data(ENSG2category)
head(ENSG2category)

An index mapping all amino acid 4-mers to proteins is provided as the Index object. It is used by the function mapPeptides, included in the RIC package, to map peptides back onto a protein sequence database.
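
The idea behind a 4-mer index can be illustrated with a small base R sketch. It is an assumption-laden illustration of the general technique, not the structure of the Index object or the implementation of mapPeptides; the protein IDs, sequences and the helper `kmers()` are all hypothetical.

```r
# Hypothetical protein sequences; IDs and sequences are invented
proteins <- c(P1 = "MKTAYIAK", P2 = "GGTAYIQQ")

# All overlapping substrings of length k in one sequence
kmers <- function(s, k = 4) {
    n <- nchar(s) - k + 1
    if (n < 1) return(character(0))
    substring(s, seq_len(n), seq_len(n) + k - 1)
}

# Build the index: 4-mer -> names of the proteins containing it
pairs <- do.call(rbind, lapply(names(proteins), function(id) {
    km <- unique(kmers(proteins[[id]]))
    data.frame(kmer = km, protein = rep(id, length(km)),
               stringsAsFactors = FALSE)
}))
index <- split(pairs$protein, pairs$kmer)

# Look up candidate proteins through the peptide's first 4-mer,
# then confirm the full peptide by exact substring matching
peptide <- "TAYIAK"
candidates <- index[[substring(peptide, 1, 4)]]
hits <- candidates[vapply(candidates, function(id)
    grepl(peptide, proteins[[id]], fixed = TRUE), logical(1))]
hits
# "P1"
```

The index narrows the search to proteins sharing a 4-mer with the peptide, so the exact match only has to be checked against a few candidates rather than the whole database.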