region_psite: Percentage of P-sites per transcript region.

View source: R/percentage_regions.R

region_psite    R Documentation

Percentage of P-sites per transcript region.

Description

This function computes the percentage of P-sites falling in the three annotated regions of the transcripts (5' UTR, CDS and 3' UTR) and generates a bar plot of the resulting values. Multiple samples and replicates can be handled.
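As a rough illustration of this computation, the sketch below assigns P-site positions on a single toy transcript to the 5' UTR, CDS or 3' UTR and converts the counts to percentages. The helper name psite_region_pct, the CDS boundaries and the P-site positions are all hypothetical; this is not riboWaltz's own code, which works from the annotation and the output of psite_info.

```r
# Hypothetical helper (not part of riboWaltz): classify P-site positions,
# given in transcript coordinates, into the three regions and return the
# percentage of P-sites falling in each of them.
psite_region_pct <- function(psite_pos, cds_start, cds_stop) {
  region <- ifelse(psite_pos < cds_start, "5' UTR",
                   ifelse(psite_pos <= cds_stop, "CDS", "3' UTR"))
  # Fixed levels keep regions with zero P-sites in the output
  region <- factor(region, levels = c("5' UTR", "CDS", "3' UTR"))
  counts <- table(region)
  setNames(100 * as.numeric(counts) / sum(counts), names(counts))
}

# Toy transcript of 400 nt whose CDS spans positions 51-350 (assumed values)
psites <- c(10, 40, 60, 100, 150, 200, 250, 300, 340, 380)
pct <- psite_region_pct(psites, cds_start = 51, cds_stop = 350)
pct
```

Here 2, 7 and 1 of the ten P-sites fall in the 5' UTR, CDS and 3' UTR, giving 20%, 70% and 10%.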

Usage

region_psite(
  data,
  annotation,
  sample,
  multisamples = "average",
  plot_style = "stack",
  transcripts = NULL,
  length_range = NULL,
  cl = 100,
  colour = c("gray70", "gray40", "gray10")
)

Arguments

data

Either a list of data tables or a GRangesList object, as generated by psite_info.

annotation

Data table as generated by create_annotation.

sample

Either a character string, a character string vector or a named list of character strings/character string vectors specifying the name(s) of the sample(s) and replicate(s) of interest. If a list is provided, each element of the list is considered an independent sample associated with one or more replicates. Multiple samples and replicates are handled and visualised according to multisamples and plot_style.

multisamples

Either "average" or "independent". It specifies how to handle multiple samples and replicates stored in sample:

  • if sample is a character string vector and multisamples is set to "average", the elements of the vector are considered replicates of a single sample and a single bar plot is returned.

  • if sample is a character string vector and multisamples is set to "independent", each element of the vector is analysed independently of the others.

  • if sample is a list, multisamples must be set to "average". Each element of the list is analysed independently of the others, its replicates are averaged and its name is reported in the plot.

Note: when this parameter is set to "average", the bar plot associated with each sample displays the region-specific mean signal computed across the replicates and, if plot_style is set to "dodge", the corresponding standard error is also reported. Default is "average".
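To make the averaging concrete, the sketch below takes invented per-replicate P-site percentages for one sample with three replicates and computes the region-specific mean and its standard error. The standard error used here is the usual standard error of the mean (sample standard deviation divided by the square root of the number of replicates); that riboWaltz uses exactly this definition is an assumption.

```r
# Invented per-replicate percentages of P-sites (rows: replicates, columns: regions)
reps <- rbind(
  Samp3 = c(utr5 = 4, cds = 90, utr3 = 6),
  Samp4 = c(utr5 = 6, cds = 88, utr3 = 6),
  Samp5 = c(utr5 = 5, cds = 89, utr3 = 6)
)

# Region-specific mean across replicates
region_mean <- colMeans(reps)
# Standard error of the mean: sd / sqrt(number of replicates)
region_se <- apply(reps, 2, sd) / sqrt(nrow(reps))
round(rbind(mean = region_mean, se = region_se), 3)
```

The means are 5, 89 and 6; the 3' UTR has a standard error of 0 because all replicates agree.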

plot_style

Either "stack" or "dodge". It specifies how to organize the bars associated with the three regions of the transcript:

  • "stack": bars are placed one on top of the other.

  • "dodge": bars are placed next to each other. In this case, the standard error obtained by averaging multiple replicates (if any, see sample and multisamples) is also displayed.

Default is "stack".

transcripts

Character string vector listing the names of the transcripts to be included in the analysis. Default is NULL, i.e. all transcripts are used. Please note: transcripts lacking an annotated 5' UTR, CDS or 3' UTR are automatically discarded.
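The selection rule can be sketched on a toy annotation table. The column names transcript, l_utr5, l_cds and l_utr3 are assumed here to mirror the region lengths stored by create_annotation; check them against your own annotation before reusing the idea.

```r
# Toy annotation; column names are assumed, region lengths are in nucleotides
annotation <- data.frame(
  transcript = c("tx1", "tx2", "tx3", "tx4"),
  l_utr5     = c(50, 0, 120, 30),
  l_cds      = c(900, 600, 1500, 0),
  l_utr3     = c(200, 300, 0, 80),
  stringsAsFactors = FALSE
)

# User selection; NULL would mean "use all transcripts"
transcripts <- c("tx1", "tx2", "tx3")

# Step 1: apply the optional user selection
selected <- if (is.null(transcripts)) {
  rep(TRUE, nrow(annotation))
} else {
  annotation$transcript %in% transcripts
}
# Step 2: drop transcripts lacking any of the three regions
complete <- annotation$l_utr5 > 0 & annotation$l_cds > 0 & annotation$l_utr3 > 0

kept <- annotation$transcript[selected & complete]
kept
```

Only tx1 survives: tx2 has no 5' UTR, tx3 has no 3' UTR and tx4 is neither selected nor complete.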

length_range

Integer or integer vector restricting the analysis to a chosen range of read lengths. Default is NULL, i.e. all read lengths are used. If specified, this parameter takes precedence over cl.

cl

Integer value in [1,100] specifying a confidence level for restricting the plot to an automatically defined range of read lengths. The new range is made of the most frequent read lengths, which together account for cl% of the sample, and is obtained by discarding the (100-cl)% of read lengths falling in the tails of the read length distribution. If multiple samples are analysed, a single range of read lengths is computed such that at least cl% of all samples is represented. Default is 100.
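One way to read this rule is sketched below in base R: starting from the full read length distribution, the less frequent of the two extreme lengths is dropped repeatedly, for as long as the retained reads still account for at least cl% of the total. The function name cl_length_range and the tie-breaking choice are assumptions made for illustration; riboWaltz's own implementation may differ in detail.

```r
# Hypothetical sketch: shrink the read length range from the tails while the
# retained reads still cover at least cl% of all reads.
cl_length_range <- function(read_lengths, cl = 100) {
  freq <- table(read_lengths)
  lens <- as.integer(names(freq))  # observed lengths, ascending
  counts <- as.numeric(freq)
  total <- sum(counts)
  lo <- 1
  hi <- length(lens)
  while (lo < hi) {
    # Drop the less frequent extreme (on ties, drop the shorter length)
    drop_lo <- counts[lo] <= counts[hi]
    dropped <- if (drop_lo) counts[lo] else counts[hi]
    kept <- sum(counts[lo:hi]) - dropped
    if (100 * kept / total < cl) break
    if (drop_lo) lo <- lo + 1 else hi <- hi - 1
  }
  c(lens[lo], lens[hi])
}

# Invented distribution of 100 reads with lengths 25-32 nt
rl <- rep(25:32, times = c(1, 2, 5, 20, 40, 25, 5, 2))
cl_length_range(rl, cl = 85)
cl_length_range(rl, cl = 100)
```

With these counts, cl = 85 keeps lengths 28 to 30 nt (85 of the 100 reads), while cl = 100 keeps the full 25 to 32 nt range.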

colour

Character string vector of three elements specifying the colour of the bars associated with the 5' UTR, CDS and 3' UTR, respectively. Default is a grayscale palette.

Details

In the plot, "RNAs" reflects the expected read distribution resulting from random fragmentation of all transcripts used in the analysis. It can be used as a baseline to assess the enrichment of ribosomes (P-sites) mapping on the CDS with respect to the UTRs. Its three values are the cumulative nucleotide lengths of the 5' UTRs, CDSs and 3' UTRs, respectively, expressed as percentages of their sum.
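The "RNAs" baseline reduces to simple arithmetic on region lengths. The sketch below sums invented 5' UTR, CDS and 3' UTR lengths over three toy transcripts and expresses each sum as a percentage of the total.

```r
# Invented region lengths (nt) for three transcripts
l_utr5 <- c(50, 120, 30)
l_cds  <- c(900, 1500, 600)
l_utr3 <- c(250, 380, 170)

# Cumulative length of each region across all transcripts
cum_len <- c("5' UTR" = sum(l_utr5), CDS = sum(l_cds), "3' UTR" = sum(l_utr3))
# Expected percentage of reads per region under random fragmentation
rna_pct <- 100 * cum_len / sum(cum_len)
round(rna_pct, 1)
```

Here the baseline is 5%, 75% and 20%; a share of P-sites on the CDS well above 75% would point to enrichment on the coding sequence.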

Value

List containing: a ggplot object and the data table with the corresponding x- and y-axis values and the z-values defining the colour of the bars ("plot_dt"); an additional data table with the raw and scaled number of P-sites per transcript region for each sample ("count_dt").

Examples

## library(riboWaltz)
## data(reads_list)
## data(mm81cdna)
##
## ## Generate fake samples and replicates
## for(i in 2:6){
##   samp_name <- paste0("Samp", i)
##   set.seed(i)
##   reads_list[[samp_name]] <- reads_list[["Samp1"]][sample(.N, 5000)]
## }
##
## ## Compute and add p-site details
## psite_offset <- psite(reads_list, flanking = 6, extremity = "auto")
## reads_psite_list <- psite_info(reads_list, psite_offset)
##
## ## Define the list of samples and replicates to use as input
## input_samples <- list("S1" = c("Samp1", "Samp2"),
##                       "S2" = c("Samp3", "Samp4", "Samp5"),
##                       "S3" = c("Samp6"))
##
## ## Generate bar plot
## example_psite_per_region <- region_psite(reads_psite_list, mm81cdna,
##                                          sample = input_samples,
##                                          multisamples = "average",
##                                          plot_style = "stack",
##                                          cl = 85,
##                                          colour = c("#333f50", "gray70", "#39827c")) 

LabTranslationalArchitectomics/riboWaltz documentation built on Jan. 17, 2024, 12:18 p.m.