View source: R/percentage_regions.R
region_psite | R Documentation |
This function computes the percentage of P-sites falling in the three annotated regions of the transcripts (5' UTR, CDS and 3' UTR) and generates a bar plot of the resulting values. Multiple samples and replicates can be handled.
region_psite(
data,
annotation,
sample,
multisamples = "average",
plot_style = "stack",
transcripts = NULL,
length_range = NULL,
cl = 100,
colour = c("gray70", "gray40", "gray10")
)
data |
Either list of data tables or GRangesList object from
|
annotation |
Data table as generated by |
sample |
Either character string, character string vector or named list
of character string(s)/character string vector(s) specifying the name of
the sample(s) and replicate(s) of interest. If a list is provided, each
element of the list is considered as an independent sample associated with
one or multiple replicates. Multiple samples and replicates are handled
and visualised according to |
multisamples |
Either "average" or "independent". It specifies how to
handle multiple samples and replicates stored in
|
plot_style |
Either "stack" or "dodge". It specifies how to organize the bars associated with the three regions of the transcript:
|
transcripts |
Character string vector listing the name of transcripts to be included in the analysis. Default is NULL, i.e. all transcripts are used. Please note: transcripts without annotated 5' UTR, CDS and 3' UTR are automatically discarded. |
length_range |
Integer or integer vector for restricting the analysis to
a chosen range of read lengths. Default is NULL, i.e. all read lengths are
used. If specified, this parameter prevails over |
cl |
Integer value in [1,100] specifying a confidence level for restricting the plot to an automatically defined range of read lengths. The new range comprises the most frequent read lengths, which account for cl% of the sample, and is defined by discarding the (100-cl)% of read lengths falling in the tails of the read length distribution. If multiple samples are analysed, a single range of read lengths is computed such that at least cl% of all samples is represented. Default is 100. |
colour |
Character string vector of three elements specifying the colour of the bar associated with the 5' UTR, CDS and 3' UTR, respectively. Default is a grayscale. |
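The trimming described for cl can be illustrated with a minimal sketch. This is not the package's code: the helper name length_range_from_cl is hypothetical, and splitting the discarded (100-cl)% equally between the two tails is an assumption made for illustration.

```r
# Hypothetical helper, not part of the package: returns the range of read
# lengths left after discarding (100 - cl)% of reads, assuming the discarded
# fraction is split equally between the two tails of the distribution.
length_range_from_cl <- function(lengths, cl = 100) {
  stopifnot(cl >= 1, cl <= 100)
  tail_prob <- (1 - cl / 100) / 2
  # type = 1 returns observed values (no interpolation between read lengths)
  lower <- quantile(lengths, tail_prob, type = 1, names = FALSE)
  upper <- quantile(lengths, 1 - tail_prob, type = 1, names = FALSE)
  c(min = lower, max = upper)
}

# Toy read lengths: mostly 28-30 nt plus a few outliers at 20 and 40 nt
toy_lengths <- c(rep(20, 2), rep(28, 30), rep(29, 40), rep(30, 26), rep(40, 2))

full_range <- length_range_from_cl(toy_lengths, cl = 100)
trimmed_range <- length_range_from_cl(toy_lengths, cl = 90)
```

With cl = 100 the whole range of the toy data is kept, whereas cl = 90 drops the rare 20 and 40 nt reads and keeps only the 28-30 nt range.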
In the plot, "RNAs" reflects the expected read distribution from random fragmentation of all transcripts used in the analysis. It can be used as a baseline to assess the enrichment of ribosomes (P-sites) mapping on the CDS with respect to the UTRs. The three bars are based on the cumulative nucleotide length of the 5' UTRs, CDSs and 3' UTRs, respectively, expressed as percentages.
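The "RNAs" baseline can be reproduced by hand on a toy annotation table, as in the sketch below. The column names l_utr5, l_cds and l_utr3 and the toy values are assumptions made for illustration; the function's internal computation is not shown on this page and may differ.

```r
# Toy annotation table; the column names and values are assumed for illustration
annotation <- data.frame(
  transcript = c("tx1", "tx2", "tx3"),
  l_utr5 = c(100, 50, 0),
  l_cds  = c(900, 450, 300),
  l_utr3 = c(500, 500, 200)
)

# Discard transcripts lacking an annotated 5' UTR, CDS or 3' UTR (tx3 here)
complete <- subset(annotation, l_utr5 > 0 & l_cds > 0 & l_utr3 > 0)

# Cumulative nucleotide length of each region, expressed as a percentage
region_length <- colSums(complete[, c("l_utr5", "l_cds", "l_utr3")])
rna_percentage <- 100 * region_length / sum(region_length)
```

For these toy values the cumulative lengths are 150, 1350 and 1000 nt, so the baseline is 6% 5' UTR, 54% CDS and 40% 3' UTR. A P-site percentage on the CDS well above the CDS baseline would indicate enrichment of ribosomes on coding sequences.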
List containing: one or more ggplot objects and the data table with the corresponding x- and y-axis values and the z-values defining the colour of the bars ("plot_dt"); an additional data table with raw and scaled number of P-sites per frame for each sample ("count_dt").
## data(reads_list)
## data(mm81cdna)
##
## ## Generate fake samples and replicates
## for(i in 2:6){
##   samp_name <- paste0("Samp", i)
##   set.seed(i)
##   reads_list[[samp_name]] <- reads_list[["Samp1"]][sample(.N, 5000)]
## }
##
## ## Compute and add p-site details
## psite_offset <- psite(reads_list, flanking = 6, extremity = "auto")
## reads_psite_list <- psite_info(reads_list, psite_offset)
##
## ## Define the list of samples and replicates to use as input
## input_samples <- list("S1" = c("Samp1", "Samp2"),
##                       "S2" = c("Samp3", "Samp4", "Samp5"),
##                       "S3" = c("Samp6"))
##
## ## Generate bar plot
## example_psite_per_region <- region_psite(reads_psite_list, mm81cdna,
##                                          sample = input_samples,
##                                          multisamples = "average",
##                                          plot_style = "stack",
##                                          cl = 85,
##                                          colour = c("#333f50", "gray70", "#39827c"))