rlength_distr: Read length distributions.

View source: R/read_length_plot.R

rlength_distr    R Documentation

Read length distributions.

Description

This function generates read length distributions, displayed as bar plots. Multiple samples and replicates can be handled.

Usage

rlength_distr(
  data,
  sample,
  multisamples = "average",
  plot_style = "split",
  scale_factors = "auto",
  transcripts = NULL,
  length_range = NULL,
  cl = 100,
  colour = NULL
)

Arguments

data

Either a list of data tables or a GRangesList object, as returned by bamtolist, bedtolist, length_filter or psite_info.

sample

Either a character string, a character string vector or a named list of character strings/character string vectors specifying the name(s) of the sample(s) and replicate(s) of interest. If a list is provided, each element of the list is considered an independent sample associated with one or more replicates. Multiple samples and replicates are handled and visualised according to multisamples and plot_style.

multisamples

Either "average" or "independent". It specifies how to handle multiple samples and replicates stored in sample:

  • if sample is a character string vector and multisamples is set to "average", the elements of the vector are considered replicates of one sample and a single bar plot is returned.

  • if sample is a character string vector and multisamples is set to "independent", each element of the vector is analysed independently of the others. The number of plots returned and their organization is specified by plot_style.

  • if sample is a list, multisamples must be set to "average". Each element of the list is analysed independently of the others, its replicates are averaged and its name is reported in the plot. The number of plots returned and their organization is specified by plot_style.

Note: when this parameter is set to "average", the bar plot associated with each sample displays the length-specific mean signal computed across the replicates, together with the corresponding standard error. Default is "average".
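The averaging across replicates can be pictured with a small base-R sketch. This is an illustration only, not riboWaltz's internal code: the helper replicate_mean_se() is hypothetical, its input is assumed to be a matrix of already-scaled signals (rows = read lengths, columns = replicates), and the standard error is assumed to be the standard error of the mean, sd/sqrt(n).

```r
# Hypothetical helper, not part of riboWaltz.
# rep_mat: numeric matrix, rows = read lengths (stored as row names),
#          columns = replicates of one sample, values already scaled.
replicate_mean_se <- function(rep_mat) {
  n_rep <- ncol(rep_mat)
  # Length-specific mean signal across replicates
  mean_signal <- unname(rowMeans(rep_mat))
  # Assumed standard error of the mean; NA when only one replicate is available
  se_signal <- unname(apply(rep_mat, 1, sd)) / sqrt(n_rep)
  data.frame(length = as.integer(rownames(rep_mat)),
             mean = mean_signal,
             se = se_signal)
}

# Two toy replicates over read lengths 28-30
rep_mat <- cbind(Samp1 = c(10, 60, 30), Samp2 = c(20, 50, 30))
rownames(rep_mat) <- 28:30
replicate_mean_se(rep_mat)
```

With a single replicate the standard error is left as NA, which is one reasonable convention; the package may handle that case differently.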

plot_style

Either "split", "facet", "dodge" or "mirror". It specifies how to organize and display multiple bar plots:

  • "split": one bar plot for each sample is returned as an independent ggplot object;

  • "facet": the bar plots are placed one next to the other, in independent boxes;

  • "dodge": all bar plots are displayed in one box and, for each length, samples are placed side by side.

  • "mirror": sample must be either a character string vector or a list of exactly two elements, and the two resulting bar plots are mirrored along the x axis.

Default is "split".

scale_factors

Either "auto", a named numeric vector or "none". It specifies how read length distributions should be scaled before merging multiple samples (if any):

  • "auto": each distribution is scaled so that the sum of all bars is 100.

  • named numeric vector: scale_factors must have the same length as the unlisted sample, and each scale factor must be named after the corresponding string in the unlisted sample. No specific order is required. Each distribution is multiplied by its matching scale factor.

  • "none": no scaling is applied.

Default is "auto".
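The three scaling modes can be illustrated with a short base-R sketch. The helper scale_counts() is hypothetical and is not the function riboWaltz uses internally; it assumes a named list of per-length read counts, one vector per replicate, all defined on the same grid of read lengths.

```r
# Hypothetical helper, not part of riboWaltz.
# count_list: named list, one numeric vector of per-length read counts per replicate.
scale_counts <- function(count_list, scale_factors = "auto") {
  if (identical(scale_factors, "auto")) {
    # Rescale each distribution so that its bars sum to 100
    lapply(count_list, function(x) 100 * x / sum(x))
  } else if (identical(scale_factors, "none")) {
    # Leave raw counts untouched
    count_list
  } else {
    # One named factor per replicate name, in any order
    stopifnot(is.numeric(scale_factors),
              length(scale_factors) == length(count_list),
              setequal(names(scale_factors), names(count_list)))
    # Match factors to distributions by name, then multiply
    mapply(function(x, f) x * f, count_list, scale_factors[names(count_list)],
           SIMPLIFY = FALSE)
  }
}

counts <- list(Samp1 = c(10, 30, 60), Samp2 = c(5, 5, 10))
scale_counts(counts)                              # each vector now sums to 100
scale_counts(counts, c(Samp2 = 2, Samp1 = 0.5))   # order of the names does not matter
```

Matching by name rather than by position is what makes the order of scale_factors irrelevant, as the description above requires.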

transcripts

Character string vector listing the names of the transcripts to be included in the analysis. Default is NULL, i.e. all transcripts are used.

length_range

Integer or integer vector restricting the plot to a chosen range of read lengths. Default is NULL, i.e. all read lengths are used. If specified, this parameter takes precedence over cl.

cl

Integer value in [1,100] specifying a confidence level for restricting the plot to an automatically defined range of read lengths. The new range is computed from the most frequent read lengths, which account for cl% of the sample, and is defined by discarding the (100-cl)% of read lengths falling in the tails of the read length distribution. If multiple samples are analysed, a single range of read lengths is computed such that at least cl% of all samples is represented. Default is 100.
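One way to turn cl into a range of read lengths is sketched below in base R. The helper cl_length_range() is hypothetical, and it assumes that the discarded (100-cl)% of reads is split evenly between the two tails of the distribution; the rule riboWaltz actually applies may differ in detail.

```r
# Hypothetical helper, not part of riboWaltz.
# len: read lengths; n_reads: number of reads observed for each length.
cl_length_range <- function(len, n_reads, cl = 100) {
  stopifnot(length(len) == length(n_reads), cl >= 1, cl <= 100)
  ord <- order(len)
  len <- len[ord]
  n_reads <- n_reads[ord]
  # Assumption: drop half of the (100 - cl)% of reads from each tail
  tail_frac <- (1 - cl / 100) / 2
  # Cumulative fraction of reads up to and including each length
  cum_frac <- cumsum(n_reads) / sum(n_reads)
  lower <- len[which(cum_frac > tail_frac)[1]]
  upper <- len[which(cum_frac >= 1 - tail_frac)[1]]
  c(lower, upper)
}

read_len <- 25:35
read_n <- c(1, 2, 5, 10, 20, 30, 20, 8, 2, 1, 1)
cl_length_range(read_len, read_n, cl = 90)    # 27 32: these lengths hold 93% of the reads
cl_length_range(read_len, read_n, cl = 100)   # 25 35: the full range
```

For several samples, taking the smallest lower bound and the largest upper bound of the per-sample ranges is one simple way to keep at least cl% of every sample; this too is an assumption, not the package's documented procedure.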

colour

Character string or character string vector specifying the colour of the bar plot(s). If plot_style is set to either "dodge" or "mirror", a colour for each sample is required. Default is NULL, i.e. the default R colour palette is used.

Value

List containing: one or more ggplot object(s) and the data table with the corresponding x- and y-axis values ("plot_dt"); an additional data table with the raw and scaled numbers of reads per length in each sample ("count_dt").

Examples

library(riboWaltz)
data(reads_list)

## Generate fake samples and replicates (Samp2 to Samp6) by randomly subsampling 5000 reads from Samp1
for(i in 2:6){
  samp_name <- paste0("Samp", i)
  set.seed(i)
  reads_list[[samp_name]] <- reads_list[["Samp1"]][sample(.N, 5000)]
}

## Define the list of samples and replicates to use as input
input_samples <- list("S1" = c("Samp1", "Samp2"),
                      "S2" = c("Samp3", "Samp4", "Samp5"),
                      "S3" = c("Samp6"))

## Generate the length distribution for a sub-range of read lengths:
example_length_dist <- rlength_distr(reads_list,
                                     sample = input_samples,
                                     multisamples = "average",
                                     plot_style = "facet",
                                     cl = 99,
                                     colour = c("#333f50", "#39827c", "gray70"))

LabTranslationalArchitectomics/riboWaltz documentation built on Jan. 17, 2024, 12:18 p.m.