codon_usage_psite: Empirical codon usage indexes.
In LabTranslationalArchitectomics/riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data

codon_usage_psite

R Documentation

Empirical codon usage indexes.

Description

This function computes empirical codon usage indexes based on either the ribosome P-sites, A-site or E-site frequency associated with in-frame P-sites falling in the coding sequence. For each sample it computes 64 triplet-specific codon usage indexes ranging from 0 to 1 and optionally normalized for the frequency of the corresponding triplet within the CDSs. This function also allows to compare either two sets of codon usage indexes from different samples or the set of codon usage indexes computed for one sample and 64 triplet-specific values provided by the user. Multiple samples and replicates can be handled.

Usage

codon_usage_psite(
  data,
  annotation,
  sample,
  multisamples = "average",
  plot_style = "split",
  site = "psite",
  frequency_normalization = TRUE,
  contrast_sample = NULL,
  codon_values = NULL,
  fastapath = NULL,
  fasta_genome = TRUE,
  refseq_sep = NULL,
  bsgenome = NULL,
  gtfpath = NULL,
  txdb = NULL,
  dataSource = NA,
  organism = NA,
  transcripts = NULL,
  length_range = NULL,
  cl = 100,
  include_stop_codons = TRUE,
  label_scatter = FALSE,
  label_number = 64,
  label_aminoacid = FALSE
)

Arguments

`data`	Either list of data tables or GRangesList object from `psite_info`. Each data table may or may not include one or more columns among p_site_codon, a_site_codon and e_site_codon reporting the three nucleotides covered by the P-site, A-site and E-site, respectively. These columns can be previously generated by the `psite_info` function. Otherwise, the column of interest must be specified by `site` and it is automatically generated starting from a FASTA file or a BSgenome data package.
`annotation`	Data table as generated by `create_annotation`.
`sample`	Either character string, character string vector or named list of character string(s)/character string vector(s) specifying the name of the sample(s) and replicate(s) of interest. If a list is provided, each element of the list is considered as an independent sample associated with one ore multiple replicates. Multiple samples and replicates are handled and organized according to `multisamples` and `plot_style`.
`multisamples`	Either "average" or "independent". It specifies how to handle multiple samples and replicates stored in `sample`: if `sample` is a character string vector and `multisamples` is set to "average", the elements of the vector are considered as replicates of one sample. if `sample` is a character string vector and `multisamples` is set to "independent", each element of the vector is analysed independently of the others. if `sample` is a list, `multisamples` must be set to "average". Each element of the list is analysed independently of the others, its replicates averaged and its name reported in the plot. Note: when this parameter is set to "average" the bar plot associated with each sample displays the triplet-specific mean signal computed across the replicates and the corresponding standard error is also reported. Scatter plots (see `contrast_sample`) display the mean signal but not the standard error to streamline the plot. Default is "average".
`plot_style`	Either "split" or "facet". It specifies how to organize and display multiple bar plots: "split": one bar plot for each sample is returned as an independent ggplot object. In each plot the bars are arranged in increasing order according to the codon usage indexes. "facet": the bar plots are placed one below the other, in independent boxes. The bars of all plots are arranged following the alphabetical order of the codons. Note: this parameter is considered only if `contrast_sample` is not specified and thus bar plots are generated. Default is "split".
`site`	Either "psite, "asite", "esite". It specifies if the empirical codon usage indexes should be based on ribosome P-sites ("psite"), A-sites ("asite") or E-sites ("esite"). Default is "psite".
`frequency_normalization`	Logical value whether to normalize the 64 codon usage indexes for the corresponding codon frequencies in coding sequences. Default is TRUE. Note: the low frequency of the three stop codons often leads to codon usage indexes higher for these triplets than for the others. To discard the stop codons from the plots and avoid potential biases in the interpretation of the data, see `include_stop_codons`.
`contrast_sample`	Either character string or character string vector. It specifies the sample(s) (if any) to be considered for the comparison of codon usage indexes between: exactly two elements from those listed either in `sample` (if `sample` is a character string or a character string vector) or in names(`sample`) (if `sample` is a list). one sample and 64 triplet-specific values: `contrast_sample` must be one character string from those listed either in `sample` (if `sample` is a character string or a character string vector) or in names(`sample`) (if `sample` is a list). Moreover, `codon_values` must be provided. Note: when this parameter is specified the function generates a scatter plot instead of one or multiple bar plots, it performs a linear regression and computes the Pearson correlation coefficient. See also `label_scatter`, `label_number` and `label_aminoacid`. Default is "NULL".
`codon_values`	Data table containing 64 triplet-specific values to be compared with the empirical codon usage indexes computed for the sample specified in `contrast_sample`. The data table must contain the DNA or RNA nucleotide sequence of the 64 codons and the corresponding values arranged in two columns named codon and value, respectively. Additional columns are ignored. This parameter is considered only if `contrast_sample` is specified. Default is NULL.
`fastapath`	Character string specifying the FASTA file used in the alignment step, including its path, name and extension. This file can contain reference nucleotide sequences either of genome asseblies (chromosome sequences) or of transcripts (see `Details` and `fasta_genome`). Please make sure the sequences derive from the same release of the annotation file used in the `create_annotation` function. Note: either `fastapath` or `bsgenome` is required to compute the frequency of each codon along sequences, used as normalization factors, even if `data` includes one or more columns among p_site_codon, a_site_codon and e_site_codon. Default is NULL.
`fasta_genome`	Logical value whether the FASTA file specified by `fastapath` contains nucleotide sequences of genome asseblies (chromosome sequences). If TRUE (the default), an annotation object is required (see `gtfpath` and `txdb`). FALSE implies nucleotide sequences of transcripts are provided instead.
`refseq_sep`	Character specifying the separator between reference sequences' name and additional information to discard, stored in the headers of the FASTA file specified by `fastapath` (if any). It might be required for matching the reference sequences' identifiers reported in the input list of data tables. All characters before the first occurrence of the specified separator are kept. Default is NULL i.e. no string splitting is performed.
`bsgenome`	Character string specifying the BSgenome data package with the genome sequences to be loaded. If not already present in the system, it is automatically installed through the biocLite.R script (check the list of available BSgenome data packages by running the `available.genomes` function of the BSgenome package). This parameter must be coupled with an annotation object (see `gtfpath` and `txdb`). Please make sure the sequences included in the specified BSgenome data pakage are in agreement with the sequences used in the alignment step. Note: either `fastapath` or `bsgenome` is required to compute the frequency of each codon along sequences, used as normalization factors, even if `data` includes one or more columns among p_site_codon, a_site_codon and e_site_codon. Default is NULL.
`gtfpath`	Character string specifying the location of a GTF file, including its path, name and extension. Please make sure the GTF file and the sequences specified by `fastapath` or `bsgenome` derive from the same release. Note that either `gtfpath` or `txdb` is required if and only if nucleotide sequences of genome assemblies (chromosome sequences) are provided (see `fastapath` or `bsgenome`). Default is NULL.
`txdb`	Character string specifying the TxDb annotation package to be loaded. If not already present in the system, it is automatically installed through the biocLite.R script (check here the list of available TxDb annotation packages). Please make sure the TxDb annotation package and the sequences specified by `fastapath` or `bsgenome` derive from the same release. Note that either `gtfpath` or `txdb` is required if and only if nucleotide sequences of genome assemblies (chromosome sequences) are provided (see `fastapath` or `bsgenome`). Default is NULL.
`dataSource`	Optional character string describing the origin of the GTF data file. This parameter is considered only if `gtfpath` is specified. For more information about this parameter please refer to the description of dataSource of the `makeTxDbFromGFF` function included in the `GenomicFeatures` package.
`organism`	Optional character string reporting the genus and species of the organism of the GTF data file. This parameter is considered only if `gtfpath` is specified. For more information about this parameter please refer to the description of organism of the `makeTxDbFromGFF` function included in the `GenomicFeatures` package.
`transcripts`	Character string vector listing the name of transcripts to be included in the analysis. Default is NULL i.e. all transcripts are used. Please note: transcripts without annotated CDS and transcripts whose coding sequence length is not divisible by 3 are automatically discarded.
`length_range`	Integer or integer vector for restricting the plot to a chosen range of read lengths. Default is NULL, i.e. all read lengths are used. If specified, this parameter prevails over `cl`.
`cl`	Integer value in 1,100 specifying a confidence level for restricting the plot to an automatically-defined range of read lengths. The new range is computed according to the most frequent read lengths, which accounts for the cl% of the sample and is defined by discarding the (100-cl)% of read lengths falling in the tails of the read lengths distribution. If multiple samples are analysed, a single range of read lengths is computed such that at least the cl% of all samples is represented. Default is 100.
`include_stop_codons`	Logical value whether to include the three stop codons in the plots. Default is TRUE.
`label_scatter`	Logical value whether to label the dots in the scatter plot. Each dot is labeled using either the nucleotide sequence of the codon or the corresponding amino acid symbol (see `label_aminoacid`). This parameter is considered only if `contrast_sample` is specified. Default is FALSE.
`label_number`	Integer value in 1,64 specifying how many dots in the scatter plot should be labeled. Dots farthest from the confident interval of the regression line are automatically identified and labeled. Default is 64 i.e. all dots are labeled. This parameter is considered only if `label_scatter` is TRUE.
`label_aminoacid`	Logical value whether to use amino acid symbols to label the dots of the scatter plot. Default is FALSE i.e. codon nucleotide sequences are used instead. This parameter is considered only if `label_scatter` is TRUE.

Details

riboWaltz only works for read alignments based on transcript coordinates. This choice is due to the main purpose of RiboSeq assays to study translational events through the isolation and sequencing of ribosome protected fragments. Most reads from RiboSeq are supposed to map on mRNAs and not on introns and intergenic regions. BAM based on transcript coordinates can be generated in two ways: i) aligning directly against transcript sequences; ii) aligning against sequences of genome assemblies i.e. standard chromosome sequences, thus requiring the outputs to be translated in transcript coordinates. The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner, with its option –quantMode TranscriptomeSAM (see Chapter 6 of its manual), is an example of tool providing such a feature.

Value

List containing: one or more ggplot object(s) and the data table with the corresponding x- and y-axis values ("plot_dt"); an additional data table with raw, normalized and scaled number of P-sites associated with the 64 triplets for each sample contributing to the plot ("count_dt").

Examples

## data(reads_list)
## data(mm81cdna)
##
## ## Generate fake samples and replicates
## for(i in 2:6){
##   samp_name <- paste0("Samp", i)
##   set.seed(i)
##   reads_list[[samp_name]] <- reads_list[["Samp1"]][sample(.N, 5000)]
## }
##
## ## Compute and add p-site details
## psite_offset <- psite(reads_list, flanking = 6, extremity = "auto")
## reads_psite_list <- psite_info(reads_list, psite_offset)
##
## ## Define the list of samples and replicate to use as input
## input_samples <- list("S1" = c("Samp1", "Samp2"),
##                       "S2" = c("Samp3", "Samp4", "Samp5"),
##                       "S3" = c("Samp6"))
##
## Generate bar plots, alignment on transcript sequences
## example_cu_barplot <- codon_usage_psite(reads_psite_list, mm81cdna,
##                                        sample = input_samples,
##                                        multisamples = "average",
##                                        plot_style = "facet",
##                                        fastapath = "path/to/transcriptome/FASTA/file",
##                                        fasta_genome = FALSE)
##
## Generate bar plots,  alignment on chromosome sequences
## example_cu_barplot <- codon_usage_psite(reads_psite_list, mm81cdna,
##                                        sample = input_samples,
##                                        multisamples = "average",
##                                        plot_style = "facet",
##                                        fastapath = "path/to/chromosome/FASTA/file",
##                                        fasta_genome = TRUE) 
##
## Generate scatterplot comparing two samples
## example_cu_barplot <- codon_usage_psite(reads_psite_list, mm81cdna,
##                                        sample = input_samples,
##                                        contrast_sample = c("S1", "S2"),
##                                        fastapath = "path/to/transcriptome/FASTA/file",
##                                        fasta_genome = FALSE,
##                                        frequency_normalization = FALSE)

LabTranslationalArchitectomics/riboWaltz documentation built on Feb. 25, 2025, 10:17 p.m.

LabTranslationalArchitectomics/riboWaltz index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

LabTranslationalArchitectomics/riboWaltz
Optimization of ribosome P-site positioning in ribosome profiling data

codon_usage_psite: Empirical codon usage indexes.
In LabTranslationalArchitectomics/riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data

Empirical codon usage indexes.

Description

Usage

Arguments

Details

Value

Examples

Related to codon_usage_psite in LabTranslationalArchitectomics/riboWaltz...

R Package Documentation

Browse R Packages

We want your feedback!

LabTranslationalArchitectomics/riboWaltz Optimization of ribosome P-site positioning in ribosome profiling data

codon_usage_psite: Empirical codon usage indexes. In LabTranslationalArchitectomics/riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data

Empirical codon usage indexes.

Description

Usage

Arguments

Details

Value

Examples

Related to codon_usage_psite in LabTranslationalArchitectomics/riboWaltz...

R Package Documentation

Browse R Packages

We want your feedback!

LabTranslationalArchitectomics/riboWaltz
Optimization of ribosome P-site positioning in ribosome profiling data

codon_usage_psite: Empirical codon usage indexes.
In LabTranslationalArchitectomics/riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data