psite: Ribosome P-sites position within reads.

View source: R/psites.R

psite R Documentation

Ribosome P-sites position within reads.

Description

This function identifies the exact position of the ribosome P-site within each read, defined by the position of its first nucleotide (see Details). It returns a data table containing, for all samples and read lengths: i) the percentage of reads in the whole dataset; ii) the percentage of reads aligning on the start codon (if any); iii) the distance of the P-site from the two extremities of the reads before and after the correction step; iv) the name of the sample. Optionally, this function plots a collection of read length-specific occupancy metaprofiles displaying the P-site offsets computed through the process.

Usage

psite(
  data,
  flanking = 6,
  start = TRUE,
  extremity = "auto",
  plot = FALSE,
  plot_dir = NULL,
  plot_format = "png",
  cl = 99,
  txt = FALSE,
  txt_file = NULL
)

Arguments

data

Either a list of data tables or a GRangesList object, as returned by bamtolist, bedtolist, duplicates_filter or length_filter.

flanking

Integer value specifying the minimum number of nucleotides that must flank the reference codon in both directions for a read to be selected. Default is 6.

start

Logical value indicating whether to use the translation initiation site as the reference codon. Default is TRUE. If FALSE, the second to last codon is used instead.

extremity

Either "5end", "3end" or "auto". It specifies whether the correction step should be based on 5' extremities ("5end") or 3' extremities ("3end"). Default is "auto", i.e. the optimal extremity is automatically selected.

plot

Logical value indicating whether to plot the occupancy metaprofiles displaying the P-site offsets computed in both steps of the algorithm. Default is FALSE.

plot_dir

Character string specifying the directory where read length-specific occupancy metaprofiles should be stored. If the specified folder doesn't exist, it is automatically created. If NULL (the default), the metaprofiles are stored in a new subfolder of the working directory, called offset_plot. This parameter is considered only if plot is TRUE.

plot_format

Either "png" (the default) or "pdf". This parameter specifies the file format used to store the length-specific occupancy metaprofiles. It is considered only if plot is TRUE.

cl

Integer value in [1,100] specifying a confidence level used to restrict the occupancy metaprofiles to a sub-range of read lengths, i.e. to the cl% of read lengths associated with the highest signals. Default is 99. This parameter is considered only if plot is TRUE.

txt

Logical value indicating whether to write to a txt file the extremity used for the correction step and the best offset for each sample. Similar information is displayed by default in the console. Default is FALSE.

txt_file

Character string specifying the path, name and extension (e.g. "PATH/NAME.extension") of the plain text file where the extremity used for the correction step and the best offset for each sample should be written. If the specified folder doesn't exist, it is automatically created. If NULL (the default), the information is written to "best_offset.txt" and saved in the working directory. This parameter is considered only if txt is TRUE.

Details

The P-site offset (PO) is defined as the distance between the extremities of a read and the first nucleotide of the P-site itself. The function processes all samples separately, starting from reads mapping on the reference codon (either the start codon or the second to last codon, see start) of any annotated coding sequence. Read length-specific POs are inferred in two steps. First, reads mapping on the reference codon are grouped according to their length, each group corresponding to a bin. Reads whose extremities are too close to the reference codon are discarded (see flanking). For each bin, temporary 5' and 3' POs are defined as the distances between the first nucleotide of the reference codon and the nucleotide corresponding to the global maximum found in the profiles of the 5' and the 3' ends at the left and at the right of the reference codon, respectively. After the identification of the P-site for all reads aligning on the reference codon, the POs corresponding to each length are assigned to each read of the dataset. Second, the most frequent temporary POs associated with the optimal extremity (see extremity) and the predominant bins are used as reference values for correcting the temporary POs of smaller bins. Briefly, the correction step defines for each length bin a new PO based on the local maximum whose distance from the reference codon is the closest to the most frequent temporary PO. For further details please refer to the riboWaltz article.
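The two steps described above can be illustrated with a small base R sketch. It is not the riboWaltz implementation: the toy read counts, the column names dist5 and dist3, the helper functions, the exact form of the flanking filter and the weighting of bins by read count are all assumptions made for illustration.

```r
## Toy reads overlapping the reference codon (hypothetical counts).
## dist5: nucleotides between the read 5' end and the first nucleotide
## of the reference codon; dist3: same for the 3' end.
make_reads <- function(len, dist5, n) {
  data.frame(length = len, dist5 = rep(dist5, times = n))
}
reads <- rbind(
  make_reads(28, c(11, 12, 13), c(20, 100, 15)),
  make_reads(29, c(11, 12, 13), c(10, 80, 30)),
  make_reads(30, c(9, 10, 12, 13), c(5, 12, 11, 4)),
  make_reads(28, 3, 40)  # 5' end too close to the reference codon
)
reads$dist3 <- reads$length - 1 - reads$dist5

## Discard reads whose extremities are too close to the reference codon
## (assumed, simplified form of the flanking criterion).
flanking <- 6
kept <- reads[reads$dist5 >= flanking & reads$dist3 >= flanking, ]

## Step 1: temporary offsets = global maximum of each profile, per length bin.
profile_mode <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}
temp5 <- sapply(split(kept$dist5, kept$length), profile_mode)
temp3 <- sapply(split(kept$dist3, kept$length), profile_mode)

## Step 2: reference value = most frequent temporary 5' offset, with each
## bin weighted by its number of reads (assumed weighting).
bin_size <- sapply(split(kept$dist5, kept$length), length)
weight <- tapply(bin_size, temp5, sum)
reference5 <- as.integer(names(weight)[which.max(weight)])

## Corrected offset = local maximum of the profile closest to the reference.
correct_offset <- function(x, reference) {
  positions <- seq(min(x), max(x))
  counts <- as.vector(table(factor(x, levels = positions)))
  n <- length(counts)
  padded <- c(-Inf, counts, -Inf)
  is_peak <- counts > padded[seq_len(n)] & counts > padded[seq_len(n) + 2]
  peaks <- positions[is_peak]
  if (length(peaks) == 0) return(profile_mode(x))  # no strict peak: keep mode
  peaks[which.min(abs(peaks - reference))]
}
corrected5 <- sapply(split(kept$dist5, kept$length), correct_offset,
                     reference = reference5)

offsets <- data.frame(length = as.integer(names(temp5)),
                      temp_offset_5 = temp5,
                      temp_offset_3 = temp3,
                      corrected_offset_5 = corrected5)
print(offsets)
```

With these toy counts, the 30-nucleotide bin has its global maximum at a 5' distance of 10 and a second local maximum at 12. Since the two larger bins agree on 12, the correction step replaces the temporary offset of the small bin with 12.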

Value

A data table.

Examples

data(reads_list)

## Compute the P-site offset automatically selecting the optimal read
## extremity for the correction step and not plotting any metaprofile:
psite(reads_list, flanking = 6, extremity = "auto")
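
The following calls are a further sketch of how the non-default arguments listed above might be combined. They assume that riboWaltz is installed and that reads_list is the example dataset used above; the output locations are hypothetical and point to a temporary directory.

```r
## Skipped silently if riboWaltz is not installed.
if (requireNamespace("riboWaltz", quietly = TRUE)) {
  library(riboWaltz)
  data(reads_list)

  ## Use the second to last codon as reference codon and base the
  ## correction step on the 3' extremity:
  offsets_3end <- psite(reads_list, flanking = 6, start = FALSE,
                        extremity = "3end")

  ## Store the metaprofiles as PDF files for 95% of the read lengths and
  ## write the best offsets to a plain text file (hypothetical paths):
  out_dir <- file.path(tempdir(), "offset_plot")
  offsets_auto <- psite(reads_list, flanking = 6, extremity = "auto",
                        plot = TRUE, plot_dir = out_dir, plot_format = "pdf",
                        cl = 95, txt = TRUE,
                        txt_file = file.path(tempdir(), "best_offset.txt"))
}
```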

LabTranslationalArchitectomics/riboWaltz documentation built on Jan. 17, 2024, 12:18 p.m.