psite | R Documentation |
This function identifies the exact position of the ribosome P-site within
each read, determined by the localisation of its first nucleotide (see
Details). It returns a data table containing, for all samples and read
lengths: i) the percentage of reads in the whole dataset; ii) the percentage
of reads aligning on the start codon (if any); iii) the distance of the
P-site from the two extremities of the reads before and after the correction
step; iv) the name of the sample. Optionally, this function plots a
collection of read length-specific occupancy metaprofiles displaying the
P-site offsets computed through the process.
psite(
data,
flanking = 6,
start = TRUE,
extremity = "auto",
plot = FALSE,
plot_dir = NULL,
plot_format = "png",
cl = 99,
txt = FALSE,
txt_file = NULL
)
data |
Either list of data tables or GRangesList object from
|
flanking |
Integer value specifying for the selected reads the minimum number of nucleotides that must flank the reference codon in both directions. Default is 6. |
start |
Logical value indicating whether to use the translation initiation site as the reference codon. Default is TRUE. If FALSE, the second to last codon is used instead. |
extremity |
Either "5end", "3end" or "auto". It specifies if the correction step should be based on 5' extremities ("5end") or 3' extremities ("3end"). Default is "auto" i.e. the optimal extremity is automatically selected. |
plot |
Logical value indicating whether to plot the occupancy metaprofiles displaying the P-site offsets computed in both steps of the algorithm. Default is FALSE. |
plot_dir |
Character string specifying the directory where read
length-specific occupancy metaprofiles should be stored. If the specified
folder doesn't exist, it is automatically created. If NULL (the default),
the metaprofiles are stored in a new subfolder of the working directory,
called offset_plot. This parameter is considered only if |
plot_format |
Either "png" (the default) or "pdf". This parameter
specifies the file format storing the length-specific occupancy
metaprofiles. It is considered only if |
cl |
Integer value in [1,100] specifying a confidence level for
generating occupancy metaprofiles for a sub-range of read lengths, i.e.
for the cl% of read lengths associated with the highest signals. Default is
99. This parameter is considered only if |
txt |
Logical value indicating whether to write to a txt file the extremity used for the correction step and the best offset for each sample. Similar information is displayed by default in the console. Default is FALSE. |
txt_file |
Character string specifying the path, name and extension
(e.g. "PATH/NAME.extension") of the plain text file where the extremity
used for the correction step and the best offset for each sample should be
written. If the specified folder doesn't exist, it is automatically
created. If NULL (the default), the information is written to
"best_offset.txt" and saved in the working directory. This parameter
is considered only if |
The P-site offset (PO) is defined as the distance between the
extremities of a read and the first nucleotide of the P-site itself. The
function processes all samples separately starting from reads mapping on
the reference codon (either the start codon or the second to last codon,
see start) of any annotated coding sequence. Read length-specific
POs are inferred in two steps. First, reads mapping on the reference codon
are grouped according to their length, each group corresponding to a bin.
Reads whose extremities are too close to the reference codon are discarded
(see flanking). For each bin, temporary 5' and 3' POs are defined as
the distances between the first nucleotide of the reference codon and the
nucleotide corresponding to the global maximum found in the profiles of the
5' and the 3' end at the left and at the right of the reference codon,
respectively. After the identification of the P-site for all reads aligning
on the reference codon, the POs corresponding to each length are assigned
to each read of the dataset. Second, the most frequent temporary POs
associated with the optimal extremity (see extremity) and the
predominant bins are used as reference values for correcting the
temporary POs of smaller bins. Briefly, the correction step defines for
each length bin a new PO based on the local maximum, whose distance from
the reference codon is the closest to the most frequent temporary POs. For
further details please refer to the riboWaltz article.
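The two steps described above can be illustrated with a minimal base R sketch on toy data. This is not the riboWaltz implementation: the data frame `reads`, the helpers `mode_value` and `correct_po`, the exact form of the flanking filter, and the read-weighted choice of the reference PO are all assumptions made for illustration, and only the 5' extremity is used for the correction.

```r
# Toy reads overlapping the reference codon (hypothetical values).
# dist5 / dist3: position of the read's 5' / 3' end relative to the first
# nucleotide of the reference codon (negative = upstream).
reads <- data.frame(
  length = c(rep(28, 6), rep(29, 5), rep(30, 4)),
  dist5  = c(-12, -12, -12, -12, -11, -3,   # length 28
             -12, -12, -12, -13, -9,        # length 29
             -9, -9, -12, -13)              # length 30
)
reads$dist3 <- reads$dist5 + reads$length - 1

# Flanking filter (assumed form): keep reads with at least `flanking`
# nucleotides on each side of the codon, which spans positions 0 to 2.
flanking <- 6
kept <- reads[reads$dist5 <= -flanking & reads$dist3 >= 2 + flanking, ]

# Step 1: per length bin, the temporary PO is the position of the global
# maximum of the 5' (or 3') end profile.
mode_value <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}
temp5 <- tapply(-kept$dist5, kept$length, mode_value)
temp3 <- tapply(kept$dist3, kept$length, mode_value)

# Step 2: the reference PO is the most frequent temporary 5' PO, weighting
# each bin by its number of reads so that predominant bins dominate.
bin_size <- as.vector(table(kept$length))
weight <- tapply(bin_size, as.vector(temp5), sum)
reference_po <- as.integer(names(weight)[which.max(weight)])

# Correction: per bin, pick the local maximum of the 5' end profile that
# lies closest to the reference PO.
correct_po <- function(offsets, reference_po) {
  positions <- seq(min(offsets), max(offsets))
  counts <- tabulate(offsets - min(offsets) + 1, nbins = length(positions))
  left  <- c(-Inf, head(counts, -1))
  right <- c(tail(counts, -1), -Inf)
  peaks <- positions[counts > 0 & counts >= left & counts >= right]
  peaks[which.min(abs(peaks - reference_po))]
}
corrected5 <- tapply(-kept$dist5, kept$length, correct_po,
                     reference_po = reference_po)

print(rbind(temporary = temp5, corrected = corrected5))
```

In this toy dataset the 30-nucleotide bin has its global maximum 9 nucleotides from the 5' end, while the two larger bins agree on 12. The correction step moves the 30-nucleotide bin to its local maximum at 12.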
A data table.
data(reads_list)
## Compute the P-site offset automatically selecting the optimal read
## extremity for the correction step and not plotting any metaprofile:
psite(reads_list, flanking = 6, extremity="auto")
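A complementary call that also writes the metaprofiles and the best offsets to disk might look like the sketch below. It assumes that the riboWaltz package is installed and that its psite() matches the signature shown in Usage; the output locations under tempdir() are hypothetical and chosen only for illustration.

```r
# Hypothetical output locations; any writable paths would do.
out_dir <- file.path(tempdir(), "offset_plot")
out_txt <- file.path(tempdir(), "best_offset.txt")

# The call is skipped when riboWaltz is not available.
if (requireNamespace("riboWaltz", quietly = TRUE)) {
  data("reads_list", package = "riboWaltz")
  offsets <- riboWaltz::psite(
    reads_list,
    flanking = 6,
    extremity = "auto",
    plot = TRUE,           # store read length-specific metaprofiles
    plot_dir = out_dir,
    plot_format = "pdf",
    cl = 99,
    txt = TRUE,            # write the chosen extremity and best offsets
    txt_file = out_txt
  )
  head(offsets)
}
```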