psite_info | R Documentation |
This function provides additional read information according to the position
of the P-site identified by psite. It attaches to each data
table in a list four columns reporting i) the P-site position with respect to
the 1st nucleotide of the transcript, ii) the P-site position with respect to
the start and the stop codon of the annotated coding sequence (if any) and
iii) the region of the transcript (5' UTR, CDS, 3' UTR) that includes the
P-site. Please note: 1) for transcripts not associated with any annotated CDS
the P-site position with respect to the start and the stop codon is
set to NA; 2) P-sites of short reads (<20 nts) might be located very close to
the 5' or 3' extremity, with no biological meaning and causing potential
downstream issues; for these reasons, all read lengths showing this feature
will be removed. Optionally, additional columns reporting the three
nucleotides covered by the P-site, the A-site and the E-site are attached,
based on FASTA files or BSgenome data packages containing the transcript
nucleotide sequences.
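The four columns described above can be illustrated with a minimal base R sketch. It is not riboWaltz's implementation: the toy data frame, the column names (end5, cds_start, cds_stop and the psite_* columns), the coordinate conventions and the fixed offset of 12 nucleotides are all assumptions made for illustration only.

```r
# Hypothetical toy input: one row per read, in transcript coordinates.
# Column names and conventions are assumptions for this sketch.
reads <- data.frame(
  transcript = c("tx1", "tx1", "tx1", "tx2"),
  end5       = c(5L, 40L, 95L, 10L),   # 5' end of the read
  cds_start  = c(30L, 30L, 30L, NA),   # start codon position (NA: no annotated CDS)
  cds_stop   = c(90L, 90L, 90L, NA)    # stop codon position (NA: no annotated CDS)
)
offset_from_5 <- 12L  # assumed P-site offset from the read 5' end

# i) P-site position with respect to the 1st nucleotide of the transcript
reads$psite <- reads$end5 + offset_from_5

# ii) P-site position with respect to the start and the stop codon;
#     NA propagates for transcripts without an annotated CDS
reads$psite_from_start <- reads$psite - reads$cds_start
reads$psite_from_stop  <- reads$psite - reads$cds_stop

# iii) transcript region that includes the P-site
reads$psite_region <- ifelse(
  reads$psite_from_start < 0, "5utr",
  ifelse(reads$psite_from_stop > 0, "3utr", "cds")
)

print(reads)
```

In this sketch the read on tx2 has no annotated CDS, so its start-relative and stop-relative positions and its region are all NA, mirroring note 1) above.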
psite_info(
data,
offset,
site = NULL,
fastapath = NULL,
fasta_genome = TRUE,
refseq_sep = NULL,
bsgenome = NULL,
gtfpath = NULL,
txdb = NULL,
dataSource = NA,
organism = NA,
output_class = "datatable"
)
data |
Either list of data tables or GRangesList object from
|
offset |
Data table from |
site |
Either "psite", "asite", "esite" or a combination of these
strings. It specifies if additional column(s) reporting the three
nucleotides covered by the ribosome P-site ("psite"), A-site ("asite") and
E-site ("esite") should be added. Note: either |
fastapath |
Character string specifying the FASTA file used in
the alignment step, including its path, name and extension. This file can
contain reference nucleotide sequences either of genome assemblies
(chromosome sequences) or of transcripts (see |
fasta_genome |
Logical value whether the FASTA file specified by
|
refseq_sep |
Character specifying the separator between reference
sequences' name and additional information to discard, stored in the
headers of the FASTA file specified by |
bsgenome |
Character string specifying the BSgenome data package with
the genome sequences to be loaded. If not already present in the system, it
is automatically installed through the biocLite.R script (check the list of
available BSgenome data packages by running the
|
gtfpath |
Character string specifying the location of a GTF file,
including its path, name and extension. Please make sure the GTF file and
the sequences specified by |
txdb |
Character string specifying the TxDb annotation package to be
loaded. If not already present in the system, it is automatically installed
through the biocLite.R script (check the list of available TxDb annotation
packages). Please make sure the TxDb
annotation package and the sequences specified by |
dataSource |
Optional character string describing the origin of the GTF
data file. This parameter is considered only if |
organism |
Optional character string reporting the genus and species of
the organism of the GTF data file. This parameter is considered only if
|
output_class |
Either "datatable" or "granges". It specifies the format of the output i.e. a list of data tables or a GRangesList object. Default is "datatable". |
riboWaltz only works for read alignments based on transcript coordinates. This choice reflects the main purpose of RiboSeq assays, which is to study translational events through the isolation and sequencing of ribosome protected fragments. Most reads from RiboSeq are expected to map to mRNAs and not to introns or intergenic regions. BAM files based on transcript coordinates can be generated in two ways: i) aligning directly against transcript sequences; ii) aligning against sequences of genome assemblies, i.e. standard chromosome sequences, thus requiring the outputs to be translated into transcript coordinates. The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner, with its option --quantMode TranscriptomeSAM (see Chapter 6 of its manual), is an example of a tool providing such a feature.
A list of data tables or a GRangesList object.
data(reads_list)
data(psite_offset)
data(mm81cdna)
reads_psite_list <- psite_info(reads_list, psite_offset)