psite_info: Update reads information according to the inferred P-sites.
In LabTranslationalArchitectomics/riboWaltz: Optimization of ribosome P-site positioning in ribosome profiling data

psite_info

R Documentation

Update reads information according to the inferred P-sites.

Description

This function provides additional read information according to the position of the P-site identified by psite. To each data table in a list it attaches four columns reporting i) the P-site position with respect to the 1st nucleotide of the transcript, ii) the P-site position with respect to the start and the stop codon of the annotated coding sequence (if any) and iii) the region of the transcript (5' UTR, CDS, 3' UTR) that includes the P-site. Please note: 1) for transcripts not associated with any annotated CDS, the P-site position with respect to the start and the stop codon is set to NA; 2) P-sites of short reads (<20 nts) might be located very close to the 5' or 3' extremity, which has no biological meaning and may cause downstream issues; for these reasons, all read lengths showing this feature are removed. Optionally, additional columns reporting the three nucleotides covered by the P-site, the A-site and the E-site are attached, based on FASTA files or BSgenome data packages containing the transcript nucleotide sequences.
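The four columns described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the column names, the toy coordinates, the 12-nt offset and the choice to leave the region as NA for transcripts without a CDS are all assumptions made for illustration.

```r
# Toy reads: all column names and values are illustrative assumptions.
reads <- data.frame(
  transcript = c("tx1", "tx1", "tx1", "tx2"),
  end5       = c(5, 40, 95, 10),   # 5' end of each read (1-based transcript coordinate)
  cds_start  = c(31, 31, 31, NA),  # first nucleotide of the start codon (NA: no annotated CDS)
  cds_stop   = c(90, 90, 90, NA)   # last nucleotide of the stop codon (NA: no annotated CDS)
)
offset_from_5 <- 12                # hypothetical P-site offset from the 5' end

# i) P-site position with respect to the 1st nucleotide of the transcript
reads$psite <- reads$end5 + offset_from_5

# ii) P-site position with respect to the start and the stop codon (NA without a CDS)
reads$psite_from_start <- reads$psite - reads$cds_start
reads$psite_from_stop  <- reads$psite - reads$cds_stop

# iii) region of the transcript that includes the P-site
reads$psite_region <- ifelse(
  is.na(reads$cds_start), NA_character_,
  ifelse(reads$psite_from_start < 0, "5utr",
         ifelse(reads$psite_from_stop > 0, "3utr", "cds"))
)

print(reads)
```

In this toy table the three reads on tx1 fall in the 5' UTR, the CDS and the 3' UTR respectively, while the read on tx2 gets NA in the CDS-relative columns.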

Usage

psite_info(
  data,
  offset,
  site = NULL,
  fastapath = NULL,
  fasta_genome = TRUE,
  refseq_sep = NULL,
  bsgenome = NULL,
  gtfpath = NULL,
  txdb = NULL,
  dataSource = NA,
  organism = NA,
  output_class = "datatable"
)

Arguments

`data`	Either a list of data tables or a GRangesList object from `bamtolist`, `bedtolist`, `duplicates_filter` or `length_filter`.
`offset`	Data table from `psite`.
`site`	Either "psite", "asite", "esite" or a combination of these strings. It specifies whether additional column(s) reporting the three nucleotides covered by the ribosome P-site ("psite"), A-site ("asite") and E-site ("esite") should be added. Note: either `fastapath` or `bsgenome` is required for this purpose. Default is NULL.
`fastapath`	Character string specifying the FASTA file used in the alignment step, including its path, name and extension. This file can contain reference nucleotide sequences either of genome assemblies (chromosome sequences) or of transcripts (see `Details` and `fasta_genome`). Please make sure the sequences derive from the same release of the annotation file used in the `create_annotation` function. Note: either `fastapath` or `bsgenome` is required to generate additional column(s) specified by `site`. Default is NULL.
`fasta_genome`	Logical value indicating whether the FASTA file specified by `fastapath` contains nucleotide sequences of genome assemblies (chromosome sequences). If TRUE (the default), an annotation object is required (see `gtfpath` and `txdb`). FALSE implies that nucleotide sequences of transcripts are provided instead.
`refseq_sep`	Character specifying the separator between reference sequences' name and additional information to discard, stored in the headers of the FASTA file specified by `fastapath` (if any). It might be required for matching the reference sequences' identifiers reported in the input list of data tables. All characters before the first occurrence of the specified separator are kept. Default is NULL i.e. no string splitting is performed.
`bsgenome`	Character string specifying the BSgenome data package with the genome sequences to be loaded. If not already present in the system, it is automatically installed through the biocLite.R script (check the list of available BSgenome data packages by running the `available.genomes` function of the BSgenome package). This parameter must be coupled with an annotation object (see `gtfpath` and `txdb`). Please make sure the sequences included in the specified BSgenome data package are in agreement with the sequences used in the alignment step. Note: either `fastapath` or `bsgenome` is required to generate additional column(s) specified by `site`. Default is NULL.
`gtfpath`	Character string specifying the location of a GTF file, including its path, name and extension. Please make sure the GTF file and the sequences specified by `fastapath` or `bsgenome` derive from the same release. Note that either `gtfpath` or `txdb` is required if and only if nucleotide sequences of genome assemblies (chromosome sequences) are provided (see `fastapath` or `bsgenome`). Default is NULL.
`txdb`	Character string specifying the TxDb annotation package to be loaded. If not already present in the system, it is automatically installed through the biocLite.R script (check the list of available TxDb annotation packages). Please make sure the TxDb annotation package and the sequences specified by `fastapath` or `bsgenome` derive from the same release. Note that either `gtfpath` or `txdb` is required if and only if nucleotide sequences of genome assemblies (chromosome sequences) are provided (see `fastapath` or `bsgenome`). Default is NULL.
`dataSource`	Optional character string describing the origin of the GTF data file. This parameter is considered only if `gtfpath` is specified. For more information about this parameter please refer to the description of dataSource of the `makeTxDbFromGFF` function included in the `GenomicFeatures` package.
`organism`	Optional character string reporting the genus and species of the organism of the GTF data file. This parameter is considered only if `gtfpath` is specified. For more information about this parameter please refer to the description of organism of the `makeTxDbFromGFF` function included in the `GenomicFeatures` package.
`output_class`	Either "datatable" or "granges". It specifies the format of the output i.e. a list of data tables or a GRangesList object. Default is "datatable".
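
The effect of `refseq_sep` and `site` can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: the headers, the sequences, the "|" separator, the helper names `trim_header` and `site_codons`, and the convention that the P-site position marks the first nucleotide of the P-site codon are all hypothetical.

```r
# Toy FASTA headers and sequences; every name and value is an illustrative assumption.
headers   <- c("tx1|gene=A|biotype=protein_coding", "tx2|gene=B|biotype=lncRNA")
sequences <- c("GGGAAAATGCCCTTTTAGGGG", "ACGTACGTACGT")

# refseq_sep-like behaviour: keep all characters before the first occurrence
# of the separator, so that names match the transcript identifiers of the reads.
trim_header <- function(x, sep) {
  vapply(strsplit(x, sep, fixed = TRUE), function(parts) parts[1], character(1))
}
names(sequences) <- trim_header(headers, "|")

# site-like behaviour: the three nucleotides covered by the E-, P- and A-site,
# assuming `psite` is the position of the first nucleotide of the P-site.
site_codons <- function(seq, psite) {
  c(esite = substr(seq, psite - 3, psite - 1),  # codon immediately upstream
    psite = substr(seq, psite, psite + 2),
    asite = substr(seq, psite + 3, psite + 5))  # codon immediately downstream
}

codons <- site_codons(sequences[["tx1"]], 10)
print(names(sequences))
print(codons)
```

With the P-site at position 10 of the toy transcript tx1, the E-site covers the ATG at positions 7-9 and the A-site covers the codon that follows the P-site.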

Details

riboWaltz only works with read alignments based on transcript coordinates. This choice reflects the main purpose of RiboSeq assays: studying translational events through the isolation and sequencing of ribosome protected fragments. Most RiboSeq reads are expected to map to mRNAs rather than to introns or intergenic regions. BAM files based on transcript coordinates can be generated in two ways: i) aligning directly against transcript sequences; ii) aligning against sequences of genome assemblies, i.e. standard chromosome sequences, and then translating the output into transcript coordinates. The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner, with its option --quantMode TranscriptomeSAM (see Chapter 6 of its manual), is an example of a tool providing such a feature.
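
To make the second option more concrete, the following base R sketch shows what translating a genomic position into a transcript coordinate means for a single transcript. It is a simplified illustration, not how STAR or riboWaltz perform the conversion: the exon coordinates and the helper `genome_to_transcript` are hypothetical, and only a plus-strand transcript is handled.

```r
# Hypothetical plus-strand transcript with three exons
# (1-based, inclusive genomic coordinates).
exons <- data.frame(start = c(1001, 2001, 3001),
                    end   = c(1100, 2050, 3200))

genome_to_transcript <- function(pos, exons) {
  exon_len <- exons$end - exons$start + 1
  # Number of transcript nucleotides located upstream of each exon
  upstream <- c(0, cumsum(exon_len))[seq_along(exon_len)]
  hit <- which(pos >= exons$start & pos <= exons$end)
  if (length(hit) == 0) return(NA_integer_)  # intronic or intergenic position
  as.integer(upstream[hit] + pos - exons$start[hit] + 1)
}

print(genome_to_transcript(1001, exons))  # first nucleotide of exon 1
print(genome_to_transcript(2001, exons))  # first nucleotide of exon 2
print(genome_to_transcript(1500, exons))  # intronic position
```

Positions falling in introns have no transcript coordinate, which is consistent with the expectation that most RiboSeq reads map to mRNAs.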

Value

A list of data tables or a GRangesList object.

Examples

data(reads_list)
data(psite_offset)
data(mm81cdna)

reads_psite_list <- psite_info(reads_list, psite_offset)

LabTranslationalArchitectomics/riboWaltz documentation built on Feb. 25, 2025, 10:17 p.m.
