bedtolist: From BED files to lists of data tables or GRangesList...

bedtolist R Documentation

From BED files to lists of data tables or GRangesList objects.

Description

This function reads one or multiple BED files, as generated by bamtobed, and converts them into data tables or GRanges objects, arranged in a list or a GRangesList, respectively. In both cases, two columns are attached to the original data, containing for each read the leftmost and rightmost positions of the annotated CDS of its reference sequence (if any), counted from the 1st nucleotide of that sequence. Please note: for transcripts without an annotated CDS, the start and stop codon positions are set to 0.
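As a rough illustration of the two attached columns, the base R sketch below derives them from a toy annotation table. The column names (transcript, l_utr5, l_cds, end5, end3, cds_start, cds_stop) and the arithmetic are assumptions made for this example; they are not taken from the function's source.

```r
# Toy annotation: one row per transcript with 5' UTR and CDS lengths.
# Column names are assumed for this sketch.
annotation <- data.frame(
  transcript = c("tr_A", "tr_B"),
  l_utr5     = c(100, 0),
  l_cds      = c(300, 0),   # tr_B has no annotated CDS
  stringsAsFactors = FALSE
)

# Toy reads: transcript name plus 5' and 3' end positions.
reads <- data.frame(
  transcript = c("tr_A", "tr_A", "tr_B"),
  end5       = c(95, 210, 12),
  end3       = c(123, 238, 40),
  stringsAsFactors = FALSE
)

# Look up each read's transcript in the annotation.
idx <- match(reads$transcript, annotation$transcript)
has_cds <- annotation$l_cds[idx] > 0

# Leftmost and rightmost CDS positions, counted from the transcript's
# 1st nucleotide; 0 for transcripts without an annotated CDS.
reads$cds_start <- ifelse(has_cds, annotation$l_utr5[idx] + 1, 0)
reads$cds_stop  <- ifelse(has_cds,
                          annotation$l_utr5[idx] + annotation$l_cds[idx], 0)

print(reads)
```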

Usage

bedtolist(
  bedfolder,
  annotation,
  transcript_align = TRUE,
  name_samples = NULL,
  refseq_sep = FALSE,
  output_class = "datatable"
)

Arguments

bedfolder

Character string specifying the path to the folder storing BED files as generated by bamtobed.

annotation

Data table as generated by create_annotation. Please make sure the name of reference transcripts in the annotation data table match those in the BED files (see also refseq_sep).

transcript_align

Logical value indicating whether the BED files in bedfolder come from a transcriptome alignment (intended as an alignment against reference transcript sequences, see Details). If TRUE (the default), reads mapping on the negative strand should not be present and, if any are found, they are automatically removed.

name_samples

Named character vector specifying the desired names for the output list elements. One character string is required for each BED file in bedfolder. Please be careful to name each element of the vector after the corresponding BED file in bedfolder, leaving its path and extension out. No specific order is required. Default is NULL i.e. list elements are named after the BED files, leaving their path and extension out.

refseq_sep

Character specifying the separator between the reference sequences' names and additional information to discard, stored in the same field (see Details). All characters before the first occurrence of the specified separator are kept. Default is FALSE i.e. no string splitting is performed.

output_class

Either "datatable" or "granges". It specifies the format of the output i.e. a list of data tables or a GRangesList object. Default is "datatable".
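For transcript_align = TRUE, the removal of negative-strand reads described above can be pictured with the following base R sketch. The table and its column names are invented for the example (only the use of a strand field follows the standard BED layout), and the code is not the function's actual implementation.

```r
# Toy BED-like table; the sixth standard BED column holds the strand.
# Column names are chosen for this sketch only.
bed <- data.frame(
  transcript = c("tr_A", "tr_A", "tr_B"),
  start      = c(94, 209, 11),
  end        = c(123, 238, 40),
  strand     = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# With a transcriptome alignment, reads on the negative strand are
# unexpected, so they are dropped.
n_before <- nrow(bed)
bed_plus <- bed[bed$strand == "+", ]
message(n_before - nrow(bed_plus), " read(s) on the negative strand removed")
```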

Details

riboWaltz only works for read alignments based on transcript coordinates. This choice is due to the main purpose of RiboSeq assays, which is to study translational events through the isolation and sequencing of ribosome protected fragments. Most reads from RiboSeq are supposed to map on mRNAs and not on introns and intergenic regions. BAM files based on transcript coordinates can be generated in two ways: i) aligning directly against transcript sequences; ii) aligning against standard chromosome sequences, which requires the outputs to be translated into transcript coordinates. The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner, with its option --quantMode TranscriptomeSAM (see Chapter 6 of its manual), is an example of a tool providing such a feature.

refseq_sep is intended to shorten the identifiers of the reference sequences included in the final data table or to modify them so that they match those in the annotation table. Many details about the reference sequences, such as their version (usually dot-separated), their length, name variants and associated gene/transcript/protein names (usually pipe-separated), might indeed be stored in the FASTA file used for the alignment and automatically transferred to the BAM file.
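The effect of keeping only the characters before the first occurrence of a separator can be sketched in base R as follows. The identifiers are made up, and keep_before_sep is a hypothetical helper written for this illustration; it is not part of riboWaltz and may differ from how the function performs the split internally.

```r
# Made-up reference sequence names as they might appear in a BED file:
# transcript ID with version, then pipe-separated gene ID and gene name.
ids <- c(
  "TR0001.2|GENE0001.5|abc1",
  "TR0002.1|GENE0002.3|xyz2"
)

# Keep everything before the first occurrence of a literal separator.
# keep_before_sep is a helper written for this sketch only.
keep_before_sep <- function(x, sep) {
  vapply(strsplit(x, sep, fixed = TRUE), function(p) p[1], character(1))
}

by_pipe <- keep_before_sep(ids, "|")  # drops the pipe-separated fields
by_dot  <- keep_before_sep(ids, ".")  # also drops the dot-separated version

print(by_pipe)
print(by_dot)
```

Which separator to choose depends on how transcripts are named in the annotation table: the pipe keeps the versioned identifier, while the dot keeps only the bare identifier.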

Value

A list of data tables or a GRangesList object.

Examples

## path_bed <- "path/to/BED/files"
## bedtolist(bedfolder = path_bed, annotation = mm81cdna)
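
A name_samples vector can be assembled from the BED file names, as in the sketch below. The file names and sample labels are invented for the example, and the final call is left commented out, like the example above, because it needs real BED files and an annotation table.

```r
# Hypothetical BED file paths, as list.files(path_bed, full.names = TRUE)
# might return them.
bed_files <- c("path/to/BED/files/ctrl_rep1.bed",
               "path/to/BED/files/treat_rep1.bed")

# File names without path and extension: these become the vector names.
file_ids <- tools::file_path_sans_ext(basename(bed_files))

# Desired names for the output list elements (made up for this sketch).
name_samples <- c("Control", "Treated")
names(name_samples) <- file_ids

print(name_samples)

## bedtolist(bedfolder = path_bed, annotation = mm81cdna,
##           name_samples = name_samples)
```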

LabTranslationalArchitectomics/riboWaltz documentation built on Jan. 17, 2024, 12:18 p.m.