bedtolist | R Documentation |
This function reads one or multiple BED files, as generated by bamtobed,
converting them into data tables or GRanges objects,
arranged in a list or a GRangesList, respectively. In both cases two columns
are attached to the original data containing, for each read, the leftmost
and rightmost position of the annotated CDS of the reference sequence (if
any) with respect to its 1st nucleotide. Please note: start and stop codon
positions for transcripts without annotated CDS are set to 0.
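The two attached columns can be illustrated with a minimal base R sketch. It is not the package's implementation: the column names (`l_utr5`, `l_cds`, `l_utr3`, `end5`, `end3`, `cds_start`, `cds_stop`) and the toy values are assumptions made for illustration, and the CDS boundaries are taken to be the first nucleotide of the start codon and the last nucleotide of the CDS, counted from the first nucleotide of the transcript.

```r
# Hypothetical annotation: one row per transcript, region lengths in nucleotides.
annotation <- data.frame(
  transcript = c("tr1", "tr2"),
  l_utr5 = c(50, 0),
  l_cds = c(300, 0),    # tr2 has no annotated CDS
  l_utr3 = c(120, 200)
)

# Hypothetical reads, with 1-based read ends on transcript coordinates.
reads <- data.frame(
  transcript = c("tr1", "tr1", "tr2"),
  end5 = c(40, 75, 10),
  end3 = c(68, 103, 39)
)

# Leftmost and rightmost CDS positions relative to the 1st nucleotide of the
# transcript; both are set to 0 when no CDS is annotated.
has_cds <- annotation$l_cds > 0
annotation$cds_start <- ifelse(has_cds, annotation$l_utr5 + 1, 0)
annotation$cds_stop <- ifelse(has_cds, annotation$l_utr5 + annotation$l_cds, 0)

# Attach the two columns to every read of the corresponding transcript.
idx <- match(reads$transcript, annotation$transcript)
reads$cds_start <- annotation$cds_start[idx]
reads$cds_stop <- annotation$cds_stop[idx]

print(reads)
```

In this toy input, reads on `tr1` receive the same pair of CDS boundaries, while the read on `tr2` receives 0 in both columns.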
bedtolist(
bedfolder,
annotation,
transcript_align = TRUE,
name_samples = NULL,
refseq_sep = FALSE,
output_class = "datatable"
)
bedfolder |
Character string specifying the path to the folder storing
BED files as generated by bamtobed. |
annotation |
Data table as generated by |
transcript_align |
Logical value whether BED files in |
name_samples |
Named character string vector specifying the desired name
for the output list elements. A character string for each BED file in
|
refseq_sep |
Character specifying the separator between reference
sequences' name and additional information to discard, stored in the same
field (see |
output_class |
Either "datatable" or "granges". It specifies the format of the output, i.e. a list of data tables or a GRangesList object. Default is "datatable". |
riboWaltz only works for read alignments based on transcript coordinates. This choice is due to the main purpose of RiboSeq assays, which is to study translational events through the isolation and sequencing of ribosome protected fragments. Most reads from RiboSeq are supposed to map on mRNAs and not on introns or intergenic regions.
BAM files based on transcript coordinates can be generated in two ways: i) aligning directly against transcript sequences; ii) aligning against standard chromosome sequences, which requires the outputs to be translated into transcript coordinates. The first option can be easily handled by many aligners (e.g. Bowtie), given a reference FASTA file where each sequence represents a transcript, from the beginning of the 5' UTR to the end of the 3' UTR. The second procedure is based on reference FASTA files where each sequence represents a chromosome, usually coupled with comprehensive gene annotation files (GTF or GFF). The STAR aligner, with its option --quantMode TranscriptomeSAM (see Chapter 6 of its manual), is an example of a tool providing such a feature.
refseq_sep is intended to lighten the identifiers of the reference
sequences included in the final data table or to modify them to match those
in the annotation table. Many details about the reference sequences, such as
their version (usually dot-separated), their length, name variants and
associated gene/transcript/protein names (usually pipe-separated), might
indeed be stored in the FASTA file used for the alignment and automatically
transferred to the BAM file.
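The effect of such a separator can be sketched in base R. The helper `trim_refseq` and the example identifiers below are hypothetical and only illustrate the idea of discarding everything after the first occurrence of the separator; they do not reproduce the internals of bedtolist.

```r
# Hypothetical helper: keep only the text preceding the first occurrence of
# `sep`. fixed = TRUE treats "." and "|" as literal characters rather than
# regular expression metacharacters.
trim_refseq <- function(ids, sep) {
  vapply(
    strsplit(ids, sep, fixed = TRUE),
    function(parts) parts[1],
    character(1)
  )
}

# Made-up identifiers carrying a version and pipe-separated extra fields.
ids <- c(
  "ENSMUST00000000001.4",
  "ENSMUST00000000003.13|ENSMUSG00000000003.15|Pbsn"
)

no_version <- trim_refseq(ids, ".")  # drops the version and what follows
no_extra <- trim_refseq(ids, "|")    # drops only the pipe-separated fields

print(no_version)
print(no_extra)
```

Whichever separator is used, the trimmed identifiers should match the transcript names stored in the annotation table.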
A list of data tables or a GRangesList object.
## path_bed <- "path/to/BED/files"
## bedtolist(bedfolder = path_bed, annotation = mm81cdna)