CDS: Generate a CDS coordinates table


View source: R/CDS-coordinates.R

Description

CDS takes transcript annotation tables in UCSC format and reshapes them so that the coordinates of each exon sit on their own row, rather than being collapsed into a comma-separated string in a single cell.
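The reshaping described above can be sketched in Python. This is an illustration only, not the package's R implementation. `explode_exons` is a hypothetical helper. It assumes each transcript is a dict whose `exon_start` and `exon_end` values are comma-separated strings, possibly ending in a trailing comma as in UCSC tables.

```python
def explode_exons(row):
    """Return one dict per exon from a row with comma-separated exon coordinates."""
    # UCSC tables typically end these lists with a trailing comma, so skip empty fields.
    starts = [int(s) for s in row["exon_start"].split(",") if s]
    ends = [int(e) for e in row["exon_end"].split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("exon_start and exon_end list different numbers of exons")
    # Every non-exon column (tx, chr, strand, ...) is copied onto each exon row.
    shared = {k: v for k, v in row.items() if k not in ("exon_start", "exon_end")}
    return [dict(shared, exon_start=s, exon_end=e) for s, e in zip(starts, ends)]


# Hypothetical transcript with three exons.
row = {"tx": "tx1", "chr": "chr1", "strand": "+",
       "exon_start": "100,200,300,", "exon_end": "150,250,350,"}
for exon in explode_exons(row):
    print(exon["tx"], exon["chr"], exon["exon_start"], exon["exon_end"])
```

Each printed line corresponds to one exon row, with the transcript-level fields repeated on every row.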

Usage


Arguments

tx

A URL to a genome's transcript reference file. This table must have tab-separated fields and contain identifiers for each transcript, along with chromosome, strand, CDS start/end, and exon start/end information. There should be only one row per transcript, and the exon start/end columns should contain comma-separated coordinates for each exon in the transcript. An example file can be found here.

gene

A URL to a tab-separated file that maps transcript identifiers to common gene names. An example file can be found here.

tx_cols

A character vector of expected column names for the known-gene reference file. Required columns: "tx", "chr", "strand", "cds_start", "cds_end", "exon_start" and "exon_end". All other columns will be ignored.

gene_cols

A character vector of expected column names for the cross-reference file. Required columns: "tx" and "gene". All other columns will be ignored.

shift_start

Number of bases to shift the start positions. Defaults to 1, which is necessary for compatibility with Biostrings::getSeq: getSeq begins counting bases at 1 and includes the start position in the returned sequence, whereas UCSC tables record start positions counting from 0.

shift_end

Number of bases to shift the end positions. Defaults to 0.
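The column-name arguments tx_cols and gene_cols can be illustrated with a Python sketch, which is not iSTOP's implementation. It makes three assumptions:

- Both files are headerless and tab-separated.
- Names in tx_cols and gene_cols are assigned to columns by position.
- Transcripts with no matching gene name are dropped when the two tables are joined.

The column layout, the data and the helpers `read_named_table` and `attach_genes` are all hypothetical.

```python
import csv
import io

# Required column names, as listed in this documentation.
REQUIRED_TX_COLS = ["tx", "chr", "strand", "cds_start", "cds_end", "exon_start", "exon_end"]
REQUIRED_GENE_COLS = ["tx", "gene"]


def read_named_table(handle, col_names, required):
    """Name the columns of a headerless tab-separated table and keep only `required`."""
    missing = [c for c in required if c not in col_names]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        record = dict(zip(col_names, fields))
        rows.append({c: record[c] for c in required})  # all other columns are ignored
    return rows


def attach_genes(tx_rows, gene_rows):
    """Add a 'gene' field to each transcript row; unmatched transcripts are dropped (assumption)."""
    lookup = {r["tx"]: r["gene"] for r in gene_rows}
    return [dict(r, gene=lookup[r["tx"]]) for r in tx_rows if r["tx"] in lookup]


# Hypothetical column layouts and data, for illustration only.
tx_cols = ["tx", "chr", "strand", "tx_start", "tx_end",
           "cds_start", "cds_end", "exon_count", "exon_start", "exon_end"]
gene_cols = ["tx", "gene", "description"]
tx_text = ("tx1\tchr1\t+\t100\t350\t110\t340\t3\t100,200,300,\t150,250,350,\n"
           "tx2\tchr2\t-\t500\t900\t550\t850\t1\t500,\t900,\n")
gene_text = "tx1\tGENEA\tfirst example gene\n"

tx_rows = read_named_table(io.StringIO(tx_text), tx_cols, REQUIRED_TX_COLS)
gene_rows = read_named_table(io.StringIO(gene_text), gene_cols, REQUIRED_GENE_COLS)
annotated = attach_genes(tx_rows, gene_rows)
print(annotated)
```

Checking for the required names up front gives a clear error when a custom column vector omits one, instead of a failure partway through parsing.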
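The effect of the shift_start = 1 and shift_end = 0 defaults can be shown with a small Python sketch. A UCSC-style interval has a start counted from 0 and an end that is not included. Adding 1 to the start and nothing to the end therefore gives the 1-based, end-inclusive range that Biostrings::getSeq works with. `shift_coords` and `extract_1based` are hypothetical helpers. `extract_1based` only mimics getSeq's inclusive, 1-based counting on a plain string.

```python
def shift_coords(start, end, shift_start=1, shift_end=0):
    """Shift interval endpoints; the defaults mirror CDS's shift_start/shift_end."""
    return start + shift_start, end + shift_end


def extract_1based(seq, start, end):
    """Return bases start..end, counting from 1 and including both ends."""
    return seq[start - 1:end]


chrom = "ACGTACGTAC"          # toy chromosome sequence
ucsc_start, ucsc_end = 2, 5   # 0-based start, exclusive end: the 3rd to 5th bases
start, end = shift_coords(ucsc_start, ucsc_end)
print(start, end, extract_1based(chrom, start, end))  # 3 5 GTA
```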

Details

The output of CDS should meet the following standards: (1) each row represents the coordinates of a single exon; (2) exons are numbered in order with respect to the transcript's strand, so the first exon is the one containing the start codon (the absolute numbers are unimportant as long as the order is correct); (3) the coordinates of the first exon begin at the start codon, and those of the last exon end at the stop codon.
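Standards (2) and (3) can be sketched in Python. This is a hypothetical helper, `cds_exons`, not part of iSTOP. It assumes half-open [start, end) coordinates as in UCSC tables, and it assumes that exons are clipped to the CDS bounds, with wholly non-coding exons dropped.

```python
def cds_exons(exons, cds_start, cds_end, strand):
    """Clip exons to the CDS and number them 5' to 3' with respect to the strand.

    exons: (start, end) pairs in genomic (+ strand) orientation, treated as
    half-open [start, end) intervals as in UCSC tables (an assumption here).
    Returns (exon_number, start, end) tuples.
    """
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:  # keep only exons that overlap the coding region
            clipped.append((s, e))
    # On the minus strand the start codon is at the highest coordinate,
    # so exons are numbered from right to left.
    clipped.sort(reverse=(strand == "-"))
    return [(i + 1, s, e) for i, (s, e) in enumerate(clipped)]


exons = [(10, 50), (100, 150), (200, 250), (300, 350)]  # first exon is non-coding
print(cds_exons(exons, 120, 320, "+"))
print(cds_exons(exons, 120, 320, "-"))
```

On the plus strand, exon 1 starts at cds_start. On the minus strand, exon 1 ends at cds_end, where the start codon lies for that strand.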

To save the trouble of looking up URLs, pre-defined CDS builders are provided. They are named CDS_<Species>_<data-source>_<genome-assembly-ID>().

Value

A data.frame in which each row represents a single exon, with the following columns:


CicciaLab/iSTOP documentation built on May 9, 2021, 4:55 p.m.