View source: R/CDS-coordinates.R
CDS takes transcript annotation tables in UCSC format and reshapes them so that
the coordinates for each exon are represented on a single row, rather than
collapsed into a comma-separated string in a single cell.
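The reshaping described above can be illustrated with a minimal base R sketch. This is not the package's implementation; the table `tx_tbl`, its values, and the helper `split_coords` are hypothetical and only mimic the comma-separated exon columns of a UCSC-style table.

```r
# Hypothetical two-transcript table with comma-separated exon coordinates
# (UCSC-style strings usually carry a trailing comma).
tx_tbl <- data.frame(
  tx         = c("txA", "txB"),
  chr        = c("chr1", "chr2"),
  strand     = c("+", "-"),
  exon_start = c("100,300,500,", "1000,2000,"),
  exon_end   = c("200,400,600,", "1500,2500,"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one coordinate string into an integer vector.
split_coords <- function(x) as.integer(strsplit(x, ",", fixed = TRUE)[[1]])

starts  <- lapply(tx_tbl$exon_start, split_coords)
ends    <- lapply(tx_tbl$exon_end, split_coords)
n_exons <- lengths(starts)

# One row per exon: repeat the per-transcript fields once for each exon.
exons <- data.frame(
  tx     = rep(tx_tbl$tx, n_exons),
  chr    = rep(tx_tbl$chr, n_exons),
  strand = rep(tx_tbl$strand, n_exons),
  start  = unlist(starts),
  end    = unlist(ends),
  stringsAsFactors = FALSE
)
```

Here `exons` has five rows, three for `txA` and two for `txB`. Exon numbering and trimming to the CDS are left out of this sketch.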
CDS(tx, gene, tx_cols, gene_cols, shift_start = 1L, shift_end = 0L)
CDS_example()
CDS_Celegans_UCSC_ce11()
CDS_Dmelanogaster_UCSC_dm6()
CDS_Drerio_UCSC_danRer10()
CDS_Hsapiens_UCSC_hg38()
CDS_Mmusculus_UCSC_mm10()
CDS_Rnorvegicus_UCSC_rn6()
CDS_Scerevisiae_UCSC_sacCer3()
CDS_Athaliana_BioMart_plantsmart28()
tx
A URL to a genome's transcript reference file. This table must have tab-separated fields and contain identifiers for each transcript, chromosome, strand, CDS start/end, and exon start/end information. There should be only one row per transcript, and the exon start/end columns should contain comma-separated coordinates for each exon in the transcript. An example file can be found here.
gene
A URL to a tab-separated file that maps transcript identifiers to common gene names. An example file can be found here.
tx_cols
A character vector of expected column names for the known-gene reference file (the file given by tx). Required columns: "tx", "chr", "strand", "cds_start", "cds_end", "exon_start" and "exon_end". All other columns will be ignored.
gene_cols
A character vector of expected column names for the cross-reference file (the file given by gene). Required columns: "tx" and "gene". All other columns will be ignored.
shift_start
Number of bases to shift the start positions. Defaults to 1, as this is necessary for compatibility with Biostrings::getSeq, which includes the start position in the returned sequence and begins counting bases at 1.
shift_end
Number of bases to shift the end positions. Defaults to 0.
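The effect of the default shifts can be sketched in base R. The sketch assumes the input uses 0-based starts with exclusive ends (the usual convention in UCSC tables), so adding 1 to the start and 0 to the end yields 1-based inclusive coordinates. The function `shift_coords` and the toy sequence are hypothetical, and base `substr` stands in for a 1-based, start-inclusive extractor.

```r
# Hypothetical helper mirroring the shift_start / shift_end arguments.
shift_coords <- function(start, end, shift_start = 1L, shift_end = 0L) {
  data.frame(start = start + shift_start, end = end + shift_end)
}

# Toy "chromosome": a 9-base ORF followed by two extra bases.
chrom <- "ATGAAATAGCC"

# Assumed 0-based, end-exclusive interval [0, 9) covering the ORF.
shifted <- shift_coords(start = 0L, end = 9L)

# substr() counts from 1 and includes both endpoints.
orf <- substr(chrom, shifted$start, shifted$end)
```

With the defaults, `shifted` holds start 1 and end 9, and `orf` is the full nine-base ORF beginning with ATG and ending with TAG. The interval width (end - start + 1 after shifting) equals the original end - start.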
The output of CDS should meet the following standards: (1) each row should
represent the coordinates of a single exon; (2) exons should be numbered in
order with reference to the transcript's strand (e.g. the first exon should
include the start codon), though the absolute numbering is unimportant so long
as the exons are numbered in the correct order; (3) the first and last exon
coordinates should begin with the start codon and end with the stop codon.
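Some of these standards can be checked mechanically. The following is a rough sketch, not part of the package: `check_cds` is a hypothetical helper, and the small `cds` table is made-up data using the output column names. It checks that start never exceeds end and that, within each transcript, exon rank follows the strand (ascending coordinates on "+", descending on "-"). It does not verify start or stop codons, which would require the genome sequence.

```r
# Made-up output-style table: txA on the + strand, txB on the - strand.
cds <- data.frame(
  tx     = c("txA", "txA", "txA", "txB", "txB"),
  exon   = c(1L, 2L, 3L, 1L, 2L),
  strand = c("+", "+", "+", "-", "-"),
  start  = c(101L, 301L, 501L, 2001L, 1001L),
  end    = c(200L, 400L, 600L, 2500L, 1500L),
  stringsAsFactors = FALSE
)

# Hypothetical checker: returns one TRUE/FALSE per transcript.
check_cds <- function(cds) {
  stopifnot(all(cds$start <= cds$end))
  vapply(split(cds, cds$tx), function(d) {
    d <- d[order(d$exon), ]
    if (nrow(d) < 2L) return(TRUE)
    steps <- diff(d$start)
    # Rank must increase along the direction of transcription.
    if (d$strand[1] == "+") all(steps > 0) else all(steps < 0)
  }, logical(1))
}

ok <- check_cds(cds)

# Reversing the ranks of txB breaks the strand-aware ordering.
bad <- cds
bad$exon[bad$tx == "txB"] <- c(2L, 1L)
ok_bad <- check_cds(bad)
```

In this example both transcripts pass in `ok`, while `txB` fails in `ok_bad`.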
To save the trouble of looking up URLs, pre-defined CDS builders
are provided. They are named CDS_<Species>_<data-source>_<genome-assembly-ID>().
A data.frame with the following columns, where each row represents a single exon:
COLUMN-NAME  DATA-TYPE  DESCRIPTION
tx           chr        Transcript symbol
gene         chr        Gene symbol
exon         int        Exon rank in gene (lowest contains ATG, highest contains native Stop)
chr          chr        Chromosome
strand       chr        Strand (+/-)
start        int        CDS coordinate start (always <= end)
end          int        CDS coordinate end (always >= start)
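As an illustration of working with a table of this shape, the sketch below computes the total coding length per transcript. The `cds` data frame is made-up data with the columns listed above, not real output, and it assumes start and end are both inclusive, so each exon spans end - start + 1 bases.

```r
# Made-up table with the documented columns; values are illustrative only.
cds <- data.frame(
  tx     = c("txA", "txA", "txA", "txB", "txB"),
  gene   = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  exon   = c(1L, 2L, 3L, 1L, 2L),
  chr    = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  strand = c("+", "+", "+", "-", "-"),
  start  = c(101L, 301L, 501L, 2001L, 1001L),
  end    = c(200L, 400L, 600L, 2500L, 1500L),
  stringsAsFactors = FALSE
)

# Width of each exon, assuming inclusive start and end coordinates.
cds$width <- cds$end - cds$start + 1L

# Sum exon widths within each transcript; result is a named integer vector.
cds_length <- vapply(split(cds$width, cds$tx), sum, integer(1))
```

For this toy table, `cds_length` gives 300 bases for `txA` and 1000 for `txB`.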