read_cds: Import CDS as Biostrings or data.table object
In biomartr: Genomic Data Retrieval

read_cds

R Documentation

This function reads an organism-specific CDS file stored in a defined file format.

read_cds(
  file,
  format = "fasta",
  obj.type = "Biostrings",
  delete_corrupt = FALSE,
  ...
)

`file`	a character string specifying the path to the file storing the CDS.
`format`	a character string specifying the file format used to store the CDS, e.g. `format = "fasta"` (default) or `format = "gbk"`.
`obj.type`	a character string specifying the object type in which the CDS sequences shall be represented. Either `obj.type = "Biostrings"` (default) or `obj.type = "data.table"`.
`delete_corrupt`	a logical value specifying whether potential CDS sequences whose length is not divisible by 3 shall be excluded from the dataset. Default is `delete_corrupt = FALSE`.
`...`	additional arguments that are used by `read.fasta`.
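
The criterion behind `delete_corrupt` can be pictured with a minimal base R sketch. It only illustrates the divisible-by-3 check described above and is not the package's internal code; the sequences and gene names are made up for illustration.

```r
# Hypothetical toy CDS sequences (made up for illustration)
cds <- c(geneA = "ATGGCCTAA", geneB = "ATGGCCTA", geneC = "ATGTGA")

# Treat a CDS as intact when its length is a multiple of 3 (whole codons)
is_intact <- nchar(cds) %% 3 == 0

# Keep only the sequences that pass the check
cds_clean <- cds[is_intact]
names(cds_clean)
```

Here `geneB` has 8 nucleotides and would be the one dropped.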

The `read_cds` function takes a string specifying the path to the CDS file of interest as its first argument.

It is possible to read in different CDS file standards such as FASTA or GenBank.
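
A minimal usage sketch, assuming the biomartr and Biostrings packages are installed. The FASTA content is a made-up toy file written to a temporary path so the example is self-contained; only the arguments listed in the usage section are used.

```r
# Write a tiny, made-up CDS FASTA file so the example is self-contained
cds_file <- tempfile(fileext = ".fa")
writeLines(c(">gene1", "ATGGCCTAA", ">gene2", "ATGAAATTTTGA"), cds_file)

# Only call read_cds() when the required packages are available
if (requireNamespace("biomartr", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  # Default representation: a Biostrings object
  cds_bs <- biomartr::read_cds(file = cds_file, format = "fasta",
                               obj.type = "Biostrings")
  print(cds_bs)

  # Alternative representation: a data.table
  cds_dt <- biomartr::read_cds(file = cds_file, format = "fasta",
                               obj.type = "data.table")
  print(cds_dt)
}
```

For real data, `cds_file` would instead point to a downloaded CDS file.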

CDS stored in fasta files can be downloaded from http://www.ensembl.org/info/data/ftp/index.html.

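If `obj.type = "data.table"`, a data.table storing the gene id in the first column and the corresponding sequence as a string in the second column. If `obj.type = "Biostrings"` (default), the CDS sequences are returned as a Biostrings object.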

Hajk-Georg Drost