read.cds: Read the CDS of a given organism


Description

Read an organism-specific coding sequence (CDS) file stored in fasta or fastq format. If the input file contains corrupt sequences (sequences that do not fulfill the triplet criterion, i.e. whose length is not a multiple of three), users can set delete.corrupt = TRUE to remove them.
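The triplet criterion mentioned above can be illustrated with a minimal base R sketch. It is not the package's implementation; the toy sequences and the names seqs, is_triplet and clean_seqs are made up for illustration, and the check assumes that "corrupt" means a sequence length that is not divisible by three.

```r
# Hypothetical toy sequences; names and bases are invented for illustration.
seqs <- c(
  gene1 = "ATGGCTTAA",     # 9 bases: three complete codons
  gene2 = "ATGGCTTA",      # 8 bases: the last codon is incomplete
  gene3 = "ATGAAACCCTGA"   # 12 bases: four complete codons
)

# Assumption: a sequence fulfills the triplet criterion when its
# length is a multiple of 3.
is_triplet <- nchar(seqs) %% 3 == 0

# Keep only the sequences that pass the check.
clean_seqs <- seqs[is_triplet]
names(clean_seqs)
```

In this sketch gene2 is dropped because its length of 8 leaves an incomplete final codon, which is the kind of sequence delete.corrupt = TRUE is described as removing.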

Usage

read.cds(file, format, delete.corrupt = FALSE, ...)

Arguments

file

a character string specifying the path to the file storing the CDS.

format

a character string specifying the file format used to store the CDS, e.g. "fasta", "fastq".

delete.corrupt

a logical value indicating whether corrupt sequences (sequences that do not fulfill the triplet criterion) should be removed from the input.

...

additional arguments passed on to the readDNAStringSet function.

Details

The read.cds function takes a string specifying the path to the CDS file of interest as its first argument.

For example, CDS files in fasta format can be downloaded from http://www.ensembl.org/info/data/ftp/index.html.

Alternatively users

Value

A data.frame storing the gene id in the first column, the corresponding sequence as a string in the second column, and the sequence length in the third column.
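To make the described layout concrete, the following base R sketch builds a data.frame with the same three-column shape. It does not call read.cds; the column names (geneids, seqs, seq_length) and the toy sequences are assumptions chosen for illustration, and the actual column names returned by the function may differ.

```r
# Hypothetical toy input; ids and sequences are invented for illustration.
toy <- c(geneA = "ATGGAGGATCAA", geneB = "ATGGCGTAA")

# Assumed layout: gene id, sequence as string, sequence length.
cds_tbl <- data.frame(
  geneids    = names(toy),
  seqs       = unname(toy),
  seq_length = nchar(unname(toy)),
  stringsAsFactors = FALSE
)

# Inspect the structure, similar to dplyr::glimpse() in the Examples.
str(cds_tbl)
```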

Author(s)

Hajk-Georg Drost

Examples

### Example Non-Corrupt File
# reading a cds file stored in fasta format
Ath.cds <- read.cds(system.file('seqs/ortho_thal_cds.fasta', package = 'seqreadr'),
                    format = "fasta")

dplyr::glimpse(Ath.cds)

### Example Corrupt File
# reading a corrupt cds file stored in fasta format
# and removing the corrupt sequences
Ath.cds <- read.cds(system.file('seqs/ortho_thal_cds_corrupt.fasta', package = 'seqreadr'),
                    format         = "fasta",
                    delete.corrupt = TRUE)

dplyr::glimpse(Ath.cds)

HajkD/seqreadr documentation built on May 6, 2019, 10:55 p.m.