import_cds: Import the coding sequences from a fasta file


View source: R/import_cds.R

Description

This function reads the organism-specific coding sequences (CDS) stored in a file of a given format.

Usage

import_cds(file, format, delete_corrupt_cds = TRUE, ...)

Arguments

file

a character string specifying the path to the file storing the CDS.

format

a character string specifying the file format used to store the CDS, e.g. "fasta", "fastq".

delete_corrupt_cds

a logical value indicating whether sequences with corrupt base triplets should be removed. A coding sequence is considered corrupt when its length is not divisible by 3, which means it contains at least one incomplete base triplet.

...

additional arguments that are passed on to the readDNAStringSet function.

Details

The import_cds function takes a string specifying the path to the CDS file of interest as its first argument.

It is possible to read in different sequence file formats such as fasta or fastq.

CDS stored in fasta files can be downloaded from http://www.ensembl.org/info/data/ftp/index.html.
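The divisibility check described for delete_corrupt_cds can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy sequences, the data.frame (used here in place of a data.table), and the column names geneids and seqs are assumptions made for illustration only.

```r
# Hypothetical toy data; IDs, sequences, and column names are made up.
cds <- data.frame(
  geneids = c("gene_1", "gene_2", "gene_3"),
  seqs    = c("ATGGCCTAA", "ATGGCCTA", "ATGAAATTTGGGTGA"),
  stringsAsFactors = FALSE
)

# A sequence counts as corrupt when its length is not a multiple of 3,
# i.e. at least one base triplet is incomplete.
is_corrupt <- nchar(cds$seqs) %% 3 != 0

# Keep only the sequences that consist of complete base triplets.
clean_cds <- cds[!is_corrupt, , drop = FALSE]
```

In this sketch the second sequence has 8 bases and is dropped, while the other two (9 and 15 bases) are kept.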

Value

A data.table storing the gene ID in the first column and the corresponding sequence as a character string in the second column.

Author(s)

Hajk-Georg Drost

Examples

## Not run: 
# read a CDS file stored in fasta format
Ath_cds <- import_cds(system.file('seqs/ortho_thal_cds.fasta', package = 'homologr'),
                      format = "fasta")
# look at the results
Ath_cds

## End(Not run)

drostlab/homologr documentation built on Sept. 28, 2020, 12:44 a.m.