findGenes: Finding coding genes

Description

Finding coding genes in genomic DNA using the Prodigal software.

Usage

findGenes(
  genome.file,
  faa.file = "",
  ffn.file = "",
  proc = "single",
  trans.tab = 11,
  mask.N = FALSE,
  bypass.SD = FALSE
)

Arguments

genome.file

A FASTA file with the genome sequence(s).

faa.file

If provided, Prodigal writes all predicted protein sequences to this FASTA file (text).

ffn.file

If provided, Prodigal writes the DNA sequences of all predicted genes to this FASTA file (text).

proc

Either "single" or "meta", see below.

trans.tab

Either 11 or 4 (see below).

mask.N

Turn on masking of N's (logical).

bypass.SD

Bypass the Shine-Dalgarno filter (logical).

Details

The external software Prodigal is used to scan through a prokaryotic genome to detect the protein coding genes. This free software can be installed from https://github.com/hyattpd/Prodigal.

In addition to the standard output from this function, FASTA files with protein and/or DNA sequences may be produced directly by providing filenames in faa.file and ffn.file.

The argument proc specifies whether the input data should be treated as a single genome ("single", the default) or as a metagenome ("meta").

The translation table is by default 11 (the bacterial, archaeal and plant plastid code), but table 4 should be used for Mycoplasma etc.

Setting mask.N = TRUE prevents predicted genes from containing runs of N. Setting bypass.SD = TRUE turns off the search for a Shine-Dalgarno motif.
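The options described above can be combined in a single call. The following is a minimal sketch, not a definitive recipe: it assumes the microseq package and the prodigal executable are both installed, and it skips the call otherwise. The output filenames are temporary files chosen for illustration only.

```r
# Sketch: run findGenes() only when both microseq and prodigal are found.
have.microseq <- requireNamespace("microseq", quietly = TRUE)
have.prodigal <- unname(nzchar(Sys.which("prodigal")))

if (have.microseq && have.prodigal) {
  # Example genome shipped with the package (a Mycoplasma, hence trans.tab = 4)
  genome.file <- system.file("extdata", "small.fna", package = "microseq")

  # Illustrative output files; any writable paths would do
  faa.file <- tempfile(fileext = ".faa")
  ffn.file <- tempfile(fileext = ".ffn")

  gff.tbl <- microseq::findGenes(
    genome.file,
    faa.file  = faa.file,   # protein sequences written here
    ffn.file  = ffn.file,   # DNA sequences written here
    proc      = "single",   # treat the input as a single genome
    trans.tab = 4,          # translation table for Mycoplasma
    mask.N    = TRUE        # no genes across runs of N
  )
  print(file.exists(c(faa.file, ffn.file)))
} else {
  message("Skipping: microseq and/or prodigal not found on this system.")
}
```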

Value

A GFF-table (see readGFF for details) with one row for each detected coding gene.
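As an illustration of working with such a table, the sketch below uses a small hand-made data frame in place of real findGenes output. The column names (Seqid, Type, Start, End, Strand) are an assumption based on the standard GFF layout, and all values are hypothetical; check readGFF for the actual columns.

```r
# Hypothetical stand-in for a table returned by findGenes();
# column names are assumed to follow the standard GFF layout.
gff.tbl <- data.frame(
  Seqid  = c("contig_1", "contig_1", "contig_2"),
  Type   = "CDS",
  Start  = c(101, 950, 30),
  End    = c(700, 1600, 329),
  Strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Gene length in bases (GFF coordinates are 1-based and inclusive)
gff.tbl$Length <- gff.tbl$End - gff.tbl$Start + 1

# Number of genes per genome sequence
genes.per.seq <- table(gff.tbl$Seqid)

# Keep only genes of at least 600 bases
long.genes <- gff.tbl[gff.tbl$Length >= 600, ]
```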

Note

The prodigal software must be installed on the system for this function to work, i.e. the command system("prodigal -h") must be recognized as a valid command if you run it in the Console window.
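One possible way to test for this from within R, before calling findGenes, is sketched below. The helper name prodigal.available is hypothetical and not part of microseq; it relies only on the base R function Sys.which, which returns an empty string when a command is not found on the search path.

```r
# Hypothetical helper (not part of microseq): TRUE if the command is on the PATH.
prodigal.available <- function(cmd = "prodigal") {
  unname(nzchar(Sys.which(cmd)))
}

if (!prodigal.available()) {
  message("prodigal was not found; install it before calling findGenes().")
}
```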

Author(s)

Lars Snipen and Kristian Hovde Liland.

See Also

readGFF, gff2fasta.

Examples

## Not run: 
# This example requires the external prodigal software
# Using a genome file in this package.
genome.file <- file.path(path.package("microseq"), "extdata", "small.fna")

# Searching for coding sequences, this is Mycoplasma (trans.tab = 4)
gff.tbl <- findGenes(genome.file, trans.tab = 4)

# Retrieving the sequences
genome <- readFasta(genome.file)
cds.tbl <- gff2fasta(gff.tbl, genome)

## End(Not run)

microseq documentation built on July 8, 2020, 7:18 p.m.