findGenes: Finding coding genes
In microseq: Basic Biological Sequence Handling

Description

Finds protein-coding genes in prokaryotic genomic DNA using the external Prodigal software.

Usage

findGenes(
  genome,
  prodigal.exe = "prodigal",
  faa.file = "",
  ffn.file = "",
  proc = "single",
  trans.tab = 11,
  mask.N = FALSE,
  bypass.SD = FALSE
)

Arguments

`genome`	A table with columns Header and Sequence, containing the genome sequence(s).
`prodigal.exe`	Command to run the external software prodigal on the system (text).
`faa.file`	If provided, prodigal will output all proteins to this FASTA file (text).
`ffn.file`	If provided, prodigal will output all gene DNA sequences to this FASTA file (text).
`proc`	Either `"single"` or `"meta"` (see Details).
`trans.tab`	Either 11 or 4 (see Details).
`mask.N`	Turn on masking of N's (logical).
`bypass.SD`	Bypass the Shine-Dalgarno trainer (logical).

Details

The external software Prodigal is used to scan a prokaryotic genome for protein-coding genes. The text in prodigal.exe must contain the exact command to invoke Prodigal on the system.

In addition to the GFF-table returned by this function, FASTA files with the protein and/or DNA sequences of the predicted genes can be written directly by providing file names in faa.file and ffn.file.
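A minimal sketch of writing both FASTA files in one call and reading them back, assuming microseq is installed and prodigal is on the PATH. The output file names in `tempdir()` are illustrative only, and the check simply compares record counts rather than asserting they must match.

```r
# Sketch: let prodigal write protein (.faa) and gene DNA (.ffn) FASTA files
# alongside the returned GFF-table. Assumes microseq is installed and
# prodigal is on the PATH; the output file names are illustrative only.
can.run <- requireNamespace("microseq", quietly = TRUE) &&
  nzchar(Sys.which("prodigal"))

if (can.run) {
  genome <- microseq::readFasta(
    system.file("extdata", "small.fna", package = "microseq"))
  faa <- file.path(tempdir(), "small.faa")
  ffn <- file.path(tempdir(), "small.ffn")
  gff.tbl <- microseq::findGenes(genome, trans.tab = 4,
                                 faa.file = faa, ffn.file = ffn)
  proteins <- microseq::readFasta(faa)
  genes <- microseq::readFasta(ffn)
  # Number of GFF rows versus number of records in each FASTA file
  print(c(gff = nrow(gff.tbl), faa = nrow(proteins), ffn = nrow(genes)))
} else {
  message("microseq and/or prodigal not found; skipping the example")
}
```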

The argument proc specifies whether the input data should be treated as a single genome (default) or as a metagenome. In the latter case, genome should contain (un-binned) contigs.
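The sketch below runs metagenome mode and counts predicted genes per contig, assuming microseq is installed and prodigal is on the PATH. The bundled small.fna file only stands in for a set of un-binned contigs; the per-contig count relies on the Seqid column of the GFF-table described in readGFF.

```r
# Sketch: metagenome mode (proc = "meta"). Assumes microseq is installed and
# prodigal is on the PATH; small.fna is only a stand-in for real contigs.
can.run <- requireNamespace("microseq", quietly = TRUE) &&
  nzchar(Sys.which("prodigal"))

if (can.run) {
  contigs <- microseq::readFasta(
    system.file("extdata", "small.fna", package = "microseq"))
  gff.meta <- microseq::findGenes(contigs, proc = "meta")
  # Number of predicted genes per contig (Seqid column of the GFF-table)
  print(table(gff.meta$Seqid))
} else {
  message("microseq and/or prodigal not found; skipping the example")
}
```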

The translation table is 11 by default (the bacterial, archaeal and plant plastid code), but table 4 should be used for Mycoplasma and other organisms where TGA codes for tryptophan rather than a stop.

Setting mask.N = TRUE prevents genes from being predicted across runs of N's. Setting bypass.SD = TRUE bypasses Prodigal's Shine-Dalgarno trainer and forces a full scan for upstream motifs instead.
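One way to see the effect of these two switches is to compare the number of predicted genes with and without them. This is a sketch assuming microseq is installed and prodigal is on the PATH; no particular difference in counts is implied, and on a genome without N's masking may change nothing.

```r
# Sketch: compare gene counts with default settings, with N-masking, and
# with the Shine-Dalgarno trainer bypassed. Assumes microseq and prodigal.
can.run <- requireNamespace("microseq", quietly = TRUE) &&
  nzchar(Sys.which("prodigal"))

if (can.run) {
  genome <- microseq::readFasta(
    system.file("extdata", "small.fna", package = "microseq"))
  gff.default <- microseq::findGenes(genome, trans.tab = 4)
  gff.masked <- microseq::findGenes(genome, trans.tab = 4, mask.N = TRUE)
  gff.noSD <- microseq::findGenes(genome, trans.tab = 4, bypass.SD = TRUE)
  # One count per setting; each GFF-table has one row per predicted gene
  print(c(default = nrow(gff.default),
          mask.N = nrow(gff.masked),
          bypass.SD = nrow(gff.noSD)))
} else {
  message("microseq and/or prodigal not found; skipping the example")
}
```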

Value

A GFF-table (see readGFF for details) with one row for each detected coding gene.
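Because the result is an ordinary GFF-table, standard summaries apply directly. The sketch below computes gene lengths and the strand distribution, assuming microseq is installed, prodigal is on the PATH, and the Start, End and Strand columns described in readGFF.

```r
# Sketch: summarise the returned GFF-table. Assumes microseq and prodigal.
can.run <- requireNamespace("microseq", quietly = TRUE) &&
  nzchar(Sys.which("prodigal"))

if (can.run) {
  genome <- microseq::readFasta(
    system.file("extdata", "small.fna", package = "microseq"))
  gff.tbl <- microseq::findGenes(genome, trans.tab = 4)
  # GFF coordinates are 1-based and inclusive, hence the + 1
  gene.len <- gff.tbl$End - gff.tbl$Start + 1
  print(summary(gene.len))
  # Number of genes on each strand
  print(table(gff.tbl$Strand))
} else {
  message("microseq and/or prodigal not found; skipping the example")
}
```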

Note

The Prodigal software must be installed on the system for this function to work, i.e. the command ‘system("prodigal -h")’ must be recognized as a valid command when you run it in the R console.
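A quick availability check can also be done in base R without running Prodigal at all. This sketch uses `Sys.which()`, which returns an empty string when a command is not found on the PATH; if Prodigal is installed elsewhere, the full path to the executable can be given in prodigal.exe instead.

```r
# Sketch: check whether a "prodigal" command is on the PATH (base R only).
# Sys.which() returns "" when the command cannot be found.
prodigal.path <- Sys.which("prodigal")
prodigal.ok <- nzchar(prodigal.path)

if (prodigal.ok) {
  message("prodigal found at: ", prodigal.path)
} else {
  message("prodigal not found on the PATH; install it, or give the full ",
          "path to the executable in the prodigal.exe argument")
}
```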

Author(s)

Lars Snipen and Kristian Hovde Liland.

Examples

## Not run: 
# This example requires the external prodigal software
library(microseq)
# Using a genome file in this package.
genome.file <- file.path(path.package("microseq"), "extdata", "small.fna")

# Searching for coding sequences, this is Mycoplasma (trans.tab = 4)
genome <- readFasta(genome.file)
gff.tbl <- findGenes(genome, trans.tab = 4)

# Retrieving the sequences
cds.tbl <- gff2fasta(gff.tbl, genome)

# You may also use the pipe operator (%>% and filter() come from dplyr)
library(dplyr)
library(ggplot2)
readFasta(genome.file) %>% 
  findGenes(trans.tab = 4) %>% 
  filter(Score >= 50) %>% 
  ggplot() +
  geom_histogram(aes(x = Score), bins = 25)

## End(Not run)

microseq documentation built on Aug. 21, 2023, 5:10 p.m.