doMLST: Perform MLST Analysis Over a List of Genomes


View source: R/quickMLST.R

Description

Takes a list of genome fasta files and performs blastn searches to identify the sequence type for each of the genes/loci available in an MLST scheme.

Usage

doMLST(
  infiles,
  org = "leptospira",
  scheme = 1L,
  schemeFastas = NULL,
  schemeProfile = NULL,
  write = "new",
  ddir = paste0("pubmlst", "_", org, "_", scheme),
  fdir = paste0("alleles", "_", org, "_", scheme),
  n_threads = 1L,
  pid = 90L,
  scov = 1
)
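
A typical call might look like the sketch below. The "genomes" directory and the ".fasta" extension are hypothetical placeholders, not something the package prescribes. The call is guarded so that it only runs when the MLSTar package is installed and at least one input file is found; blastn must also be available on the system, and with the default schemeFastas and schemeProfile the scheme files are downloaded from pubmlst.org.

```r
# Hypothetical input directory; replace with the location of your genomes.
genome_dir <- "genomes"
infiles <- list.files(genome_dir, pattern = "\\.fasta$", full.names = TRUE)

# Default output directories, built the same way as in the usage above.
org <- "leptospira"
scheme <- 1L
ddir <- paste0("pubmlst", "_", org, "_", scheme)  # "pubmlst_leptospira_1"
fdir <- paste0("alleles", "_", org, "_", scheme)  # "alleles_leptospira_1"

# Only attempt the analysis when the package and some input files are present.
if (requireNamespace("MLSTar", quietly = TRUE) && length(infiles) > 0) {
  res <- MLSTar::doMLST(
    infiles = infiles,
    org = org,
    scheme = scheme,
    write = "new",
    ddir = ddir,
    fdir = fdir,
    n_threads = 2L,
    pid = 90L,
    scov = 1
  )
}
```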

Arguments

infiles

A vector of genome sequence file paths in fasta format. These can also be multifasta files of predicted Open Reading Frames.

org

A valid organism from pubmlst.org. Run listPubmlst_orgs to see available ones.

scheme

integer. The scheme id number for a given organism. Run listPubmlst_schemes to see the schemes available for a given org.

schemeFastas

A vector with the paths to the fasta sequences for each locus of the specified mlst scheme. If it is NULL (default), the sequences are downloaded from http://rest.pubmlst.org to ddir.

schemeProfile

The path to the profile file (.tab). If left NULL, it is downloaded from http://rest.pubmlst.org to ddir.

write

character. One of "new" (Default), "all" or "none". The first one writes only the new alleles found (not reported in pubmlst.org), the second writes all alleles found, and "none" does not write any file.

ddir

A directory that does not exist yet, where the loci fasta files are downloaded in case they are not provided by the user. (Default: paste0('pubmlst','_',org,'_',scheme)).

fdir

A directory that does not exist yet, where the fasta files of found sequences are written (see write). (Default: paste0('alleles','_',org,'_',scheme)).

n_threads

integer. The number of cores to use. Each job consists of the processing of one genome. Blastn searches use 1 core per job.

pid

Percentage identity threshold for a hit to be considered a novel allele. An integer <= 100. (Default: 90).

scov

Subject coverage threshold for a hit to be considered a novel allele. A numeric between 0 and 1. Setting it below 0.7 is not recommended. (Default: 1.0).
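
To illustrate how pid and scov might interact, the sketch below classifies a single BLAST hit against a scheme allele. It is not the package's implementation: classify_hit and its arguments are hypothetical names, subject coverage is approximated as alignment length divided by allele length (ignoring gaps), and the rule that an exact full-length match counts as a known allele is an assumption.

```r
# Illustrative only: classify one BLAST hit against a scheme allele.
# pident:   percent identity of the hit (0-100)
# aln_len:  alignment length
# subj_len: length of the scheme allele (the BLAST subject)
classify_hit <- function(pident, aln_len, subj_len, pid = 90, scov = 1) {
  subject_cov <- aln_len / subj_len  # simplified subject coverage
  if (pident == 100 && subject_cov == 1) {
    "known"   # assumed: identical, full-length match to a reported allele
  } else if (pident >= pid && subject_cov >= scov) {
    "novel"   # passes both thresholds but is not identical
  } else {
    "none"    # below at least one threshold
  }
}

classify_hit(100, 450, 450)   # "known"
classify_hit(97.5, 450, 450)  # "novel"
classify_hit(97.5, 300, 450)  # "none": coverage 0.67 is below scov = 1
classify_hit(85, 450, 450)    # "none": identity is below pid = 90
```

Under these assumptions, lowering scov admits partial matches as novel alleles, which is one way to read the advice against values below 0.7.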

Value

An object of class "mlst", which consists of a list of 2 data.frames. The first one holds the alleles called for the infiles and the ST detected, whereas the second is the scheme profile.

The "result" data.frame shows one genome per row and one gene from the selected scheme per column. The last column is the ST detected for each genome. A number indicates the allele id number of an allele reported in pubmlst.org. An "NA" value means that no allele was found in the respective genome. A 'u' (from 'unknown') plus an integer means that the allele found is not reported in the pubmlst.org database; a fasta file with the new allele is written in this case if "write" is set to "new" or "all" (see above). New alleles are compared with each other, so that identical novel alleles found in 2 or more genomes receive the same name.

The second data.frame is the profile definition chosen. The logic is the same as in the first one, but in this case the rows are the combinations of alleles reported at pubmlst.org.

A series of attributes are also attached to the "mlst" object, mainly referring to the parameters used by the function.
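
The sketch below builds a toy object with the structure described above and shows one way to pick out genomes with novel or missing alleles. All values are made up for illustration. The element name "result" comes from the description; the element name "profile", the column name "ST" and the gene names are assumptions.

```r
# Toy object mimicking the described structure; every value here is invented.
result <- data.frame(
  geneA = c("1", "u1", NA),
  geneB = c("3", "3", "2"),
  ST = c("12", NA, NA),
  row.names = c("genome1", "genome2", "genome3"),
  stringsAsFactors = FALSE
)
profile <- data.frame(geneA = 1L, geneB = 3L, ST = 12L)
x <- structure(list(result = result, profile = profile), class = "mlst")

# Allele calls only: every column except the (assumed) ST column.
loci <- setdiff(colnames(x$result), "ST")
calls <- as.matrix(x$result[, loci, drop = FALSE])

# Genomes with at least one allele not reported in pubmlst.org ('u' + integer).
has_novel <- apply(calls, 1, function(r) any(grepl("^u[0-9]+$", r)))

# Genomes with at least one locus where no allele was found.
has_missing <- apply(calls, 1, function(r) any(is.na(r)))

names(which(has_novel))    # "genome2"
names(which(has_missing))  # "genome3"
```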

Author(s)

Ignacio Ferres

References

Altschul, Gish, Miller, Myers & Lipman. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Jolley KA, Bray JE, Maiden MCJ. A RESTful application programming interface for the PubMLST molecular typing and genome databases. Database: The Journal of Biological Databases and Curation. 2017;2017:bax060. doi:10.1093/database/bax060.


iferres/MLSTar documentation built on Dec. 30, 2020, 2:42 p.m.