View source: R/func__findSeq.R
findSeq | R Documentation |
This function determines whether a query sequence is present in a list of assembly graphs or FASTA files (readable by Bandage). It aims to answer how frequently a query is found in a collection of assemblies. This function does not work on Windows unless the Linux command-line tool "cut" is available.
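Since the function depends on the command-line tool "cut", it may be worth checking for it before a run. The following is a minimal sketch using base R's Sys.which; the helper name hasCut is hypothetical and is not part of this package.

```r
# Hypothetical helper (not part of the package): TRUE if "cut" is on the PATH.
hasCut <- function() {
  # Sys.which() returns an empty string when the program cannot be found.
  nzchar(unname(Sys.which("cut")))
}

if (!hasCut()) {
  warning("'cut' was not found on the PATH; findSeq may fail on this system.")
}
```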
findSeq(
  query = NULL,
  assemblies = NULL,
  bandage.path = "./bandage",
  blast.params = "-task megablast",
  bandage.params = "--ifilter 95 --evfilter 1e-3 --pathnodes 6 --minhitcov 0.98 --minpatlen 0.98 --maxpatlen 1.02",
  n.cores = -1,
  del.temp = TRUE
)
query |
Path to a FASTA file, which may contain multiple query sequences. |
assemblies |
A data frame or a character matrix whose first two columns provide strain names and paths to assembly files. These two columns may be named Strain and Assembly, for instance. This argument can also be a path to a CSV file (with a header line for column names) that stores this data frame. For Bandage, a valid assembly file can be either a SPAdes FASTG file or a FASTA file. This function searches for the query in every assembly file. Users may create the CSV file in a spreadsheet program and import it into R. |
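As an illustration of the expected layout, the sketch below builds a two-column table in R and round-trips it through a CSV file with a header line. The strain names and file paths are placeholders invented for this example, and the column names Strain and Assembly follow the suggestion above rather than a confirmed requirement.

```r
# Strain names and file paths are placeholders for illustration only.
a <- data.frame(
  Strain = c("strain1", "strain2"),
  Assembly = c("assemblies/strain1.fastg", "assemblies/strain2.fasta"),
  stringsAsFactors = FALSE
)

# Save the table as a CSV file with a header line for column names.
csv <- file.path(tempdir(), "assemblies.csv")
write.csv(a, file = csv, row.names = FALSE)

# Read it back; either the data frame or the CSV path could then be
# supplied as the 'assemblies' argument.
b <- read.csv(csv, stringsAsFactors = FALSE)
```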
bandage.path |
Path to Bandage, without any backslash or forward slash terminating this parameter. |
blast.params |
Parameters passed directly to BLAST through the option "--blastp" of Bandage. Run "bandage --helpall" for details. Default: "-task megablast". |
bandage.params |
Parameters passed directly to Bandage. Run "bandage --helpall" to see all valid parameters. These parameters control how Bandage identifies a query. |
n.cores |
Number of computational cores that will be used in parallel by this function. It follows the same convention as the function findPhysLink. Set it to zero to automatically detect and use all available cores; set it to -1 to leave one core out (recommended unless this function is executed through a SLURM job system). |
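The convention for n.cores can be sketched as follows. This is only an illustration, not the package's own code: the helper name resolveCores is hypothetical, and capping positive values at the number of detected cores is an assumption made for this example.

```r
library(parallel)

# Hypothetical helper illustrating the n.cores convention;
# it is NOT taken from the package.
resolveCores <- function(n.cores, available = detectCores()) {
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  if (n.cores == 0) {
    n <- available                # 0: use every detected core
  } else if (n.cores < 0) {
    n <- available - 1L           # -1: leave one core free
  } else {
    n <- min(n.cores, available)  # assumption: cap positive values
  }
  max(1L, as.integer(n))          # always use at least one core
}

resolveCores(-1, available = 8)  # 7
resolveCores(0, available = 8)   # 8
```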
del.temp |
A logical parameter determining whether to delete all temporary files under the current working directory. Default: TRUE (remove all of these files). |
A single data frame of identified query paths, one (the top hit) for each assembly. NA values are present if no query path is found at all in an assembly.
Yu Wan (wanyuac@126.com)
# 'a' is the table of strain names and assembly paths (see the argument 'assemblies').
paths <- findSeq(
  query = "integrons.fna",
  assemblies = a,
  bandage.path = "apps/Bandage",
  bandage.params = "--ifilter 95 --evfilter 1e-3 --pathnodes 6 --minhitcov 0.98 --minpatlen 0.98 --maxpatlen 1.02",
  n.cores = 4,
  del.temp = FALSE
)