itsx: Extract Ribosomal RNA Gene Regions from Eukaryotic DNA



Description

Calls the external program ITSx, which must be installed and on the path. For more information on installation, algorithms, or options, see http://microbiology.se/software/itsx/.

Usage

itsx(
  in_file,
  out_root = tempfile("itsx"),
  taxon = "fungi",
  e_value = 1e-05,
  s_value = 0,
  n_value = 2,
  selection_priority = c("score", "sum", "domains", "eval"),
  search_eval = 0.01,
  search_score = NULL,
  allow_single_domain = c(1e-09, 0),
  allow_reorder = FALSE,
  complement = TRUE,
  cpu = 1,
  multi_thread = cpu > 1,
  heuristics = FALSE,
  nhmmer = FALSE,
  summary = TRUE,
  graphical = TRUE,
  fasta = TRUE,
  preserve = FALSE,
  save_regions = c("ITS1", "ITS2"),
  anchor = 0,
  require_anchor = 0,
  only_full = FALSE,
  partial = 0,
  concat = FALSE,
  minlen = 0,
  positions = TRUE,
  table = FALSE,
  detailed_results = FALSE,
  not_found = TRUE,
  truncate = TRUE,
  silent = FALSE,
  graph_scale = 0,
  save_raw = FALSE,
  read_function = NULL
)

Arguments

in_file

Name of a fasta file to read sequences from. Alternatively, a member of classes ShortRead, DNAStringSet or SeqFastadna, in which case the sequences will be written to a temporary .fasta file.

out_root

Root name for all output files. The default will put these in a pseudorandomly generated file name in the R temp directory, which is probably not what you want if you intend to access them later.

taxon

The taxonomic group(s) in which to search for rDNA. See the ITSx manual for available options. In contrast to the ITSx default ("all"), the default for this function is "fungi".

e_value

E-value cutoff for inclusion in results.

s_value

Score cutoff for inclusion in results.

n_value

Number of domains required for inclusion in results.

selection_priority

Priority for determining sequence origin.

search_eval

Actual E-value cutoff for HMMER search. Only one of search_eval and search_score may be supplied.

search_score

Actual score cutoff for HMMER search. Only one of search_eval and search_score may be supplied.

allow_single_domain

Allow inclusion of sequences where only one domain match is found. Either FALSE or a (double, integer) pair giving the E-value and score inclusion criteria, respectively.

allow_reorder

Allow inclusion of sequences where the domains do not occur in the expected order.

complement

Also search the reverse complement of each sequence.

cpu

Number of threads to use.

multi_thread

Whether to use multiple threads or not.
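The default multi_thread = cpu > 1 relies on R's lazy evaluation of default arguments: the default is computed from whatever value cpu has when multi_thread is first used. The sketch below illustrates the mechanism with a toy function (toy is a hypothetical stand-in, not part of the package).

```r
# Toy function with the same two defaults as itsx(); hypothetical, for illustration only.
toy <- function(cpu = 1, multi_thread = cpu > 1) {
  # multi_thread is evaluated here, after cpu has been bound
  list(cpu = cpu, multi_thread = multi_thread)
}

toy()$multi_thread                               # default cpu = 1, so FALSE
toy(cpu = 4)$multi_thread                        # follows cpu, so TRUE
toy(cpu = 4, multi_thread = FALSE)$multi_thread  # explicit value wins, so FALSE
```

In other words, setting only cpu should be enough to switch multithreading on, while passing multi_thread explicitly overrides the derived default.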

heuristics

Use heuristic filtering to speed up HMMER search.

nhmmer

Use nhmmer instead of hmmsearch.

summary

Output summary results in [out_root].summary.txt.

graphical

Output graphical results in [out_root].graph.

fasta

Output full ITS (ITS1 + 5.8S + ITS2) results in [out_root].full.fasta.

preserve

Use original sequence headers instead of writing new ones.

save_regions

Additional regions to save output for. Options are one or more of "SSU", "ITS1", "5.8S", "ITS2", "LSU", or alternatively "all" or "none". There will be output in [out_root].SSU.fasta, [out_root].ITS1.fasta, etc.
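As a rough illustration of the naming pattern described for save_regions, the following sketch builds the per-region file names for a given out_root. The helper expected_region_files is hypothetical (it is not part of the package) and only mirrors the [out_root].REGION.fasta pattern; whether ITSx actually writes a file for a region with no detections is not covered here.

```r
# Hypothetical helper: predicts per-region FASTA file names from out_root.
expected_region_files <- function(out_root, save_regions = c("ITS1", "ITS2")) {
  all_regions <- c("SSU", "ITS1", "5.8S", "ITS2", "LSU")
  if (identical(save_regions, "none")) {
    return(character(0))
  }
  if (identical(save_regions, "all")) {
    save_regions <- all_regions
  }
  # Reject anything that is not one of the documented region names
  stopifnot(all(save_regions %in% all_regions))
  paste0(out_root, ".", save_regions, ".fasta")
}

expected_region_files("results/run1")
expected_region_files("results/run1", save_regions = "all")
```

A helper like this can be handy for checking with file.exists() which region files a run produced.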

anchor

Number of extra bases included at the beginning and end of each region. Only one of anchor and require_anchor may be given.

require_anchor

As anchor, but the anchor bases are required to be present for the region to be included in output. Only one of anchor and require_anchor may be given.

only_full

Limit output to full length ITS1 and ITS2 regions.

partial

Save additional files for partial regions. The argument gives the minimum number of bases required. These will be saved as [out_root].full_and_partial.fasta for the full ITS region, and as [out_root].SSU.full_and_partial.fasta, [out_root].ITS1.full_and_partial.fasta, etc. for the other regions.

concat

Output concatenated ITS1 and ITS2 regions in [out_root].concat.fasta.

minlen

Minimum length for ITS regions to be included in concat.

positions

Output a table of positions where HMMER matches were detected in [out_root].positions.txt.

table

Output a table of ITS results in [out_root].hmmer.table.

detailed_results

Output detailed results in [out_root].extraction.results.

not_found

Output a list of sequences for which no match was found in [out_root]_no_detections.txt.

truncate

Remove ends of ITS sequences if they extend beyond the ITS region.

silent

Suppress printing of information to screen.

graph_scale

Sets the scale of the graphical output.

save_raw

Save raw data from searches in directory [out_root]_ITSx_raw_output.

read_function

A function which can be used to read a fasta format file, such as ShortRead::readFasta, Biostrings::readDNAStringSet, or seqinr::read.fasta. If a value is given, then all output files will be read and then deleted, using the given function to read fasta files. If no fasta output is requested, this behavior can still be triggered by giving any other non-NULL value, for example TRUE (note that FALSE also triggers it).
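A minimal sketch of the in-memory workflow, assuming the package is installed under the name rITSx, Biostrings is available, the ITSx executable is on the path, and a file called my_sequences.fasta exists (the file name is hypothetical). The names of the elements in the returned list are not documented here, so the sketch simply inspects them.

```r
# Assumptions: rITSx and Biostrings installed, ITSx on the path,
# and "my_sequences.fasta" is a placeholder for a real input file.
in_fasta <- "my_sequences.fasta"

can_run <- requireNamespace("rITSx", quietly = TRUE) &&
  requireNamespace("Biostrings", quietly = TRUE) &&
  nzchar(Sys.which("ITSx")) &&
  file.exists(in_fasta)

if (can_run) {
  # Pass sequences as a DNAStringSet; they are written to a temporary fasta file
  seqs <- Biostrings::readDNAStringSet(in_fasta)
  res <- rITSx::itsx(
    in_file = seqs,
    read_function = Biostrings::readDNAStringSet
  )
  # Output files are read into a list and then deleted; inspect what came back
  print(names(res))
}
```

Because the output files are deleted after reading, the default temporary out_root is usually fine in this mode.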

Details

For basic usage, this function only checks the arguments and generates the call to ITSx. However, to simplify integration with R-based workflows, it is also possible to specify in_file as one of several R classes that hold DNA sequences, and, by specifying a read_function, to have the output file(s) read into R automatically and returned as members of a list.

Value

The return value of the ITSx program, or if read_function is given, a list containing the contents of all files which were created (except the raw output).
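For file-based use, a call might look like the sketch below. It assumes the package is installed as rITSx and that ITSx is on the path; the input file name and output directory are hypothetical. A persistent out_root is used instead of the tempfile() default so the results can be found afterwards.

```r
# Assumptions: rITSx installed, ITSx on the path; file and directory names are placeholders.
in_fasta <- "my_sequences.fasta"
out_root <- file.path(getwd(), "itsx_out", "run1")

can_run <- requireNamespace("rITSx", quietly = TRUE) &&
  nzchar(Sys.which("ITSx")) &&
  file.exists(in_fasta)

if (can_run) {
  dir.create(dirname(out_root), recursive = TRUE, showWarnings = FALSE)
  # Without read_function, the value returned is that of the ITSx program
  status <- rITSx::itsx(
    in_file = in_fasta,
    out_root = out_root,
    taxon = "fungi",
    save_regions = c("ITS1", "ITS2"),
    cpu = 2
  )
  # All output files share the out_root prefix
  print(list.files(dirname(out_root), pattern = paste0("^", basename(out_root))))
}
```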


brendanf/rITSx documentation built on April 6, 2020, 4:37 p.m.