Home

/

GitHub

/

brendanf/inferrnal

/

cmsearch: Search for Covariance Models (CM) in a set of sequences.

cmsearch: Search for Covariance Models (CM) in a set of sequences.
In brendanf/inferrnal: Interface to Call Programs from Infernal RNA Covariance Model Package

View source: R/inferrnal.R

cmsearch

R Documentation

Search for Covariance Models (CM) in a set of sequences.

Description

This function calls "cmsearch" from Infernal. Infernal must be installed. The function is focused on retrieving the hits table and, optionally, producing an alignment.

Usage

cmsearch(
  cm,
  seq,
  glocal = TRUE,
  Z = NULL,
  output = NULL,
  alignment = NULL,
  acc = FALSE,
  noali = TRUE,
  notextw = FALSE,
  textw = NULL,
  verbose = FALSE,
  E = NULL,
  t = NULL,
  incE = NULL,
  incT = NULL,
  cut_ga = FALSE,
  cut_nc = FALSE,
  cut_tc = FALSE,
  filter_strategy = NULL,
  FZ = NULL,
  Fmid = NULL,
  notrunc = FALSE,
  anytrunc = FALSE,
  nonull3 = FALSE,
  mxsize = NULL,
  smxsize = NULL,
  cyk = FALSE,
  acyk = FALSE,
  wcx = NULL,
  toponly = TRUE,
  bottomonly = FALSE,
  tformat = NULL,
  cpu = NULL,
  stall = FALSE,
  mpi = FALSE,
  quiet = TRUE,
  table_function = c("fwf", "table", "lines")
)

Arguments

`cm`	(character of length 1) Path to the covariance model (.cm) file. The covariance model must include calibration data from running "`cmcalibrate`".
`seq`	(filename, character vector, `Biostrings::XStringSet`, or `ShortRead::ShortRead` Sequences to search with the CM. If a filename, the file can be of any type supported by Infernal.
`glocal`	(`logical` of length 1) Whether to run the search in glocal mode (global with respect to the model, local with respect to the sequence). When `TRUE`, the search is faster, but will fail to find matches with only partially homologous structure.
`Z`	(`numeric` scalar) Effective search space size in Mb for the purposes of calculating E-values.
`output`	(`character` filename) File to send the human-readable output to.
`alignment`	(`character` filename) A file to save the aligned hits to. If given, the alignment is saved in Stockholm format with annotations for secondary structure, posterior probablility, etc.
`acc`	(`logical` scalar) Use accessions instead of names in the main output.
`noali`	(`logical` scalar) Omit the alignment section from the main output.
`notextw`	(`logical` scalar) Unlimit the length of each line in the main output.
`textw`	(`numeric` scalar) Set the main output’s line length limit in characters per line. Default: `120`.
`verbose`	(`logical` scalar)
`E`	(`numeric` scalar) Maximum E-value for reporting in per-target output. Default: `10.0`
`t`	(`numeric` scalar) Maximum bit score for reporting in per-target reporting. Default: `NULL`
`incE`	(`numeric` scalar) Maximum E-value for hit inclusion. Default: `0.01`
`incT`	(`numeric` scalar) Maximum bit score for hit inclusion. Default: `NULL`
`cut_ga`	(`logical` scalar) Use the GA (gathering) bit scores defined in the CM to set hit reporting and inclusion thresholds.
`cut_nc`	(`logical` scalar) Use the NC (noise cutoff) bit score thresholds defined in the CM to set hit reporting and inclusion thresholds.
`cut_tc`	(`logical` scalar) Use the TC (trusted cutoff) bit score thresholds defined in the CM to set hit reporting and inclusion thresholds.
`filter_strategy`	(`character` string) Filtering strategy for the acceleration pipeline. Options, from slowest (most sensitive) to fastest (least sensitive) are `"max"`, `"nohmm"`, `"mid"`, `"default"`, `"rfam"`, `"hmmonly"`.
`FZ`	(`numeric` scalar) Effective database size in Mb for determining filter thresholds.
`Fmid`	(`numeric` scalar) HMM filter thresholds for the "mid" filtering strategy. Default: `0.02`
`notrunc`	(`logical` scalar) Turn off truncated hit detection.
`anytrunc`	(`logical` scalar) Allow truncated hits to begin and end at any position in a target sequence.
`nonull3`	(`logical` scalar) Turn off the null3 CM score corrections for biased composition.
`mxsize`	(`numeric` scalar) Maximum allowable CM DP matrix size in megabytes. Default: `128`
`smxsize`	(`numeric` scalar) Maximum allowable CM search DP matrix size in megabytes. Default: `128`
`cyk`	(`logical` scalar) Use the CYK algorithm, not Inside, to determine the final score of all hits.
`acyk`	(`logical` scalar) Use the CYK algorithm to align hits.
`wcx`	(`numeric` scalar) Expected maximum length of a hit, as a multiplier of consensus length of the model.
`toponly`	(`logical` scalar) Only search the top (Watson) strand of target sequences.
`bottomonly`	(`logical` scalar) Only search the bottom (Crick) strand of target sequences.
`tformat`	(`character` string) Format of the sequences in "`seq`". Options are `"fasta"`, `"embl"`, `"genbank"`, `"ddbj"`, `"stockholm"`, `"pfam"`, `"a2m"`, `"afa"`, `"clustal"`, and `"phylip"`. Default: autodetect
`cpu`	(`integer` scalar) The number of threads to use in the search.
`stall`	(`logical` scalar) For debugging the MPI master/worker version: pause after start, to enable the developer toattach debuggers to the running master and worker(s) processes
`mpi`	(`logical` scalar) Run in MPI master/worker mode, using mpirun.
`quiet`	(`logical` scalar) Suppress standard output of `cmsearch`, which can be long.
`table_function`	(`character` string) Either `"fwf"` (default), `"table"`, or `"lines"`; whether to read the hits table using `utils::read.fwf()`, `utils::read.table()`, or `readLines()`. Neither of the two parsers works in all cases; `utils::read.fwf()` fails when more than one CM is present in the CM file, while `utils::read.table()` fails when any of the fields contain spaces.

Value

a base::data.frame with columns:

target_name (character) the name of the target sequence
taget_accession(character) the target's accession number
query_name(character) the name of the query CM
query_accession(character) the query CM's accession number
mdl(character) the model type ("cm" or "hmm")
mdl_from(integer) the start location of the hit in the model
mdl_to(integer) the end location of the hit in the model
seq_from(integer) the start location of the hit in the sequence
seq_to(integer) the end location of the hit in the sequence
strand(character) the strand the hit was found on ("+" or "-")
trunc)(character) whether the hit is truncated, and where ("no", "5'", "3'", "5'&3'", or "-" for hmm hits).
pass(integer) which algorithm pass the hit was found on.
gc(numeric) GC content of the hit
bias(numeric) biased composition correction. See the Infernal documentation.
score(numeric) bit-score of the hit, including the biased composition correction.
E_value(numeric) Expectation value for the hit.

inc

(character) "!" if the sequence meets the inclusion threshold, "?" if it only meets the reporting threshold.

Examples

    # search sequences from a fasta file for Rfam RF00002 (5.8S rRNA)
    cm <- cm_5_8S()
    sampfasta <- sample_rRNA_fasta()
    cmsearch(cm = cm, seq = sampfasta, cpu = 1)
    # also works if the fasta file has already been loaded
    samp <- Biostrings::readDNAStringSet(sampfasta)
    cmsearch(cm = cm, seq = samp, cpu = 1)

brendanf/inferrnal documentation built on Nov. 26, 2024, 4:24 p.m.

brendanf/inferrnal index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

brendanf/inferrnal
Interface to Call Programs from Infernal RNA Covariance Model Package

cmsearch: Search for Covariance Models (CM) in a set of sequences.
In brendanf/inferrnal: Interface to Call Programs from Infernal RNA Covariance Model Package

Search for Covariance Models (CM) in a set of sequences.

Description

Usage

Arguments

Value

Examples

Related to cmsearch in brendanf/inferrnal...

R Package Documentation

Browse R Packages

We want your feedback!

brendanf/inferrnal Interface to Call Programs from Infernal RNA Covariance Model Package

cmsearch: Search for Covariance Models (CM) in a set of sequences. In brendanf/inferrnal: Interface to Call Programs from Infernal RNA Covariance Model Package

Search for Covariance Models (CM) in a set of sequences.

Description

Usage

Arguments

Value

Examples

Related to cmsearch in brendanf/inferrnal...

R Package Documentation

Browse R Packages

We want your feedback!

brendanf/inferrnal
Interface to Call Programs from Infernal RNA Covariance Model Package

cmsearch: Search for Covariance Models (CM) in a set of sequences.
In brendanf/inferrnal: Interface to Call Programs from Infernal RNA Covariance Model Package