cmalign: Align sequences to a covariance model

View source: R/inferrnal.R

cmalignR Documentation

Align sequences to a covariance model

Description

This function calls cmalign from Infernal. Infernal must be installed and on the path. Not all options are included.

Usage

cmalign(
  cmfile,
  seq,
  global = TRUE,
  algorithm = NULL,
  sample = FALSE,
  seed = NULL,
  notrunc = FALSE,
  sub = FALSE,
  hbanded = TRUE,
  tau = NULL,
  mxsize = NULL,
  fixedtau = FALSE,
  maxtau = NULL,
  small = FALSE,
  sfile = NULL,
  tfile = NULL,
  ifile = NULL,
  elfile = NULL,
  mapali = NULL,
  mapstr = FALSE,
  dnaout = FALSE,
  noprob = FALSE,
  matchonly = FALSE,
  ileaved = FALSE,
  regress = NULL,
  verbose = FALSE,
  cpu = NULL,
  mpi = FALSE,
  extra = NULL,
  glocal = global
)

Arguments

cmfile

(character filename) path to a covariance model file

seq

(character filename, character vector, Biostrings::XStringSet, or ShortRead::ShortRead sequences to align to the covariance model. This may be given as a character path to a fasta file, the sequences as a character vector, or an object of class DNAStringSet, RNAStringSet, or ShortRead. For cmalign, the sequences should be known a priori to represent the same region as the CM; to find the region in longer sequences and align it, use the alignment option of cmsearch().

global

(logical scalar) If TRUE, align in global mode. See Infernal documentation for more information.

algorithm

(character string) Alignment algorithm. Options are "optacc" and "cyk". Default: "optacc"

sample

(logical scalar) Sample an alignment from the posterior distribution of alignments.

seed

(integer scalar) Random seed for sampling an alignment.

notrunc

(logical scalar) Turn off truncated alignment algorithms.

sub

(logical scalar) turn on the sub model construction and alignment procedure

hbanded

(logical scalar) Accelerate alignment by pruning away regions of the CMDP matrix that are deemed negligible by an HMM. Default: TRUE

tau

(numeric scalar) Tail loss probability used during HMM band calculation. Default: 1E-7

mxsize

(numeric scalar) The maximum DP matrix size, in Mb. Maximum potential memory usage is approximately cpu*mxsize, although this is usually not realized. See Infernal documentation for more information.

fixedtau

(logical scalar) Turn off HMM band tightening.

maxtau

(numeric scalar) Maximum allowed value for tau during band tightening. Default: 0.05

small

(logical scalar) Use the divide and conquer CYK alignment algorithm, greatly reducing memory consumption.

sfile

(character filename) Dump per-sequence alignment score and timing information to file.

tfile

(character filename) Dump tabular sequence tracebacks for each individual sequence to a file.

ifile

(character filename) Dump per-sequence insert information to file.

elfile

(character filename) Dump per-sequence EL state (local end) insert information to file.

mapali

(character filename) Read the alignment from the file used to build the model aligns it as a single object to the CM, along with sequences in "seq".

mapstr

(logical scalar) Must be used in combination with mapali. Propogate structural information for any pseudoknots that exist in mapali to the output alignment.

dnaout

(logical scalar) Output the alignments as DNA sequence alignments, instead of RNA ones.

noprob

(logical scalar) Do not annotate the output alignment with posterior probabilities.

matchonly

(logical scalar) Only include match columns in the output alignment, do not include any insertions relativeto the consensus model.

ileaved

(logical scalar) Output the alignment in interleaved Stockholm format of a fixed width that may be more convenient for examination.

regress

(character filename) Save an additional copy of the output alignment with no author information to file.

verbose

(logical scalar) Output additional information in the tabular scores output.

cpu

(integer scalar) The number of cpus to use.

mpi

(logical scalar) Run as an MPI parallel program.

extra

(character vector) Additional advanced options to pass to cmalign.

glocal

(logical scalar) (Deprecated) see "global".

Details

One of the easiest places to obtain CMs is Rfam.

Value

the aligned sequences, as returned by read_stockholm_msa().

Examples

    # align a set of unaligned 5.8S rRNA sequences to the Rfam CM.
    cm <- cm_5_8S()
    unaln <- sample_rRNA_5_8S()
    cmalign(cm, unaln, cpu = 1)
    # also works if the fasta file has already been loaded
    unaln <- Biostrings::readRNAStringSet(unaln)
    cmalign(cm, unaln, cpu = 1)

brendanf/inferrnal documentation built on Sept. 30, 2024, 6:40 a.m.