knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
R Interface to Call Programs from Infernal RNA Covariance Model Package
Covariance Models (CM) are stochastic models of RNA sequence and secondary
structure.
Infernal (INFERence of RNA ALignment)[^2] is a
software package with various command-line tools related to CMs.
inferrnal
(with two "r
"s) is a lightweight R interface which calls the
Infernal tools and imports the results to R.
It is developed independently from Infernal, and Infernal must be installed in
order for it to function.
Note that Infernal does not work on Windows.
[^2]: Nawrocki, E.P., Eddy, S.R., 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935.
The required Infernal package can be installed from Bioconda:
conda install -c bioconda infernal
or in Debian/Ubuntu Linux using apt:
sudo apt-get install infernal
or using Homebrew in MacOS:
brew tap brewsci/bio brew install infernal
For other installation options, including source installation, see the Infernal homepage.
inferrnal
After Infernal is installed, the released version of inferrnal
can be
installed from Bioconductor:
if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("inferrnal")
Submission to Bioconductor is pending.
Alternatively, the development version of inferrnal
can be installed from
GitHub with:
if(!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes") remotes::install_github("brendanf/inferrnal")
So far three of the tools are implemented: cmsearch
, cmalign
, and cmbuild
.
In order to search, we need a CM.
Rfam has a wide variety.
For this example, we will use the eukaryotic 5.8S rRNA, with the Rfam ID
RF00002
.
The CM is available from Rfam, but
it is also included as example data in inferrnal
.
library(inferrnal) cm <- cm_5_8S()
We also need some sequences to search. The sample data is from a soil metabarcoding study focused on fungi[^1]. The targeted region includes 5.8S as well as some of the surrounding rDNA regions.
[^1]: Furneaux, B., Bahram, M., Rosling, A., Yorou, N.S., Ryberg, M., 2021. Long- and short-read metabarcoding technologies reveal similar spatiotemporal structures in fungal communities. Molecular Ecology Resources 21, 1833–1849.
sampfasta <- sample_rRNA_fasta()
Use cmsearch()
to locate the 5.8S RNA in each sequence.
library(inferrnal) cmsearch(cm = cm, seq = sampfasta, cpu = 1)
Instead of passing a file name, you can also supply a DNAStringSet
or
RNAStringSet
object from the Biostrings
package.
sampseqs <- Biostrings::readDNAStringSet(sampfasta) cmsearch(cm = cm, seq = sampseqs, cpu = 1)
cmsearch
, by default, returns a table with information about each hit.
However, it can optionally also output an alignment of the hits to a file
in Stockholm format.
alnfile <- tempfile("alignment-", fileext = ".stk") cmsearch(cm = cm, seq = sampseqs, alignment = alnfile)
inferrnal
includes a parser for Stockholm alignments, which also
imports column annotations.
msa <- read_stockholm_msa(alnfile)
read_stockholm_msa
returns an object of class
StockholmRNAMultipleAlignment
.
It is also possible to load DNA or AA alignments using an optional
type=
argument to read_stockholm_msa
.
msa
In addition to the alignment itself, the Stockholm output includes the consensus secondary structure and the reference annotation, as defined in the CM, as column ("GC") annotations named "SS_cons" and "RF", as well as the posterior probability that each base is aligned in the correct position as residue ("GR") annotations named "PP". For more information about these annotations, including the encoding of secondary structure and posterior probabilities, see the Infernal documentation.
If you have sequences which have already been trimmed to contain only the RNA
defined by the CM (possibly truncated, but not extended), then you can align
them to the CM using cmalign
.
This is much faster than cmsearch
.
This example uses the results of cmsearch
from the previous section, after
removing gaps.
unaln <- sample_rRNA_5_8S() unaln_seq <- Biostrings::readRNAStringSet(unaln) unaln_seq aln <- cmalign(cm, unaln_seq, cpu = 1) aln
cmbuild
is used to create new CMs from annotated multiple sequence alignments.
To illustrate the process, we use the seed alignment for the 5.8S rRNA CM from
RFAM.
It is included as a sample file in inferrnal
.
new_cm <- file.path(tempdir(), "5_8S.cm") cmbuild(new_cm, msafile = stk_5_8S(), force = TRUE, quiet = FALSE)
This CM is not calibrated, so it cannot be used for cmsearch
,
but it can be used in cmalign
.
aln2 <- cmalign(new_cm, unaln_seq, cpu = 1)
The resulting alignment is the same as the one created using the CM from RFAM,
because they are based on the same seed alignment, and used the same (default)
options for cmbuild
.
all.equal(aln, aln2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.