knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)

R Interface to Call Programs from Infernal RNA Covariance Model Package

Covariance Models (CM) are stochastic models of RNA sequence and secondary structure. Infernal (INFERence of RNA ALignment)[^2] is a software package with various command-line tools related to CMs. inferrnal (with two "r"s) is a lightweight R interface which calls the Infernal tools and imports the results to R. It is developed independently from Infernal, and Infernal must be installed in order for it to function. Note that Infernal does not work on Windows.

[^2]: Nawrocki, E.P., Eddy, S.R., 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935.

Installation

Installing Infernal

The required Infernal package can be installed from Bioconda:

conda install -c bioconda infernal

or in Debian/Ubuntu Linux using apt:

sudo apt-get install infernal

or using Homebrew in MacOS:

brew tap brewsci/bio
brew install infernal

For other installation options, including source installation, see the Infernal homepage.

Installing inferrnal

After Infernal is installed, the development version of inferrnal can be installed from GitHub with:

if(!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("brendanf/inferrnal")

Examples

So far three of the tools are implemented: cmsearch, cmalign, and cmbuild.

cmsearch

In order to search, we need a CM. Rfam has a wide variety. For this example, we will use the eukaryotic 5.8S rRNA, with the Rfam ID RF00002. The CM is available from Rfam, but it is also included as example data in inferrnal.

library(inferrnal)
cm <- cm_5_8S()

We also need some sequences to search. The sample data is from a soil metabarcoding study focused on fungi[^1]. The targeted region includes 5.8S as well as some of the surrounding rDNA regions.

[^1]: Furneaux, B., Bahram, M., Rosling, A., Yorou, N.S., Ryberg, M., 2021. Long- and short-read metabarcoding technologies reveal similar spatiotemporal structures in fungal communities. Molecular Ecology Resources 21, 1833–1849.

sampfasta <- sample_rRNA_fasta()

Use cmsearch() to locate the 5.8S RNA in each sequence.

library(inferrnal)
cmsearch(cm = cm, seq = sampfasta, cpu = 1)

Instead of passing a file name, you can also supply a DNAStringSet or RNAStringSet object from the Biostrings package.

sampseqs <- Biostrings::readDNAStringSet(sampfasta)
cmsearch(cm = cm, seq = sampseqs, cpu = 1)

cmsearch, by default, returns a table with information about each hit. However, it can optionally also output an alignment of the hits to a file in Stockholm format.

alnfile <- tempfile("alignment-", fileext = ".stk")
cmsearch(cm = cm, seq = sampseqs, alignment = alnfile)

inferrnal includes a parser for Stockholm alignments, which also imports column annotations.

msa <- read_stockholm_msa(alnfile)

read_stockholm_msa returns an object of class StockholmRNAMultipleAlignment. It is also possible to load DNA or AA alignments using an optional type= argument to read_stockholm_msa.

msa

In addition to the alignment itself, the Stockholm output includes the consensus secondary structure and the reference annotation, as defined in the CM, as column ("GC") annotations named "SS_cons" and "RF", as well as the posterior probability that each base is aligned in the correct position as residue ("GR") annotations named "PP". For more information about these annotations, including the encoding of secondary structure and posterior probabilities, see the Infernal documentation.

cmalign

If you have sequences which have already been trimmed to contain only the RNA defined by the CM (possibly truncated, but not extended), then you can align them to the CM using cmalign. This is much faster than cmsearch. This example uses the results of cmsearch from the previous section, after removing gaps.

unaln <- sample_rRNA_5_8S()
unaln_seq <- Biostrings::readRNAStringSet(unaln)
unaln_seq
aln <- cmalign(cm, unaln_seq, cpu = 1)
aln

cmbuild

cmbuild is used to create new CMs from annotated multiple sequence alignments. To illustrate the process, we use the seed alignment for the 5.8S rRNA CM from RFAM. It is included as a sample file in inferrnal.

new_cm <- file.path(tempdir(), "5_8S.cm")
cmbuild(new_cm, msafile = stk_5_8S(), force = TRUE, quiet = FALSE)

This CM is not calibrated, so it cannot be used for cmsearch, but it can be used in cmalign.

aln2 <- cmalign(new_cm, unaln_seq, cpu = 1)

The resulting alignment is the same as the one created using the CM from RFAM, because they are based on the same seed alignment, and used the same (default) options for cmbuild.

all.equal(aln, aln2)

Session Info

sessionInfo()


brendanf/inferrnal documentation built on Feb. 4, 2023, 4:49 p.m.