rfamSequenceSearch: Performs a sequence search of the Rfam database

View source: R/rfaRm_searchFunctions.R

Description

Searches the Rfam database with a provided RNA sequence, and retrieves high-scoring hits between Rfam families and regions of the provided sequence.

Usage

rfamSequenceSearch(sequence, fragmentsOverlap=1000, clanCompetitionFilter=TRUE, clanOverlapThreshold=0.5)

Arguments

sequence

string with an RNA sequence to be searched against the Rfam database. Should contain only standard RNA symbols (i.e., "A", "U", "G" and "C"), and no spaces or newlines.

fragmentsOverlap

when a sequence longer than 10000 nucleotides is provided, it is internally split into smaller fragments, each of which is then searched against the Rfam database. This argument sets the number of bases shared between consecutive fragments.

clanCompetitionFilter

logical indicating if results should be reduced through a clan competition filter, which removes overlapping hits if they belong to Rfam families of the same clan and have an overlap above a certain threshold.

clanOverlapThreshold

number indicating the minimum overlap between two hits (as a fraction of the shorter hit) above which the hit with the worse e-value is removed, provided that their families belong to the same Rfam clan.
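
The two numeric arguments above can be illustrated with a short sketch in base R. The helper names `splitWithOverlap` and `overlapFraction` are hypothetical and are not part of rfaRm; the package's internal splitting and filtering code may differ in detail. The sketch assumes a fragment size of 10000 (taken from the description of fragmentsOverlap) and hit coordinates where the start is not greater than the end.

```r
# Hypothetical helper (not part of rfaRm): split a sequence into fragments of
# at most `fragmentSize` characters, sharing `overlap` characters between
# consecutive fragments.
splitWithOverlap <- function(sequence, fragmentSize = 10000, overlap = 1000) {
  stopifnot(overlap >= 0, overlap < fragmentSize)
  n <- nchar(sequence)
  if (n <= fragmentSize) {
    return(sequence)
  }
  step <- fragmentSize - overlap
  starts <- seq(1, n - overlap, by = step)
  ends <- pmin(starts + fragmentSize - 1, n)
  substring(sequence, starts, ends)
}

# Hypothetical helper (not part of rfaRm): overlap between two hits on the
# query sequence, expressed as a fraction of the shorter hit.
# Assumes start <= end for both hits.
overlapFraction <- function(start1, end1, start2, end2) {
  overlapLength <- max(0, min(end1, end2) - max(start1, start2) + 1)
  shorterLength <- min(end1 - start1 + 1, end2 - start2 + 1)
  overlapLength / shorterLength
}

# Small-scale usage: fragments of 10 bases that share 2 bases
fragments <- splitWithOverlap(paste(rep("ACGU", 7), collapse = ""),
                              fragmentSize = 10, overlap = 2)

# Two hits at positions 1-100 and 51-120 share 50 bases; the shorter hit
# spans 70 bases, so the fraction is 50/70
fractionShared <- overlapFraction(1, 100, 51, 120)
```

With the default clanOverlapThreshold of 0.5, a pair of same-clan hits with the overlap fraction computed above would be candidates for filtering, since 50/70 exceeds 0.5.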

Value

A nested list where each element of the top-level list represents a high-scoring hit with the Rfam families. Each of the top-level list elements is a list in itself, containing the following elements that describe the hit:

rfamAccession

Rfam accession of the Rfam family with which a hit was found

bitScore

Bit score for the hit with an Rfam family

eValue

Expectation value for the hit with an Rfam family

alignmentStartPositionQuerySequence

Start position in the query sequence of the sequence region that resulted in the hit

alignmentEndPositionQuerySequence

End position in the query sequence of the sequence region that resulted in the hit

alignmentStartPositionHitSequence

Start position in the Rfam family consensus sequence region with which the hit was found

alignmentEndPositionHitSequence

End position in the Rfam family consensus sequence region with which the hit was found

alignmentQuerySequence

Sequence region of the query RNA sequence provided to search the Rfam database, aligned with the corresponding region of the consensus sequence of the Rfam family with which the hit was found

alignmentMatch

String describing the matches between the aligned regions of the query sequence and the consensus sequence of the Rfam family

alignmentHitSequence

Sequence region of the consensus sequence of the Rfam family with which the hit was found, aligned to the corresponding region of the query RNA sequence

alignmentSecondaryStructure

Secondary structure of the region of the consensus sequence of the Rfam family with which the hit was found
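
Because the return value is a nested list, it can be convenient to flatten selected fields into a data frame. The following is a minimal sketch, assuming each hit carries the elements documented above; `hitsToDataFrame` is a hypothetical helper, and the mock hits use invented placeholder values rather than real search results.

```r
# Hypothetical helper (not part of rfaRm): collect selected fields of each hit
# into one row of a data frame.
hitsToDataFrame <- function(hits) {
  fields <- c("rfamAccession", "bitScore", "eValue",
              "alignmentStartPositionQuerySequence",
              "alignmentEndPositionQuerySequence")
  rows <- lapply(hits, function(hit) {
    as.data.frame(hit[fields], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Mock hits with the documented structure; all values are placeholders
mockHits <- list(
  list(rfamAccession = "RFxxxx1", bitScore = 60, eValue = 1e-12,
       alignmentStartPositionQuerySequence = 1,
       alignmentEndPositionQuerySequence = 100),
  list(rfamAccession = "RFxxxx2", bitScore = 35, eValue = 1e-05,
       alignmentStartPositionQuerySequence = 40,
       alignmentEndPositionQuerySequence = 110)
)

hitsTable <- hitsToDataFrame(mockHits)

# Order the hits from best to worst e-value
hitsTable <- hitsTable[order(hitsTable$eValue), ]
```

If the search returns numeric fields as character strings, they would need to be converted with `as.numeric` before sorting; whether that is necessary is not stated in this page.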

References

Ioanna Kalvari, Joanna Argasinska, Natalia Quinones-Olvera, Eric P Nawrocki, Elena Rivas, Sean R Eddy, Alex Bateman, Robert D Finn, Anton I Petrov, Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families, Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D335–D342, https://doi.org/10.1093/nar/gkx1038

https://docs.rfam.org/en/latest/api.html

https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/rna_structure_notations.html

Examples

# Search the Rfam database for hits with a specific sequence, and store the
# results in a nested list

searchHits <- rfamSequenceSearch(paste0(
    "GGAUCUUCGGGGCAGGGUGAAAUUCCCGACCGGUGGUAUAGUCCAC",
    "GAAAGUAUUUGCUUUGAUUUGGUGAAAUUCCAAAACCGACAGUAGAGUCUGGAUGAGAGAAGAUUC"))

# Check number of high-scoring hits

length(searchHits)

# Extract the Rfam family accession and ID for the first hit

searchHits[[1]]$rfamAccession
searchHits[[1]]$rfamID
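
# The sequence argument must contain only "A", "U", "G" and "C", with no
# spaces or newlines. A minimal sketch of preparing raw input before a search
# follows. cleanRnaSequence is a hypothetical helper, not part of rfaRm, and
# converting "T" to "U" is an assumption about how DNA-style input should be
# handled.

```r
# Hypothetical helper (not part of rfaRm): strip whitespace, convert to upper
# case, replace T with U, and reject any remaining non-standard symbols.
cleanRnaSequence <- function(rawSequence) {
  cleaned <- toupper(gsub("\\s+", "", rawSequence))
  cleaned <- chartr("T", "U", cleaned)
  if (!grepl("^[AUGC]+$", cleaned)) {
    stop("Sequence contains symbols other than A, U, G and C")
  }
  cleaned
}

# A sequence pasted with a line break and lower-case letters
querySequence <- cleanRnaSequence("ggaucuucgg ggcagg\nGUGAAAUUCC")

# The cleaned string could then be passed on, for example:
# searchHits <- rfamSequenceSearch(querySequence)
```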

LaraSellesVidal/rfaRm documentation built on Aug. 8, 2021, 7:25 p.m.