merge_5_8S: Repair poor 5.8S detection in ITSx results using Infernal...

View source: R/lsux.R

merge_5_8S    R Documentation

Repair poor 5.8S detection in ITSx results using Infernal results.

Description

The 3' end of the 5.8S RNA is quite variable, and it is sometimes not detected by the HMM used in ITSx. Using covariance models (CMs), Infernal is able to delimit 5.8S more reliably. This function uses the results of cmsearch to fill in missing 5.8S annotations in ITSx results. It also updates the end of ITS1 and the beginning of ITS2 to match the new 5.8S boundaries.
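The boundary update described above can be illustrated with a minimal base R sketch. This is not the implementation of merge_5_8S. The helper fill_5_8S and the column names (seq_id, region, start, end, seq_from, seq_to) are assumptions chosen for the illustration, and only the simplest case is covered: a 5.8S row whose positions are missing.

```r
# Toy ITSx-style positions table; column names are assumed for illustration.
pos <- data.frame(
    seq_id = rep("seq1", 3),
    region = c("ITS1", "5_8S", "ITS2"),
    start = c(1L, NA, 301L),
    end = c(150L, NA, 450L),
    stringsAsFactors = FALSE
)
# Toy cmsearch-style hit for the same sequence (hypothetical column names).
hit <- data.frame(seq_id = "seq1", seq_from = 141L, seq_to = 295L)

# Hypothetical helper: fill a missing 5.8S row from a CM hit and make the
# flanking ITS1 and ITS2 regions abut the new boundaries.
fill_5_8S <- function(pos, hit) {
    for (i in seq_len(nrow(hit))) {
        same <- pos$seq_id == hit$seq_id[i]
        missing_5_8S <- same & pos$region == "5_8S" & is.na(pos$start)
        # leave sequences alone when ITSx already found 5.8S
        if (!any(missing_5_8S)) next
        # take the 5.8S boundaries from the covariance-model hit
        pos$start[missing_5_8S] <- hit$seq_from[i]
        pos$end[missing_5_8S] <- hit$seq_to[i]
        # ITS1 ends just before 5.8S; ITS2 starts just after it
        pos$end[same & pos$region == "ITS1"] <- hit$seq_from[i] - 1L
        pos$start[same & pos$region == "ITS2"] <- hit$seq_to[i] + 1L
    }
    pos
}

fixed <- fill_5_8S(pos, hit)
fixed
```

In this toy case the 5.8S row takes its positions from the hit, and the neighbouring regions are adjusted so that the three regions remain contiguous. The real function may apply additional rules that this sketch does not cover.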

Usage

merge_5_8S(itsx_result, csearch_result)

Arguments

itsx_result

(data.frame) "positions" results from itsx

csearch_result

(data.frame) results from cmsearch run on the same set of sequences, using a 5.8S RNA model such as RF00002

Value

A tibble in the same format as the output of itsx, with updated positions for the boundaries of 5.8S.

Examples

# load sample data from inferrnal
seqfile <- system.file(
    file.path("extdata", "sample.fasta"),
    package = "inferrnal"
)
# ITSx has trouble with some of the reads
seq <- Biostrings::readDNAStringSet(seqfile)[c(1,4,20,32,33,43,46,49)]
# the result from ITSx is included as a dataset to avoid a package
# dependency, but this is the code to generate it.
#itsx_result <- rITSx::itsx(
#    in_file = seq,
#    positions = TRUE,
#    complement = FALSE,
#    cpu = 1,
#    read_function = Biostrings::readDNAStringSet
#)
pos <- itsx_result$positions
pos[pos$region == "5_8S",]
# find 5.8S using cmsearch
cm_5_8S <- system.file(
    file.path("extdata", "RF00002.cm"),
    package = "inferrnal"
)
cm_result <- inferrnal::cmsearch(seq, cm = cm_5_8S, cpu = 1)
# combine the results
merge_pos <- merge_5_8S(pos, cm_result)
merge_pos[merge_pos$region == "5_8S",]

brendanf/LSUx documentation built on April 7, 2024, 9:27 p.m.