lsux | R Documentation |
Extracts alternating variable and conserved domains from the contiguous rDNA regions which form the eukaryotic ribosomal large subunit, i.e. 5.8S RNA, ITS2, and 28S RNA. For the purposes of this document, this region will be referred to as the 32S precursor RNA, as in humans, although its actual size in Svedberg units varies between lineages.
lsux(
  seq,
  cm_5.8S = system.file(file.path("extdata", "RF00002.cm"), package = "inferrnal"),
  cm_32S = system.file(file.path("extdata", "fungi_32S.cm"), package = "LSUx"),
  glocal = TRUE,
  global = FALSE,
  ITS1 = FALSE,
  cpu = NULL,
  mxsize = NULL,
  quiet = TRUE
)
seq |
(single filename readable by
|
cm_5.8S |
(filename) covariance model for 5.8S rRNA |
cm_32S |
(filename) covariance model for 32S pre-rRNA (5.8S, ITS2, and LSU) |
glocal |
( |
global |
( |
ITS1 |
( |
cpu |
( |
mxsize |
( |
quiet |
( |
Input sequences should contain, at a minimum, a significant fraction of the
5.8S RNA, which is used to define the 5' end of 32S. Any base pairs before
the 5' end of 5.8S will be considered to be ITS1 (ITS1 = TRUE) or
discarded (ITS1 = FALSE). Input sequences should not extend past the
end of the 32S model at the 3' end.
LSUx requires two covariance models: one for 5.8S, which is used in
inferrnal::cmsearch(), and one for 32S, which is used in
inferrnal::cmalign().
The 5.8S model can be RF00002 from Rfam (the default), or an equivalent.
It must be calibrated using cmcalibrate from Infernal (e.g., via
inferrnal::cmcalibrate()).
The 32S model must include annotations in the reference line ("#=GC RF" in
the seed alignment) to distinguish conserved and variable regions. The
annotations should be sequential characters in the range "1..9A..Z"
for conserved domains, "v" for variable domains, and "." for
unaligned gaps in the seed alignment.
In the output, the conserved domains will be named "5_8S", "LSU1", "LSU2",
...; the variable domains will be named "ITS2", "V1", "V2", ...
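As a rough illustration of the annotation scheme described above, the following base R sketch splits a reference line into runs of identical characters. The `rf` string is a made-up toy example, not taken from any shipped model, and the code is not how LSUx itself reads the annotation; it only shows how conserved ("1", "2", ...), variable ("v"), and gap (".") columns form contiguous blocks.

```r
# Hypothetical reference line; real models carry this in "#=GC RF".
rf <- "1111vvvvvv2222..22vvv3333"

# Run-length encode the individual annotation characters.
runs <- rle(strsplit(rf, "")[[1]])

# Convert run lengths to 1-based, inclusive column ranges.
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1L

rf_runs <- data.frame(
  symbol = runs$values,
  start = run_start,
  end = run_end,
  stringsAsFactors = FALSE
)
print(rf_runs)
```

Note that a gap run (".") can interrupt a conserved domain, so consecutive runs with the same symbol on either side of a gap (here "2") belong to the same domain.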
Two example models are included, both based on the RDP fungal LSU CM,
and annotated with variable regions according to Raué (1988).
The first, system.file(file.path("extdata", "fungi_32S.cm"), package = "LSUx"),
includes the full LSU region. The second,
system.file(file.path("extdata", "fungi_32S_LR5.cm"), package = "LSUx"),
is truncated at the binding site of the LR5 primer, and
should be faster for input sequences which do not extend past that point.
The seed alignments are also provided.
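Before passing a model path to lsux(), it can help to confirm that the file was actually found. The sketch below assumes the file name used in the example call on this page ("fungi_32S_LR5.cm"); system.file() returns an empty string when the package or file is missing, so the check also works on a machine where LSUx is not installed.

```r
# Look up the truncated model (file name as used in the example call).
cm_path <- system.file(
  file.path("extdata", "fungi_32S_LR5.cm"),
  package = "LSUx"
)

# system.file() returns "" when the package or file cannot be found.
if (nzchar(cm_path)) {
  message("Using covariance model: ", cm_path)
} else {
  message("Model not found; check that LSUx is installed.")
}
```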
If generating similar truncated alignments with different endpoints, it is
critical to remove unpaired secondary structure elements from the
"#=GC SS_cons" line of the seed alignment.
A tibble::tibble with one row for each region found for each input sequence.
The columns are:
seq_id (character) the sequence name from seq
length (integer) the length of the original sequence in base pairs
region (character) the name of the found domain. Can be "5_8S", "ITS2", "LSU1", "V2", "LSU2", "V3", etc.
start (integer) the starting base for that domain in this sequence.
end (integer) as start, but giving the end base for the domain.
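The start and end columns can be used to compute region lengths or to cut regions out of the original sequence. The sketch below uses a hand-built data frame with the same column names and a toy sequence; all values are invented for illustration, and it assumes that start and end are 1-based and inclusive, which this page does not state explicitly.

```r
# Toy stand-in for lsux() output; all values are hypothetical.
regions <- data.frame(
  seq_id = "toy1",
  length = 20L,
  region = c("5_8S", "ITS2", "LSU1"),
  start = c(1L, 9L, 15L),
  end = c(8L, 14L, 20L),
  stringsAsFactors = FALSE
)

# Toy sequence matching the "length" column above.
toy_seq <- "ACGTACGTTTGGCCAACGTA"

# Region width, assuming 1-based inclusive coordinates.
regions$width <- regions$end - regions$start + 1L

# Extract the ITS2 subsequence for this sequence.
its2 <- regions[regions$region == "ITS2", ]
its2_seq <- substring(toy_seq, its2$start, its2$end)
print(its2_seq)
```

With real output, the same subsetting would be applied per seq_id against the matching input sequence.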
# the sample data was amplified with primers ITS1 and LR5, so the truncated
# cm is appropriate.
seq <- system.file("extdata/sample.fasta", package = "inferrnal")
cm_32S_trunc <- system.file(
  file.path("extdata", "fungi_32S_LR5.cm"),
  package = "LSUx"
)
lsux(seq, cm_32S = cm_32S_trunc, ITS1 = TRUE, cpu = 1)