View source: R/triplex.search.R
The triplex.search function identifies potential intramolecular triplex-forming sequences in DNA.
triplex.search(
dna,
type = 0:7,
min_score = 15,
p_value = 0.05,
min_len = 6,
max_len = 25,
min_loop = 3,
max_loop = 10,
seq_type = 'eukaryotic',
score_table = 'default',
group_table = 'default',
lambda_par = 'default',
lambda_apar = 'default',
mu_par = 'default',
mu_apar = 'default',
rn_par = 'default',
rn_apar = 'default',
dtwist_pen = 'default',
ins_pen = 'default',
iso_pen = 'default',
iso_bonus = 'default',
mis_pen = 'default')
dna |
A DNAString object. |
type |
Vector of triplex types (0..7) to be searched for. |
min_score |
Minimal score threshold. |
p_value |
Acceptable P-value. |
min_len |
Minimal triplex length. |
max_len |
Maximal triplex length. |
min_loop |
Minimal triplex loop length. Cannot be lower than one. |
max_loop |
Maximal triplex loop length. |
seq_type |
Type of input sequence. Possible options: prokaryotic, eukaryotic. |
score_table |
Scoring table for parallel and antiparallel triplex types. Default
is the same as |
group_table |
Isomorphic group table for parallel and antiparallel triplex types. Default
is the same as |
lambda_par |
Lambda for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.8892, for eukaryotic 0.8433. |
lambda_apar |
Lambda for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.8092, for eukaryotic 0.6910. |
mu_par |
Mu for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 7.4805, for eukaryotic 0.8433. |
mu_apar |
Mu for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 7.6569, for eukaryotic 7.9611. |
rn_par |
Hit ratio (reported hits to sequence length) for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.0406, for eukaryotic 0.0304. |
rn_apar |
Hit ratio (reported hits to sequence length) for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.0273, for eukaryotic 0.0405. |
dtwist_pen |
Dtwist penalization, default is 7. |
ins_pen |
Insertion penalization, default is 9. |
iso_pen |
Isomorphic group change penalization, default is 5. |
iso_bonus |
Isomorphic group stay bonus, default is 0. |
mis_pen |
Mismatch penalization, default is 7. |
The triplex.search function identifies potential intramolecular triplex-forming sequences in a DNA sequence represented as a DNAString object. Based on the triplex position (forward or reverse strand) and its third strand orientation, up to 8 types of triplexes are distinguished by the function (see the following figure). By default, the function detects all 8 types; however, this behavior can be changed by setting the type parameter to any value or subset of values in the range 0 to 7.
Detected triplexes are returned as instances of the TriplexViews class, which represents the basic container for storing a set of views on the same input sequence, similarly to the XStringViews object (in fact, TriplexViews only extends the XStringViews class with a number of displayed columns). Each triplex view is defined by start and end locations, width, score, P-value, number of insertions, type, strand, loop start and loop end. Please note that the strand orientation depends on the triplex type only. The triplex.search function assumes that the input DNA sequence represents the forward strand.
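The returned views can be inspected with ordinary accessor functions. The sketch below is an illustration rather than part of the original examples: it assumes that attaching triplex also makes DNAString() available (as the Examples on this page imply) and that the start(), end() and width() accessors inherited from XStringViews apply to TriplexViews. The score() accessor is the one used in the Examples.

```r
library(triplex)  # assumed to make DNAString() available as well

seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")
tv <- triplex.search(seq, min_score = 10, p_value = 1)

# Collect coordinates and scores of the detected triplexes in a data frame
hits <- data.frame(
  start = start(tv),
  end   = end(tv),
  width = width(tv),
  score = score(tv)
)
head(hits)

# Subsetting works as for other views, e.g. keeping only the top-scoring hits
best <- tv[score(tv) == max(score(tv))]
best
```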
Basic requirements for the shape or length of detected triplexes can be defined using four parameters: min_len, max_len, min_loop and max_loop. While min_len and max_len specify the length of the triplex stem composed of individual triplets, min_loop and max_loop define the range of lengths for the unpaired loop at the top of the triplex. A graphical representation of these parameters is shown in the following figure. Please note that these parameters also affect the overall computation time: for longer triplexes, a larger space has to be explored, and thus more computation time is consumed.
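As a rough illustration of the four shape parameters, the following sketch runs the search once with the default limits and once with a narrower range. The narrower values are arbitrary choices made for this example, not recommendations, and the number of hits returned in each case is not asserted here.

```r
library(triplex)  # assumed to make DNAString() available as well

seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

# Default shape limits: stem length 6 to 25, loop length 3 to 10
wide <- triplex.search(seq, min_score = 10, p_value = 1)

# Narrower limits (illustrative values only): shorter stems, shorter loops
narrow <- triplex.search(
  seq,
  min_len = 8, max_len = 12,
  min_loop = 3, max_loop = 5,
  min_score = 10, p_value = 1
)

# Compare how many candidate triplexes each setting reports
c(wide = length(wide), narrow = length(narrow))
```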
The quality of each triplex is represented by its score value. A higher score value represents a higher-quality triplex. This quality is decreased by several types of imperfections at the level of triplets, such as character mismatch, insertion, deletion or isomorphic group change. Penalization constants for these imperfections can be set using the following parameters: mis_pen, ins_pen, iso_pen, iso_bonus and dtwist_pen. Detailed information about the scoring function and penalization parameters can be found in (Lexa et al., 2011). It is highly recommended to consult (Lexa et al., 2011) before changing any penalization parameters.
The triplex.search function can output a large list containing tens of thousands of potential triplexes. The size of these results can be reduced using two filtration mechanisms: (1) by specifying the minimal acceptable score value using the min_score parameter or (2) by specifying the maximum acceptable P-value of results using the p_value parameter. The P-value represents the probability of occurrence of detected triplexes in a random sequence. By default, only triplexes with a P-value less than or equal to 0.05 are reported. Calculation of the P-value depends on two extreme value distribution parameters, lambda and mu. By default, these parameters are set up for searching in human genome sequences. It is highly recommended to consult (Lexa et al., 2011) before changing either of the lambda and mu parameters.
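To give a feel for how lambda and mu shape a P-value, the sketch below evaluates the textbook upper-tail probability of a Gumbel (type I extreme value) distribution in base R. This is an assumption made for illustration only: the exact formula used inside triplex, including how the rn hit ratio enters, is described in (Lexa et al., 2011) and may differ. The helper name gumbel_tail is hypothetical; the lambda and mu values are the documented eukaryotic defaults for antiparallel types.

```r
# Textbook Gumbel upper-tail probability: P(S >= score).
# Illustrative only; not necessarily the formula triplex uses internally.
gumbel_tail <- function(score, lambda, mu) {
  1 - exp(-exp(-lambda * (score - mu)))
}

# Documented eukaryotic defaults for antiparallel triplex types 4 to 7
lambda_apar <- 0.6910
mu_apar <- 7.9611

# Tail probability for a few example scores
scores <- c(10, 15, 20, 25)
p <- gumbel_tail(scores, lambda_apar, mu_apar)
data.frame(score = scores, p_value = signif(p, 3))
```

Under this model the tail probability falls as the score rises, which is why raising min_score and lowering p_value both shrink the result set.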
An instance of the TriplexViews class, which is based on the XStringViews class.
If you modify the penalization options (dtwist_pen, ins_pen, iso_pen, iso_bonus, mis_pen), scoring tables (score_table) or isogroup tables (group_table), you should also consider changing the default P-value constants (lambda, mu and rn) to get relevant P-values.
Matej Lexa, Tomas Martinek, Jiri Hon
Lexa, M., Martinek, T., Burgetova, I., Kopecek, D., Brazdova, M.: A dynamic programming algorithm for identification of triplex-forming sequences, In: Bioinformatics, Vol. 27, No. 18, 2011, Oxford, GB, p. 2510-2517, ISSN 1367-4803
TriplexViews, triplex.score.table, triplex.group.table, triplex.diagram, triplex.3D, triplex.alignment
library(triplex)
# GAA triplet repeats involved in Friedreich's ataxia
seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")
# Search specific triplex types (see details section)
triplex.search(seq, type=c(2,3), min_score=10, p_value=1)
# Search all triplex types
t <- triplex.search(seq, min_score=10, p_value=1)
# Sort triplexes by score
t[order(score(t), decreasing=TRUE)]