

The `triplex.search` function identifies potential intramolecular
triplex-forming sequences in DNA.

```
triplex.search(
  dna,
  type = 0:7,
  min_score = 15,
  p_value = 0.05,
  min_len = 6,
  max_len = 25,
  min_loop = 3,
  max_loop = 10,
  seq_type = 'eukaryotic',
  score_table = 'default',
  group_table = 'default',
  lambda_par = 'default',
  lambda_apar = 'default',
  mu_par = 'default',
  mu_apar = 'default',
  rn_par = 'default',
  rn_apar = 'default',
  dtwist_pen = 'default',
  ins_pen = 'default',
  iso_pen = 'default',
  iso_bonus = 'default',
  mis_pen = 'default')
```

| Argument | Description |
|---|---|
| `dna` | A DNA sequence represented as a `DNAString` object. |
| `type` | Vector of triplex types (0..7) to be searched for. |
| `min_score` | Minimal score threshold. |
| `p_value` | Acceptable P-value. |
| `min_len` | Minimal triplex length. |
| `max_len` | Maximal triplex length. |
| `min_loop` | Minimal triplex loop length. Cannot be lower than one. |
| `max_loop` | Maximal triplex loop length. |
| `seq_type` | Type of input sequence. Possible options: prokaryotic, eukaryotic. |
| `score_table` | Scoring table for parallel and antiparallel triplex types. Default is the same as |
| `group_table` | Isomorphic group table for parallel and antiparallel triplex types. Default is the same as |
| `lambda_par` | Lambda for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.8892, for eukaryotic 0.8433. |
| `lambda_apar` | Lambda for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.8092, for eukaryotic 0.6910. |
| `mu_par` | Mu for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 7.4805, for eukaryotic 0.8433. |
| `mu_apar` | Mu for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 7.6569, for eukaryotic 7.9611. |
| `rn_par` | Hit ratio (reported hits to sequence length) for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.0406, for eukaryotic 0.0304. |
| `rn_apar` | Hit ratio (reported hits to sequence length) for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.0273, for eukaryotic 0.0405. |
| `dtwist_pen` | Dtwist penalization, default is 7. |
| `ins_pen` | Insertion penalization, default is 9. |
| `iso_pen` | Isomorphic group change penalization, default is 5. |
| `iso_bonus` | Isomorphic group stay bonus, default is 0. |
| `mis_pen` | Mismatch penalization, default is 7. |

The `triplex.search` function identifies potential intramolecular
triplex-forming sequences in a DNA sequence represented as
a `DNAString` object.

Based on triplex position (forward or reverse strand) and its third strand
orientation, up to 8 types of triplexes are distinguished by the function (see
the following figure). By default, the function detects all 8 types; this
behavior can be changed by setting the `type` parameter to any single value
or subset of values in the range 0 to 7.

Detected triplexes are returned as instances of the `TriplexViews` class,
which is the basic container for storing a set of views on the same
input sequence, similar to the `XStringViews` object (in fact,
`TriplexViews` only extends the `XStringViews` class
with a number of displayed columns). Each triplex view is defined by its start
and end locations, width, score, P-value, number of insertions, type, strand, loop
start and loop end. Note that the strand orientation depends on the
triplex type only. The `triplex.search` function assumes that the input
DNA sequence represents the forward strand.
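Because `TriplexViews` extends `XStringViews`, the standard view accessors should apply to the result. The following is a minimal sketch, assuming the `triplex` package is installed and reusing the GAA repeat from the Examples; the variable names `seq`, `hits` and `tab` are illustrative only.

```r
library(triplex)

# Short GAA repeat, searched with relaxed filters so that hits are reported
seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")
hits <- triplex.search(seq, min_score = 10, p_value = 1)

# start(), end() and width() are the usual view accessors;
# score() is the accessor used in the Examples
tab <- data.frame(
  start = start(hits),
  end   = end(hits),
  width = width(hits),
  score = score(hits)
)
head(tab)
```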

Basic requirements for the shape or length of detected triplexes can be
defined using four parameters: `min_len`, `max_len`, `min_loop` and
`max_loop`. While `min_len` and `max_len` specify the length of the triplex
stem composed of individual triplets, the `min_loop` and `max_loop`
parameters define the range of lengths for the unpaired loop at the top of
the triplex. A graphical representation of these parameters is shown in the
following figure. Note that these parameters also affect the overall
computation time: for longer triplexes, a larger space has to be explored and
thus more computation time is consumed.
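As a rough illustration of the shape parameters, the call below narrows both the stem and the loop ranges relative to the defaults. This is only a sketch, assuming the `triplex` package is installed; the chosen limits are arbitrary illustrative values, not recommendations.

```r
library(triplex)

seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

# Restrict stems to 8..20 triplets and loops to 3..6 nucleotides;
# a narrower search space should also reduce computation time
hits <- triplex.search(
  seq,
  min_len = 8, max_len = 20,
  min_loop = 3, max_loop = 6,
  min_score = 10, p_value = 1
)
hits
```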

The quality of each triplex is represented by its score value. A higher score
value represents a higher-quality triplex. This quality is decreased by several
types of imperfections at the level of triplets, such as character mismatch,
insertion, deletion, isomorphic group change etc. Penalization constants for
these imperfections can be set using the following parameters: `mis_pen`,
`ins_pen`, `iso_pen`, `iso_bonus` and `dtwist_pen`. Detailed information
about the scoring function and penalization parameters can be found in
(Lexa et al., 2011). It is highly recommended to consult (Lexa et al., 2011)
prior to changing any penalization parameters.

The `triplex.search` function can output a large list containing tens of
thousands of potential triplexes. The size of these results can be reduced
using two filtration mechanisms: (1) by specifying the minimal acceptable
score value using the `min_score` parameter or (2) by specifying the
maximum acceptable P-value of results using the `p_value` parameter. The
P-value represents the probability of occurrence of detected triplexes in
a random sequence. By default, only triplexes with a P-value equal to or less
than 0.05 are reported. Calculation of the P-value depends on two extreme
value distribution parameters, `lambda` and `mu`. By default, these
parameters are set up for searching in human genome sequences. It is highly
recommended to consult (Lexa et al., 2011) prior to changing either of the
`lambda` and `mu` parameters.
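To give a feel for how `lambda` and `mu` shape a P-value, the sketch below evaluates the upper tail of a Gumbel (type I extreme value) distribution in base R. It is an assumption-laden illustration: the helper `evd_pvalue` is hypothetical, the hit-ratio constants (`rn_par`, `rn_apar`) are ignored, and the exact formula used inside the package may differ (see Lexa et al., 2011). The constants are the documented prokaryotic defaults for parallel triplex types.

```r
# Hypothetical helper: Gumbel upper-tail probability P(S >= score).
# Assumption: illustrative only, not the package's internal computation.
evd_pvalue <- function(score, lambda, mu) {
  # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x
  -expm1(-exp(-lambda * (score - mu)))
}

# Documented prokaryotic defaults for parallel triplex types
lambda_par <- 0.8892
mu_par <- 7.4805

scores <- c(10, 15, 20)
p <- evd_pvalue(scores, lambda_par, mu_par)
data.frame(score = scores, p_value = signif(p, 3))
```

Under this model the tail probability falls roughly exponentially as the score grows, which is why a modest increase in `min_score` can remove a large share of hits.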

The function returns an instance of the `TriplexViews` class, which is based
on the `XStringViews` class.

If you modify the penalization options (`dtwist_pen`, `ins_pen`,
`iso_pen`, `iso_bonus`, `mis_pen`), scoring tables (`score_table`)
or isogroup tables (`group_table`), you should also consider changing the
default P-value constants (`lambda`, `mu` and `rn`) to get relevant P-values.
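The following sketch shows how penalization options and P-value constants can be passed together in one call, assuming the `triplex` package is installed. The numbers simply restate the defaults documented in the Arguments section for prokaryotic sequences; they are placeholders, and constants recalibrated for modified penalties would have to be estimated separately.

```r
library(triplex)

seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

hits <- triplex.search(
  seq,
  seq_type = 'prokaryotic',
  # Penalization options (documented defaults)
  dtwist_pen = 7, ins_pen = 9, iso_pen = 5, iso_bonus = 0, mis_pen = 7,
  # Matching P-value constants (documented prokaryotic defaults);
  # replace all of these together if the penalties above are changed
  lambda_par = 0.8892, lambda_apar = 0.8092,
  mu_par = 7.4805, mu_apar = 7.6569,
  rn_par = 0.0406, rn_apar = 0.0273,
  min_score = 10, p_value = 1
)
hits
```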

Matej Lexa, Tomas Martinek, Jiri Hon

Lexa, M., Martinek, T., Burgetova, I., Kopecek, D., Brazdova, M.: *A
dynamic programming algorithm for identification of triplex-forming
sequences*, In: Bioinformatics, Vol. 27, No. 18, 2011, Oxford, GB, p.
2510-2517, ISSN 1367-4803

`TriplexViews`, `triplex.score.table`, `triplex.group.table`,
`triplex.diagram`, `triplex.3D`, `triplex.alignment`

```
library(triplex)

# GAA triplet repeats involved in Friedreich's ataxia
seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")
# Search specific triplex types (see details section)
triplex.search(seq, type=c(2,3), min_score=10, p_value=1)
# Search all triplex types
t <- triplex.search(seq, min_score=10, p_value=1)
# Sort triplexes by score
t[order(score(t), decreasing=TRUE)]
```
