triplex.search: Search intramolecular triplex-forming sequences in DNA
In triplex: Search and visualize intramolecular triplex-forming sequences in DNA


The triplex.search function identifies potential intramolecular triplex-forming sequences in DNA.

triplex.search(
  dna, 
  type        = 0:7,
  min_score   = 15,
  p_value     = 0.05,
  min_len     = 6,
  max_len     = 25,
  min_loop    = 3,
  max_loop    = 10,
  seq_type    = 'eukaryotic',
  score_table = 'default',
  group_table = 'default',
  lambda_par  = 'default',
  lambda_apar = 'default',
  mu_par      = 'default',
  mu_apar     = 'default',
  rn_par      = 'default',
  rn_apar     = 'default',
  dtwist_pen  = 'default',
  ins_pen     = 'default',
  iso_pen     = 'default',
  iso_bonus   = 'default',
  mis_pen     = 'default')

`dna`	A `DNAString` object.
`type`	Vector of triplex types (0..7) to be searched for.
`min_score`	Minimal score threshold.
`p_value`	Acceptable P-value.
`min_len`	Minimal triplex length.
`max_len`	Maximal triplex length.
`min_loop`	Minimal triplex loop length. Cannot be lower than one.
`max_loop`	Maximal triplex loop length.
`seq_type`	Type of input sequence. Possible options: 'prokaryotic', 'eukaryotic'.
`score_table`	Scoring table for parallel and antiparallel triplex types. Default is the same as `triplex.score.table` output. Before changing this option, please read `triplex.score.table` help carefully.
`group_table`	Isomorphic group table for parallel and antiparallel triplex types. Default is the same as `triplex.group.table` output. Before changing this option, please read `triplex.group.table` help carefully.
`lambda_par`	Lambda for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.8892, for eukaryotic 0.8433.
`lambda_apar`	Lambda for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.8092, for eukaryotic 0.6910.
`mu_par`	Mu for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 7.4805, for eukaryotic 0.8433.
`mu_apar`	Mu for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 7.6569, for eukaryotic 7.9611.
`rn_par`	Hit ratio (reported hits to sequence length) for parallel triplex types 0,1,2,3. Default for prokaryotic sequence is 0.0406, for eukaryotic 0.0304.
`rn_apar`	Hit ratio (reported hits to sequence length) for antiparallel triplex types 4,5,6,7. Default for prokaryotic sequence is 0.0273, for eukaryotic 0.0405.
`dtwist_pen`	Dtwist penalization, default is 7.
`ins_pen`	Insertion penalization, default is 9.
`iso_pen`	Isomorphic group change penalization, default is 5.
`iso_bonus`	Isomorphic group stay bonus, default is 0.
`mis_pen`	Mismatch penalization, default is 7.

The triplex.search function identifies potential intramolecular triplex-forming sequences in DNA sequence represented as a DNAString object.

Based on the triplex position (forward or reverse strand) and its third strand orientation, the function distinguishes up to 8 types of triplexes (see the following figure). By default, the function detects all 8 types; however, this behavior can be changed by setting the type parameter to any value or a subset of values in the range 0 to 7.

Figure 1: Triplex types

Detected triplexes are returned as instances of the TriplexViews class, which represents the basic container for storing a set of views on the same input sequence, similarly to the XStringViews object (in fact, TriplexViews only extends the XStringViews class with a number of displayed columns). Each triplex view is defined by start and end locations, width, score, P-value, number of insertions, type, strand, loop start and loop end. Please note that the strand orientation depends on the triplex type only. The triplex.search function assumes that the input DNA sequence represents the forward strand.
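As a minimal sketch of inspecting the returned views, the snippet below collects a few columns into a data frame. It relies only on `score()`, which the examples on this page use, and on the `start()`, `end()` and `width()` accessors that views inherit from XStringViews. The variable names `hits` and `summary_df` are illustrative; accessors for the remaining columns (P-value, type, loop positions and so on) are described in the TriplexViews help and are not assumed here.

```r
library(triplex)

# GAA repeat used in the examples on this page
seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

# Relaxed filters so that hits on this short sequence are not discarded
hits <- triplex.search(seq, min_score = 10, p_value = 1)

# Collect a few per-view columns into an ordinary data frame
summary_df <- data.frame(
  start = start(hits),
  end   = end(hits),
  width = width(hits),
  score = score(hits)
)

# Show the best-scoring views first
head(summary_df[order(summary_df$score, decreasing = TRUE), ])
```

Working with a plain data frame can be convenient for filtering or exporting, while the TriplexViews object itself remains the input expected by the visualization functions listed under See Also.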

Basic requirements for the shape or length of detected triplexes can be defined using four parameters: min_len, max_len, min_loop and max_loop. While min_len and max_len specify the length of the triplex stem composed of individual triplets, the min_loop and max_loop parameters define the range of lengths for the unpaired loop at the top of the triplex. A graphical representation of these parameters is shown in the following figure. Please note that these parameters also affect the overall computation time: for longer triplexes, a larger search space has to be explored, so more computation time is consumed.
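The shape parameters can be combined in a single call. The following sketch narrows both the stem and loop ranges relative to the defaults; the particular bounds (stems of 6 to 10 triplets, loops of 3 to 5 bases) and the variable name `short_stems` are chosen purely for illustration.

```r
library(triplex)

seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

# Restrict the search to short stems and short loops.
# The bounds are illustrative, not recommended values.
short_stems <- triplex.search(
  seq,
  min_len   = 6,
  max_len   = 10,
  min_loop  = 3,
  max_loop  = 5,
  min_score = 10,
  p_value   = 1
)

# Number of reported triplexes under these constraints
length(short_stems)
```

Narrower ranges should also reduce the space explored, which may matter on long input sequences.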

Figure 2: Triplex scheme

The quality of each triplex is represented by its score value. A higher score value represents a higher-quality triplex. This quality is decreased by several types of imperfections at the level of triplets, such as character mismatch, insertion, deletion, isomorphic group change etc. Penalization constants for these imperfections can be set up using the following parameters: mis_pen, ins_pen, iso_pen, iso_bonus and dtwist_pen. Detailed information about the scoring function and penalization parameters can be found in (Lexa et al., 2011). It is highly recommended to consult (Lexa et al., 2011) prior to changing any penalization parameters.

The triplex.search function can output a large list containing tens of thousands of potential triplexes. The size of these results can be reduced using two filtration mechanisms: (1) by specifying the minimal acceptable score value using the min_score parameter or (2) by specifying the maximum acceptable P-value of results using the p_value parameter. The P-value represents the probability of occurrence of detected triplexes in a random sequence. By default, only triplexes with a P-value less than or equal to 0.05 are reported. Calculation of the P-value depends on two extreme value distribution parameters, lambda and mu. By default, these parameters are set up for searching in human genome sequences. It is highly recommended to consult (Lexa et al., 2011) prior to changing either of the lambda and mu parameters.
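To give a feel for how lambda and mu shape an extreme value tail, here is a small base R sketch. It assumes the textbook Gumbel (type I extreme value) form P(S >= x) = 1 - exp(-exp(-lambda * (x - mu))). This is an assumption made for illustration only: the helper `gumbel_tail` is hypothetical, it is not part of the triplex package, and it ignores the hit ratio (rn) and the sequence length, so it will not reproduce the P-values reported by triplex.search. The exact calculation is described in (Lexa et al., 2011). The lambda and mu values are the documented prokaryotic defaults for parallel types.

```r
# Hypothetical helper (not part of triplex): upper tail of a Gumbel
# distribution with scale parameter lambda and location parameter mu.
gumbel_tail <- function(score, lambda, mu) {
  # -expm1(-z) equals 1 - exp(-z) but is more accurate for tiny z
  -expm1(-exp(-lambda * (score - mu)))
}

# Documented prokaryotic defaults for parallel triplex types
lambda_par <- 0.8892
mu_par     <- 7.4805

# Tail probability for a few illustrative score values
scores <- c(10, 15, 20, 25)
p <- gumbel_tail(scores, lambda_par, mu_par)
data.frame(score = scores, tail_probability = p)
```

Under this assumed form, the tail probability falls off roughly exponentially as the score rises above mu, which is why a modest increase in min_score can remove a large share of the hits.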

An instance of the TriplexViews class, which is based on the XStringViews class.

If you modify the penalization options (dtwist_pen, ins_pen, iso_pen, iso_bonus, mis_pen), scoring tables (score_table) or isogroup tables (group_table), you should also consider changing the default P-value constants (lambda, mu and rn) to get relevant P-values.

Matej Lexa, Tomas Martinek, Jiri Hon

Lexa, M., Martinek, T., Burgetova, I., Kopecek, D., Brazdova, M.: A dynamic programming algorithm for identification of triplex-forming sequences, In: Bioinformatics, Vol. 27, No. 18, 2011, Oxford, GB, p. 2510-2517, ISSN 1367-4803

TriplexViews, triplex.score.table, triplex.group.table, triplex.diagram, triplex.3D, triplex.alignment

library(triplex)

# GAA triplet repeats involved in Friedreich's ataxia
seq <- DNAString("GAAGAAGAAGAAGAAGAAGAAGAAGAAGAA")

# Search specific triplex types (see details section)
triplex.search(seq, type=c(2,3), min_score=10, p_value=1)

# Search all triplex types
t <- triplex.search(seq, min_score=10, p_value=1)

# Sort triplexes by score
t[order(score(t), decreasing=TRUE)]