G4iMGrinder: Detect and analyze possible G-quadruplex, i-Motifs and their...


View source: R/G4iM.Grinder.Funs.R

Description

A function to detect and analyse quadruplex sequences in a genome. G4iM Grinder can be used to identify, characterize and score possible G-quadruplexes, i-Motifs and higher-order structures, where the score reflects the probability of in vitro formation and biological relevance. The search algorithm is highly configurable at every step of the process.

Usage

G4iMGrinder(Name, Sequence, DNA = TRUE, Complementary = TRUE, RunComposition = "G", BulgeSize = 1, MaxIL = 3, MaxRunSize = 5, MinRunSize = 3, MinNRuns = 4,
MaxNRuns = 0, MaxPQSSize = 33, MinPQSSize = 15, MaxLoopSize = 10, MinLoopSize = 0, LoopSeq = c("G", "T", "A", "C"), Method2 = TRUE, Method3 = TRUE, G4hunter = TRUE,
cGcC = FALSE, PQSfinder = TRUE, Bt = 14, Pb = 17, Fm = 3, Em = 1, Ts = 4, Et = 1, Is = -19, Ei = 1, Ls = -16, ET = 1, WeightParameters = c(50, 50, 0),
FreqWeight = 0, KnownQuadruplex = TRUE, KnownNOTQuadruplex = FALSE,  RunFormula = FALSE, NCores = 1, Verborrea = TRUE)

Arguments

Name

character, name of the DNA or RNA sequence to grind.

Sequence

character, DNA or RNA sequence to grind composed of the nucleotide arrangement.

DNA

logical, controls if the sequence is DNA or RNA. The factory-fresh default is TRUE assuming the sequence is DNA.

Complementary

logical, controls if the Complementary strand should be created and analyzed. The factory-fresh default is TRUE.
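
As a rough illustration of why the complementary strand matters, the sketch below builds a reverse complement in base R. The helper `reverse_complement` is hypothetical and not part of G4iMGrinder; this page does not state how the package builds the strand internally. It only shows that C-runs on one strand read as G-runs on the other.

```r
# Hypothetical helper, not part of G4iMGrinder.
reverse_complement <- function(seq, dna = TRUE) {
  seq <- toupper(seq)
  # Swap each base for its pairing partner (T for DNA, U for RNA).
  comp <- if (dna) chartr("ACGT", "TGCA", seq) else chartr("ACGU", "UGCA", seq)
  # Reverse the string so the result reads 5' to 3'.
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# C-rich input on the "+" strand becomes G-rich on the "-" strand.
reverse_complement("CCCTAACCCTAA")
```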

RunComposition

character, nucleotide that composes the runs. Use RunComposition = "G" for G-quadruplexes and RunComposition = "C" for i-Motifs. Any other nucleotide (or letter) can be input. The factory-fresh default is "G".

BulgeSize

integer, number of acceptable non-RunComposition nucleotides to exist within runs. The factory-fresh default is 1. Used by M1.

MaxRunSize

integer, max. number of RunComposition-nucleotides that compose a run. The factory-fresh default is 5. Used by M2.

MinRunSize

integer, min. number of RunComposition-nucleotides that compose a run. The factory-fresh default is 3. Used by M1.

MaxLoopSize

integer, max. number of nucleotides that may exist between runs to assume relationship. The factory-fresh default is 10. Used by M2 and M3.

MinLoopSize

integer, min. number of nucleotides that may exist between runs to assume relationship. The factory-fresh default is 0. Used by M2 and M3.

MaxNRuns

integer, max. number of runs that compose a structure. The factory-fresh default is 0. When MaxNRuns < MinNRuns, G4iM Grinder ignores MaxNRuns in the search algorithm, which allows it to look for structures with more than the traditional 4 runs. Used by M2.

MinNRuns

integer, min. number of runs that compose a structure. The factory-fresh default is 4. Used by M2 and M3.

MaxPQSSize

integer, max. number of nucleotides that compose a structure. The factory-fresh default is 33. Used by M2.

MinPQSSize

integer, min. number of nucleotides that compose a structure. The factory-fresh default is 15. Used by M2 and M3.

MaxIL

integer, total number of nucleotides allowed to exist in between all the RunComposition runs of a structure. The factory-fresh default is 3. Used by M2.
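
To make the run, loop and size limits above concrete, here is a minimal base R sketch that tests one candidate sequence against them. The helper `check_candidate` is hypothetical: it ignores bulges (BulgeSize, MaxIL) and MaxNRuns, and it is not the package's search algorithm. It only shows how the thresholds relate to each other.

```r
# Hypothetical helper, not part of G4iMGrinder: no bulges, one candidate at a time.
check_candidate <- function(seq, run = "G", MinRunSize = 3, MaxRunSize = 5,
                            MinNRuns = 4, MinLoopSize = 0, MaxLoopSize = 10,
                            MinPQSSize = 15, MaxPQSSize = 33) {
  # Locate perfect runs of the chosen nucleotide, e.g. "G{3,5}".
  pattern <- sprintf("%s{%d,%d}", run, MinRunSize, MaxRunSize)
  m <- gregexpr(pattern, seq)[[1]]
  if (m[1] == -1) return(FALSE)
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  # Loop length = nucleotides between the end of one run and the start of the next.
  loops <- starts[-1] - ends[-length(ends)] - 1L
  length(starts) >= MinNRuns &&
    all(loops >= MinLoopSize & loops <= MaxLoopSize) &&
    nchar(seq) >= MinPQSSize && nchar(seq) <= MaxPQSSize
}

check_candidate("GGGTTAGGGTTAGGGTTAGGG")  # four runs, 21 nt
check_candidate("GGGTTAGGGTTAGGG")        # only three runs
```

With the default thresholds, the first call passes every check and the second fails on MinNRuns.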

Method2

logical, to apply method 2 (M2A) of analysis to the sequence search results. This will search for structures with a defined size and number of runs. Depends on variables: MinNRuns, MinPQSSize, MaxNRuns and MaxPQSSize. Will also give frequency counts of each structure detected (M2B). The factory-fresh default is TRUE.

Method3

logical, to apply method 3 (M3A) to the sequence search results. This will search for structures with unrestricted size and number of runs, which is useful when searching for higher-order structures. Depends on variables: MinNRuns and MinPQSSize. Will also give frequency counts of each structure detected (M3B). The factory-fresh default is TRUE.

LoopSeq

character, vector that defines which nucleotides and/or nucleotide patterns to quantify in each structure detected. The factory-fresh default is c("G", "T", "A", "C"), but multi-character patterns such as "GGG" are also accepted.

WeightParameters

numeric vector of three values, where each value is the weight given to one scoring system: G4hunter, PQSfinder and cGcC (in that order). A weight is only used if its scoring system is set to TRUE; otherwise it is forced to 0. The factory-fresh default is c(50, 50, 0), where the final score of a structure is hence the average of the G4hunter and PQSfinder scores. The cGcC weight should always be 0, as its scoring system is not in the 0-100 range.
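
The averaging described above can be sketched with a plain weighted mean. This is an assumption about how the weights combine, not the package's actual formula, and the scores are invented for illustration.

```r
# Hypothetical scores for one structure (not real G4iMGrinder output).
scores  <- c(G4hunter = 60, PQSfinder = 40, cGcC = 0)
weights <- c(50, 50, 0)  # same order as WeightParameters

# A weighted mean divides by the sum of the weights, so only their ratio matters:
# equal weights on G4hunter and PQSfinder give the plain average of the two.
final_score <- weighted.mean(scores, weights)
final_score
```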

G4hunter

logical, to apply G4hunter algorithm as a scoring mechanism of in vitro probability of formation. The factory-fresh default is TRUE.

cGcC

logical, to apply cGcC algorithm as a scoring mechanism of in vitro probability of formation. Only for RNA sequences. The factory-fresh default is FALSE.

PQSfinder

logical, to apply an adaptation of PQSfinder algorithm as a scoring mechanism of in vitro probability of formation. The factory-fresh default is TRUE.

Bt

integer, tetrad stacking bonus constant used for the PQSfinder adaptation calculations. The factory-fresh default is 14.

Pb

integer, inter-Loop penalization constant used for the PQSfinder adaptation calculations. The factory-fresh default is 17.

Fm

integer, loop length penalization constant used for the PQSfinder adaptation calculations. The factory-fresh default is 3.

Em

integer, loop length exponential constant used for the PQSfinder adaptation calculations. The factory-fresh default is 1.

Ts

integer, tetrad supplement constant used for the PQSfinder adaptation calculations. The factory-fresh default is 4.

Et

integer, tetrad exponential constant used for the PQSfinder adaptation calculations. The factory-fresh default is 1.

Is

integer, inter-Loop supplement constant used for the PQSfinder adaptation calculations. The factory-fresh default is -19.

Ei

integer, inter-Loop exponential constant used for the PQSfinder adaptation calculations. The factory-fresh default is 1.

Ls

integer, loop supplement constant used for the PQSfinder adaptation calculations. The factory-fresh default is -16.

ET

integer, total formula exponential constant used for the PQSfinder adaptation calculations. The factory-fresh default is 1.

KnownQuadruplex

logical, controls if G4iM Grinder should compare the results with a list of known sequences that have already been demonstrated to form in vitro. Only for RunComposition = "G" or RunComposition = "C". The factory-fresh default is TRUE.

KnownNOTQuadruplex

logical, controls if G4iM Grinder should compare the results with a list of known sequences that have already been demonstrated to NOT form in vitro. Only for RunComposition = "G" or RunComposition = "C". The factory-fresh default is FALSE.

FreqWeight

integer, an arbitrary constant that sets the importance of structure frequency. Useful only for M2B and M3B, where the frequency of each structure is calculated and a new score is computed that takes it into account. The factory-fresh default is 0.

RunFormula

logical, should the formula of the PQS be calculated. The factory-fresh default is FALSE.

NCores

integer, number of cores to give to the function for parallel computation. The factory-fresh default is 1.

Verborrea

logical, allow the function to update the user with its progress. The factory-fresh default is TRUE.

Value

The result of G4iM Grinder is a list with the following elements:

Configuration

A data.frame with the variables and configuration used by G4iM Grinder.

FunTime

A data.frame with the time taken by each part of G4iM Grinder.

PQSM2a

A data.frame with the results found using M2A (method 2a). Only if Method2 == TRUE.

PQSM2b

A data.frame with the results found using M2B considering frequency (method 2b). Only if Method2 == TRUE.

PQSM3a

A data.frame with the results found using M3A (method 3a). Only if Method3 == TRUE.

PQSM3b

A data.frame with the results found using M3B considering frequency (method 3b). Only if Method3 == TRUE.
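
The relationship between the "a" and "b" tables can be pictured as collapsing identical sequences and counting them. The sketch below does this in base R on a made-up table; the data frame `m2a` and its values are hypothetical, and real result tables carry more columns.

```r
# Hypothetical M2A-like results (values are invented for illustration).
m2a <- data.frame(
  Start    = c(10L, 250L, 900L, 1500L),
  Sequence = c("GGGTTAGGGTTAGGGTTAGGG", "GGGAGGGAGGGAGGG",
               "GGGTTAGGGTTAGGGTTAGGG", "GGGTTAGGGTTAGGGTTAGGG"),
  stringsAsFactors = FALSE
)

# Collapse identical sequences and count how often each one appears.
freq <- as.data.frame(table(Sequence = m2a$Sequence),
                      responseName = "Freq", stringsAsFactors = FALSE)

# Most frequent sequences first.
m2b <- freq[order(-freq$Freq), ]
m2b
```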

Column Meanings

Start: integer, start position of the sequence in the genome. Only for M2A and M3A.

Finish: integer, end position of the sequence in the genome. Only for M2A and M3A.

Freq: integer, sequence frequency of appearance in the genome. Only for M2B and M3B.

Runs: integer, number of runs (G-runs for PQS, C-runs for PiQS) in the sequence.

IL: integer, number of bulges in the sequence.

mRun: numeric, average run size.

Sequence: character, sequence nucleotide arrangement.

Length: integer, size in nucleotides of the sequence.

Strand: character, strand position of the sequence. "+" is the original and "-" is the complementary strand. Only if Complementary = TRUE.

G4Hunter: numeric, sequence score by G4Hunter. Only if G4hunter = TRUE.

pqsfinder: numeric, sequence score by G4iM Grinder's adaptation of PQSfinder. Only if PQSfinder = TRUE.

cGcC: numeric, sequence score by G4iM Grinder's implementation of cGcC. Only if cGcC = TRUE.

Score: numeric, sequence score combining all selected scores and sequence frequency.

Conf.Quad.Seqs: character, name and number of times found (in parentheses) of known-to-form quadruplexes in the sequence. DNA known-to-form structures have an asterisk (*) after the number of times detected. RNA known-to-form structures have a circumflex (^) after the number of times detected. Only if KnownQuadruplex = TRUE.

Conf.NOT.Quad.Seqs: character, name and number of times found (in parentheses) of known-NOT-to-form quadruplexes in the sequence. DNA known-NOT-to-form structures have an asterisk (*) after the number of times detected. RNA known-NOT-to-form structures have a circumflex (^) after the number of times detected. Only if KnownNOTQuadruplex = TRUE.
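
Once a results table is available, the columns above can be used to narrow it down with ordinary data.frame operations. The sketch below uses a made-up table named `res`; the values and the cut-off of 40 are arbitrary assumptions chosen only for illustration.

```r
# Hypothetical rows shaped like a results table (values are invented).
res <- data.frame(
  Start  = c(12L, 340L, 981L),
  Finish = c(32L, 360L, 1004L),
  Strand = c("+", "-", "+"),
  Score  = c(62, 35, 48),
  stringsAsFactors = FALSE
)

# Keep candidates on the original strand that score at least 40, best first.
hits <- subset(res, Strand == "+" & Score >= 40)
hits <- hits[order(hits$Score, decreasing = TRUE), ]
hits
```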

Note

M1 is Method 1. M2 is Method 2. M3 is Method 3.

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche,E. and Morales,J.C. (2019) G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2.

Examples

  library(G4iMGrinder)

  #Retrieving a Sequence
  # Install seqinr if it is missing, then load it so read.fasta() is available.
  if (!require("seqinr")) { install.packages("seqinr"); library(seqinr) }
  Name <- "LmajorESTs"
  Sequence <-
    paste0(read.fasta(file = url("http://tritrypdb.org/common/downloads/release-36/Lmajor/fasta/TriTrypDB-36_Lmajor_ESTs.fasta"),
                      as.string = TRUE, legacy.mode = TRUE, seqonly = TRUE, strip.desc = TRUE, seqtype = "DNA" ), collapse = "")

  #For G-quadruplex search in DNA.
  Rs <- G4iMGrinder(Name = Name, Sequence = Sequence)

  #For G-quadruplex search in RNA.
  Rs <- G4iMGrinder(Name = Name, Sequence = Sequence, DNA = FALSE, cGcC = TRUE)

  #For i-Motifs search in DNA.
  Rs <- G4iMGrinder(Name = Name, Sequence = Sequence, RunComposition = "C")

  #For flexible G-quadruplex search in DNA.
  Rs <- G4iMGrinder(Name = Name, Sequence = Sequence, BulgeSize = 2, MaxLoopSize = 20, MaxIL = 10)

  #Visualization of Results
  View(Rs$PQSM2a)  # To view M2A (Method2a) results. Size dependent structures with overlapping.
  View(Rs$PQSM2b)  # To view M2B (Method2b) results. Method2 with frequency considerations.
  View(Rs$PQSM3a)  # To view M3A (Method3a) results. Size independent structures without overlapping.
  View(Rs$PQSM3b)  # To view M3B (Method3b) results. Method3 with frequency considerations.

EfresBR/G4iMGrinder documentation built on June 11, 2021, 2:57 a.m.