G4iMGrinder (R Documentation)

Detect and analyze potential G-quadruplexes, i-Motifs, and higher-order structures in DNA or RNA sequences

Description

G4iM Grinder is a configurable search and characterization tool that detects and analyzes sequences capable of forming G-quadruplexes (Potential Quadruplex Sequences, PQSs), i-Motifs (Potential i-Motif Sequences, PiMS), or other higher-order quadruplex-like structures in DNA or RNA. It offers several search methods, each adjustable through the arguments below, so that a search can be tailored to specific criteria. Results can also be checked against sequences known to form, or known not to form, quadruplexes. The output includes both the raw findings and frequency-weighted summaries.

Usage

G4iMGrinder(
  Name,
  Sequence,
  DNA = TRUE,
  Complementary = TRUE,
  RunComposition = "G",
  BulgeSize = 1,
  MaxIL = 3,
  MaxRunSize = 5,
  MinRunSize = 3,
  MinNRuns = 4,
  MaxNRuns = 0,
  MaxPQSSize = 33,
  MinPQSSize = 15,
  MaxLoopSize = 10,
  MinLoopSize = 0,
  LoopSeq = c("G", "T", "A", "C"),
  Method2 = TRUE,
  Method3 = TRUE,
  G4hunter = TRUE,
  cGcC = FALSE,
  PQSfinder = TRUE,
  Bt = 14,
  Pb = 17,
  Fm = 3,
  Em = 1,
  Ts = 4,
  Et = -19,
  Is = -16,
  Ei = 1,
  Ls = 1,
  ET = 1,
  WeightParameters = c(0.5, 0.5, 0),
  FreqWeight = 0,
  KnownQuadruplex = TRUE,
  KnownNOTQuadruplex = FALSE,
  RunFormula = FALSE,
  NCores = 1,
  Verborrea = TRUE
)

Arguments

Name

character. Name of the DNA or RNA sequence under analysis.

Sequence

character. The nucleotide sequence to be examined. Must be composed of valid DNA or RNA bases.

DNA

logical. Indicates whether Sequence is DNA (TRUE) or RNA (FALSE). Defaults to TRUE.

Complementary

logical. If TRUE, the complementary strand is also generated and analyzed alongside the original strand. Defaults to TRUE.

RunComposition

character. Nucleotide(s) used to define the “runs.” Typically "G" for G-quadruplex search or "C" for i-Motif search. Defaults to "G".

BulgeSize

integer. Number of allowed non-RunComposition nucleotides within a run (used by M1). Defaults to 1.

MaxIL

integer. Total number of additional nucleotides allowed between runs (used by M2). Defaults to 3.

MaxRunSize

integer. Maximum length of a run (used by M2). Defaults to 5.

MinRunSize

integer. Minimum length of a run (used by M1). Defaults to 3.

MinNRuns

integer. Minimum number of runs required to form a structure (used by M2 and M3). Defaults to 4.

MaxNRuns

integer. Maximum number of runs that can compose a structure (used by M2). Defaults to 0, which disables the upper limit on run count.

MaxPQSSize

integer. Maximum total length of a putative quadruplex structure (used by M2). Defaults to 33.

MinPQSSize

integer. Minimum total length of a putative quadruplex structure (used by M2 and M3). Defaults to 15.

MaxLoopSize

integer. Maximum number of nucleotides allowed in each loop (used by M2 and M3). Defaults to 10.

MinLoopSize

integer. Minimum number of nucleotides allowed in each loop (used by M2 and M3). Defaults to 0.

LoopSeq

character vector. Defines the nucleotide(s) or pattern(s) to measure or highlight within detected structures. Defaults to c("G", "T", "A", "C").

Method2

logical. If TRUE, enables Method 2 (M2), which searches for size-defined structures and computes frequency (M2A and M2B). Defaults to TRUE.

Method3

logical. If TRUE, enables Method 3 (M3), which searches for size-unrestricted structures and computes frequency (M3A and M3B). Defaults to TRUE.

G4hunter

logical. If TRUE, applies the G4Hunter scoring system. Defaults to TRUE.

cGcC

logical. If TRUE, applies the cGcC scoring system (valid for RNA). Defaults to FALSE.

PQSfinder

logical. If TRUE, applies an adaptation of the PQSfinder scoring system. Defaults to TRUE.

Bt

integer. Tetrad stacking bonus for PQSfinder calculations. Defaults to 14.

Pb

integer. Inter-Loop penalization constant for PQSfinder calculations. Defaults to 17.

Fm

integer. Loop length penalization constant for PQSfinder calculations. Defaults to 3.

Em

integer. Loop length exponential constant for PQSfinder calculations. Defaults to 1.

Ts

integer. Tetrad supplement constant for PQSfinder calculations. Defaults to 4.

Et

integer. Inter-Loop supplement constant for PQSfinder calculations. Defaults to -19.

Is

integer. Loop supplement constant for PQSfinder calculations. Defaults to -16.

Ei

integer. Tetrad exponential constant for PQSfinder calculations. Defaults to 1.

Ls

integer. Inter-Loop exponential constant for PQSfinder calculations. Defaults to 1.

ET

integer. Total formula exponential constant for PQSfinder calculations. Defaults to 1.

WeightParameters

numeric vector of length 3. Weights for combining G4hunter, PQSfinder, and cGcC scores (in that order). Defaults to c(0.5, 0.5, 0), producing an average of the first two.

FreqWeight

numeric. Weight factor for incorporating structure frequency in the final score (relevant to M2B and M3B). Defaults to 0.

KnownQuadruplex

logical. If TRUE, matches results against known sequences that have been shown to form G-quadruplexes or i-Motifs in vitro. Defaults to TRUE.

KnownNOTQuadruplex

logical. If TRUE, matches results against known sequences shown not to form quadruplexes. Defaults to FALSE.

RunFormula

logical. If TRUE, calculates and reports a symbolic formula for each detected PQS. Defaults to FALSE.

NCores

integer. Number of cores to use for parallel processing. Defaults to 1.

Verborrea

logical. If TRUE, prints verbose messages about progress. Defaults to TRUE.
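
As a rough illustration of how run-size, loop-size and run-count limits shape a search, the sketch below matches the classic "folding rule" pattern (four runs of at least three identical bases, separated by loops of 1 to 7 nucleotides) with a base-R regular expression. It is not the package's implementation: the function name find_pqs_sketch and its arguments are hypothetical, its defaults are the classic literature values rather than the defaults listed above, and it reports only non-overlapping matches without bulges.

```r
# Hypothetical helper: regex-based search for runs separated by loops.
# run      : base that forms the runs ("G" for G-quadruplexes, "C" for i-Motifs)
# min_run  : minimum run length
# min_loop, max_loop : allowed loop length between consecutive runs
# n_runs   : number of runs required
find_pqs_sketch <- function(seq, run = "G", min_run = 3,
                            min_loop = 1, max_loop = 7, n_runs = 4) {
  seq <- toupper(seq)
  pattern <- sprintf("%s{%d,}(?:[ACGTU]{%d,%d}%s{%d,}){%d}",
                     run, min_run, min_loop, max_loop, run, min_run,
                     n_runs - 1)
  hits <- gregexpr(pattern, seq, perl = TRUE)[[1]]
  if (hits[1] == -1) {
    # No match: return an empty table with the same columns.
    return(data.frame(Start = integer(0), Finish = integer(0),
                      Sequence = character(0), stringsAsFactors = FALSE))
  }
  start <- as.integer(hits)
  finish <- start + attr(hits, "match.length") - 1L
  data.frame(Start = start, Finish = finish,
             Sequence = substring(seq, start, finish),
             stringsAsFactors = FALSE)
}

# Toy input containing one telomere-like repeat block.
find_pqs_sketch("TTGGGTTAGGGTTAGGGTTAGGGAA")
```

Passing run = "C" gives the analogous i-Motif pattern. Because a regular expression returns a single, non-overlapping reading of each region, it cannot enumerate alternative or overlapping candidates, which is one reason dedicated search methods exist.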

Value

A list containing:

Configuration

A data.frame of the parameters used in the run.

FunTime

A data.frame with timing information for each step.

PQSM2a

A data.frame of the M2A (size-defined) results, if Method2 = TRUE.

PQSM2b

A data.frame of the M2B (frequency-weighted) results, if Method2 = TRUE.

PQSM3a

A data.frame of the M3A (unrestricted-size) results, if Method3 = TRUE.

PQSM3b

A data.frame of the M3B (frequency-weighted) results, if Method3 = TRUE.
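
To make the relationship between a positional table (such as PQSM2a) and a frequency table (such as PQSM2b) more concrete, the sketch below collapses a small hand-made data.frame by sequence and counts occurrences. The data are invented for illustration, the helper collapse_by_sequence is hypothetical, and the package's own procedure may differ.

```r
# Hand-made stand-in for a positional results table (illustrative values only).
hits <- data.frame(
  Start    = c(10L, 250L, 900L),
  Finish   = c(30L, 270L, 914L),
  Sequence = c("GGGTTAGGGTTAGGGTTAGGG",
               "GGGTTAGGGTTAGGGTTAGGG",
               "GGGAGGGTGGGAGGG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per distinct sequence, with its frequency.
collapse_by_sequence <- function(tab) {
  counts <- table(tab$Sequence)
  out <- data.frame(Sequence = names(counts),
                    Freq = as.integer(counts),
                    stringsAsFactors = FALSE)
  out$Length <- nchar(out$Sequence)
  # Most frequent sequences first; ties broken alphabetically.
  out[order(-out$Freq, out$Sequence), , drop = FALSE]
}

freq_tab <- collapse_by_sequence(hits)
print(freq_tab)
```

Positional information (Start, Finish) is lost in the collapsed table, which matches the column descriptions: positions are reported for the "a" tables and Freq for the "b" tables.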

Column Meanings

Start

Integer. Start position in Sequence (for M2A/M3A).

Finish

Integer. End position in Sequence (for M2A/M3A).

Freq

Integer. Frequency of occurrence (for M2B/M3B).

Runs

Integer. Number of runs (e.g., G-runs in a G-quadruplex).

IL

Integer. Number of bulges or irregularities.

mRun

Numeric. Average run size.

Sequence

Character. The identified motif sequence.

Length

Integer. Total length of the identified structure.

Strand

Character. Indicates “+” (original) or “-” (complementary) strand, if Complementary = TRUE.

G4Hunter

Numeric. Score assigned by the G4Hunter algorithm (if G4hunter = TRUE).

pqsfinder

Numeric. Score from the PQSfinder adaptation (if PQSfinder = TRUE).

cGcC

Numeric. Score from the cGcC algorithm (if cGcC = TRUE).

Score

Numeric. Combined overall score, integrating all selected scoring methods plus frequency weighting.

Conf.Quad.Seqs

Character. Known quadruplex-forming sequences detected, with counts. DNA hits have “*” after the count; RNA hits have “^”.

Conf.NOT.Quad.Seqs

Character. Known non-quadruplex sequences detected, with counts. DNA hits have “*” after the count; RNA hits have “^”.
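
For orientation on what the G4Hunter column measures, the published G4Hunter scheme scores every G by the length of the G-run it belongs to (capped at 4), every C by the negative of its C-run length (capped at -4), and all other bases as 0, then averages over the sequence. The following base-R sketch implements that rule; the function name g4hunter_sketch is hypothetical, and the package's own scoring code may differ in detail.

```r
# Hypothetical helper: mean G4Hunter-style score of a single sequence.
g4hunter_sketch <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  runs <- rle(bases)  # consecutive identical bases and their lengths
  capped <- pmin(runs$lengths, 4L)
  # Score per run: positive for G-runs, negative for C-runs, 0 otherwise.
  per_run <- ifelse(runs$values == "G", capped,
                    ifelse(runs$values == "C", -capped, 0L))
  # Every base in a run receives the run's score; average over all bases.
  mean(rep(per_run, times = runs$lengths))
}

g4hunter_sketch("GGGTTA")   # three Gs scored 3 each, three bases scored 0
g4hunter_sketch("CCCTAA")   # mirror image: negative score for C-rich sequence
```

Under this scheme, positive values indicate G-rich sequences and negative values indicate C-rich sequences, so the sign of the score depends on which strand and which RunComposition is being examined.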

Note

M1 stands for Method 1; M2 stands for Method 2; M3 stands for Method 3.

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche, E. and Morales, J. C. (2019). G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2. DOI: 10.1093/nargab/lqz005

https://academic.oup.com/nargab/article/2/1/lqz005/5576141

Examples

  library(G4iMGrinder)

  # Example: retrieve a DNA sequence and run basic G4 search
  if (!require("seqinr")) {
    install.packages("seqinr")
    library(seqinr)
  }

  Name <- "LmajorESTs"
  Sequence <- paste0(
    read.fasta(
      file = url("http://tritrypdb.org/common/downloads/release-36/Lmajor/fasta/TriTrypDB-36_Lmajor_ESTs.fasta"),
      as.string = TRUE, legacy.mode = TRUE, seqonly = TRUE,
      strip.desc = TRUE, seqtype = "DNA"
    ),
    collapse = ""
  )

  # G-quadruplex search on DNA
  resultDNA <- G4iMGrinder(Name = Name, Sequence = Sequence)

  # G-quadruplex search on RNA (with cGcC scoring)
  resultRNA <- G4iMGrinder(Name = Name, Sequence = Sequence, DNA = FALSE, cGcC = TRUE)

  # i-Motif search in DNA
  resultIMotif <- G4iMGrinder(Name = Name, Sequence = Sequence, RunComposition = "C")

  # Customized search with a larger bulge allowance and larger loop sizes.
  # Note: allowing more bulges or shorter G-runs (e.g., GG) significantly increases computation time.
  resultCustom <- G4iMGrinder(
    Name = Name,
    Sequence = Sequence,
    BulgeSize = 2,
    MaxLoopSize = 20,
    MaxIL = 10
  )

  # Viewing results
  View(resultDNA$PQSM2a)  # M2A results
  View(resultDNA$PQSM2b)  # M2B results (with frequency weighting)
  View(resultDNA$PQSM3a)  # M3A results (unrestricted-size search)
  View(resultDNA$PQSM3b)  # M3B results (unrestricted-size with frequency weighting)

EfresBR/G4iMGrinder documentation built on June 12, 2025, 3:52 a.m.