GiG.Seq.Analysis: Genomic sequence nucleotide run analyzer
In EfresBR/G4iMGrinder: G4iMGrinder: G4 Quadruplex, i-Motif and higher-order structures in DNA and RNA sequences search and analysis tool.

View source: R/G4iM.Grinder.Funs.R

Description

GiG.Seq.Analysis examines a DNA or RNA sequence (and optionally its complementary strand) to identify and count runs of guanines (G) and cytosines (C). It detects both perfect runs (e.g., GGGG) and imperfect runs (with bulges) of sizes ranging from 2 (e.g., GG; a G-run of size 2) to 4 (e.g., GGGG), in a non-overlapping way. The function can return raw counts or densities (per a specified length, e.g., per 100,000 nucleotides), which allow direct comparisons across sequences of different lengths. G and C runs are searched independently.

Usage

GiG.Seq.Analysis(
  Name,
  Sequence,
  DNA = TRUE,
  Complementary = TRUE,
  Density = 1e+05,
  byDensity = TRUE
)

Arguments

`Name`	`character`. Name or identifier of the DNA/RNA sequence to analyze.
`Sequence`	`character`. The nucleotide sequence to analyze (A, C, G, T, U, or N).
`DNA`	`logical`. Indicates whether `Sequence` is DNA (`TRUE`) or RNA (`FALSE`). Defaults to `TRUE`.
`Complementary`	`logical`. If `TRUE`, the complementary strand is created and analyzed alongside the original strand. Defaults to `TRUE`.
`Density`	`integer`. The scaling factor for returning densities rather than raw counts. Defaults to `100000`, so results are reported per 100,000 nucleotides if `byDensity = TRUE`.
`byDensity`	`logical`. If `TRUE`, results are returned as densities, i.e., `(Density * runCounts) / sequenceLength`. If `FALSE`, raw run counts are returned. Defaults to `TRUE`.
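
The density scaling described for `byDensity` can be illustrated with a minimal sketch. `run_density` is a hypothetical helper written for this illustration only; it is not part of G4iMGrinder, and it assumes the sequence length used for scaling is simply the value passed in.

```r
# Hypothetical helper (not part of G4iMGrinder): scale a raw run count
# to a density per `density` nucleotides, following
# (Density * runCounts) / sequenceLength.
run_density <- function(run_count, seq_length, density = 1e5) {
  (density * run_count) / seq_length
}

# 12 runs found in a 10,000-nucleotide sequence
short_seq_density <- run_density(12, 10000)

# The same 12 runs in a sequence ten times longer
long_seq_density <- run_density(12, 100000)

print(c(short = short_seq_density, long = long_seq_density))
```

With the same raw count, the shorter sequence yields the higher density, which is why densities are the better choice when comparing sequences of different lengths.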

Details

This function looks specifically for runs of “G” and “C”, counting both perfect runs (e.g., G2, G3, G4) and imperfect runs (with bulges). Runs are analyzed in a sequential, non-overlapping manner. For example, a run of “GGGG” is counted only as G4, not as G4 plus any subset runs like G2 or G3.
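
The non-overlapping counting of perfect runs can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `count_perfect_runs` is a hypothetical helper, it handles perfect runs only (no bulges), and it does not reproduce how the package maps run sizes onto its output columns.

```r
# Hypothetical helper (not part of G4iMGrinder): count perfect,
# non-overlapping runs of a single nucleotide, by run size.
count_perfect_runs <- function(sequence, base = "G", min_size = 2, max_size = 4) {
  # e.g. "G{2,4}": the longest possible match is taken at each position,
  # and matches returned by gregexpr() never overlap.
  pattern <- sprintf("%s{%d,%d}", base, min_size, max_size)
  hits <- gregexpr(pattern, sequence)[[1]]
  sizes <- if (hits[1] == -1) integer(0) else attr(hits, "match.length")
  # Tabulate run sizes, keeping sizes with zero occurrences
  table(factor(sizes, levels = min_size:max_size))
}

# Contains one GGGG, one GG and one GGG; the final lone G is not a run
g_runs <- count_perfect_runs("AGGGGTGGACGGGTG", base = "G")
print(g_runs)
```

In this sketch the "GGGG" stretch contributes a single run of size 4 and nothing to sizes 2 or 3, matching the behaviour described above.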

Value

A one-row data.frame summarizing the run analysis:

Column Meanings

Name: Identifier for the analyzed sequence, matching the Name argument.
DNA: Logical: TRUE if the sequence was treated as DNA; FALSE if RNA.
Length: Total length of Sequence.
Complementary: TRUE if the complementary strand was analyzed as well, otherwise FALSE.
G%seq, C%seq, A%seq, UT%seq, N%seq: Percentages of each nucleotide type within Sequence. (U and T are combined under UT%seq if DNA=FALSE.)
G2, G3, G2X, G3X: Counts or densities of perfect (G2, G3) and perfect+imperfect (G2X, G3X) G-runs of length 2 to 4, according to the method 1 approach in G4-iM Grinder.
C2, C3, C2X, C3X: Counts or densities of perfect (C2, C3) and perfect+imperfect (C2X, C3X) C-runs of length 2 to 4, according to the same method 1 approach.
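
As a rough illustration of the composition columns, the percentages could be computed in base R as shown below. `nucleotide_percent` is a hypothetical helper, not the package's code; it assumes U and T are always pooled into one class, and that any character outside A, C, G, T, U and N counts toward the length but toward no percentage.

```r
# Hypothetical helper (not part of G4iMGrinder): percentage of each
# nucleotide class in a sequence, with U and T pooled as "UT".
nucleotide_percent <- function(sequence) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  bases[bases %in% c("U", "T")] <- "UT"
  counts <- table(factor(bases, levels = c("G", "C", "A", "UT", "N")))
  100 * counts / length(bases)
}

# 10 nucleotides: 4 G, 1 C, 2 A, 2 T, 1 N
composition <- nucleotide_percent("GGCATTNGGA")
print(composition)
```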

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche, E. and Morales, J. C. (2019). G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2(1), lqz005. DOI: 10.1093/nargab/lqz005

https://academic.oup.com/nargab/article/2/1/lqz005/5576141

Examples

# Creating a random nucleotide sequence of length 10,000
Seq <- paste0(
  sample(
    c("G", "C", "T", "A", "N"),
    10000,
    prob = c(1, 1, 0.6, 0.6, 0.01),
    replace = TRUE
  ),
  collapse = ""
)

# Running the analysis with the default settings, written out explicitly
Rs <- GiG.Seq.Analysis(
  Name = "RandomSeq",
  Sequence = Seq,
  DNA = TRUE,
  Complementary = TRUE,
  byDensity = TRUE
)

# Analyzing a second sequence and storing results in the same data frame
Seq2 <- paste0(
  sample(
    c("G", "C", "T", "A", "N"),
    10000,
    prob = c(1, 1, 0.6, 0.6, 0.01),
    replace = TRUE
  ),
  collapse = ""
)
Rs[2, ] <- GiG.Seq.Analysis(
  Name = "RandomSeq2",
  Sequence = Seq2,
  DNA = TRUE,
  Complementary = TRUE,
  Density = 1e5,
  byDensity = TRUE
)

print(Rs)

EfresBR/G4iMGrinder documentation built on June 12, 2025, 3:52 a.m.