GiG.Seq.Analysis: Genomic sequence nucleotide run analyzer

View source: R/G4iM.Grinder.Funs.R

GiG.Seq.Analysis R Documentation

Description

GiG.Seq.Analysis examines a DNA or RNA sequence (and optionally its complementary strand) to identify and count runs of guanines (G) and cytosines (C). It detects both perfect runs (e.g., GGGG) and imperfect runs (with bulges) of sizes ranging from 2 (e.g., GG; a G-run of size 2) to 4 (e.g., GGGG), in a non-overlapping way. The function can return raw counts or densities (per a specified length, e.g., per 100,000 nucleotides); densities allow direct comparisons across sequences of different lengths. G and C runs are searched independently.

Usage

GiG.Seq.Analysis(
  Name,
  Sequence,
  DNA = TRUE,
  Complementary = TRUE,
  Density = 1e+05,
  byDensity = TRUE
)

Arguments

Name

character. Name or identifier of the DNA/RNA sequence to analyze.

Sequence

character. The nucleotide sequence to analyze (A, C, G, T, U, or N).

DNA

logical. Indicates whether Sequence is DNA (TRUE) or RNA (FALSE). Defaults to TRUE.

Complementary

logical. If TRUE, the complementary strand is created and analyzed alongside the original strand. Defaults to TRUE.

Density

integer. The scaling factor for returning densities rather than raw counts. Defaults to 100000, so results are reported per 100,000 nucleotides if byDensity = TRUE.

byDensity

logical. If TRUE, results are returned as densities, i.e., (Density * runCounts) / sequenceLength. If FALSE, raw run counts are returned. Defaults to TRUE.
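The density formula given for byDensity can be illustrated with a few lines of base R. This is only a sketch of the arithmetic: runs_to_density is a hypothetical helper written for this example, not a function exported by G4iMGrinder, and the run count used below is made up.

```r
# Hypothetical helper (not part of G4iMGrinder): rescale raw run counts
# to a density, following (Density * runCounts) / sequenceLength.
runs_to_density <- function(run_counts, sequence_length, density = 1e5) {
  (density * run_counts) / sequence_length
}

# 12 runs found in a 10,000-nucleotide sequence, reported per 100,000 nucleotides
runs_to_density(12, 10000)
# 120
```

Because the result is normalized by sequence length, the same call on a sequence ten times longer with ten times as many runs would return the same density.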

Details

This function looks specifically for runs of “G” and “C”, counting perfect runs (e.g., G2, G3, G4) and imperfect runs. Runs are analyzed in a sequential, non-overlapping manner. For example, a run of “GGGG” is counted only once, as G4, and not additionally as the shorter G2 or G3 runs it contains.
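The non-overlapping counting of perfect runs can be sketched in base R with run-length encoding. This is an illustration of the idea only, not the package's implementation: count_perfect_runs is a hypothetical helper, it ignores imperfect (bulged) runs, and it assumes that runs longer than the largest requested size are simply not counted, which may differ from what GiG.Seq.Analysis does.

```r
# Hypothetical helper (not part of G4iMGrinder): count maximal perfect runs
# of a single base, so that each stretch is counted once at its full length.
count_perfect_runs <- function(sequence, base = "G", sizes = 2:4) {
  chars <- strsplit(toupper(sequence), "")[[1]]   # one element per nucleotide
  runs <- rle(chars)                              # maximal stretches of identical bases
  run_lengths <- runs$lengths[runs$values == base]
  counts <- vapply(sizes, function(s) sum(run_lengths == s), integer(1))
  names(counts) <- paste0(base, sizes)
  counts
}

# G stretches in this sequence: GGGG, GG, GGG, GG
count_perfect_runs("AGGGGTGGAGGGCCGG")
# G2 G3 G4
#  2  1  1
```

Calling the same helper with base = "C" gives the independent C-run search described above.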

Value

A one-row data.frame summarizing the run analysis:

Column Meanings

Name

Identifier for the analyzed sequence, matching the Name argument.

DNA

Logical: TRUE if the sequence was treated as DNA; FALSE if RNA.

Length

Total length of Sequence.

Complementary

TRUE if the complementary strand was analyzed as well, otherwise FALSE.

G%seq, C%seq, A%seq, UT%seq, N%seq

Percentages of each nucleotide type within Sequence. (U and T are combined under UT%seq if DNA=FALSE.)

G2, G3, G2X, G3X

Counts or densities of perfect (G2, G3) and perfect+imperfect (G2X, G3X) G-runs of length 2 to 4, according to the method 1 approach in G4-iM Grinder.

C2, C3, C2X, C3X

Counts or densities of perfect (C2, C3) and perfect+imperfect (C2X, C3X) C-runs of length 2 to 4, according to the same method 1 approach.
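The nucleotide composition columns can be approximated with a short base R sketch. base_percentages is a hypothetical helper written for this example; how the package itself pools U and T, or whether it includes the complementary strand in these percentages, is not confirmed here and is treated as an assumption.

```r
# Hypothetical helper (not part of G4iMGrinder): percentage of each
# nucleotide class in a sequence, with U and T pooled into one class.
base_percentages <- function(sequence) {
  chars <- strsplit(toupper(sequence), "")[[1]]
  classes <- c(G = "G", C = "C", A = "A", UT = "[UT]", N = "N")
  pct <- vapply(classes, function(p) 100 * mean(grepl(p, chars)), numeric(1))
  names(pct) <- paste0(names(classes), "%seq")
  pct
}

# 3 G, 2 C, 2 A, 2 T and 1 N out of 10 nucleotides
base_percentages("GGCATTNGCA")
```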

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche, E. and Morales, J. C. (2019). G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2(1), lqz005. DOI: 10.1093/nargab/lqz005

https://academic.oup.com/nargab/article/2/1/lqz005/5576141

See Also

G4iMGrinder for broader G4/i-Motif detection and scoring.

Examples

# Creating a random nucleotide sequence of length 10,000
Seq <- paste0(
  sample(
    c("G", "C", "T", "A", "N"),
    10000,
    prob = c(1, 1, 0.6, 0.6, 0.01),
    replace = TRUE
  ),
  collapse = ""
)

# Running the analysis with default parameters
Rs <- GiG.Seq.Analysis(
  Name = "RandomSeq",
  Sequence = Seq,
  DNA = TRUE,
  Complementary = TRUE,
  byDensity = TRUE
)

# Analyzing a second sequence and storing results in the same data frame
Seq2 <- paste0(
  sample(
    c("G", "C", "T", "A", "N"),
    10000,
    prob = c(1, 1, 0.6, 0.6, 0.01),
    replace = TRUE
  ),
  collapse = ""
)
Rs[2, ] <- GiG.Seq.Analysis(
  Name = "RandomSeq2",
  Sequence = Seq2,
  DNA = TRUE,
  Complementary = TRUE,
  Density = 1e5,
  byDensity = TRUE
)

print(Rs)

EfresBR/G4iMGrinder documentation built on June 12, 2025, 3:52 a.m.