GiGList.Analysis: Summary generator for a G4-iM Grinder result list (GiGList)
In EfresBR/G4iMGrinder: G4iMGrinder: G4 Quadruplex, i-Motif and higher-order structures in DNA and RNA sequences search and analysis tool.

View source: R/G4iM.Grinder.Funs.R

GiGList.Analysis

R Documentation

Description

GiGList.Analysis produces a one-row summary of a G4iMGrinder result list (referred to as a “GiGList”). It aggregates counts or densities of putative G-quadruplex or i-Motif structures, optionally filtered by minimum score, frequency, and structure length. The resulting summary supports high-level comparisons across genomes or experimental conditions.

Usage

GiGList.Analysis(
  GiGList,
  iden,
  ScoreMin = c(20, 40),
  FreqMin = 10,
  LengthMin = 50,
  Density = 100000,
  byDensity = TRUE
)

Arguments

`GiGList`	`list`. A `G4iMGrinder` result list, typically named GiGList, to be analyzed or summarized.
`iden`	`character`. Identification tag or label for the DNA/RNA source of the `GiGList`.
`ScoreMin`	`integer` or vector of `integer` values. One or more minimum score thresholds used to filter structures. For G-quadruplex searches, a structure passes the filter if its score is `>= ScoreMin[i]`. For i-Motif searches, the score sign is reversed (multiplied by -1) so that it is comparable with the G-quadruplex thresholds. A separate column is created for each value in `ScoreMin`. Defaults to `c(20, 40)`, representing “medium” (≥ 20) and “high” (≥ 40) formation probability for G-quadruplexes (equivalently, ≤ -20 and ≤ -40 for i-Motifs).
`FreqMin`	`integer` or vector of `integer` values. One or more minimum frequency thresholds to filter structures based on how often they appear in the genome. A separate column is created for each threshold in `FreqMin`. Defaults to `10`, i.e., returning results with `frequency >= 10`.
`LengthMin`	`integer` or vector of `integer` values. One or more minimum length thresholds, applied only to results from Method 3 (M3a/M3b). A separate column is created for each threshold in `LengthMin`. Defaults to `50`, i.e., returning `length >= 50`.
`Density`	`integer`. Scaling factor for returning densities rather than raw counts. Defaults to `100000` (results per 100,000 nucleotides if `byDensity = TRUE`).
`byDensity`	`logical`. If `TRUE`, the function reports densities, computed as `(Density * count) / genomeLength`. If `FALSE`, raw counts are returned instead. Defaults to `TRUE`.
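
The density scaling described for `Density` and `byDensity` can be sketched in a few lines of base R. This only illustrates the stated formula, `(Density * count) / genomeLength`; the helper `to_density` and the example numbers are hypothetical and are not part of G4iMGrinder.

```r
# Hypothetical helper illustrating the documented formula;
# it is not a function exported by G4iMGrinder.
to_density <- function(count, genomeLength, Density = 100000, byDensity = TRUE) {
  if (!byDensity) {
    return(count)  # raw counts requested
  }
  (Density * count) / genomeLength
}

# Made-up numbers: 250 structures found in a 5,000,000-nt sequence
to_density(250, 5e6)                     # 5 structures per 100,000 nt
to_density(250, 5e6, byDensity = FALSE)  # 250 (raw count)
```

Because densities are normalized by sequence length, they are the more natural choice when comparing genomes of very different sizes.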

Details

Filters on score, frequency, and length can be requested in the same call; each threshold produces its own result column. Matches to sequences known to form quadruplexes, or known not to form them, are also reported when the GiGList includes them. In addition, the percentage of “unique” structures (frequency = 1) is computed for each method as a measure of overall structural redundancy.
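
As a rough illustration of the filtering logic, the sketch below counts structures passing each threshold in a toy result table. It assumes each filter is counted independently of the others, and the column names (`Score`, `Freq`, `Length`) and values are invented for the example; they need not match the internals of a real GiGList or of GiGList.Analysis.

```r
# Toy stand-in for one method's result table. Column names and values are
# invented for illustration and need not match a real GiGList.
res <- data.frame(
  Score  = c(-45, -30, -15, -50, -22),  # i-Motif search: scores are negative
  Freq   = c(1, 12, 1, 30, 1),
  Length = c(25, 60, 18, 75, 40)
)

ScoreMin  <- c(20, 40)
FreqMin   <- 10
LengthMin <- 50

# Reverse the sign so i-Motif scores compare against positive thresholds
score <- -res$Score

# One value per threshold, each filter applied independently (assumption)
by_score  <- sapply(ScoreMin,  function(s) sum(score >= s))
by_freq   <- sapply(FreqMin,   function(f) sum(res$Freq >= f))
by_length <- sapply(LengthMin, function(l) sum(res$Length >= l))

# Share of results that occur exactly once (frequency = 1)
uniq_percent <- 100 * mean(res$Freq == 1)
```

For a G-quadruplex search the sign reversal would simply be skipped, since those scores are already positive.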

Value

A one-row data.frame summarizing the GiGList, with potential columns such as:

`Name`	Character, derived from the name assigned to `GiGList`.
`iden`	Character, the `iden` input labeling the sequence source.
`Length`	Integer, total genomic length from the `GiGList`.
`SeqG`, `SeqC`	Numeric, percentage of G or C in the analyzed sequence(s).
`nM2a`, `nM2b`, `nM3a`, `nM3b`	Count or density of structures found by Methods 2a, 2b, 3a, or 3b respectively (if those methods were used).
`...S|x|`, `...F|y|`, `...L|z|`	Suffixes denoting columns filtered by score (`x`), frequency (`y`), or length (`z`).
`...KTFQ`, `...KNTFQ`	Suffixes denoting structures that match known-to-form or known-NOT-to-form quadruplex sequences.
`...UniqPercent`	Columns indicating the percentage of unique structures within each method.
`Config`	A character column summarizing the configuration of analysis or filter settings.

Column Meanings

Name: Name of the G4iMGrinder result list, inherited from GiGList.
iden: Identification tag, matching iden argument.
Length: Total length of the sequence(s) as computed by G4iMGrinder.
SeqG, SeqC: Percentages of G and C nucleotides in the genome.
nM2a, nM2b, nM3a, nM3b: Counts or densities of structures detected by each method (2a, 2b, 3a, 3b).
.S|X|, .F|X|, .L|X|: Suffixes appended for each filter: “S” for score, “F” for frequency, “L” for length. X indicates the numeric threshold (e.g., 20, 10, or 50).
.KTFQ, .KNTFQ: Columns indicating, for each method, how many structures are known-to-form quadruplexes or known-NOT-to-form quadruplexes (counts or densities).
.UniqPercent: Percentage of unique results (frequency=1) within each method.
Config: A text string describing the filters and configurations used in the summary.
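
Since the filtered columns are distinguished only by their suffixes, they can be picked out by pattern matching on the column names. The sketch below uses a mock one-row summary; the exact names (such as `nM2a.S20`) are an assumed reading of the suffix scheme above and the values are made up, so check `names()` on a real summary before relying on them.

```r
# Mock one-row summary; names follow the suffix pattern described above,
# but the exact names and values are assumptions, not real output.
Summary <- data.frame(
  Name = "TestGenome", iden = "Parasite",
  nM2a = 12.4, nM2a.S20 = 8.1, nM2a.S40 = 2.3, nM2a.F10 = 1.7,
  nM3a = 3.2, nM3a.L50 = 0.9
)

# Select every score-filtered column (suffix ".S" followed by digits)
score_cols <- grep("\\.S[0-9]+$", names(Summary), value = TRUE)
Summary[, score_cols, drop = FALSE]
```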

Warning

Column names depend on the analysis conditions (methods used, filter thresholds, etc.). Combining or binding multiple GiGList.Analysis summaries requires consistent parameters. Results may not align if different G4iMGrinder settings are used.
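
One way to guard against misaligned results is to check that the column names agree before binding. The helper `bind_summaries` below is a hypothetical sketch, not part of G4iMGrinder, and the data frames are mock stand-ins for GiGList.Analysis results with assumed column names.

```r
# Hypothetical helper: bind one-row summaries only if their columns match.
bind_summaries <- function(...) {
  summaries <- list(...)
  cols <- names(summaries[[1]])
  same <- vapply(summaries, function(s) identical(names(s), cols), logical(1))
  if (!all(same)) {
    stop("Summaries have different columns; rerun with consistent settings.")
  }
  do.call(rbind, summaries)
}

# Mock summaries standing in for two GiGList.Analysis results
a <- data.frame(Name = "GenomeA", nM2a = 12.4, nM2a.S20 = 8.1)
b <- data.frame(Name = "GenomeB", nM2a = 9.8,  nM2a.S20 = 5.0)
combined <- bind_summaries(a, b)  # two rows, three columns

# A summary produced with a different ScoreMin is rejected
c40 <- data.frame(Name = "GenomeC", nM2a = 7.0, nM2a.S40 = 1.2)
rejected <- inherits(try(bind_summaries(a, c40), silent = TRUE), "try-error")
```

Failing early is a deliberate choice here: silently filling mismatched columns with `NA` could hide the fact that two summaries were produced under different thresholds.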

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche, E. and Morales, J. C. (2019). G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2(1), lqz005. DOI: 10.1093/nargab/lqz005

https://academic.oup.com/nargab/article/2/1/lqz005/5576141

Examples

# Example usage (requires the G4iMGrinder package)
library(G4iMGrinder)

# 'Rs' is a G4iMGrinder result list (GiGList) for a DNA G-quadruplex search.
# "ACGT..." is a placeholder: supply a real DNA sequence string.
Rs <- G4iMGrinder(Name = "TestGenome", Sequence = "ACGT...")

# Summarize with the default filters
Summary <- GiGList.Analysis(
  GiGList = Rs,
  iden = "Parasite"
)

print(Summary)

EfresBR/G4iMGrinder documentation built on June 12, 2025, 3:52 a.m.