GiGList.Analysis: Summary generator for a G4-iM Grinder result list (GiGList)

GiGList.Analysis    R Documentation

Description

GiGList.Analysis produces a one-row summary of a G4iMGrinder result list (referred to as a “GiGList”). It aggregates the counts or densities of putative G-quadruplex or i-Motif structures and can additionally filter them by minimum score, frequency, and structure length. Because each GiGList is reduced to a single row, summaries of different genomes or experimental conditions can be compared side by side.

Usage

GiGList.Analysis(
  GiGList,
  iden,
  ScoreMin = c(20, 40),
  FreqMin = 10,
  LengthMin = 50,
  Density = 100000,
  byDensity = TRUE
)

Arguments

GiGList

list. A G4iMGrinder result list, typically named GiGList, to be analyzed or summarized.

iden

character. Identification tag or label for the DNA/RNA source of the GiGList.

ScoreMin

integer or vector of integer values. One or more minimum score thresholds used to filter structures. For G-quadruplex searches, any structure with a score >= ScoreMin[i] passes the filter. For i-Motif searches, the score sign is reversed (multiplied by -1) so that it is comparable with the G-quadruplex thresholds. A separate column is created for each value in ScoreMin. Defaults to c(20, 40), representing “medium” (≥ 20) and “high” (≥ 40) formation probability for G-quadruplexes (or ≤ -20 and ≤ -40 for i-Motifs).

FreqMin

integer or vector of integer values. One or more minimum frequency thresholds to filter structures based on how often they appear in the genome. A separate column is created for each threshold in FreqMin. Defaults to 10, i.e., returning results with frequency >= 10.

LengthMin

integer or vector of integer values. One or more minimum length thresholds, applied only to results from Method 3 (M3a/M3b). A separate column is created for each threshold in LengthMin. Defaults to 50, i.e., returning length >= 50.

Density

integer. Scaling factor for returning densities rather than raw counts. Defaults to 100000 (results per 100,000 nucleotides if byDensity = TRUE).

byDensity

logical. If TRUE, the function reports densities, computed as (Density * count) / genomeLength. If FALSE, raw counts are returned instead. Defaults to TRUE.
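
The density formula given for byDensity can be illustrated with a small sketch. The helper to_density below is hypothetical and not part of G4iMGrinder; it only reproduces the arithmetic (Density * count) / genomeLength described above.

```r
# Hypothetical helper (not a G4iMGrinder function): scale a raw count
# to a density per 'Density' nucleotides.
to_density <- function(count, genome_length, Density = 100000) {
  stopifnot(genome_length > 0)
  (Density * count) / genome_length
}

# 42 structures found in a 2,000,000-nt sequence
# correspond to 2.1 structures per 100,000 nucleotides.
to_density(42, 2e6)
```

With byDensity = FALSE the raw count (42 in this illustration) would be reported instead.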

Details

Filters on score, frequency, and length can be applied simultaneously, producing separate result columns for each filter. Known quadruplex-forming or known NOT-to-form-quadruplex sequences (if included in the GiGList) are also reported. Additionally, columns indicating the percentage of “unique” structures (frequency = 1) are computed as a measure of overall structural redundancy.
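
The way each threshold yields its own result column can be sketched on a toy table. Everything here is an assumption made for illustration: the column names Score and Freq, the helper count_filtered, and the simplified output names (S20, F10, ...) are not taken from the package and may differ from the real GiGList structure and column naming.

```r
# Toy results table; 'Score' and 'Freq' are assumed column names.
toy <- data.frame(
  Score = c(15, 25, 45, 60, 38),
  Freq  = c(1, 12, 1, 30, 9)
)

# Hypothetical helper: one count per threshold, plus the share of
# structures that appear only once (frequency = 1).
count_filtered <- function(df, ScoreMin = c(20, 40), FreqMin = 10,
                           iMotif = FALSE) {
  # i-Motif scores are negative, so the sign is flipped before comparing.
  score <- if (iMotif) -df$Score else df$Score
  by_score <- vapply(ScoreMin, function(s) sum(score >= s), integer(1))
  by_freq  <- vapply(FreqMin, function(f) sum(df$Freq >= f), integer(1))
  names(by_score) <- paste0("S", ScoreMin)
  names(by_freq)  <- paste0("F", FreqMin)
  c(n = nrow(df), by_score, by_freq,
    UniqPercent = 100 * mean(df$Freq == 1))
}

count_filtered(toy)
```

Passing a longer vector, for example FreqMin = c(5, 10), would simply add one more entry per extra threshold, which mirrors how additional columns appear in the summary.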

Value

A one-row data.frame summarizing the GiGList, with potential columns such as:

Name

Character, derived from the name assigned to GiGList.

iden

Character, the iden input labeling the sequence source.

Length

Integer, total genomic length from the GiGList.

SeqG, SeqC

Numeric, percentage of G or C in the analyzed sequence(s).

nM2a, nM2b, nM3a, nM3b

Count or density of structures found by Methods 2a, 2b, 3a, or 3b respectively (if those methods were used).

...S{|x|}, ...F{|y|}, ...L{|z|}

Suffixes denoting columns filtered by score (x), frequency (y), or length (z).

...KTFQ, ...KNTFQ

Suffixes denoting structures that match known-to-form or known-NOT-to-form quadruplex sequences.

...UniqPercent

Columns indicating the percentage of unique structures within each method.

Config

A character column summarizing the configuration of analysis or filter settings.

Column Meanings

Name

Name of the G4iMGrinder result list, inherited from GiGList.

iden

Identification tag, matching iden argument.

Length

Total length of the sequence(s) as computed by G4iMGrinder.

SeqG, SeqC

Percentages of G and C nucleotides in the genome.

nM2a, nM2b, nM3a, nM3b

Counts or densities of structures detected by each method (2a, 2b, 3a, 3b).

.S|X|, .F|X|, .L|X|

Suffixes appended for each filter: “S” for score, “F” for frequency, “L” for length. X indicates the numeric threshold (e.g., 20, 10, or 50).

.KTFQ, .KNTFQ

Columns indicating, for each method, how many structures are known-to-form quadruplexes or known-NOT-to-form quadruplexes (counts or densities).

.UniqPercent

Percentage of unique results (frequency=1) within each method.

Config

A text string describing the filters and configurations used in the summary.

Warning

Column names depend on the analysis conditions (methods used, filter thresholds, etc.). Combining or binding multiple GiGList.Analysis summaries requires consistent parameters. Results may not align if different G4iMGrinder settings are used.
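
One way to guard against misaligned summaries is to compare column names before binding. The sketch below uses stand-in data frames rather than real GiGList.Analysis output; the helper bind_summaries and the column name nM2a.S20 are hypothetical and only illustrate the check.

```r
# Stand-ins for two one-row summaries; real summaries have many more columns.
sum_a <- data.frame(Name = "A", iden = "Parasite", nM2a = 1.2, nM2a.S20 = 0.8)
sum_b <- data.frame(Name = "B", iden = "Host",     nM2a = 2.5, nM2a.S20 = 1.1)

# Hypothetical helper: refuse to bind summaries whose columns differ.
bind_summaries <- function(...) {
  dfs <- list(...)
  ref <- names(dfs[[1]])
  same <- vapply(dfs, function(d) identical(names(d), ref), logical(1))
  if (!all(same)) {
    stop("Summaries have different columns; re-run with identical parameters.")
  }
  do.call(rbind, dfs)
}

combined <- bind_summaries(sum_a, sum_b)
print(combined)
```

Stopping with an error, rather than silently filling missing columns, keeps mismatched analysis settings visible.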

Author(s)

Efres Belmonte-Reche

References

Belmonte-Reche, E. and Morales, J. C. (2019). G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genomics and Bioinformatics, 2(1), lqz005. DOI: 10.1093/nargab/lqz005

https://academic.oup.com/nargab/article/2/1/lqz005/5576141

See Also

G4iMGrinder for generating the GiGList, and other related utilities.

Examples

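# Example usage:

# Load the package (assumed here to be installed under the name G4iMGrinder)
library(G4iMGrinder)

# Suppose 'Rs' is a G4iMGrinder result list (GiGList) for a DNA G-quadruplex search.
# "ACGT..." is a placeholder for a real sequence.
Rs <- G4iMGrinder(Name = "TestGenome", Sequence = "ACGT...")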

# Summarize with default filters
Summary <- GiGList.Analysis(
  GiGList = Rs,
  iden = "Parasite"
)

print(Summary)

EfresBR/G4iMGrinder documentation built on June 12, 2025, 3:52 a.m.