grin.stats: Execute GRIN Statistical Framework

View source: R/grin.stats.R

grin.stats    R Documentation

Execute GRIN Statistical Framework

Description

Executes the Genomic Random Interval (GRIN) statistical framework to determine whether a specific genomic locus (gene or regulatory region) is significantly affected by either an individual lesion type or a constellation of multiple lesion types.

Usage

grin.stats(lsn.data, gene.data = NULL, chr.size = NULL, genome.version = NULL)

Arguments

lsn.data

A data.frame containing lesion data formatted for GRIN analysis. It must include the following five columns:

  • ID: Sample or patient identifier.

  • chrom: Chromosome on which the lesion is located.

  • loc.start: Genomic start coordinate of the lesion.

  • loc.end: Genomic end coordinate of the lesion.

  • lsn.type: Lesion type (e.g., gain, loss, mutation, fusion).

For single-nucleotide variants (SNVs), loc.start and loc.end should be identical. For copy number alterations (CNAs) such as gains and deletions, these fields give the start and end positions of the lesion (its boundaries). Structural rearrangements (e.g., translocations, inversions) should be represented by two separate rows, one for each breakpoint. An example dataset is available in the GRIN2 package (lesion_data.rda).
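As a minimal sketch of the layout described above, the following builds a toy lesion table in base R and checks the five required columns. The sample IDs, coordinates, and the use of character chromosome labels are assumptions made for illustration only; they are not taken from lesion_data.

```r
# Hypothetical toy lesions; IDs and coordinates are made up for illustration.
lsn.data <- data.frame(
  ID        = c("S1", "S1", "S2", "S2"),
  chrom     = c("1", "9", "9", "22"),
  loc.start = c(1500000, 21000000, 133600000, 23200000),
  loc.end   = c(1500000, 22500000, 133600000, 23200000),
  lsn.type  = c("mutation", "loss", "fusion", "fusion"),
  stringsAsFactors = FALSE
)
# Row 1: an SNV, so loc.start equals loc.end.
# Row 2: a copy number loss spanning an interval.
# Rows 3-4: one rearrangement entered as two rows, one per breakpoint.

# Basic sanity checks before passing the table to grin.stats()
required.cols <- c("ID", "chrom", "loc.start", "loc.end", "lsn.type")
missing.cols  <- setdiff(required.cols, names(lsn.data))
stopifnot(length(missing.cols) == 0,
          all(lsn.data$loc.start <= lsn.data$loc.end))
```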

gene.data

A data.frame containing gene annotation data. Must include the following columns:

  • gene: Ensembl gene ID.

  • chrom: Chromosome where the gene is located.

  • loc.start: Gene start position.

  • loc.end: Gene end position.

This data can be user-provided or retrieved automatically via get.ensembl.annotation() if genome.version is specified.

chr.size

A data.frame specifying chromosome sizes. Must contain:

  • chrom: Chromosome number.

  • size: Chromosome length in base pairs.

The data can be user-provided or directly retrieved using get.chrom.length() if genome.version is specified.
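For user-provided inputs, a sketch along these lines may help confirm that gene.data and chr.size are consistent with each other before running the analysis. The gene IDs below are placeholders rather than real Ensembl IDs, and the chromosome sizes are rounded toy values, not actual assembly lengths.

```r
# Hypothetical gene annotation; gene IDs are placeholders, not real Ensembl IDs.
gene.data <- data.frame(
  gene      = c("ENSG_TOY_0001", "ENSG_TOY_0002"),
  chrom     = c("9", "22"),
  loc.start = c(21900000, 23100000),
  loc.end   = c(22000000, 23300000),
  stringsAsFactors = FALSE
)

# Hypothetical chromosome sizes (rounded toy values, in base pairs).
chr.size <- data.frame(
  chrom = c("1", "9", "22"),
  size  = c(249000000, 138000000, 51000000),
  stringsAsFactors = FALSE
)

# Every gene chromosome should appear in chr.size, and every gene should
# end within the length of its chromosome.
size.of.chrom <- chr.size$size[match(gene.data$chrom, chr.size$chrom)]
stopifnot(!anyNA(size.of.chrom),
          all(gene.data$loc.start <= gene.data$loc.end),
          all(gene.data$loc.end <= size.of.chrom))
```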

genome.version

Optional. If gene annotation and chromosome size files are not provided, users can specify a supported genome assembly to retrieve these files automatically. Currently, the package supports only the "Human_GRCh38" genome assembly.

Details

The GRIN algorithm evaluates each locus to determine whether the observed frequency and distribution of lesions is greater than expected by chance. This is modeled using a convolution of independent, non-identical Bernoulli distributions, accounting for lesion type, locus size, and chromosome context.
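To illustrate what a convolution of independent, non-identical Bernoulli distributions looks like, the sketch below computes the distribution of the number of "hits" among subjects whose hit probabilities differ. The function name bernoulli.convolution and the probabilities are hypothetical; this shows the general technique and is not the package's internal code, which derives its probabilities from lesion type, locus size, and chromosome context.

```r
# Distribution of the number of successes among independent Bernoulli trials
# with possibly different success probabilities, built by repeated convolution.
bernoulli.convolution <- function(p) {
  dist <- 1                          # before any trial, P(0 successes) = 1
  for (p.i in p) {
    # Each new trial either leaves the count unchanged (prob 1 - p.i)
    # or raises it by one (prob p.i).
    dist <- c(dist * (1 - p.i), 0) + c(0, dist * p.i)
  }
  dist                               # dist[k + 1] = P(exactly k successes)
}

p.hit <- c(0.01, 0.05, 0.02)         # hypothetical per-subject hit probabilities
dist  <- bernoulli.convolution(p.hit)

n.obs   <- 2                         # hypothetical observed number of subjects hit
p.value <- sum(dist[(n.obs + 1):length(dist)])   # P(N >= n.obs) under the null
```

When all probabilities are equal, the result reduces to the ordinary binomial distribution, which gives a quick way to sanity-check the function against dbinom().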

For each gene, the function calculates:

  • A p-value for the enrichment of lesion events

  • An FDR-adjusted q-value using the Pounds & Cheng (2006) method

  • Significance of multi-lesion constellation patterns (e.g., p-value for a locus being affected by 1, 2, etc., lesion types)
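As a rough illustration of how p-values are turned into FDR-adjusted values, the sketch below applies the Benjamini-Hochberg adjustment from base R's p.adjust(). This is a stand-in chosen because it ships with R; it is not the Pounds & Cheng (2006) estimator named above, and the p-values are hypothetical.

```r
# Hypothetical per-gene p-values for one lesion type.
p.values <- c(0.0002, 0.01, 0.03, 0.2, 0.6)

# Benjamini-Hochberg adjustment (a stand-in, NOT the Pounds & Cheng method).
q.bh <- p.adjust(p.values, method = "BH")

# Genes passing a hypothetical 5% FDR threshold.
significant <- which(q.bh <= 0.05)
```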

Value

A list containing:

gene.hits

A data.frame of GRIN results for each gene, including annotation, subject/hit counts by lesion type, and p/q-values for individual and multi-lesion constellation significance.

lsn.data

The original lesion input data.

gene.data

The original gene annotation input data.

gene.lsn.data

A data.frame where each row represents a gene-lesion overlap. Includes columns "gene" (Ensembl ID) and "ID" (sample ID).

chr.size

The chromosome size reference table used in computations.

gene.index

Indexes linking genes to rows in gene.lsn.data by chromosome.

lsn.index

Indexes linking lesions to rows in gene.lsn.data.
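Because gene.lsn.data carries one row per gene-lesion overlap with "gene" and "ID" columns, it can be summarized directly. The sketch below uses a small mock table with the same two columns (all values are hypothetical) to count how many distinct subjects have at least one lesion overlapping each gene.

```r
# Mock table mimicking the "gene" and "ID" columns of gene.lsn.data;
# the gene names and sample IDs are made up for illustration.
gene.lsn.data <- data.frame(
  gene = c("geneA", "geneA", "geneA", "geneB"),
  ID   = c("S1", "S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Number of distinct subjects with an overlapping lesion, per gene.
subj.per.gene <- tapply(gene.lsn.data$ID, gene.lsn.data$gene,
                        function(ids) length(unique(ids)))
```

With real output, the same call could be applied to grin.results$gene.lsn.data.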

Author(s)

Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org>, Stanley Pounds <stanley.pounds@stjude.org>

References

Pounds, S. et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.

Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.

See Also

prep.gene.lsn.data, find.gene.lsn.overlaps, count.hits, prob.hits

Examples

data(lesion_data)
data(hg38_gene_annotation)
data(hg38_chrom_size)

# Example 1: Run GRIN with user-supplied annotation and chromosome size data:
grin.results <- grin.stats(lesion_data,
                           hg38_gene_annotation,
                           hg38_chrom_size)

# Example 2: Specify the genome version to retrieve the annotation and
# chromosome size data automatically:
# grin.results <- grin.stats(lesion_data,
#                            genome.version = "Human_GRCh38")

GRIN2 documentation built on June 17, 2025, 9:11 a.m.