prob.hits: Find Probability of Locus Hit
In GRIN2: Genomic Random Interval (GRIN)

Description

Computes the probability that each genomic locus (e.g., gene or regulatory region) is affected by one or more types of genomic lesions. This function estimates statistical significance for lesion enrichment using a convolution of independent but non-identical Bernoulli distributions.

Usage

prob.hits(hit.cnt, chr.size = NULL)

Arguments

`hit.cnt`	A list returned by the `count.hits()` function, containing the number of subjects and hits affecting each locus by lesion type.
`chr.size`	A `data.frame` containing chromosome sizes for the 22 autosomes and the X and Y chromosomes. It must include two columns: `"chrom"` for the chromosome number and `"size"` for the chromosome length in base pairs. Defaults to `NULL`.
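As a rough illustration of the layout described for `chr.size`, the sketch below builds a toy table with the two required columns and runs a few structural checks. The object name `toy.chr.size`, the chromosome labels, and the lengths are all hypothetical placeholders; in practice a full table such as the packaged `hg38_chrom_size` data set would be used.

```r
# Toy chromosome-size table with the two required columns.
# The labels and lengths are illustrative placeholders, not real assembly values.
toy.chr.size <- data.frame(
  chrom = c("1", "2", "X"),
  size  = c(1000000, 800000, 500000),
  stringsAsFactors = FALSE
)

# Basic structural checks one might run before passing such a table on
has.cols   <- all(c("chrom", "size") %in% names(toy.chr.size))
size.ok    <- is.numeric(toy.chr.size$size) && all(toy.chr.size$size > 0)
chrom.uniq <- !anyDuplicated(toy.chr.size$chrom)
```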

Details

This function estimates a p-value for each locus: the probability of seeing at least the observed number of lesions by chance, under a model in which lesion events are treated as independent Bernoulli trials.

For each lesion type, the model accounts for heterogeneity in lesion probability across loci based on their genomic context (e.g., locus size, chromosome size). Because the per-trial probabilities differ, the distribution of the hit count is obtained as a convolution of Bernoulli distributions, from which the probability of the observed hit counts is estimated.
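The convolution idea can be sketched in a few lines of base R. This is only a minimal illustration and not necessarily how the package computes it internally; the helper `bern.conv()` and the probabilities in `p.hit` are hypothetical.

```r
# Distribution of the number of successes among independent Bernoulli
# trials with unequal success probabilities, built by convolving in
# one trial at a time.
bern.conv <- function(p) {
  pr <- 1                                # P(0 successes) before any trial
  for (p.i in p) {
    # each new trial either fails (count unchanged) or succeeds (count + 1)
    pr <- c(pr * (1 - p.i), 0) + c(0, pr * p.i)
  }
  pr                                     # pr[k + 1] = P(exactly k successes)
}

# Hypothetical per-subject hit probabilities for a single locus
p.hit   <- c(0.10, 0.20, 0.05)
pr.nhit <- bern.conv(p.hit)

# Upper-tail p-value: probability of n.obs or more hits by chance
n.obs   <- 2
p.value <- sum(pr.nhit[(n.obs + 1):length(pr.nhit)])
```

The upper tail is summed because the p-value concerns the observed count or anything more extreme.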

In addition, the function calculates:

FDR-adjusted q-values using the method of Pounds and Cheng (2006), which estimates the proportion of true null hypotheses.
p- and q-values for multi-lesion constellation hits, i.e., the probability that a locus is affected by one (p1), two (p2), or more types of lesions simultaneously.
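A rough sketch of the q-value step, assuming the null-proportion estimate min(1, 2 * mean(p)) associated with Pounds and Cheng (2006) is combined with a Benjamini-Hochberg style adjustment. The helper name `pc.qvalue()` and the p-values are hypothetical, and the package's own routine may differ in detail.

```r
# Scale Benjamini-Hochberg adjusted p-values by an estimate of the
# proportion of true null hypotheses (assumed form: min(1, 2 * mean(p))).
pc.qvalue <- function(p) {
  pi0.hat <- min(1, 2 * mean(p, na.rm = TRUE))
  pi0.hat * p.adjust(p, method = "BH")
}

# Hypothetical p-values for five loci
p.vals <- c(0.001, 0.01, 0.02, 0.2, 0.5)
q.vals <- pc.qvalue(p.vals)
```

Since the estimated null proportion never exceeds 1, the resulting q-values are never larger than the plain Benjamini-Hochberg adjusted values.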

Value

A list with the following components:

`gene.hits`	A `data.frame` containing GRIN statistical results. Includes gene annotations, the number of subjects and hits by lesion type, and the computed p-values and FDR-adjusted q-values for lesion enrichment across one or more lesion types.
`lsn.data`	Original input lesion data.
`gene.data`	Original input gene annotation data.
`gene.lsn.data`	A `data.frame` in which each row corresponds to a gene overlapped by a specific lesion. Includes columns for Ensembl gene ID (`gene`) and patient/sample ID (`ID`).
`chr.size`	Chromosome size information used in the computation.
`gene.index`	A `data.frame` indexing rows in `gene.lsn.data` corresponding to each chromosome.
`lsn.index`	A `data.frame` indexing rows in `gene.lsn.data` corresponding to each lesion.

Author(s)

Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org> and Stanley Pounds <stanley.pounds@stjude.org>

References

Pounds, S. et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.

Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.

Examples

data(lesion_data)
data(hg38_gene_annotation)
data(hg38_chrom_size)

# 1) Prepare gene and lesion data:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)

# 2) Identify overlapping gene-lesion events:
gene.lsn.overlap <- find.gene.lsn.overlaps(prep.gene.lsn)

# 3) Count number of subjects and lesions affecting each gene:
count.subj.hits <- count.hits(gene.lsn.overlap)

# 4) Compute p- and q-values for lesion enrichment per gene:
hits.prob <- prob.hits(count.subj.hits, hg38_chrom_size)

GRIN2 documentation built on June 17, 2025, 9:11 a.m.