prob.hits | R Documentation |
Computes the probability that each genomic locus (e.g., gene or regulatory region) is affected by one or more types of genomic lesions. This function estimates statistical significance for lesion enrichment using a convolution of independent but non-identical Bernoulli distributions.
prob.hits(hit.cnt, chr.size = NULL)
hit.cnt |
A list returned by the |
chr.size |
A |
This function estimates a p-value for each locus as the probability of observing at least the observed number of lesions by chance, under a model in which lesion events are treated as independent Bernoulli trials.
For each lesion type, the model considers heterogeneity in lesion probability across loci based on their genomic context (e.g., locus size, chromosome size). These probabilities are then combined using a convolution of Bernoulli distributions to estimate the likelihood of observing the actual hit counts.
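The convolution idea can be illustrated with a small, self-contained sketch. It is not the package's internal code: the helper names `bern.conv` and `upper.tail` and the probabilities in `hit.prob` are hypothetical, chosen only to show how independent Bernoulli trials with different success probabilities combine into a distribution of hit counts and an upper-tail p-value.

```r
# Distribution of the number of hits among independent Bernoulli trials
# with non-identical success probabilities, built one trial at a time.
# (Hypothetical helper, not part of the package.)
bern.conv <- function(p) {
  pr <- 1                               # before any trial: P(0 hits) = 1
  for (pk in p) {
    # each trial either leaves the count unchanged (prob 1 - pk)
    # or raises it by one (prob pk)
    pr <- c(pr * (1 - pk), 0) + c(0, pr * pk)
  }
  pr                                    # pr[k + 1] = P(exactly k hits)
}

# Upper-tail p-value: P(number of hits >= n.obs)
upper.tail <- function(p, n.obs) {
  if (n.obs > length(p)) return(0)      # more hits than trials is impossible
  pr <- bern.conv(p)
  sum(pr[(n.obs + 1):length(pr)])
}

hit.prob <- c(0.10, 0.20, 0.05)         # assumed per-trial hit probabilities
bern.conv(hit.prob)                     # probabilities of 0, 1, 2, 3 hits
upper.tail(hit.prob, 2)                 # chance of 2 or more hits
```

When all probabilities are equal, the result reduces to the ordinary binomial distribution, which gives a quick sanity check against `dbinom`.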
In addition, the function calculates:
FDR-adjusted q-values using the method of Pounds and Cheng (2006), which estimates the proportion of true null hypotheses.
p- and q-values for multi-lesion constellation hits, i.e., the probability that a locus is affected by one (p1), two (p2), or more types of lesions simultaneously.
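The FDR adjustment listed above can be sketched in a few lines. This is a minimal illustration, assuming that the proportion of true nulls is estimated as min(1, 2 * mean(p)) and then used to scale Benjamini-Hochberg adjusted p-values; the helper name `pc06.q` and the example p-values are hypothetical, and the package's own implementation may differ in its details.

```r
# Sketch of a robust FDR adjustment (assumed form, hypothetical helper name)
pc06.q <- function(p) {
  # assumed estimate of the proportion of true null hypotheses
  pi0 <- min(1, 2 * mean(p, na.rm = TRUE))
  # scale the Benjamini-Hochberg adjusted p-values by that estimate
  pi0 * p.adjust(p, method = "BH")
}

p.demo <- c(0.001, 0.01, 0.2, 0.6)      # made-up p-values for illustration
q.demo <- pc06.q(p.demo)
q.demo
```

Because the estimated null proportion is at most 1, the resulting q-values never exceed the plain Benjamini-Hochberg values.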
A list with the following components:
gene.hits |
A |
lsn.data |
Original input lesion data. |
gene.data |
Original input gene annotation data. |
gene.lsn.data |
A |
chr.size |
Chromosome size information used in the computation. |
gene.index |
A |
lsn.index |
A |
Abdelrahman Elsayed abdelrahman.elsayed@stjude.org and Stanley Pounds stanley.pounds@stjude.org
Pounds, S. et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.
Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.
prep.gene.lsn.data, find.gene.lsn.overlaps, count.hits
data(lesion_data)
data(hg38_gene_annotation)
data(hg38_chrom_size)
# 1) Prepare gene and lesion data:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)
# 2) Identify overlapping gene-lesion events:
gene.lsn.overlap <- find.gene.lsn.overlaps(prep.gene.lsn)
# 3) Count number of subjects and lesions affecting each gene:
count.subj.hits <- count.hits(gene.lsn.overlap)
# 4) Compute p- and q-values for lesion enrichment per gene:
hits.prob <- prob.hits(count.subj.hits, hg38_chrom_size)