prep.gene.lsn.data: Prepare Gene and Lesion Data for GRIN Analysis
prep.gene.lsn.data

R Documentation

Description

Prepares and indexes gene and lesion data for downstream GRIN (Genomic Random Interval) analysis. This function merges and orders gene and lesion coordinates so that overlaps between genes and genomic lesions of any type (structural or sequence) can be computed efficiently.

Usage

prep.gene.lsn.data(lsn.data, gene.data, mess.freq = 10)

Arguments

lsn.data

A data.frame containing lesion data in GRIN-compatible format. Must include the following five columns:

ID: Unique patient identifier.
chrom: Chromosome on which the lesion is located.
loc.start: Start position of the lesion in base pairs.
loc.end: End position of the lesion in base pairs.
lsn.type: Type of lesion (e.g., gain, loss, mutation, fusion).
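
As an illustration of the layout described above, a small lesion table could be built by hand as in the following sketch. The patient IDs, coordinates, and the helper check.lsn.cols are hypothetical and are not part of GRIN2; the chromosome labels are assumed to follow the same naming as the gene annotation in use.

```r
# Hypothetical toy lesion data in the five-column layout described above.
toy.lsn <- data.frame(
  ID        = c("PT01", "PT01", "PT02"),
  chrom     = c("1", "1", "2"),
  loc.start = c(1000, 5000, 200),
  loc.end   = c(1500, 5000, 900),
  lsn.type  = c("gain", "mutation", "loss"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not a GRIN2 function): list any required columns
# that are absent from the supplied table.
check.lsn.cols <- function(x) {
  required <- c("ID", "chrom", "loc.start", "loc.end", "lsn.type")
  setdiff(required, colnames(x))
}

missing.cols <- check.lsn.cols(toy.lsn)
```

An empty missing.cols means that all five columns are present; a check of this kind can catch misnamed columns before the table is passed on.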

gene.data

A data.frame containing gene annotation data with the following four required columns:

gene: Ensembl gene ID.
chrom: Chromosome on which the gene is located.
loc.start: Start position of the gene in base pairs.
loc.end: End position of the gene in base pairs.
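
The gene annotation layout can be sketched in the same way. The gene identifiers below are placeholders rather than real Ensembl IDs, the coordinates are invented, and the sanity checks are only suggestions, not checks that the function is known to perform.

```r
# Hypothetical toy gene annotation in the four-column layout described above.
# The gene IDs are placeholders, not real Ensembl identifiers.
toy.gene <- data.frame(
  gene      = c("ENSG_TOY_A", "ENSG_TOY_B", "ENSG_TOY_C"),
  chrom     = c("1", "1", "2"),
  loc.start = c(800, 4000, 100),
  loc.end   = c(1200, 6000, 300),
  stringsAsFactors = FALSE
)

# Suggested sanity checks before using a table like this.
has.all.cols   <- all(c("gene", "chrom", "loc.start", "loc.end") %in% colnames(toy.gene))
valid.interval <- all(toy.gene$loc.start <= toy.gene$loc.end)
unique.genes   <- !any(duplicated(toy.gene$gene))
```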

mess.freq

Integer specifying how often progress messages are displayed. A message is printed after every mess.freq-th lesion block is processed (default: 10).

Details

This function pre-processes the input by ordering and indexing both gene and lesion data. It combines gene and lesion coordinates into a unified structure and marks each row with a code (cty) that identifies whether the row is the start or end of a gene or of a lesion. The merged data is then used by the find.gene.lsn.overlaps() function to detect gene-lesion overlaps.
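
The idea of merging and ordering coordinates can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy inputs, the pts table, and its pos and label columns are assumptions made for illustration, and only the cty codes follow the encoding documented for gene.lsn.data.

```r
# Toy inputs (hypothetical values).
gene <- data.frame(gene = c("gA", "gB"), chrom = c("1", "1"),
                   loc.start = c(100, 400), loc.end = c(300, 600),
                   stringsAsFactors = FALSE)
lsn <- data.frame(ID = "PT01", chrom = "1", loc.start = 250, loc.end = 450,
                  lsn.type = "loss", stringsAsFactors = FALSE)

# One row per interval endpoint, tagged with a cty code:
# 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end.
pts <- rbind(
  data.frame(chrom = gene$chrom, pos = gene$loc.start, cty = 1,
             label = gene$gene, stringsAsFactors = FALSE),
  data.frame(chrom = lsn$chrom, pos = lsn$loc.start, cty = 2,
             label = lsn$ID, stringsAsFactors = FALSE),
  data.frame(chrom = lsn$chrom, pos = lsn$loc.end, cty = 3,
             label = lsn$ID, stringsAsFactors = FALSE),
  data.frame(chrom = gene$chrom, pos = gene$loc.end, cty = 4,
             label = gene$gene, stringsAsFactors = FALSE)
)

# Order by chromosome, then position, then cty so that ties resolve consistently.
pts <- pts[order(pts$chrom, pts$pos, pts$cty), ]
rownames(pts) <- NULL
```

In the ordered table, a lesion start (cty 2) that falls between a gene start (cty 1) and the matching gene end (cty 4) indicates that the lesion begins inside that gene, which is the kind of relationship a single pass over the merged rows can detect.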

Value

A list with the following components:

lsn.data: Original lesion data.
gene.data: Original gene annotation data.
gene.lsn.data: Combined and ordered data.frame of gene and lesion intervals. The cty column encodes position type: 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end.
gene.index: Index data.frame giving, for each chromosome, the start and end rows of its gene entries within gene.lsn.data.
lsn.index: Index data.frame indicating the start and end rows for each lesion (grouped by type, chromosome, and subject) within gene.lsn.data.
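
To show what a start/end row index of this kind looks like, the sketch below builds one for a small ordered table in base R. The table, the chr.index object, and the column names row.start and row.end are hypothetical; the actual columns of gene.index and lsn.index may differ.

```r
# Toy ordered table (hypothetical values), already sorted by chromosome.
ord <- data.frame(chrom = c("1", "1", "1", "2", "2"),
                  pos   = c(100, 250, 300, 50, 80),
                  stringsAsFactors = FALSE)

# For each chromosome, find the first and last row it occupies.
rows      <- seq_len(nrow(ord))
first.row <- tapply(rows, ord$chrom, min)
last.row  <- tapply(rows, ord$chrom, max)

chr.index <- data.frame(chrom     = names(first.row),
                        row.start = as.vector(first.row),
                        row.end   = as.vector(last.row),
                        stringsAsFactors = FALSE)

# Use the index to pull the block of rows for chromosome "2" without scanning.
chr2      <- chr.index[chr.index$chrom == "2", ]
chr2.rows <- ord[chr2$row.start:chr2$row.end, ]
```

An index like this lets later steps jump directly to the rows for one group instead of filtering the full table each time.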

Author(s)

Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org> and Stanley Pounds <stanley.pounds@stjude.org>

References

Pounds, S., et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.
Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.

Examples

data(lesion_data)
data(hg38_gene_annotation)

# Prepare gene and lesion data for GRIN analysis:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)

GRIN2 documentation built on June 17, 2025, 9:11 a.m.