prep.gene.lsn.data: Prepare Gene and Lesion Data for GRIN Analysis


prep.gene.lsn.data    R Documentation

Prepare Gene and Lesion Data for GRIN Analysis

Description

Prepares and indexes gene and lesion data for downstream GRIN (Genomic Random Interval) analysis. The function merges and orders gene and lesion coordinates so that overlaps between genes and every type of genomic lesion (structural or sequence) can be computed efficiently.

Usage

prep.gene.lsn.data(lsn.data, gene.data, mess.freq = 10)

Arguments

lsn.data

A data.frame containing lesion data in GRIN-compatible format. Must include the following five columns:

ID

Unique patient identifier.

chrom

Chromosome on which the lesion is located.

loc.start

Start position of the lesion in base pairs.

loc.end

End position of the lesion in base pairs.

lsn.type

Type of lesion (e.g., gain, loss, mutation, fusion).

gene.data

A data.frame containing gene annotation data with the following four required columns:

gene

Ensembl gene ID.

chrom

Chromosome on which the gene is located.

loc.start

Start position of the gene in base pairs.

loc.end

End position of the gene in base pairs.

mess.freq

Integer specifying how often progress messages are displayed. A message is printed after every mess.freq-th lesion block is processed (default is 10).
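As a minimal sketch of the expected inputs, the toy data frames below carry the required columns. All values and gene IDs are made up for illustration, and the helper check.cols is hypothetical (it is not part of GRIN2).

```r
# Toy lesion data with the five required columns (values are made up).
lsn.toy <- data.frame(
  ID        = c("P1", "P1", "P2"),
  chrom     = c("1", "2", "1"),
  loc.start = c(150, 500, 900),
  loc.end   = c(150, 1200, 2500),
  lsn.type  = c("mutation", "gain", "loss"),
  stringsAsFactors = FALSE
)

# Toy gene annotation with the four required columns (IDs are placeholders).
gene.toy <- data.frame(
  gene      = c("ENSG_A", "ENSG_B", "ENSG_C"),
  chrom     = c("1", "1", "2"),
  loc.start = c(100, 2000, 400),
  loc.end   = c(300, 3000, 800),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list the required columns missing from a data frame.
check.cols <- function(dat, required) setdiff(required, colnames(dat))

lsn.missing  <- check.cols(lsn.toy,  c("ID", "chrom", "loc.start", "loc.end", "lsn.type"))
gene.missing <- check.cols(gene.toy, c("gene", "chrom", "loc.start", "loc.end"))
```

If both lsn.missing and gene.missing are empty, data frames shaped like these could be passed as prep.gene.lsn.data(lsn.toy, gene.toy).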

Details

This function pre-processes the input by ordering and indexing both the gene and the lesion data. It then combines the gene and lesion coordinates into a single structure in which every row is tagged with a code (cty) identifying it as the start or end of a gene or of a lesion. The find.gene.lsn.overlaps() function uses this merged data to detect gene-lesion overlaps.
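The idea of merging boundaries and tagging them with cty can be sketched in base R as follows. This is an illustration on made-up data, not the package's implementation; the column names pos, gene, and ID in the merged table and the use of cty to break ties in the ordering are assumptions.

```r
# Toy inputs (made-up values) with the required columns.
genes <- data.frame(gene = c("gA", "gB"), chrom = c("1", "1"),
                    loc.start = c(100, 2000), loc.end = c(300, 3000),
                    stringsAsFactors = FALSE)
lesions <- data.frame(ID = c("P1", "P2"), chrom = c("1", "1"),
                      loc.start = c(150, 900), loc.end = c(150, 2500),
                      lsn.type = c("mutation", "loss"),
                      stringsAsFactors = FALSE)

# One row per interval boundary; cty marks what kind of boundary it is.
gene.pos <- data.frame(
  chrom = rep(genes$chrom, 2),
  pos   = c(genes$loc.start, genes$loc.end),
  cty   = rep(c(1, 4), each = nrow(genes)),    # 1 = gene start, 4 = gene end
  gene  = rep(genes$gene, 2),
  ID    = NA_character_,
  stringsAsFactors = FALSE
)
lsn.pos <- data.frame(
  chrom = rep(lesions$chrom, 2),
  pos   = c(lesions$loc.start, lesions$loc.end),
  cty   = rep(c(2, 3), each = nrow(lesions)),  # 2 = lesion start, 3 = lesion end
  gene  = NA_character_,
  ID    = rep(lesions$ID, 2),
  stringsAsFactors = FALSE
)

# Stack the two tables, then order by chromosome, position, and cty.
merged <- rbind(gene.pos, lsn.pos)
merged <- merged[order(merged$chrom, merged$pos, merged$cty), ]
rownames(merged) <- NULL
```

In the ordered table, a lesion boundary that falls between a gene's start row (cty 1) and end row (cty 4) lies inside that gene, which is what makes a single pass over the rows sufficient for overlap detection.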

Value

A list with the following components:

lsn.data

Original lesion data.

gene.data

Original gene annotation data.

gene.lsn.data

Combined and ordered data.frame of gene and lesion intervals. The cty column encodes position type: 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end.

gene.index

Index data.frame indicating the start and end rows for each chromosome within gene.lsn.data for genes.

lsn.index

Index data.frame indicating the start and end rows for each lesion (grouped by type, chromosome, and subject) within gene.lsn.data.
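To illustrate what a start/end row index provides, the sketch below builds a per-chromosome index over a toy ordered table and uses it to pull out one block. The column names row.start and row.end and the table ordered.toy are assumptions for illustration; the actual index columns returned by the function may differ.

```r
# Toy ordered table standing in for gene.lsn.data (made-up values).
ordered.toy <- data.frame(
  chrom = c("1", "1", "1", "2", "2"),
  pos   = c(100, 300, 900, 400, 800),
  stringsAsFactors = FALSE
)

# First and last row number of each chromosome block.
rows      <- seq_len(nrow(ordered.toy))
first.row <- tapply(rows, ordered.toy$chrom, min)
last.row  <- tapply(rows, ordered.toy$chrom, max)

chrom.index <- data.frame(
  chrom     = names(first.row),
  row.start = as.vector(first.row),
  row.end   = as.vector(last.row),
  stringsAsFactors = FALSE
)

# Use the index to extract the chromosome "2" block without scanning all rows.
k    <- which(chrom.index$chrom == "2")
chr2 <- ordered.toy[chrom.index$row.start[k]:chrom.index$row.end[k], ]
```

An index of this kind trades a small amount of memory for direct access to each block, which matters when the merged table holds boundaries for every gene and lesion in a cohort.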

Author(s)

Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org> and Stanley Pounds <stanley.pounds@stjude.org>

References

Pounds, S., et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.

Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.

See Also

order.index.gene.data, order.index.lsn.data, find.gene.lsn.overlaps

Examples

data(lesion_data)
data(hg38_gene_annotation)

# Prepare gene and lesion data for GRIN analysis:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)

GRIN2 documentation built on June 17, 2025, 9:11 a.m.