annotate: Annotate CNV Regions with Gene Symbols

annotate    R Documentation

Annotate CNV Regions with Gene Symbols

Description

Overlaps the genomic ranges in a gene annotation file with those in a recurrent CNV file, and annotates each CNV region with the symbols of the genes that fall within it. Requires the GenomicRanges package.

Usage

annotate(
  genes_file,
  risk_file,
  output_dir = ".",
  seqnames_field_genes = "Chr",
  start_field_genes = "Start",
  end_field_genes = "End",
  gene_symbol_field = "GeneSymbol",
  seqnames_field_risk = "Chr",
  start_field_risk = "Start",
  end_field_risk = "End",
  sample_field = "Sample",
  segment_mean_field = "Segment_Mean"
)

Arguments

genes_file

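Character. Path to the gene annotation CSV file. Must contain chromosome, start, end, and gene symbol columns. The expected column names are set by seqnames_field_genes, start_field_genes, end_field_genes, and gene_symbol_field (defaults listed below).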

risk_file

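Character. Path to the recurrent CNV CSV file (e.g., the file path returned by recurrent). Must contain sample, chromosome, start, end, and segment mean columns. The expected column names are set by sample_field, seqnames_field_risk, start_field_risk, end_field_risk, and segment_mean_field (defaults listed below).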

output_dir

Character. Directory where the annotated CSV will be saved. Default is the current directory (".").

seqnames_field_genes

Character. Column name for chromosome in the gene file. Default is "Chr".

start_field_genes

Character. Column name for start position in the gene file. Default is "Start".

end_field_genes

Character. Column name for end position in the gene file. Default is "End".

gene_symbol_field

Character. Column name for gene symbols in the gene file. Default is "GeneSymbol".

seqnames_field_risk

Character. Column name for chromosome in the CNV file. Default is "Chr".

start_field_risk

Character. Column name for start position in the CNV file. Default is "Start".

end_field_risk

Character. Column name for end position in the CNV file. Default is "End".

sample_field

Character. Column name for sample IDs in the CNV file. Default is "Sample".

segment_mean_field

Character. Column name for segment mean values in the CNV file. Default is "Segment_Mean".
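
When the input files use different headers, each *_field argument can be pointed at the matching column. The sketch below is illustrative only: the file names, column names, and values are invented, and the call is skipped unless RiskyCNV is installed.

```r
# Toy input files with non-default column names (all values are invented).
in_dir <- file.path(tempdir(), "annotate_inputs")
dir.create(in_dir, showWarnings = FALSE)

genes_path <- file.path(in_dir, "genes.csv")
write.csv(
  data.frame(chrom      = c("chr1", "chr2"),
             gene_start = c(200, 600),
             gene_end   = c(300, 700),
             symbol     = c("GENE_A", "GENE_C")),
  genes_path, row.names = FALSE
)

cnv_path <- file.path(in_dir, "cnv.csv")
write.csv(
  data.frame(sample_id  = c("S1", "S2"),
             chrom      = c("chr1", "chr2"),
             seg_start  = c(100, 500),
             seg_end    = c(1000, 900),
             log2_ratio = c(0.8, -1.1)),
  cnv_path, row.names = FALSE
)

# Map each custom header onto the corresponding argument.
# Assumes annotate() is exported from RiskyCNV.
if (requireNamespace("RiskyCNV", quietly = TRUE)) {
  annotated <- RiskyCNV::annotate(
    genes_file           = genes_path,
    risk_file            = cnv_path,
    output_dir           = tempdir(),
    seqnames_field_genes = "chrom",
    start_field_genes    = "gene_start",
    end_field_genes      = "gene_end",
    gene_symbol_field    = "symbol",
    seqnames_field_risk  = "chrom",
    start_field_risk     = "seg_start",
    end_field_risk       = "seg_end",
    sample_field         = "sample_id",
    segment_mean_field   = "log2_ratio"
  )
}
```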

Details

This function uses GenomicRanges::findOverlaps with type = "within" to find genes that fall entirely within each CNV region; a gene that only partially overlaps a region is not matched to it. The function is cancer-type agnostic and can be applied to CNV data from any solid tumour, provided a compatible gene annotation reference file is available.
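
To illustrate what type = "within" means in practice, here is a minimal sketch with toy coordinates. It is not the package's internal code: the gene names, sample IDs, and the choice of genes as the query and CNV regions as the subject are assumptions made for the example.

```r
# Sketch only: toy coordinates, not the internals of annotate().
library(GenomicRanges)

# Two CNV regions (used here as the subject ranges).
cnv <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500), end = c(1000, 900)),
  Sample   = c("S1", "S2")
)

# Three genes (used here as the query ranges).
genes <- GRanges(
  seqnames   = c("chr1", "chr1", "chr2"),
  ranges     = IRanges(start = c(200, 900, 600), end = c(300, 1100, 700)),
  GeneSymbol = c("GENE_A", "GENE_B", "GENE_C")
)

# type = "within" keeps a hit only when the query range lies
# entirely inside the subject range.
hits <- findOverlaps(genes, cnv, type = "within")

overlap <- data.frame(
  Sample     = cnv$Sample[subjectHits(hits)],
  GeneSymbol = genes$GeneSymbol[queryHits(hits)]
)
overlap
```

GENE_B spans positions 900 to 1100 and so extends past the end of the chr1 region (1000); it is therefore dropped, while GENE_A and GENE_C are kept.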

Value

A data frame containing annotated CNV regions with columns: Sample, GeneSymbol, Segment_Mean, Chr, Start, End. The result is also written to a timestamped CSV file in output_dir.
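
Because the exact timestamped file name is not specified here, one way to locate the written file afterwards is to pick the most recently modified CSV in output_dir. The helper below, latest_csv, is hypothetical and not part of RiskyCNV; the dummy file only stands in for real output.

```r
# Hypothetical helper: path of the most recently modified CSV in a directory,
# or NA if the directory contains no CSV files.
latest_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    return(NA_character_)
  }
  files[which.max(as.numeric(file.info(files)$mtime))]
}

# Stand-in demonstration: a dummy file with the documented output columns.
out_dir <- file.path(tempdir(), "annotate_demo")
dir.create(out_dir, showWarnings = FALSE)
write.csv(
  data.frame(Sample = "S1", GeneSymbol = "GENE_A", Segment_Mean = 0.8,
             Chr = "chr1", Start = 200, End = 300),
  file.path(out_dir, "dummy_output.csv"),
  row.names = FALSE
)

result <- read.csv(latest_csv(out_dir))
names(result)
```

This relies on file modification times, so it is only reliable when nothing else writes CSV files to the same directory.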

Examples

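# Locate the example input files shipped with the package
genes_file <- system.file("extdata", "gene_annotation.csv",
                           package = "RiskyCNV")
cnv_file   <- system.file("extdata", "annotated_cnv.csv",
                           package = "RiskyCNV")

# Annotate the CNV regions; the CSV output goes to a temporary directory
annotated  <- annotate(
  genes_file = genes_file,
  risk_file  = cnv_file,
  output_dir = tempdir()
)

# Inspect the first rows of the returned data frame
head(annotated)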


RiskyCNV documentation built on June 5, 2026, 5:07 p.m.