gl.find.genes.for.loci: Map loci (SNPs) to the nearest gene feature from a GFF

View source: R/gl.find.genes.for.loci.r

gl.find.genes.for.loci R Documentation

Map loci (SNPs) to the nearest gene feature from a GFF

Description

Given a SNP genlight object and a GFF3 annotation file, find the closest gene (or transcript, if requested) for each input locus. If a locus falls within a gene interval, that gene is treated as the closest, with a distance of 0.

Usage

gl.find.genes.for.loci(
  x,
  gff.file,
  loci,
  include_types = c("gene", "pseudogene"),
  fallback_to_mrna = TRUE,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

x

A SNP genlight object with mapped loci. Must contain per-locus x$chromosome and x$position. [required]

gff.file

Path to a GFF3 file, either plain text or gzip-compressed (.gz). [required]

loci

Character vector of locus names to map. Must match locNames(x). [required]

include_types

Character vector of GFF feature types to treat as "gene" features. [default c("gene", "pseudogene")]

fallback_to_mrna

Logical. If no rows match include_types, use transcript features c("mRNA","transcript") as proxies. [default TRUE]

save2tmp

Logical: save the result table to tempdir() (retrievable with gl.list.reports / gl.print.reports). [default FALSE]

verbose

Verbosity: 0-5 (see gl.set.verbosity()). [default from gl.check.verbosity()]

Details

The function parses common keys in the GFF attributes column (e.g., ID, Name, gene, product, Parent) to provide informative gene labels.

Closeness is measured on the same sequence (chromosome/contig) as:

- 0 if the locus is within [gene_start, gene_end]
- otherwise, the minimum bp distance to the interval edges
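The distance rule above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper names interval_distance() and interval_side() and the toy gene table are hypothetical, and the side labels follow the convention given in the Value section.

```r
# Hypothetical helper (not part of dartR): distance from a position to the
# closed interval [start, end], vectorised over intervals.
interval_distance <- function(pos, start, end) {
  # pmax() gives 0 when start <= pos <= end, otherwise the gap to the nearer edge
  pmax(start - pos, pos - end, 0)
}

# Hypothetical helper: side of the gene on which the locus lies
interval_side <- function(pos, start, end) {
  ifelse(pos < start, "left", ifelse(pos > end, "right", "inside"))
}

# Toy gene intervals on a single chromosome (made-up coordinates)
genes <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  start   = c(100, 500, 900),
  end     = c(200, 650, 1200),
  stringsAsFactors = FALSE
)
pos <- 560  # position of one locus on the same chromosome

genes$distance_bp  <- interval_distance(pos, genes$start, genes$end)
genes$nearest_side <- interval_side(pos, genes$start, genes$end)
print(genes)
```

Here the locus lies inside g2, so g2 gets distance 0; g1 and g3 get the gap to their nearer edge.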

If several genes are equally close, a deterministic tie-break is applied: smallest distance from the locus to the gene midpoint, then shorter gene length, then lexicographic order of gene_id.
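One way to picture the tie-break is as a multi-key sort. The sketch below is an assumption about how such an ordering could be written, not the function's actual implementation: pick_nearest() is a hypothetical helper, gene length is taken here as end - start + 1, and method = "radix" is used so that the gene_id comparison does not depend on the locale.

```r
# Hypothetical helper (not part of dartR): pick one gene among candidates on
# the same chromosome using the tie-break order described above.
pick_nearest <- function(pos, genes) {
  dist     <- pmax(genes$start - pos, pos - genes$end, 0)  # 0 if inside
  mid_dist <- abs(pos - (genes$start + genes$end) / 2)     # distance to midpoint
  len      <- genes$end - genes$start + 1                  # assumed length definition
  # Sort keys in priority order: distance, midpoint distance, length, gene_id
  ord <- order(dist, mid_dist, len, genes$gene_id, method = "radix")
  genes[ord[1], , drop = FALSE]
}

# Three made-up genes, all exactly 100 bp from position 300
genes <- data.frame(
  gene_id = c("gB", "gA", "gC"),
  start   = c(400, 100, 400),
  end     = c(500, 200, 460),
  stringsAsFactors = FALSE
)

# gC has the nearest midpoint (430 vs 450 and 150), so it is chosen
pick_nearest(300, genes)$gene_id

# Without gC, gA and gB tie on midpoint distance and length; gene_id decides
pick_nearest(300, genes[1:2, ])$gene_id
```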

Value

A data.table with one row per input locus and columns: locus, chrom, pos, gene_start, gene_end, gene_type, gene_id, gene_name, gene_symbol, gene_product, gene_attributes, distance_bp, nearest_side. 'distance_bp' is the absolute distance in bp; 'nearest_side' is "inside", "left" (locus < gene_start), or "right" (locus > gene_end) in coordinate space.
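The label columns draw on the GFF attributes column, which in GFF3 is a semicolon-separated list of key=value pairs with reserved characters percent-encoded. As a rough sketch of that parsing step, assuming a hypothetical helper parse_gff_attributes() and a made-up attribute string (how the function maps individual keys to output columns is not specified here):

```r
# Hypothetical helper (not part of dartR): split one GFF3 attributes string
# into a named character vector of key/value pairs.
parse_gff_attributes <- function(attr_string) {
  fields <- strsplit(attr_string, ";", fixed = TRUE)[[1]]
  fields <- trimws(fields)
  # Keep only well-formed key=value fields
  fields <- fields[grepl("=", fields, fixed = TRUE)]
  keys   <- sub("=.*$", "", fields)      # text before the first "="
  values <- sub("^[^=]*=", "", fields)   # text after the first "="
  # Decode percent-encoded characters such as %2C (comma)
  values <- vapply(values, utils::URLdecode, character(1), USE.NAMES = FALSE)
  stats::setNames(values, keys)
}

# Made-up attribute string for illustration
attrs <- parse_gff_attributes(
  "ID=gene-ABC1;Name=ABC1;product=ATP-binding cassette%2C sub-family A"
)
attrs[["ID"]]
attrs[["product"]]
```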

See Also

Other annotation and mapping helpers: gl.find.loci.in.genes()

Examples

## Not run: 
res <- gl.find.genes.for.loci(
  x = testset.gl,
  gff.file = "species.gff3",
  loci = c("locus_12","locus_51","locus_89")
)

## End(Not run)


dartR.popgen documentation built on March 16, 2026, 9:07 a.m.