View source: R/gl.find.genes.for.loci.r
| gl.find.genes.for.loci | R Documentation |
Given a SNP genlight object and a GFF3 annotation file, find the closest gene (or transcript, if requested) for each input locus. If a locus falls within a gene interval, that gene is considered the closest with distance 0.
gl.find.genes.for.loci(
  x,
  gff.file,
  loci,
  include_types = c("gene", "pseudogene"),
  fallback_to_mrna = TRUE,
  save2tmp = FALSE,
  verbose = NULL
)
| x | A SNP genlight object with mapped loci. Must contain per-locus x$chromosome and x$position. [required] |
| gff.file | Path to a GFF3 file (either plain or with a .gz alongside). [required] |
| loci | Character vector of locus names to map. Must match locNames(x). [required] |
| include_types | Character vector of GFF types to treat as "gene" features. [default c("gene","pseudogene")] |
| fallback_to_mrna | Logical. If no rows match include_types, use transcript features c("mRNA","transcript") as proxies. [default TRUE] |
| save2tmp | Logical. Save the result table to tempdir() (retrievable with gl.list.reports / gl.print.reports). [default FALSE] |
| verbose | Verbosity: 0-5 (see gl.set.verbosity()). [default from gl.check.verbosity()] |
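The interaction between include_types and fallback_to_mrna can be pictured with a small base R sketch. This is an illustration only, not the package's internal code: the helper select_features() and the toy gff data frame (with columns seqid, type, start, end) are assumptions made for the example.

```r
# Hypothetical helper: keep rows whose type is in include_types and, if none
# match and the fallback is enabled, use transcript features instead.
select_features <- function(gff,
                            include_types = c("gene", "pseudogene"),
                            fallback_to_mrna = TRUE) {
  hits <- gff[gff$type %in% include_types, , drop = FALSE]
  if (nrow(hits) == 0 && fallback_to_mrna) {
    hits <- gff[gff$type %in% c("mRNA", "transcript"), , drop = FALSE]
  }
  hits
}

# Toy annotation with no "gene" rows, so the fallback is triggered.
gff <- data.frame(
  seqid = c("chr1", "chr1"),
  type  = c("mRNA", "exon"),
  start = c(100L, 100L),
  end   = c(500L, 200L),
  stringsAsFactors = FALSE
)

picked <- select_features(gff)
none   <- select_features(gff, fallback_to_mrna = FALSE)
```

With the fallback disabled, the same annotation yields no usable features, which is the situation fallback_to_mrna = TRUE is meant to avoid.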
The function parses common keys in the GFF attributes column (e.g., ID, Name, gene, product, Parent) to provide informative gene labels.
Closeness is measured on the same sequence (chromosome/contig) as:
- 0 if the locus is within [gene_start, gene_end]
- otherwise, the minimum bp distance to the interval edges
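The distance rule above can be written as a one-line vectorised expression. The sketch below is a minimal illustration under the stated definition, not the function's actual implementation; the name locus_gene_distance() is hypothetical, and it assumes the locus and gene are already on the same sequence.

```r
# Hypothetical helper: distance in bp from a locus position to a gene interval.
# Both differences are <= 0 when the locus lies inside [gene_start, gene_end],
# so taking the maximum with 0 gives 0 inside and the edge distance outside.
locus_gene_distance <- function(pos, gene_start, gene_end) {
  pmax(gene_start - pos, pos - gene_end, 0)
}

d_inside <- locus_gene_distance(150, 100, 200)
d_left   <- locus_gene_distance(90, 100, 200)
d_right  <- locus_gene_distance(230, 100, 200)

# Vectorised over several candidate genes for one locus.
d_many <- locus_gene_distance(250, c(100, 300, 320), c(200, 400, 380))
```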
If multiple genes are exactly equally close, a deterministic tie-break is applied: closest to gene midpoint, then shorter gene length, then lexicographic gene_id.
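One way to express this tie-break is a multi-key sort. The sketch below is an assumption about how such an ordering could be written in base R, not a copy of the package code; pick_closest() and the toy candidates table are hypothetical, and the midpoint is taken as (gene_start + gene_end) / 2.

```r
# Hypothetical helper: pick one gene for a locus using the documented order of
# criteria: distance, then distance to midpoint, then gene length, then gene_id.
pick_closest <- function(pos, genes) {
  dist     <- pmax(genes$gene_start - pos, pos - genes$gene_end, 0)
  mid_dist <- abs(pos - (genes$gene_start + genes$gene_end) / 2)
  len      <- genes$gene_end - genes$gene_start + 1
  ord      <- order(dist, mid_dist, len, genes$gene_id)
  genes[ord[1], , drop = FALSE]
}

# geneA and geneB are both 50 bp away, with equal midpoint distance and
# length, so the choice falls through to the gene_id comparison.
candidates <- data.frame(
  gene_id    = c("geneB", "geneA", "geneC"),
  gene_start = c(100, 300, 320),
  gene_end   = c(200, 400, 380),
  stringsAsFactors = FALSE
)

best <- pick_closest(250, candidates)
```

Because order() compares strings using the session's collation rules, passing method = "radix" is one option if strictly locale-independent ordering of gene_id matters.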
A data.table with one row per input locus and columns: locus, chrom, pos, gene_start, gene_end, gene_type, gene_id, gene_name, gene_symbol, gene_product, gene_attributes, distance_bp, nearest_side. 'distance_bp' is the absolute distance in bp; 'nearest_side' is "inside", "left" (locus < gene_start), or "right" (locus > gene_end) in coordinate space.
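The nearest_side labels follow directly from comparing the locus position with the interval edges. As a minimal sketch (the helper name side_of_gene() is hypothetical and gene strand is ignored, consistent with "in coordinate space"):

```r
# Hypothetical helper: classify a locus relative to a gene interval.
side_of_gene <- function(pos, gene_start, gene_end) {
  ifelse(pos < gene_start, "left",
         ifelse(pos > gene_end, "right", "inside"))
}

sides <- side_of_gene(c(90, 150, 230), 100, 200)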
Other annotation and mapping helpers:
gl.find.loci.in.genes()
## Not run:
# "species.gff3" is a placeholder path; loci must be present in locNames(x)
res <- gl.find.genes.for.loci(
  x = testset.gl,
  gff.file = "species.gff3",
  loci = c("locus_12", "locus_51", "locus_89")
)
## End(Not run)