View source: R/gl.find.loci.in.genes.r
| gl.find.loci.in.genes | R Documentation |
Given a genlight object with mapped loci (chromosome and position) and a gene annotation file (GFF, plain or gz), this function returns the locus names whose genomic positions overlap features of type "gene" whose attributes match a user-supplied pattern (e.g., "MHC", "major histocompatibility").
gl.find.loci.in.genes(x, gff.file, gene, save2tmp = FALSE, verbose = NULL)
x |
Name of the genlight object containing SNP data [required]. |
gff.file |
Path to a GFF3 file, either plain text or gzip-compressed (.gz) [required]. |
gene |
Character pattern used to detect target genes. It is interpreted as a regular expression; prefixing it with '(?i)' for case-insensitive matching is recommended [required]. |
save2tmp |
Logical: save intermediate tables to tempdir for retrieval with gl.list.reports and gl.print.reports [default FALSE]. |
verbose |
Verbosity: 0=silent/fatal; 1=begin/end; 2=progress; 3=progress+summary; 5=full report [default 2 or as set by 'gl.set.verbosity()']. |
The function parses the GFF "attributes" column to extract common keys (e.g., 'Name', 'gene', 'product') and flags any "gene" features whose attributes match the supplied 'gene' pattern. It then uses interval overlap to identify input loci that fall inside those genes.
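The two steps described above (pattern matching on gene attributes, then interval overlap) can be illustrated with a minimal base R sketch. This is not the function's actual implementation: the toy tables `gff` and `loci`, their column names, and the use of `grepl(..., perl = TRUE)` are assumptions made for illustration only.

```r
# Toy annotation table (hypothetical data). A real GFF3 file has nine
# tab-separated columns; only the ones needed here are mimicked.
gff <- data.frame(
  seqid = c("chr1", "chr1", "chr2"),
  type  = c("gene", "gene", "gene"),
  start = c(100L, 500L, 50L),
  end   = c(200L, 900L, 150L),
  attributes = c(
    "ID=gene1;Name=MHC1;product=major histocompatibility complex class I",
    "ID=gene2;Name=ABC1;product=transporter",
    "ID=gene3;Name=XYZ;product=MHC class II antigen"
  ),
  stringsAsFactors = FALSE
)

# Toy loci with chromosome and base position (hypothetical data).
loci <- data.frame(
  name       = c("loc1", "loc2", "loc3", "loc4"),
  chromosome = c("chr1", "chr1", "chr2", "chr2"),
  position   = c(150L, 600L, 100L, 400L),
  stringsAsFactors = FALSE
)

pattern <- "(?i)major histocompatibility|\\bMHC\\b"

# Step 1: keep "gene" features whose attributes match the pattern.
# perl = TRUE is an assumption about the regex flavour.
is_target    <- gff$type == "gene" & grepl(pattern, gff$attributes, perl = TRUE)
target_genes <- gff[is_target, ]

# Step 2: a locus is a hit if it lies on the same sequence as a target gene
# and its position falls within that gene's [start, end] interval.
in_gene <- vapply(seq_len(nrow(loci)), function(i) {
  any(target_genes$seqid == loci$chromosome[i] &
        target_genes$start <= loci$position[i] &
        target_genes$end   >= loci$position[i])
}, logical(1))

hits <- loci$name[in_gene]
hits
```

The row-by-row comparison is quadratic in the worst case and is only meant to make the logic visible; for real data sets a dedicated interval-overlap routine would be the more sensible choice.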
Required fields in 'x' for overlap are per-locus chromosome and base position, accessible as 'x$chromosome' and 'x$position', and locus names via 'locNames(x)'.
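Before running the overlap, it can help to confirm that every locus actually has usable coordinates. The sketch below uses a plain list as a stand-in for a genlight object, and `has_coords()` is a hypothetical helper, not part of dartR. With a real object, the same vectors would come from 'x$chromosome', 'x$position' and 'locNames(x)'.

```r
# Plain list standing in for a genlight object (assumption for illustration).
x_like <- list(
  loc.names  = c("loc1", "loc2", "loc3"),
  chromosome = factor(c("chr1", "chr1", NA)),
  position   = c(150L, 600L, 100L)
)

# Hypothetical helper: TRUE for loci that have both a chromosome and a
# positive base position, FALSE otherwise.
has_coords <- function(chromosome, position) {
  !is.na(chromosome) & !is.na(position) & position > 0
}

mapped <- has_coords(x_like$chromosome, x_like$position)

# One flag per locus is expected.
stopifnot(length(mapped) == length(x_like$loc.names))

# Loci that cannot be placed on the annotation.
unmapped <- x_like$loc.names[!mapped]
unmapped
```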
A character vector of locus names overlapping the matching gene intervals (in genomic coordinates).
Luis Mijangos (post to https://groups.google.com/d/forum/dartr)
Other annotation and mapping helpers:
gl.find.genes.for.loci()
## Not run:
# Regex for case-insensitive MHC:
mhc_loci <- gl.find.loci.in.genes(
x = testset.gl,
gff.file = "species.gff3",
gene = "(?i)major histocompatibility|\\bMHC\\b"
)
## End(Not run)
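It can be useful to preview what a pattern matches before passing it as 'gene'. The sketch below applies the pattern from the example to a few made-up attribute strings using base R's `grepl()` with `perl = TRUE`. The strings are hypothetical, and the regex engine used inside the function is not stated in this documentation, so treat the result as indicative only.

```r
pattern <- "(?i)major histocompatibility|\\bMHC\\b"

# Hypothetical attribute strings in GFF3 "key=value;" style.
attrs <- c(
  "ID=g1;Name=HLA-A;product=Major Histocompatibility complex, class I",
  "ID=g2;Name=mhc;product=hypothetical protein",
  "ID=g3;Name=MHC1;product=hypothetical protein",
  "ID=g4;Name=ABCB1;product=transporter"
)

# (?i) makes both alternatives case-insensitive; \\b requires "MHC" to stand
# alone as a word, so "MHC1" is not matched by the second alternative.
matches <- grepl(pattern, attrs, perl = TRUE)
data.frame(attrs = attrs, matches = matches)
```

If names such as "MHC1" should also be caught, the trailing word boundary can be dropped from the pattern (for example "\\bMHC").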