set.genomic.region: Variants annotation based on gene positions
In Ravages: Rare Variant Analysis and Genetic Simulations

set.genomic.region

R Documentation

Variants annotation based on gene positions

Description

Assigns genomic regions (such as genes) to variants based on the region positions.

Usage

set.genomic.region(x, regions = genes.b37, flank.width = 0L, split = TRUE)

Arguments

`x`	A bed.matrix
`regions`	A data frame in bed format (start is 0-based and end is 1-based) containing the fields: `Chr` (the chromosome of the gene), `Start` (the start position of the gene, 0-based), `End` (the end position of the gene, 1-based), and `Name` (the name of the gene, a factor).
`flank.width`	An integer: width, in base pairs, of the flanking regions upstream and downstream of the regions.
`split`	Whether to split variants attributed to multiple regions by duplicating these variants (TRUE by default).
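Gene tables are often distributed with 1-based, inclusive coordinates. As a minimal base-R sketch (the gene names and positions below are made up), such a table can be converted to the bed convention expected by `regions` by subtracting one from the start position only:

```r
# Hypothetical gene table with 1-based, inclusive coordinates (made-up values)
genes.1based <- data.frame(
  Chr  = c(2L, 2L, 2L),
  From = c(1001L, 1501L, 5001L),  # first base of each gene (1-based)
  To   = c(2000L, 2500L, 6000L),  # last base of each gene (1-based)
  Name = c("GENE_A", "GENE_B", "GENE_C")
)

# Bed convention used by `regions`: Start is 0-based, End is 1-based,
# so only the start position is shifted by one
my.regions <- data.frame(
  Chr   = genes.1based$Chr,
  Start = genes.1based$From - 1L,
  End   = genes.1based$To,
  # Name as a factor whose levels keep the (already genome-ordered) row order
  Name  = factor(genes.1based$Name, levels = genes.1based$Name)
)
my.regions
```

The resulting table could then be passed as `set.genomic.region(x, regions = my.regions)`; whether your own source table is really 1-based is an assumption to check before converting.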

Details

Warning: regions$Name should be a factor containing the UNIQUE names of the regions, ORDERED in genome order.
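One way to satisfy these constraints on a custom region table is sketched below in base R. It assumes integer chromosome codes (character codes such as "10" and "2" would sort alphabetically, not numerically), and the table contents are hypothetical.

```r
# Hypothetical, unordered region table in bed coordinates
regions <- data.frame(
  Chr   = c(2L, 1L, 2L),
  Start = c(5000L, 100L, 1000L),
  End   = c(6000L, 900L, 2000L),
  Name  = c("GENE_C", "GENE_X", "GENE_A"),
  stringsAsFactors = FALSE
)

# Region names must be unique (anyDuplicated() returns 0 when there are none)
stopifnot(!anyDuplicated(regions$Name))

# Sort by chromosome, then by start position (assumes integer chromosome codes)
regions <- regions[order(regions$Chr, regions$Start), ]
rownames(regions) <- NULL

# Turn Name into a factor whose levels follow genome order
regions$Name <- factor(regions$Name, levels = regions$Name)
levels(regions$Name)
# "GENE_X" "GENE_A" "GENE_C"
```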

We provide two data sets of autosomal human genes, genes.b37 and genes.b38.

If x@snps$chr is not a vector of integers, it should be a factor with the same levels as regions$Chr.
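For instance, if variant chromosomes were read as character strings, they could be recoded with the levels of regions$Chr as in this base-R sketch (all values are hypothetical). Chromosome labels absent from those levels become NA.

```r
# Hypothetical region table whose chromosomes are stored as a factor
regions <- data.frame(
  Chr   = factor(c("1", "2", "X"), levels = c("1", "2", "X")),
  Start = c(100L, 1000L, 500L),
  End   = c(900L, 2000L, 700L),
  Name  = factor(c("GENE_X", "GENE_A", "GENE_Y"),
                 levels = c("GENE_X", "GENE_A", "GENE_Y"))
)

# Hypothetical variant chromosomes read as character strings
snp.chr <- c("2", "1", "2", "X")

# Recode with exactly the same levels as regions$Chr
snp.chr <- factor(snp.chr, levels = levels(regions$Chr))
identical(levels(snp.chr), levels(regions$Chr))  # TRUE
as.integer(snp.chr)                              # 2 1 2 3
```

On a bed.matrix this would presumably read `x@snps$chr <- factor(x@snps$chr, levels = levels(regions$Chr))`; treat that line as an untested suggestion rather than a documented Ravages step.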

If flank.width is 0 (the default), only the variants whose position lies between regions$Start and regions$End of a gene will be attributed to the corresponding gene. When two regions overlap, variants in the overlapping zone will be assigned to both regions, with the region names separated by a comma.
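With bed coordinates, a variant at 1-based position pos (as in bim files) falls in a region when Start < pos <= End. The sketch below is a simplified base-R illustration of that rule and of the comma-separated labels for overlapping regions; it is not the package's own implementation, and the helper `annotate.inside()` and all data are hypothetical.

```r
# Hypothetical regions in bed format (0-based Start, 1-based End)
regions <- data.frame(
  Chr   = c(1L, 1L, 1L),
  Start = c(1000L, 1500L, 5000L),
  End   = c(2000L, 2500L, 6000L),
  Name  = factor(c("GENE_A", "GENE_B", "GENE_C"))
)
# Hypothetical variants with 1-based positions
snps <- data.frame(chr = c(1L, 1L, 1L, 1L), pos = c(1000L, 1200L, 1800L, 3000L))

# Simplified illustration, not the Ravages code: collect every region with
# Start < pos <= End on the same chromosome, joined by commas
annotate.inside <- function(chr, pos, regions) {
  hit <- regions$Chr == chr & regions$Start < pos & pos <= regions$End
  if (any(hit)) paste(regions$Name[hit], collapse = ",") else NA_character_
}

snps$genomic.region <- mapply(annotate.inside, snps$chr, snps$pos,
                              MoreArgs = list(regions = regions))
snps$genomic.region
# NA "GENE_A" "GENE_A,GENE_B" NA
```

Position 1000 is excluded because, with a 0-based Start of 1000, GENE_A begins at 1-based position 1001.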

If flank.width is a positive number, variants located up to flank.width base pairs upstream or downstream of a gene will also be annotated to this gene. You can use flank.width = Inf to have each variant attributed to the nearest gene.
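The flanking behaviour can be illustrated with the simplified sketch below. It assumes, from the description of flank.width = Inf, that a variant lying outside every region goes to the single nearest region on its chromosome when that region is at most flank.width bp away; how Ravages actually handles ties, or variants inside one gene and in the flank of another, may differ. The helpers `dist.to.regions()` and `annotate.flank()` are hypothetical.

```r
# Hypothetical regions in bed format (0-based Start, 1-based End)
regions <- data.frame(
  Chr   = c(1L, 1L),
  Start = c(1000L, 5000L),
  End   = c(2000L, 6000L),
  Name  = factor(c("GENE_A", "GENE_B"))
)

# Distance (bp) from a 1-based position to each region; 0 when inside.
# A bed region covers 1-based positions Start + 1 to End.
dist.to.regions <- function(pos, regions) {
  pmax(regions$Start + 1L - pos, pos - regions$End, 0L)
}

# Simplified illustration, not the Ravages code: regions containing the
# variant win; otherwise keep the nearest region (first one on ties)
# if it lies within flank.width bp
annotate.flank <- function(chr, pos, regions, flank.width) {
  on.chr <- regions[regions$Chr == chr, ]
  if (nrow(on.chr) == 0) return(NA_character_)
  d <- dist.to.regions(pos, on.chr)
  if (any(d == 0)) return(paste(on.chr$Name[d == 0], collapse = ","))
  if (min(d) <= flank.width) as.character(on.chr$Name[which.min(d)]) else NA_character_
}

pos <- c(900L, 2300L, 4600L, 9000L)
sapply(pos, annotate.flank, chr = 1L, regions = regions, flank.width = 500)
# "GENE_A" "GENE_A" "GENE_B" NA
sapply(pos, annotate.flank, chr = 1L, regions = regions, flank.width = Inf)
# "GENE_A" "GENE_A" "GENE_B" "GENE_B"
```

With flank.width = Inf, the variant at position 9000 (3000 bp from GENE_B, 7000 bp from GENE_A) is attributed to its nearest gene instead of being left unannotated.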

If split = TRUE, a variant attributed to multiple genomic regions is duplicated in the bed matrix, once per genomic region. The variants then get new IDs of the form CHR:POS:A1:A2:genomic.region.
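The duplication and renaming step can be mimicked on a plain data frame, as in the base-R sketch below with hypothetical variants. It only reproduces the CHR:POS:A1:A2:genomic.region ID pattern and does not touch genotypes, whereas set.genomic.region duplicates the variants in the bed matrix itself.

```r
# Hypothetical annotated variants; the second one falls in two overlapping regions
snps <- data.frame(
  chr = c(1L, 1L),
  pos = c(1200L, 1800L),
  A1  = c("A", "G"),
  A2  = c("C", "T"),
  genomic.region = c("GENE_A", "GENE_A,GENE_B"),
  stringsAsFactors = FALSE
)

# One character vector of region names per variant
reg <- strsplit(snps$genomic.region, ",", fixed = TRUE)

# Repeat each row once per region, then keep a single region per row
split.snps <- snps[rep(seq_len(nrow(snps)), lengths(reg)), ]
split.snps$genomic.region <- unlist(reg)

# New IDs following the CHR:POS:A1:A2:genomic.region pattern
split.snps$id <- paste(split.snps$chr, split.snps$pos, split.snps$A1,
                       split.snps$A2, split.snps$genomic.region, sep = ":")
rownames(split.snps) <- NULL
split.snps$id
# "1:1200:A:C:GENE_A" "1:1800:G:T:GENE_A" "1:1800:G:T:GENE_B"
```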

Value

The same bed matrix as x with an additional column x@snps$genomic.region containing the annotation of each variant.

Examples

library(Ravages)

# Import 1000 Genomes data from the region around the LCT gene
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)

#Group variants within known genes
x <- set.genomic.region(x)

# Group variants within known genes +/- 500 bp
x <- set.genomic.region(x, flank.width=500)

Ravages documentation built on April 1, 2023, 12:08 a.m.