set.genomic.region: Variants annotation based on gene positions

View source: R/set_genomic_region.r

set.genomic.region    R Documentation

Variants annotation based on gene positions

Description

Attributes genomic regions (such as genes) to variants based on the positions of the given regions.

Usage

set.genomic.region(x, regions = genes.b37, flank.width = 0L, split = TRUE)

Arguments

x

A bed.matrix

regions

A data frame in BED format (Start is 0-based and End is 1-based) containing the columns Chr (the chromosome of the gene), Start (the start position of the gene, 0-based), End (the end position of the gene, 1-based) and Name (the name of the gene, a factor).

flank.width

An integer: the width, in base pairs, of the flanking regions added upstream and downstream of each region.

split

Whether to split variants attributed to multiple regions by duplicating these variants (one copy per region). TRUE by default.
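If your gene coordinates are 1-based and fully closed (first and last base both included), as in many annotation tables, the base-R sketch below shows one way to convert them to the BED convention expected by regions. The gene names and coordinates are made up for illustration; only the column names Chr, Start, End and Name come from this page.

```r
# Hypothetical genes given with 1-based, closed coordinates [first, last]
genes_1based <- data.frame(
  chr   = c(2L, 2L),
  first = c(1001L, 5001L),
  last  = c(2000L, 6000L),
  gene  = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# BED convention: Start is 0-based (first - 1), End is 1-based (last),
# so End - Start is the length of the region in bp
regions <- data.frame(
  Chr   = genes_1based$chr,
  Start = genes_1based$first - 1L,
  End   = genes_1based$last,
  Name  = factor(genes_1based$gene, levels = genes_1based$gene)
)
regions
```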

Details

Warning: regions$Name should be a factor containing the UNIQUE names of the regions, ORDERED according to their position in the genome.
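A minimal base-R sketch of preparing a custom regions table that meets this requirement: sort the rows, check that names are unique, then build the factor with levels in that order. The gene names and coordinates are hypothetical, and we assume that sorting by chromosome and then by start position is what "genome order" means here.

```r
# Toy regions table (hypothetical genes), deliberately given out of order
regions <- data.frame(
  Chr   = c(2L, 1L, 2L),
  Start = c(5000L, 100L, 1000L),
  End   = c(6000L, 900L, 2000L),
  Name  = c("GENE_C", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Sort by chromosome, then by start position
regions <- regions[order(regions$Chr, regions$Start), ]
rownames(regions) <- NULL

# Names must be unique before turning them into a factor
stopifnot(!anyDuplicated(regions$Name))

# Factor levels follow the genome order of the sorted table
regions$Name <- factor(regions$Name, levels = regions$Name)
levels(regions$Name)
```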

We provide two data sets of autosomal human genes: genes.b37 and genes.b38.

If x@snps$chr is not a vector of integers, it should be a factor with the same levels as regions$Chr.
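One way to satisfy this with base R is to re-encode the variant chromosomes using the levels of the regions chromosomes. The sketch works on plain hypothetical vectors; in practice they would be regions$Chr and x@snps$chr, and this is not code taken from Ravages.

```r
# Hypothetical chromosome columns: regions uses a factor, variants use character
regions_chr <- factor(c("1", "2", "X"), levels = c("1", "2", "X"))
snp_chr     <- c("2", "X", "1", "2")

# Re-encode the variant chromosomes with exactly the same levels
snp_chr_f <- factor(snp_chr, levels = levels(regions_chr))
stopifnot(identical(levels(snp_chr_f), levels(regions_chr)))
as.integer(snp_chr_f)
```

Any chromosome in snp_chr that is absent from the levels becomes NA, which is a quick way to spot naming mismatches such as "chr1" versus "1".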

If flank.width is 0 (the default), a variant is attributed to a gene only if its position lies between regions$Start and regions$End for that gene. When two regions overlap, variants in the overlapping zone are assigned to both regions, with the region names separated by a comma.
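To make the coordinate convention and the comma-separated overlap rule concrete, here is a base-R sketch for flank.width = 0. The helper annotate_regions is hypothetical and not part of Ravages; it assumes variant positions are 1-based and that a position belongs to a region when Start < pos <= End, the usual reading of the BED convention described for regions. Variants outside every region get NA here; what the package itself stores in that case is not stated on this page.

```r
# Hypothetical regions in BED convention (Start 0-based, End 1-based);
# GENE_A and GENE_B overlap on positions 151..200
regions <- data.frame(
  Chr   = c(1L, 1L, 1L),
  Start = c(100L, 150L, 1000L),
  End   = c(200L, 300L, 1100L),
  Name  = factor(c("GENE_A", "GENE_B", "GENE_C"),
                 levels = c("GENE_A", "GENE_B", "GENE_C"))
)

# Hypothetical variants with 1-based positions
snps <- data.frame(chr = c(1L, 1L, 1L, 1L), pos = c(120L, 175L, 500L, 1100L))

# Hypothetical helper: a 1-based position lies in a BED interval when
# Start < pos <= End; overlapping hits are joined with a comma
annotate_regions <- function(chr, pos, regions) {
  hit <- regions$Chr == chr & regions$Start < pos & pos <= regions$End
  if (!any(hit)) return(NA_character_)
  paste(as.character(regions$Name[hit]), collapse = ",")
}

snps$genomic.region <- mapply(annotate_regions, snps$chr, snps$pos,
                              MoreArgs = list(regions = regions))
snps
```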

If flank.width is a positive number, variants located up to flank.width base pairs upstream or downstream of a gene are also annotated to this gene. Use flank.width = Inf to attribute each variant to the nearest gene.
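The flanking rule can be pictured as a distance threshold. The base-R sketch below is only an illustration under stated assumptions: the helpers dist_to_region and annotate_flank are hypothetical, distance is measured to the nearest region boundary on either side (no strand information), everything sits on one chromosome for brevity, and ties under flank.width = Inf keep all tied regions. The package may handle edge cases differently.

```r
# Hypothetical regions (BED convention) and 1-based variant positions
regions <- data.frame(
  Chr   = c(1L, 1L),
  Start = c(100L, 1000L),
  End   = c(200L, 1100L),
  Name  = factor(c("GENE_A", "GENE_B"), levels = c("GENE_A", "GENE_B"))
)
pos <- c(250L, 600L, 950L)

# Distance from a 1-based position to a region: 0 inside (Start < pos <= End),
# otherwise bp to the nearest boundary base (Start + 1 or End)
dist_to_region <- function(pos, start, end) {
  ifelse(pos <= start, start + 1L - pos, ifelse(pos > end, pos - end, 0L))
}

# Keep regions within flank.width bp; with Inf keep only the closest one(s)
annotate_flank <- function(pos, regions, flank.width) {
  d <- dist_to_region(pos, regions$Start, regions$End)
  keep <- if (is.infinite(flank.width)) d == min(d) else d <= flank.width
  if (!any(keep)) return(NA_character_)
  paste(as.character(regions$Name[keep]), collapse = ",")
}

near100 <- sapply(pos, annotate_flank, regions = regions, flank.width = 100)
nearest <- sapply(pos, annotate_flank, regions = regions, flank.width = Inf)
near100
nearest
```

With a 100 bp flank, the variant at 600 is more than 100 bp from both genes and stays unannotated; with Inf it goes to GENE_A, which is 400 bp away versus 401 bp for GENE_B.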

If split = TRUE, a variant attributed to multiple genomic regions is duplicated in the bed matrix, with one row per genomic region. Variants then get new IDs of the form CHR:POS:A1:A2:genomic.region.
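The splitting step and the documented ID format can be illustrated on a plain data frame standing in for x@snps. This base-R sketch is hypothetical and not the package's code; in a real bed.matrix the genotype rows would be duplicated along with the variant table.

```r
# Hypothetical annotated variants; the second one falls in two regions
snps <- data.frame(
  chr = c(1L, 1L),
  pos = c(120L, 175L),
  A1  = c("A", "C"),
  A2  = c("G", "T"),
  genomic.region = c("GENE_A", "GENE_A,GENE_B"),
  stringsAsFactors = FALSE
)

# One list element per variant, holding its region names
parts <- strsplit(snps$genomic.region, ",", fixed = TRUE)

# Repeat each variant once per region, then attach the single region name
split_snps <- snps[rep(seq_len(nrow(snps)), lengths(parts)), ]
split_snps$genomic.region <- unlist(parts)

# New IDs of the form CHR:POS:A1:A2:genomic.region
split_snps$id <- with(split_snps, paste(chr, pos, A1, A2, genomic.region, sep = ":"))
rownames(split_snps) <- NULL
split_snps$id
```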

Value

The same bed matrix as x with an additional column, x@snps$genomic.region, containing the annotation of each variant. If split = TRUE, variants attributed to several regions appear once per region.

See Also

genes.b37, genes.b38

Examples

library(Ravages)

# Import 1000 Genomes data from the region around the LCT gene
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)

# Group variants within known genes
x <- set.genomic.region(x)

# Group variants within known genes +/- 500 bp
x <- set.genomic.region(x, flank.width=500)

Ravages documentation built on April 1, 2023, 12:08 a.m.