find_genes: Map SNPs to genes


Description

Given a data frame with SNPs (CpGs, etc.) and a data frame with genes, return the first data frame with an additional column that contains, for every SNP, a comma-separated list of the identifiers of the genes that contain that SNP.

Usage

find_genes(d, genes, chr1 = "chr", pos = "pos", out = "genes",
           chr2 = chr1, start = "start", end = "end", id = "id",
           quiet = FALSE)

Arguments

d

Data frame with SNPs.

genes

Data frame with genes.

chr1

Name of column in ‘d’ containing chromosome number.

pos

Name of column in ‘d’ containing basepair position.

out

Name of column in output data frame that contains the comma-separated lists of genes.

chr2

Name of column in ‘genes’ containing chromosome number.

start

Name of column in ‘genes’ containing gene start positions.

end

Name of column in ‘genes’ containing gene end positions.

id

Name of column in ‘genes’ containing gene identifiers that will appear in the comma-separated lists of the new column ‘out’ in the output data frame.

quiet

If FALSE (default), progress messages are printed to the screen.
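As an illustration of the column-name arguments, here is a call on input whose columns do not use the default names. The data frames and their column names (‘rsid’, ‘chrom’, ‘bp’, ‘symbol’, ‘from’, ‘to’) are made up for this sketch. Because ‘chr2’ defaults to the value of ‘chr1’, it has to be given explicitly whenever the two data frames name their chromosome column differently.

```r
# Hypothetical input whose column names differ from the defaults.
snps <- data.frame(rsid = c("rs1", "rs2"), chrom = c(1, 1), bp = c(1, 5),
                   stringsAsFactors = FALSE)
genes <- data.frame(symbol = c("g1", "g2"), chr = c(1, 1),
                    from = c(1, 1), to = c(3, 6),
                    stringsAsFactors = FALSE)

# Run only when genFun is installed, so the snippet can be sourced anywhere.
if (requireNamespace("genFun", quietly = TRUE)) {
    res <- genFun::find_genes(snps, genes,
                              chr1 = "chrom", pos = "bp", out = "gene_ids",
                              chr2 = "chr",  # would default to "chrom" otherwise
                              start = "from", end = "to", id = "symbol",
                              quiet = TRUE)
}
```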

Details

Both ‘d’ and ‘genes’ can contain other columns besides those named by ‘chr1’, ‘pos’, ‘chr2’, ‘start’, ‘end’, and ‘id’. The order of SNPs and genes within the data frames does not matter. A SNP that does not belong to any of the genes gets NA in the ‘out’ column of the resulting data frame. NAs in the relevant columns of the SNP data frame also lead to NA in the ‘out’ column. Rows with NAs in the genes data frame are dropped before SNPs are matched to genes.
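The matching rules above can be sketched in base R. This is not the genFun implementation, only a minimal illustration under stated assumptions: the helper name ‘sketch_find_genes’ is hypothetical, gene intervals are treated as closed (start <= pos <= end), identifiers are joined with a bare comma, and gene rows are dropped only when an NA occurs in one of the relevant columns.

```r
# Hypothetical helper, NOT the genFun implementation: a plain base-R
# illustration of the matching rules described in the Details section.
sketch_find_genes <- function(d, genes, chr1 = "chr", pos = "pos",
                              out = "genes", chr2 = chr1,
                              start = "start", end = "end", id = "id") {
    # Assumption: drop gene rows with NA in any relevant column.
    keep <- stats::complete.cases(genes[, c(chr2, start, end, id)])
    genes <- genes[keep, ]
    d[[out]] <- vapply(seq_len(nrow(d)), function(i) {
        snp_chr <- d[[chr1]][i]
        snp_pos <- d[[pos]][i]
        # A SNP with a missing chromosome or position maps to NA.
        if (is.na(snp_chr) || is.na(snp_pos)) return(NA_character_)
        # Assumption: intervals are closed, i.e. start <= pos <= end.
        hit <- genes[[chr2]] == snp_chr &
            genes[[start]] <= snp_pos & snp_pos <= genes[[end]]
        if (any(hit)) paste(genes[[id]][hit], collapse = ",") else NA_character_
    }, character(1))
    d
}

# Toy data: rs5 has no chromosome, g5 has no start position.
toy_snps <- data.frame(snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
                       chr = c(1, 1, 2, 3, NA), pos = c(1, 5, 3, 4, 2),
                       stringsAsFactors = FALSE)
toy_genes <- data.frame(id = c("g1", "g2", "g3", "g4", "g5"),
                        chr = c(1, 1, 2, 4, 1),
                        start = c(1, 1, 1, 1, NA), end = c(3, 6, 4, 5, 9),
                        stringsAsFactors = FALSE)
toy_res <- sketch_find_genes(toy_snps, toy_genes)
```

The row-by-row loop keeps the logic easy to follow; it is not meant to be efficient for large SNP tables.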

Value

Returns a data frame like ‘d’ but with an additional column, named by ‘out’, that contains for every SNP in ‘d’ a comma-separated list of the identifiers (taken from the ‘id’ column of ‘genes’) of the genes that contain the SNP, or NA if no gene contains it.
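For downstream work it is often handy to have one row per SNP-gene pair instead of a comma-separated string. The sketch below uses only base R on a made-up result in the shape described above; the values and the names ‘res’, ‘gene_list’, and ‘long’ are illustrative, and the split pattern tolerates optional whitespace after the comma because the exact separator is an assumption here.

```r
# Toy output in the shape described above (values are made up).
res <- data.frame(snp = c("rs1", "rs2", "rs4"),
                  genes = c("g1,g2", "g2", NA),
                  stringsAsFactors = FALSE)

# Split each list on commas, tolerating optional whitespace after a comma.
gene_list <- strsplit(res$genes, ",\\s*")
names(gene_list) <- res$snp

# One row per SNP-gene pair; SNPs without a gene keep a single NA row.
long <- data.frame(snp = rep(res$snp, lengths(gene_list)),
                   gene = unlist(gene_list, use.names = FALSE),
                   stringsAsFactors = FALSE)
```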

See Also

match_intervals

Examples

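library(genFun)

# SNPs: one row per SNP with its chromosome and base-pair position.
d <- read.table(textConnection("\
snp chr pos
rs1 1 1
rs2 1 5
rs3 2 3
rs4 3 4
", "r"), header = TRUE, stringsAsFactors = FALSE)

# Genes: identifier, chromosome, and start/end positions.
genes <- read.table(textConnection("\
id chr start end
g1 1 1 3
g2 1 1 6
g3 2 1 4
g4 4 1 5
", "r"), header = TRUE, stringsAsFactors = FALSE)

# Add a "genes" column to d. Note that rs4 lies on chromosome 3,
# where no gene is defined, and g4 is on a chromosome without SNPs.
find_genes(d, genes)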

cbaumbach/genFun documentation built on May 13, 2019, 1:47 p.m.