SNPediaR

About

SNPedia is a curated database containing information about thousands of SNPs. Related diseases, genotypes and references to relevant scientific publications are available through its website. This site is powered by MediaWiki, and the information about each SNP is written in the corresponding wiki page.

The SNPediaR library provides tools to automatically search for and download such pages. It also implements a few functions to scrape relevant information from the downloaded wiki text, and allows users to extend this parsing functionality.

Downloading pages

For a known set of pages, the function getPages downloads the corresponding wiki content using the MediaWiki web API.
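
For orientation, the sketch below builds the kind of MediaWiki API request that retrieves the raw wiki text of pages. It is only an illustration under assumptions: the endpoint `https://bots.snpedia.com/api.php` and the parameters follow the standard MediaWiki `action=query` API and are not necessarily what getPages sends internally; `buildPageQuery` is a hypothetical helper, not part of SNPediaR.

```r
## Hypothetical helper: build a MediaWiki API URL asking for the
## latest revision content of one or more pages.
## ASSUMPTION: endpoint and parameters follow the standard MediaWiki
## action=query API; getPages may build its requests differently.
buildPageQuery <- function (titles,
                            baseURL = "https://bots.snpedia.com/api.php") {
    params <- c (action = "query",
                 prop   = "revisions",
                 rvprop = "content",
                 format = "json",
                 ## several titles travel in one request, separated by "|"
                 titles = paste (titles, collapse = "|"))
    ## percent-encode every value; reserved = TRUE also encodes "(", ";", "|"
    encoded <- vapply (params, utils::URLencode, character (1), reserved = TRUE)
    paste0 (baseURL, "?",
            paste (names (params), encoded, sep = "=", collapse = "&"))
}

buildPageQuery ("Rs53576")
buildPageQuery (c ("Rs53576(A;A)", "Rs53576(A;G)"))
```

Joining several titles into one request is what makes batch downloads cheaper than one request per page.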

For instance, we can download the page Rs53576, which corresponds to the SNP rs53576, as follows:

library (SNPediaR)
pg <- getPages (titles = "Rs53576")
pg

The same function can download several pages at once. For instance, we can download the three genotype pages corresponding to the same SNP, Rs53576(A;A), Rs53576(A;G) and Rs53576(G;G), as follows:

pgs <- getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"))
pgs
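
When many genotypes are involved, the titles can be generated rather than typed. A minimal sketch, assuming genotype pages are always titled `<SNP page>(<allele1>;<allele2>)` as in the three titles above; `genotypeTitles` is a hypothetical helper, not part of SNPediaR.

```r
## Hypothetical helper: build genotype page titles for one SNP.
## ASSUMPTION: titles follow the "Rs53576(A;G)" pattern used above.
genotypeTitles <- function (snp, genotypes) {
    paste0 (snp, "(", genotypes, ")")
}

titles <- genotypeTitles ("Rs53576", c ("A;A", "A;G", "G;G"))
titles
## then, for example: pgs <- getPages (titles = titles)
```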

Extracting relevant information requires parsing the wiki text. The library already implements some utility functions for this purpose, and users can write their own.

For instance, the function extractSnpTags extracts the "tabular" information from SNP pages:

extractSnpTags (pg$Rs53576)
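
To give an idea of what this kind of parsing involves, here is a minimal sketch, not the library's implementation, that pulls `|name=value` fields out of a MediaWiki template. It assumes each field sits on its own line; the snippet is made up for illustration and real SNPedia pages may be laid out differently, which is why extractSnpTags is the safer choice. `parseTemplateFields` is a hypothetical name.

```r
## Hypothetical parser: turn "|name=value" template lines into a named vector.
## ASSUMPTION: one field per line, as in the made-up snippet below.
parseTemplateFields <- function (wikiText) {
    lines  <- unlist (strsplit (wikiText, split = "\n"))
    ## keep only lines that look like "|name=value"
    fields <- grep ("^\\|[^=]+=", lines, value = TRUE)
    ## the name is everything between "|" and the first "="
    keys   <- trimws (sub ("^\\|([^=]+)=.*$", "\\1", fields))
    ## the value is everything after the first "="
    values <- trimws (sub ("^\\|[^=]+=(.*)$", "\\1", fields))
    setNames (values, keys)
}

## made-up text, for illustration only
snippet <- "{{Rsnum\n|rsid=53576\n|Gene=OXTR\n|Chromosome=3\n}}"
parseTemplateFields (snippet)
```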

The function extractGenotypeTags can be used to get the "tabular" information from genotype pages:

sapply (pgs, extractGenotypeTags)
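
When a parser returns a vector of the same length for every page, sapply simplifies the result into a matrix with one column per page and one row per tag. The sketch below reproduces that behaviour offline with a hypothetical stand-in parser, `fakeGenotypeParser`, and made-up inputs; it is not extractGenotypeTags.

```r
## Hypothetical stand-in parser: always returns the same two named fields
fakeGenotypeParser <- function (x) {
    alleles <- strsplit (x, split = ";")[[1]]
    c (allele1 = alleles[1], allele2 = alleles[2])
}

## made-up inputs, named like downloaded genotype pages
genotypes <- list ("Rs53576(A;A)" = "A;A", "Rs53576(A;G)" = "A;G")
tab <- sapply (genotypes, fakeGenotypeParser)
tab      # rows are tags, columns are pages
t (tab)  # transpose to get one row per page
```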

The same parsing can also be done while downloading the pages, by passing the wiki processing function to getPages through its wikiParseFunction argument.

If, for instance, we are only interested in the alleles and the magnitude associated with each genotype, we can do:

getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"),
          wikiParseFunction = extractGenotypeTags,
          tags = c ("allele1", "allele2", "magnitude"))
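
Here `tags` is not a getPages option of its own; presumably it is forwarded to extractGenotypeTags. The sketch below illustrates the usual R pattern for this, in which extra arguments travel through `...`. Both `applyParser` and `pickTags` are hypothetical, and this is an assumption about the mechanism rather than getPages's actual code.

```r
## Hypothetical downloader stand-in: apply a parser to each page's content,
## forwarding any extra arguments (such as 'tags') through '...'
applyParser <- function (pages, wikiParseFunction = identity, ...) {
    lapply (pages, wikiParseFunction, ...)
}

## Hypothetical parser that keeps only the requested fields
pickTags <- function (fields, tags) fields[tags]

pages <- list (p1 = c (allele1 = "A", allele2 = "G", magnitude = "2"))
applyParser (pages, wikiParseFunction = pickTags,
             tags = c ("allele1", "magnitude"))
```

With the default `identity`, the pages come back unchanged, which matches the idea of returning raw wiki text when no parser is given.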

Customized parsing functions

Any wiki processing function can be passed to getPages. If, for instance, a user wants to extract the PubMed IDs cited in pages Rs53576 and Rs1815739, they can first define a parsing function that keeps the lines of wiki text mentioning them:

findPMID <- function (x) {
    ## split the wiki text into individual lines
    x <- unlist (strsplit (x, split = "\n"))
    ## keep only the lines that mention a PubMed ID
    x <- grep ("PMID=", x, value = TRUE)
    x
}

and then call getPages as:

getPages (titles = c ("Rs53576", "Rs1815739"),
          wikiParseFunction = findPMID)
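
findPMID returns whole lines that mention `PMID=`, not the identifiers themselves. If only the numbers are wanted, a variant along these lines could be passed as wikiParseFunction instead. It assumes the IDs are written as `PMID=` followed by digits, as the grep pattern in findPMID implies; `findPMIDNumbers` is a hypothetical name and the sample text is made up.

```r
## Variant of findPMID returning only the numeric PubMed IDs, without duplicates.
## ASSUMPTION: identifiers appear as "PMID=<digits>" in the wiki text.
findPMIDNumbers <- function (x) {
    hits <- regmatches (x, gregexpr ("PMID=[0-9]+", x))
    ids  <- sub ("PMID=", "", unlist (hits), fixed = TRUE)
    unique (ids)
}

## made-up wiki text, for illustration only
txt <- "first ref PMID=111\nsecond ref PMID=222\nrepeated PMID=111"
findPMIDNumbers (txt)
## getPages (titles = c ("Rs53576", "Rs1815739"), wikiParseFunction = findPMIDNumbers)
```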

Categories

Besides SNP and genotype pages, another interesting SNPedia resource is the category pages. They act as indexes of all the other pages that may be queried.

The most commonly used categories are:

The full list of categories can be found on the SNPedia website.

The function getCategoryElements queries all the elements in a given category. It can be used to explore what information is available in SNPedia.
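
In MediaWiki terms, listing a category's elements corresponds to a `list=categorymembers` query, whose results come in batches that are followed with a continuation value. The sketch below only builds such a request URL. The endpoint, the batch size of 500 and the helper name `buildCategoryQuery` are assumptions, and getCategoryElements may do this differently.

```r
## Hypothetical helper: MediaWiki API URL listing the members of a category.
## ASSUMPTION: standard list=categorymembers parameters; 'cmcontinue' is the
## continuation value returned with the previous batch (NULL for the first one).
buildCategoryQuery <- function (category, cmcontinue = NULL,
                                baseURL = "https://bots.snpedia.com/api.php") {
    params <- c (action  = "query",
                 list    = "categorymembers",
                 cmtitle = paste0 ("Category:", category),
                 cmlimit = "500",
                 format  = "json")
    if (!is.null (cmcontinue)) params <- c (params, cmcontinue = cmcontinue)
    ## percent-encode every value (":" becomes %3A)
    encoded <- vapply (params, utils::URLencode, character (1), reserved = TRUE)
    paste0 (baseURL, "?",
            paste (names (params), encoded, sep = "=", collapse = "&"))
}

buildCategoryQuery ("Is_a_medical_condition")
```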

For instance, we can get all medical conditions:

res <- getCategoryElements (category = "Is_a_medical_condition")
head (res)

and find those related to cancer:

grep ('cancer', res, value = TRUE)
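
grep is case-sensitive by default, so the call above would miss a title written as "Cancer". A small sketch that matches several keywords at once while ignoring case; the titles are made up so the example runs offline, and in practice the vector would come from getCategoryElements.

```r
## Made-up titles standing in for getCategoryElements output
exampleTitles <- c ("Breast cancer", "Asthma", "Lung Cancer",
                    "Melanoma", "Prostate cancer")

## combine keywords into one regular expression: "cancer|melanoma"
keywords <- c ("cancer", "melanoma")
pattern  <- paste (keywords, collapse = "|")

grep (pattern, exampleTitles, ignore.case = TRUE, value = TRUE)
```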

Session info

sessionInfo ()

Created: 2015-09-27 | Revised: 2016-06-03

SNPediaR documentation built on Nov. 8, 2020, 5:08 p.m.