knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
This vignette describes how to use the variant endpoints, which store information about the SNPs used in the SMARTER genotype datasets.
library(smarterapi)
# required by this vignette
library(pander)
One of the aims of this project is to manage genotypes across different assembly versions. This means collecting data from different assemblies (depending on when the data were generated), from different sources (Affymetrix, Illumina, WGS) and in different file formats. Genotypes are normalized to be consistent across data sources and are stored in one genotype file per species.
Currently, four assembly versions are managed: two for the sheep dataset and two for the goat dataset. Information about the assemblies and their data sources can be retrieved from the backend info endpoint through the `get_smarter_info()` function:
info <- get_smarter_info()
# transpose the list of working assemblies to get one row per assembly
assemblies <- as.data.frame(t(as.data.frame(info$working_assemblies)))
names(assemblies) <- c("name", "source")
pander::pander(assemblies)
`get_smarter_variants()` has two mandatory parameters, `species` and `assembly`, and accepts additional query parameters (see one of the variant endpoints for more information). For example, you can search variants by SNP name or by rs ID (if the latter exists):
snp <- get_smarter_variants(
  species = "Goat",
  assembly = "ARS1",
  query = list(name = "snp12965-scaffold1499-3295573")
)
pander::pander(subset(snp, select = -c(`_id.$oid`, `sequence.IlluminaGoatSNP50`)))
Please refer to the `working_assemblies` element returned by `get_smarter_info()` to see which assemblies are supported by the SMARTER-database.
Data that come from SNPchiMp v.3, like the Sheep OAR3 assembly, support the Illumina forward attribute. For example, the following SNP:
snp <- get_smarter_variants(
  species = "Sheep",
  assembly = "OAR3",
  query = list(rs_id = "rs10721092")
)
pander::pander(subset(snp, select = -c(`_id.$oid`, `sequence.IlluminaOvineHDSNP`)))
is T/C on the forward strand of OAR3: this means that the reversed probe is aligned to the genome (as you can infer from the bottom Illumina strand attribute of this SNP). Variants in the SMARTER-database are converted using the Illumina TOP coding convention, so you will find this SNP as A/G in the SMARTER-database, while on the reference sequence it is T/C.
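For this SNP, going from the forward alleles (T/C) to the SMARTER TOP alleles (A/G) amounts to taking the complement of each base. A minimal base-R sketch of that step, assuming a hypothetical helper `complement_alleles()` (not part of smarterapi) that works on allele strings in the `X/Y` form used above. It only reproduces this specific case, where the forward and TOP alleles lie on opposite strands; it is not a general forward-to-TOP converter.

```r
# Hypothetical helper, not part of smarterapi: complement each base in an
# allele string such as "T/C" (A<->T, C<->G); the "/" separator is kept as-is.
complement_alleles <- function(alleles) {
  chartr("ACGT", "TGCA", toupper(alleles))
}

# forward alleles of rs10721092 on OAR3 -> alleles stored in SMARTER
complement_alleles("T/C")
#> [1] "A/G"
```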
The variants endpoint supports querying by region, using the `<chromosome>:<start>-<end>` format, for example:
variants <- get_smarter_variants(
  species = "Goat",
  assembly = "ARS1",
  query = list(region = "1:1-100000")
)
pander::pander(subset(variants, select = -c(`_id.$oid`, `sequence.IlluminaGoatSNP50`)))
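When region boundaries are computed in R, building the string with `paste0()` can silently produce scientific notation (`paste0(100000)` gives `"1e+05"`), which does not match the `<chromosome>:<start>-<end>` format. A minimal sketch of a hypothetical helper, `make_region()` (not part of smarterapi), that formats coordinates as plain integers:

```r
# Hypothetical helper, not part of smarterapi: build a region string in the
# "<chromosome>:<start>-<end>" format used by the variants endpoint.
make_region <- function(chromosome, start, end) {
  stopifnot(start >= 1, end >= start)
  # %d with integers avoids scientific notation such as "1e+05"
  sprintf("%s:%d-%d", chromosome, as.integer(start), as.integer(end))
}

make_region(1, 1, 100000)
#> [1] "1:1-100000"

# e.g. query = list(region = make_region(1, 1, 100000))
```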
You can download all the variants for a given chip, but be aware that this requires a lot of time and memory, since the SMARTER-database stores more than 600K SNPs. First, collect the available chips from the SMARTER-database, for example for the Sheep species:
sheep_chips <- get_smarter_supportedchips(query = list(species = "Sheep"))
pander::pander(subset(sheep_chips, select = -c(`_id.$oid`)))
Then collect all the SNPs for a given chip by providing its SMARTER chip name. Note that this chip has more than 50K SNPs, so the download will take a long time:
variants <- get_smarter_variants(
  species = "Sheep",
  assembly = "OAR3",
  query = list(chip_name = "IlluminaOvineSNP50")
)
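Since a whole-chip download is slow, you may want to save the result locally and reuse it in later sessions. A minimal sketch using base R's `saveRDS()`/`readRDS()`, assuming a hypothetical helper `cached_download()` (not part of smarterapi) and a file name of your choice. The cache is never refreshed: delete the file to download the variants again.

```r
# Hypothetical helper, not part of smarterapi: call `fetch()` only if `path`
# does not exist yet; otherwise reload the previously saved result.
cached_download <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# usage (requires network access on the first call):
# variants <- cached_download(
#   "IlluminaOvineSNP50_OAR3.rds",
#   function() get_smarter_variants(
#     species = "Sheep",
#     assembly = "OAR3",
#     query = list(chip_name = "IlluminaOvineSNP50")
#   )
# )
```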