In order to deal with this type of data, we have extended the resources available at the r CRANpkg("resourcer") package to VCF files. NOTE: PLINK files can be translated into VCF files using different pipelines. In R you can use r Biocpkg("SeqArray") to get VCF files.

We use the Genomic Data Storage (GDS) format which efficiently manage VCF files into the R environment. This extension requires to create a Client and a Resolver function that are located into the r Githubpkg("isglobal-brge", "dsOmics") package. The client function uses snpgdsVCF2GDS function implemented in r Biocpkg("SNPrelate") to coerce the VCF file to a GDS object. Then the GDS object is loaded into R as an object of class GdsGenotypeReader from r Biocpkg("GWASTools") package that facilitates downstream analyses.

The opal API server allows to incorporate this new type of resource as illustrated in Figure \@ref(fig:resourceVCF).

knitr::include_graphics(tools::file_path_as_absolute("../fig/opal_resource_VCF.png"))

It is important to notice that the URL should contain the tag method=biallelic.only&snpfirstdim=TRUE since these are required parameters of snpgdsVCF2GDS function. This is an example:

https://raw.githubusercontent.com/isglobal-brge/scoreInvHap/master/inst/extdata/example.vcf?method=biallelic.only&snpfirstdim=TRUE

In that case we indicate that only biallelic SNPs are considered ('method=biallelic.only') and that genotypes are stored in the individual-major mode, (i.e., list all SNPs for the first individual, and then list all SNPs for the second individual, etc) ('snpfirstdim=TRUE').



isglobal-brge/dsOmicsClient documentation built on March 20, 2023, 3:52 p.m.