variantspark: A 'Sparklyr' Extension for 'VariantSpark'

This is a 'sparklyr' extension integrating 'VariantSpark' and R. 'VariantSpark' is a framework based on 'scala' and 'spark' to analyze genome datasets, see <https://bioinformatics.csiro.au/>. It was tested on datasets with 3000 samples, each containing 80 million features, in both unsupervised clustering approaches and supervised applications such as classification and regression. Genome datasets are usually written in VCF (Variant Call Format), a text file format used in bioinformatics for storing gene sequence variations. 'VariantSpark' is therefore well suited to genome research: it can read VCF files, run analyses and return the output in a 'spark' data frame.
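For readers unfamiliar with VCF, the following base R sketch shows roughly what such a file contains and how its genotype columns can be turned into numeric features. It does not use 'variantspark' or 'spark' at all; the four-line VCF fragment, the sample names and the `alt_count` helper are all made up for illustration.

```r
# A tiny, made-up VCF fragment: "##" lines are metadata, the "#CHROM" line
# names the columns, and each remaining line describes one variant.
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2",
  "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
  "2\t2000\trs2\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1"
)

# Drop the "##" metadata lines, then strip the leading "#" from the header.
body <- vcf_lines[!startsWith(vcf_lines, "##")]
body[1] <- sub("^#", "", body[1])

# Read every column as character so values such as "T" or "." stay as text.
variants <- read.delim(text = paste(body, collapse = "\n"),
                       colClasses = "character", check.names = FALSE)

# Hypothetical helper: count non-reference alleles in a genotype string,
# e.g. "0/1" -> 1 and "1/1" -> 2; missing alleles (".") are not counted.
alt_count <- function(gt) {
  vapply(strsplit(gt, "[/|]"),
         function(a) sum(a != "0" & a != "."),
         integer(1))
}

variants$sample1_alt <- alt_count(variants$sample1)
variants$sample2_alt <- alt_count(variants$sample2)

print(variants[, c("CHROM", "POS", "REF", "ALT", "sample1_alt", "sample2_alt")])
```

Each variant row becomes one feature per sample, which is why real genome datasets reach millions of features and call for a distributed engine such as 'spark'.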

Package details

Author: Samuel Macêdo [aut, cre], Javier Luraschi [aut]
Maintainer: Samuel Macêdo <samuelmacedo@recife.ifpe.edu.br>
License: Apache License 2.0 | file LICENSE
Version: 0.1.1
Package repository: CRAN

Installation

Install the latest version of this package by entering the following in R:

install.packages("variantspark")


variantspark documentation built on June 13, 2019, 5:03 p.m.