This is a sparklyr extension that integrates VariantSpark with R. VariantSpark is a framework built on Scala and Spark for analyzing genome datasets; see <https://bioinformatics.csiro.au/>. It has been tested on datasets of 3000 samples, each containing 80 million features, in both unsupervised clustering and supervised applications such as classification and regression. Genome datasets are usually written in VCF (Variant Call Format), a text file format used in bioinformatics to store gene sequence variations. VariantSpark is therefore well suited to genome research: it can read VCF files, run analyses, and return the output as a Spark data frame.
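To make the VCF format mentioned above more concrete, here is a minimal sketch in base R that parses a tiny, made-up VCF fragment into a data frame and turns the genotype calls into numeric features (the count of alternate alleles per sample). It does not use VariantSpark or Spark. The variant records, the sample names `sample1` and `sample2`, and the helper `count_alt` are all hypothetical and exist only for illustration, and the sketch assumes there are no missing genotype calls.

```r
# A tiny, made-up VCF fragment (illustrative only, not real data).
vcf_text <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2",
  "1\t10177\trs001\tA\tC\t.\tPASS\t.\tGT\t0|1\t1|1",
  "2\t20345\trs002\tG\tT\t.\tPASS\t.\tGT\t0|0\t0|1"
)

# Meta-information lines start with "##"; keep only the header line and records.
body <- vcf_text[!grepl("^##", vcf_text)]

variants <- read.delim(
  text = body,
  sep = "\t",
  comment.char = "",         # the header starts with "#", so it must not be read as a comment
  check.names = FALSE,       # keep the column names exactly as written
  colClasses = "character"   # avoid surprises such as the allele "T" being read as TRUE
)
names(variants)[1] <- "CHROM"
variants$POS <- as.integer(variants$POS)

# Hypothetical helper: count alternate alleles in genotype strings such as "0|1" or "1/1".
# Assumes biallelic sites and no missing calls (".").
count_alt <- function(gt) {
  vapply(strsplit(gt, "[|/]"), function(alleles) sum(alleles == "1"), integer(1))
}

# One row per variant, one column per sample.
features <- sapply(variants[c("sample1", "sample2")], count_alt)
print(features)
```

Each row of `features` corresponds to one variant and each column to one sample. A matrix of this general shape, scaled up to millions of variants, is the kind of input that clustering, classification, and regression methods work on.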
|Maintainer||Samuel Macêdo <[email protected]>|
|License||Apache License 2.0 | file LICENSE|
|Package repository||View on GitHub|
Install the latest version of this package by entering the following in R: