knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  eval = FALSE,
  comment = "#>"
)
library(magrittr)
library(readr)
HLSGUtils contains functions that can run the objects of the GenomicsUtils Scala project.
Each function opens a Spark session and runs a Scala class with the input arguments passed from the R function. First, we need to have Apache Spark installed on our machine.
GenomicsUtils is based on Spark 3.1.2 and Hadoop 2.7.
The Apache Spark downloads page helps you choose your version and download the files.
On Linux, download and extract the Spark files with these commands:
wget https://www.apache.org/dyn/closer.lua/spark/spark-3.1.3/spark-3.1.3-bin-hadoop2.7.tgz
tar xvf spark-3.1.3-bin-hadoop2.7.tgz
mv spark-3.1.3-bin-hadoop2.7 /opt/spark
Insert the following line into the ~/.bashrc file to add the location of the Spark binaries to the PATH variable.
export PATH=$PATH:/opt/spark/bin
To activate the new setting, source the ~/.bashrc file:
source ~/.bashrc
It is also necessary to define SPARK_HOME on the system.
export SPARK_HOME="/opt/spark"
echo $SPARK_HOME  # check result
We can also set and check environment variables from R:
Sys.setenv(SPARK_HOME = "/opt/spark")
Sys.getenv("SPARK_HOME")
Run the following command to see if everything is working properly:
spark-shell
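A complementary check can be done from R without starting a Spark shell. The following is a minimal sketch: `spark_home_ok` is a hypothetical helper, not part of HLSGUtils, and it only assumes that a standard Spark distribution ships a `bin/spark-submit` launcher under `SPARK_HOME`.

```r
# Hypothetical helper (not part of HLSGUtils): returns TRUE when SPARK_HOME
# is set and contains the bin/spark-submit launcher of a Spark distribution.
spark_home_ok <- function(spark_home = Sys.getenv("SPARK_HOME")) {
  nzchar(spark_home) &&
    file.exists(file.path(spark_home, "bin", "spark-submit"))
}

# Check the current session; FALSE means SPARK_HOME is unset or incomplete.
print(spark_home_ok())
```

If this prints `FALSE`, setting the variable with `Sys.setenv(SPARK_HOME = "/opt/spark")` before calling any HLSGUtils function is one way to fix the current R session.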
If Spark is working on your system, we can run the Spark-based HLSGUtils functions.
VCF2Delta reads VCF files and converts them to delta format.
library(HLSGUtils)
vcf_file <- system.file("data", "1KG_SNP_chr19.vcf.gz", package = "HLSGUtils")
VCF2Delta(vcf_file, savePath = "~/Desktop/1KG_SNP_chr19.delta")
The sparklyr
package can read delta files, as shown in the code below.
library(sparklyr)
sc <- spark_connect(master = "local", packages = "io.delta:delta-core_2.12:1.0.1")
snp <- spark_read_delta(sc, "~/Desktop/1KG_SNP_chr19.delta")
head(snp)
CSVJoinDelta is another useful function in the GenomicsUtils library.
It is used to join CSV files with datasets that are stored in delta format.
csv_path <- system.file("data", "1KG_p_value.csv", package = "HLSGUtils")
CSVJoinDelta(
  csvPath = csv_path,
  deltaPath = "~/Desktop/1KG_SNP_chr19.delta",
  byX = "ID",
  byY = "names",
  savePath = "~/Desktop/1KG_SNP_chr19_pvalue.csv"
)
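To build intuition for the `byX` and `byY` arguments, the join can be imitated on small in-memory tables with base R's `merge`. This is only an illustration under assumptions that the source does not confirm: that `byX` names the key column of the CSV file, that `byY` names the key column of the delta table, and that the join is an inner join. The toy data frames, their values, and the `position` column are made up for the example.

```r
# Toy stand-in for the CSV file: one p-value per variant ID (made-up values).
p_values <- data.frame(
  ID = c("rs1", "rs2", "rs3"),
  p_value = c(0.01, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# Toy stand-in for the delta table: variants keyed by a "names" column.
snp <- data.frame(
  names = c("rs1", "rs3", "rs4"),
  position = c(100L, 200L, 300L),
  stringsAsFactors = FALSE
)

# Inner join on ID == names (assumed semantics; base R, not Spark).
joined <- merge(p_values, snp, by.x = "ID", by.y = "names")
print(joined)
```

Only the variants present in both tables (`rs1` and `rs3`) survive an inner join. The real `CSVJoinDelta` call runs in Spark and writes its output under `savePath`.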
The final result, written by Spark as a part file inside the savePath directory, is:
read_csv("~/Desktop/1KG_SNP_chr19_pvalue.csv/part-00000-2c471c7c-e42e-4943-8afc-637c18b6254c-c000.csv") %>% head()
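The part file name above contains a generated identifier, so it will differ from run to run. A minimal sketch for locating the part files without hard-coding the name follows; `find_part_csv` is a hypothetical helper, not part of HLSGUtils, and it assumes the output directory holds files named `part-*.csv`, as in the path above.

```r
# Hypothetical helper (not part of HLSGUtils): list the part-*.csv files
# that Spark wrote inside an output directory, in sorted order.
find_part_csv <- function(dir) {
  files <- list.files(
    path.expand(dir),
    pattern = "^part-.*\\.csv$",
    full.names = TRUE
  )
  if (length(files) == 0) {
    stop("No part-*.csv files found in ", dir)
  }
  sort(files)
}

# Usage (assumes the CSVJoinDelta call above has already been run):
# part_files <- find_part_csv("~/Desktop/1KG_SNP_chr19_pvalue.csv")
# readr::read_csv(part_files[1]) %>% head()
```

If Spark wrote several part files, each one holds a slice of the result, so reading only the first file shows only part of the joined table.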