javierluraschi/sparkwarc: Load WARC Files into Apache Spark

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.

Package details

Maintainer: Javier Luraschi <[email protected]>
License: Apache License 2.0
Version: 0.1.4
Package repository: GitHub (javierluraschi/sparkwarc)
Installation

Install the latest version of this package by entering the following in R:
install.packages("devtools")
library(devtools)
install_github("javierluraschi/sparkwarc")
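
Once the package is installed, loading a WARC file might look roughly like the sketch below. It assumes a local Spark installation reachable through 'sparklyr' and uses the package's spark_read_warc() function. The helper name load_warc_table, the default table name "warc", and the file path in the usage note are hypothetical, chosen only for illustration.

```r
# Sketch only: assumes the sparklyr and sparkwarc packages are installed
# and that a local Spark installation is available.
load_warc_table <- function(warc_path, table_name = "warc") {
  # Connect to a local Spark instance (the "local" master is an assumption).
  sc <- sparklyr::spark_connect(master = "local")

  # Load the WARC file into Spark under the given table name.
  result <- sparkwarc::spark_read_warc(sc, name = table_name, path = warc_path)

  # Return the connection too, so the caller can query and later disconnect.
  list(connection = sc, result = result)
}
```

A call such as load_warc_table("path/to/file.warc.gz") would then attempt the load; when finished, the connection can be closed with sparklyr::spark_disconnect().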
javierluraschi/sparkwarc documentation built on Oct. 24, 2017, 2:48 a.m.