sparkwarc: Load WARC Files into Apache Spark
Version 0.1.1

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project.

Author: Javier Luraschi [aut, cre]
Date of publication: 2017-01-13 06:42:24
Maintainer: Javier Luraschi <javier@rstudio.com>
License: Apache License 2.0
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("sparkwarc")

Man pages

cc_warc: Provides WARC paths for commoncrawl.org
spark_read_warc: Reads a WARC File into Apache Spark

Functions

cc_warc
onLoad
spark_dependencies
spark_read_warc

Files

inst
inst/java
inst/java/sparkwarc-1.5-2.10.jar
inst/java/sparkwarc-2.0-2.11.jar
inst/java/sparkwarc-1.6-2.10.jar
inst/samples
inst/samples/sample.warc.gz
inst/samples/sample.warc.paths
inst/samples/sample.wat
inst/samples/sample.wet
inst/samples/sample.wet.gz
inst/samples/sample.wat.gz
inst/samples/sample.warc
NAMESPACE
R
R/commoncrawl.R
R/dependencies.R
R/sparkwarc.R
README.md
MD5
java
java/SparkWARC.scala
DESCRIPTION
man
man/cc_warc.Rd
man/spark_read_warc.Rd
sparkwarc documentation built on May 20, 2017, 3:55 a.m.