sparkwarc: Load WARC Files into Apache Spark

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.


Package details

Author: Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Javier Luraschi [aut]
Maintainer: Yitao Li <yitao@rstudio.com>
License: Apache License 2.0
Version: 0.1.5
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("sparkwarc")
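Once the package is installed, a session that loads a WARC file into Spark might look like the sketch below. It assumes a local Spark installation is already available (for example through sparklyr::spark_install()) and that the package ships a small sample file at samples/sample.warc.gz. The table name "warc" and the repartition value are arbitrary choices made for illustration. The row count only confirms that the file was read; this sketch makes no assumptions about the columns of the resulting table.

```r
# Minimal sketch: assumes a local Spark installation is available
# (e.g. installed via sparklyr::spark_install()) and that the bundled
# sample exists at samples/sample.warc.gz (assumed location).
library(sparklyr)
library(sparkwarc)

# Connect to a local Spark instance
sc <- spark_connect(master = "local")

# Path to the small WARC sample shipped with the package
sample_path <- system.file("samples/sample.warc.gz", package = "sparkwarc")

# Load the WARC file and register it as a Spark table named "warc";
# repartition = 8 is an illustrative choice, not a required value
spark_read_warc(sc, "warc", sample_path, repartition = 8)

# Reference the registered table and count its rows
warc_tbl <- dplyr::tbl(sc, "warc")
n_records <- sdf_nrow(warc_tbl)
print(n_records)

# Release the Spark connection when done
spark_disconnect(sc)
```

Reading Common Crawl data follows the same pattern, with the local sample path replaced by the location of the Common Crawl WARC files to load.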


sparkwarc documentation built on Jan. 13, 2021, 12:09 p.m.