spark_rcpp_read_warc: Reads a WARC File using Rcpp

View source: R/sparkwarc.R

Description

Reads a WARC (Web ARChive) file using Rcpp.

Usage

spark_rcpp_read_warc(path, match_warc, match_line)

Arguments

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

match_warc

Include only WARC files matching this character string.

match_line

Include only lines matching this character string.
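
Example

The two filters can be pictured with a small base R sketch. This is not the package's implementation, and it does not call sparkwarc at all. The helper name `filter_warc_sketch` and the sample lines are hypothetical. The sketch assumes that a record begins at a line starting with "WARC/", that `match_warc` keeps whole records containing the string, that `match_line` then keeps only the lines containing its string, and that both are plain substring matches. The real function may differ on any of these points.

```r
# Hypothetical helper, NOT part of sparkwarc: imitates the documented
# match_warc / match_line filters on an in-memory character vector.
filter_warc_sketch <- function(lines, match_warc = "", match_line = "") {
  # Assumption: every record starts at a line beginning with "WARC/".
  record_id <- cumsum(grepl("^WARC/", lines))
  records <- split(lines, record_id)

  # Keep only records in which at least one line contains match_warc.
  # An empty match_warc keeps every record.
  keep_record <- vapply(records, function(rec) {
    !nzchar(match_warc) || any(grepl(match_warc, rec, fixed = TRUE))
  }, logical(1))
  records <- records[keep_record]

  # Within the kept records, keep only lines containing match_line.
  # An empty match_line keeps every line.
  if (nzchar(match_line)) {
    records <- lapply(records, function(rec) {
      rec[grepl(match_line, rec, fixed = TRUE)]
    })
  }
  unname(records)
}

# Made-up sample data: two minimal WARC-like records.
sample_lines <- c(
  "WARC/1.0",
  "WARC-Type: response",
  "WARC-Target-URI: http://example.com/",
  "",
  "<html><title>Example</title></html>",
  "WARC/1.0",
  "WARC-Type: request",
  "WARC-Target-URI: http://example.org/",
  "",
  "GET / HTTP/1.1"
)

result <- filter_warc_sketch(
  sample_lines,
  match_warc = "WARC-Type: response",
  match_line = "<title>"
)
print(result)
```

Under these assumptions, only the response record survives the first filter, and only its line containing "<title>" survives the second.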


sparkwarc documentation built on Jan. 11, 2022, 9:06 a.m.