Reads a WARC (Web ARChive) file using Rcpp.
spark_rcpp_read_warc(path, match_warc, match_line)
path | The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.
match_warc | Include only WARC files matching this character string.
match_line | Include only lines matching this character string.
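The two filters can be illustrated with a small base R sketch that runs on an in-memory character vector instead of a file on the cluster. It is not the package's implementation: the helper name `filter_warc_text` is hypothetical, and it assumes that both arguments are plain substring matches, that an empty string disables a filter, and that each record begins at a line starting with "WARC/".

```r
# Hypothetical helper (not part of any package): one plausible reading of
# how match_warc and match_line narrow down WARC content.
filter_warc_text <- function(lines, match_warc = "", match_line = "") {
  # Assumption: a new record starts at each line beginning with "WARC/".
  record_id <- cumsum(startsWith(lines, "WARC/"))
  records <- split(lines, record_id)

  # Keep only records in which some line contains match_warc.
  if (nzchar(match_warc)) {
    keep <- vapply(
      records,
      function(r) any(grepl(match_warc, r, fixed = TRUE)),
      logical(1)
    )
    records <- records[keep]
  }

  # Within the remaining records, keep only lines containing match_line.
  if (nzchar(match_line)) {
    records <- lapply(records, function(r) r[grepl(match_line, r, fixed = TRUE)])
  }

  unname(records)
}

# Two tiny made-up records for illustration.
sample_lines <- c(
  "WARC/1.0",
  "WARC-Type: response",
  "<html><body>hello</body></html>",
  "WARC/1.0",
  "WARC-Type: request",
  "GET / HTTP/1.1"
)

result <- filter_warc_text(
  sample_lines,
  match_warc = "WARC-Type: response",
  match_line = "<html"
)
```

Here only the first record survives the `match_warc` filter, and only its HTML line survives the `match_line` filter. The real function reads from `path` on the cluster and may treat the patterns differently.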