read_ldnws: Read the Livedoor News Corpus

View source: R/ldnws-reader.R

read_ldnws    R Documentation

Read the Livedoor News Corpus

Description

Downloads and reads the Livedoor News Corpus. The function is memoised internally with memoise::memoise(), so repeated calls with the same arguments reuse the cached result.

Usage

read_ldnws(
  url = "https://www.rondhuit.com/download/ldcc-20140209.tar.gz",
  exdir = tempdir(),
  keep = ldnws_categories(),
  collapse = "\n\n",
  include_title = TRUE
)

Arguments

url

String. URL of the corpus archive to download. If set to NULL, the function skips downloading the file.

exdir

String. Directory into which the text files are temporarily untarred.

keep

Categories to parse and keep in the tibble.

collapse

String with which base::paste() collapses lines.

include_title

Logical. Whether to include the title in the text body field. Defaults to TRUE.
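
As a rough illustration of what the collapse argument controls, the base R sketch below joins a few made-up lines with the default separator "\n\n". The lines vector is invented for this example and is not taken from the corpus; how read_ldnws() reads lines internally is not shown here.

```r
# Hypothetical article lines, invented for illustration only.
lines <- c("First paragraph.", "Second paragraph.", "Third paragraph.")

# Join the lines with base::paste(), using the documented default
# separator "\n\n" as the collapse string.
body <- paste(lines, collapse = "\n\n")

# Print the joined text; each original line is separated by a blank line.
cat(body, "\n")
```

Passing a different string, such as collapse = " ", would put the same lines on a single row of text instead.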

Details

This function downloads the Livedoor News Corpus and parses it into a tibble. For details about the Livedoor News Corpus, please see this page.

Value

A tibble.
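
A minimal usage sketch follows, built only from the signature shown under Usage. It assumes the package is installed under the name ldccr (inferred from the repository name, not confirmed here) and that network access is available. The call is guarded so that it only runs in an interactive session. The columns of the returned tibble are not documented on this page, so the sketch only prints the result.

```r
# Sketch only: assumes the package name is "ldccr" and that the default
# url is reachable. Skipped when the package is missing or the session
# is not interactive, because the call downloads an archive.
if (requireNamespace("ldccr", quietly = TRUE) && interactive()) {
  corpus <- ldccr::read_ldnws(
    exdir = tempdir(),      # extract the archive into a temporary directory
    include_title = TRUE    # keep the title in the text body field
  )

  # The result is documented as a tibble; print it to inspect it.
  print(corpus)
}
```

Because the function is memoised, calling it again with the same arguments in the same session should return the cached result rather than repeat the work.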

See Also

Other ldnws-reader: ldnws_categories()


paithiov909/ldccr documentation built on Feb. 3, 2025, 12:16 a.m.