get_nexis_html | R Documentation
This function extracts headings, body texts, and metadata (date, byline, length, section, edition) from the items in HTML files downloaded by the scraper.
get_nexis_html(path, paragraph_separator = "\n\n", verbosity, ...)
path | either a path to an HTML file or a directory that contains HTML files |
paragraph_separator | a character string used to separate paragraphs in body texts (default "\n\n") |
verbosity | |
... | only to trap extra arguments |
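The two argument conventions above (`path` as either one file or a directory, and `paragraph_separator` joining paragraphs into a body text) can be sketched in base R. `resolve_html_paths()` and `join_paragraphs()` are hypothetical helpers written for illustration; they are not part of readtext, and their choices (non-recursive search, case-insensitive matching of `.html`/`.htm`, dropping blank paragraphs) are assumptions rather than documented behaviour of `get_nexis_html()`.

```r
# Hypothetical helpers illustrating the documented arguments; NOT readtext code.

# Resolve `path`: a directory yields its HTML files, a file is returned as-is.
resolve_html_paths <- function(path) {
  if (dir.exists(path)) {
    # Assumption: non-recursive, matches .html/.htm in any letter case
    list.files(path, pattern = "\\.html?$", full.names = TRUE,
               ignore.case = TRUE)
  } else if (file.exists(path)) {
    path
  } else {
    stop("No such file or directory: ", path)
  }
}

# Join extracted paragraphs into one body text with `paragraph_separator`.
join_paragraphs <- function(paragraphs, paragraph_separator = "\n\n") {
  paragraphs <- trimws(paragraphs)
  # Assumption: paragraphs that are empty after trimming are dropped
  paste(paragraphs[nzchar(paragraphs)], collapse = paragraph_separator)
}
```

For example, `join_paragraphs(c("First paragraph.", "  ", "Second paragraph."))` returns `"First paragraph.\n\nSecond paragraph."`, while passing `paragraph_separator = " "` would yield a single-line body instead.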
## Not run:
# get_nexis_html() is not exported, so it is called with ::: on single files
irt <- readtext:::get_nexis_html('tests/data/nexis/irish-times_1995-06-12_0001.html')
afp <- readtext:::get_nexis_html('tests/data/nexis/afp_2013-03-12_0501.html')
gur <- readtext:::get_nexis_html('tests/data/nexis/guardian_1986-01-01_0001.html')
sun <- readtext:::get_nexis_html('tests/data/nexis/sun_2000-11-01_0001.html')
# language_date is not a formal argument of get_nexis_html(); it is trapped by ...
spg <- readtext:::get_nexis_html('tests/data/nexis/spiegel_2012-02-01_0001.html',
                                 language_date = 'german')
# Read a whole directory of Nexis files through readtext()
all <- readtext('tests/data/nexis', source = 'nexis')
## End(Not run)