hrbrmstr/jericho: Break Down the Walls of 'HTML' Tags into Usable Text

Structured 'HTML' content can be useful when you need to parse data tables or other tagged data from within a document. However, it is often just as useful to obtain "just the text" of a document, free from the walls of tags that surround it. This package provides tools that wrap methods in the 'Jericho HTML Parser' Java library by Martin Jericho. Martin's library is used in many at-scale projects, including 'The Internet Archive'.


Package details

Maintainer: Bob Rudis
License: Apache License 2.0 | file LICENSE
Package repository: GitHub (hrbrmstr/jericho)
Installation: Install the latest version of this package by entering the following in R:
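A minimal sketch, assuming the package is installed straight from its GitHub repository (hrbrmstr/jericho) using the 'remotes' package. The use of 'remotes' and the CRAN mirror URL are assumptions, not stated on this page. Because the package wraps a Java library, a working Java runtime is likely needed as well.

```r
# Assumption: the package is installed from GitHub via the 'remotes' package.
# Install 'remotes' from CRAN first if it is not already available.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes", repos = "https://cloud.r-project.org")
}

# Install the development version from the hrbrmstr/jericho repository
remotes::install_github("hrbrmstr/jericho")

# Load the package to confirm the installation worked
library(jericho)
```

If 'devtools' is already installed, `devtools::install_github("hrbrmstr/jericho")` should work equivalently.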
hrbrmstr/jericho documentation built on Sept. 6, 2017, 4:30 p.m.