Structured 'HTML' content can be useful when you need to parse data tables or other tagged data from within a document. However, it is also useful to obtain "just the text" from a document, free from the walls of tags that surround it. Tools are provided that wrap methods in the 'Jericho HTML Parser' Java library by Martin Jericho <http://jericho.htmlparser.net/docs/index.html>. Martin's library is used in many at-scale projects, including 'The Internet Archive'.
Package details

| | |
|---|---|
| Maintainer | Bob Rudis <bob@rud.is> |
| License | Apache License 2.0 \| file LICENSE |
| Version | 0.2.0 |
| URL | https://gitlab.com/hrbrmstr/jericho |
Installation
Install the latest version of this package by entering the following in R:
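
The original install snippet did not survive on this page, so the following is only a minimal sketch. It assumes the 'remotes' package is an acceptable installer and that the GitLab repository listed under URL above is the current source; neither is confirmed here. Because the package wraps a Java library, a working Java runtime is probably needed as well.

```r
# Assumption: 'remotes' is used as the installer. This is a sketch,
# not necessarily the command the maintainer recommends.
install.packages("remotes")

# Install from the GitLab repository given in the package details
# (https://gitlab.com/hrbrmstr/jericho).
remotes::install_gitlab("hrbrmstr/jericho")
```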