Generic extraction of the main text content from HTML files: ads, sidebars and headers are removed using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
|Author|Mario Annau [aut, cre]|
|Maintainer|Mario Annau <[email protected]>|
|License|Apache License (== 2.0)|
|Package repository|R-Forge|
Install the latest version of this package by entering the following in R:
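A minimal sketch of the install command, assuming the package is named `boilerpipeR` and is served from the standard R-Forge repository URL (neither the name nor the URL is stated on this page):

```r
# Assumption: the package name "boilerpipeR" and the R-Forge repository URL
# are not given on this page; adjust both if they differ.
install.packages("boilerpipeR", repos = "http://R-Forge.R-project.org")
```

Because the package wraps a Java library, a working Java installation (and likely the rJava package) will probably be needed before the install succeeds.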