boilerpipeR: Interface to the Boilerpipe Java Library

Generic extraction of the main text content from HTML files, removing ads, sidebars, and headers with the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.

Package details
Author	See AUTHORS file.
Maintainer	Mario Annau <mario.annau@gmail.com>
License	Apache License (== 2.0)
Version	1.3.2
URL	https://github.com/mannau/boilerpipeR
Package repository	CRAN
Installation	Install the latest version of this package by entering the following in R: `install.packages("boilerpipeR")`
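
Usage sketch	The following is a minimal, hedged example of the kind of extraction described above, not an excerpt from the package documentation. It assumes the package and a working Java runtime are installed, and that `ArticleExtractor()` accepts the HTML as a single character string; the sample HTML and the variable names `html` and `main_text` are made up for illustration.

```r
# Hypothetical input: one article wrapped in navigation and footer boilerplate.
html <- paste0(
  "<html><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='article'><h1>Example headline</h1><p>",
  paste(rep("This sentence stands in for the main article text.", 20),
        collapse = " "),
  "</p></div>",
  "<div id='footer'>Copyright notice</div>",
  "</body></html>"
)

# Run the extraction only if boilerpipeR (and its Java backend) can be loaded,
# so the script does not fail on machines without the package.
main_text <- NA_character_
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  # Assumption: the article extractor returns the main text as a character value.
  main_text <- boilerpipeR::ArticleExtractor(html)
  cat(main_text, "\n")
}
```

Which parts of the page survive depends on the boilerpipe heuristics, so inspect the result on your own pages before relying on it.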