mannau/boilerpipeR: Interface to the Boilerpipe Java Library
Version 1.3.1

Generic extraction of the main text content from HTML files: removes ads, sidebars, and headers using the boilerpipe (http://code.google.com/p/boilerpipe/) Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
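
As a rough illustration of what such an extraction might look like, the sketch below passes a small HTML string to one of the package's extractors. It assumes the package is installed with a working Java/rJava setup, and that ArticleExtractor() accepts the HTML as a single character string, as recalled from the package's reference manual; verify this against the installed version. The sample page is made up for the example, and no particular output is guaranteed because boilerpipe's heuristics may treat very short pages differently from real ones.

```r
# Sketch only: assumes boilerpipeR and a working Java/rJava setup are installed,
# and that ArticleExtractor() accepts HTML as a single character string.

# Hypothetical page: navigation and footer boilerplate around one article.
article <- paste(
  "Boilerpipe classifies blocks of a page by shallow text features",
  "such as word count and link density, and keeps the blocks that",
  "look like continuous prose rather than navigation or advertising."
)
html <- paste0(
  "<html><head><title>Example page</title></head><body>",
  "<div><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div><p>", article, "</p></div>",
  "<div>Copyright notice</div>",
  "</body></html>"
)

# Run the extractor only when the package is available.
if (requireNamespace("boilerpipeR", quietly = TRUE)) {
  main_text <- boilerpipeR::ArticleExtractor(html)
  cat(main_text, "\n")
} else {
  message("boilerpipeR is not installed; skipping extraction.")
}
```

For real pages, the HTML would typically be read from a file or URL into a single string first, for example with paste(readLines(path), collapse = "\n"), where path is a placeholder for your own file.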


Package details

Author: See AUTHORS file.
Maintainer: Mario Annau <[email protected]>
License: Apache License (== 2.0)
Version: 1.3.1
URL: https://github.com/mannau/boilerpipeR
Package repository: GitHub

Installation

Install the latest version of this package by entering the following in R:
install.packages("devtools")
library(devtools)
install_github("mannau/boilerpipeR")
mannau/boilerpipeR documentation built on May 21, 2017, 5:45 p.m.