boilerpipeR: Interface to the boilerpipe Java library by Christian Kohlschütter (http://code.google.com/p/boilerpipe/)
Version 1.2

Generic extraction of the main text content from HTML files, removing ads, sidebars and headers, using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of web site templates.
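A minimal usage sketch, assuming boilerpipeR is installed together with rJava and a working Java runtime. Extractor functions such as `ArticleExtractor()` and `DefaultExtractor()` take the raw HTML of a page as a single character string and return the extracted text as a character string. The inline HTML page below is a made-up example; in practice the HTML would first be fetched from a URL (for instance with `paste(readLines(url), collapse = "\n")`).

```r
library(boilerpipeR)

# Hypothetical HTML page: a navigation bar, an article body and a footer.
html <- paste0(
  "<html><head><title>Example</title></head><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='article'><h1>Example headline</h1>",
  "<p>This paragraph is the main content of the page. It contains several ",
  "full sentences so that the text heuristics can treat it as article ",
  "text rather than boilerplate.</p>",
  "<p>A second paragraph adds more running text to the article body, which ",
  "the extractor would normally keep as well.</p></div>",
  "<div id='footer'>Copyright 2014 | Contact | Privacy</div>",
  "</body></html>"
)

# ArticleExtractor targets news-style article pages;
# DefaultExtractor is a more generic alternative.
article_text <- ArticleExtractor(html)
default_text <- DefaultExtractor(html)

# Print the extracted main text (navigation and footer should be dropped).
cat(article_text, "\n")
```

Which extractor works best depends on the page type; comparing the output of two extractors on a few sample pages is a cheap way to choose one.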

Package details

Author: Mario Annau [aut, cre]
Date of publication: 2014-05-29 13:46:03
Maintainer: Mario Annau <mario.annau@gmail.com>
License: Apache License (== 2.0)
URL: https://github.com/mannau/boilerpipeR
Package repository: R-Forge
Installation: Install the latest version of this package by entering the following in R:
install.packages("boilerpipeR", repos="http://R-Forge.R-project.org")

boilerpipeR documentation built on May 31, 2017, 3:53 a.m.