R/jericho-package.R

#' Break Down the Walls of 'HTML' Tags into Usable Text
#'
#' Structured 'HTML' content can be useful when you need to parse data tables or
#' other tagged data from within a document. However, it is also useful to obtain
#' "just the text" from a document free from the walls of tags that surround it.
#' Tools are provided that wrap methods in the 'Jericho HTML Parser' Java library
#' by Martin Jericho <http://jericho.htmlparser.net/docs/index.html>. Martin's
#' library is used in many at-scale projects, including the 'Internet Archive'.
#'
#' @md
#' @name jericho
#' @docType package
#' @author Bob Rudis (bob@@rud.is)
#' @import rJava jerichojars
NULL
hrbrmstr/jericho documentation built on May 14, 2019, 9:35 a.m.