htmltab: Assemble Data Frames from HTML Tables

HTML tables are a valuable data source, but extracting and recasting their data into a useful format can be tedious. This package collects structured information from HTML tables. It is similar to readHTMLTable() from the XML package but offers three major advantages. First, the function automatically expands row and column spans in header and body cells. Second, users have more control over which rows are identified as header and body rows and end up in the R table. This includes semantic header information that appears throughout the body. Third, the function preprocesses the table code, corrects common types of malformation and removes unneeded parts, which reduces the need for tedious post-processing.
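The span expansion described above can be illustrated outside R. The following is a minimal Python sketch that uses only the standard library's html.parser. It is not htmltab's implementation, which is written in R. The names TableCellParser, expand_spans and SAMPLE_HTML are hypothetical. The sketch assumes a single, well-formed table without nested tables, and it copies a spanning cell's text into every grid position that the cell covers.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect every <tr> as a list of (text, rowspan, colspan).

    Assumes one well-formed table with no nested tables and no cells outside <tr>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell tuples per <tr>
        self._cell = None  # [text_parts, rowspan, colspan] while inside <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace in the cell text.
            text = " ".join("".join(self._cell[0]).split())
            self.rows[-1].append((text, self._cell[1], self._cell[2]))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)


def expand_spans(rows):
    """Place cells on a rectangular grid, repeating each cell's text over its span."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions never covered by any cell become empty strings.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


SAMPLE_HTML = """
<table>
  <tr><th rowspan="2">Country</th><th colspan="2">Population</th></tr>
  <tr><th>2000</th><th>2010</th></tr>
  <tr><td>A</td><td>1</td><td>2</td></tr>
</table>
"""

if __name__ == "__main__":
    parser = TableCellParser()
    parser.feed(SAMPLE_HTML)
    for grid_row in expand_spans(parser.rows):
        print(grid_row)
```

In this example, the two-row header expands so that "Country" fills both header rows of the first column and "Population" sits above both year columns. Each body row then lines up with a full set of header positions, which is the rectangular layout a data frame needs.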

Author: Christian Rubba [aut, cre]
Date of publication: 2016-12-29 01:06:12
Maintainer: Christian Rubba <christian.rubba@gmail.com>
License: MIT + file LICENSE
Version: 0.7.1
https://github.com/crubba/htmltab


Files

htmltab
htmltab/inst
htmltab/inst/doc
htmltab/inst/doc/htmltab.html
htmltab/inst/doc/htmltab.Rmd
htmltab/inst/doc/htmltab.R
htmltab/tests
htmltab/tests/testthat.R
htmltab/tests/testthat
htmltab/tests/testthat/test_multi-dim-header.R
htmltab/tests/testthat/test_find_header.R
htmltab/tests/testthat/test_inputs.R
htmltab/tests/testthat/test_expand_spans.R
htmltab/NAMESPACE
htmltab/NEWS
htmltab/R
htmltab/R/utils.R
htmltab/R/header.R
htmltab/R/identify_rows.R
htmltab/R/setup_and_checks.R
htmltab/R/body.R
htmltab/R/colnames.R
htmltab/R/inbody_header.R
htmltab/R/htmltab.R
htmltab/R/zzz.R
htmltab/vignettes
htmltab/vignettes/htmltab.Rmd
htmltab/MD5
htmltab/build
htmltab/build/vignette.rds
htmltab/DESCRIPTION
htmltab/man
htmltab/man/eval_body.Rd
htmltab/man/rm_empty_rows.Rd
htmltab/man/check_type.Rd
htmltab/man/normalize_tr.Rd
htmltab/man/num_xpath.Rd
htmltab/man/eval_header.Rd
htmltab/man/get_header_elements.Rd
htmltab/man/get_trindex.Rd
htmltab/man/select_tab.Rd
htmltab/man/create_inbody.Rd
htmltab/man/rm_empty_cols.Rd
htmltab/man/get_cell_element.Rd
htmltab/man/rm_nuisance.Rd
htmltab/man/identify_elements.Rd
htmltab/man/get_body_xpath.Rd
htmltab/man/get_span.Rd
htmltab/man/htmltab.Rd
htmltab/man/get_head_xpath.Rd
htmltab/LICENSE
