subsetHTML: Function subsetHTML

subsetHTML    R Documentation

Function subsetHTML

Description

Extracts a coherent subset of code from HTML-code.

Usage

subsetHTML(
  html,
  tag = "div",
  pattern = NULL,
  edit = F,
  save = F,
  plot = F,
  filename = NULL,
  trim = T
)

Arguments

html

A character element containing HTML-code.

tag

Character element specifying the subsets of interest. Defaults to "div".

pattern

Regular expression further specifying the subsets of interest. If NULL (default), tag is used as the pattern.

edit

Logical value specifying whether the data.frame should be plotted/edited.

save

Logical value specifying whether the HTML-code should be saved to a csv-file.

plot

Logical value specifying whether to plot the frequency of each HTML-tag found in the html-object.

filename

Character value specifying the filename (if save is TRUE). If NULL (default), as.numeric(Sys.time()) is used.

trim

Logical value specifying whether to trim text. Defaults to T.
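
As a rough illustration of what the plot argument refers to, the frequency of HTML tags in a character string can be tallied with base R. This is only a sketch under assumptions: count_tags is a hypothetical helper, not part of quantqual, and the package may count or display tags differently.

```r
# Hypothetical helper (not part of quantqual): count opening tags in an HTML string.
count_tags <- function(html) {
  # Match opening tags such as <div ...> or <p>; closing tags (</div>) are skipped
  # because "/" is not a letter.
  m <- regmatches(html, gregexpr("<[A-Za-z][A-Za-z0-9]*", html))[[1]]
  tags <- tolower(sub("^<", "", m))
  # Tabulate tag names and put the most frequent first.
  sort(table(tags), decreasing = TRUE)
}

html <- "<div><p>a</p><p>b</p><span>c</span></div>"
freq <- count_tags(html)
print(freq)
# barplot(freq) would draw these frequencies as a bar chart.
```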

Details

Extracts a coherent subset of code from HTML-code (as returned by quantqual::getHTML, for example).
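
One way to read "coherent subset" is an element returned together with everything up to its matching closing tag, so that nested elements of the same type stay balanced. The following base R sketch illustrates that idea only; extract_element is a hypothetical helper, and whether subsetHTML works this way internally is an assumption, not something this page states.

```r
# Hypothetical helper (not part of quantqual): return the first element whose
# opening tag matches `pattern`, up to and including its matching closing tag.
extract_element <- function(html, tag = "div", pattern = tag) {
  # Locate every opening (<tag ...>) and closing (</tag>) occurrence of `tag`.
  tag_re <- paste0("</?", tag, "\\b[^>]*>")
  hits <- gregexpr(tag_re, html, perl = TRUE)[[1]]
  if (hits[1] == -1) return(NA_character_)
  starts <- as.integer(hits)
  ends <- starts + attr(hits, "match.length") - 1L
  tokens <- substring(html, starts, ends)
  closing <- startsWith(tokens, "</")

  # First opening tag that also matches the user-supplied pattern.
  first <- which(!closing & grepl(pattern, tokens, perl = TRUE))[1]
  if (is.na(first)) return(NA_character_)

  # Walk forward, tracking nesting depth until the element closes.
  depth <- 0L
  for (i in seq(first, length(tokens))) {
    if (closing[i]) depth <- depth - 1L else depth <- depth + 1L
    if (depth == 0L) return(substring(html, starts[first], ends[i]))
  }
  NA_character_  # unbalanced markup: no matching close found
}

html <- '<div id="a"><div class="target"><div>inner</div>text</div><div>other</div></div>'
result <- extract_element(html, tag = "div", pattern = 'class="target"')
print(result)
# [1] "<div class=\"target\"><div>inner</div>text</div>"
```

Counting depth rather than stopping at the first closing tag is what keeps the nested inner div inside the result. The sketch does not handle self-closing tags or tags that appear inside comments or scripts.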

Examples

subsetHTML(getHTML("https://jobs.meinestadt.de/nuernberg/suche?words=Wissenschaftlicher%20Mitarbeiter"),
           tag="div", pattern="class=\"m-resultListEntries__content\"")

AndreasFischer1985/quantqual documentation built on June 20, 2022, 4:55 p.m.