extractContentDOM: Extract Main HTML Content from DOM
In tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources


Extracts the main HTML content of a document using its Document Object Model (DOM). The approach rests on the observation that the main content of an HTML document tends to sit in a subnode of the DOM tree with a high text-to-tag ratio. Internally, this function calls assignValues, calcDensity, getMainText and removeTags.
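The text-to-tag ratio idea can be illustrated with a small base R sketch. This is a toy approximation only and is not the package's `calcDensity` implementation: the helper name `text_to_tag_ratio`, the regex-based tag counting, and the sample fragments are all assumptions made for illustration.

```r
# Toy illustration of a text-to-tag ratio (hypothetical helper,
# NOT the package's internal calcDensity).
text_to_tag_ratio <- function(html) {
  # Count tags with a simple regex; adequate for toy input only,
  # a real implementation would walk the parsed DOM tree instead.
  tag_matches <- gregexpr("<[^>]+>", html)[[1]]
  n_tags <- if (tag_matches[1] == -1) 0 else length(tag_matches)

  # Strip the tags and collapse whitespace to measure the visible text.
  text <- gsub("<[^>]+>", "", html)
  text <- trimws(gsub("[[:space:]]+", " ", text))

  # Characters of text per tag; guard against division by zero.
  nchar(text) / max(n_tags, 1)
}

# A navigation fragment: many tags, little text.
nav <- "<ul><li><a href='/'>Home</a></li><li><a href='/about'>About</a></li></ul>"

# An article fragment: few tags, a long run of text.
article <- "<div><p>The main content of a page usually consists of long runs of text with few tags.</p></div>"

text_to_tag_ratio(nav)
text_to_tag_ratio(article)
```

In this sketch the article fragment scores far higher than the navigation fragment, which is the kind of contrast a density threshold can be used to separate.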

extractContentDOM(url, threshold, asText = TRUE, ...)

`url`	character, URL or filename
`threshold`	threshold for extraction, defaults to 0.5
`asText`	logical, specifies whether `url` should be interpreted as the HTML content itself (a character string) rather than as a URL or filename
`...`	Additional Parameters to `htmlTreeParse`