Parser-class: CLASS Parser

Description

Parser

Details

Parses HTML into a Document. It is generally best to use one of the more convenient parse methods in Jsoup.

Fields

parser:

Object of class "jobjRef"

Methods

new(...):

Create a new Parser object. The ... arguments are used to define the appropriate slots.

getErrors():

Retrieve the parse errors, if any, from the last parse.

parse(html, baseUri):

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.

html:

A character string. The HTML to parse.

baseUri:

A character string. The URL where the HTML was retrieved from. Used to resolve relative URLs that occur before the HTML declares a <base href> tag to absolute URLs. If NA is specified, absolute URL detection relies on the HTML including a <base href> tag.
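
The role of baseUri can be illustrated with a minimal sketch. This is not the parser's own code: it uses java.net.URI from the Java standard library to show how a relative link is resolved against a base URL, and how the link stays relative when no base is known. The class name BaseUriSketch, the helper resolve, and the example.com URLs are hypothetical and exist only for this illustration.

```java
import java.net.URI;

public class BaseUriSketch {
    // Hypothetical helper: resolve href against baseUri.
    // When no base is known (null or empty), the href is returned unchanged.
    public static String resolve(String baseUri, String href) {
        if (baseUri == null || baseUri.isEmpty()) {
            return href;
        }
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        // A relative link is resolved against the directory of the base URL.
        System.out.println(resolve(base, "page2.html"));
        // A root-relative link is resolved against the host of the base URL.
        System.out.println(resolve(base, "/img/logo.png"));
        // Without a base URL the link cannot be made absolute.
        System.out.println(resolve(null, "page2.html"));
    }
}
```

With a base of http://example.com/docs/index.html, the link page2.html becomes http://example.com/docs/page2.html. With no base it remains page2.html, which is why a document parsed without baseUri needs its own <base href> tag for absolute URL detection to work.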

htmlParser():

Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.

xmlParser():

Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat the input as HTML. Instead, it creates a simple tree directly from the input.


johndharrison/Rsoup documentation built on May 19, 2019, 4:22 p.m.