Jsoup-class: CLASS Jsoup

Description

The core public access point to the jsoup functionality.

Fields

jsoup:

Object of class "jclassName"

Methods

new(...):

Create a new Jsoup object. ... is used to define the appropriate slots.

connect(url):

Creates a new Connection to a URL. Use it to fetch and parse an HTML page. You can add data, cookies, and headers to the connection; set the user agent, referrer, and method; and then execute the request.

url:

URL to connect to. The protocol must be http or https.

parse(html, baseUri):

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.

html:

A character string. The HTML to parse.

baseUri:

A character string. The URL the HTML was retrieved from. It is used to resolve relative URLs into absolute URLs when they occur before the HTML declares a <base href> tag. If NA is specified, absolute URL detection relies on the HTML including a <base href> tag.
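The role of baseUri can be illustrated outside of R. The following is a minimal Python sketch using only the standard library; it is not the Rsoup or jsoup API, and the helper name resolve_href and the example URLs are hypothetical. It shows how a relative link becomes absolute once a base URI is known, and stays relative when none is supplied.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_href(href: str, base_uri: Optional[str]) -> str:
    """Resolve href against base_uri; return href unchanged if no base is known."""
    if not base_uri:
        # Mirrors the NA case: nothing to resolve against.
        return href
    return urljoin(base_uri, href)


base = "http://example.com/docs/index.html"  # hypothetical page URL
print(resolve_href("images/logo.png", base))  # http://example.com/docs/images/logo.png
print(resolve_href("/about", base))           # http://example.com/about
print(resolve_href("images/logo.png", None))  # images/logo.png
```

An href that is already absolute passes through urljoin unchanged, so the same helper can be applied to every link in a page.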

clean(bodyHtml, whitelist, baseUri = NA):

Get safe HTML from untrusted input HTML by parsing the input and filtering it through a whitelist of permitted tags and attributes.

bodyHtml:

A character string. Untrusted input HTML (body fragment).

whitelist:

A Whitelist. The default is type = "none". See the Whitelist documentation for details.

baseUri:

A character string. The URL the HTML was retrieved from. It is used to resolve relative URLs into absolute URLs when they occur before the HTML declares a <base href> tag. If NA is specified, absolute URL detection relies on the HTML including a <base href> tag.
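The idea of filtering HTML through a whitelist can be sketched in a few lines. The Python below uses only the standard library and is not how Rsoup or jsoup implement clean(); the ALLOWED mapping and the function name clean are assumptions made for illustration. Tags outside the whitelist are dropped while their text is kept, attributes outside the whitelist are removed, and the contents of script and style elements are discarded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of permitted attribute names.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}


class _Cleaner(HTMLParser):
    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        if tag not in self.allowed:
            return  # drop the tag itself
        kept = [(k, v) for k, v in attrs if k in self.allowed[tag]]
        attr_text = "".join(
            ' {}="{}"'.format(k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<{}{}>".format(tag, attr_text))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        if tag in self.allowed:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def clean(body_html, allowed=ALLOWED):
    """Return body_html with only whitelisted tags and attributes kept."""
    parser = _Cleaner(allowed)
    parser.feed(body_html)
    parser.close()
    return "".join(parser.out)


print(clean('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <b>there</b></p>
```

This sketch does not balance tags or inspect URL schemes in href values, so it should be read as an illustration of the filtering step rather than a sanitizer suitable for untrusted input.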

isValid(bodyHtml, whitelist):

Test if the input HTML has only tags and attributes allowed by the Whitelist. Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.

bodyHtml:

A character string. Untrusted input HTML (body fragment).

whitelist:

A Whitelist. The default is type = "none". See the Whitelist documentation for details.
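A validity check of this kind amounts to asking whether any tag or attribute in the input falls outside the whitelist. The following standard-library Python sketch shows that test; it is not the Rsoup or jsoup implementation, and the name is_valid and the dictionary form of the whitelist are hypothetical. An empty whitelist plays the role of type = "none", where only plain text passes.

```python
from html.parser import HTMLParser


class _Checker(HTMLParser):
    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed  # tag name -> set of permitted attribute names
        self.ok = True

    def handle_starttag(self, tag, attrs):
        permitted = self.allowed.get(tag)
        # Fail on a tag that is not listed, or on any attribute not listed for it.
        if permitted is None or any(name not in permitted for name, _ in attrs):
            self.ok = False


def is_valid(body_html, allowed):
    """Return True if body_html uses only whitelisted tags and attributes."""
    checker = _Checker(allowed)
    checker.feed(body_html)
    checker.close()
    return checker.ok


basic = {"b": set(), "a": {"href"}}           # hypothetical whitelist
print(is_valid("<b>bold</b>", basic))          # True
print(is_valid('<b onclick="x()">bold</b>', basic))  # False
print(is_valid("<b>bold</b>", {}))             # False
print(is_valid("plain text", {}))              # True
```

A check like this only reports a yes or no answer, which is why input that passes should still go through the cleaner before it is stored or displayed.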


johndharrison/Rsoup documentation built on May 19, 2019, 4:22 p.m.