pc_extract_from_web: Facilitates extracting strings from web pages

View source: R/pc_extract_from_web.R

Description

Facilitates extracting strings from web pages

Usage

pc_extract_from_web(
  url,
  container = NULL,
  container_class = NULL,
  container_id = NULL,
  container_instance = NULL,
  subelement = NULL,
  no_children = NULL
)

Arguments

url

A character vector of length one. URL or path to a local HTML file.

container

Defaults to NULL. If provided, it must be the name of an HTML element such as "div", "span", etc.

container_class

Defaults to NULL. If provided, 'container' must also be given (and 'container_id' must be NULL). Only text found inside the provided combination of container/class will be extracted.

container_id

Defaults to NULL. If provided, 'container' must also be given (and 'container_class' must be NULL). Only text found inside the provided combination of container/id will be extracted.
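
To illustrate how a container/class or container/id combination narrows the selection, here is a minimal sketch using the xml2 package directly. The sample page and the XPath expressions are assumptions made for illustration; this page does not show how pc_extract_from_web builds its queries internally.

```r
library(xml2)

# Hypothetical page used only for illustration
page <- read_html(
  "<html><body>
     <div class='intro'>Intro text</div>
     <div id='main'>Main text</div>
   </body></html>"
)

# Roughly what container = "div", container_class = "intro" would select
by_class <- xml_text(xml_find_all(page, "//div[@class='intro']"))

# Roughly what container = "div", container_id = "main" would select
by_id <- xml_text(xml_find_all(page, "//div[@id='main']"))
```

Each query matches a single div here, which is consistent with the rule that a class and an id are not given at the same time.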

container_instance

Defaults to NULL. If given, it must be an integer. If the selected element is found more than once in the same page, only the occurrence at that position is kept for further extraction.
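
The effect of picking one occurrence among several matches can be sketched with xml2 as follows. The sample page is hypothetical, and 1-based counting of occurrences is an assumption based on R's usual indexing, not something this page confirms.

```r
library(xml2)

# Hypothetical page with the same container/class appearing twice
page <- read_html(
  "<html><body>
     <div class='box'>First</div>
     <div class='box'>Second</div>
   </body></html>"
)

matches <- xml_find_all(page, "//div[@class='box']")

# Assumed to mirror container_instance = 2: keep only the second match
container_instance <- 2L
selected <- xml_text(matches[[container_instance]])
```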

subelement

Defaults to NULL. If provided, 'container' must also be given. Only text within elements of the given type under the chosen combination of container and class or id will be extracted. When given, it will typically be "p", to extract all p elements inside the selected div.
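
As a rough sketch of what restricting extraction to a subelement means, the xml2 snippet below selects only p elements inside a chosen div. The sample page and the XPath expression are assumptions for illustration; the function's actual query may differ.

```r
library(xml2)

# Hypothetical page: paragraphs inside and outside the chosen container
page <- read_html(
  "<html><body>
     <div class='article'>
       <h2>Heading</h2>
       <p>First paragraph.</p>
       <p>Second paragraph.</p>
     </div>
     <p>Outside paragraph.</p>
   </body></html>"
)

# Roughly container = "div", container_class = "article", subelement = "p"
paragraphs <- xml_text(xml_find_all(page, "//div[@class='article']//p"))
```

The heading and the paragraph outside the div are left out, so only the two paragraphs inside the container remain.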

no_children

Defaults to FALSE, i.e. by default the text of all subelements of the selected combination (e.g. a div with a given class) is extracted. If TRUE, only text found directly under the given combination (but not in its subelements) will be extracted. Corresponds to the XPath string '/node()[not(self::div)]'.
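
The XPath fragment quoted above can be tried out directly with xml2. In this sketch the sample page and the container part of the expression ("//div[@class='article']") are assumptions; only the trailing '/node()[not(self::div)]' comes from the description.

```r
library(xml2)

# Hypothetical page: a div with its own text and a nested div
page <- read_html(
  "<html><body><div class='article'>Lead text<div class='nested'>Nested text</div></div></body></html>"
)

# Comparable to no_children = FALSE: text of the container and all subelements
all_text <- xml_text(xml_find_all(page, "//div[@class='article']"))

# Comparable to no_children = TRUE: direct child nodes that are not div elements
direct_nodes <- xml_find_all(page, "//div[@class='article']/node()[not(self::div)]")
direct_text <- xml_text(direct_nodes)
```

Note that the quoted expression only excludes child div elements, so a nested element of another type, such as span, would still be matched by it.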

Examples

## Not run: 
title <- pc_extract_from_web(url = "https://www.europeandatajournalism.eu/eng/News/Data-news/The-price-of-coastal-flood-mitigation-in-Europe", container = "h1")

## End(Not run)

giocomai/popularitycheckr documentation built on Aug. 1, 2020, 11:50 a.m.