paragraphs_scrap: Website text paragraph scraping

View source: R/paragraphs_scrap.R

Description

Scrapes the text paragraphs of a web page and returns them as a character vector.

Usage

paragraphs_scrap(
  link,
  contain = NULL,
  case_sensitive = FALSE,
  collapse = FALSE,
  askRobot = FALSE
)

Arguments

link

the link (URL) of the web page to scrape.

contain

character string used to filter the paragraphs; only paragraphs that contain it are returned. Defaults to NULL (no filtering).

case_sensitive

logical. Should matching against the contain argument be case sensitive? Defaults to FALSE.

collapse

logical. If TRUE, the paragraphs are collapsed into a single element and the contain argument is ignored. Defaults to FALSE.

askRobot

logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
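
The way contain, case_sensitive and collapse interact can be pictured with a small base R sketch. This is not the ralger implementation: the sample vector paragraphs and the helper filter_paragraphs() are hypothetical, and both the grepl() matching and the single-space separator used when collapsing are assumptions made for illustration.

```r
# Hypothetical stand-in for the paragraphs scraped from a page.
paragraphs <- c(
  "Health officials issued new guidance.",
  "The market closed higher on Friday.",
  "A study on public health was published."
)

# Hypothetical helper that mimics the documented argument semantics.
filter_paragraphs <- function(x, contain = NULL,
                              case_sensitive = FALSE,
                              collapse = FALSE) {
  # collapse = TRUE: return one element and ignore `contain` (as documented).
  # Assumption: paragraphs are joined with a single space.
  if (collapse) {
    return(paste(x, collapse = " "))
  }
  # No filter requested: return every paragraph.
  if (is.null(contain)) {
    return(x)
  }
  # Assumption: `contain` is matched as a pattern via grepl().
  x[grepl(contain, x, ignore.case = !case_sensitive)]
}

# Case-insensitive match keeps the first and third paragraphs.
filter_paragraphs(paragraphs, contain = "health")

# Case-sensitive match keeps only the third paragraph.
filter_paragraphs(paragraphs, contain = "health", case_sensitive = TRUE)

# Collapsing yields a single element; `contain` has no effect here.
filter_paragraphs(paragraphs, contain = "health", collapse = TRUE)
```

Checking collapse before contain is what makes the filter a no-op when collapse = TRUE, which matches the behaviour described for the collapse argument.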

Value

a character vector of paragraphs (a single element when collapse = TRUE).

Examples

# Extracting the paragraphs displayed on the health page of the New York Times

link <- "https://www.nytimes.com/section/health"

paragraphs_scrap(link)
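
The remaining arguments can be exercised on the same page. The calls below are a sketch that uses only the arguments documented above; the search term "health" is an arbitrary choice for illustration, and the results depend on the live content of the page. Because the calls need the ralger package and network access, they are guarded so the block does nothing in a non-interactive session.

```r
link <- "https://www.nytimes.com/section/health"

# Guard: run only interactively and only if ralger is installed.
if (interactive() && requireNamespace("ralger", quietly = TRUE)) {
  # Keep only paragraphs that contain "health" (case-insensitive by default).
  print(ralger::paragraphs_scrap(link, contain = "health"))

  # Same term, but matched with exact case.
  print(ralger::paragraphs_scrap(link, contain = "health", case_sensitive = TRUE))

  # Collapse all paragraphs into a single element.
  print(ralger::paragraphs_scrap(link, collapse = TRUE))

  # Check robots.txt before scraping.
  print(ralger::paragraphs_scrap(link, askRobot = TRUE))
}
```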

ralger documentation built on March 18, 2021, 1:06 a.m.