titles_scrap: Website title scraping


View source: R/titles_scrap.R

Description

Scrapes the titles (h1, h2 and h3 HTML tags) from a web page. Useful for collecting the headlines of daily online newspapers.
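To illustrate what collecting h1, h2 and h3 tags involves, here is a minimal sketch using the rvest package on an inline HTML string. This is not necessarily how titles_scrap is implemented; the sample HTML and the variable names are invented for the illustration.

```r
library(rvest)

# Illustrative HTML document (an assumption, not taken from any real site)
html <- '<html><body>
  <h1>Main headline</h1>
  <p>Body text that is not a title.</p>
  <h2>Second story</h2>
  <h3>Climate report</h3>
</body></html>'

# Parse the document; read_html() also accepts a URL
page <- read_html(html)

# Select every h1, h2 and h3 element and extract its trimmed text
titles <- html_text(html_elements(page, "h1, h2, h3"), trim = TRUE)

print(titles)
```

The result is a character vector with one element per heading, which matches the kind of value described under Value.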

Usage

titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments

link

The URL of the web page to scrape.

contain

A character string used to filter the titles. Defaults to NULL (no filtering).

case_sensitive

Logical. Should matching against the contain argument be case sensitive? Defaults to FALSE.

askRobot

Logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Defaults to FALSE.
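The interaction between contain and case_sensitive can be sketched in base R as follows. filter_titles is a hypothetical helper written for this illustration, and matching with grepl as a regular expression is an assumption; the package may match differently.

```r
# Hypothetical helper, not part of ralger: keeps the titles that match `contain`
filter_titles <- function(titles, contain = NULL, case_sensitive = FALSE) {
  # With no filter string, return every title unchanged
  if (is.null(contain)) {
    return(titles)
  }
  # Assumption: `contain` is matched as a regular expression
  titles[grepl(contain, titles, ignore.case = !case_sensitive)]
}

# Sample titles invented for the illustration
titles <- c("Climate talks resume", "Markets rally", "New climate study")

# Case-insensitive match (the default) keeps both climate titles
insensitive <- filter_titles(titles, contain = "climate")

# Case-sensitive match keeps only the lower-case occurrence
sensitive <- filter_titles(titles, contain = "climate", case_sensitive = TRUE)
```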

Value

A character vector containing the scraped titles.

Examples

# Extracting the current titles of the New York Times
library(ralger)

link <- "https://www.nytimes.com/"

titles_scrap(link)
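The optional arguments listed under Usage could be exercised along these lines. The search term "climate" is an arbitrary choice for illustration, and the output depends on the live page, so no expected result is shown. The calls need network access.

```r
library(ralger)

link <- "https://www.nytimes.com/"

# Filter the titles by a string (the term is illustrative)
titles_scrap(link, contain = "climate")

# Same filter, but with case-sensitive matching
titles_scrap(link, contain = "Climate", case_sensitive = TRUE)

# Consult robots.txt before scraping
titles_scrap(link, askRobot = TRUE)
```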

ralger documentation built on March 18, 2021, 1:06 a.m.