tidy_scrap: Website Tidy scraping


View source: R/tidy_scrap.R

Description

This function scrapes a website and returns the result as a tibble.

Usage

tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)

Arguments

link

the link (URL) of the web page to scrape.

nodes

the vector of CSS selectors to consider. The SelectorGadget tool is highly recommended for finding them.

colnames

the names of the expected columns.

clean

logical. Should the function clean the extracted tibble? Default is FALSE.

askRobot

logical. Should the function check the site's robots.txt to see whether scraping the web page is allowed? Default is FALSE.
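The relationship between nodes and colnames can be illustrated offline. The following is a minimal sketch, assuming that each CSS selector in nodes yields one column and that colnames supplies the column names in the same order, as the example on this page suggests. It uses rvest directly on a small made-up HTML snippet; it is not the implementation of tidy_scrap, and the snippet, the variable names and the use of rvest are assumptions made for illustration only.

```r
library(rvest)  # assumption: rvest is installed; this page does not mention it

# A tiny hypothetical page, so the sketch runs without network access
page <- read_html('
  <table>
    <tr><td class="titleColumn"><a>Movie A</a></td><td><strong>9.2</strong></td></tr>
    <tr><td class="titleColumn"><a>Movie B</a></td><td><strong>9.1</strong></td></tr>
  </table>')

nodes     <- c(".titleColumn a", "strong")
col_names <- c("title", "rating")

# One column per selector: collect the text matched by each CSS selector
columns <- lapply(nodes, function(css) html_text(html_nodes(page, css)))
names(columns) <- col_names

# Bind the columns into a data frame (values are still character strings)
result <- as.data.frame(columns, stringsAsFactors = FALSE)
result
```

If the selectors match different numbers of elements, the columns have unequal lengths and cannot be bound directly, which is why selectors are usually chosen so that each one matches exactly one element per record.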

Value

a tidy data frame (tibble).

Examples

library(ralger)

# Extracting IMDb movie titles and ratings

link     <- "https://www.imdb.com/chart/top/"
my_nodes <- c(".titleColumn a", "strong")
names    <- c("title", "rating")

tidy_scrap(link, my_nodes, names)
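The two optional arguments can be switched on in the same call. The sketch below reuses the link and selectors from the example above and only relies on the signature shown under Usage. It needs network access, and the selectors may no longer match if the site's markup has changed since this documentation was built, so no expected output is shown. The name col_names is an illustrative choice that avoids masking base R's names() function.

```r
library(ralger)

link      <- "https://www.imdb.com/chart/top/"
my_nodes  <- c(".titleColumn a", "strong")
col_names <- c("title", "rating")

# Same request as above, with both optional arguments set explicitly
movies <- tidy_scrap(
  link     = link,
  nodes    = my_nodes,
  colnames = col_names,
  clean    = TRUE,  # ask the function to clean the extracted tibble
  askRobot = TRUE   # consult robots.txt before scraping the page
)

head(movies)
```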

feddelegrand7/ralger documentation built on Jan. 14, 2020, 12:33 p.m.