scrape-index: Helper functions for scraping index pages

Description

These are helper functions for package development. They are not listed in the man page index but are available to advanced users.

get_index_url finds the newest index page of a board. It takes a board's URL (e.g. https://www.ptt.cc/bbs/Gossiping/index.html) as input and returns a character vector of length 2.

get_index_info takes a board's index URL as input and extracts the content of the page into a data frame with 6 variables.

Usage

get_index_url(board_url)

get_index_info(board_url)

Arguments

board_url

Character. A board's index page URL.

Value

get_index_url returns a character vector of length 2. The first element is a number (stored as a string, since the vector is of type character), and the second is a URL.

get_index_info returns a data frame with n rows and 6 variables, where n is the number of post links on an index page.
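
The return shapes described above can be checked with a short sketch like the one below. The two check_* helpers are hypothetical and not part of pttR; they only encode what this page states (a length-2 character vector, and a data frame with 6 columns). The calls use pttR::: because this page says the functions are not in the man page index, and whether they are exported is an assumption left open here. The live calls need the pttR package and network access, so they are guarded.

```r
# Hypothetical helpers (not part of pttR): verify the documented return shapes.
check_index_url <- function(x) {
  # Documented: character vector of length 2 (number as string, then URL).
  is.character(x) && length(x) == 2
}

check_index_info <- function(df) {
  # Documented: data frame with 6 variables; row count varies by page.
  is.data.frame(df) && ncol(df) == 6
}

# Guarded usage sketch: runs only if pttR is installed; errors (e.g. no
# network) are caught and turned into NULL.
if (requireNamespace("pttR", quietly = TRUE)) {
  board_url <- "https://www.ptt.cc/bbs/Gossiping/index.html"

  newest <- tryCatch(pttR:::get_index_url(board_url),
                     error = function(e) NULL)
  if (!is.null(newest)) {
    stopifnot(check_index_url(newest))
    print(newest)
  }

  posts <- tryCatch(pttR:::get_index_info(board_url),
                    error = function(e) NULL)
  if (!is.null(posts)) {
    stopifnot(check_index_info(posts))
    str(posts)
  }
}
```

Checking the shape before further processing is a defensive choice: the scraped site's layout may change over time, and a failed shape check is easier to diagnose than a downstream error.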

See Also

extr_post_category


liao961120/pttR documentation built on Dec. 16, 2019, 2:19 a.m.