scrape_thread_content: Scrape thread


View source: R/scrape_thread_content.R

Description

Scrapes the content of a single flashback.org thread, identified by its URL suffix.

Usage

scrape_thread_content(
  suffix,
  export_csv = FALSE,
  folder_name = NULL,
  file_name = NULL,
  delay = TRUE
)

Arguments

suffix

A character string containing a thread's suffix (which can be obtained using get_thread_links()). Suffixes need to start with /.
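
The leading-slash requirement can be checked before calling the function. The following is a minimal sketch in base R; `is_valid_suffix()` is a hypothetical helper written for illustration and is not part of flashbackscrapR.

```r
# Hypothetical helper (not part of flashbackscrapR): checks the documented
# requirement that a suffix is a single character string starting with "/".
is_valid_suffix <- function(suffix) {
  is.character(suffix) &&
    length(suffix) == 1 &&
    !is.na(suffix) &&
    startsWith(suffix, "/")
}

ok  <- is_valid_suffix("/t3145103")  # suffix taken from the Examples section
bad <- is_valid_suffix("t3145103")   # same suffix without the leading slash
```

Here `ok` is `TRUE` and `bad` is `FALSE`, so a malformed suffix can be caught before any request is made.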

export_csv

A logical value. Defaults to FALSE. If export_csv = TRUE, the output is saved as a csv file. The output folder can be specified using the folder_name argument.

folder_name

A character string which specifies the name of the folder the output should be saved in. The folder's name is added to the path of the current working directory which can be obtained using getwd() and modified with setwd(). If nothing is specified and export_csv = TRUE, the function will export the csv file straight into the working directory.

file_name

A character string which specifies the name of the output file. It is not necessary to add '.csv'. If no file name is provided, file_name defaults to scrape_[YYYY-MM-DD].csv.
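
To make the interplay of folder_name and file_name concrete, here is a rough sketch of how the output path could be assembled from the rules described above. `build_output_path()` is a hypothetical helper; the package's internal implementation may differ, in particular in how it treats a file name that already ends in '.csv'.

```r
# Hypothetical helper illustrating the documented defaults; this is an
# assumption about the behaviour, not the package's actual code.
build_output_path <- function(folder_name = NULL, file_name = NULL,
                              date = Sys.Date()) {
  # Default file name: scrape_[YYYY-MM-DD]
  if (is.null(file_name)) {
    file_name <- paste0("scrape_", format(date, "%Y-%m-%d"))
  }
  # Append ".csv" only if the caller did not already include it
  if (!grepl("\\.csv$", file_name)) {
    file_name <- paste0(file_name, ".csv")
  }
  # Without a folder, the file goes straight into the working directory
  if (is.null(folder_name)) {
    file.path(getwd(), file_name)
  } else {
    file.path(getwd(), folder_name, file_name)
  }
}

default_path <- build_output_path(date = as.Date("2021-09-10"))
custom_path  <- build_output_path("sandbox/results", "test")
```

With these inputs, `default_path` ends in `scrape_2021-09-10.csv` and `custom_path` ends in `sandbox/results/test.csv`, both below the current working directory.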

delay

A logical value. Defaults to TRUE. flashback.org's robots.txt file asks for a five-second delay between iterations. You can deliberately ignore this by setting delay = FALSE, but this is NOT RECOMMENDED.
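
For readers writing their own loops around scraping functions, the pattern behind such a delay can be sketched as below. `fetch_politely()` is a hypothetical wrapper, and the assumption that one "iteration" corresponds to one request is ours; the sketch uses `toupper()` as an offline stand-in for a real request and placeholder suffixes.

```r
# Hypothetical wrapper (not part of flashbackscrapR): applies `fetch_fun`
# to each element of `items`, pausing `delay_seconds` between calls.
fetch_politely <- function(items, fetch_fun, delay_seconds = 5) {
  results <- vector("list", length(items))
  for (i in seq_along(items)) {
    # No pause is needed before the very first call
    if (i > 1) Sys.sleep(delay_seconds)
    results[[i]] <- fetch_fun(items[[i]])
  }
  results
}

# Offline demonstration with placeholder suffixes and a short delay;
# a real crawl would keep the five-second default.
pages <- fetch_politely(c("/t1", "/t2"), toupper, delay_seconds = 0.1)
```

Pausing only between calls, rather than after every call, avoids an unnecessary wait at the end of the loop.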

Value

A tibble with the following columns: url, the thread's URL suffix; date, the date the posting was made on; time, the time the posting was made at; author_name, the author's user name; author_url, the link to the author's profile (can be scraped using scrape_user_profile()); quoted_user, the user name of the user quoted in the posting (NA if the posting does not contain a quote); posting, the posting *as is*, i.e., including any quotes; posting_wo_quote, the posting with all quotes removed.
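
As an illustration of how the returned columns might be used, the sketch below builds a small mock data frame with the documented column names and runs two simple summaries in base R. All values are invented placeholders (including the date and time formats), not real scraped content.

```r
# Mock data with the documented column names; every value is an invented
# placeholder and the date/time formats are assumptions.
thread <- data.frame(
  url = "/t3145103",
  date = c("2021-01-01", "2021-01-01", "2021-01-02"),
  time = c("10:00", "10:05", "09:30"),
  author_name = c("user_a", "user_b", "user_a"),
  author_url = c("/u1", "/u2", "/u1"),
  quoted_user = c(NA, "user_a", NA),
  posting = c("First post", "user_a wrote: First post. I agree", "Thanks"),
  posting_wo_quote = c("First post", "I agree", "Thanks"),
  stringsAsFactors = FALSE
)

# Postings that quote another user (quoted_user is NA otherwise)
replies <- thread[!is.na(thread$quoted_user), ]

# Number of postings per author
postings_per_author <- table(thread$author_name)
```

Because quoted_user is NA for postings without a quote, `is.na()` is enough to separate replies from stand-alone postings; posting_wo_quote is the natural column for text analysis when quoted material should not be counted twice.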

Examples

scrape_thread_content(suffix = "/t3145103", export_csv = TRUE, folder_name = "sandbox/results", file_name = "test", delay = FALSE)

fellennert/flashbackscrapR documentation built on Sept. 10, 2021, 4:15 p.m.