Collect.youtube: Collect comments data for YouTube videos

Description

This function collects public comments data for one or more YouTube videos using the YouTube Data API v3 and structures the data into a dataframe with the class names "datasource" and "youtube".

YouTube uses a quota unit system as its rate limit, with most developers having either 10,000 or 1,000,000 units per day. Many read operations, such as retrieving individual comments, cost a base of 1 unit plus 1 or 2 units for text snippets. Retrieving threads or top-level comments with text costs 3 units per request (maximum 100 comments per request). Using this function, a video with 250 top-level comments, 10 of which have up to 100 reply comments each, should cost 29 quota units: 3 thread requests at 3 units each (9) plus 10 reply requests at 2 units each (20). It would return between 260 and 1,250 total comments (250 top-level comments plus between 1 and 100 replies for each of the 10 threads). There is currently a limit of 100 reply comments collected per top-level comment.
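
The quota arithmetic above can be reproduced with a small sketch. The helper estimate_quota is hypothetical and not part of vosonSML. The cost of 3 units per thread request comes from the description; the cost of 2 units per reply request is an assumption inferred from the (9 + 20) breakdown, and actual costs are set by the API.

```r
# Hypothetical helper (not part of vosonSML): estimate quota units for one video.
# Assumed costs: 3 units per thread request, 2 units per reply request.
estimate_quota <- function(top_level, threads_with_replies,
                           per_request = 100, thread_cost = 3, reply_cost = 2) {
  # top-level comments are fetched in pages of up to 100 per request
  thread_requests <- ceiling(top_level / per_request)
  # one reply request per thread that has replies (at most 100 replies each)
  reply_requests <- threads_with_replies
  thread_requests * thread_cost + reply_requests * reply_cost
}

# 250 top-level comments, 10 of them with replies
estimate_quota(250, 10)
# [1] 29
```

Comparing such an estimate against the daily quota before a large collection can help avoid running out of units part way through.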

More information about the YouTube Data API v3 can be found here: https://developers.google.com/youtube/v3/getting-started

Usage

## S3 method for class 'youtube'
Collect(
  credential,
  videoIDs,
  verbose = FALSE,
  writeToFile = FALSE,
  maxComments = 1e+10,
  ...
)

Arguments

credential

A credential object generated from Authenticate with class name "youtube".

videoIDs

Character vector. Specifies one or more YouTube video IDs. For example, if the video URL is https://www.youtube.com/watch?v=xxxxxxxxxxx then use videoIDs = c("xxxxxxxxxxx").

verbose

Logical. Output additional information about the data collection. Default is FALSE.

writeToFile

Logical. Write collected data to file. Default is FALSE.

maxComments

Numeric integer. Specifies how many top-level comments to collect from each video. This value does not count replies to top-level comments, so the total number of comments returned for a video will usually be greater than maxComments, depending on the number of reply comments present. Default is 1e+10.

...

Additional parameters passed to function. Not used in this method.

Value

A tibble object with class names "datasource" and "youtube".

Note

Due to specifications of the YouTube Data API it is currently not efficient to specify the exact number of comments to return from the API using the maxComments parameter. The maxComments parameter is applied to top-level comments only and not to the replies to these comments. As such, the number of comments collected is usually greater than expected. For example, if maxComments is set to 10 and one of the video's 10 top-level comments has 5 reply comments, then the total number of comments collected will be 15 for that video. Comments data for multiple YouTube videos can be requested in a single operation; maxComments is applied to each individual video and not to the combined total of comments.
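
The example in this note can be worked through as a short sketch. The helper expected_total and the reply counts are made up for illustration and are not part of vosonSML; the cap of 100 replies per top-level comment is taken from the description.

```r
# Hypothetical helper (not part of vosonSML): expected number of comments
# collected for one video, given the reply count of each top-level comment.
expected_total <- function(reply_counts, max_comments, reply_cap = 100) {
  # maxComments limits top-level comments only
  n_top <- min(length(reply_counts), max_comments)
  # replies of each collected top-level comment are added, up to the cap
  n_replies <- sum(pmin(reply_counts[seq_len(n_top)], reply_cap))
  n_top + n_replies
}

# 10 top-level comments, one of which has 5 replies, with maxComments = 10
expected_total(c(0, 0, 5, 0, 0, 0, 0, 0, 0, 0), max_comments = 10)
# [1] 15
```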

To help extract video IDs, the function GetYoutubeVideoIDs can be used. It accepts a vector or a file containing video URLs and creates a character vector suitable as input for the videoIDs parameter.
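
To illustrate the mapping from URL to video ID, here is a minimal base R sketch. The function extract_video_id is hypothetical and is not how GetYoutubeVideoIDs is implemented; it assumes only the two URL shapes used in the examples (watch?v=<id> and youtu.be/<id>) and that IDs consist of letters, digits, "-" and "_".

```r
# Hypothetical helper (not part of vosonSML): pull the video ID out of a URL.
extract_video_id <- function(urls) {
  # group 1 matches the prefix ("?v=", "&v=" or "youtu.be/"), group 2 the ID
  pattern <- "([?&]v=|youtu\\.be/)([A-Za-z0-9_-]+)"
  matches <- regmatches(urls, regexec(pattern, urls))
  # each match is c(full match, group 1, group 2); no match gives character(0)
  vapply(matches,
         function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

extract_video_id(c("https://www.youtube.com/watch?v=xxxxxxxxxxx",
                   "https://youtu.be/xxxxxxxxxxx"))
# [1] "xxxxxxxxxxx" "xxxxxxxxxxx"
```

For real use, GetYoutubeVideoIDs is the better choice, since it also accepts a file of URLs.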

Examples

## Not run: 
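# load vosonSML; magrittr provides the %>% pipe used below
library(vosonSML)
library(magrittr)

# create a character vector of YouTube video IDs to collect on
videoIDs <- GetYoutubeVideoIDs(c("https://www.youtube.com/watch?v=xxxxxxxx",
                                 "https://youtu.be/xxxxxxxx"))

# collect up to 200 top-level comments, plus their replies, for each video
# (youtubeAuth is a credential object created beforehand with Authenticate)
youtubeData <- youtubeAuth %>%
  Collect(videoIDs = videoIDs, writeToFile = TRUE, verbose = FALSE, maxComments = 200)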

## End(Not run)

vosonSML documentation built on July 18, 2020, 9:07 a.m.