parse_raw_tweets_to_cascades: This function extracts cascades from a given jsonl file where...

View source: R/tweet.R

parse_raw_tweets_to_cascades    R Documentation

Description

Extracts retweet cascades from one or more JSONL files in which each line is a tweet JSON object. For the structure of a tweet object, refer to the Twitter developer documentation: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

Usage

parse_raw_tweets_to_cascades(
  paths,
  batch = 1e+05,
  cores = 1,
  output_path = NULL,
  keep_user = F,
  keep_absolute_time = F,
  keep_text = F,
  keep_retweet_count = F,
  progress = T,
  return_as_list = T,
  save_temp = F,
  keep_temp_files = T,
  api_version = 1
)

Arguments

paths

Full paths to the tweet JSONL files.

batch

Number of tweets to read and process at each iteration. Choose a value that suits your available memory. Defaults to at most 100000 (1e+05) tweets per iteration.

cores

Number of cores to be used for processing each batch in parallel.

output_path

If provided, the index.csv and data.csv files that define the cascades will be generated. In index.csv, each row is a cascade whose events can be obtained from data.csv using the corresponding indices (start_ind to end_ind). Defaults to NULL.

keep_user

Keep the Twitter user ids.

keep_absolute_time

Keep the absolute tweeting times.

keep_text

Keep the tweet text.

keep_retweet_count

Keep the retweet_count field.

progress

If TRUE (the default), progress is reported.

return_as_list

If TRUE (the default), a list of cascades (data.frames) is returned.

save_temp

Whether temporary files should be generated while processing, which allows processing to be resumed after a failure.

keep_temp_files

Whether temporary files should be kept after the index and data files have been generated.

api_version

Version of Twitter API used for collecting the tweets.
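
The relationship between index.csv and data.csv described under output_path can be sketched in base R as follows. Only the start_ind and end_ind names come from the description above. The toy values, the time and magnitude columns, and the reading of the indices as 1-based, inclusive row numbers are assumptions made for illustration and may not match the files that evently actually writes.

```r
# Toy stand-ins for data.csv and index.csv. Columns other than
# start_ind and end_ind are hypothetical.
data <- data.frame(
  time = c(0, 12, 40, 0, 7),
  magnitude = c(150, 20, 35, 900, 60)
)
index <- data.frame(
  start_ind = c(1, 4),
  end_ind = c(3, 5)
)

# Assumption: start_ind and end_ind are 1-based, inclusive row numbers
# into data, so each index row selects one contiguous block of events.
cascades <- lapply(seq_len(nrow(index)), function(i) {
  rows <- index$start_ind[i]:index$end_ind[i]
  cascade <- data[rows, , drop = FALSE]
  rownames(cascade) <- NULL
  cascade
})

print(length(cascades))     # number of cascades rebuilt
print(nrow(cascades[[1]]))  # number of events in the first cascade
```

With real output, the two data frames would instead be read with read.csv from the files under output_path.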

Value

If return_as_list is TRUE, a list of data.frames in which each data.frame is a retweet cascade. Otherwise nothing is returned.
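
A minimal usage sketch, built only from the signature shown under Usage, might look like the following. The file name tweets.jsonl is a hypothetical placeholder, and the sketch assumes the function is exported from the evently package. The call is guarded so that nothing runs unless the package is installed and the file exists.

```r
# Hypothetical input file; replace with the path to your own JSONL file.
tweet_path <- "tweets.jsonl"

# Run only when evently is installed and the file exists, so the sketch
# can be sourced as-is without raising an error.
if (requireNamespace("evently", quietly = TRUE) && file.exists(tweet_path)) {
  cascades <- evently::parse_raw_tweets_to_cascades(
    paths = tweet_path,
    batch = 1e4,               # fewer tweets per iteration than the default
    cores = 1,
    keep_user = TRUE,          # keep Twitter user ids
    keep_absolute_time = TRUE, # keep absolute tweeting times
    return_as_list = TRUE
  )

  # Per the Value section, the result is a list of data.frames.
  print(length(cascades))
  if (length(cascades) > 0) {
    print(head(cascades[[1]]))
  }
}
```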


behavioral-ds/evently documentation built on Feb. 3, 2023, 9:42 a.m.