aggregate_tweets: Execute the aggregation task


View source: R/aggregate.R

Description

Reads all the tweets from the Twitter Standard Search API json files and the geolocated tweets json files produced by geotag_tweets, and stores the results in the series folder as daily Rds files.

Usage

aggregate_tweets(
  series = list("country_counts", "geolocated", "topwords"),
  tasks = get_tasks()
)

Arguments

series

List of series to aggregate, default: list("country_counts", "geolocated", "topwords")

tasks

Current tasks for reporting purposes, default: get_tasks()

Details

This function writes new aggregated series by launching a Spark task that aggregates the data collected from the Twitter Search API and geolocated by geotag_tweets. It performs the following steps:

- Identify the last aggregated date by looking into the series folder

- Find the date range of tweets collected since that day by looking at the stat json files produced by the search loop

- For each day that has to be updated, build the list of all geolocated and search files to load by looking at the stat files

- For each series passed as a parameter and for each date to update:

  - call a Spark task that deduplicates tweets for each topic, joins them with geolocation information, aggregates them to the required level and returns the result to the standard output as json lines

  - parse the result of this task using jsonlite and save it into Rds files in the series folder
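The per-series step can be illustrated in plain R on a toy data set. This is only a sketch of the logic, not the package's Spark job: the column names (id, topic, created_date, country_code), the sample rows and the output file name are all hypothetical, and base R stands in for Spark.

```r
library(jsonlite)

# Hypothetical tweets for one topic; id 2 was collected twice.
tweets <- data.frame(
  id = c(1, 2, 2, 3),
  topic = "flu",
  created_date = "2021-04-07",
  stringsAsFactors = FALSE
)

# Hypothetical geolocation output, keyed by tweet id.
geo <- data.frame(
  id = c(1, 2, 3),
  country_code = c("FR", "FR", "ES"),
  stringsAsFactors = FALSE
)

# 1. Deduplicate tweets within each topic.
unique_tweets <- tweets[!duplicated(tweets[, c("topic", "id")]), ]

# 2. Join with the geolocation information.
located <- merge(unique_tweets, geo, by = "id")

# 3. Aggregate to the required level, here tweets per topic, day and country.
located$tweets <- 1L
counts <- aggregate(
  tweets ~ topic + created_date + country_code,
  data = located,
  FUN = sum
)

# 4. Write the result as json lines (a temporary file stands in for the
#    standard output of the Spark task), then parse it back with jsonlite.
json_file <- tempfile(fileext = ".json")
stream_out(counts, file(json_file), verbose = FALSE)
parsed <- stream_in(file(json_file), verbose = FALSE)

# 5. Save the day's result as an Rds file (hypothetical name, temporary folder).
out_file <- file.path(tempdir(), "country_counts_2021.04.07.Rds")
saveRDS(parsed, out_file)
```

Reading the file back with readRDS(out_file) returns the aggregated data frame for that day.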

A prerequisite to this function is that the search_loop must have already collected tweets in the search folder and that geotag_tweets has already run. Normally this function is not called directly by the user but from the detect_loop function.
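The date-selection steps described in these Details can be sketched as follows. The file naming scheme, the folder and the last collection date are assumptions made for illustration only; epitweetr derives them from its own series folder and stat files. Recomputing the last aggregated day itself is also an assumption, based on the description "since that day".

```r
# Hypothetical layout: one Rds file per series and day,
# named <series>_<YYYY.MM.DD>.Rds inside a temporary folder.
series_dir <- file.path(tempdir(), "series_sketch")
dir.create(series_dir, showWarnings = FALSE)
file.create(file.path(series_dir, c(
  "country_counts_2021.04.05.Rds",
  "country_counts_2021.04.06.Rds"
)))

# Last aggregated date, read from the file names in the series folder.
files <- list.files(series_dir, pattern = "^country_counts_.*\\.Rds$")
file_dates <- as.Date(
  sub("^country_counts_(.*)\\.Rds$", "\\1", files),
  format = "%Y.%m.%d"
)
last_aggregated <- max(file_dates)

# Hypothetical last collection date; the real value would come from
# the stat json files produced by the search loop.
last_collected <- as.Date("2021-04-08")

# Days to (re)compute: from the last aggregated day to the last collected day.
dates_to_update <- seq(last_aggregated, last_collected, by = "day")
```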

Value

The list of tasks, updated with aggregate messages.

See Also

detect_loop

geotag_tweets

generate_alerts

Examples

if(FALSE){
   library(epitweetr)
   # setting up the data folder
   message('Please choose the epitweetr data directory')
   setup_config(file.choose())

   # aggregating all geolocated tweets collected since last aggregation for producing 
   # all time series
   aggregate_tweets()
}

epitweetr documentation built on April 9, 2021, 1:06 a.m.