get_aggregates: Getting already aggregated time series produced by...


View source: R/aggregate.R

Description

Reads and returns the requested aggregated dataset for the selected period and topics defined by the filter.

Usage

get_aggregates(dataset = "country_counts", cache = TRUE, filter = list())

Arguments

dataset

A character(1) vector with the name of the series to request. It must be one of 'country_counts', 'geolocated' or 'topwords'. Default: 'country_counts'

cache

Whether to use the cache for lookup and storing the returned dataframe, default: TRUE

filter

A named list defining the filter to apply on the requested series, default: list()

Details

This function looks in the 'series' folder, which contains Rds files per weekday and type of series. It parses the names of files and folders to limit the files to be read, applies the filters to each dataset, and finally joins the matching results into a single dataframe. If no filter is provided, all data series are returned, which can amount to millions of rows depending on the time series.

To limit by period, the filter list must have an element 'period' containing a date vector or list with two dates representing the start and end of the request.
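The lookup described above (select files by name, filter each one, join the matches) can be pictured with a small base R sketch. Everything in it is an assumption made for illustration: the folder layout, the file names, the column names `created_date`, `topic` and `tweets`, and the helper `read_series_sketch` are hypothetical and do not reproduce epitweetr's actual storage format or internals.

```r
# Illustrative sketch only: layout, file names and columns are assumptions.
series_dir <- file.path(tempdir(), "series_sketch")
unlink(series_dir, recursive = TRUE)

# Write one Rds file for the series into a (hypothetical) weekly folder.
write_week <- function(week, dates, topics, tweets) {
  dir.create(file.path(series_dir, week), recursive = TRUE)
  saveRDS(
    data.frame(created_date = as.Date(dates), topic = topics, tweets = tweets),
    file.path(series_dir, week, "country_counts.Rds")
  )
}
write_week("2020.02", c("2020-01-08", "2020-01-10"), c("dengue", "ebola"), c(5, 3))
write_week("2020.03", c("2020-01-14", "2020-01-15"), c("dengue", "dengue"), c(7, 2))

# Hypothetical reader: pick files by name, filter each one, then join.
read_series_sketch <- function(dataset, filter = list()) {
  files <- list.files(
    series_dir,
    pattern = paste0("^", dataset, "\\.Rds$"),
    recursive = TRUE,
    full.names = TRUE
  )
  parts <- lapply(files, function(f) {
    df <- readRDS(f)
    if (!is.null(filter$period)) {
      # Accepts a vector or a list of two dates (Date or "YYYY-MM-DD" strings).
      period <- do.call(c, lapply(filter$period, as.Date))
      keep <- df$created_date >= period[1] & df$created_date <= period[2]
      df <- df[keep, , drop = FALSE]
    }
    if (!is.null(filter$topic)) {
      df <- df[df$topic %in% unlist(filter$topic), , drop = FALSE]
    }
    df
  })
  # Join the per-file results into a single data frame.
  do.call(rbind, parts)
}

res <- read_series_sketch(
  "country_counts",
  filter = list(topic = "dengue", period = c("2020-01-10", "2020-01-31"))
)
print(res)
```

The sketch narrows the file list by file name only; the real function is described as also using folder names to skip files outside the requested period.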

To limit by topic, the filter list must have an element 'topic' containing a non-empty character vector or list with the names of the topics to return.
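As a minimal sketch of how such a filter list might be assembled, the helper below builds the 'period' and 'topic' elements and checks them against the rules stated above. `make_filter` is a hypothetical name, not an epitweetr function, and the check that the start date does not follow the end date is an added assumption.

```r
# Hypothetical helper, not part of epitweetr: build the `filter` argument.
make_filter <- function(period = NULL, topic = NULL) {
  flt <- list()
  if (!is.null(period)) {
    period <- as.Date(period)
    # Two dates are required; the ordering check is an assumption of this sketch.
    stopifnot(length(period) == 2, !anyNA(period), period[1] <= period[2])
    flt$period <- period
  }
  if (!is.null(topic)) {
    # The topic element must be a non-empty character vector.
    stopifnot(is.character(topic), length(topic) > 0)
    flt$topic <- topic
  }
  flt
}

flt <- make_filter(period = c("2020-01-10", "2020-01-31"), topic = "dengue")
str(flt)

# With a configured epitweetr data directory, the list would be passed as:
# df <- get_aggregates(dataset = "country_counts", filter = flt)
```

Calling `make_filter()` with no arguments returns an empty list, which corresponds to the default `filter = list()`.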

The available time series are 'country_counts', 'geolocated' and 'topwords'.

The returned dataset can be cached for further calls if requested. Only one dataset per series is cached.
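One way to picture "one dataset per series" is a cache with a single slot per series name, where a new request for the same series replaces the stored entry. The sketch below is illustrative only: `series_cache`, `cached_fetch` and `loader` are hypothetical names, and treating a request as a cache hit only when the filter is identical is an assumption, not a statement about epitweetr's internal code.

```r
# Illustrative one-slot-per-series cache; all names here are hypothetical.
series_cache <- new.env()

cached_fetch <- function(dataset, filter, loader, cache = TRUE) {
  entry <- if (cache) series_cache[[dataset]] else NULL
  # Assumption: reuse the stored data only when the filter is identical.
  if (!is.null(entry) && identical(entry$filter, filter)) {
    return(entry$data)
  }
  data <- loader(dataset, filter)
  if (cache) {
    # Overwrites any earlier entry for this series.
    assign(dataset, list(filter = filter, data = data), envir = series_cache)
  }
  data
}

# Stand-in loader that counts how often data is actually read.
calls <- 0
loader <- function(dataset, filter) {
  calls <<- calls + 1
  data.frame(dataset = dataset, n = calls)
}

d1 <- cached_fetch("country_counts", list(topic = "dengue"), loader) # reads
d2 <- cached_fetch("country_counts", list(topic = "dengue"), loader) # cache hit
d3 <- cached_fetch("country_counts", list(topic = "ebola"), loader)  # replaces entry
print(calls)
```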

Value

A dataframe containing the requested series for the requested period

See Also

detect_loop, geotag_tweets

Examples

if (FALSE) {
  message('Please choose the epitweetr data directory')
  setup_config(file.choose())

  # Getting all country tweets between 2020-01-10 and 2020-01-31 for all topics
  df <- get_aggregates(
    dataset = "country_counts",
    filter = list(period = c("2020-01-10", "2020-01-31"))
  )

  # Getting all country tweets for the topic dengue
  df <- get_aggregates(dataset = "country_counts", filter = list(topic = "dengue"))

  # Getting all country tweets between 2020-01-10 and 2020-01-31 for the topic dengue
  df <- get_aggregates(
    dataset = "country_counts",
    filter = list(topic = "dengue", period = c("2020-01-10", "2020-01-31"))
  )
}

epitweetr documentation built on April 9, 2021, 1:06 a.m.