DataClass: R6 Class containing non-dataset specific methods


Description

A parent class containing non-dataset specific methods.

Details

All data sets have shared methods for extracting geographic codes, downloading, processing, and returning data. These functions are contained within this parent class and so are accessible by all data sets which inherit from it. Individual data sets can overwrite any function or field, provided they define a method or field with the same name, and can be extended with additional functionality. See the individual method documentation for further details.
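A minimal sketch of the inheritance pattern described above, written with the R6 package. ParentData and ChildData are hypothetical stand-ins, not classes from this package, and the cases_new column is assumed for illustration. The sketch only shows how a child class overwrites a method of the same name and adds one of its own.

```r
library(R6)

# Hypothetical parent class holding shared, non-dataset-specific methods.
ParentData <- R6Class("ParentData",
  public = list(
    origin = "unknown",
    data = NULL,
    # Empty placeholder that child classes are expected to overwrite.
    clean_common = function() {
      invisible(self)
    },
    # Shared step available to every child class.
    clean = function() {
      self$data <- list(clean = data.frame())
      self$clean_common()
      invisible(self)
    }
  )
)

# Hypothetical child class: overwrites origin and clean_common(),
# and adds a new method, describe().
ChildData <- R6Class("ChildData",
  inherit = ParentData,
  public = list(
    origin = "Example country",
    clean_common = function() {
      self$data$clean <- data.frame(region = "A", cases_new = 1)
      invisible(self)
    },
    describe = function() {
      paste("Data for", self$origin)
    }
  )
)

child <- ChildData$new()
child$clean()  # inherited method, which calls the overwritten clean_common()
```

The child never defines clean(), yet calling it runs the child's own clean_common(), which is the behaviour the paragraph above relies on.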

Public fields

origin

the origin of the data source. For regional data sources this will usually be the name of the country.

data

Once initialised, a list of named data frames: raw (a list of named raw data frames), clean (cleaned data), and processed (processed data). Data is accessed using $data.

supported_levels

A list of supported levels.

supported_region_names

A list of region names in order of level.

supported_region_codes

A list of region codes in order of level.

region_name

string Name for the region column, e.g. 'region'. This field is filled at initialisation with the region name for the specified level (supported_region_names$level).

code_name

string Name for the codes column, e.g. 'iso_3166_2'. This field is filled at initialisation with the code name associated with the requested level (supported_region_codes$level).

codes_lookup

string or tibble Region codes for the target origin, filled by origin-specific codes in set_region_codes().

data_urls

List of named common and shared url links to raw data. Prefers shared if there is a name conflict.

common_data_urls

List of named links to raw data that are common across levels. The first entry should be named main.

level_data_urls

List of named links to raw data that are level specific. Any urls that share a name with a url from common_data_urls will be selected preferentially. Each top level list should be named after a supported level.
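As an illustration of how the two URL lists might be combined, the following base R sketch merges a common list with the list for one level, letting the level-specific entry win on a name clash. The URLs and the merge_urls() helper are hypothetical and are not taken from this package; the actual merging code may differ.

```r
# Hypothetical URLs, for illustration only.
common_data_urls <- list(
  main = "https://example.com/common/main.csv",
  extra = "https://example.com/common/extra.csv"
)
level_data_urls <- list(
  "1" = list(main = "https://example.com/level-1/main.csv"),
  "2" = list(regions = "https://example.com/level-2/regions.csv")
)

# Combine the common URLs with those for one level. On a name clash
# the level-specific entry replaces the common one.
merge_urls <- function(common, by_level, level) {
  level_urls <- by_level[[level]]
  if (is.null(level_urls)) level_urls <- list()
  modifyList(common, level_urls)
}

urls_level_1 <- merge_urls(common_data_urls, level_data_urls, "1")
urls_level_2 <- merge_urls(common_data_urls, level_data_urls, "2")
```

For level "1" the main entry comes from the level-specific list, while for level "2" the common main entry is kept and regions is added.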

source_data_cols

existing columns within the raw data

level

target region level. This field is filled at initialisation using user inputs or defaults in $new()

data_name

string. The country name followed by the level. E.g. "Italy at level 1"

totals

Boolean. If TRUE, returns totalled data per region up to today's date. This field is filled at initialisation using user inputs or defaults in $new()

localise

Boolean. Should region names be localised. This field is filled at initialisation using user inputs or defaults in $new()

verbose

Boolean. Display information at various stages. This field is filled at initialisation using user inputs or defaults in $new()

steps

Boolean. Keep data from each processing step. This field is filled at initialisation using user inputs or defaults in $new()

target_regions

A character vector of regions to filter for. Used by the filter method.

process_fns

array, additional, user supplied functions to process the data.

filter_level

Character The level of the data to filter at. Defaults to the target level.

Methods

Public methods


Method set_region_codes()

Placeholder for a custom, country-specific function to load region codes.

Usage
DataClass$set_region_codes()

Method new()

Initialize function used by all DataClass objects. Set up the DataClass class with attributes set to input parameters. Should only be called by a DataClass class object.

Usage
DataClass$new(
  level = "1",
  filter_level,
  regions,
  totals = FALSE,
  localise = TRUE,
  verbose = TRUE,
  steps = FALSE,
  get = FALSE,
  process_fns
)
Arguments
level

A character string indicating the target administrative level of the data with the default being "1". Currently supported options are level 1 ("1") and level 2 ("2").

filter_level

A character string indicating the level to filter at. Defaults to the level of the data if not specified and if not otherwise defined in the class. Use get_available_datasets() for supported options by dataset.

regions

A character vector of target regions to be assigned to the target_regions field if present.

totals

Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region.

localise

Logical, defaults to TRUE. Should region names be localised.

verbose

Logical, defaults to TRUE. Should information be displayed at various processing stages.

steps

Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list.

get

Logical, defaults to FALSE. Should the class get method be called (this will download, clean, and process data at initialisation).

process_fns

Array, additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults). To keep this default when supplying your own processing functions, remember to add it to your list as well. If you have created a processing function that others could benefit from, please submit a pull request to our GitHub repository and we will consider adding it to the package.
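A sketch of what user-supplied processing functions might look like, assuming each one takes a data frame of clean data and returns a data frame. The cases_new column, both example functions, and the apply_process_fns() helper are hypothetical; the helper only imitates calling a list of functions in turn and is not this package's internal code.

```r
# Hypothetical processing function: caps daily counts at an upper bound.
# Assumes a data frame with a numeric `cases_new` column.
cap_daily_cases <- function(data, cap = 1000) {
  data$cases_new <- pmin(data$cases_new, cap)
  data
}

# Hypothetical processing function: flags rows with missing counts.
flag_missing_cases <- function(data) {
  data$cases_missing <- is.na(data$cases_new)
  data
}

# Hypothetical helper: apply a list of functions one after another.
apply_process_fns <- function(data, process_fns) {
  for (fn in process_fns) {
    data <- fn(data)
  }
  data
}

clean <- data.frame(region = c("A", "A", "B"), cases_new = c(5, 2500, NA))
processed <- apply_process_fns(clean, c(cap_daily_cases, flag_missing_cases))
```

Keeping every function to the same data-in, data-out shape means they can be combined with c() and passed as a single argument.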


Method download()

Downloads raw data from data_urls and stores a named list of the data_url names and the corresponding raw data tables in data$raw.

Usage
DataClass$download()

Method download_JSON()

Downloads raw data from data_urls and stores a named list of the data_url names and the corresponding raw data tables in data$raw. Designed as a drop-in replacement for download so it can be used in sub-classes.

Usage
DataClass$download_JSON()

Method clean()

Cleans raw data (corrects format, converts column types, etc.). Works on raw data and so should be called after download(). Calls the class-specific cleaning method (clean_common) followed by the level-specific cleaning methods (clean_level_[1/2]). Cleaned data is stored in data$clean.

Usage
DataClass$clean()

Method clean_common()

Cleaning methods that are common across a class. By default this method is empty; any code that is required should be defined in a clean_common method specific to the child class.

Usage
DataClass$clean_common()

Method available_regions()

Show regions that are available to be used for filtering operations. Can only be called once clean() has been called. Filtering level is determined by checking the filter_level field.

Usage
DataClass$available_regions(level)
Arguments
level

A character string indicating the level to filter at. Defaults to using the filter_level field if not specified


Method filter()

Filter cleaned data for a specific region. To be called after clean().

Usage
DataClass$filter(regions, level)
Arguments
regions

A character vector of target regions. Overrides the current class setting for target_regions.

level

Character The level of the data to filter at. Defaults to the lowest level in the data.


Method process()

Processes data by adding and calculating absent columns. Called on clean data (after clean()). Some countries may have data as new events (e.g. number of new cases for that day) whilst others have a running total up to that date. Processing calculates whichever of these is missing from what the data comes with, via the functions region_dispatch() and process_internal().

Usage
DataClass$process(process_fns)
Arguments
process_fns

Array, additional functions to process the data. Users can supply their own functions here, which act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults).
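The conversion between new events and running totals described for process() can be sketched in base R as follows. The helper names are hypothetical and this is not the code used by region_dispatch() or process_internal(); it only shows the arithmetic involved.

```r
# New events per day -> running total up to each date.
new_to_total <- function(new) cumsum(new)

# Running total -> new events per day; the first value is kept as is.
total_to_new <- function(total) c(total[1], diff(total))

cases_new <- c(3, 0, 5, 2)
cases_total <- new_to_total(cases_new)
recovered_new <- total_to_new(cases_total)
```

Here cases_total is 3, 3, 8, 10, and converting it back recovers the original daily counts.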


Method get()

Get data related to the data class. This runs each distinct step in the workflow in order. Internally calls the download(), clean(), filter() and process() methods.

Usage
DataClass$get()

Method return()

Return data. Designed to be called after process(), this uses the steps argument to return either a list of all the data preserved at each step or just the processed data. For most datasets a custom method should not be needed.

Usage
DataClass$return()
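A small sketch of the behaviour described for return(), assuming the data is held as a list with raw, clean and processed entries as in the data field. The return_data() helper and the example values are hypothetical and only imitate the choice made by the steps setting.

```r
# Hypothetical helper: return every step, or only the processed data.
return_data <- function(data, steps = FALSE) {
  if (steps) data else data$processed
}

# Stand-in for the contents of the data field (hypothetical values).
data <- list(
  raw = list(main = data.frame(region = "A", cases = "4")),
  clean = data.frame(region = "A", cases_new = 4),
  processed = data.frame(region = "A", cases_new = 4, cases_total = 4)
)

only_processed <- return_data(data)
all_steps <- return_data(data, steps = TRUE)
```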

Method summary()

Create a table of summary information for the data set being processed.

Usage
DataClass$summary()
Returns

Returns a single row summary tibble containing the origin of the data source, class, level 1 and 2 region names, the type of data, the urls of the raw data and the columns present in the raw data.


Method test()

Run tests on a country class instance. Calling test() on a class instance runs tests with the settings in use. For example, if you set level = "1" and localise = FALSE the tests will be run on level 1 data which is not localised. Rather than downloading data for a test, users can provide a path to a snapshot file of data to test instead. The snapshots contain the first 1000 rows of data. Tests are run on a clone of the class. This method calls generic tests for all country class objects. It also calls country-specific tests, which can be defined in an individual country class method called specific_tests(). For more details see the 'testing' vignette: vignette("testing").

Usage
DataClass$test(
  download = FALSE,
  snapshot_dir = paste0(tempdir(), "/snapshots"),
  all = FALSE,
  ...
)
Arguments
download

logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.

snapshot_dir

character_array the name of a directory to save the downloaded data or read from. If not defined a directory called 'snapshots' will be created in the temp directory. Snapshots are saved as rds files with the class name and level: e.g. Italy_level_1.rds.

all

logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.

...

Additional parameters to pass to specific_tests
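To illustrate the snapshot naming and size described above, the following base R sketch saves the first 1000 rows of a data frame to an rds file named after a class and level, then reads it back. The snapshot_path() helper and the example data are hypothetical, and the structure of the snapshots written by test() may differ.

```r
# Hypothetical helper: build the snapshot path from a class name and level,
# creating the snapshot directory if it does not exist yet.
snapshot_path <- function(class_name, level,
                          snapshot_dir = file.path(tempdir(), "snapshots")) {
  if (!dir.exists(snapshot_dir)) dir.create(snapshot_dir, recursive = TRUE)
  file.path(snapshot_dir, paste0(class_name, "_level_", level, ".rds"))
}

# Stand-in for downloaded raw data (hypothetical values).
raw <- data.frame(region = rep(c("A", "B"), 750), cases_new = seq_len(1500))

path <- snapshot_path("Italy", "1")
saveRDS(head(raw, 1000), path)  # keep only the first 1000 rows
snapshot <- readRDS(path)
```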


Method clone()

The objects of this class are cloneable with this method.

Usage
DataClass$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

Data interface functions: CountryDataClass, get_available_datasets(), get_national_data(), get_regional_data(), initialise_dataclass()


covidregionaldata documentation built on Feb. 7, 2022, 9:07 a.m.