Introduction to the raustats package

knitr::opts_chunk$set(collapse=TRUE, comment="#>")


The raustats package allows researchers to quickly search and download selected Australian Bureau of Statistics (ABS) and Reserve Bank of Australia (RBA) data in a programmatic and reproducible fashion. This facilitates seamless integration of ABS and/or RBA data into analysis projects, and enables automatic update to the latest available data.

Australian Bureau of Statistics

The Australian Bureau of Statistics (ABS) is Australia’s national statistical agency, providing trusted official statistics on a wide range of economic, social, population and environmental matters of importance to Australia. Key ABS statistical collections include:

Reserve Bank of Australia

The Reserve Bank of Australia (RBA) is Australia's central bank. In addition to its legislative responsibilities, it collects and publishes statistics on money, credit, the Australian banking systems and other relevant economic metrics. Key RBA statistics include:

ABS & RBA data availability

The ABS and RBA currently make most, or all, of their statistics primarily available through Excel- and/or CSV-format spreadsheets. The former typically require some additional re-formatting to produce well-formatted (tidy) data.

The functions in this package help facilitate access to ABS and RBA statistics through R, returning data in long-format (tidy-like) tables.

The ABS is also developing a data access API: ABS.Stat, which is currently a Beta release and provides access to a subset of all ABS statistics. This package include functions to search and download data via the ABS.Stat API. Please note that as the API is still in development, changes in the API back-end may break the functions in this package.

Main features of the raustats package:

Quick-start guide

This section provides a quick-start guide on how to download ABS and RBA statistics.

First, load the library:


Downloading ABS Catalogue Statistics

The abs_cat_stats function downloads ABS statistics by ABS Catalogue number. For example, the latest Consumer Price Index (CPI) data set (ABS Catalogue no. 6401.0) can be downloaded with the following call:

cpi_all <- abs_cat_stats("6401.0")

To download the latest CPI data reported in Table 1 only (ABS groups Tables 1 and 2 together), simply supply a regular expression to the tables argument:

cpi <- abs_cat_stats("6401.0", tables="Table.+1\\D")

Alternatively, tables also accepts a regular expression that matches the table name as specified on the ABS data access page, as follows:

cpi <- abs_cat_stats("6401.0", tables="CPI: All Groups, Index Numbers and Percentage Change")

Statistics from previous releases can be accessed using the releases argument. A more detailed explanation of the abs_cat_stats function and further examples are provided below.

Downloading ABS data via ABS.Stat

The abs_stats function is used to download ABS statistics via the ABS.Stat API. For example, to download the latest CPI data series for 'All Groups' changes in prices across Australia and each of the eight capital cities, simply call:

cpi_api <- abs_stats("CPI", filter=list(MEASURE=1, REGION=c(1:8,50),
                                        INDEX=10001, TSEST=10, FREQUENCY="Q"))

The package includes additional functions to search for datasets and data series available on ABS.Stat. These are documented further below.

Downloading RBA data

The rba_stats function downloads statistics available on the RBA website. For example, the latest statistics covering the RBA's assets and liabilities (RBA Statistical Table A1) can be downloaded with the following call:

rba_bs <- rba_stats("A1")

A more detailed explanation of rba_stats and other RBA data access functions is provided below.

ABS statistics access functions

ABS Catalogue statistics functions {#abs-cat-stats-functions}

The ABS Catalogue statistics functions are split into core functions:

and several helper functions:

The helper functions are called by the core functions and should generally not need to be called directly by users---though there are some cases where these functions may be useful.

Finding available ABS Catalogue statistics

The ABS does not provide a consolidated searchable list of all current statistical collections available through the ABS Catalogue. A text search facility is available on the ABS website:, but this package does not currently provide any functionality to access this facility. Instead, the abs_cat_cachelist data set contained in this package lists the more common ABS Catalogue statistics.

Accessing ABS Catalogue statistics with abs_cat_stats

As shown in the Quick Start section, the abs_cat_stats function provides easy access to ABS statistics by ABS Catalogue Number. The following examples demonstrate typical uses of the function and the various function arguments.

The simplest use of the abs_cat_stats function is to download all tables available in a specified ABS Catalogue series. The Quick Start illustrated how to download all CPI tables. The following example downloads the latest quarterly national accounts statistics (ABS Catalogue no. 5206.0):

ana_q <- abs_cat_stats(cat_no = "5206.0")

The function returns a long-format (tidy) table with the following columns:

## Latest quarterly national accounts

The tables argument allows users to select the set of catalogue tables to be downloaded by specifying a regular expression to pattern match the ABS table names, as specified on the ABS web page for the specified Catalogue number---by default (tables = "all") the function automatically downloads all available tables from the specified catalogue number. The following sample code downloads Tables 1 and 2 from Catalogue no. 5206.0.

ana_q <- abs_cat_stats(cat_no = "5206.0", tables=c("^Table 1\\D", "^Table 2\\D"))

The same result may be achieved by specifying one or more regular expressions matching one or more table names. For example:

ana_q <- abs_cat_stats(cat_no = "5206.0",
                       tables=c(".*Key National Accounts Aggregates",
                                ".*Expenditure on Gross Domestic Product (GDP), Chain volume measures"))

The releases argument enables users to download data from a specified release. By default, the function downloads the latest available data (i.e. releases="Latest"). The format is a date object or character string specifying the month and year of release. For example, the following sample code downloads Table 1 from the December 2017 release of the quarterly national accounts.

ana_2017Q4 <- abs_cat_stats(cat_no="5206.0", tables="Table 1", releases="Dec 2017")
## or
ana_2017Q4 <- abs_cat_stats(cat_no="5206.0", tables="Table 1", releases=as.Date("2017-12-01"))

The releases argument accepts multiple elements, as per the following example, which downloads Table 1 from each of the December 2016 and 2017 quarter national accounts:

ana_Q4 <- abs_cat_stats(cat_no="5206.0", tables="Table 1", releases=c("Dec 2017","Dec 2016"))

The abs_cat_stats function is designed to download both ABS time series spreadsheets (types="tss") and cross-section spreadsheets (types="css")---'Data Cubes' in ABS terminology. ABS time series spreadsheets generally have a standard format, with a single column for each series, several header rows containing series metadata and a single row with the unique series identifier. ABS cross-section spreadsheet formats vary, depending on the number of dimensions (categories) available in the data set. These tables typically have multiple uniquely-identifying header rows.

Presently, the abs_cat_stats only includes functionality to download and process time series spreadsheets---functionality to handle ABS Data Cubes (cross-section spreadsheets, types="css") is planned to be added in future versions.

In the meantime, it is possible to download data cubes using a sequence of abs_cat_tables--abs_cat_download--abs_cat_unzip and piping the result into a read_excel function call. The following example downloads ABS labour force table: LM1 - Labour force status by Age, Greater Capital City and Rest of State (ASGS), Marital status and Sex:

lf_age <- abs_cat_tables("6291.0.55.001", include_urls=TRUE) %>%
  filter(grepl("LM1.+", item_name)) %>%
  .$path_zip %>%
  abs_cat_download %>%
  abs_cat_unzip %>%
  read_excel(sheet="Data 1", skip=3)

Finding available ABS Catalogue tables with abs_cat_tables

The abs_cat_tables function returns a list of all tables for one or more specified ABS Catalogue numbers. It can be used to identify the relevant table(s) to download. The following two examples return all available tables for the latest quarterly national accounts (5206.0) and CPI (6401.0), respectively.

ana_tables <- abs_cat_tables(cat_no="5206.0")
## CPI
cpi_tables <- abs_cat_tables(cat_no="6401.0")

The abs_cat_tables also has three additional arguments: releases, types and include_urls. The releases argument returns the list of downloadable tables from the specified release---by default releases="Latest". Lists of available tables from earlier releases can be obtained by specifying the month and year of release, e.g.releases = "Jun 2017". The types argument enables users to specify which file types to include. Options are 'tss' -- ABS Time Series Spreadsheets, 'css' -- ABS Data Cubes, and 'pub' -- ABS Publications. The default is types = c('tss', 'css'). The include_urls argument specifies whether or not to include the URLs of available data tables in the returned results. The default is include_urls=FALSE.

The following example, returns all quarterly national accounts tables in the September and December 2017 quarter releases, and includes the table URLs.

ana_tables <- abs_cat_tables(cat_no="5206.0", releases=c("Sep 2017", "Dec 2017"),

And the following example illustrates use of the abs_cat_tables to return all available downloadable Data Cubes for a non-time series collection---the Australian Statistical Geography Standard (ASGS) main structure classification and digital boundaries (Catalogue no. 1270.0.55.001).

asgs_files <- abs_cat_tables(cat_no="1270.0.55.001", types="css", include_urls=TRUE)

Listing all available Catalogue releases with abs_cat_releases

The abs_cat_releases function returns a table of all available past releases for a specified ABS Catalogue number. If include_urls=TRUE (default is FALSE), the function includes the full URL for each of the available releases. The default is include_urls=FALSE. The following examples return all available releases for the quarterly national accounts (5206.0) and CPI (6401.0), respectively.

ana_releases <- abs_cat_releases(cat_no="5206.0")
## CPI
cpi_releases <- abs_cat_releases(cat_no="6401.0", include_urls=TRUE)

Other ABS Catalogue helper functions

As already noted, there are several ABS Catalogue helper functions that are called by abs_cat_stats and abs_cat_tables---that download and parse the ABS Catalogue table files. The main ones are:

The following examples illustrate the use of these functions.

The abs_cat_download function downloads and saves ABS Catalogue tables from a supplied URL. It is called inside the abs_cat_stats and can be used directly to download one or more ABS Catalogue table files. It is most usefully used in conjunction with the abs_cat_tables function, as follows:

tables <- abs_cat_tables("5206.0", releases="Latest", include_urls=TRUE)
downloaded_tables <- abs_cat_download(tables$path_xls, exdir=tempdir())

The abs_cat_unzip function extracts Excel files from compressed ABS zip archives (see example below). It uses the utils::unzip function (using some standard file locations). There are two arguments: files and exdir which have a similar meaning to the utils::unzip equivalent arguments. By default exdir = tempdir().

extracted_files <- abs_cat_unzip(downloaded_tables)

The abs_read_tss function extracts data from standard-formatted ABS Catalogue time series spreadsheets and returns it as a long-format (tidy) data frame. The next example shows use of the function to read Table 1 from the national accounts (Catalogue 5206.0).

ana_q <- abs_read_tss(extracted_files)

ABS.Stat statistics access functions {#abs-api-stats-functions}

The raustats package also includes a range of functions to list, search and download data sets and statistics available through ABS.Stat API. The following subsections outline the key functions.

Finding available data with abs_datasets

The abs_datasets function returns a list of all datasets available through ABS.Stat. The function has two arguments: lang (default is English: lang="en") and include_notes (default: include_notes=FALSE). The following example shows the results with notes included.

datasets <- abs_datasets()

Cached list of available ABS.Stat datasets abs_cachelist

For performance, a cached list of datasets available through the ABS.Stat API is provided in the abs_cachelist data set included with raustats. abs_cachelist is the default source used in abs_search() and abs_stats() to find matching ABS datasets.

By default, abs_cachelist is in English. To search indicators in a different language, you can download an updated copy of abs_cachelist using abs_datasets() ans specifying a different language.

Checking dataset dimensions with abs_dimensions()

The abs_dimensions() functions lists the name of all available dimensions and the respective dimension type. Typical dimension types are: 'Dimension', 'TimeDimension' and 'Attribute'. 'Dimension' attributes are used in the filter argument of abs_stats function. The following example lists the data dimensions of the 'CPI' dataset.


A list of all available dimension codes and descriptions for a particular dataset can be viewed by selecting the relevant dataset from abs_cachelist or an updated cache list returned by abs_cache.


Search available data with abs_search()

The abs_search function essentially has two modes of operation:

i) 'dataset search' mode, and ii) 'indicator search' mode.

In dataset search mode, the function searches and returns datasets matching the specified regular expression. The following examples demonstrate use of the function to find ABS datasets relating to the CPI and labour force. The code_only argument specifies whether the function returns all information or just the matching dataset identifiers.

abs_search("CPI|consumer price index")

abs_search("CPI|consumer price index", code_only=TRUE)
abs_search("labour force")

abs_search("^labour force$")

In indicator search mode, the function searches through all dimensions of a specific dataset and returns a list of dimensions and dimension contents matching all the provided regular expressions. The following examples demonstrates use of the function to find indicators within the CPI data set.

abs_search("All groups", dataset="CPI")
abs_search(c("All groups CPI","Sydney"), dataset="CPI")

If code_only=TRUE, the indicator search function returns only codes for each matching dimension. This, in turn, can be used directly as input to the filter argument of the abs_stats function. The following two examples returns dimension codes for datasets matching "All groups CPI" (example 1) and matching both "All groups CPI" and "Sydney" (example 2).

abs_search("All groups CPI", dataset="CPI", code_only=TRUE)
abs_search(c("All groups CPI","Sydney"), dataset="CPI", code_only=TRUE)

Downloading data with abs_stats()

The abs_stats() function returns data from specified datasets available via the ABS.Stat API. The following section outlines typical use of the abs_stats() function, and also describes each of the core function arguments.

The following example downloads original All groups CPI index numbers for each of the eight Australian state and territory capital cities and also the average for all capital cities.

cpi <- abs_stats(dataset="CPI", filter=list(MEASURE=1, REGION=c(1:8,50),
                                            INDEX=10001, TSEST=10, FREQUENCY="Q"))

The filter conditions are:


The filter argument can also be set equal to "all", in which case the function will attempt to download all observations available for the specified dataset.

ABS.Stat query download constraints

If the data request is large it may breach the ABS.Stat API query length, record and/or session time constraints. ABS.Stat has a query string length limit (maximum of 1000 characters) on URL queries, a record return limit (1 million records) and session time limits (maximum 10-minute session time limit).^[See the ABS.Stat Web Services User Guide FAQ for further details.] Queries that breach these limits will need to be re-specified as multiple separate calls to obtain the required data.

For example, the following abs_stats function call, attempts to download all series available for the CPI dataset, but the specified query length (1191 characters) exceeds maximum request URL character limit.

cpi <- abs_stats(dataset="CPI", filter="all")

By default, abs_stats checks whether the query string length and the estimated number of records to be returned and will halt execution if the query breaches any of these conditions. Setting the enforce_api_limits = FALSE (default: TRUE) will ignore these checks and submit the query to the ABS.Stat API anyway---though this is not recommended.

cpi <- abs_stats(dataset="CPI", filter="all", enforce_api_limits=FALSE)

Setting the return_url = TRUE (default: FALSE) will return the RESTful URL query string, but does not submit the query to the ABS.Stat API, see the following example function call and output.

abs_stats(dataset="CPI", filter=list(MEASURE=1, REGION=c(1:8,50),
                                     INDEX=10001, TSEST=10, FREQUENCY="Q"),

The abs_search function can be used to specify the filter. For example, the following code block produces the same filter list, specified in the previous example, and can subsequently be supplied to the abs_stats filter argument.

filter_lst <- abs_search(c("Index numbers", "All groups",
                           "Sydney|Melbourne|Brisbane|Adelaide|Perth|Hobart|Darwin|Canberra|capital cities",
                           "Original", "Quarterly"),
                         dataset="CPI", code_only=TRUE)
cpi <- abs_stats("CPI", filter = filter_lst)

Splitting an ABS filter list into multiple short queries

The purrr package functions: map and cross can be used to split a long query into a list of several shorter length queries to submit to ABS.Stat. For example, the following code block first uses abs_search to create a filter list to download estimated resident population (ERP), by state/territory, age (year) and sex, for June 2017, and splits the filter into groups no longer than 26 elements in length (purrr::map and split), and combines them into complete list (purrr::cross). The final step imports the data by iterating over the filter list (ds_filter) using lapply and combining the results into a data frame using bind_rows.

## ERP dataset ID
abs_ds <- abs_search(pattern="quarterly.*population.*estimates") %>%
  select(id) %>% unlist;

## Create filter 
ds_filter <- abs_search(pattern=c("Estimated Resident Population", "Males|Females|Persons",
                                  "^\\d{1,3}$", "Jun-2017"),
                        dataset=abs_ds, code_only=TRUE) %>%
  map(~ .x %>% split(., ceiling(seq_along(.)/26))) %>%

## Download Jun 2017 ERP
erp_st_age_sex_2017 <-
                     start_date="2017-Q2", end_date="2017-Q2",
         ) %>%

Other ABS.Stat query arguments

Users can also limit the date range to return by specifying one or bothstart_date and end_date arguments. These accept either a numeric or character arguments---if numeric it must be a four-digit year. If a character string it can be either a monthly, quarterly, half-year or financial year as formatted as follows: month -- '2016-M01', quarter -- '2016-Q1', half-year -- '2016-B2', financial year -- '2016-17'. The following example returns all CPI observations between September 2015 and June 2018.

cpi <- abs_stats(dataset="CPI", filter=filter_lst,
                 start_date = "2015-Q3", end_date = "2018-Q2")

The other arguments dimensionAtObservation and detail provide refinements to the URL query. These need not be modified by the user---the defaults are: dimensionAtObservation='AllDimensions and detail='Full'.

RBA statistics access functions {#rba-stats-functions}

Finding available RBA data tables with rba_table_cache

The rba_table_cache function returns a dataset of all available RBA statistical tables. The function scans the RBA website and returns a list of all Statistical tables, Historical data tables and Discontinued data tables. (The rba_cachelist data set included in this package contains a pre-saved list of all available RBA statistical tables.)

The dataset has four columns:

The rba_table_cache function has no arguments. The following example shows use of the function and the returned output.

rba_cache <- rba_table_cache()

Search available data with rba_search()

The rba_search function scans the RBA table cache for statistical tables matching a regular expression supplied to pattern. The fields argument specifies the fields in the RBA table cache to search---by default the function searches the table_code and table_name fields only. The argument allows for case sensitive regular expression matching---the default is case insensitive matching ( The update_cache argument specifies whether the function uses the list of RBA tables supplied with the package rba_cachelist (update_cache=FALSE, the default) or to update list of tables from the RBA website (update_cache=TRUE).

rba_search(pattern = "Liabilities and Assets")

Accessing RBA statistical tables with rba_stats()

As previously outlined in the Quick Start section, the rba_stats() function provides easy access to RBA statistical tables. The rba_stats() function has three mutually-exclusive table selection arguments: table_no, pattern, and url, for selecting RBA statistical tables. Specifying table_no selects tables matching the specified RBA table number, e.g. A1, B1, B11.1. Specifying pattern selects all RBA tables matching the regular expression specified in pattern. The url argument can be used to specify one or more valid RBA statistical table URLs.

The function returns a long-format (tidy) table with the following columns:

The following examples demonstrate typical uses of the function and the various function arguments. The first example downloads RBA Statistical Table A1 -- Liabilities and Assets using table_no.

rba_a1 <- rba_stats(table_no = "A1")

The second example downloads the same tables using the pattern argument to download tables matching the regular expression: 'Liabilities and Assets.+A1'.

rba_a1 <- rba_stats(pattern = "Liabilities and Assets.+Summary")

And the third example downloads the same tables using the relevant URLs. The example presented below first returns a list of all RBA tables matching the supplied regular expression ('Liabilities and Assets.+A1') and then uses the returned URLs to download each data set.

a1_tables <- rba_search(pattern = "Liabilities and Assets.+Summary")
rba_a1 <- rba_stats(url = a1_tables$url)

Again, the update_cache argument specifies whether the function uses the list of RBA tables supplied with the package with the package rba_cachelist (the default) or to update list of tables from the RBA website. The rba_stats function also optionally accepts rba_search arguments: fields and, for informing pattern matching.

At present, the rba_stats function only handles Statistical tables, which have a consistently structured format comprising full metadata and observations. Functionality to handle Historical data tables and Discontinued data tables are yet to be added.

Other RBA helper functions

There are also two RBA helper functions that are called by rba_stats---that download and parse the RBA statistical tables. These additional functions should not ordinarily need to be called directly, but there may be situations in which users wish to access these functions directly. The functions are:

The following examples illustrate the use of these functions.

The rba_download function downloads and saves RBA tables from a supplied URL, and returns a character vector of the saved local filenames. It is called inside the rba_stats function and can be used to directly download one or more RBA statistical table files. It is most usefully used in conjunction with the rba_search function.

a1_tables <- rba_search(pattern = "Liabilities and Assets.+Summary")
downloaded_tables <- rba_file_download(a1_tables$url, exdir=tempdir())

The rba_read_tss function extracts data from standard-format RBA statistical tables and returns it as a long-format (tidy) data frame. It is also called by the rba_stats function. The following call will extract data from the previously downloaded tables.

a1_data <- rba_read_tss(downloaded_tables)

Using data returned by raustats

Statistics returned by the raustats package functions are generally in long (tidy) format data frames. This provides for easy integration with packages like ggplot2, tidyr and dplyr. The following example illustrates how data downloaded using the raustats can be easily transformed and plotted. This example uses data from ABS' Private New Capital Expenditure and Expected Expenditure (ABS Catalogue no. 5265.0).

First download selected tables from Catalogue no. 5265.0.

capex_q <-
                tables=c("Actual Expenditure by Type of Asset and Industry - Current Prices",
                         "Actual Expenditure, By Type of Industry - Chain Volume Measures",
                         "Actual and Expected Capital Expenditure by Industry.+:Current Prices"))

Then add a new variable denoting Australian state/territory.

## Add state/territory variable
capex_q <- capex_q %>%
  mutate(state = sub(sprintf(".*(%s).*",
                             paste(c("New South Wales","Victoria","Queensland","South Australia",
                                     "Western Australia","Tasmania","Northern Territory",
                                     "Australian Capital Territory","Total \\(State\\)"),
                     "\\1", data_item_description,

Finally, plot quarterly time series mining sector capital expenditure (at current prices) by Australian state and territory using ggplot.

## Filter mining capital expenditure
capex_q_min <- capex_q %>%
  filter(grepl("mining", data_item_description, %>%
  filter(grepl("actual", data_item_description, %>%
  filter(grepl("current price", data_item_description, %>%
  filter(grepl("Total \\(Type of Asset.+\\)", data_item_description,

ggplot(data=capex_q_min) +
  geom_line(aes(x=date, y=value/10^3, colour=state)) +
  scale_x_date(date_labels="%b\n%Y") +
  scale_y_continuous(limits=c(0, NA)) +
  labs(title="Australian mining sector capital expenditure, by state",
       y="Capital expenditure ($ billion)", x=NULL) +
  guides(colour = guide_legend(title=NULL)) + 
  theme(plot.title = element_text(hjust=0.5), = "horizontal",
        legend.position = "bottom",
        axis.text.x=element_text(angle=0, size=8))

Unresolved issues

There are a few behaviours of the ABS.Stat API that may help explain any unexpected results. As the ABS.Stat API is still in Beta release and subject to revision, some of these issues will be addressed in future versions of the raustats package, as ABS.Stat transitions towards a stable release version.

Searching in other languages

The ABS.Stat API (Beta version) includes scope to cater to multiple languages. However, at the time of writing, French appears to be the only other language included in the data sets. Moreover, the text for many records that are denoted as French appear to be in English. The ABS API calling functions included in this library allow users to specify their preferred language, but this functionality has been little tested to date.

Query, data and session constraints

As already noted, the ABS.Stat API has a query string length limit (1000 characters) and record return limit (1 million records). We have not tested how rigorously these limits are enforced. Accordingly, the abs_stats function includes the enforce_api_limits argument to catch these limits before the query is submitted. The argument also allows the user to override the limits and submit the query anyway. This argument may be deprecated in future versions.

The ABS.Stat API also has a 10-minute session time limit for users to download datasets via the SDMX service. The functions in raustats almost exclusively use the SDMX-JSON query interface, so it is not clear whether the session time limit applies. If time limits are an issue, the ABS advises users to submit multiple smaller requests to retrieve the required data.


Try the raustats package in your browser

Any scripts or data that you put into this service are public.

raustats documentation built on Jan. 11, 2020, 9:31 a.m.