knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
Download and tidy time series data from the Australian Bureau of Statistics.
This package is still in active development, please use with caution.
getabs will soon be merged with the package readabs and the name will change to readabs.
I'd welcome Github issues containing error reports or feature requests. Alternatively you can email me at mattcowgill at gmail.
You can install getabs from github with:
# if you don't have devtools installed, first run: # install.packages("devtools") devtools::install_github("mattcowgill/getabs")
Get all tables from the latest release of ABS 3101.0 (Australian Demographic Statistics) and create a dataframe called demog
containing all the data from this catalogue number.^[Note that running read_abs()
will save ABS time series spreadsheets to your disk; by default these go in a data/ABS subdirectory of your working directory. Modify this using the path argument of read_abs()
.]
library(getabs) demog <- read_abs("3101.0")
Now you can use your downloaded and tidied data to make a graph. You'll first need to filter the data frame so it only contains the series of interest. The series
column reflects the series names in the first row of ABS time series; these can be long and unwieldy.
library(tidyverse) demog %>% filter(series == "Estimated Resident Population (ERP) ; Australia ;") %>% ggplot(aes(x = date, y = value)) + geom_line() + theme_minimal() + labs(title = "There's more of us than there used to be", subtitle = "Estimated resident population of Australia over time (thousands)", caption = "Source: ABS 3101.0") + theme(axis.title = element_blank() )
Note that different time series can share the same series name, as reflected in the series
column of your data frame. For example, there are multiple series named "Estimated Resident Population ; Persons ;". Some of these series refer to the whole country; some refer to individual states. In this particular dataset, the ABS splits states and national totals into different tables, with identically-named columns.
You can filter by table_no
as well as series
to ensure you get the series you want. Here's one way you could filter the data to give you the distribution of the national population by age as at the latest data collection, and then graph it.^[Future versions of the package will have the table titles as another column in your data frame, which will make this filtering process easier and require less frequent reference to the spreadsheets or ABS website.]
age_distrib <- demog %>% filter(grepl("Estimated Resident Population ; Persons ;", series), # In this case we only want to retain rows where the series contains a digit str_detect(series, "\\d"), # We only want the latest date date == max(date), # We only want data from table 59, which refers to the whole country table_no == "3101059") %>% mutate(age = parse_number(series)) age_distrib %>% ggplot(aes(x = age, y = value)) + geom_col(col = "white") + theme_minimal() + scale_y_continuous(labels = scales::number) + labs(title = "People in their 30s rule Australia", subtitle = paste0("Distribution of Australian residents by age, ", lubridate::year(age_distrib$date[1])), caption = "Source: ABS 3101.0") + theme(axis.title = element_blank() )
Another way to filter your data is by using the unique ABS time series identifier. Every ABS time series has one; it's in the tenth row of any spreadsheet containing ABS time series and looks something like "A2158920J". The time series identifier is stored in the series_id
column of your data frame.
To graph time series "A2158920J", which is the estimated population of 0 year old males in Australia, you can just filter your data frame like this:
demog %>% filter(series_id == "A2158920J") %>% ggplot(aes(x = date, y = value)) + geom_line() + theme_minimal() + labs(title = "Hello little babies!", subtitle = "Estimated resident population of 0 year old males over time", caption = "Source: ABS 3101.0") + theme(axis.title = element_blank() )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.