knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" )
Warning: tidyABS is in early development. Please use for experimentation only.
The tidyABS
package converts ABS excel tables to tidy data frames. It uses information on cells' formatting (indenting, bolding, italics) and some rules-of-thumb to detect the structure of excel tables. It sometimes needs pointers from the user.
Note: tidyABS works with .xlsx files only.
The tidyABS package is not available on CRAN. It can be installed from github with the following script:
# install.packages("devtools") devtools::install_github("ianmoran11/tidyABS")
options(tibble.print_min = 3, tibble.print_max = 3) library(magick)
library(tidyABS) library(tidyverse)
Below is a short demonstration that tidies a table from the Australian Industry publication (Cat No. 8155.0).
tidy_aus_industry_df <- tidyABS_example("australian-industry.xlsx") %>% process_sheet(sheets = "Table_1") %>% assemble_table_components() tidy_aus_industry_df %>% str()
The tidyABS package contains several example files. Use the helper tidyABS_example()
function with no arguments to list these files:
tidyABS_example()
magick::image_read("australian-industry_table-1.png")
Above is the first sheet of an excel workbook in the Australian Industry publication. We can retrieve the path of this file using the tidyABS_example
function:
ai_path <- tidyABS_example("australian-industry.xlsx")
To process the sheet above, we pass the workbook file path to the process_sheet
function and identify the sheet we'd like to tidy.
ai_processed <- process_sheet(path = ai_path, sheets = "Table_1") ai_processed %>% str(1)
This produces a list of three data frames. They store the location and format information of row names (row_groups
), column names (col_groups
) and table data (tabledata
).
We can inspect the row and column names in ai_processed
using the inspect_table_components
function.
inspect_table_components(ai_processed)
We can use plot_table_components
to visualize how these groups are spatially layed out in the spreadsheet and see how tidyABS
will relate them to table values. Row names directly to the left of their data points should be labelled "W", and column names directly above should be labelled "N". (See the unpivotr package for more information.)
plot_table_components(ai_processed) + ylim(-30, 0)
Finally, we can assemble the components into a tidy data frame using assemble_table_components
.
assemble_table_components(ai_processed) %>% glimpse()
magick::image_read("environmental-economic-accounts_table-6-01.png")
Here's an example that requires some manual work, the Environmental-Economic Accounts. Let's retrieve the path of our example workbook and proces Table 6.1
:
eea_path <- tidyABS_example("environmental-economic-accounts.xlsx") eea_processed <- process_sheet(path = eea_path, sheets = "Table 6.1")
On visual inspection, we can see row_group_01
has been given a "W" orientation to the data, not "WNW".
plot_table_components(eea_processed)
Luckily, we can fix this with the change_direction
function.
eea_processed <- eea_processed %>% change_direction("row_group_01", "WNW")
Plotting the table confirms the direction has been corrected.
plot_table_components(eea_processed)
Finally, we can assemble the components into a tidy dataframe using assemble_table_components
.
assemble_table_components(eea_processed) %>% glimpse()
magick::image_read("consumer-price-index_data01.png")
Time series data require the user to manually identify the inner table cells. This is because some of the column names are numeric --- for example, collection month.
I recommend using the readABS
package for this. It was created for importing ABS time series data and does not require manual identifcation of table cells.
That said, here's how you would process this table with tidyABS
.
cpi_path <- tidyABS_example("consumer-price-index.xlsx")
We need to identify the inner table cells using the manual_value_references
argument.This argument takes a vector of addresses, identifying the inner corners of the table.
cpi_processed <- process_sheet( path = cpi_path, sheets = "Data1", manual_value_references = "B11:AB292" )
Here is the resulting data frame.
assemble_table_components(cpi_processed) %>% glimpse()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.