knitr::opts_chunk$set(echo = TRUE)
# Load all packages library(dplyr) library(stringr) library(locatr)
The locate functions work much like unpivotr::behead.
The key difference is that, rather than progressively removing headers, locate functions annotate the tidyxl data frame with .direction, .header_group and .value columns, leaving reshaping to a final function call.
locate
Here's a minimal example involving a table with two row headers and two column headers.
knitr::include_graphics("https://unpivotr.s3.amazonaws.com/pivot-annotations.png")
The first step is to locate the data cells with the locate_data
function. Calling locate_data
and providing an expression that filters for data cells sends these cells to an attribute named data_cells
.
locatr_example("worked-examples.xlsx") %>% xlsx_cells_fmt(sheets = "pivot-annotations") %>% locate_data(data_type == "numeric") %>% attr("data_cells")
plot_cells
produces a plot that indicates which cells are now labelled as data.
locatr_example("worked-examples.xlsx") %>% xlsx_cells_fmt(sheets = "pivot-annotations") %>% locate_data(data_type == "numeric") %>% plot_cells()
Once the data cells are identified, we can add header information to the tidyxl data frame (including .direction, .header_group and .value columns) using the locate
function. This function requires direction and variable names. Again, plot_cells
can be called to check that data cells have been identified correctly.
Once all header have directions and names, migrate
reshapes the tidyxl data frame into a tidy structure.
The gif below illustrate how direction informations is progressively added to the data frame.
knitr::include_graphics("https://unpivotr.s3.amazonaws.com/unpivotr-locate-demo-02.gif")
And below is the code used in the gif.
locatr::locatr_example("worked-examples.xlsx") %>% xlsx_cells_fmt(sheets = "pivot-annotations") %>% locate_data(data_type == "numeric") %>% locate(direction = "WNW", name = subject_type) %>% locate(direction = "W", name = subject) %>% locate(direction = "NNW", name = gender) %>% locate(direction = "N", name = name) %>% migrate()
locate_if
Sometimes not all headers in the same column or row belong to the same group. For example, in the table below, the row headers in column B represent a mix of subject type and subject name.
knitr::include_graphics("https://unpivotr.s3.amazonaws.com/pivot-hierarchy.png")
To deal with this we create a variable that represents the indenting of cells, and then use locate_if
to selectively associate cells with directions and header groups.
locatr_example("worked-examples.xlsx") %>% xlsx_cells_fmt(sheets = "pivot-hierarchy") %>% append_fmt(fmt_alignment_indent) %>% locate_data(data_type == "numeric") %>% locate_if(fmt_alignment_indent == 0, direction = "WNW", name = subject_type) %>% locate_if(fmt_alignment_indent == 1, direction = "W", name = subject) %>% locate(direction = "N", name = student) %>% migrate()
locate_groups
We can deal with multiple headers differentiated by formatting more concisely using locate_groups
. The .grouping
argument allows us to indicate which formats differentiate headers. In this case, hierarchy is indicated by indenting, which can be accessed with the fmt_alignment_indent
function.
The .hook_if
argument receives an expression with hook
that indicates which header_groups are "WNW" rather than "N". The .hook_if_rev
argument will switch directions from "N" to "WSW". Importantly the hook
expression is passed into summarise
so it needs to reduce columns to a single boolean value. This is the reason for using any
in the example below.
locatr_example("worked-examples.xlsx") %>% xlsx_cells_fmt(sheets = "pivot-hierarchy") %>% append_fmt(fmt_alignment_indent) %>% locate_data(data_type == "numeric") %>% locate_groups(direction = "W", .groupings = groupings(fmt_alignment_indent), .hook_if = hook_if(any(fmt_alignment_indent == 0))) %>% locate(direction = "N", name = student) %>% plot_cells()
To browse different aspects of formatting on which to separate headers, type fmt_
and tab
Here's a more complicate table.
knitr::include_graphics(locatr_example(path = "newconst.png"))
We can tidy this table by:
header_fill
argument to deal with merged cells.annotated_df <- locatr_example("newresconst.xlsx") %>% xlsx_cells_fmt(sheets = "Table 1 - Permits") %>% append_fmt(fmt_font_bold) %>% filter_fmt(row < min(row[str_detect(character,"RSE")],na.rm = TRUE)) %>% locate_data(data_type == "numeric" & col > 1) %>% locate_groups(direction = "W", .groupings = groupings(is.na(numeric)), .hook_if = hook_if(any(data_type == "numeric"))) %>% locate_groups(direction = "N", header_fill = "style") annotated_df %>% plot_cells() annotated_df %>% migrate()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.