knitr::opts_chunk$set(echo = TRUE)
# Load all packages 
library(dplyr)
library(stringr)
library(locatr)

The locate functions work much like unpivotr::behead. The key difference is that, rather than progressively removing headers, locate functions annotate the tidyxl data frame with .direction, .header_group and .value columns, leaving reshaping to a final function call.

Minimal example : locate

Here's a minimal example involving a table with two row headers and two column headers.

knitr::include_graphics("https://unpivotr.s3.amazonaws.com/pivot-annotations.png")

The first step is to locate the data cells with the locate_data function. Calling locate_data and providing an expression that filters for data cells sends these cells to an attribute named data_cells.

locatr_example("worked-examples.xlsx") %>% 
  xlsx_cells_fmt(sheets = "pivot-annotations") %>%
  locate_data(data_type == "numeric") %>% 
  attr("data_cells")

plot_cells produces a plot that indicates which cells are now labelled as data.

locatr_example("worked-examples.xlsx") %>% 
  xlsx_cells_fmt(sheets = "pivot-annotations") %>%
  locate_data(data_type == "numeric") %>% 
  plot_cells()

Once the data cells are identified, we can add header information to the tidyxl data frame (including .direction, .header_group and .value columns) using the locate function. This function requires direction and variable names. Again, plot_cells can be called to check that data cells have been identified correctly.

Once all header have directions and names, migrate reshapes the tidyxl data frame into a tidy structure.

The gif below illustrate how direction informations is progressively added to the data frame.

knitr::include_graphics("https://unpivotr.s3.amazonaws.com/unpivotr-locate-demo-02.gif")

And below is the code used in the gif.

locatr::locatr_example("worked-examples.xlsx") %>%
  xlsx_cells_fmt(sheets = "pivot-annotations") %>%
  locate_data(data_type == "numeric") %>% 
  locate(direction = "WNW", name = subject_type) %>% 
  locate(direction = "W", name = subject) %>% 
  locate(direction = "NNW", name = gender) %>% 
  locate(direction = "N", name = name) %>% 
  migrate()

Conditional headers : locate_if

Sometimes not all headers in the same column or row belong to the same group. For example, in the table below, the row headers in column B represent a mix of subject type and subject name.

knitr::include_graphics("https://unpivotr.s3.amazonaws.com/pivot-hierarchy.png")

To deal with this we create a variable that represents the indenting of cells, and then use locate_if to selectively associate cells with directions and header groups.

locatr_example("worked-examples.xlsx") %>% 
  xlsx_cells_fmt(sheets = "pivot-hierarchy") %>%
  append_fmt(fmt_alignment_indent) %>%
  locate_data(data_type == "numeric") %>%
  locate_if(fmt_alignment_indent == 0, direction = "WNW", name = subject_type) %>% 
  locate_if(fmt_alignment_indent == 1, direction = "W", name = subject) %>% 
  locate(direction = "N", name = student) %>% 
  migrate()

A more concise syntax : locate_groups

We can deal with multiple headers differentiated by formatting more concisely using locate_groups. The .grouping argument allows us to indicate which formats differentiate headers. In this case, hierarchy is indicated by indenting, which can be accessed with the fmt_alignment_indent function. The .hook_if argument receives an expression with hook that indicates which header_groups are "WNW" rather than "N". The .hook_if_rev argument will switch directions from "N" to "WSW". Importantly the hook expression is passed into summarise so it needs to reduce columns to a single boolean value. This is the reason for using any in the example below.

locatr_example("worked-examples.xlsx") %>%
  xlsx_cells_fmt(sheets = "pivot-hierarchy") %>%
  append_fmt(fmt_alignment_indent) %>% 
  locate_data(data_type == "numeric") %>%
  locate_groups(direction = "W",
                .groupings = groupings(fmt_alignment_indent),
                .hook_if =     hook_if(any(fmt_alignment_indent == 0))) %>%
  locate(direction = "N", name = student) %>% 
  plot_cells()

To browse different aspects of formatting on which to separate headers, type fmt_ and tab

A more complicated example: Tidying new residential construction data from the US Census Bureau

Here's a more complicate table.

knitr::include_graphics(locatr_example(path = "newconst.png"))

We can tidy this table by:

annotated_df <- 
  locatr_example("newresconst.xlsx") %>% 
  xlsx_cells_fmt(sheets = "Table 1 - Permits") %>%
  append_fmt(fmt_font_bold) %>% 
  filter_fmt(row < min(row[str_detect(character,"RSE")],na.rm = TRUE)) %>% 
  locate_data(data_type == "numeric" & col > 1) %>%
  locate_groups(direction = "W", 
                .groupings = groupings(is.na(numeric)), 
                .hook_if = hook_if(any(data_type == "numeric"))) %>% 
  locate_groups(direction = "N", header_fill = "style")  

annotated_df %>% plot_cells()

annotated_df %>% migrate()


ianmoran11/locatr documentation built on June 14, 2020, 3:32 a.m.