knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)

unpivotr

Travis-CI Build Status AppVeyor Build Status Cran Status Cran Downloads codecov R-CMD-check

unpivotr deals with non-tabular data, especially from spreadsheets. Use unpivotr when your source data has any of these 'features':

If that list makes your blood boil, you'll enjoy the function names.

More positive, corrective functions:

Make cells tidy

Unpivotr uses data where each cells is represented by one row in a dataframe. Like this.

Gif of tidyxl converting cells into a tidy representation of one row per cell

What can you do with tidy cells? The best places to start are:

Otherwise the basic idea is:

  1. Read the data with a specialist tool.
  2. For spreadsheets, use tidyxl.
  3. For plain text files, you might soon be able to use readr, but for now you'll have to install a pull-request on that package with devtools::install_github("tidyverse/readr#760").
  4. For tables in html pages, use unpivotr::tidy_html()
  5. For data frames, use unpivotr::as_cells() -- this should be a last resort, because by the time the data is in a conventional data frame, it is often too late -- formatting has been lost, and most data types have been coerced to strings.
  6. Either behead() straight away, else dplyr::filter() separately for the header cells and the data cells, and then recombine with enhead().
  7. spatter() so that each column has one data type.
library(unpivotr)
library(tidyverse)
x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
  # rows and two columns.

y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y

rectify(y) # useful for reviewing the melted form as though in a spreadsheet

y %>%
  behead("up-left", "sex") %>%               # Strip headers
  behead("up", "life-satisfication") %>%  # one
  behead("left-up", "qualification") %>%     # by
  behead("left", "age-band") %>%            # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))

Note the compass directions in the code above, which hint to behead() where to find the header cell for each data cell.

Installation

# install.packages("devtools") # If you don't already have devtools
devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)

The version 0.4.0 release had somee breaking changes. See NEWS.md for details. The previous version can be installed as follow:

devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")

Similar projects

unpivotr is inspired by Databaker, a collaboration between the United Kingdom Office of National Statistics and The Sensible Code Company. unpivotr.

jailbreaker attempts to extract non-tabular data from spreadsheets into tabular structures automatically via some clever algorithms. unpivotr differs by being less magic, and equipping you to express what you want to do.



nacnudus/unpivotr documentation built on Feb. 6, 2023, 4:55 a.m.