knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This is article serves mostly as internal documentation of how various representations of cell ranges relate to each other in googlesheets4.

library(tidyverse)
library(googlesheets4)

gs4_deauth()

User-specified range

Several googlesheets4 functions allow the user to specify which cells to read or write. This range specification comes via the sheet, range, and (possibly) skip arguments. Examples of functions with this interface: range_read(), range_read_cells(), and range_write().

(You might think n_max should also be mentioned here, but for technical reasons, in general, n_max can only be enforced after we've read data. So it's not part of the range ingest problem.)

Loose ends and hangnails:

Range spec

range_spec is an internal S3 class that is typically used inside any function that accepts the (sheet, range, skip) trio.

We need some sort of intermediate storage, in order to translate between the various ways the user can express their range and the requirements of different Sheets API endpoints (which differ more than you'd expect!). We generally require metadata about the associated Sheet in order to form a range_spec, because we potentially need to lookup the (work)sheet by position or determine whether a name refers to a named range or a (work)sheet.

The internal generic as_range_spec() dispatches on primary argument x, which maps to range. Its job is to transform user-supplied (sheet, range, skip) into:

Here's how various user-specified ranges are stored as a range_spec.

# switched from read_csv because https://github.com/tidyverse/readr/issues/1237
df <- read.csv(text = '
    "sheet",                          "range", "skip"
     "NULL",                           "NULL",    "0"
   "Africa",                           "NULL",    "0"
        "4",                           "NULL",    "0"
     "NULL",                           "NULL",    "2"
   "Africa",                           "NULL",    "3"
        "4",                           "NULL",    "4"
"<ignored>",                     "Asia!A1:B2",    "0"
"<ignored>",                         "canada",    "0"
"<ignored>",                         "Europe",    "3"
     "NULL",                          "A1:B2",    "0"
  "Oceania",                          "A1:B2",    "0"
        "2",                          "A1:B2",    "0"
     "NULL", "cell_limits(c(2, 3), c(NA, 5))",   "0"
 "Americas", "cell_limits(c(2, 3), c(NA, 5))",   "0"
', strip.white = TRUE)

escape_brackets <- function(x) {
  #str_replace_all(x, c("<" = "\\\\<", ">" = "\\\\>"))
  str_replace_all(x, c("<" = "&lt;", ">" = "&gt;"))
}

formatize <- function(x) paste(escape_brackets(format(x)), collapse = "<br>")
ss <- gs4_example("gapminder")
ss_meta <- gs4_get(ss)

df <- df %>% 
  mutate(
    sheet_orig = escape_brackets(sheet),
    range_orig = range,
    sheet = map(sheet, parse_guess),
    range = map(range, parse_guess)
  )
df$sheet[df$sheet == "NULL"] <- list(NULL)
df$range[df$range == "NULL"] <- list(NULL)
df$range <- modify_at(
  df$range,
  str_which(df$range_orig, "^cell_limits"),
  ~ eval(parse(text = .x))
)

df <- df %>%
  rename(x = range) %>% # make the generic happy
  mutate(range_spec = pmap(.,
    googlesheets4:::as_range_spec,
    sheets_df = ss_meta$sheets,
    nr_df     = ss_meta$named_ranges
  )) %>% 
  rename(range = x)
df %>% 
  select(-sheet, -range) %>% 
  rename(range = range_orig, sheet = sheet_orig) %>% 
  mutate(range_spec = map_chr(range_spec, formatize)) %>% 
  select(sheet, range, skip, everything()) %>% 
  knitr::kable()

Cell rectangles

When a range_spec is first formed, it may hold specifics for a cell rectangle, although it does not have to. This info appears in one of these fields:

A cell_limits object is, overall, a much more sane way to hold this information, because explicit NAs can be used to convey unboundedness. Some types of unboundedness simply can't be conveyed in A1 notation, even though those partially open rectangles are as legitimate as those that can be recorded in A1 notation.

In any case, between user expectations and API idiosyncrasies, we have to be able to translate between cell_range and cell_limits.

A1 range from cell_limits

The reading functions range_read() and range_read_cells() hit the sheets.spreadsheets.get endpoint. Bizarrely, this endpoint requires the range to be specified in A1-notation. If the user specifies the cell rectangle in A1-notation, things are easy and we essentially just pass that along. But users can describe certain partially open cell rectangles via cell_limits that can't be literally expressed in A1-notation. The table below shows how all possible combinations of row and cell limits are translated into an A1-style range, using technical limits on the number of rows and columns in a Google Sheet.

f <- function(start_col, start_row, end_col, end_row) {
  cellranger::cell_limits(
    ul = c(start_row, start_col), lr = c(end_row, end_col)
  )
}

g <- function(col) {
  ifelse(is.na(col), "?", cellranger::num_to_letter(col))
}

h <- function(row) ifelse(is.na(row), "?", row)

foo <- function(start_col2, start_row2, end_col2, end_row2, ...) {
  paste0(start_col2, start_row2, ":", end_col2, end_row2)
}

df <- crossing(
  start_col = c(NA, 2), start_row = c(NA, 2),
  end_col = c(NA, 4), end_row = c(NA, 4)
)

df <- df %>% 
  mutate(cell_limits = pmap(., f)) %>% 
  mutate(start_col2 = g(start_col), start_row2 = h(start_row)) %>% 
  mutate(end_col2 = g(end_col), end_row2 = h(end_row)) %>% 
  mutate(naive_range = pmap(., foo)) %>%
  mutate(range = map(cell_limits, googlesheets4:::as_sheets_range),
         range = map_chr(range, 1, .default = "&lt;NULL&gt;"))
knitr::kable(
  select(
    df,
    start_col, start_row, end_col, end_row,
    naive_range, range)
)

cell_limits from A1 range

When editing a sheet, we send an UpdateCellsRequest, which takes the location of the write via a union field. We must send one of the following:

range_write() is the only writing function in googlesheets4 that lets the user target a specific range. This function is why we need the ability to convert A1-notation to cell_limits and to decide whether user's range should be sent via start or range. gs4_create(), sheet_write(), and sheet_append() all offer a higher-level API, focused on the holistic management of (work)sheets that hold a single data table.

Here's a table that translates the A1-style ranges created above into cell_limits. Some scenarios are covered more than once in this input, because above we were tackling a different (harder) problem. But that's harmless and these inputs are good, in terms of covering all relevant scenarios. Note I dropped the row corresponding to "no input".

# apparently I didn't know about format methods when I wrote cellranger :(
formaticate <- function(x) escape_brackets(capture.output(print(x)))
df %>%
  select(range) %>% 
  filter(!str_detect(range, "NULL")) %>% 
  mutate(cell_limits = map(range, googlesheets4:::limits_from_range)) %>% 
  mutate(cell_limits = map_chr(cell_limits, formaticate)) %>% 
  knitr::kable()

How to decide whether to send user's range via start or range? Possible scenarios for (sheet, range) (remember: so far, range_write() doesn't offer skip):



tidyverse/googlesheets4 documentation built on Jan. 4, 2024, 10:20 a.m.