knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This is article serves mostly as internal documentation of how various representations of cell ranges relate to each other in googlesheets4.
library(tidyverse) library(googlesheets4) gs4_deauth()
Several googlesheets4 functions allow the user to specify which cells to read or write. This range specification comes via the sheet
, range
, and (possibly) skip
arguments. Examples of functions with this interface: range_read()
, range_read_cells()
, and range_write()
.
range
is the primary and most powerful argument and takes precedence over
all others. range
can be:cell_limits
object made with helpers from the cellranger packagesheet
can be used to specify a (work)sheet by name, such as "arts", or
by position, such as 3skip
is an optional argument that appears in range_read()
and
range_read_cells()
, to be compatible with functions like read.table()
,
readr::read_delim()
, and readxl::read_excel()
.(You might think n_max
should also be mentioned here, but for technical reasons, in general, n_max
can only be enforced after we've read data. So it's not part of the range ingest problem.)
Loose ends and hangnails:
cell_limits
objects can hold sheet information and I have made no
arrangements for that.range_write()
have a skip argument, just for internal consistency?range_spec
is an internal S3 class that is typically used inside any function that accepts the (sheet, range, skip)
trio.
We need some sort of intermediate storage, in order to translate between the various ways the user can express their range and the requirements of different Sheets API endpoints (which differ more than you'd expect!). We generally require metadata about the associated Sheet in order to form a range_spec
, because we potentially need to lookup the (work)sheet
by position or determine whether a name refers to a named range or a (work)sheet
.
The internal generic as_range_spec()
dispatches on primary argument x
, which maps to range
. Its job is to transform user-supplied (sheet, range, skip)
into:
sheet_name
: (Work)sheet name, can be NULL
.named_range
: Name of range, can be NULL
. But if not NULL
, then this is
the complete specification of the range, i.e. it cannot be combined with any
other information.cell_range
and/or cell_limits
: Ways to specify a rectangle of cells, can
be NULL
. If user specifies skip
, that is re-expressed as cell_limits
. At intake, at most one of cell_range
and cell_limits
is populated. But internally, we might populate one from the other, as we prepare the range to
meet the requirements of a specific API endpoint.shim
: indicates whether the user specified a specific cell rectangle
(shim = TRUE
) or we filled in some specifics pragmatically
(shim = FALSE
), which is necessary to express some partially open
rectangles in A1-notation. This aspect of the user-supplied data needs to
be retained to inform downstream post-processing.sheets_df
and nr_df
are retained in a range_spec
, if they were
provided to as_range_spec()
.)range_spec
should have a field for (work)sheet id.Here's how various user-specified ranges are stored as a range_spec
.
# switched from read_csv because https://github.com/tidyverse/readr/issues/1237 df <- read.csv(text = ' "sheet", "range", "skip" "NULL", "NULL", "0" "Africa", "NULL", "0" "4", "NULL", "0" "NULL", "NULL", "2" "Africa", "NULL", "3" "4", "NULL", "4" "<ignored>", "Asia!A1:B2", "0" "<ignored>", "canada", "0" "<ignored>", "Europe", "3" "NULL", "A1:B2", "0" "Oceania", "A1:B2", "0" "2", "A1:B2", "0" "NULL", "cell_limits(c(2, 3), c(NA, 5))", "0" "Americas", "cell_limits(c(2, 3), c(NA, 5))", "0" ', strip.white = TRUE) escape_brackets <- function(x) { #str_replace_all(x, c("<" = "\\\\<", ">" = "\\\\>")) str_replace_all(x, c("<" = "<", ">" = ">")) } formatize <- function(x) paste(escape_brackets(format(x)), collapse = "<br>")
ss <- gs4_example("gapminder") ss_meta <- gs4_get(ss) df <- df %>% mutate( sheet_orig = escape_brackets(sheet), range_orig = range, sheet = map(sheet, parse_guess), range = map(range, parse_guess) ) df$sheet[df$sheet == "NULL"] <- list(NULL) df$range[df$range == "NULL"] <- list(NULL) df$range <- modify_at( df$range, str_which(df$range_orig, "^cell_limits"), ~ eval(parse(text = .x)) ) df <- df %>% rename(x = range) %>% # make the generic happy mutate(range_spec = pmap(., googlesheets4:::as_range_spec, sheets_df = ss_meta$sheets, nr_df = ss_meta$named_ranges )) %>% rename(range = x)
df %>% select(-sheet, -range) %>% rename(range = range_orig, sheet = sheet_orig) %>% mutate(range_spec = map_chr(range_spec, formatize)) %>% select(sheet, range, skip, everything()) %>% knitr::kable()
When a range_spec
is first formed, it may hold specifics for a cell rectangle, although it does not have to. This info appears in one of these fields:
cell_range
: an A1-style spreadsheet range,
with NO sheet informationcell_limits
object, in the cellranger sensecell_limits(ul = c(2, 3), lr = c(NA, 5)))
A cell_limits
object is, overall, a much more sane way to hold this information, because explicit NA
s can be used to convey unboundedness. Some types of unboundedness simply can't be conveyed in A1 notation, even though those partially open rectangles are as legitimate as those that can be recorded in A1 notation.
In any case, between user expectations and API idiosyncrasies, we have to be able to translate between cell_range
and cell_limits
.
cell_limits
The reading functions range_read()
and range_read_cells()
hit the sheets.spreadsheets.get
endpoint. Bizarrely, this endpoint requires the range to be specified in A1-notation. If the user specifies the cell rectangle in A1-notation, things are easy and we essentially just pass that along. But users can describe certain partially open cell rectangles via cell_limits
that can't be literally expressed in A1-notation. The table below shows how all possible combinations of row and cell limits are translated into an A1-style range, using technical limits on the number of rows and columns in a Google Sheet.
f <- function(start_col, start_row, end_col, end_row) { cellranger::cell_limits( ul = c(start_row, start_col), lr = c(end_row, end_col) ) } g <- function(col) { ifelse(is.na(col), "?", cellranger::num_to_letter(col)) } h <- function(row) ifelse(is.na(row), "?", row) foo <- function(start_col2, start_row2, end_col2, end_row2, ...) { paste0(start_col2, start_row2, ":", end_col2, end_row2) } df <- crossing( start_col = c(NA, 2), start_row = c(NA, 2), end_col = c(NA, 4), end_row = c(NA, 4) ) df <- df %>% mutate(cell_limits = pmap(., f)) %>% mutate(start_col2 = g(start_col), start_row2 = h(start_row)) %>% mutate(end_col2 = g(end_col), end_row2 = h(end_row)) %>% mutate(naive_range = pmap(., foo)) %>% mutate(range = map(cell_limits, googlesheets4:::as_sheets_range), range = map_chr(range, 1, .default = "<NULL>"))
knitr::kable( select( df, start_col, start_row, end_col, end_row, naive_range, range) )
cell_limits
from A1 rangeWhen editing a sheet, we send an UpdateCellsRequest
, which takes the location of the write via a union field. We must send one of the following:
start
: An instance of GridCoordinate
which identifies one cell. This is
where the write starts and the cells hit are determined by the structure of
the accompanying RowData
. Think of start
as specifying the upper left
corner of a target rectangle. Fields:sheetId
: ID of the (work)sheet. The only required field.rowIndex
: Zero-based row index. Unspecified means 0.columnIndex
: Zero-based column index. Unspecified means 0.range
: An instance of GridRange
which identifies a rectangle of cells.
Fields:
sheetId
: ID of the (work)sheet. The only required field.startRowIndex
: Zero-based inclusive start row. Unspecified means 0.endRowIndex
: Zero-based exclusive end row. Unspecified means unbounded.startColumnIndex
: Zero-based inclusive start column. Unspecified means
0.endColumnIndex
: Zero-based exclusive end column. Unspecified means
unbounded.All the start/end row/column indices are optional. If omitted, it means the rectangle is unbounded on that side. Unbounded on the left or top is boring, because that just amounts to starting at the first row or column. But unbounded on the right or bottom is a more interesting scenario because of this:
If the RowData
doesn't cover the entire range
rectangle, the targetted
cells that don't receive new data are cleared, according to the accompanying
field mask in fields
.
range_write()
is the only writing function in googlesheets4 that lets the user target a specific range. This function is why we need the ability to convert A1-notation to cell_limits
and to decide whether user's range should be sent via start
or range
. gs4_create()
, sheet_write()
, and sheet_append()
all offer a higher-level API, focused on the holistic management of (work)sheets that hold a single data table.
Here's a table that translates the A1-style ranges created above into cell_limits
. Some scenarios are covered more than once in this input, because above we were tackling a different (harder) problem. But that's harmless and these inputs are good, in terms of covering all relevant scenarios. Note I dropped the row corresponding to "no input".
# apparently I didn't know about format methods when I wrote cellranger :( formaticate <- function(x) escape_brackets(capture.output(print(x)))
df %>% select(range) %>% filter(!str_detect(range, "NULL")) %>% mutate(cell_limits = map(range, googlesheets4:::limits_from_range)) %>% mutate(cell_limits = map_chr(cell_limits, formaticate)) %>% knitr::kable()
How to decide whether to send user's range via start
or range
? Possible scenarios for (sheet, range)
(remember: so far, range_write()
doesn't offer skip
):
sheet
not given: populate with first (visible) sheet.range
: Send sheet ID as GridCoordinate
via start
. We will
overwrite data (and usually formats) in cells covered by the accompanying
data
. But we don't touch the rest of the sheet. Why not send as range
?
That feels too aggressive because it clears the data (and usually formats)
of any cells in the sheet that are not covered by the data
. But if user wants that, there's always range_write()
.range
as A1 notation: Re-express it as cell_limits
.range
as cell_limits
breaks down into a couple scenarios:cell_limits
contains at least 1 NA
: send as GridRange
via range
.
Analysis of the "all NA
" edge case: we send just the sheet ID as a
GridRange
, so we will clear the targeted fields for the whole sheet.
This seems OK here, because no one can do this by accident, i.e. you
have to go out of your way to make an explicit cell_limits
object
containing only NA
s. Why would you do this instead of range_write()
?
Maybe you need to suppress the inclusion of col_names
or you don't
like the table formatting applied by range_write()
.cell_limits
has no NA
s: If rectangle includes more than 1 cell, send
as GridRange
via range
. If rectangle is exactly 1 cell, send as
GridCoordinate
via start
.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.