NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", purl = NOT_CRAN, eval = NOT_CRAN )
Load googlesheets
and dplyr
, from which we use the %>%
pipe operator and which gives us nicer printing of data frames (tbl_df
s)
library(googlesheets) suppressMessages(library(dplyr))
To see how your data comes in as a data frame without numeric formatting, try this:
gs_read(..., literal = FALSE)
The googlesheets
package comes with functions to access a public Sheet with formulas and formatted numbers. Visit it in the browser or check out this screenshot.
We use it to demo the effect of literal
in gs_read()
. First we accept the default, which is literal = TRUE
.
gs_ff() %>% gs_read(range = cell_cols("B:C"))
See the problem? Numeric formatting causes the first column to come in as character.
Try again with literal = FALSE
:
gs_ff() %>% gs_read(literal = FALSE, range = cell_cols("B:C"))
Fixed it! First column is numeric. And we've also gained precision in the second column, previously lost to rounding.
If you want full access to cell contents, use gs_read_cellfeed(..., literal = FALSE)
to get a data frame with one per cell. Then take your pick from value
, input_value
, and numeric_value
. Here's an example with lots of formulas:
gs_ff() %>% gs_read_cellfeed(range = cell_cols("E")) %>% select(-cell_alt, -row, -col) %>% knitr::kable()
Read on if you want to know more.
When working with Google Sheets via the cell feed, there are three ways to define cell contents:
googlesheets
returns by default, because it's what the API returns by default.=HYPERLINK()
, an image specified via =IMAGE()
.inputValue
attribute of a cell entry always contains the value that a user would otherwise type into the Google Sheets user interface to manipulate the cell (i.e. either a literal value or a formula)."0.12345%
to simultaneously enter the numeric value 0.12345 and format it as a percentage. Therefore, the numeric value 0.12345 will have input value 0.12345%
if formatted as a percentage and 0.12345 otherwise. Why, Google, why?numericValue
attribute of a cell entry, when present, indicates that the cell was determined to have a numeric value, and its numeric value is indicated with this attributed [sic]."numericValue
attribute doesn't even exist in the underlying XML and it will be an NA
in any object googlesheets
creates from reading the Sheet.Click on the Format menu in Google Sheets and you'll gain access to a "Number" sub-menu and ... lots of other stuff. Let's agree that "formatting" can mean two different things:
Decorative formatting is completely invisible to the Sheets API. It is also a terrible idea to encode data in decorative formatting, though it can be used to visually reinforce information that is properly stored in data (Google Sheets is capable of conditional formatting). Nothing in googlesheets
or the rest of this vignette addresses decorative formatting. We shall not speak of it again. From now on, "formatting" means numeric formatting.
We've created a formula and formatting ~~nightmare~~ sampler Sheet. Go visit it in the browser!. Or check out this screenshot.
It's one of the built-in example sheets. Access it with various functions that start with gs_ff
.
Here's how it comes in as a data frame by default: you get "literal values" (suppressing a boring column in order to show the interesting ones).
gs_ff() %>% gs_read() %>% select(-integer)
What if you want unformatted numbers? What if you want the actual formulas? You can now get them the cell feed, which, in googlesheets
, means you must use gs_read_cellfeed()
. You can cause gs_read()
to consult the cell feed by specifying literal = FALSE
.
Default methods of reading Sheet data assume that the data occupies a neat rectangle in the upper left corner, that you want all of it, and that you want the literal values.
What if you need more control over which cells? What if you want input or numeric values? Use the cell feed via gs_read_cellfeed()
. Under the hood, gs_read()
will use the cell feed whenever a cell range is provided, i.e. when the call is like gs_read(..., range = "B4:D9")
or gs_read(..., range = cell_cols(4:6))
, or when the new argument literal = FALSE
.
gs_read_cellfeed()
has been extended. As before, we return a data frame with one row per cell, but now we return all 3 notions of cell contents:
value
: The variable previously known as cell_text
. Described as "literal value", what you see in the browser, and what is returned by all other methods of reading.input_value
: What you would have typed into the cell (if you are a total spreadsheet nerd, when it comes to percentages).numeric_value
: The actual number, if such exists.cf <- gs_read_cellfeed(gs_ff())
cf_printme <- cf %>% arrange(col, row) %>% select(cell, value, input_value, numeric_value) ## work with purrr v0.2.2 but avoid deprecation warning with v0.2.2.1 ## modify_if() is ideal but not in v0.2.2 cf_printme[] <- cf_printme %>% purrr::map(~ if(purrr::is_character(.x)) { gsub('$', '\\$', .x, fixed = TRUE) } else .x) knitr::kable(cf_printme)
We explore the different cell contents for different variables. This motivates the logic behind what happens when gs_read(..., literal = FALSE)
and gs_simply_cellfeed(..., literal = FALSE)
.
Column 2, number_formatted
, holds variously formatted numbers. It is quite pathological, because in real life numeric formatting is likely to be uniform within a column, which helps readr
make good decisions about type conversion.
value
(what you get by default) imports as character. Not good.input_value
is attractive for the first number, because an integer looks like an integer, which is ultimately good for type conversion. But this variable still imports as character, because of the percent sign.numeric_value
is usually what you want for numbers.cf %>% filter(row > 1, col == 2) %>% select(value, input_value, numeric_value) %>% readr::type_convert()
Column 3, number_rounded
, holds numbers with four decimal places, rounded to show just two. Here we want numeric_value
.
cf %>% filter(row > 1, col == 3) %>% select(value, input_value, numeric_value) %>% readr::type_convert()
Column 5, formula
, holds various formulas, not necessarily numeric. Note we had to truncate input_value
for printing purposes.
value
is what you want ... except for the formula which evaluates to numeric and is formatted.input_value
holds the actual formulas.numeric_value
is what you want for the single formula that is numeric.cf %>% filter(row > 1, col == 5) %>% select(value, input_value, numeric_value) %>% mutate(input_value = substr(input_value, 1, 43)) %>% readr::type_convert()
Column 6, formula_formatted
, holds formatted numeric formulas:
value
(default) will come in as character.input_value
holds the actual formulas.numeric_value
(what you usualy want, when it exists) holds the calcuated numbers.cf %>% filter(row > 1, col == 6) %>% select(value, input_value, numeric_value) %>% readr::type_convert()
literal = FALSE
Based on the above examples (and more), here's the current logic for which cell contents are used in gs_read(..., literal = FALSE)
and gs_reshape_cellfeed(..., literal = FALSE)
. The goal is to create an input that gives the desired result most often with default behavior of readr::type_convert()
. If you think this is wrong, please discuss in an issue.
numeric_value
exist?numeric_value
?numeric_value
does not exist, use value
(business as usual)input_value
numeric_value
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.